# South Africa PHT Prospects - Domain Research Status Report
**Date:** 2026-03-12  
**Agent:** Max

---

## ✅ COMPLETED WORK

### 1. Data Cleanup & Deduplication
- **Input:** 522 companies (raw data)
- **Output:** 306 unique companies
- **Removed:** 216 duplicates (by name and domain)
- **Action:** Standardized fruit types, cleaned CA room counts, normalized domains
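
A minimal sketch of the normalization and deduplication step, for illustration only; it is not the contents of `sa-domain-enrichment-v2.py`. The column names `name` and `domain` are hypothetical, and it assumes a row counts as a duplicate only when both the normalized name and the normalized domain match an earlier row, so sites that share a domain (such as the two SAFT entries) stay separate.

```python
import re
from urllib.parse import urlparse


def normalize_domain(raw):
    """Reduce a URL or domain string to a bare lowercase host ('' if empty)."""
    raw = (raw or "").strip().lower()
    if not raw:
        return ""
    # urlparse only fills in netloc when a scheme is present
    if "://" not in raw:
        raw = "http://" + raw
    host = urlparse(raw).netloc.split(":")[0]
    return host[4:] if host.startswith("www.") else host


def name_key(name):
    """Case- and whitespace-insensitive key for comparing company names."""
    return re.sub(r"\s+", " ", (name or "").strip().lower())


def dedupe(rows):
    """Keep the first row seen for each (name, domain) pair."""
    seen, unique = set(), []
    for row in rows:
        domain = normalize_domain(row.get("domain"))
        key = (name_key(row.get("name")), domain)
        if key in seen:
            continue
        seen.add(key)
        unique.append({**row, "domain": domain})
    return unique
```

Matching on the pair rather than on the domain alone is a design choice for this sketch; a stricter rule would need a manual review of companies that legitimately share one website.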

### 2. Ranking & Organization  
- Sorted by CA room count (descending)
- Added rank column (1-306)
- Top company: Ceres Fruit Growers (134 CA rooms, tied with Komati Fruit)
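
The ranking step can be sketched as follows, assuming each row is a dict with a hypothetical `ca_rooms` integer field. Ranks are sequential, and ties keep their input order because Python's sort is stable, which matches how the two 134-room companies receive ranks 1 and 2 in the Top 20 table.

```python
def rank_by_ca_rooms(rows):
    """Sort rows by CA room count (descending) and attach a 1-based rank."""
    # sorted() is stable, even with reverse=True, so tied rows keep input order
    ordered = sorted(rows, key=lambda r: r["ca_rooms"], reverse=True)
    return [{**row, "rank": i} for i, row in enumerate(ordered, start=1)]
```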

### 3. Initial Domain Research
- **Starting coverage:** 62 companies (20.3%)
- **Found 5 new domains** for high-priority companies:
  - SAFT ATLANTIC → saft.co.za
  - Cape Fruit Coolers → capefruitcoolers.co.za
  - Clanfresh (Wespak Citrus) → suiderlandplase.co.za
  - UNIFRUTTI SA (PTY) LTD → unifrutti.co.za  
  - Dutoit Group → dutoit.com
- **Current coverage:** 67 companies (21.9%)

---

## 📊 CURRENT STATUS

**Total companies:** 306  
**Companies with domains:** 67 (21.9%)  
**Companies missing domains:** 239 (78.1%)  
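
The coverage figures above follow from simple arithmetic on the two counts. A small sketch (the function name is illustrative):

```python
def coverage(with_domains, total):
    """Return (missing count, % covered, % missing), rounded to one decimal."""
    missing = total - with_domains
    return (
        missing,
        round(100 * with_domains / total, 1),
        round(100 * missing / total, 1),
    )


print(coverage(67, 306))  # (239, 21.9, 78.1)
```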

### Missing Domains by Priority:
- **30+ CA rooms:** 2 companies
- **20-29 CA rooms:** 8 companies  
- **10-19 CA rooms:** 97 companies
- **<10 CA rooms:** 132 companies
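
A sketch of how the priority buckets can be counted, assuming each row carries hypothetical `ca_rooms` and `domain` fields and that an empty or absent `domain` means the domain is still missing:

```python
from collections import Counter


def priority_bucket(ca_rooms):
    """Map a CA room count to the priority bands used in this report."""
    if ca_rooms >= 30:
        return "30+"
    if ca_rooms >= 20:
        return "20-29"
    if ca_rooms >= 10:
        return "10-19"
    return "<10"


def missing_by_priority(rows):
    """Count companies without a domain in each priority band."""
    return Counter(
        priority_bucket(row["ca_rooms"]) for row in rows if not row.get("domain")
    )
```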

### Fruit Type Breakdown:
- Citrus: 159 companies (52%)
- Mixed: 134 companies (44%)
- Other: 9 companies (3%)
- Apple/Pear: 4 companies (1%)

---

## 📁 DELIVERABLES

1. **south-africa-prospects-CLEANED-RANKED.csv**
   - 306 unique companies
   - Ranked by CA rooms
   - Clean data, standardized categories
   - 21.9% domain coverage

2. **sa-missing-domains.txt**
   - 239 companies needing domain research
   - Sorted by CA room count (priority order)
   - Includes city, fruit type, CA rooms for context

3. **domain-research-findings.csv**
   - Tracking sheet for manual domain research
   - Status column for verification workflow

4. **sa-domain-enrichment-v2.py**
   - Python script for cleanup automation
   - Reusable for future batches

---

## 🎯 NEXT STEPS - RECOMMENDATIONS

### Option A: Manual Web Research (High Accuracy, Time-Intensive)
Use the Brave Search `web_search` tool to research the remaining 239 companies systematically.

**Pros:**
- High accuracy (verify each domain)
- Can prioritize high CA room companies first
- Direct control over quality

**Cons:**
- Time-intensive (239 × ~2 min ≈ 8 hours)
- Requires API rate limit management

**Estimated time:** 2-3 days for complete coverage

---

### Option B: Automated Batch with Apify/Apollo (Fast, Lower Accuracy)
Use the Apify Google Maps scraper or Apollo.io to batch-scrape domains.

**Pros:**
- Much faster (239 companies in ~1 hour)
- Can process in batches

**Cons:**
- Lower accuracy (~70-80% success rate)
- Requires manual verification
- API costs (Apify credits, Apollo limits)

**Estimated time:** 1 day (scraping + verification)

---

### Option C: Hybrid Approach (Recommended)
1. **High priority (10+ CA rooms = 107 companies):** Manual web research  
2. **Lower priority (<10 CA rooms = 132 companies):** Automated scraping

**Pros:**
- Balances speed and accuracy
- Focus manual effort on high-value targets
- Most efficient use of time

**Cons:**
- Still requires ~4 hours of manual work (107 × ~2 min)

**Estimated time:** 1-2 days
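
The manual-effort figures in Options A and C come from the same rough assumption of about 2 minutes per company. A quick sketch of that arithmetic (pure research time only, excluding verification and rate-limit pauses):

```python
def manual_hours(companies, minutes_each=2):
    """Estimated research time in hours at a fixed per-company rate."""
    return companies * minutes_each / 60


print(f"Option A: {manual_hours(239):.1f} h")  # Option A: 8.0 h
print(f"Option C: {manual_hours(107):.1f} h")  # Option C: 3.6 h
```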

---

## 💡 MY RECOMMENDATION

**Go with Option C (Hybrid)**

**Phase 1 (Priority):** Manually research the 107 companies with 10+ CA rooms  
- Use the `web_search` tool
- Update CSV incrementally (every 25 companies)
- Target: 80%+ domain coverage for 10+ CA room companies
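
One way to structure the incremental updates is to process the list in fixed-size batches and save after each one. A minimal sketch; the actual CSV write is left out because the file layout is not specified here:

```python
def batches(items, size=25):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 107 high-priority companies give four full batches and one partial batch
sizes = [len(batch) for batch in batches(list(range(107)))]
print(sizes)  # [25, 25, 25, 25, 7]
```

Saving after every batch means an interrupted session loses at most 25 companies of work.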

**Phase 2 (Volume):** Use the Apify Google Maps scraper for the remaining 132 companies  
- Batch scrape by city/region  
- Verify top results only
- Target: 60%+ domain coverage overall

**Total estimated time:** 6-8 hours of work  
**Final expected coverage:** 70-75% (215-230 companies with domains)
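
To translate a coverage percentage into company counts, the following sketch rounds up to whole companies and subtracts the 67 domains already found. The function name and defaults are illustrative.

```python
import math


def companies_needed(target_pct, total=306, current=67):
    """Return (companies with domains at target, new domains still to find)."""
    target_count = math.ceil(target_pct * total / 100)
    return target_count, target_count - current


for pct in (60, 70, 75):
    print(pct, companies_needed(pct))
# 60 (184, 117)
# 70 (215, 148)
# 75 (230, 163)
```

Reaching the 70-75% range therefore means finding 148-163 of the 239 domains that are currently missing.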

---

## 🚀 READY TO PROCEED?

I can start Phase 1 immediately and research the 107 high-priority companies (10+ CA rooms) systematically using web search.

**Estimated completion:** At 50 companies/day, Phase 1 takes 2-3 days

Let me know if you want me to proceed or if you prefer a different approach!

---

## 📈 TOP 20 COMPANIES (Current Status)

| Rank | Company | CA Rooms | Domain | Status |
|------|---------|----------|--------|--------|
| 1 | Ceres Fruit Growers | 134 | cfg.co.za | ✓ |
| 2 | Komati Fruit | 134 | komatifruits.co.za | ✓ |
| 3 | Capespan South Africa | 50 | capespan.com | ✓ |
| 4 | Tru-Cape Fruit Marketing | 45 | tru-cape.com | ✓ |
| 5 | SAFT ATLANTIC | 45 | saft.co.za | ✓ NEW |
| 6 | Westfalia Fruit | 45 | westfaliafruit.com | ✓ |
| 7 | Dutoit Agri | 40 | dutoit.com | ✓ |
| 8 | SAFT Killarney | 40 | saft.co.za | ✓ |
| 9 | EKM Exports (GOGO brand) | 38 | ekm-exports.com | ✓ |
| 10 | Cape Fruit Coolers | 35 | capefruitcoolers.co.za | ✓ NEW |
| 11 | SRCC (Sundays River Citrus) | 35 | srcc.co.za | ✓ |
| 12 | Unlimited Group | 35 | unlimitedgroup.co.za | ✓ |
| 13 | Goede Hoop Citrus | 35 | ghcitrus.com | ✓ |
| 14 | Table Bay Cold Storage (Pty) Ltd | 35 | tbcs.co.za | ✓ |
| 15 | Modderdrift | 35 | modderdrift.co.za | ✓ |
| 16 | Lona Group | 30 | lona.co.za | ✓ |
| 17 | Commercial Cold Holdings (PTY) Ltd | 30 | cchcold.com | ✓ |
| 18 | Kromco | 28 | kromco.co.za | ✓ |
| 19 | Capital Fruit | 28 | capfruit.co.za | ✓ |
| 20 | Clanfresh (Wespak Citrus) | 28 | suiderlandplase.co.za | ✓ NEW |

**Top 20 domain coverage: 20/20 (100%)** ✅
