# PHASE 2 RESEARCH PLAN
## Verifying Top 200 USA Cold Storage Facilities

---

## OBJECTIVE
Complete verification of the top 200 facilities: bring 37 more to "Verified" status (13 are verified already) and carry out enhanced research on the remaining 150.

---

## BATCH RESEARCH WORKFLOW

### For Each Facility:

#### Step 1: Web Search (3-5 minutes)
```
Search queries:
1. "[Company Name]" cold storage capacity square feet
2. "[Company Name]" controlled atmosphere rooms facility
3. "[Company Name]" [primary fruit] storage
4. "[Company Name]" GCCA IARW member
5. "[Company Name]" organic certification
```
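The five queries above follow a fixed template, so they can be generated per facility rather than typed by hand. The following is a minimal sketch; the function name `build_queries` and the `primary_fruit` argument are illustrative assumptions, not part of any existing tooling.

```python
# Templates mirror the five search queries listed in Step 1.
QUERY_TEMPLATES = [
    '"{company}" cold storage capacity square feet',
    '"{company}" controlled atmosphere rooms facility',
    '"{company}" {primary_fruit} storage',
    '"{company}" GCCA IARW member',
    '"{company}" organic certification',
]


def build_queries(company: str, primary_fruit: str) -> list[str]:
    """Fill every template for one facility (hypothetical helper)."""
    return [
        template.format(company=company, primary_fruit=primary_fruit)
        for template in QUERY_TEMPLATES
    ]


queries = build_queries("Example Fruit Co", "apple")
for query in queries:
    print(query)
```

Each returned string can be passed to whichever search tool is in use.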

#### Step 2: Website Scrape (2-3 minutes)
**Key pages to check:**
- `/about` - Company history, size info
- `/facilities` - Detailed capacity information
- `/capabilities` - CA/MA technology, services
- `/products` or `/fruit` - Varieties, organic status
- `/sustainability` - Certifications, practices
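As a rough sketch, the key pages can be turned into candidate URLs for a given site. The paths come from the list above; whether a particular company site actually serves them is an assumption to check per facility, and `candidate_urls` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Paths taken from the "Key pages to check" list; not every site will have them.
KEY_PATHS = ["/about", "/facilities", "/capabilities", "/products", "/fruit", "/sustainability"]


def candidate_urls(base_url: str) -> list[str]:
    """Build one candidate URL per key page for a company website."""
    return [urljoin(base_url, path) for path in KEY_PATHS]


for url in candidate_urls("https://examplefruit.com"):
    print(url)
```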

**Look for these keywords:**
- Square footage: "sq ft", "square feet", "sf"
- Room counts: "rooms", "chambers", "storage units", "CA rooms"
- Premium varieties: "Cosmic Crisp", "Jazz", "Envy", "Honeycrisp", "Pink Lady", "Rockit"
- Organic: "certified organic", "USDA organic", "organic program"
- Technology: "controlled atmosphere", "CA storage", "MA storage", "SmartFresh"
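The keyword check can be sketched as a simple text scan. This is an illustrative example only: the `scan_page` helper and the square-footage pattern are assumptions, and plain substring matching can produce false positives (for example, "envy" used as an ordinary word), so hits still need a manual read.

```python
import re

# Keyword groups copied from the list above.
KEYWORDS = {
    "premium_varieties": ["Cosmic Crisp", "Jazz", "Envy", "Honeycrisp", "Pink Lady", "Rockit"],
    "organic": ["certified organic", "USDA organic", "organic program"],
    "technology": ["controlled atmosphere", "CA storage", "MA storage", "SmartFresh"],
}

# Matches forms such as "250,000 sq ft", "250000 square feet", "250,000 sf".
SQFT_PATTERN = re.compile(
    r"(\d[\d,]*)\s*(?:sq\.?\s*ft\.?|square\s+feet|sf)\b", re.IGNORECASE
)


def scan_page(text: str) -> dict:
    """Return keyword hits per group plus any square-footage figures found."""
    lowered = text.lower()
    hits = {
        group: [term for term in terms if term.lower() in lowered]
        for group, terms in KEYWORDS.items()
    }
    hits["square_footage"] = [
        int(number.replace(",", "")) for number in SQFT_PATTERN.findall(text)
    ]
    return hits


sample = (
    "Our 250,000 sq ft facility has 22 CA rooms with controlled atmosphere "
    "storage for Honeycrisp and Cosmic Crisp. Certified organic."
)
print(scan_page(sample))
```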

#### Step 3: Extract & Score (2 minutes)
**Data points:**
- ✅ Square footage (if available)
- ✅ Room/chamber count (update if different from existing)
- ✅ Premium varieties mentioned
- ✅ Organic certification (Yes/No/Unknown)
- ✅ CA/MA capabilities
- ✅ Source URL for verification

**Calculate score:**
1. Size classification → base score (60-100)
2. Add bonuses (+5 to +10 each)
3. Record confidence level

#### Step 4: Document (1 minute)
Save findings to JSON:
```json
{
  "company": "Example Fruit Co",
  "square_footage": "250000",
  "total_rooms": "22",
  "size_classification": "Large",
  "premium_varieties": "Cosmic Crisp, Honeycrisp, Jazz",
  "organic": "Yes",
  "ca_ma_storage": "Yes",
  "score": 100,
  "verification_source": "examplefruit.com/facilities, The Packer article",
  "confidence_level": "Verified",
  "notes": "Expanded 2022, new CA rooms added"
}
```
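One possible way to persist findings is to append each record to a single JSON file. The sketch below assumes a file holding a list of records; the `save_finding` helper and the choice of required fields are illustrative assumptions, not an established schema.

```python
import json
from pathlib import Path

# Assumed minimum set of fields; adjust to the project's actual schema.
REQUIRED_FIELDS = {
    "company",
    "size_classification",
    "score",
    "verification_source",
    "confidence_level",
}


def save_finding(record: dict, path) -> int:
    """Append one facility record to a JSON list file and return the new count."""
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    path = Path(path)
    # Load existing records if the file is already there, otherwise start fresh.
    records = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    records.append(record)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return len(records)
```

Rejecting incomplete records at save time keeps gaps visible early instead of surfacing them when the CSV is assembled.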

---

## PRIORITY TARGETS (Next 37 Facilities)

### Washington State (15 facilities)
1. WCS Logistics - Winchester
2. Washington Fruit - 21-acre Facility  
3. Borton & Sons Fruit & Cold Storage
4. Price Cold Storage & Packing
5. Henningsen Cold Storage - Salem OR
6. Evans Fruit Company
7. Matson Fruit Company
8. McDougall & Sons
9. Legacy Fruit Packers
10. Gilbert Orchards
11. Hansen Fruit & Cold Storage
12. Blue Star Growers
13. Kershaw Fruit & Cold Storage
14. Auvil Fruit Company
15. Blue Bird Inc.

### Oregon (5 facilities)
16. Columbia Gorge Fruit Growers
17. Hood River Cherry Company
18. Heirloom Orchards
19. Mt View Orchards
20. NORPAC

### Michigan (8 facilities)
21. Michigan Natural Storage (high priority - CA specialist)
22. Applewood Fresh (now FirstFruits)
23. [Additional MI facilities from master list]

### California (5 facilities)
24. [Top CA citrus facilities]

### Pennsylvania (4 facilities)
25. Twin Springs Fruit Farm
26. [Additional PA apple facilities]

---

## RESEARCH TOOLS & RESOURCES

### Web Search Engines
- Brave Search API (available via web_search tool)
- Google (manual backup)

### Content Extraction
- web_fetch tool (markdown extraction)
- Direct website scraping

### Industry Directories
**High Priority:**
- GCCA (Global Cold Chain Alliance) member directory
- IARW (International Association of Refrigerated Warehouses)
- State fruit grower associations:
  - Washington Apple Commission
  - Michigan Apple Committee
  - Pennsylvania Apple Marketing Program
  - California Citrus Mutual

### Trade Publications
- The Packer (produce industry news)
- Good Fruit Grower (tree fruit focus)
- Capital Press (Northwest agriculture)
- Produce News
- AndNowUKnow (produce industry)

### Data Sources
- Company LinkedIn pages (employee count, recent posts)
- USDA organic certification database
- State agriculture department facility registrations
- Port authority tenant lists (for major cold storage)
- EPA facility reports (ammonia refrigeration systems usually indicate cold storage)

---

## EFFICIENCY TIPS

### Batch Processing
- Research 10 facilities at a time
- Group by state/region for context
- Save progress after each batch
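A small sketch of the batching step, assuming each facility is a dict with a `state` key (a hypothetical field name): sorting by state keeps regional neighbors together before the list is cut into groups of 10.

```python
def make_batches(facilities: list[dict], batch_size: int = 10) -> list[list[dict]]:
    """Group facilities by state, then split into fixed-size batches."""
    # sorted() is stable, so the original order is kept within each state.
    ordered = sorted(facilities, key=lambda facility: facility["state"])
    return [
        ordered[start:start + batch_size]
        for start in range(0, len(ordered), batch_size)
    ]


# Illustrative data only.
facilities = [
    {"company": f"Facility {i}", "state": ["WA", "OR", "MI"][i % 3]}
    for i in range(23)
]
batches = make_batches(facilities)
print([len(batch) for batch in batches])
```

Progress can then be saved after each batch is processed.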

### Quick Wins
1. **Start with facilities that have good websites** (easier verification)
2. **Check parent companies** (e.g., CMI members, Rainier subsidiaries)
3. **Look for recent news** (expansions often mention square footage)
4. **Use LinkedIn** (employee counts correlate with facility size)

### Red Flags (Skip or Mark Unknown)
- No website or broken website
- Generic holding company (no facility details)
- Old/outdated information
- Conflicting data from multiple sources

---

## SCORING QUICK REFERENCE

| Size | Rooms | Square Feet | Base Score |
|------|-------|-------------|------------|
| XXLarge | 50+ | 500K+ | 100 |
| XLarge | 30-49 | 300-500K | 90 |
| Large | 20-29 | 150-300K | 80 |
| Medium | 10-19 | 50-150K | 70 |
| Small | <10 | <50K | 60 |

**Bonuses:**
- Premium apples (Cosmic Crisp, Jazz, Envy, etc.): +10
- High-value produce (berries, avocados, kiwi): +10  
- Multi-fruit (3+ types): +5
- Organic certified: +5
- CA/MA storage: +5

**Max score: 135** (XXLarge + all bonuses)
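The table and bonuses can be expressed as a short scoring sketch. Two details are assumptions rather than stated rules: lower bounds are treated as inclusive, and square footage takes precedence over room count when both are known. The function and flag names are hypothetical.

```python
# (size class, minimum rooms, minimum square feet, base score), largest first.
SIZE_TIERS = [
    ("XXLarge", 50, 500_000, 100),
    ("XLarge", 30, 300_000, 90),
    ("Large", 20, 150_000, 80),
    ("Medium", 10, 50_000, 70),
    ("Small", 0, 0, 60),
]
BASE_SCORES = {name: base for name, _, _, base in SIZE_TIERS}

BONUSES = {
    "premium_apples": 10,
    "high_value_produce": 10,
    "multi_fruit": 5,
    "organic": 5,
    "ca_ma": 5,
}


def classify_size(rooms=None, sqft=None):
    """Return the size class, or None when neither figure is known."""
    if rooms is None and sqft is None:
        return None
    for name, min_rooms, min_sqft, _ in SIZE_TIERS:
        if sqft is not None:
            if sqft >= min_sqft:
                return name
        elif rooms >= min_rooms:
            return name


def score_facility(size: str, **flags: bool) -> int:
    """Base score for the size class plus every bonus whose flag is true."""
    return BASE_SCORES[size] + sum(
        points for key, points in BONUSES.items() if flags.get(key)
    )


print(classify_size(rooms=22, sqft=250_000))
print(score_facility("XXLarge", **{key: True for key in BONUSES}))
```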

---

## QUALITY CONTROL

### Confidence Levels
- **Verified**: Multiple sources confirm data, square footage found
- **Confirmed**: Website/single source confirms key details  
- **Estimated**: Inferred from partial data (e.g., room count but no sqft)
- **Unknown**: No reliable data found

### Verification Checklist
For each facility to be marked "Verified":
- [ ] Square footage OR room count confirmed
- [ ] Primary produce types identified
- [ ] CA/MA status determined
- [ ] At least 2 independent sources OR company website
- [ ] Source URLs documented
- [ ] Confidence level = "Verified"
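The checklist can also be applied mechanically before a record is marked "Verified". The sketch below follows the checklist items literally; the record keys (`sources`, `primary_produce`, `company_website_source`, and so on) are assumed field names, not a defined schema.

```python
def unmet_verified_checks(record: dict) -> list[str]:
    """Return the checklist items a record does not yet satisfy."""
    sources = record.get("sources", [])  # assumed: list of source URLs
    checks = {
        "square footage or room count confirmed": bool(
            record.get("square_footage") or record.get("total_rooms")
        ),
        "primary produce types identified": bool(record.get("primary_produce")),
        "CA/MA status determined": record.get("ca_ma_storage") in ("Yes", "No"),
        "two independent sources or company website": (
            len(sources) >= 2 or bool(record.get("company_website_source"))
        ),
        "source URLs documented": len(sources) >= 1,
    }
    return [name for name, passed in checks.items() if not passed]


record = {
    "square_footage": 250000,
    "primary_produce": "Apples",
    "ca_ma_storage": "Yes",
    "sources": ["https://examplefruit.com/facilities"],
}
print(unmet_verified_checks(record))
```

An empty list means the record meets every item and can be marked "Verified".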

---

## AUTOMATION OPPORTUNITIES

### Scripts to Build
1. **Batch web scraper** - Hit all facility websites, extract key terms
2. **GCCA member scraper** - If directory access obtained
3. **LinkedIn company scraper** - Extract employee counts, recent posts
4. **Trade pub monitor** - Track facility expansion announcements

### AI-Assisted Research
- Use GPT for website content analysis
- Pattern recognition for facility size estimation
- Automated scoring calculation
- Batch query generation

---

## EXPECTED OUTPUT

### Enhanced CSV Columns
All 200 facilities should have:
- Company ✅
- Region ✅
- Website ✅
- Size Classification ✅ (Verified or Estimated)
- Total Rooms ✅ (if available)
- Square Footage ✅ (target: 80%+ of top 200)
- Primary Produce ✅
- Premium Varieties ✅ (target: 60%+ of top 200)
- Organic ✅ (Yes/No/Unknown)
- CA/MA ✅ (Yes/No/Unknown)
- Score ✅ (60-135)
- Verification Source ✅
- Confidence Level ✅ (target: 50+ "Verified")
- Notes ✅

### Summary Statistics Target
By end of Phase 2:
- **50+ verified facilities** (currently 13)
- **150+ confirmed facilities** (currently 1,284)
- **200+ total facilities researched in detail**
- **80%+ with square footage data** (currently ~10%)
- **60%+ with premium variety details** (currently ~5%)
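Progress against these targets can be tracked with a small summary over the CSV rows. This is a sketch that assumes the rows are dicts (for example from `csv.DictReader`) and that the column headers are exactly "Confidence Level", "Square Footage", and "Premium Varieties".

```python
def summarize(rows: list[dict]) -> dict:
    """Count confidence levels and compute coverage percentages."""
    total = len(rows)

    def filled(row: dict, column: str) -> bool:
        # Treat missing, None, and whitespace-only cells as empty.
        return bool((row.get(column) or "").strip())

    def percent(column: str) -> float:
        if total == 0:
            return 0.0
        return round(100 * sum(filled(row, column) for row in rows) / total, 1)

    return {
        "verified": sum(row.get("Confidence Level") == "Verified" for row in rows),
        "confirmed": sum(row.get("Confidence Level") == "Confirmed" for row in rows),
        "pct_square_footage": percent("Square Footage"),
        "pct_premium_varieties": percent("Premium Varieties"),
    }


# Illustrative rows only.
rows = [
    {"Confidence Level": "Verified", "Square Footage": "250000", "Premium Varieties": "Honeycrisp"},
    {"Confidence Level": "Confirmed", "Square Footage": "", "Premium Varieties": ""},
    {"Confidence Level": "Confirmed", "Square Footage": "120000", "Premium Varieties": ""},
    {"Confidence Level": "Unknown", "Square Footage": "", "Premium Varieties": ""},
]
print(summarize(rows))
```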

---

## TIMELINE ESTIMATE

**Per facility research time:** 8-12 minutes average
- Web search: 3-5 min
- Website scrape: 2-3 min
- Extract/score: 2 min
- Document: 1 min

**Batch of 10 facilities:** ~90-120 minutes (1.5-2 hours)

**To complete 200 facilities:**
- 187 remaining facilities
- ~19 batches of 10
- **Total time: 28-38 hours** (3-5 full work days)

**Recommended approach:**
- 2-3 batches per session (20-30 facilities)
- 7-10 sessions total (19 batches at 2-3 batches per session)
- Complete over 1-2 weeks
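The batch and hour totals follow from simple arithmetic, reproduced here as a quick check. The 8-hour work day is an assumption used only to convert hours into days.

```python
import math

remaining = 187                   # facilities still to research
batch_size = 10
hours_per_batch = (1.5, 2.0)      # 90-120 minutes per batch

batches = math.ceil(remaining / batch_size)
hours_low = batches * hours_per_batch[0]
hours_high = batches * hours_per_batch[1]

print(f"{batches} batches")
print(f"{hours_low}-{hours_high} hours")
# Assuming an 8-hour work day:
print(f"{hours_low / 8:.1f}-{hours_high / 8:.1f} work days")
```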

---

## SUCCESS METRICS

### Phase 2 Goals
- ✅ 50+ facilities with "Verified" confidence
- ✅ 200+ facilities with detailed research
- ✅ 80%+ top 200 have square footage data
- ✅ Complete picture of USA premium fruit storage landscape
- ✅ Clear Atmos target list (facilities scoring 100+)

### Business Value
- Identify **highest-value sales targets** (XXLarge + premium fruit)
- Map **geographic clusters** (Washington, Michigan, Pennsylvania)
- Prioritize **organic operations** (sustainability-focused buyers)
- Understand **market capacity** (total storage available)
- Competitive intelligence (who stores what, where)

---

**Ready to start Phase 2!**

Use this plan as a reference guide. Process facilities in batches, save progress frequently, and maintain high data quality standards.
