Investigation Plan: AWS Hosting + BizSite Replacement
Date: 2026-04-06
Author: Cathal Dempsey + Claude
Status: Proposed, awaiting Phase 1 kickoff
Background
Current State
- Replatform pipeline: Scrapes Wix/Mono/WP sites → Claude builds Astro pages → deploys to Cloudflare Workers/Pages
- BizSite: Internal Node.js tool that pulls content from Yext/Saymore → renders JS templates to flat HTML → deploys to S3 bucket
- Hosting: Split across Cloudflare (replatform) + AWS (BizSite + API server)
Problems
- Two separate systems doing similar things (generate static sites from content)
- Hosting split across two providers — two failure points
- Cloudflare has a 100 project/account hard cap — need 18 accounts for 1800 sites
- BizSite is a separate codebase to maintain
Proposed End State
One unified pipeline that sources content from either scraping (Wix migrations) or the Yext API (existing BizSite clients), builds Astro sites, and deploys to S3+CloudFront. BizSite is retired.
Scale
- 1800 sites total
- 40 staff
- 150 sites/month migration pace (12-month Wix contract cycle)
- Build cost: ~$0.30/site (Claude API)
Phase 1: Audit BizSite + Yext Integration
Goal: Understand what BizSite does so we know what to replicate.
- Get access to BizSite repo, read the Node.js codebase
- Map the Yext data model — what fields are pulled? (name, address, hours, services, photos, reviews, social links, etc.)
- Document the Yext/Saymore API endpoints and auth (API key? OAuth?)
- Catalogue the BizSite JS templates — how many, what page types?
- Identify which of the 1800 clients currently have Yext data vs which are Wix-only
- Understand the current S3 deploy — bucket structure, CloudFront setup, custom domains, SSL
Output: A mapping doc showing BizSite field → Yext field → equivalent in our content-map.json schema
Phase 2: Add Yext as a Content Source
Goal: The replatform pipeline currently only scrapes. Add Yext API as an alternative content source that produces the same content-map.json format.
- Build lib/yext-source.js: pulls a client's Yext listing and transforms it to our content-map schema
- Map Yext fields to our node types:
| Yext Field | Content Map Node |
|---|---|
| name | heading |
| address, phone | text / contact section |
| hours | structured data / text |
| description | text nodes |
| photos | image nodes |
| logo | header image |
| socialProfiles | socialLinks array |
| reviews | reviews array |
| services | cards / list |
| categories | nav items |
- Add a --source yext --yext-id <entity-id> flag to crawl.js so the pipeline can be triggered either way
- Test: generate content-map.json from Yext for a client that currently has a BizSite, compare output
Output: Same content-map.json format whether content comes from scraping or Yext. Build agent doesn't need to know the difference.
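To make the mapping table concrete, here is a minimal sketch of what the transform inside lib/yext-source.js might look like. The Yext field names come from the table above; the function name `yextToContentMap`, the node shapes (`{ type, ... }`), and the sample entity are assumptions for illustration only, since the real content-map.json schema and the actual Yext payload shape are still to be confirmed in Phase 1.

```javascript
// Hypothetical sketch of the transform step in lib/yext-source.js.
// Node shapes below are assumptions, not the real content-map schema.
function yextToContentMap(entity) {
  const nodes = [];

  if (entity.name) nodes.push({ type: 'heading', text: entity.name });
  if (entity.description) nodes.push({ type: 'text', text: entity.description });

  // address + phone are grouped into one contact section
  const contact = [entity.address, entity.phone].filter(Boolean);
  if (contact.length > 0) nodes.push({ type: 'contact', lines: contact });

  if (entity.hours) nodes.push({ type: 'hours', value: entity.hours });

  for (const url of entity.photos ?? []) nodes.push({ type: 'image', src: url });

  if ((entity.services ?? []).length > 0) {
    nodes.push({ type: 'cards', items: entity.services });
  }

  return {
    source: 'yext',
    headerImage: entity.logo ?? null,
    nav: entity.categories ?? [],
    socialLinks: entity.socialProfiles ?? [],
    reviews: entity.reviews ?? [],
    nodes,
  };
}

// Made-up sample entity, not real client data.
const example = yextToContentMap({
  name: 'Example Painters',
  phone: '051 000 000',
  services: ['Interior', 'Exterior'],
});
console.log(example.nodes.map((n) => n.type)); // [ 'heading', 'contact', 'cards' ]
```

Keeping the transform a pure function of the Yext entity would make the Phase 2 test straightforward: feed it a saved API response and diff the result against the scraped content-map for the same client.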
Phase 3: Migrate Hosting to S3 + CloudFront
Goal: Replace Cloudflare Workers/Pages with AWS. Single account, no project cap.
Architecture
    [Astro Build Output]
            |
            v
    S3 Bucket: sites.fcrweb.ie
      /waterfordcountypainters.ie/
        index.html
        assets/
        farm-painting/index.html
        ...
      /trimtech.ie/
        index.html
        ...
            |
            v
    CloudFront Distribution
      - Custom domain per client (CNAME)
      - ACM SSL cert (free)
      - Origin: S3 bucket with path prefix
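If a single distribution ends up serving many client domains from the one bucket, something has to map the incoming Host header to the matching /{domain}/ prefix. One possible way is a CloudFront Function on the viewer-request event, sketched below. This is an assumption about the design, not a settled decision: with one distribution per client, the origin path setting alone would do the job and no function would be needed. The www-stripping rule is also an assumed policy.

```javascript
// Viewer-request handler in the shape CloudFront Functions expect.
// Assumption: one distribution fronts the whole bucket and each site
// lives under a /{domain}/ prefix, as in the diagram above.
function handler(event) {
  var request = event.request;
  var host = request.headers.host.value.toLowerCase();

  // Serve the apex and www variants from the same prefix (assumed policy).
  if (host.indexOf('www.') === 0) host = host.slice(4);

  var uri = request.uri;
  if (uri.charAt(uri.length - 1) === '/') {
    // Directory-style URL: serve its index document.
    uri += 'index.html';
  } else if (uri.split('/').pop().indexOf('.') === -1) {
    // No file extension in the last segment: treat it as a directory.
    uri += '/index.html';
  }

  request.uri = '/' + host + uri;
  return request;
}
```

With this in place, a request for https://www.trimtech.ie/farm-painting would be fetched from the bucket as /trimtech.ie/farm-painting/index.html.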
Tasks
- Audit BizSite's existing S3 bucket structure: can we reuse it, or do we need a new setup?
- Design bucket layout: single bucket with /{domain}/ prefix per client
- Set up CloudFront distribution with per-client custom domains
- SSL via ACM (free): wildcard cert for *.fcrweb.ie + per-client custom domain certs
- Build lib/deploy-s3.js to replace wrangler pages deploy
- DNS: clients CNAME their domain to CloudFront distribution
- Test: deploy one Astro build to S3+CloudFront, verify it serves correctly
Output: node deploy.js waterfordcountypainters.ie deploys to S3+CloudFront instead of Cloudflare
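As a rough illustration of the /{domain}/ bucket layout, the sketch below plans which S3 objects a deploy would write for one site. The function name `planUploads`, the content-type table, and the cache policy are all hypothetical; only the prefix-per-domain layout comes from the plan above. It deliberately stops short of calling AWS so it can run anywhere.

```javascript
const path = require('node:path');

// Small lookup table; a real deploy would likely use a fuller MIME list.
const CONTENT_TYPES = {
  '.html': 'text/html; charset=utf-8',
  '.css': 'text/css; charset=utf-8',
  '.js': 'text/javascript; charset=utf-8',
  '.json': 'application/json',
  '.svg': 'image/svg+xml',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.webp': 'image/webp',
};

// Turn build-relative file paths into S3 object descriptors under {domain}/.
function planUploads(domain, relativeFiles) {
  return relativeFiles.map((file) => {
    // S3 keys always use forward slashes, whatever the local OS uses.
    const posixPath = file.split(path.sep).join('/');
    const ext = path.extname(posixPath).toLowerCase();
    return {
      Key: `${domain}/${posixPath}`,
      ContentType: CONTENT_TYPES[ext] ?? 'application/octet-stream',
      // Assumed policy: HTML revalidates, other assets cache for a day.
      CacheControl: ext === '.html' ? 'no-cache' : 'public, max-age=86400',
    };
  });
}

const plan = planUploads('waterfordcountypainters.ie', [
  'index.html',
  'farm-painting/index.html',
  'assets/site.css',
]);
console.log(plan.map((o) => o.Key));
```

Each descriptor uses the same field names (Key, ContentType, CacheControl) that an S3 PutObject request takes, so lib/deploy-s3.js could add Bucket and Body and pass them to the AWS SDK. Note that S3 keys carry no leading slash, even though the diagram writes the prefixes as /{domain}/.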
Phase 4: Unified Pipeline
Goal: One command, two content sources, one hosting target.
CLI Interface
# Wix migration (scrape + build + deploy)
node orchestrate.js --url https://www.example.ie --deploy s3
# Yext client (API + build + deploy)
node orchestrate.js --source yext --yext-id 12345 --deploy s3
# BizSite replacement (bulk migrate existing BizSite clients)
node migrate-bizsite.js --all
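The flag combinations above could be validated along the following lines. This is a sketch only: `parseCliArgs` is a hypothetical helper, the default of scraping when --source is omitted is an assumption, and it relies on `util.parseArgs`, which needs a reasonably recent Node.js (18.3 or later).

```javascript
const { parseArgs } = require('node:util');

// Hypothetical argument handling for orchestrate.js; only the flags shown
// in the CLI examples above are accepted (unknown flags make parseArgs throw).
function parseCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: 'string' },
      source: { type: 'string' },
      'yext-id': { type: 'string' },
      deploy: { type: 'string' },
    },
  });

  // Assumption: scraping is the default source when --source is omitted.
  const source = values.source ?? 'scrape';

  if (source !== 'yext' && source !== 'scrape') {
    throw new Error(`unknown --source: ${source}`);
  }
  if (source === 'yext' && !values['yext-id']) {
    throw new Error('--source yext requires --yext-id <entity-id>');
  }
  if (source === 'scrape' && !values.url) {
    throw new Error('scrape mode requires --url <site-url>');
  }

  return { source, url: values.url, yextId: values['yext-id'], deploy: values.deploy };
}

console.log(parseCliArgs(['--source', 'yext', '--yext-id', '12345', '--deploy', 's3']));
```

In orchestrate.js itself this would be called as `parseCliArgs(process.argv.slice(2))`.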
Tasks
- Update orchestrate.js to support --source yext flag
- Build migrate-bizsite.js: iterates existing BizSite clients, pulls from Yext, builds Astro, deploys to same S3 bucket (in-place replacement)
- Update dashboard to show content source (scraped vs Yext) per project
- QA: side-by-side compare BizSite output vs Astro output for 5 clients
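The bulk migration loop in migrate-bizsite.js might be structured roughly as follows. Everything named here is hypothetical: `migrateAll`, the injected `fetchFromYext`, `buildSite`, and `deploy` functions, and the `batchSize` and `pauseMs` options. The batching and pause exist because Yext rate limits are still an open question; the right values would depend on the answer.

```javascript
// Hypothetical core loop for migrate-bizsite.js. The three step functions
// are injected so the loop can be exercised without Yext or AWS access.
async function migrateAll(clients, options) {
  const { fetchFromYext, buildSite, deploy, batchSize = 10, pauseMs = 0 } = options;
  const results = [];

  for (let i = 0; i < clients.length; i += batchSize) {
    const batch = clients.slice(i, i + batchSize);

    // allSettled: one failing client does not abort the rest of the batch.
    const settled = await Promise.allSettled(
      batch.map(async (client) => {
        const contentMap = await fetchFromYext(client.yextId);
        const outputDir = await buildSite(contentMap);
        await deploy(client.domain, outputDir);
      })
    );

    settled.forEach((outcome, j) => {
      results.push({
        domain: batch[j].domain,
        ok: outcome.status === 'fulfilled',
        error: outcome.status === 'rejected' ? String(outcome.reason) : null,
      });
    });

    // Pause between batches to stay under whatever rate limit applies.
    if (pauseMs > 0 && i + batchSize < clients.length) {
      await new Promise((resolve) => setTimeout(resolve, pauseMs));
    }
  }
  return results;
}
```

Returning a per-domain result list rather than throwing makes it easy to re-run only the failures and to feed the outcome into the dashboard.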
Phase 5: Retire BizSite
- Migrate all BizSite clients to Astro builds (batched over weeks)
- Verify no regressions — QA agent compares old BizSite vs new Astro for each client
- Decommission BizSite codebase
- Remove Cloudflare account(s) if no longer needed
- Update staff training/docs for new pipeline
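One cheap regression check the QA step could run before any visual comparison is to confirm that key facts present on the old BizSite page (business name, phone number, and so on) still appear in the new Astro page. The sketch below is a deliberately crude version: `visibleText` and `findRegressions` are hypothetical helpers, the sample HTML is made up, and the regex-based tag stripping is an assumption that a real QA agent would probably replace with a proper HTML parser.

```javascript
// Crude text extraction: drop script/style bodies and tags, collapse whitespace.
function visibleText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
    .toLowerCase();
}

// Return the facts that the old page contained but the new page lost.
function findRegressions(oldHtml, newHtml, facts) {
  const oldText = visibleText(oldHtml);
  const newText = visibleText(newHtml);
  return facts.filter((fact) => {
    const needle = fact.toLowerCase();
    return oldText.includes(needle) && !newText.includes(needle);
  });
}

// Made-up sample pages, not real client output.
const missing = findRegressions(
  '<h1>Example Painters</h1><p>Call 051 000 000</p>',
  '<main><h1>Example  Painters</h1></main>',
  ['Example Painters', '051 000 000', 'Open Sundays']
);
console.log(missing); // [ '051 000 000' ]
```

Facts that were never on the old page (here, "Open Sundays") are not reported, so the check only flags content that actually went missing in migration.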
Cost Analysis
Hosting
| | Cloudflare (current) | AWS S3+CloudFront (proposed) |
|---|---|---|
| 1800 sites | $0 but 18 accounts | ~$200-400/mo |
| Per-site cost | $0 | ~$0.15/mo |
| Project cap | 100/account | No fixed cap (AWS service quotas apply) |
| Accounts needed | 18 | 1 |
| Static bandwidth | Unlimited | ~$0.085/GB (first 10TB) |
| SSL | Free, auto | Free via ACM |
| Admin overhead | High (18 accounts) | Low (1 account) |
Build Pipeline
| Item | Cost |
|---|---|
| Claude API (builds) | ~$0.30/site × 150/month = ~$45/mo |
| EC2 (API + scraper) | Already running |
| Yext API | Already paying |
Total at Scale
| Item | Monthly |
|---|---|
| AWS hosting (1800 sites) | ~$300 |
| Claude API (150 builds/mo) | ~$45 |
| EC2 | Existing |
| Total | ~$345/mo |
This compares against the cost of maintaining two systems (BizSite + replatform) across two providers.
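The arithmetic behind these figures can be checked in a few lines. All inputs are the plan's own estimates; the variable names are just for illustration, and money is held in cents where it needs multiplying to avoid floating-point drift.

```javascript
// Figures copied from the Scale section and the cost tables.
const sites = 1800;
const cloudflareProjectCap = 100; // projects per Cloudflare account
const buildsPerMonth = 150;
const buildCostCents = 30;        // ~$0.30 per site (Claude API)
const hostingEstimateUsd = 300;   // the plan's figure within ~$200-400/mo

const accountsNeeded = Math.ceil(sites / cloudflareProjectCap);
const buildCostUsd = (buildsPerMonth * buildCostCents) / 100;
const totalUsd = hostingEstimateUsd + buildCostUsd;
const monthsToMigrate = sites / buildsPerMonth;
const hostingPerSiteUsd = hostingEstimateUsd / sites;

console.log({ accountsNeeded, buildCostUsd, totalUsd, monthsToMigrate });
console.log(hostingPerSiteUsd.toFixed(2));
```

This gives 18 Cloudflare accounts, $45/mo in builds, $345/mo in total, and a 12-month migration at the stated pace. Hosting at $300/mo works out to roughly $0.17 per site, in line with the ~$0.15 estimate.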
Key Questions to Resolve
- How many clients are on Yext today? Determines size of Phase 2 effort
- Is the BizSite S3 bucket + CloudFront setup reusable? Could skip most of Phase 3
- Does Yext have rate limits? Matters for bulk migration
- Do BizSite clients have custom domains on CloudFront already? If yes, migration is just swapping HTML files — zero DNS changes
- Can we get read access to the BizSite repo? Needed for Phase 1
- Are there BizSite features beyond content rendering? (analytics, tracking, integrations we'd need to replicate)
Timeline Estimate
| Phase | Duration | Dependencies |
|---|---|---|
| Phase 1: Audit | 1 week | BizSite repo access |
| Phase 2: Yext source | 1-2 weeks | Phase 1 complete |
| Phase 3: S3+CF hosting | 1 week | Can run parallel to Phase 2 |
| Phase 4: Unified pipeline | 1 week | Phases 2+3 complete |
| Phase 5: BizSite retirement | 4-8 weeks | Phased migration |
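Adding the phases up gives a rough end-to-end range. The sketch below assumes that Phase 3 starts after Phase 1 and overlaps fully with Phase 2, and that Phase 5 only begins once Phase 4 is done; the table does not state either, so treat the result as indicative.

```javascript
// Durations in weeks as [min, max], taken from the table above.
const phases = {
  audit: [1, 1],
  yextSource: [1, 2],
  hosting: [1, 1],
  unified: [1, 1],
  retirement: [4, 8],
};

// Phases 2 and 3 can overlap, so only the longer of the two counts.
function criticalPath(bound) {
  const weeks = (name) => phases[name][bound];
  return (
    weeks('audit') +
    Math.max(weeks('yextSource'), weeks('hosting')) +
    weeks('unified') +
    weeks('retirement')
  );
}

const totalWeeks = { min: criticalPath(0), max: criticalPath(1) };
console.log(totalWeeks); // { min: 7, max: 12 }
```

Under those assumptions the whole plan takes roughly 7 to 12 weeks, with the BizSite retirement phase accounting for more than half of it.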