Phase 3 batch run — 2026-05-04

20-site discovery run for ADR-0003 Phase 3. Sample picked deterministically (tmp/phase3-sample.json, seed 20260504, Wix + replatformReady=true, excludes the 5 pilots). Runner: scripts/run-fulldev-batch.sh on EC2. Pipeline: quick-crawl → dom-pipeline (--no-build --max-pages 1) → capture-layout → theme-extractor → assembler-fulldev (--slug index).
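
A deterministic sample like this can be drawn with a seeded shuffle. The sketch below is an assumption about the approach, not the script that produced tmp/phase3-sample.json: `mulberry32`, `pickSample`, and the candidate record shape (`host`, `platform`, `replatformReady`) are hypothetical names used for illustration.

```javascript
// Small seeded PRNG (mulberry32): the same seed yields the same sequence in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Filter candidates, then take the first `size` hosts of a seeded Fisher-Yates shuffle.
// The record shape ({ host, platform, replatformReady }) is hypothetical.
function pickSample(candidates, { seed, size, exclude = [] }) {
  const excluded = new Set(exclude);
  const pool = candidates
    .filter(c => c.platform === 'wix' && c.replatformReady === true)
    .filter(c => !excluded.has(c.host))
    // Sort by host first so the result does not depend on input order.
    .sort((x, y) => (x.host < y.host ? -1 : x.host > y.host ? 1 : 0));

  const rand = mulberry32(seed);
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, size).map(c => c.host);
}
```

Called as `pickSample(allSites, { seed: 20260504, size: 20, exclude: pilotHosts })`, the same inputs always give the same 20 hosts; `allSites` and `pilotHosts` are placeholders here.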

Top-line (after cta-as-array fix)

| Metric | Initial run | After fix |
|---|---|---|
| Sites attempted | 20 | 20 |
| Pipeline reached assembler | 20/20 | 20/20 |
| Assembler succeeded | 14/20 | 20/20 |
| Theme phase succeeded | 9/20 | 17/20 |
| Hero matcher fired | 2/20 | 2/20 |
| Gallery matcher fired | 0/20 | 3/20 |
| Reviews matcher fired | 0/20 | 2/20 |
| TopBar matcher fired | 0/20 | 0/20 |
| Total wall time | ~22 min | ~25 min (re-run all 20) |

Single-bug → 6 site recovery

All 6 assembler crashes are the same TypeError at lib/assembler-fulldev/translate.js:228:

const links = (props.cta || []).map(c => ({ text: c.text, href: c.href }));
                                  ^
TypeError: (props.cta || []).map is not a function

Root cause: Header.js's extract() emits cta as a single {text,href} object when the header has exactly one CTA (typically a mailto: link), instead of a one-element array. The translator is written assuming an array.

The 6 affected sites all had a header with a single email CTA: askeatyres, blarneyoilsltd, mcguigansculptors, molansmotors, patriciaokeeffe, protechsecurity.

Fix options (pick one):

  1. Header.js always emits cta as Array<{text,href}>. Cleaner per ADR-0003 (matcher contract is source of truth).
  2. translateHeader accepts either shape: Array.isArray(props.cta) ? props.cta : props.cta ? [props.cta] : []. Defensive, no matcher change.

Option 1 is the right fix; option 2 should also land as a belt-and-braces guard. The same non-array-when-singleton pattern is likely lurking in other matchers.
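
The defensive translator-side guard (option 2) can be sketched as a small helper. `toArray` and `translateCtaLinks` are hypothetical names for illustration; the second function is a stand-in for the line at translate.js:228, not the real translateHeader.

```javascript
// Coerce a possibly-singleton value into an array, using the same truthiness
// rule as the inline expression above:
//   array -> unchanged, truthy non-array -> [value], anything else -> [].
function toArray(value) {
  if (Array.isArray(value)) return value;
  return value ? [value] : [];
}

// Translator side: tolerate both shapes the matcher may emit for `cta`.
// Hypothetical stand-in for the crashing line, not the actual function.
function translateCtaLinks(props) {
  return toArray(props.cta).map(c => ({ text: c.text, href: c.href }));
}
```

Because the helper is independent of the header, it could be reused for any other matcher prop that turns out to have the same singleton-vs-array ambiguity.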

Theme-extractor networkidle timeout — 11/20 sites

11 sites failed the theme phase with the same 30-31s networkidle timeout. This is already documented; see commit d6e8150 ("theme-extractor: revert to networkidle wait"). Notably, the assembler still succeeds without theme.json (graceful fallback via theme = fs.existsSync(...) ? JSON.parse(...) : {}), but the rendered site won't have brand colours or fonts.

Fixing theme reliability is in scope for ADR-0003 Phase 3. Options:

  • Lower the wait condition from networkidle to domcontentloaded + a short fixed delay.
  • Race networkidle against a 10-15s budget, then accept whatever palette we have.
  • Read the saved DOM (already in body.html) instead of re-fetching live.
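
The "race against a budget" option can be sketched without committing to any browser library. In the sketch below, `withBudget` and `extractTheme` are hypothetical names, `waitForNetworkIdle` and `extractPalette` are placeholders for whatever the extractor actually calls, and the 12s default is just an illustrative value inside the 10-15s range.

```javascript
// Resolve with { timedOut: false, value } if `promise` settles within `budgetMs`,
// otherwise with { timedOut: true }. A rejection that arrives inside the budget
// is propagated; one that arrives after the budget has won is ignored.
function withBudget(promise, budgetMs) {
  let timer;
  const budget = new Promise(resolve => {
    timer = setTimeout(() => resolve({ timedOut: true }), budgetMs);
  });
  const work = promise.then(value => ({ timedOut: false, value }));
  return Promise.race([work, budget]).finally(() => clearTimeout(timer));
}

// Hypothetical call site: wait for idle up to the budget, then extract anyway.
// `partial` flags palettes taken from a page that never went idle.
async function extractTheme(waitForNetworkIdle, extractPalette, budgetMs = 12000) {
  const { timedOut } = await withBudget(waitForNetworkIdle(), budgetMs);
  const palette = await extractPalette();
  return { palette, partial: timedOut };
}
```

Recording `partial` alongside the palette would let a later batch report separate clean theme extractions from best-effort ones.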

At 30s per timeout across 11 sites, we're spending ~5.5 min of the ~22 min batch on doomed network waits.

Variant-picker findings

From the 14 sites where the assembler completed:

Hero matcher recall is poor — 2/14

Only garvanbay.ie and greenfelltreeservices.ie got a hero picked (both hero-4). The other 12 either have no detectable hero or the matcher mis-classifies the hero strip as content-1.

Looking at picked sequences: most sites jump straight from header-1 to content-1 or features-* with no hero — implausible. Worth eyeballing the body.html for one of these to see what shape the hero takes that the matcher misses.

Likely candidates for the missed-hero pattern:

  • Hero with no slideshow / no full-bleed bg image (text-only with a side image; ADR-0003 already notes hero-1 is deferred for this case)
  • Hero where the H1 lives outside the matcher-recognised container

TopBar matcher: 0/14

No site picked topbar-wcp. Wix sites with phone/email/social bars are common. Either the TopBar isn't being matched, or its content is being folded into header-1 (which absorbs all top-of-page content).

Gallery matcher: 0/14

No site picked gallery-wcp. For the verticals in this sample (plumbing, trades, tyres, security) galleries may genuinely be rare, but mcguigansculptors.ie (sculptors) and webb.ie (interior design) almost certainly have galleries. webb.ie only picked header-1, logos-2, features-1, footer-1, so it is likely missing its gallery.

Other matchers — looks healthy

  • header-1, footer-1: 13/14 (jkterrazzo.ie missed both — outlier worth a separate look)
  • content-1: heavily used — possibly over-used (absorbing things hero/about/services should be matching)
  • features-1, features-3: well represented
  • cta-wcp: 6 sites
  • faqs-1: 3 sites (assuredqualityplumbing has 2 — odd; same FAQ matched twice?)
  • contact-1: 4 sites
  • map-wcp: 3 sites — works
  • logos-2: 1 site (webb)
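
Per-component site counts like these can be recomputed from the "Picked components" strings in the per-site results rather than tallied by hand. The sketch below assumes the string format used in that table (comma-separated names, with `×N` for repeats); `parsePicked`, `sitesPerComponent`, and the `{ host, picked }` row shape are hypothetical.

```javascript
// Parse "header-1, content-1×3, footer-1" into a Map of component -> instance count.
function parsePicked(picked) {
  const counts = new Map();
  const tokens = picked.split(',').map(s => s.trim()).filter(Boolean);
  for (const token of tokens) {
    const [name, times] = token.split('×');
    counts.set(name, (counts.get(name) || 0) + (times ? Number(times) : 1));
  }
  return counts;
}

// Count how many sites picked each component at least once.
function sitesPerComponent(rows) {
  const sites = new Map();
  for (const { picked } of rows) {
    for (const name of parsePicked(picked).keys()) {
      sites.set(name, (sites.get(name) || 0) + 1);
    }
  }
  return sites;
}

// Two rows copied from the per-site results, for illustration.
const rows = [
  { host: 'assuredqualityplumbing.ie',
    picked: 'header-1, features-1, content-1×3, cta-wcp, faqs-1×2, footer-1' },
  { host: 'webb.ie', picked: 'header-1, logos-2, features-1, footer-1' },
];
const perComponent = sitesPerComponent(rows);
```

Keeping instance counts (`parsePicked`) separate from site counts (`sitesPerComponent`) makes oddities such as faqs-1 matching twice on one site easy to spot.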

38 unmatched section instances across 16 sites

The unmatched warnings come from sections the matchers don't recognise. Sample of unmatched section text (from logs):

  • "087 929 0006 086 274 7813 094 962 0216": phone-strip section (TopBar? CTA?)
  • "McDonagh Funeral Directors are here for whatever you may need...": About-shaped but not detected
  • "PATRICIA O'KEEFFE Patricia O'Keeffe is a Professional Energy Therapist...": Hero-shaped (heading + bio)
  • "About Bio Energy Our physical body is permeated with bio-energy...": About
  • "ALARM SYSTEMS We offer house alarm systems and commercial intruder alarms...": service / About hybrid

Pattern: most unmatched sections look like About/Hero content with non-standard structure — text-heavy sections without the explicit container shape the matchers expect.

Next steps

  1. Fix Header.js cta-as-array contract: recovers 6 sites (14→20 success).
  2. Triage the missing-hero pattern — the biggest gap in coverage. Look at body.html for 3-4 affected sites, identify the structural pattern Hero.js misses.
  3. Triage the TopBar/Gallery zero-fire — possibly being absorbed into Header/Features. Read structure-matcher.js dedupe logic.
  4. Fix theme-extractor networkidle timeout — cuts 5+ min off batch wall time and gives 11 more sites correct theming.
  5. Re-run the 20-site batch after fixes — validate the fix delta, look for new patterns now that the head of the funnel works.
  6. Then matcher signal additions per ADR-0003 (image dims, hero luminance, nav grouping) — once the recall problem is fixed.

Per-site results (ok=true highlighted)

| Host | OK | Theme | Picked components | Unmatched |
|---|---|---|---|---|
| airconditioningandrefrigeration.ie | yes | ok | header-1, content-1, features-1, footer-1 | 2 |
| amsecurity.ie | yes | ok | header-1, features-3, cta-wcp, content-1, footer-1 | 0 |
| askeatyres.ie | no | fail | (cta crash) | 0 |
| assuredqualityplumbing.ie | yes | ok | header-1, features-1, content-1×3, cta-wcp, faqs-1×2, footer-1 | 4 |
| blarneyoilsltd.ie | no | fail | (cta crash) | 1 |
| coachhiredublin.ie | yes | fail | header-1, contact-1, cta-wcp×2, content-1×2, features-3, features-1, footer-1 | 0 |
| drainunblockingrathmines.ie | yes | fail | header-1, content-1×6, footer-1 | 1 |
| europalletsolutions.ie | yes | fail | header-1, content-1×2, cta-wcp×2, features-1, footer-1 | 1 |
| garvanbay.ie | yes | fail | header-1, hero-4, content-1×6, cta-wcp, contact-1, footer-1 | 0 |
| greenfelltreeservices.ie | yes | fail | header-1, hero-4, content-1, features-3, faqs-1, footer-1 | 2 |
| jkterrazzo.ie | yes | ok | content-1×3, features-3, features-1, map-wcp | 6 |
| jrpianolessons.ie | yes | ok | header-1, content-1×4, map-wcp, contact-1, footer-1 | 3 |
| mcdonaghfuneraldirectors.ie | yes | ok | header-1, content-1×2, features-1×2, contact-1, footer-1 | 2 |
| mcguigansculptors.ie | no | fail | (cta crash) | 2 |
| molansmotors.ie | no | fail | (cta crash) | 5 |
| patriciaokeeffe.ie | no | ok | (cta crash) | 3 |
| protechsecurity.ie | no | ok | (cta crash) | 1 |
| rsplumbingandheating.ie | yes | fail | header-1, features-1, content-1×7, map-wcp, footer-1 | 3 |
| seanmurrayplumbingandheating.ie | yes | fail | header-1, cta-wcp, content-1×2, features-1×3, faqs-1, footer-1 | 1 |
| webb.ie | yes | ok | header-1, logos-2, features-1, footer-1 | 1 |

Artifacts

  • tmp/phase3-sample.txt / tmp/phase3-sample.json — site list
  • EC2: ~/replatform-dashboard/tmp/phase3-run/<host>/{stdout.log,stderr.log,status.json} — per-site logs
  • EC2: ~/replatform-dashboard/builds/<host>/assembled-fulldev/ — assembler output (when ok=true)
  • Runner is resumable — re-running skips hosts whose status.json records ok=true.
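
The resume rule can be expressed compactly. The runner itself is a shell script, so the JavaScript below is only a sketch of the skip check, not its actual code; `alreadyDone` and `hostsToRun` are hypothetical names, and it assumes status.json is a JSON object with a boolean `ok` field.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Return true when <runDir>/<host>/status.json exists and records ok === true.
// A missing or unparseable file counts as "not done", so the host is re-run.
function alreadyDone(runDir, host) {
  const statusPath = path.join(runDir, host, 'status.json');
  try {
    return JSON.parse(fs.readFileSync(statusPath, 'utf8')).ok === true;
  } catch {
    return false;
  }
}

// Keep only the hosts that still need a run, preserving input order.
function hostsToRun(runDir, hosts) {
  return hosts.filter(host => !alreadyDone(runDir, host));
}
```

Treating unreadable status files as "not done" errs on the side of re-running a host rather than silently skipping it.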