Phase 3 batch run: 2026-05-04
20-site discovery run for ADR-0003 Phase 3. Sample picked deterministically
(tmp/phase3-sample.json, seed 20260504, Wix + replatformReady=true,
excludes the 5 pilots). Runner: scripts/run-fulldev-batch.sh on EC2.
Pipeline: quick-crawl → dom-pipeline (--no-build --max-pages 1) → capture-layout → theme-extractor → assembler-fulldev (--slug index).
Top-line (after cta-as-array fix)
| | Initial run | After fix |
|---|---|---|
| Sites attempted | 20 | 20 |
| Pipeline reached assembler | 20/20 | 20/20 |
| Assembler succeeded | 14/20 | 20/20 |
| Theme phase succeeded | 9/20 | 17/20 |
| Hero matcher fired | 2/20 | 2/20 |
| Gallery matcher fired | 0/20 | 3/20 |
| Reviews matcher fired | 0/20 | 2/20 |
| TopBar matcher fired | 0/20 | 0/20 |
| Total wall time | ~22 min | ~25 min (re-run all 20) |
Single-bug → 6-site recovery
All 6 assembler crashes are the same TypeError at lib/assembler-fulldev/translate.js:228:
```
const links = (props.cta || []).map(c => ({ text: c.text, href: c.href }));
                                ^
TypeError: (props.cta || []).map is not a function
```
Root cause: Header.js's extract() emits cta as a single {text,href}
object when the header has exactly one CTA (typically a mailto: link),
instead of a one-element array. The translator is written assuming an array.
The 6 affected sites all had a header with a single email CTA: askeatyres, blarneyoilsltd, mcguigansculptors, molansmotors, patriciaokeeffe, protechsecurity.
Fix options — pick one:
1. Header.js always emits cta as Array<{text,href}>. Cleaner per ADR-0003 (matcher contract is source of truth).
2. translateHeader accepts either shape: Array.isArray(props.cta) ? props.cta : props.cta ? [props.cta] : []. Defensive, no matcher change.
Option 1 is right; option 2 should also land as belt-and-braces. The same non-array-when-singleton pattern is likely lurking in other matchers.
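Option 2 can be sketched as a small normaliser. This is a minimal illustration, not the actual translate.js code: `toArray` and `translateHeaderLinks` are hypothetical names, and only the `props.cta` shape and the `{ text, href }` mapping are taken from the failing line.

```javascript
// Hypothetical helper: coerce "array, single object, or missing" into an array.
function toArray(value) {
  if (Array.isArray(value)) return value;
  return value ? [value] : [];
}

// Stand-in for the translator's link mapping (assumed shape, not the real translateHeader).
function translateHeaderLinks(props) {
  return toArray(props.cta).map(c => ({ text: c.text, href: c.href }));
}

// A singleton CTA object, as Header.js currently emits it, no longer crashes.
const single = translateHeaderLinks({ cta: { text: 'Email us', href: 'mailto:info@example.com' } });
const many = translateHeaderLinks({ cta: [{ text: 'A', href: '/a' }, { text: 'B', href: '/b' }] });
const none = translateHeaderLinks({});
console.log(single.length, many.length, none.length);
```

Keeping the coercion in one helper would make it easy to reuse if other matchers turn out to emit the same singleton shape.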
Theme-extractor networkidle timeout: 11/20 sites
11 sites failed the theme phase with the same 30-31s networkidle timeout.
This is already documented; see commit d6e8150 ("theme-extractor: revert to
networkidle wait"). Notably, the assembler still succeeds without theme.json
(graceful fallback via theme = fs.existsSync(...) ? JSON.parse(...) : {}),
but the rendered site won't have brand colours/fonts.
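The fallback behaviour can be illustrated with a short sketch. `loadTheme` and the path used here are hypothetical; the only part taken from the pipeline is the "file exists, else empty object" pattern.

```javascript
const fs = require('fs');

// Hypothetical wrapper around the fallback described above: use theme.json when it
// exists, otherwise fall back to an empty theme so assembly can continue.
function loadTheme(themePath) {
  return fs.existsSync(themePath) ? JSON.parse(fs.readFileSync(themePath, 'utf8')) : {};
}

// A missing file yields {}, so theme-dependent styling is simply absent.
const theme = loadTheme('/nonexistent/path/theme.json');
console.log(Object.keys(theme).length === 0 ? 'no theme: default styling' : 'theme loaded');
```

Note that this pattern only covers a missing file; a malformed theme.json would still throw from JSON.parse.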
Fixing theme reliability is in scope for ADR-0003 Phase 3. Options:
- Lower the wait condition from networkidle to domcontentloaded + a short fixed delay.
- Race networkidle against a 10-15s budget then accept whatever palette we have.
- Read the saved DOM (already in body.html) instead of re-fetching live.
With 11 sites each hitting the 30s timeout, ~5.5 min of the 22 min batch goes on doomed network waits.
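The "race networkidle against a budget" option could look roughly like the sketch below. `withBudget` is a hypothetical helper, and the commented usage assumes the extractor drives a Playwright-style `page` object, which has not been verified against this repo.

```javascript
// Resolve with the promise's outcome if it settles within `ms`, otherwise with `fallback`.
// The timer is cleared either way so it cannot keep the process alive.
function withBudget(promise, ms, fallback) {
  let timer;
  const timeout = new Promise(resolve => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Assumed usage inside the extractor (Playwright-style API, an assumption):
//   await page.goto(url, { waitUntil: 'domcontentloaded' });
//   const idle = await withBudget(
//     page.waitForLoadState('networkidle').then(() => true), 12000, false);
//   // then extract the palette whether or not `idle` is true

// Stand-in demo: a "network idle" signal that never arrives is abandoned after 50 ms.
const neverIdle = new Promise(() => {});
withBudget(neverIdle, 50, 'budget-exceeded').then(result => console.log(result));
```

With a 10-15s budget instead of 30s, the worst case for 11 slow sites would drop from ~5.5 min to roughly 2-3 min, at the cost of sometimes sampling a page that has not finished loading.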
Variant-picker findings
From the 14 sites where the assembler completed:
Hero matcher recall is poor: 2/14
Only garvanbay.ie and greenfelltreeservices.ie got a hero picked
(both hero-4). The other 12 either have no detectable hero or the
matcher mis-classifies the hero strip as content-1.
Looking at picked sequences: most sites jump straight from header-1 to
content-1 or features-* with no hero — implausible. Worth eyeballing
the body.html for one of these to see what shape the hero takes that
the matcher misses.
Likely candidates for missed-hero pattern:
- Hero with no slideshow / no full-bleed bg image (text-only with a side image — ADR-0003 already notes hero-1 is deferred for this case)
- Hero where the H1 lives outside the matcher-recognised container
TopBar matcher: 0/14
No site picked topbar-wcp. Wix sites with phone/email/social bars are
common. Either the TopBar isn't being matched, or its content is being
folded into header-1 (which absorbs all top-of-page content).
Gallery matcher: 0/14
No site picked gallery-wcp. For the verticals in this sample (plumbing,
trades, tyres, security) galleries may genuinely be rare, but
mcguigansculptors.ie (sculptors) and webb.ie (interior design) almost
certainly have galleries. Of the two, only webb.ie is among the 14
(mcguigansculptors.ie hit the cta crash); it picked only header-1, logos-2,
features-1, footer-1, so its gallery was likely missed.
Other matchers: looks healthy
- header-1, footer-1: 13/14 (jkterrazzo.ie missed both; outlier worth a separate look)
- content-1: heavily used, possibly over-used (absorbing things hero/about/services should be matching)
- features-1, features-3: well represented
- cta-wcp: 6 sites
- faqs-1: 3 sites (assuredqualityplumbing has 2; odd, same FAQ matched twice?)
- contact-1: 4 sites
- map-wcp: 3 sites, works
- logos-2: 1 site (webb)
38 unmatched section instances across 17 sites
The unmatched warnings flow from sections the matchers don't recognise. Sample of unmatched section text (from logs):
- "087 929 0006 086 274 7813 094 962 0216": phone-strip section (TopBar? CTA?)
- "McDonagh Funeral Directors are here for whatever you may need...": About-shaped but not detected
- "PATRICIA O'KEEFFE Patricia O'Keeffe is a Professional Energy Therapist...": Hero-shaped (heading + bio)
- "About Bio Energy Our physical body is permeated with bio-energy...": About
- "ALARM SYSTEMS We offer house alarm systems and commercial intruder alarms...": service / About hybrid
Pattern: most unmatched sections look like About/Hero content with non-standard structure — text-heavy sections without the explicit container shape the matchers expect.
Recommended next moves (in priority order)
- Fix Header.js cta-as-array contract: recovers 6 sites (14→20 success).
- Triage the missing-hero pattern: the biggest gap in coverage. Look at body.html for 3-4 affected sites, identify the structural pattern Hero.js misses.
- Triage the TopBar/Gallery zero-fire: possibly being absorbed into Header/Features. Read structure-matcher.js dedupe logic.
- Fix theme-extractor networkidle timeout: cuts 5+ min off batch wall time and gives 11 more sites correct theming.
- Re-run the 20-site batch after fixes: validate the fix delta, look for new patterns now that the head of the funnel works.
- Then matcher signal additions per ADR-0003 (image dims, hero luminance, nav grouping), once the recall problem is fixed.
Per-site results (ok=true highlighted)
| Host | OK | Theme | Picked components | Unmatched |
|---|---|---|---|---|
| airconditioningandrefrigeration.ie | ✓ | ok | header-1, content-1, features-1, footer-1 | 2 |
| amsecurity.ie | ✓ | ok | header-1, features-3, cta-wcp, content-1, footer-1 | 0 |
| askeatyres.ie | ✗ | fail | (cta crash) | 0 |
| assuredqualityplumbing.ie | ✓ | ok | header-1, features-1, content-1×3, cta-wcp, faqs-1×2, footer-1 | 4 |
| blarneyoilsltd.ie | ✗ | fail | (cta crash) | 1 |
| coachhiredublin.ie | ✓ | fail | header-1, contact-1, cta-wcp×2, content-1×2, features-3, features-1, footer-1 | 0 |
| drainunblockingrathmines.ie | ✓ | fail | header-1, content-1×6, footer-1 | 1 |
| europalletsolutions.ie | ✓ | fail | header-1, content-1×2, cta-wcp×2, features-1, footer-1 | 1 |
| garvanbay.ie | ✓ | fail | header-1, hero-4, content-1×6, cta-wcp, contact-1, footer-1 | 0 |
| greenfelltreeservices.ie | ✓ | fail | header-1, hero-4, content-1, features-3, faqs-1, footer-1 | 2 |
| jkterrazzo.ie | ✓ | ok | content-1×3, features-3, features-1, map-wcp | 6 |
| jrpianolessons.ie | ✓ | ok | header-1, content-1×4, map-wcp, contact-1, footer-1 | 3 |
| mcdonaghfuneraldirectors.ie | ✓ | ok | header-1, content-1×2, features-1×2, contact-1, footer-1 | 2 |
| mcguigansculptors.ie | ✗ | fail | (cta crash) | 2 |
| molansmotors.ie | ✗ | fail | (cta crash) | 5 |
| patriciaokeeffe.ie | ✗ | ok | (cta crash) | 3 |
| protechsecurity.ie | ✗ | ok | (cta crash) | 1 |
| rsplumbingandheating.ie | ✓ | fail | header-1, features-1, content-1×7, map-wcp, footer-1 | 3 |
| seanmurrayplumbingandheating.ie | ✓ | fail | header-1, cta-wcp, content-1×2, features-1×3, faqs-1, footer-1 | 1 |
| webb.ie | ✓ | ok | header-1, logos-2, features-1, footer-1 | 1 |
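Per-matcher counts can be recomputed from the "Picked components" column rather than tallied by hand. The sketch below is illustrative only: `parsePick` and `tally` are hypothetical names, the three rows are copied from the table above, and it assumes each component name appears at most once per row (as it does in the table).

```javascript
// Three rows copied from the table above; "×N" marks repeated instances on one site.
const rows = {
  'assuredqualityplumbing.ie': 'header-1, features-1, content-1×3, cta-wcp, faqs-1×2, footer-1',
  'garvanbay.ie': 'header-1, hero-4, content-1×6, cta-wcp, contact-1, footer-1',
  'jkterrazzo.ie': 'content-1×3, features-3, features-1, map-wcp',
};

// Parse "content-1×3" into { name: 'content-1', count: 3 }; a bare name counts once.
function parsePick(token) {
  const [name, times] = token.trim().split('×');
  return { name, count: times ? Number(times) : 1 };
}

// For each component, count how many sites picked it and how many instances in total.
function tally(rowsByHost) {
  const out = {};
  for (const picks of Object.values(rowsByHost)) {
    for (const { name, count } of picks.split(',').map(parsePick)) {
      out[name] = out[name] || { sites: 0, instances: 0 };
      out[name].sites += 1;
      out[name].instances += count;
    }
  }
  return out;
}

const summary = tally(rows);
console.log(summary['content-1']); // { sites: 3, instances: 12 }
console.log(summary['hero-4']);    // { sites: 1, instances: 1 }
```

Separating sites from instances makes cases like a doubled faqs-1 on a single site stand out without inflating the per-site count.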
Artifacts
- tmp/phase3-sample.txt / tmp/phase3-sample.json: site list
- EC2: ~/replatform-dashboard/tmp/phase3-run/<host>/{stdout.log,stderr.log,status.json}: per-site logs
- EC2: ~/replatform-dashboard/builds/<host>/assembled-fulldev/: assembler output (when ok=true)
- Runner is resumable: re-running skips hosts whose status.json records ok=true.
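The resume check can be sketched as follows. The real runner is a shell script, so this Node version is purely illustrative: `shouldSkip` is a hypothetical name, and it assumes status.json is a JSON object with a boolean `ok` field.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical Node rendering of the skip check (the real runner is a shell script).
// Assumes status.json is a JSON object with a boolean `ok` field.
function shouldSkip(runDir, host) {
  const statusPath = path.join(runDir, host, 'status.json');
  if (!fs.existsSync(statusPath)) return false; // never attempted: run it
  try {
    return JSON.parse(fs.readFileSync(statusPath, 'utf8')).ok === true;
  } catch {
    return false; // unreadable or truncated status: safer to re-run
  }
}

// Demo against a throwaway directory with one finished and one failed host.
const runDir = fs.mkdtempSync(path.join(os.tmpdir(), 'phase3-run-'));
for (const [host, ok] of [['webb.ie', true], ['askeatyres.ie', false]]) {
  fs.mkdirSync(path.join(runDir, host));
  fs.writeFileSync(path.join(runDir, host, 'status.json'), JSON.stringify({ ok }));
}
const todo = ['webb.ie', 'askeatyres.ie', 'jkterrazzo.ie'].filter(h => !shouldSkip(runDir, h));
console.log(todo); // [ 'askeatyres.ie', 'jkterrazzo.ie' ]
```

Treating a missing or unparseable status.json as "not done" errs on the side of re-running a host rather than silently skipping it.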