# Retro: converge loop vs deterministic perimeter
- Session: 2026-05-04 → 2026-05-05.
- Subject site: garvanbay.ie → master.garvanbay-fulldev.pages.dev.
- Architecture pivot: moved from a fully LLM-driven convergence loop to a layered, deterministic verifier perimeter, with the LLM as a last resort.
## What we built (in order)
- `lib/visual-critic.js`: Gemini 2.5-pro, full-page screenshots of LIVE vs OURS, structured-JSON output (per-region scores + an atomically decomposed issue list with file paths).
- `lib/visual-fix-agent.js`: per-issue Claude sub-agent with `read_file` / `edit_file` / `run_bash` / `finish` tools, severity-tiered turn budgets (high=18, medium=12, low=8), parent-issue-aware system prompt.
- `lib/converge.js`: orchestrator. It runs critic → fix-agent → re-assemble → re-build → wrangler deploy → re-critic. Each iteration uses a throwaway commit for clean diffs and retries on malformed JSON. It stops on plateau, agent-no-progress or max-iters.
- `lib/verify-content.js`: Playwright walks LIVE and OURS and runs a structural diff (heading text alignment + h/p/img/a/button counts per section).
- `lib/verify-matcher.js`: runs the matchers against the extracted body.html and asserts that every DOM signal is accounted for in the matcher output (per-matcher gap report).
- `lib/verify-translator.js`: runs matchers + dedupe + variant-picker and asserts that every signal survives the translator's prop mapping.
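The sketch below shows one way to express the converge.js stop conditions and the fix-agent's severity-tiered turn budgets. Only these items come from this retro:

- the budget numbers (high=18, medium=12, low=8);
- the three stop-condition names.

Everything else is hypothetical: the per-iteration `{ score, editedBytes }` record, the `maxIters` / `plateauWindow` / `minGain` defaults, and both function names.

```javascript
// Severity-tiered turn budgets, as listed for lib/visual-fix-agent.js.
const TURN_BUDGET = { high: 18, medium: 12, low: 8 };

// Hypothetical helper: unknown severities fall back to the smallest budget.
function turnBudgetFor(issue) {
  return TURN_BUDGET[issue.severity] ?? TURN_BUDGET.low;
}

// Hypothetical stop check, run after each completed iteration.
// `history` holds one { score, editedBytes } record per iteration, oldest first.
// Returns the stop reason, or null to keep iterating.
function stopReason(history, { maxIters = 4, plateauWindow = 2, minGain = 0.5 } = {}) {
  if (history.length === 0) return null;
  const last = history[history.length - 1];
  if (history.length >= maxIters) return "max-iters";
  if (last.editedBytes === 0) return "agent-no-progress";
  if (history.length > plateauWindow) {
    const earlier = history[history.length - 1 - plateauWindow].score;
    // A missing score (e.g. the critic omitted `overall`) yields NaN here,
    // and NaN < minGain is false, so it never triggers a plateau stop.
    if (last.score - earlier < minGain) return "plateau";
  }
  return null;
}
```

`max-iters` is checked first, so the loop always terminates. That matters when every iteration produces edits but the critic never returns a score.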
## What the LLM loop actually achieved
- End-to-end: critic → agent → assemble → build → deploy → re-critic ran without human intervention. Architecturally proven.
- Score did not climb monotonically: 4 iterations on garvanbay.ie each produced ~50KB of agent edits. The issue count drifted 17 → 16 → 17 → 14 while the overall score stayed in the 2-4/10 band (Gemini omitted `overall` 3 of 4 times).
- Regressions: the agent removed the welcome H1 to "fix" a styling issue, introduced a broken-alt image, and missed the MANAGEMENT-ACCOUNTING section entirely. Some iterations undid earlier iterations' work.
- False-positive convergence: one critic call returned `issues: []` while its own summary said "missing images, buttons, entire layout structures." The loop stopped at a clearly wrong "0 open issues."
- Cost: ~4 hours of wall time, multi-iteration Gemini + Claude API spend and 245KB of cumulative agent edits, with no forward progress the user could actually see.
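The false-positive convergence suggests that an empty issue list should never end the loop on its own. Below is a minimal sketch of a consistency guard on the critic's JSON. It assumes the response carries `issues`, `overall` and a free-text `summary` field. The `summary` field name and the red-flag keyword list are hypothetical.

```javascript
// Hypothetical words that, in a summary, contradict an empty issue list.
const RED_FLAGS = /\b(missing|absent|broken|entire)\b/i;

// Returns { ok, problems }. A response that is not ok should be retried or
// escalated, never counted as "0 open issues".
function validateCriticResponse(resp) {
  const problems = [];
  const issues = resp && resp.issues;
  if (!Array.isArray(issues)) problems.push("issues is not an array");
  if (typeof (resp && resp.overall) !== "number") problems.push("overall score missing");
  if (
    Array.isArray(issues) &&
    issues.length === 0 &&
    typeof resp.summary === "string" &&
    RED_FLAGS.test(resp.summary)
  ) {
    problems.push("empty issue list contradicts summary");
  }
  return { ok: problems.length === 0, problems };
}
```

A keyword check is crude, but it is deterministic. It would have turned the observed `issues: []` + "missing images, buttons, entire layout structures" response into a retry instead of a stop.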
## What the deterministic perimeter achieved
After the pivot, the verifiers surfaced the following bugs in ~30 minutes at zero LLM cost, and we fixed them:
| Bug | Surfaced by | Fix size |
|---|---|---|
| dom-pipeline regex eats `")` of `url(...)` in CSS; corrupts split-layout bgMedia on every site | verify-matcher | regex char class +2 chars |
| About: `imgSeen.add(src)` placed BEFORE the bgMedia exclusion; silently dropped 10 service-section images on garvanbay.ie | verify-matcher | move 1 line |
| About: bgMedia regex required a literal `)`; couldn't recover from the truncated CSS | verify-matcher | change regex to file-extension-anchored |
| About missed per-section CTA buttons (PAYROLL → "Payroll" link, etc.) | verify-content | 12-line addition to `extract()` |
| cta-wcp had no image prop; translator silently dropped `backgroundImage` | verify-translator | 5-line component change + 3-line translator change |
garvanbay content-verify gap delta: 14 → 6 (4 of the remaining 6 are verify-script counting artifacts, not real content drops). Verify gates after fixes: matcher 0/12 gaps, translator 0/1 gap.
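Below is a minimal sketch of the per-section structural diff described for verify-content.js. It assumes the Playwright walk has already reduced each page to an array of `{ heading, counts }` records. These parts are assumptions, not the script's actual logic:

- the record shape;
- the heading normalisation;
- the choice to count only shortfalls in OURS.

```javascript
// Tags counted per section, as listed for verify-content.js.
const TAGS = ["h", "p", "img", "a", "button"];

// Normalise heading text so case and whitespace differences do not break alignment.
function normHeading(text) {
  return (text || "").replace(/\s+/g, " ").trim().toLowerCase();
}

// live / ours: arrays of { heading, counts: { h, p, img, a, button } }.
// Returns one gap per (section, tag) where OURS renders fewer elements than LIVE,
// plus one "section" gap per LIVE section with no heading match in OURS.
function diffSections(live, ours) {
  const oursByHeading = new Map(ours.map((s) => [normHeading(s.heading), s]));
  const gaps = [];
  for (const liveSec of live) {
    const match = oursByHeading.get(normHeading(liveSec.heading));
    if (!match) {
      gaps.push({ heading: liveSec.heading, tag: "section", live: 1, ours: 0 });
      continue;
    }
    for (const tag of TAGS) {
      const l = liveSec.counts[tag] ?? 0;
      const o = match.counts[tag] ?? 0;
      if (o < l) gaps.push({ heading: liveSec.heading, tag, live: l, ours: o });
    }
  }
  return gaps;
}
```

Aligning by normalised heading text rather than by position keeps one dropped section from shifting every later comparison. The trade-off is that duplicate or empty headings collide in the map (the last one wins), so heading-less sections would need a different key.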
## The architecture insight
The convergence loop assumed the matchers + translators + components produced close-to-correct output and the LLM only needed to nudge it. In practice the foundations had real bugs that no amount of LLM-driven CSS tweaking could fix: the 10 missing service-section images were a Phase-1 regex eating a `)`, not a styling issue. Gemini saw the symptom (missing images), the agent edited components (the wrong layer), and the bug remained.
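The sketch below is a hypothetical reconstruction of the three regex shapes involved, to make the failure class concrete. These are not the actual dom-pipeline or About-matcher patterns. The sample CSS, the URL and the extension list are invented for illustration.

```javascript
// Sample inputs (hypothetical): a complete declaration and one truncated mid-declaration.
const cssFull = 'background-image: url("https://cdn.example.com/img/hero.jpg");';
const cssTruncated = 'background-image: url("https://cdn.example.com/img/hero.jpg';

// Buggy shape: the class only excludes whitespace, so the closing `")` and
// anything after it up to the next space end up inside the captured URL.
const BUGGY_URL = /url\(["']?([^\s]+)/;

// Fixed shape: quotes and ")" are excluded from the class as well.
const FIXED_URL = /url\(["']?([^\s"')]+)/;

// Strict shape: requires a literal ")" to close url(), so truncated CSS yields nothing.
const STRICT_URL = /url\(["']?([^"')]+)["']?\)/;

// Extension-anchored shape: stops at a known image extension followed by a
// delimiter or end of input, so no closing ")" is needed.
const EXT_ANCHORED_URL = /url\(["']?([^\s"')]+?\.(?:jpe?g|png|gif|webp|avif|svg))(?=["')\s?#]|$)/i;

// Return the first captured URL, or null when the pattern does not match.
function firstUrl(re, css) {
  const m = re.exec(css);
  return m ? m[1] : null;
}
```

The extension-anchored form trades generality for robustness. A background image served without a recognisable extension would be missed, and surfacing that kind of case is what a verifier gate is for.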
The deterministic perimeter inverts the contract:

    extract       → MATCHER VERIFY ──────────────────► matcher gaps fixed deterministically
       ↓
    translate     → TRANSLATOR VERIFY ───────────────► prop-flow gaps fixed deterministically
       ↓
    emit + build  → CONTENT VERIFY ──────────────────► rendering gaps (component-side)
       ↓
                    DESIGN VERIFY (Gemini, scoped) ──► colour / typography drift
       ↓
                    INTERACTION VERIFY (Playwright) ─► hover / click / lightbox
       ↓
    deploy        → FULL-PAGE QA (Gemini) ───────────► cross-region issues only
Each gate has a clear owner (matcher / translator / component / theme / interactions). When a gate fails, the agent's blast radius is small and the fix routes to one file. The LLM only sees the long tail that deterministic rules cannot catch.
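Below is a minimal sketch of the perimeter as code. It assumes gates run in the order of the diagram above and that the run halts at the first red gate. The gate objects, the `run` callback contract (an array of failure strings, empty when green) and `runPerimeter` itself are all hypothetical.

```javascript
// Each hypothetical gate names the one layer that owns its failures.
// run(site) returns an array of failure descriptions; [] means green.
function runPerimeter(gates, site) {
  for (const gate of gates) {
    const failures = gate.run(site);
    if (failures.length > 0) {
      // Halt here: later gates would mostly report symptoms of this failure.
      return { green: false, gate: gate.name, owner: gate.owner, failures };
    }
  }
  return { green: true, gate: null, owner: null, failures: [] };
}

// Example wiring; the check functions are stand-ins for the real verify scripts.
const exampleGates = [
  { name: "matcher-verify", owner: "matchers", run: (s) => s.matcherGaps },
  { name: "translator-verify", owner: "translators", run: (s) => s.propGaps },
  { name: "content-verify", owner: "components", run: (s) => s.renderGaps },
];
```

The returned `owner` is the routing decision. An agent, or a human, gets one layer to work in instead of "fix something in these N files".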
## Cost / leverage comparison
| | Converge loop | Deterministic perimeter |
|---|---|---|
| Time to find a class of bug | hours, multi-iter | seconds |
| Cost per run | ~$X Gemini + Claude (multi-iter) | $0 |
| Bug routing | "fix something in these N files" | precise file + line + rule |
| Generalizes across sites | low — patches are site-specific | high — same regex fix paid forward to ~50% of FCR portfolio |
| Risk of regressions | high — agent un-does prior fixes | low — verifier catches deviations from the spec |
| Determinism | low — Gemini variance per call | high — same input → same output |
## What the LLM is good at
The retro isn't "no LLMs." Real strengths surfaced:
- Visual / design diffs: catching colour, typography, spacing drift that DOM counts cannot. (Place at the design gate, scoped to one region per call.)
- Long-tail content classification: "this section's heading is in lowercase but live has it in caps" — semantic patterns deterministic checks cannot encode.
- Final QA: cross-region hierarchy / flow issues after deterministic gates have green-lit each region.
The LLM's wrong job was being the FIRST line of defense for matcher-extraction bugs.
## Portfolio-wide gap distribution (verifiers run across a 20-site sample)
After applying the deterministic fixes from this session and re-running
verify-matcher.js + verify-translator.js across the original
tmp/phase3-sample.txt set:
Per-matcher gap totals across the 20 sites:
| Matcher | Gaps | Class of issue |
|---|---|---|
| (unmatched) | 35 | section enumerated but no matcher fired — silent content drop |
| ServiceGrid | 9 | per-item href / image extraction incomplete |
| Gallery | 5 | image grids not detected |
| Footer | 5 | column / address extraction |
| USPBar | 4 | pinned-strip recognition |
| TopBar | 4 | scalar vs array contract drift |
| LogoStrip | 2 | |
| About / Contact / Testimonials | 1 each | |
garvanbay.ie is now 0/0 (matcher + translator both clean). The other 19
sites still have real bugs, which the verifiers will route accurately when
those sites are worked on.
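Below is a minimal sketch of the per-matcher gap accounting behind the table above, in the spirit of verify-matcher.js's "every DOM signal accounted for" assertion. The signal and match shapes are hypothetical. This version counts unaccounted signals; the real script's counting unit may differ.

```javascript
// signals: [{ id, sectionIndex }] found in the extracted body.html.
// matches: [{ matcher, sectionIndex, consumed: Set of signal ids }], at most one per section.
// Returns { [matcherName or "(unmatched)"]: number of signals not accounted for }.
function gapReport(signals, matches) {
  const bySection = new Map(matches.map((m) => [m.sectionIndex, m]));
  const report = {};
  for (const sig of signals) {
    const match = bySection.get(sig.sectionIndex);
    if (match && match.consumed.has(sig.id)) continue; // accounted for
    // Attribute the gap to the matcher that claimed the section, or to
    // "(unmatched)" when no matcher fired (a silent content drop).
    const bucket = match ? match.matcher : "(unmatched)";
    report[bucket] = (report[bucket] || 0) + 1;
  }
  return report;
}
```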
The "unmatched" bucket dominates: ~17% of all extracted content is silently dropped because no matcher recognises the pattern. Sample headings:
- "Contact Us" sections without form inputs (the Contact matcher requires a `<textarea>` or an email-type input)
- "PLUMBER NEAR ME" SEO blocks (no current matcher for SEO H1 strips)
- "Thinking of completing a heating controls upgrade?" CTA-shaped sections without a heading (CTAStrip requires a heading)
The three highest-leverage portfolio fixes after garvanbay:
- Contact matcher: accept phone + email as a Contact signal (no form required) so "Contact Us" sections render on phone-only sites.
- CTAStrip matcher: accept heading-less variants when the text is short and there is a visible button.
- A new "ImageStrip" / "Decorative" matcher: many `(no heading)` unmatched sections are decorative image rows that should still emit an image grid.
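Below is a rough sketch of the first two relaxed predicates. These parts are hypothetical:

- the section shape (`heading`, `text`, `inputs`, `links`, `buttons`);
- the phone and email patterns;
- the 160-character threshold.

It reads "phone + email as a Contact signal" as either one being enough, since the stated goal is phone-only sites. Requiring both is an equally valid reading.

```javascript
// Deliberately loose patterns for a sketch; a real matcher would want tighter ones.
const PHONE_RE = /\+?\d[\d\s().-]{6,}\d/;
const EMAIL_RE = /[^\s@]+@[^\s@]+\.[^\s@]+/;

// Current rule as described: a <textarea> or an email-type input.
function hasFormSignal(section) {
  return section.inputs.some((i) => i.tag === "textarea" || i.type === "email");
}

// Relaxed Contact rule: a form, a phone number, or an email address is enough.
function looksLikeContact(section) {
  const text = section.text || "";
  const hrefs = section.links.map((l) => l.href || "");
  const hasPhone = PHONE_RE.test(text) || hrefs.some((h) => h.startsWith("tel:"));
  const hasEmail = EMAIL_RE.test(text) || hrefs.some((h) => h.startsWith("mailto:"));
  return hasFormSignal(section) || hasPhone || hasEmail;
}

// Relaxed CTAStrip rule: no heading, short non-empty text, and a visible button.
function looksLikeHeadinglessCta(section, maxTextLength = 160) {
  const text = (section.text || "").trim();
  const hasVisibleButton = section.buttons.some((b) => b.visible);
  return !section.heading && text.length > 0 && text.length <= maxTextLength && hasVisibleButton;
}
```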
## Forward agenda
| # | Next deterministic gate | Owner caught |
|---|---|---|
| 1 | (built) verify-matcher | Matchers: extraction completeness |
| 2 | (built) verify-translator | Translators: prop-flow |
| 3 | (built) verify-content | Components: rendering completeness |
| 4 | TODO verify-design.js | Theme + component CSS: use Gemini, scoped per region |
| 5 | TODO verify-interactions.js | Component JS: Playwright assertions |
Two real architectural items still open after this session:
- MANAGEMENT-ACCOUNTING dedupe collapse — investigate dedupeLayoutMap keeping the wrong sibling
- FAQ-blog absorption — Wix Blog feeds nest inside FAQ sections; needs section split or a BlogPosts matcher
## Standing principles for follow-on work
- Build a verify gate before touching the LLM. If you can express "what should be true" deterministically, do.
- Each gate's failure must route to one owner. "Fix something somewhere" is the failure mode the LLM loop kept hitting.
- Prefer fixing the matcher / library component before chasing CSS. A foundation fix pays forward across the portfolio; a per-site CSS patch does not.
- Pre-decompose multi-file work into atomic single-file changes. The fix-agent stalls on coordinated multi-file edits; the critic prompt's `parent_id` decomposition pattern is the right shape.
- Don't trust the LLM's "looks fine" signal. Trust the deterministic gates' green/red.
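Below is a minimal sketch of that decomposition. It assumes a critic issue lists the files it touches. Only the `parent_id` link comes from the critic prompt described here. The `id` / `files` fields and the child-id scheme are hypothetical.

```javascript
// Split a multi-file issue into single-file children so each fix-agent run
// edits exactly one file; children point back to the original via parent_id.
function decompose(issue) {
  if (issue.files.length <= 1) return [issue];
  return issue.files.map((file, i) => ({
    id: `${issue.id}.${i + 1}`,
    parent_id: issue.id,
    severity: issue.severity,
    description: `${issue.description} (part ${i + 1}/${issue.files.length}: ${file})`,
    files: [file],
  }));
}
```

Children keep the parent's severity, so in this sketch each one also inherits the parent's turn budget.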
## Commits this session (in order, on master)
Highlights:
- d5fbf17 matchers: emit Header cta as array, recovers 6/20 phase-3 sites
- 568a700 matchers/About: emit plain text, drop double-escape that broke fulldev path
- 89e5b60 dom-pipeline: download canonical raw URL, not first-seen variant
- 0c351c0 verify-{content,matcher}: deterministic gates that catch what Gemini missed
- 4e8114d verify-translator: deterministic gate over the matcher → translator → component flow
- 99104c6 matchers/About + translate: capture & render per-section CTA buttons
The verifiers are the load-bearing artifacts going forward. Run them before any agent run, before any deploy, on every site.