
Retro — converge loop vs deterministic perimeter

Session: 2026-05-04 → 2026-05-05. Subject site: garvanbay.ie / master.garvanbay-fulldev.pages.dev. Architecture pivot: moved from a full LLM-driven convergence loop to a layered deterministic verifier perimeter, with the LLM as a last resort.

What we built (in order)

  1. lib/visual-critic.js — Gemini 2.5-pro, full-page screenshots of LIVE vs OURS, structured-JSON output (per-region scores + atomic-decomposed issue list with file paths).
  2. lib/visual-fix-agent.js — per-issue Claude sub-agent with read_file/edit_file/run_bash/finish tools, severity-tiered turn budgets (high=18, medium=12, low=8), parent-issue-aware system prompt.
  3. lib/converge.js — orchestrator: critic → fix-agent → re-assemble → re-build → wrangler deploy → re-critic, throwaway commits per iter for clean diffs, retry-on-malformed-JSON, plateau / agent-no-progress / max-iters stop conditions.
  4. lib/verify-content.js — Playwright walks LIVE and OURS, structural diff (heading text alignment + h/p/img/a/button counts per section).
  5. lib/verify-matcher.js — runs matchers against extracted body.html, asserts every DOM signal accounted for in matcher output (per-matcher gap report).
  6. lib/verify-translator.js — runs matchers + dedupe + variant-picker, asserts every signal survives the translator's prop mapping.
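
The per-matcher gap report in item 5 can be pictured as a set difference per section: every signal found in the DOM must be claimed by the matcher that fired for that section. A minimal sketch, assuming signals are plain strings and each matcher result lists the signals it consumed. The function name findGaps and the data shapes are hypothetical, not the real lib/verify-matcher.js API.

```javascript
// Hypothetical shapes (assumptions, not the real module's types):
//   domSignalsBySection: { sectionId: ['h2:Services', 'img:/payroll.jpg', ...] }
//   matcherResults:      [{ matcher: 'ServiceGrid', section: 'services', claimed: [...] }]
function findGaps(domSignalsBySection, matcherResults) {
  const gaps = [];
  for (const [section, signals] of Object.entries(domSignalsBySection)) {
    const result = matcherResults.find((r) => r.section === section);
    // A section with no matcher result lands in the "(unmatched)" bucket.
    const matcher = result ? result.matcher : '(unmatched)';
    const claimed = new Set(result ? result.claimed : []);
    for (const signal of signals) {
      if (!claimed.has(signal)) gaps.push({ matcher, section, signal });
    }
  }
  return gaps;
}

const report = findGaps(
  {
    services: ['h2:Services', 'img:/payroll.jpg', 'a:/payroll'],
    seo: ['h1:PLUMBER NEAR ME'],
  },
  [{ matcher: 'ServiceGrid', section: 'services', claimed: ['h2:Services', 'img:/payroll.jpg'] }]
);
console.log(report);
```

In this toy input the unclaimed link is attributed to ServiceGrid and the SEO block to (unmatched), which is the routing property the gate relies on: each gap names one owner.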

What the LLM loop actually achieved

  • End-to-end: critic → agent → assemble → build → deploy → re-critic ran without human intervention. Architecturally proven.
  • Score did not climb monotonically: 4 iters across garvanbay.ie produced ~50KB of agent edits per iter. Issue count drifted 17 → 16 → 17 → 14 while the overall score stayed in the 2-4/10 band (Gemini omitted the overall field in 3 of 4 responses).
  • Regressions: the agent removed the welcome H1 to "fix" a styling issue, introduced a broken-alt image, and missed the MANAGEMENT-ACCOUNTING section entirely. Some iters undid the previous iter's work.
  • False-positive convergence: a critic call returned issues: [] while its own summary said "missing images, buttons, entire layout structures." Loop stopped at a clearly-wrong "0 open issues."
  • Cost: ~4 hours wall-time, multi-iteration Gemini + Claude API spend, 245KB cumulative agent edits, no clear forward progress for the user's eyeballs.
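
The issue-count drift above is the kind of series the plateau stop condition is meant to catch. A minimal sketch of the three stop conditions named for lib/converge.js, assuming "plateau" means the most recent iterations did not beat the best earlier issue count. The function name shouldStop, the history fields and the default thresholds are assumptions, not the orchestrator's actual logic.

```javascript
// history: one entry per iteration, e.g. { issues: 17, agentEdits: 12 } (hypothetical shape).
function shouldStop(history, { maxIters = 6, plateauWindow = 1 } = {}) {
  if (history.length === 0) return null;
  if (history.length >= maxIters) return 'max-iters';

  const last = history[history.length - 1];
  if (last.agentEdits === 0) return 'agent-no-progress';

  if (history.length > plateauWindow) {
    const earlier = history.slice(0, -plateauWindow).map((h) => h.issues);
    const recent = history.slice(-plateauWindow).map((h) => h.issues);
    // Plateau: nothing in the recent window improves on the best earlier count.
    if (Math.min(...recent) >= Math.min(...earlier)) return 'plateau';
  }
  return null; // keep iterating
}

const run = [17, 16, 17].map((issues) => ({ issues, agentEdits: 5 }));
console.log(shouldStop(run.slice(0, 2))); // null
console.log(shouldStop(run));             // plateau
```

None of these conditions inspects the critic's summary text, so an empty issue list would still read as success; that is the gap the false-positive convergence fell through.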

What the deterministic perimeter achieved

After pivoting, in ~30 minutes and at zero LLM cost, the verifiers surfaced the following bugs, all of which we then fixed:

| Bug | Surfaced by | Fix size |
| --- | --- | --- |
| dom-pipeline regex eats ") of url(...) in CSS; corrupts split-layout bgMedia on every site | verify-matcher | regex char class +2 chars |
| About imgSeen.add(src) placed BEFORE the bgMedia exclusion; silently dropped 10 service-section images on garvanbay.ie | verify-matcher | move 1 line |
| About bgMedia regex required literal ); couldn't recover from the truncated CSS | verify-matcher | change regex to file-extension-anchored |
| About missed per-section CTA buttons (PAYROLL → "Payroll" link, etc.) | verify-content | 12-line add to extract() |
| cta-wcp had no image prop, translator silently dropped backgroundImage | verify-translator | 5-line component change + 3-line translator change |

garvanbay content-verify gap delta: 14 → 6 (4 of the remaining 6 are verify-script counting artifacts, not real content drops). Verify gates after fixes: matcher 0/12 gaps, translator 0/1 gap.
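
The cta-wcp bug is an instance of a prop-flow gap: the matcher emitted backgroundImage, but nothing downstream accepted it. A minimal sketch of that kind of check, assuming a translator is a plain function from matcher output to component props and each component declares the props it accepts. droppedSignals, both translator functions and the prop names are hypothetical, not the real lib/verify-translator.js.

```javascript
// Returns the matcher fields whose values do not reach any accepted component prop.
function droppedSignals(matcherOutput, translate, acceptedProps) {
  const props = translate(matcherOutput);
  const surviving = new Set(
    Object.entries(props)
      .filter(([name]) => acceptedProps.includes(name))
      .map(([, value]) => JSON.stringify(value))
  );
  return Object.entries(matcherOutput)
    .filter(([, value]) => !surviving.has(JSON.stringify(value)))
    .map(([field]) => field);
}

// Hypothetical matcher output and two translator versions (before and after a fix).
const matcherOutput = { heading: 'Get a quote', backgroundImage: '/img/hero.jpg' };
const translateBefore = (m) => ({ title: m.heading });
const translateAfter = (m) => ({ title: m.heading, image: m.backgroundImage });

console.log(droppedSignals(matcherOutput, translateBefore, ['title']));         // [ 'backgroundImage' ]
console.log(droppedSignals(matcherOutput, translateAfter, ['title', 'image'])); // []
```

Comparing values rather than prop names lets the check survive renames such as backgroundImage → image, at the cost of missing drops when two fields carry identical values.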

The architecture insight

The convergence loop assumed the matchers + translators + components produced close-to-correct output and the LLM only needed to nudge it. In practice the foundations had real bugs that no amount of LLM-driven CSS tweaking could fix — the 10 missing service-section images were a Phase-1 regex eating a ), not a styling issue. Gemini saw the symptom (missing images), the agent edited components (wrong layer), and the bug remained.
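
A minimal sketch of why a pattern that requires the literal ) cannot recover from truncated CSS, and how anchoring on a file extension avoids that. Both patterns, the extension list and the example URL are illustrative assumptions, not the project's actual regexes.

```javascript
// Strict form: only matches when the closing paren is present.
const STRICT_URL = /url\(\s*["']?([^"')]+)["']?\s*\)/i;

// Extension-anchored form: stops at a known image extension,
// so a missing trailing `")` no longer matters.
const EXT_ANCHORED_URL =
  /url\(\s*["']?([^"')\s]+?\.(?:avif|gif|jpe?g|png|svg|webp))(?=$|["')\s;?#])/i;

function extractBgMedia(css, pattern) {
  const match = pattern.exec(css);
  return match ? match[1] : null;
}

// Hypothetical inputs: the second simulates CSS whose trailing `")` was eaten upstream.
const intact = 'background-image:url("https://static.example.com/media/hero.jpg")';
const truncated = 'background-image:url("https://static.example.com/media/hero.jpg';

console.log(extractBgMedia(intact, STRICT_URL));          // https://static.example.com/media/hero.jpg
console.log(extractBgMedia(truncated, STRICT_URL));       // null
console.log(extractBgMedia(truncated, EXT_ANCHORED_URL)); // https://static.example.com/media/hero.jpg
```

The strict pattern returning null is silent: nothing throws, the image just disappears from the output, which is why the symptom looked like a styling problem further downstream.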

The deterministic perimeter inverts the contract:

extract → MATCHER VERIFY ────────────►  matcher gaps fixed deterministically
        translate → TRANSLATOR VERIFY ►  prop-flow gaps fixed deterministically
        emit + build → CONTENT VERIFY ──►  rendering gaps (component-side)
        DESIGN VERIFY (Gemini, scoped) ─►  colour / typography drift
        INTERACTION VERIFY (Playwright) ─►  hover / click / lightbox
        deploy → FULL-PAGE QA (Gemini) ──►  cross-region issues only

Each gate has a clear owner (matcher / translator / component / theme / interactions). When a gate fails, the agent's blast radius is small and the fix routes to one file. The LLM only sees the long tail that deterministic rules cannot catch.
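
The ordering and single-owner routing can be sketched as a short-circuiting runner: gates run in pipeline order, and the first one that reports gaps names the owner, so later gates never see output built on a broken foundation. A minimal sketch; runGates, the gate object shape and the stubbed gate results are assumptions, not the real verifier scripts.

```javascript
// Each gate (hypothetical shape): { name, owner, run(site) → array of gap descriptions }.
function runGates(gates, site) {
  for (const gate of gates) {
    const gaps = gate.run(site);
    if (gaps.length > 0) {
      // Stop at the first failing gate so the fix routes to exactly one owner.
      return { ok: false, gate: gate.name, owner: gate.owner, gaps };
    }
  }
  return { ok: true };
}

// Stubbed gates for illustration only.
const gates = [
  { name: 'matcher-verify', owner: 'matcher', run: () => [] },
  { name: 'translator-verify', owner: 'translator', run: () => ['cta-wcp: backgroundImage dropped'] },
  { name: 'content-verify', owner: 'component', run: () => { throw new Error('not reached'); } },
];

console.log(runGates(gates, 'garvanbay.ie'));
```

Here the content gate is never invoked because the translator gate already failed, which keeps the blast radius of any follow-up fix to the translator layer.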

Cost / leverage comparison

| | Converge loop | Deterministic perimeter |
| --- | --- | --- |
| Time to find a class of bug | hours, multi-iter | seconds |
| Cost per run | ~$X Gemini + Claude (multi-iter) | $0 |
| Bug routing | "fix something in these N files" | precise file + line + rule |
| Generalizes across sites | low: patches are site-specific | high: same regex fix paid forward to ~50% of FCR portfolio |
| Risk of regressions | high: agent un-does prior fixes | low: verifier catches deviations from the spec |
| Determinism | low: Gemini variance per call | high: same input → same output |

What the LLM is good at

The retro isn't "no LLMs." Real strengths surfaced:

  • Visual / design diffs: catching colour, typography, spacing drift that DOM counts cannot. (Place at the design gate, scoped to one region per call.)
  • Long-tail content classification: "this section's heading is in lowercase but live has it in caps" — semantic patterns deterministic checks cannot encode.
  • Final QA: cross-region hierarchy / flow issues after deterministic gates have green-lit each region.

The LLM's wrong job was being the FIRST line of defense for matcher-extraction bugs.

Portfolio-wide gap distribution (verifiers run across 20-site sample)

After applying the deterministic fixes from this session and re-running verify-matcher.js + verify-translator.js across the original tmp/phase3-sample.txt set:

TOTAL: 20 sites, matcher=67 gaps, translator=24 gaps

Per-matcher gap totals across the 20 sites:

| Matcher | Gaps | Class of issue |
| --- | --- | --- |
| (unmatched) | 35 | section enumerated but no matcher fired; silent content drop |
| ServiceGrid | 9 | per-item href / image extraction incomplete |
| Gallery | 5 | image grids not detected |
| Footer | 5 | column / address extraction |
| USPBar | 4 | pinned-strip recognition |
| TopBar | 4 | scalar vs array contract drift |
| LogoStrip | 2 | |
| About / Contact / Testimonials | 1 each | |
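
Per-matcher totals like these can be produced by folding the per-site gap reports into one counter. A minimal sketch, assuming each site report carries a flat list of gaps tagged with the matcher that owns them. gapsPerMatcher, the report shape and the example sites are hypothetical.

```javascript
// reports (hypothetical shape): [{ site: 'a.example', gaps: [{ matcher: 'Gallery' }, ...] }]
function gapsPerMatcher(reports) {
  const totals = new Map();
  for (const report of reports) {
    for (const gap of report.gaps) {
      totals.set(gap.matcher, (totals.get(gap.matcher) || 0) + 1);
    }
  }
  // Largest bucket first, so the highest-leverage matcher is at the top.
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
}

const reports = [
  { site: 'a.example', gaps: [{ matcher: '(unmatched)' }, { matcher: 'Gallery' }] },
  { site: 'b.example', gaps: [{ matcher: '(unmatched)' }, { matcher: '(unmatched)' }] },
  { site: 'c.example', gaps: [] },
];

for (const [matcher, count] of gapsPerMatcher(reports)) {
  console.log(`${matcher}\t${count}`);
}
```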

garvanbay.ie is now 0/0 (matcher + translator both clean). The other 19 sites still have real bugs, which the verifiers will route accurately when those sites are worked on.

The "unmatched" bucket dominates: ~17% of all extracted content is silently dropped because no matcher recognises the pattern. Sample headings:

  • "Contact Us" sections without form inputs (Contact matcher requires <textarea> or email-type input)
  • "PLUMBER NEAR ME" SEO blocks (no current matcher for SEO H1 strips)
  • "Thinking of completing a heating controls upgrade?" CTA-shaped sections without heading (CTAStrip wants a heading)

The three highest-leverage portfolio fixes after garvanbay:

  1. Contact matcher — accept phone+email as a Contact signal (no form required) so "Contact Us" sections render on phone-only sites
  2. CTAStrip matcher — accept heading-less variants when text length is short and there's a visible button
  3. A new "ImageStrip" / "Decorative" matcher — many (no heading) unmatched sections are decorative image rows that should still emit an image grid
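
The first two fixes amount to relaxing each matcher's acceptance predicate. A minimal sketch of what the relaxed predicates might look like. The section shape, the function names and the 160-character threshold are assumptions, as is the reading that a phone number or an email address alone is enough for Contact; the real matchers may weigh these signals differently.

```javascript
// Hypothetical section shape: { heading, text, phones, emails, hasFormInput, buttons }.

// Proposed Contact rule: a form input still counts, but so does a phone number or email.
function isContactSection(section) {
  const hasContactDetail = section.phones.length > 0 || section.emails.length > 0;
  return section.hasFormInput || hasContactDetail;
}

// Proposed CTAStrip rule: a heading is optional when the copy is short and a button is visible.
function isCtaStrip(section, maxTextLength = 160) {
  const hasButton = section.buttons.length > 0;
  if (!hasButton) return false;
  return Boolean(section.heading) || section.text.length <= maxTextLength;
}

// Illustrative sections; the phone number and button label are made up.
const contactUs = {
  heading: 'Contact Us', text: 'Call us today', phones: ['+353 1 555 0100'],
  emails: [], hasFormInput: false, buttons: [],
};
const upgrade = {
  heading: '', text: 'Thinking of completing a heating controls upgrade?',
  phones: [], emails: [], hasFormInput: false, buttons: ['Get a quote'],
};

console.log(isContactSection(contactUs)); // true
console.log(isCtaStrip(upgrade));         // true
```

A looser predicate raises the risk of one section satisfying two matchers, so any change like this would need the matcher and translator gates re-run across the sample set before it is trusted.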

Forward agenda

| # | Next deterministic gate | Owner: what it catches |
| --- | --- | --- |
| 1 | (built) verify-matcher | Matchers: extraction completeness |
| 2 | (built) verify-translator | Translators: prop-flow |
| 3 | (built) verify-content | Components: rendering completeness |
| 4 | TODO verify-design.js | Theme + component CSS: use Gemini, scoped per region |
| 5 | TODO verify-interactions.js | Component JS: Playwright assertions |

Two real architectural items still open after this session:

  • MANAGEMENT-ACCOUNTING dedupe collapse: investigate dedupeLayoutMap keeping the wrong sibling
  • FAQ-blog absorption: Wix Blog feeds nest inside FAQ sections; needs section split or a BlogPosts matcher

Standing principles for follow-on work

  1. Build a verify gate before touching the LLM. If you can express "what should be true" deterministically, do.
  2. Each gate's failure must route to one owner. "Fix something somewhere" is the failure mode the LLM loop kept hitting.
  3. Prefer fixing the matcher / library component before chasing CSS. A foundation fix pays forward across the portfolio; a per-site CSS patch does not.
  4. Pre-decompose multi-file work into atomic single-file changes. The fix-agent stalls on coordinated multi-file edits; the critic prompt's parent_id decomposition pattern is the right shape.
  5. Don't trust the LLM's "looks fine" signal. Trust deterministic gates' green/red.

Commits this session (in order, master)

e1f7855..99104c6

Highlights:

  • d5fbf17 matchers: emit Header cta as array, recovers 6/20 phase-3 sites
  • 568a700 matchers/About: emit plain text, drop double-escape that broke fulldev path
  • 89e5b60 dom-pipeline: download canonical raw URL, not first-seen variant
  • 0c351c0 verify-{content,matcher}: deterministic gates that catch what Gemini missed
  • 4e8114d verify-translator: deterministic gate over the matcher → translator → component flow
  • 99104c6 matchers/About + translate: capture & render per-section CTA buttons

The verifiers are the load-bearing artifacts going forward. Run them before any agent run, before any deploy, on every site.