Self-correction record

What our gates caught.

The record of what our own review gates caught — and the three things they missed. This is our own adjudicated record of our own gates, not an external audit.

Window: 2026-06-04 → 2026-06-16 · 48 adjudicated §4a / §4b gate decisions · source: ai_ops control-effectiveness adjudication, 2026-06-16

What this page is

Our gates, on the record.

CaliperForge's contribution workflow runs every artifact through two independent review gates before anything ships: §4a for content, factual claims, and source quality; §4b for code and CI-runnable property tests. The reviewer is never the drafter. Each gate returns one of three verdicts (pass, fail, or fail-with-fixes), and every verdict is logged.

This page publishes the adjudicated record: what those gates caught, what they missed, and where the misses were caught downstream. It updates as new adjudications land. The misses column is part of the design — a self-correction record that only shows wins is worthless.

The numbers

48 adjudicated decisions, in receipts.

Each decision is adjudicated against the documented downstream record: a gate PASS counts as confirmed clean only if no later round caught something that gate should have caught. The figures below cover the 48 §4a / §4b gate decisions in the window. The needs_ceo HOLD mechanism is a separate track that is still being adjudicated, and it is not folded into these numbers.
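The adjudication rule above can be sketched as a small classifier. This is a minimal illustration, not the ai_ops tooling: the `GateDecision` fields and the `Outcome` names are hypothetical, and the treatment of a gate FAIL (a catch if the defect was real, a false positive otherwise) is an assumption, since the text above spells out only the PASS side.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CAUGHT = "caught"                    # gate failed the artifact, defect was real
    FALSE_POSITIVE = "false_positive"    # gate failed the artifact, nothing was wrong
    MISS = "miss"                        # gate passed, a later round caught a defect
    CONFIRMED_CLEAN = "confirmed_clean"  # gate passed, nothing surfaced downstream


@dataclass(frozen=True)
class GateDecision:
    gate: str               # "4a" or "4b"
    passed: bool            # True for PASS; False for fail or fail-with-fixes
    defect_confirmed: bool  # hypothetical field: the downstream record confirms
                            # a real defect that this gate was responsible for


def adjudicate(decision: GateDecision) -> Outcome:
    """Classify one gate decision against the downstream record."""
    if decision.passed:
        # A PASS is confirmed clean only if no later round contradicted it.
        if decision.defect_confirmed:
            return Outcome.MISS
        return Outcome.CONFIRMED_CLEAN
    # Assumed rule for the FAIL side: real defect is a catch, otherwise a false positive.
    if decision.defect_confirmed:
        return Outcome.CAUGHT
    return Outcome.FALSE_POSITIVE
```

For example, `adjudicate(GateDecision("4a", passed=True, defect_confirmed=True))` returns `Outcome.MISS`, which is the case the misses column records.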

16 defects caught · 0 false positives · 3 documented misses · 84% catch rate (16 / 19) · 100% precision (16 / 16) · 0% false-positive rate (0 / 29). Across the 48 §4a / §4b decisions adjudicated 2026-06-16. The denominators are the published record: every percentage sits beside its raw fraction (16 / 19 is 84.2%, shown as 84%), and nothing is aggregated up.
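The three rates can be re-derived from the published counts. The sketch below assumes the standard definitions (catch rate as caught over caught plus missed, precision as caught over caught plus false positives, false-positive rate as false positives over false positives plus confirmed-clean passes). The count of 29 confirmed-clean passes is inferred from the 0 / 29 denominator and from 48 − 16 − 3; the function and variable names are illustrative.

```python
from fractions import Fraction


def gate_metrics(caught: int, missed: int, false_positives: int, confirmed_clean: int) -> dict:
    """Return exact fractions so no rounding happens until display.

    Raises ZeroDivisionError if a denominator is zero (for example, no defects at all).
    """
    return {
        "total": caught + missed + false_positives + confirmed_clean,
        "catch_rate": Fraction(caught, caught + missed),
        "precision": Fraction(caught, caught + false_positives),
        "false_positive_rate": Fraction(false_positives, false_positives + confirmed_clean),
    }


# Counts taken from the figures above; 29 is inferred (48 - 16 - 3 - 0).
metrics = gate_metrics(caught=16, missed=3, false_positives=0, confirmed_clean=29)

print(metrics["total"])                             # 48
print(metrics["catch_rate"])                        # 16/19
print(f"{float(metrics['catch_rate']):.1%}")        # 84.2%
print(f"{float(metrics['precision']):.0%}")         # 100%
print(f"{float(metrics['false_positive_rate']):.0%}")  # 0%
```

Keeping the values as `Fraction` until the last step makes it easy to check that the four counts sum to the 48 decisions in the window.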

Why these numbers mean something

Planted twins on CI, independent reviewer on every gate.

Every CaliperForge harness ships with two CI legs: a clean reference where the invariant holds (0 violations required) and a planted-bug twin where it must fire (≥1 violation required). The CI job fails if either leg deviates, that is, if the clean reference reports a violation or the planted twin reports none. That's the structural floor: the catches and the misses live above a build that already gates itself. On top of that, two independent review gates run before publishing: §4a on content (factual claims, source quality, register) and §4b on code (correctness, anti-bloat, CI integrity). The reviewer for each gate is never the drafter, by policy. The figures above are what that two-leg-CI plus independent-review system caught and what it missed.
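A minimal sketch of the two-leg check described above, under stated assumptions: the sorting invariant, the `clean_sort` and `planted_sort` implementations, and every function name here are hypothetical stand-ins chosen for illustration, not code from a CaliperForge harness.

```python
from collections import Counter
from typing import Callable, List, Sequence

Impl = Callable[[List[int]], List[int]]
Invariant = Callable[[List[int], List[int]], bool]


def count_violations(impl: Impl, cases: Sequence[List[int]], invariant: Invariant) -> int:
    """Count the cases where the invariant does not hold for impl's output."""
    return sum(1 for case in cases if not invariant(case, impl(list(case))))


def two_leg_gate(clean: Impl, planted: Impl, cases: Sequence[List[int]], invariant: Invariant) -> bool:
    """Pass only if the clean leg has 0 violations and the planted twin has at least 1."""
    clean_violations = count_violations(clean, cases, invariant)
    planted_violations = count_violations(planted, cases, invariant)
    return clean_violations == 0 and planted_violations >= 1


def sorted_permutation(inp: List[int], out: List[int]) -> bool:
    """Invariant: output is non-decreasing and has the same elements as the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(inp) == Counter(out)


def clean_sort(xs: List[int]) -> List[int]:
    return sorted(xs)


def planted_sort(xs: List[int]) -> List[int]:
    # Planted bug: silently drops duplicate elements.
    return sorted(set(xs))


CASES = [[3, 1, 2], [2, 2, 1], []]

print(two_leg_gate(clean_sort, planted_sort, CASES, sorted_permutation))  # True
print(two_leg_gate(clean_sort, clean_sort, CASES, sorted_permutation))    # False
```

The second call shows why the twin matters: if the "planted" leg never trips the invariant, the gate fails, because a test that cannot fire proves nothing about the clean leg.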

The honest column

The three misses, published.

Three gate PASSes in the window were later contradicted by a downstream catch. Each is cited below: which gate missed it, what the artifact carried, which downstream gate caught it, and where the fix landed. Caught by our own gates, fixed same-day or next-day in every case — not hidden, not softened.

Scope & honesty

What this is, what it isn't.

Source memo: ops/audits/ai_ops/control_effectiveness_adjudication_2026-06-16.md, the per-decision evidence table backing every figure on this page. It is an internal path today; the publishable summary link will be added once the Director picks the public surface.