What this page is
Our gates, on the record.
CaliperForge's contribution workflow runs every artifact through two independent review gates before anything ships: §4a for content, factual claims, and source quality; §4b for code and CI-runnable property tests. The reviewer is never the drafter. Each gate returns one of three verdicts (pass, fail, or fail-with-fixes), and every verdict is logged.
This page publishes the adjudicated record: what those gates caught, what they missed, and where the misses were caught downstream. It updates as new adjudications land. The misses column is part of the design — a self-correction record that only shows wins is worthless.
The numbers
48 adjudicated decisions, in receipts.
Adjudicated against the documented downstream record: a gate PASS counts as confirmed clean only if no later round caught something it should have. The figures below cover the 48 §4a / §4b gate decisions in the window. The needs_ceo HOLD mechanism is a separate, still-being-adjudicated track and is not folded into these numbers.
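The adjudication rule above can be sketched as a small classifier. This is a minimal illustration, not CaliperForge's actual tooling: the `Outcome` enum, the `adjudicate` function, and both parameter names are hypothetical, and the sketch assumes a gate counts as having "fired" on either a fail or a fail-with-fixes verdict.

```python
from enum import Enum


class Outcome(Enum):
    """Four possible adjudications of a single gate decision."""
    TRUE_POSITIVE = "gate fired, defect was real"
    FALSE_POSITIVE = "gate fired on a non-issue"
    FALSE_NEGATIVE = "gate passed, defect caught downstream"
    TRUE_NEGATIVE = "gate passed, confirmed clean"


def adjudicate(gate_fired: bool, defect_confirmed: bool) -> Outcome:
    """Classify one gate decision against the downstream record.

    gate_fired: the gate did not pass (assumed: fail or fail-with-fixes).
    defect_confirmed: a real defect is documented, either at the gate
        or by a later round that caught what the gate should have.
    """
    if gate_fired:
        return Outcome.TRUE_POSITIVE if defect_confirmed else Outcome.FALSE_POSITIVE
    # A PASS is only confirmed clean if nothing surfaced later.
    return Outcome.FALSE_NEGATIVE if defect_confirmed else Outcome.TRUE_NEGATIVE


# A PASS that a later round contradicted is a documented miss.
print(adjudicate(gate_fired=False, defect_confirmed=True).name)  # FALSE_NEGATIVE
```

The point of the second branch is that a PASS carries no credit on its own; it only becomes a true negative once the downstream record stays clean.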
- 16 true-positive catches: the gate fired, the artifact had a real defect, the fix landed before publish. 4 in §4b code-quality, 12 in §4a content-QA.
- 0 false positives: no case where the gate flagged a non-issue and the artifact shipped unchanged on review. Of the 16 stops, every one was a real defect.
- 3 misses: three gate PASSes were later contradicted by a downstream catch. All three are published below with the artifact, the missed defect, and where it was finally caught.
- 84% catch rate: 16 true-positive catches against 19 real defects (16 caught at the gate + 3 caught one step downstream). Precision 100% on what fired; false-positive rate 0% across 29 true negatives.
16 defects caught · 0 false positives · 3 documented misses · 84% catch rate (16 / 19) · 100% precision (16 / 16) · 0% false-positive rate (0 / 29). Across the 48 §4a / §4b decisions adjudicated 2026-06-16. The denominators are the published record — nothing rounded, nothing aggregated up.
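For readers who want to re-derive the headline figures, the arithmetic can be reproduced from the four counts on this page. The sketch below is illustrative only: the `GateRecord` class and its field names are hypothetical, while the counts (16, 0, 3, 29) are the ones published above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateRecord:
    """Adjudicated counts for one window (names are illustrative)."""
    true_positives: int   # gate fired, defect was real
    false_positives: int  # gate fired on a non-issue
    false_negatives: int  # gate passed, defect caught downstream
    true_negatives: int   # gate passed, nothing caught later

    @property
    def total(self) -> int:
        return (self.true_positives + self.false_positives
                + self.false_negatives + self.true_negatives)

    @property
    def catch_rate(self) -> float:
        # Share of real defects stopped at the gate: TP / (TP + FN).
        return self.true_positives / (self.true_positives + self.false_negatives)

    @property
    def precision(self) -> float:
        # Share of gate stops that were real defects: TP / (TP + FP).
        return self.true_positives / (self.true_positives + self.false_positives)

    @property
    def false_positive_rate(self) -> float:
        # Share of clean artifacts wrongly stopped: FP / (FP + TN).
        return self.false_positives / (self.false_positives + self.true_negatives)


window = GateRecord(true_positives=16, false_positives=0,
                    false_negatives=3, true_negatives=29)

print(window.total)                         # 48
print(f"{window.catch_rate:.1%}")           # 84.2%
print(f"{window.precision:.1%}")            # 100.0%
print(f"{window.false_positive_rate:.1%}")  # 0.0%
```

The four counts sum to the 48 decisions in the window, and 16 / 19 works out to 84.2%, which the page reports as 84%.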
Why these numbers mean something
Planted twins on CI, independent reviewer on every gate.
Every CaliperForge harness ships with two CI legs — a clean reference where the invariant holds (0 violations required) and a planted-bug twin where it must fire (≥1 violation required). The CI job fails if either side doesn't behave. That's the structural floor: the catches and the misses live above a build that already gates itself. On top of that, two independent review gates fire before publish — §4a on content (factual claims, source quality, register), §4b on code (correctness, anti-bloat, CI integrity). The reviewer for each gate is never the drafter, by policy. The figures above are what that two-leg-CI plus independent-review system caught and what it missed.
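The two-leg requirement described above can be expressed as a short pass/fail check. This is a sketch under stated assumptions, not the actual CI job: `check_twin_legs` and `ci_exit_code` are hypothetical names, and the violation counts are assumed to come from whatever the harness reports for each leg.

```python
import sys


def check_twin_legs(clean_violations: int, planted_violations: int) -> list[str]:
    """Return a list of failure messages; an empty list means both legs behaved."""
    failures = []
    if clean_violations != 0:
        # Clean reference: the invariant must hold, so 0 violations are required.
        failures.append(
            f"clean reference reported {clean_violations} violation(s); expected 0"
        )
    if planted_violations < 1:
        # Planted-bug twin: the invariant must fire, so at least 1 is required.
        failures.append("planted-bug twin reported 0 violations; expected >= 1")
    return failures


def ci_exit_code(clean_violations: int, planted_violations: int) -> int:
    """Map the check to a process exit code: 0 on success, 1 on any failure."""
    failures = check_twin_legs(clean_violations, planted_violations)
    for message in failures:
        print(message, file=sys.stderr)
    return 1 if failures else 0


# Example: clean leg holds, planted twin fires three times, so the job passes.
print(ci_exit_code(clean_violations=0, planted_violations=3))  # 0
```

Checking both directions matters: a harness that never fires would pass the clean leg trivially, and only the planted twin shows that it can detect the bug it claims to detect.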
The honest column
The three misses, published.
Three gate PASSes in the window were later contradicted by a downstream catch. Each is cited below: which gate missed it, what the artifact carried, which downstream gate caught it, and where the fix landed. Caught by our own gates, fixed same-day or next-day in every case — not hidden, not softened.
-
FN-1 — Governance-pack email policy
§4b miss
PASS 2026-06-10
Caught 2026-06-12
The §4b reviewer audited the cf-invariants community-health pack (CODE_OF_CONDUCT, CONTRIBUTING, GOVERNANCE, SECURITY; 455 lines across 4 files) for canonical-string consistency, including email, and passed it. Two days later, the §4a community-health gate caught security@caliperforge.com at 4 sites, in violation of the feedback_email_caliperforge_only policy and contradicting the live README's published security contact, michael@caliperforge.com. Email-string consistency was declared in §4b's scope; the policy mismatch was a §4b miss. Fix landed at commit 0bff2c4 via the §4a community-health regate.
Caught downstream by §4a community-health (TP). Both gate verdicts and the regate are in the public record.
-
FN-2 — hyperevm-safety README over-claim
§4b miss
PASS 2026-06-11
Caught same-day
The §4b preflip-regate passed the hyperevm-safety README with the note "20 referenced paths exist." The README was then pushed to the public repo at the flip carrying a factual over-claim, "each invariant paired with a planted-bug twin that fires on CI," that was not tree-true (two of the six invariants ship with deterministic broken-reference tests, not planted twins; two more landed only at the M2 milestone). The over-claim text was live on the public GitHub README for roughly three hours before our own §4a cascade caught it (FAIL-WITH-FIXES, same day at 17:42). Root-cause fix pushed public 18:01 at commit e46fe9c. Caught by our own gates, fixed same-day, but the public repo carried the over-claim for those three hours. That's the honest framing.
Caught downstream by §4a cascade (TP). Window of public exposure: ~3 hours. Root-cause fix at e46fe9c.
-
FN-3 — verus-bridge source-quality issues
§4a miss
PASS 2026-06-11
Caught 2026-06-12
The §4a preflip-regate on the verus-bridge writeup confirmed that 10 prior fix-loop items had landed (API-arity mismatches, broken twins, etc.) and passed it. The same writeup still carried sourcing issues: an ExVul citation that did not resolve (sweep returned zero hits), Blockaid / PeckShield wording that over-stated firm advisories as post-mortems, and an imprecise count framing. Caught next day by a follow-on §4a sources-regate. Fix landed at commit cfb0677: ExVul dropped, the firm advisories downgraded with pinned URLs + dates, the count framing removed. Factual-accuracy and source-verification were declared in §4a's scope; this was a §4a-scope miss.
Caught downstream by §4a sources-regate (TP). CEO Option-C rewrite directed 2026-06-12.
Scope & honesty
What this is, what it isn't.
- What this is. Our own adjudicated record of our own §4a content and §4b code review gates — the 48 decisions logged 2026-06-04 through 2026-06-16. Every figure traces to a documented decision with a cited verdict, fix landing, and downstream outcome.
- What it isn't. Not an external audit. Not a formal-verification claim. Not a coverage metric on the underlying fuzz engines (snforge on Cairo, Crucible on Solana / Anchor, Echidna / Medusa on EVM — those have their own coverage stories). Not a claim that gates guarantee correctness; they catch what they catch, and the misses column is what they didn't.
- Excluded. The needs_ceo HOLD mechanism, the third class of decisions in our queue, is deliberately not folded into the catch-rate. It's a separate, still-being-adjudicated track. When it's settled, it'll get its own published record.
- Growing. The window grows as new adjudications land. The misses column grows with it, by design. A self-correction record that only shows wins isn't a self-correction record.
Source memo: ops/audits/ai_ops/control_effectiveness_adjudication_2026-06-16.md, the per-decision evidence table backing every figure on this page. Internal path today; the publishable summary link gets added when the Director picks the public surface.