Agent reliability
The Enforcement Layer
570 emails, judge-gated decision node. Residual leak rate → 0% at 17.5% intervention rate.
Read writeup
Supervision, enforcement, evals, stochastic behavior
Swarms, orchestration, delegation, conflict resolution
Orthogonal objectives, evaluator deception, jailbreak dynamics
Logging, replay, memory, permissions, policy engines
Future-state simulation, decision trees, game dynamics
Agent reliability
Unsupervised vs. supervised across 19 scenarios. Policy cuts leaks by ~64%, but 19% still violate rules.
Read writeup
Agent reliability
19 scenarios, GPT-4o-mini, no rules. ~1 in 5 emails contained risky content.
Read writeup