Experiment · 39 min read
We told the agent what not to do. It ignored us. So we stopped trusting it.
Reliability doesn't come from the model — it comes from the system around it. 570 emails across three layers: unsupervised (53.3% leak rate) → supervised (17.5%) → enforced (0%, at the cost of intervening on 1-in-6 sends).
by Naren Sathiya
