UCogNet's self-improvement is not unconstrained. Every mutation, every execution, and every deployment is bounded by statistical gates, budget limits, anomaly detection, and automatic rollback. This page documents the safety architecture in detail.
Six interlocking mechanisms that prevent uncontrolled self-improvement.
Every policy mutation must pass A/B statistical gates with bootstrap confidence intervals before deployment. Mutations deploy gradually: 10% → 30% → 100% traffic.
Reward spikes more than 3σ from the rolling mean trigger an automatic audit and halt. This guards against reward hacking and exploitation of distribution shift.
Every execution operates under token, time, cost, and tool-call budgets. Any overrun triggers an immediate rollback, so inference cannot run away.
All tool calls execute in isolated sandboxes with strict permissions. No ambient authority — tools declare required capabilities upfront.
Every response carries structured claims with provenance. Outputs without evidence are flagged and cannot be trusted by downstream consumers.
If any gate fails at any deployment stage, the system reverts to the previous policy within one evaluation cycle. No human intervention required.
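As a rough illustration of the budget mechanism described above, the Python sketch below tracks token, time, cost, and tool-call usage for a single execution and discards partial results on the first overrun. The names (Budget, BudgetTracker, run_with_budget) and the limit values are assumptions made for this example only; this page does not specify UCogNet's actual interfaces or limits.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when any budget dimension is overrun."""


@dataclass
class Budget:
    # Illustrative limits only; real values are not given in this document.
    max_tokens: int = 4000
    max_seconds: float = 30.0
    max_cost_usd: float = 0.50
    max_tool_calls: int = 10


@dataclass
class BudgetTracker:
    budget: Budget
    tokens: int = 0
    cost_usd: float = 0.0
    tool_calls: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int = 0, cost_usd: float = 0.0, tool_calls: int = 0) -> None:
        """Record usage and raise as soon as any limit is exceeded."""
        self.tokens += tokens
        self.cost_usd += cost_usd
        self.tool_calls += tool_calls
        elapsed = time.monotonic() - self.started_at
        if self.tokens > self.budget.max_tokens:
            raise BudgetExceeded("tokens")
        if elapsed > self.budget.max_seconds:
            raise BudgetExceeded("time")
        if self.cost_usd > self.budget.max_cost_usd:
            raise BudgetExceeded("cost")
        if self.tool_calls > self.budget.max_tool_calls:
            raise BudgetExceeded("tool_calls")


def run_with_budget(steps, budget: Budget):
    """Run steps under a budget; discard partial results on overrun.

    `steps` is an iterable of callables that take the tracker and return a result.
    Returns (results, None) on success, or ([], reason) after a rollback.
    """
    tracker = BudgetTracker(budget)
    results = []
    try:
        for step in steps:
            results.append(step(tracker))
    except BudgetExceeded as exc:
        return [], str(exc)  # rollback: nothing from this execution is kept
    return results, None
```

In this sketch each step reports its own usage through the tracker; a production system would more plausibly meter usage at the model and tool boundary so that a step cannot under-report.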
Anti-reward-hacking mechanisms at the reward layer.

What the system can and cannot do — honestly mapped.

Every candidate policy must clear every gate in sequence. One failure triggers full rollback.
Improvement threshold
The candidate must exceed the baseline by a statistically significant margin, as judged by a bootstrap confidence interval (CI).
Cost constraint
A new mutation cannot cost more than 1.2× the current best policy.
Safety anomaly check
Reward spikes more than 3σ from the rolling mean trigger an automatic audit and halt.
Gradual rollout
10% → 30% → 100% traffic with gates at each stage.
Rollback guarantee
If any gate fails, the system reverts to the previous policy within one evaluation cycle.
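The gate sequence above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not UCogNet's implementation: it uses a percentile bootstrap on the difference of mean rewards, treats "statistically significant" as a 95% interval whose lower bound is above zero, and measures the 3σ deviation in both directions. All function names, the evaluate_stage callback, and the default parameters other than the 1.2× cost ratio, the 3σ threshold, and the 10% → 30% → 100% stages are hypothetical.

```python
import random
import statistics

ROLLOUT_STAGES = (0.10, 0.30, 1.00)  # traffic fractions from the gate list


def bootstrap_ci_diff(candidate, baseline, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(candidate) - mean(baseline)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        c = [rng.choice(candidate) for _ in candidate]  # resample with replacement
        b = [rng.choice(baseline) for _ in baseline]
        diffs.append(statistics.fmean(c) - statistics.fmean(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def improvement_gate(candidate, baseline):
    """Pass only if the whole CI for the improvement lies above zero."""
    lo, _ = bootstrap_ci_diff(candidate, baseline)
    return lo > 0.0


def cost_gate(candidate_cost, best_cost, max_ratio=1.2):
    """Pass if the candidate costs at most max_ratio times the current best."""
    return candidate_cost <= max_ratio * best_cost


def anomaly_gate(reward, history, k=3.0):
    """Pass unless the reward deviates more than k sigma from the rolling mean."""
    if len(history) < 2:
        return True  # assumption: too little history to judge
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0.0:
        return reward == mean
    return abs(reward - mean) <= k * std


def staged_rollout(evaluate_stage, stages=ROLLOUT_STAGES):
    """Advance through traffic stages, stopping at the first failed gate.

    `evaluate_stage(fraction)` is a hypothetical callback that runs every
    gate at the given traffic fraction and returns True if all pass.
    """
    for fraction in stages:
        if not evaluate_stage(fraction):
            return ("rolled_back", fraction)  # revert to the previous policy
    return ("deployed", stages[-1])
```

A caller would typically combine the three checks inside evaluate_stage, so that a failure at 10%, 30%, or 100% traffic all lead to the same rollback path.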

Evidence asymmetry

Reward composition
Full safety documentation is available under NDA for qualified partners and investors.