Core principle

Safety & guardrails

UCogNet's self-improvement is not unconstrained. Every mutation, every execution, and every deployment is bounded by statistical gates, budget limits, anomaly detection, and automatic rollback. This page documents the safety architecture in detail.

Safety pillars

Six interlocking mechanisms that prevent uncontrolled self-improvement.

Gated evolution

Every policy mutation must pass A/B statistical gates, evaluated with bootstrap confidence intervals, before deployment. Mutations then roll out gradually: 10% → 30% → 100% of traffic.
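To make the bootstrap gate concrete, here is a minimal sketch of a percentile bootstrap on the difference in mean reward between a candidate and a baseline policy. The function names, the 95% interval, and the resample count are illustrative assumptions, not UCogNet's actual implementation.

```python
import random
from statistics import mean


def bootstrap_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(candidate) - mean(baseline).

    All parameter defaults are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each arm with replacement, keeping its original size.
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(candidate, k=len(candidate))
        diffs.append(mean(c) - mean(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def passes_improvement_gate(baseline, candidate, **kwargs):
    """Pass only if the whole interval lies above zero."""
    lo, _ = bootstrap_ci(baseline, candidate, **kwargs)
    return lo > 0.0
```

Requiring the lower bound to clear zero is one common way to read "statistically significant"; a stricter gate could demand a minimum effect size instead.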

Anomaly detection

Reward spikes more than 3σ from the rolling mean trigger an automatic audit and halt. This prevents reward hacking and exploitation of distribution shift.
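The 3σ rule can be sketched as a small rolling-window detector. The class name, window size, and warm-up count below are hypothetical, and treating only upward deviations as "spikes" is an assumption made for this example.

```python
from collections import deque
from statistics import mean, pstdev


class RewardAnomalyDetector:
    """Flags rewards more than `threshold_sigma` above the rolling mean.

    Hypothetical sketch: window, warm-up, and one-sidedness are assumptions.
    """

    def __init__(self, window=50, threshold_sigma=3.0, min_samples=10):
        self.history = deque(maxlen=window)
        self.threshold_sigma = threshold_sigma
        self.min_samples = min_samples

    def observe(self, reward):
        """Return True if `reward` should trigger an audit and halt."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # With zero variance the threshold is undefined, so nothing is flagged.
            if sigma > 0 and reward - mu > self.threshold_sigma * sigma:
                anomalous = True
        if not anomalous:
            # Keep flagged rewards out of the window so they cannot shift the mean.
            self.history.append(reward)
        return anomalous
```

Excluding flagged values from the window is a design choice: it stops a run of hacked rewards from gradually normalizing itself.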

Cost caps & budgets

Every execution operates under token, time, cost, and tool-call budgets. An overrun on any of them triggers immediate rollback, so inference cannot run away.
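One way to picture multi-dimensional budgets is a tracker that is charged after each step and raises as soon as any dimension is overrun. `BudgetTracker`, `BudgetExceeded`, and the dimension names are hypothetical; how the caller performs the rollback is out of scope for this sketch.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when any budget dimension is overrun."""


@dataclass
class BudgetTracker:
    # Hypothetical limits, e.g. {"tokens": 1000, "seconds": 30.0, ...}
    limits: dict
    used: dict = field(default_factory=dict)

    def charge(self, **amounts):
        """Add usage to one or more dimensions; raise on the first overrun."""
        for name, amount in amounts.items():
            if name not in self.limits:
                raise KeyError(f"unknown budget dimension: {name}")
            self.used[name] = self.used.get(name, 0) + amount
            if self.used[name] > self.limits[name]:
                raise BudgetExceeded(
                    f"{name}: used {self.used[name]} > limit {self.limits[name]}"
                )
```

A caller would wrap each execution in `try` / `except BudgetExceeded` and treat the exception as the rollback signal.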

Sandboxed execution

All tool calls execute in isolated sandboxes with strict permissions. There is no ambient authority: each tool declares the capabilities it requires up front.
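The "declare capabilities up front" idea can be illustrated with a permission check that runs before a tool is invoked. This sketch shows only the capability comparison, not real process isolation; `Tool`, `Sandbox`, and capability strings such as `fs.read` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    required: frozenset  # capabilities the tool declares up front


class Sandbox:
    """Hypothetical gatekeeper: runs a tool only if every declared
    capability has been explicitly granted."""

    def __init__(self, granted):
        self.granted = frozenset(granted)

    def run(self, tool, fn, *args):
        missing = tool.required - self.granted
        if missing:
            raise PermissionError(
                f"{tool.name} lacks capabilities: {sorted(missing)}"
            )
        return fn(*args)
```

Because nothing is granted by default, a tool that forgets to declare a capability fails closed rather than inheriting access from its environment.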

Evidence auditing

Every response carries structured claims with provenance. Outputs without evidence are flagged and cannot be trusted by downstream consumers.
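As a rough illustration of structured claims with provenance, the sketch below models a response as a list of claims, each carrying its sources, and flags any claim that has none. The `Claim` and `Response` types and the source identifiers are assumptions; the real schema is not described on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    sources: list = field(default_factory=list)  # provenance identifiers


@dataclass
class Response:
    claims: list


def unsupported_claims(response):
    """Return the claims that carry no provenance at all."""
    return [c for c in response.claims if not c.sources]


def is_trusted(response):
    """A response is trusted only if every claim has at least one source."""
    return not unsupported_claims(response)
```

A downstream consumer could call `is_trusted` before acting on a response and route flagged claims to review.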

Automatic rollback

If any gate fails at any deployment stage, the system reverts to the previous policy within one evaluation cycle. No human intervention required.

Shaping guardrails

Anti-reward-hacking mechanisms at the reward layer.

Figure: Shaping cap and anomaly detection

Capability boundaries

What the system can and cannot do — honestly mapped.

Figure: Capability boundaries honesty map

A/B gate protocol

Every candidate policy must clear every gate in sequence. One failure triggers full rollback.

01 Improvement threshold

The candidate must exceed the baseline by a statistically significant margin, judged by a bootstrap confidence interval.

02 Cost constraint

A new mutation cannot exceed 1.2× the cost of the current best policy.

03 Safety anomaly check

Reward spikes more than 3σ from the rolling mean trigger an automatic audit and halt.

04 Gradual rollout

Traffic shifts 10% → 30% → 100%, with gates at each stage.

05 Rollback guarantee

If any gate fails, the system reverts to the previous policy within one evaluation cycle.
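The five steps above can be sketched as a single loop over rollout stages, where each stage re-evaluates the gates and any failure ends in rollback. `StageMetrics`, `rollout`, and the `evaluate` callback are hypothetical names; only the 1.2× cost ratio and the 10% / 30% / 100% stages come from the protocol as described.

```python
from dataclasses import dataclass

ROLLOUT_STAGES = (0.10, 0.30, 1.00)  # traffic fractions from the protocol
MAX_COST_RATIO = 1.2                 # candidate cost cap relative to current best


@dataclass
class StageMetrics:
    improvement_ci_low: float  # lower bound of the bootstrap CI (candidate - baseline)
    candidate_cost: float
    baseline_cost: float
    anomaly_detected: bool     # result of the 3-sigma reward check


def gates_pass(m):
    """All three checks must hold for a stage to pass."""
    return (
        m.improvement_ci_low > 0
        and m.candidate_cost <= MAX_COST_RATIO * m.baseline_cost
        and not m.anomaly_detected
    )


def rollout(evaluate):
    """Walk the stages in order; `evaluate(fraction)` returns StageMetrics.

    Returns ("deployed", 1.0) on success, or ("rolled_back", fraction)
    naming the stage at which a gate failed.
    """
    for fraction in ROLLOUT_STAGES:
        if not gates_pass(evaluate(fraction)):
            return "rolled_back", fraction
    return "deployed", 1.0
```

Returning the failing stage alongside the outcome gives an audit trail for why a candidate was rejected.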

Figure: Evidence asymmetry analysis

Figure: Reward composition breakdown

Want to audit our safety architecture?

Full safety documentation is available under NDA for qualified partners and investors.