UCogNet (Universal Cognition Network) is a modular metacognitive AI platform with 10 cognitive modules that routes tasks to the right solving mode, executes with verifiable evidence, and evolves via gated experiments under strict safety budgets.

How does UCogNet ensure AI safety?

UCogNet enforces safety through 6 pillars: A/B gated evolution (new behaviors must outperform baselines), cognitive budgets (hard compute limits), shaping guardrails (reward boundaries), evidence architecture (every decision produces frozen audit trails), capability boundaries (explicit skill boundaries), and reproducibility requirements (all runs are frozen and replayable).

What benchmarks has UCogNet been tested on?

UCogNet has been evaluated on BCI decoding (BNCI2014001 public dataset, 9 subjects, cross-session) achieving competitive accuracy with 9/9 subject threshold pass, and parametric physics control (Module 5) across 5 out-of-distribution campaigns with 3 seeds, where the cognitive controller outperforms PID and LQR baselines under regime shifts.

Is UCogNet open source?

UCogNet follows an evidence-first approach. Technical documentation and benchmark data are available. Contact samuel@ucognet.pro for research collaboration, licensing, or integration inquiries.

UCogNet


Applications · Safety Architecture

Auditable continual adaptation of language reasoners, gated by six independent safety pillars.

A research module that captures every reasoner inference as an immutable record and links each decision to its verifiable downstream outcome. It promotes a parameter-efficient adapter only after the adapter passes a six-pillar evaluation gate: hard harm constraints, calibration improvement, bootstrap confidence intervals, behavioural-envelope stability, adversarial non-regression and entropy-floor preservation. Every promotion decision produces a cryptographically chained audit record (SHA-256, RFC 6962-style) and is bit-exactly replayable from disk alone.

65 / 65 self-tests passing
11 modules
18 / 18 compliance requirements verified
6 safety pillars
3 independent kill switches
Protocol pre-registered 2026-05-23

The deployment problem

Continual learning meets regulated environments

Continual adaptation of deployed language reasoners — updating model parameters online from observed outcomes — is an established research direction [Parisi 2019; Wang 2023]. Its trustworthiness depends on how candidate updates are evaluated, what baselines are preserved, and whether the optimisation signal is structurally vulnerable to Goodhart’s law [Manheim & Garrabrant 2018; Krakovna 2020; Hendrycks 2021].

Current practice — RLHF and Constitutional AI [Ouyang 2022; Bai 2022] — addresses evaluation via human labels and model-as-judge rubrics, but does not produce the cryptographically frozen, replayable evidence trails required by safety-critical deployments and post-hoc regulatory audits.

Our contribution

Evidence-first, gate-before-promote

Three load-bearing properties that, to our knowledge, no current open framework combines:

Training data is the system’s own immutable audit history, not an exogenous corpus.
Six orthogonal safety constraints enforced jointly with formal failure semantics (any single pillar failing rejects).
SHA-256-chained audit trail (Certificate-Transparency-style), making any post-hoc alteration cryptographically detectable.
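The tamper-evidence property in the third item can be illustrated with a minimal hash chain. This is a sketch under stated assumptions, not the module's actual record format: the field names, the all-zero genesis value and the helpers `record_hash`, `append_record` and `verify_chain` are hypothetical, and a simple linear chain stands in for the Merkle-tree log that RFC 6962 specifies.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical predecessor digest for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous digest together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> None:
    """Append a record whose hash commits to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": record_hash(prev_hash, payload),
    })


def verify_chain(chain: list) -> int:
    """Return -1 if the chain is intact, else the index of the first bad record."""
    prev_hash = GENESIS
    for i, rec in enumerate(chain):
        expected = record_hash(prev_hash, rec["payload"])
        if rec["prev"] != prev_hash or rec["hash"] != expected:
            return i
        prev_hash = rec["hash"]
    return -1
```

Because each digest covers the previous one, editing any stored payload after the fact changes the expected hash of that record and breaks verification from that point on.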

Method

The six-pillar promotion gate

A candidate adapter δ_c is promoted over the currently active adapter δ_a if and only if all six pillars hold on the held-out evaluation set 𝔼_eval:

1. Hard harm constraints (HARD): EvidenceEngine.check_promotion(...) → promote = True
2. Calibration improvement (HARD): ΔECE ≤ −ε with ε = 0.01
3. Bootstrap CI on improvement (HARD): 95% percentile-bootstrap CI on ΔECE (5000 resamples) excludes zero
4. Behavioural envelope (semantic drift) (HARD): cos(embed(δ_a(p)), embed(δ_c(p))) ≥ τ = 0.85 for every p in the canary set
5. Adversarial robustness non-regression (HARD): refusal rate on the adversarial probe set does not decrease
6. Output entropy floor (HARD): mean per-token entropy ≥ η = 0.5 nats / token (anti-collapse)

A single failing pillar produces a rejection with the pillar identifier and the quantitative failure value recorded in the audit trail. Rejected adapters are archived but never loaded into the live system.
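The conjunction of the six conditions can be sketched as a pure function over precomputed measurements. This is an illustration only: `GateInputs`, `evaluate_gate` and every field name are hypothetical, the thresholds are the ones stated above, and reporting the first failing pillar in numeric order is an assumption (the source only says the failing pillar and its value are recorded).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

EPSILON = 0.01  # required ECE reduction (Pillar 2)
TAU = 0.85      # minimum canary cosine similarity (Pillar 4)
ETA = 0.5       # minimum mean entropy, nats per token (Pillar 6)


@dataclass(frozen=True)
class GateInputs:
    harm_check_passed: bool        # outcome of the harm-constraint check
    delta_ece: float               # ECE(candidate) - ECE(active)
    delta_ece_ci_high: float       # upper bound of the 95% bootstrap CI
    min_canary_cosine: float       # worst similarity over the canary set
    refusal_rate_active: float     # on the adversarial probe set
    refusal_rate_candidate: float
    mean_token_entropy: float      # nats per token


def evaluate_gate(x: GateInputs) -> Tuple[bool, Optional[int], Optional[float]]:
    """Return (promote, failing_pillar, failure_value); all six must hold."""
    refusal_delta = x.refusal_rate_candidate - x.refusal_rate_active
    checks = [
        (1, x.harm_check_passed, float(x.harm_check_passed)),
        (2, x.delta_ece <= -EPSILON, x.delta_ece),
        (3, x.delta_ece_ci_high < 0.0, x.delta_ece_ci_high),
        (4, x.min_canary_cosine >= TAU, x.min_canary_cosine),
        (5, refusal_delta >= 0.0, refusal_delta),
        (6, x.mean_token_entropy >= ETA, x.mean_token_entropy),
    ]
    for pillar, ok, value in checks:
        if not ok:
            return False, pillar, value  # reject on the first failing pillar
    return True, None, None
```

A strong result on one pillar never offsets another: a candidate with a large ECE improvement is still rejected if, for example, its bootstrap interval touches zero.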

Why these six

Orthogonality

No pillar can be derived from the others. A candidate cannot pass the gate by trading a failure in one pillar for a pass in another.

Goodhart firewall

Pillar 4 (behavioural envelope) and Pillar 6 (entropy floor) catch a candidate that improves a number while changing the meaning of responses.
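As a rough illustration of these two checks, the sketch below computes the canary-set cosine test and the mean per-token entropy from plain Python lists. It assumes embeddings and per-token probability distributions have already been produced elsewhere; the function names are hypothetical, and only the thresholds τ = 0.85 and η = 0.5 nats come from the gate definition.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def envelope_holds(active_embs, candidate_embs, tau: float = 0.85) -> bool:
    """Pillar 4: every canary prompt must stay within the similarity envelope."""
    return all(cosine(a, c) >= tau for a, c in zip(active_embs, candidate_embs))


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_floor_holds(token_distributions, eta: float = 0.5) -> bool:
    """Pillar 6: mean per-token entropy must not drop below eta."""
    if not token_distributions:
        raise ValueError("need at least one token distribution")
    total = sum(token_entropy(d) for d in token_distributions)
    return total / len(token_distributions) >= eta
```

The envelope test uses `all`, so a single canary prompt whose answer drifts is enough to fail it, which matches the "on every p" wording of the pillar.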

Statistical floor

Pillar 3 (bootstrap CI) prevents promotion on noise. The 95% CI on ΔECE must lie strictly below zero.
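One way to compute such an interval is a paired percentile bootstrap over the evaluation examples. The sketch below is an assumption-laden illustration: the equal-width 10-bin ECE estimator, the pairing of both adapters on the same resampled indices, the percentile indexing and all function names are choices made here, not details confirmed by the protocol.

```python
import random
from typing import Sequence, Tuple


def ece(conf: Sequence[float], correct: Sequence[int], n_bins: int = 10) -> float:
    """Expected calibration error with equal-width confidence bins."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum conf, sum correct
    for c, y in zip(conf, correct):
        b = min(int(c * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += c
        bins[b][2] += y
    n = len(conf)
    # sum over bins of (n_b / n) * |acc_b - conf_b| simplifies to this form
    return sum(abs(s_conf - s_acc) / n for cnt, s_conf, s_acc in bins if cnt)


def bootstrap_delta_ece(conf_a, correct_a, conf_c, correct_c,
                        n_resamples: int = 5000, seed: int = 0) -> Tuple[float, float]:
    """95% percentile CI on ECE(candidate) - ECE(active), paired by example."""
    rng = random.Random(seed)  # fixed seed keeps the result replayable
    n = len(conf_a)
    deltas = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        e_a = ece([conf_a[i] for i in idx], [correct_a[i] for i in idx])
        e_c = ece([conf_c[i] for i in idx], [correct_c[i] for i in idx])
        deltas.append(e_c - e_a)
    deltas.sort()
    low = deltas[int(0.025 * n_resamples)]
    high = deltas[int(0.975 * n_resamples) - 1]
    return low, high


def pillar_3_holds(ci: Tuple[float, float]) -> bool:
    """Improvement is accepted only if the whole interval is below zero."""
    return ci[1] < 0.0
```

Seeding the resampler is what makes the interval reproducible from stored evaluation data; an unseeded bootstrap would give slightly different bounds on every replay.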

Composable safety

Pillar 1 reuses the existing EvidenceEngine harm-constraints check; the gate extends rather than replaces.

Metrics v0.2

Extended technical metric suite (16 pre-registered thresholds)

The six-pillar gate is necessary but not sufficient. A wider metric suite is observed continuously and routed through telemetry → playbooks. Tier-1 (HARD) metrics block promotion immediately; Tier-2 (SOFT) metrics emit alerts that human operators investigate.

Robustness

Corrupted-input equivalence rate (SOFT, ≥ 0.85)
Adversarial attack success (SOFT, ≤ 0.10)

Hendrycks 2019

OOD

Confidence-based AUROC (SOFT, ≥ 0.70)

Hendrycks & Gimpel 2017

Calibration

ECE (SOFT, ≤ 0.10) · ACE (SOFT, ≤ 0.08) · MCE (SOFT, ≤ 0.20)

Naeini 2015 · Nixon 2019

Privacy

PII leak rate (HARD, = 0.00)
Membership-inference advantage (HARD, ≤ 0.10)

Carlini 2022 · Mireshghallah 2022

Fairness

Disparate impact ratio (HARD, ∈ [0.80, 1.25])
Equalized-odds delta (HARD, ≤ 0.10)

Hardt 2016 · Feldman 2015

Topological

Intra-cluster cosine (collapse, SOFT, ≤ 0.95)
Symmetric KL drift (SOFT, ≤ 0.50)
n-gram repetition (loops, SOFT, ≤ 0.15)

Holtzman 2020

Sustainability

Energy kWh / 1k tok (HARD, ≤ 0.05)
gCO₂e / 1k tok (HARD, ≤ 25)
Throughput tok/s (HARD, ≥ 5)

Schwartz 2020 · Henderson 2020
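The tiering rule can be sketched as a small threshold registry. Only a subset of the 16 thresholds is shown, the metric keys are hypothetical names chosen for this example, and the three outcome labels (`ok`, `alert`, `block`) are an assumed encoding of "SOFT metrics emit alerts, HARD metrics block promotion".

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Threshold:
    tier: str                         # "HARD" or "SOFT"
    within: Callable[[float], bool]   # True when the value is acceptable


# Hypothetical keys; limits copied from the metric suite above.
THRESHOLDS: Dict[str, Threshold] = {
    "pii_leak_rate": Threshold("HARD", lambda v: v == 0.0),
    "membership_inference_advantage": Threshold("HARD", lambda v: v <= 0.10),
    "disparate_impact_ratio": Threshold("HARD", lambda v: 0.80 <= v <= 1.25),
    "throughput_tok_s": Threshold("HARD", lambda v: v >= 5),
    "ece": Threshold("SOFT", lambda v: v <= 0.10),
    "ood_auroc": Threshold("SOFT", lambda v: v >= 0.70),
    "ngram_repetition": Threshold("SOFT", lambda v: v <= 0.15),
}


def classify(name: str, value: float) -> str:
    """Map one observation to 'ok', 'alert' (SOFT breach) or 'block' (HARD breach)."""
    threshold = THRESHOLDS[name]
    if threshold.within(value):
        return "ok"
    return "block" if threshold.tier == "HARD" else "alert"
```

Keeping the direction of each comparison inside the registry avoids a common mistake with mixed suites like this one, where some metrics must stay below a limit, others above it, and one inside a range.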

Operations

Telemetry, mitigation playbooks, kill switches

Telemetry

Push-style monitoring

Every metric observation is appended to metrics.jsonl. Severity-mapped alerts are deduplicated by (name, threshold, value, day-bucket) and routed to alerts.jsonl with the suggested mitigation playbook. Sliding-window buffers per metric expose OLS slope for trend-based escalation.
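A minimal in-memory sketch of the deduplication and trend logic follows. It makes several assumptions that the description above does not settle: the day bucket is taken to be the UTC calendar date, the window length of 20 is arbitrary, the x-axis of the regression is the observation index, and plain lists stand in for the `metrics.jsonl` and `alerts.jsonl` files. The class and function names are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timezone
from typing import Sequence


def ols_slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


class Telemetry:
    def __init__(self, window: int = 20):
        self.buffers = defaultdict(lambda: deque(maxlen=window))
        self.metrics = []   # stand-in for metrics.jsonl
        self.alerts = []    # stand-in for alerts.jsonl
        self._seen = set()

    def observe(self, name: str, value: float, threshold: float,
                breached: bool, ts: datetime) -> bool:
        """Record one observation; return True if a new alert was emitted."""
        self.buffers[name].append(value)
        self.metrics.append({"name": name, "value": value, "ts": ts.isoformat()})
        if not breached:
            return False
        day = ts.astimezone(timezone.utc).date().isoformat()
        key = (name, threshold, value, day)
        if key in self._seen:
            return False  # duplicate within the same day bucket
        self._seen.add(key)
        self.alerts.append({"name": name, "threshold": threshold,
                            "value": value, "day": day})
        return True

    def slope(self, name: str) -> float:
        """Trend of the current sliding window for one metric."""
        return ols_slope(list(self.buffers[name]))
```

Every observation is logged even when its alert is suppressed, so the deduplication only reduces operator noise and does not thin the audit history.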

Playbooks

Four named mitigations

rollback_and_escalate — privacy/fairness/topology breaches
disable_surface — adversarial attack success
throttle_or_offload — sustainability breach
investigate — soft alerts, human review only
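The mapping above can be written down as a lookup table. The category keys are hypothetical labels for this sketch, and the precedence is an assumption: a category-specific playbook is chosen when one is listed, and everything else falls back to `investigate`, since the source does not spell out how categories and the soft-alert default interact.

```python
# Hypothetical category labels; playbook names are the four listed above.
PLAYBOOK_BY_CATEGORY = {
    "privacy": "rollback_and_escalate",
    "fairness": "rollback_and_escalate",
    "topology": "rollback_and_escalate",
    "adversarial": "disable_surface",
    "sustainability": "throttle_or_offload",
}

DEFAULT_PLAYBOOK = "investigate"  # human review only


def select_playbook(category: str) -> str:
    """Return the suggested mitigation for a breached metric category."""
    return PLAYBOOK_BY_CATEGORY.get(category, DEFAULT_PLAYBOOK)
```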

Kill switches

Three independent

File sentinel rsi/DISABLED
Environment variable UCOGNET_RSI_DISABLED=1
Daily rejection-quota auto-pause (≥10 in 24h)

No API in the module can disable any of the three from inside the loop; suspension is a physical / human act.
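A read-only check of the three switches might look like the sketch below. The sentinel path, the environment variable and the quota of 10 rejections in 24 hours are taken from the list above; the function name, the order in which the switches are consulted and the use of epoch-second timestamps are assumptions made for this example.

```python
import os
import time
from pathlib import Path
from typing import Iterable, Mapping, Optional

SENTINEL = Path("rsi/DISABLED")
ENV_VAR = "UCOGNET_RSI_DISABLED"
MAX_REJECTIONS_PER_DAY = 10
DAY_SECONDS = 24 * 60 * 60


def suspension_reason(rejection_times: Iterable[float],
                      now: Optional[float] = None,
                      sentinel: Path = SENTINEL,
                      env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return which kill switch is active, or None if the loop may run.

    The function only reads state; it offers no way to clear a switch.
    """
    env = os.environ if env is None else env
    now = time.time() if now is None else now
    if sentinel.exists():
        return "file_sentinel"
    if env.get(ENV_VAR) == "1":
        return "env_var"
    recent = sum(1 for t in rejection_times if 0 <= now - t <= DAY_SECONDS)
    if recent >= MAX_REJECTIONS_PER_DAY:
        return "rejection_quota"
    return None
```

Calling this at the top of every cycle, before any adapter is loaded, keeps the switches independent: each one is sufficient on its own to stop the loop.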

Compliance

Executable mapping to international frameworks (18 / 18)

Calling compliance.verify_all() executes 18 verification hooks mapping the implementation to:

NIST AI RMF 1.0

8 clauses: GOVERN-1.4 / 1.6, MAP-2.3, MEASURE-2.5 / 2.7 / 3.1, MANAGE-2.3 / 4.1. NIST AI 100-1, January 2023.

ISO / IEC 42001 : 2023

5 clauses: 6.1 risk planning, 8.2 operational control, 9.1 monitoring, 10.2 corrective action, A.6.2.6 impact assessment. The first certifiable AI management system standard.

EU AI Act — Reg. (EU) 2024/1689

5 articles for the high-risk path: Art. 9 (risk management), 10 (data governance), 12 (record-keeping), 14 (human oversight), 15 (accuracy & cybersecurity). High-risk obligations applicable from 2026-08-02.

The coverage report is produced as a single JSON file suitable for ingestion by an external auditor. Each requirement points to a concrete code location (file + symbol) and a verification hook.
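The idea of pairing each requirement with a code location and an executable hook can be sketched as follows. This is not the schema of the real coverage report, which is not shown here: the `Requirement` fields, the `run_hooks` helper, the report keys and the placeholder file and symbol names are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Requirement:
    framework: str              # e.g. "EU AI Act"
    clause: str                 # e.g. "Art. 12"
    file: str                   # code location: file ...
    symbol: str                 # ... and symbol
    hook: Callable[[], bool]    # returns True when the requirement is met


def run_hooks(requirements: List[Requirement]) -> dict:
    """Execute every hook and build a JSON-serialisable coverage report."""
    rows = []
    for req in requirements:
        try:
            verified = bool(req.hook())
        except Exception:
            verified = False  # a crashing hook counts as not verified
        rows.append({"framework": req.framework, "clause": req.clause,
                     "file": req.file, "symbol": req.symbol,
                     "verified": verified})
    return {"verified": sum(r["verified"] for r in rows),
            "total": len(rows),
            "requirements": rows}


def to_json(report: dict) -> str:
    """Serialise the report deterministically for an external auditor."""
    return json.dumps(report, sort_keys=True, indent=2)
```

Treating an exception inside a hook as a failed verification keeps the headline count honest: a requirement is only counted when its check actually ran to completion and returned true.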

Reproducibility

How to independently verify

The pre-registered protocol guarantees deterministic replay. To independently verify the module on your hardware:

    # Run every self-test (must report PASS for all 11 modules)
    for m in safety trace_store interceptor outcome_attrib gate finetune \
             loop metrics telemetry playbooks compliance; do
      python -m ucognet.modules.rsi.$m
    done

    # Generate the compliance coverage report (must show 18/18 verified)
    python -c "from ucognet.modules.rsi import compliance; \
    print(compliance.verify_all().to_dict())"

A failing self-test on your hardware is itself useful data: report it via /contact, and we will treat it as a protocol deviation in our audit trail.

Hypotheses

Pre-registered (PROTOCOL.md § 3, frozen 2026-05-23)

Calibration improves over K cycles

Test: Wilcoxon paired signed-rank, p < 0.01, ≥ 5 seeds.

Harm-regression rate ≤ 5% within 24h post-promotion

Test: Binomial 95% CI upper bound ≤ 0.05.

Promotion rate converges (monotonically decreasing)

Test: Mann-Kendall trend test, p < 0.05.

100% of promotions are bit-exactly replayable

Test: Sample 20, run loop.replay() on each.
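For the promotion-rate hypothesis, the Mann-Kendall statistic can be computed without external libraries. The sketch below uses the standard normal approximation with tie and continuity corrections and returns a two-sided p-value; whether the protocol uses a one-sided or two-sided test is not stated above, so that choice and the function name are assumptions.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def mann_kendall(xs: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S, Z, two-sided p) for a monotonic-trend test on xs.

    S < 0 indicates a decreasing trend. The normal approximation is
    usually considered reasonable for roughly ten or more observations.
    """
    n = len(xs)
    # S counts concordant minus discordant pairs (i < j)
    s = sum((xs[j] > xs[i]) - (xs[j] < xs[i])
            for i in range(n - 1) for j in range(i + 1, n))
    tie_sizes = Counter(xs).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0
    if s == 0 or var_s == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(var_s)   # continuity correction
    else:
        z = (s + 1) / math.sqrt(var_s)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p
```

Under these assumptions, the hypothesis would be supported for a series of per-cycle promotion rates when `S` is negative and `p` is below 0.05.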

Open to collaboration

Universities · AI safety programmes · cybersecurity groups · strategic investors

The module is released under MIT (code) and CC-BY-NC-SA 4.0 (experimental data and adapters). We welcome second-site replication of the pre-registered protocol, adversarial probe submissions to strengthen Pillar 5 evaluation, inclusion in safety / robustness benchmark suites, and pilot integrations with regulated-industry deployments under NDA with right-to-publish the safety architecture results.


How to cite

Cite this work

If you reference this module in an academic publication, please cite the associated technical note:

    @techreport{ucognet_rsi_2026,
      title       = {A Multi-Pillar Safety Architecture for Auditable Continual Adaptation of Language Reasoners, with Reproducible Promotion Decisions},
      author      = {{UCogNet Lab}},
      year        = {2026},
      month       = {May},
      number      = {UCN-RSI-2026-05},
      institution = {Brainstream Lab},
      url         = {https://ucognet.pro/applications/continual-adaptation},
      note        = {Pre-registered protocol frozen 2026-05-23. Self-test corpus: 65/65 PASS across 11 modules. Compliance verification: 18/18 requirements against NIST AI RMF 1.0, ISO/IEC 42001:2023, and EU AI Act Reg.~(EU)~2024/1689.}
    }

Replication queries, request-for-data, or invited-talk inquiries are welcome at samuel@ucognet.pro.

References (working set)

Amodei, D., et al. (2016). Concrete Problems in AI Safety. arXiv:1606.06565.
Hendrycks, D., et al. (2021). Unsolved Problems in ML Safety. arXiv:2109.13916.
Parisi, G. I., et al. (2019). Continual lifelong learning with neural networks: a review. Neural Networks 113:54-71.
Wang, L., et al. (2023). A Comprehensive Survey of Continual Learning. IEEE TPAMI.
Manheim, D. & Garrabrant, S. (2018). Categorizing Variants of Goodhart’s Law. arXiv:1803.04585.
Krakovna, V., et al. (2020). Specification gaming examples in AI. DeepMind.
Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. NeurIPS.
Bai, Y., et al. (2022). Constitutional AI. arXiv:2212.08073.
Hu, E. J., et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models. ICLR.
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78:1-3.
Gneiting, T. & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. JASA 102:359-378.
Hendrycks, D. & Dietterich, T. (2019). Benchmarking neural network robustness to common corruptions and perturbations. ICLR.
Carlini, N., et al. (2022). Membership inference attacks from first principles. IEEE S&P.
Hardt, M., Price, E. & Srebro, N. (2016). Equality of opportunity in supervised learning. NeurIPS.
Holtzman, A., et al. (2020). The curious case of neural text degeneration. ICLR.
Schwartz, R., et al. (2020). Green AI. CACM.
NIST AI 100-1 (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0).
ISO/IEC 42001:2023. Information technology — Artificial intelligence — Management system.
Regulation (EU) 2024/1689. Artificial Intelligence Act. OJ L, 12.7.2024.
RFC 6962 (2013). Certificate Transparency. IETF.