Research

Detailed benchmark results across eight scientific domains. Same cognitive architecture — rigorously evaluated with traceable, reproducible evidence.

We classify peptide sequences as toxic or non-toxic using a post-humanist 5-axis perception layer (chemical-chromatic · spectral · tactile · topological · synaesthetic; zero learned parameters; 64 dimensions) that feeds a Deep MLP head. The existing AnankeProtocol from the U-CogNet integrated cognitive system (DECT + Reverse Observer + ERA, Friston-style risk evaluator) acts as the safety / governance layer. The pipeline runs on the main split of the public ToxinPred-2 corpus (Sharma et al. 2022, Briefings in Bioinformatics), which contains 16,466 peptides.

  • peptides scored: 16,466
  • val accuracy (no length): 76.5 %
  • topological-axis ablation: −26.5 pp
  • cognitive modules online: 45
  • Ananke allowed: 64 / 64
  • classifier params: ≈ 0.2 M

Figure 1 · Training dynamics — Deep MLP head on the 5-axis perception
Validation accuracy stabilises at 76.5 % after ~80 SGD steps on 16,466 peptides (8,233 toxic + 8,233 non-toxic, ToxinPred-2 main split). Train and validation accuracies track within ±0.5 pp; cross-entropy converges to 0.50.
Figure 2 · Per-axis ablation — which modality carries the discriminative signal
Each modal axis in turn is replaced by its non-toxic class mean and validation accuracy is re-evaluated. The Topological axis accounts for 26.5 pp (76.5 → 50.0 % when ablated, which is chance level on a class-balanced split); Chromatic contributes 2.6 pp; the remaining three axes contribute < 0.2 pp each. We interpret the Topological dominance as partial co-linearity with sequence length, since the peak count of the super-level filtration scales with sequence length (see Figure 3).
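
The ablation procedure described above can be sketched as follows. This is a minimal illustration, not the repository's code: the `predict` callable, the `axes` mapping from axis name to column span, and the choice to compute the non-toxic mean over the same rows that are evaluated are all assumptions, since the layout of the 64 dimensions is not stated here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(predict: Callable[[List[float]], int],
             X: Sequence[List[float]], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(1 for x, label in zip(X, y) if predict(x) == label) / len(y)


def ablate_axis(X: Sequence[List[float]], y: Sequence[int],
                span: Tuple[int, int], reference_label: int = 0) -> List[List[float]]:
    """Copy X with columns [start, stop) overwritten by the mean of the
    reference class (label 0 = non-toxic is an assumed encoding)."""
    start, stop = span
    ref_rows = [x for x, label in zip(X, y) if label == reference_label]
    mean = [sum(r[j] for r in ref_rows) / len(ref_rows) for j in range(start, stop)]
    return [list(x[:start]) + mean + list(x[stop:]) for x in X]


def ablation_drops(predict: Callable[[List[float]], int],
                   X: Sequence[List[float]], y: Sequence[int],
                   axes: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Accuracy drop in percentage points when each axis is ablated alone."""
    base = accuracy(predict, X, y)
    return {name: 100.0 * (base - accuracy(predict, ablate_axis(X, y, span), y))
            for name, span in axes.items()}


# Toy usage: a two-column embedding where only column 0 carries signal.
toy_X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.3], [0.2, 0.4]]
toy_y = [1, 1, 0, 0]
toy_predict = lambda x: 1 if x[0] > 0.5 else 0
toy_drops = ablation_drops(toy_predict, toy_X, toy_y, {"a": (0, 1), "b": (1, 2)})
```

In the toy case, ablating column 0 collapses every prediction to the non-toxic class, so accuracy falls to the class prior, which mirrors the 76.5 → 50.0 % pattern reported for the Topological axis.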
Figure 3 · Accuracy stratified by sequence-length quintile
Per-bin validation accuracy together with the per-bin class composition. The shortest bin contains ≈ 88 % toxic peptides; accuracy there is 92 %, only slightly above the ≈ 88 % that always predicting toxic would achieve, so it largely reflects the class prior. Middle and longer bins (63–77 %) provide the more informative slices for evaluating discriminative ability not driven by length. Mann–Whitney z for the length distributions of the two classes: z = −65.6.
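
A hedged sketch of the two computations behind this figure: per-quintile accuracy with class composition, and the Mann–Whitney z statistic under the normal approximation with tie correction. The function names, the rank-based equal-count binning, the label encoding (1 = toxic), and the omission of a continuity correction are assumptions; the sign of z depends on which class is passed first.

```python
import math
from typing import Dict, List, Sequence, Tuple


def quintile_report(lengths: Sequence[int], y_true: Sequence[int],
                    y_pred: Sequence[int], n_bins: int = 5) -> List[Dict[str, float]]:
    """Split samples into equal-count bins by length rank and summarise each bin."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    report = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        if not idx:
            continue
        report.append({
            "min_len": lengths[idx[0]],
            "max_len": lengths[idx[-1]],
            "n": len(idx),
            "accuracy": sum(y_true[i] == y_pred[i] for i in idx) / len(idx),
            "toxic_fraction": sum(y_true[i] == 1 for i in idx) / len(idx),
        })
    return report


def midranks(values: Sequence[float]) -> Tuple[List[float], float]:
    """1-based ranks with ties averaged, plus the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def mann_whitney_z(a: Sequence[float], b: Sequence[float]) -> float:
    """z-score of U for sample a; negative when a tends to be smaller than b."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    ranks, tie_term = midranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return (u1 - n1 * n2 / 2) / math.sqrt(variance)
```

Reading the per-bin `toxic_fraction` next to `accuracy` is what separates a bin where the model merely follows the class prior from one where it discriminates.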
Figure 4 · Principal-component projection of the 64-D perception
Deterministic PCA projection of the L2-normalised post-humanist embedding to two dimensions. The two classes occupy overlapping but partially separable regions of an arc-shaped low-dimensional manifold. PCA receives no class labels; the figure is for visualisation only.
Figure 5 · Mean Kyte–Doolittle hydrophobicity profile per class
Per-residue Kyte–Doolittle hydrophobicity (Kyte & Doolittle 1982) averaged across each class after resampling each sequence to 50 relative positions. Toxic peptides exhibit higher mean hydrophobicity in the central segment of the sequence; non-toxic sequences are closer to neutral. Bands are ±1 standard error of the mean.
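
The profile construction can be sketched as below. The Kyte–Doolittle values are the published scale; the linear interpolation used for resampling, the skipping of non-standard residues, and the function names are assumptions, since the caption does not specify how the resampling is done.

```python
import math
from typing import List, Sequence, Tuple

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(seq: str, n_points: int = 50) -> List[float]:
    """Per-residue hydropathy resampled to n_points relative positions
    by linear interpolation (assumed resampling scheme)."""
    values = [KD[aa] for aa in seq.upper() if aa in KD]  # skip unknown residues
    if not values:
        raise ValueError("sequence contains no standard residues")
    if len(values) == 1:
        return values * n_points
    profile = []
    for k in range(n_points):
        pos = k * (len(values) - 1) / (n_points - 1)  # fractional residue index
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        profile.append((1 - frac) * values[lo] + frac * values[hi])
    return profile


def mean_and_sem(profiles: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Column-wise mean and standard error of the mean across profiles."""
    n = len(profiles)
    means, sems = [], []
    for column in zip(*profiles):
        m = sum(column) / n
        var = sum((v - m) ** 2 for v in column) / (n - 1) if n > 1 else 0.0
        means.append(m)
        sems.append(math.sqrt(var / n))
    return means, sems
```

Calling `mean_and_sem` once per class on the resampled profiles would give the curve and the ±1 SEM band for that class.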
Figure 6 · Persistent-homology surrogate — Betti curves per class
Number of super-level connected components of the normalised hydrophobicity profile as a function of filtration threshold (Edelsbrunner & Harer 2010). Toxic peptides systematically support more components in the 0.30–0.70 range. This is the topological feature ablated in Figure 2.
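
For a one-dimensional profile, the super-level set at threshold t is the set of positions whose value is at least t, and its connected components are the maximal runs of consecutive such positions. A minimal sketch under that reading follows; the min-max normalisation and the function names are assumptions, not taken from the repository.

```python
from typing import List, Sequence


def normalise(profile: Sequence[float]) -> List[float]:
    """Min-max scale a profile to [0, 1]; a flat profile maps to all zeros."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0.0] * len(profile)
    return [(v - lo) / (hi - lo) for v in profile]


def superlevel_components(profile: Sequence[float], threshold: float) -> int:
    """Count maximal runs of consecutive positions with value >= threshold."""
    count = 0
    inside = False
    for value in profile:
        above = value >= threshold
        if above and not inside:
            count += 1  # a new run starts here
        inside = above
    return count


def betti0_curve(profile: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Component count of the normalised profile at each filtration threshold."""
    norm = normalise(profile)
    return [superlevel_components(norm, t) for t in thresholds]


# Toy profile with three peaks of heights 1.0, 0.6 and 1.0.
toy_curve = betti0_curve([0, 1, 0, 0.6, 0, 1, 0], [0.0, 0.5, 0.8])
```

At threshold 0.0 the whole profile is one component, at 0.5 all three peaks are separate, and at 0.8 only the two tall peaks remain. Because longer sequences have room for more peaks, a count of this kind can be expected to grow with sequence length.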
Figure 7 · Cognitive-module signals across training
Six representative outputs of the 45-module U-CogNet integrated cognitive system at each checkpoint: TDA Betti₁, HoloGenesis fidelity, on-line Lyapunov λ_max, Goodhart divergence, synaesthetic harmony, and pattern-kernel complexity. None of the alert thresholds are crossed throughout training.
Figure 8 · Ananke Protocol — safety verdict timeline
At each checkpoint, 64 validation predictions are passed through the existing AnankeProtocol (Dynamic Ethical Causality Tensor + Reverse Observer + Existential Resilience Axis). All evaluated actions return the allowed verdict at this development stage; the same hook is intended to return blocked / borderline verdicts once it is connected to an actual synthesis-order stream.
Figure 9 · Reliability diagram and per-class confidence distribution
Reliability diagram (left) and per-class confidence histogram (right). Expected Calibration Error is reported without temperature scaling (Guo et al. 2017). Substantial confidence mass in 0.55–0.75 reflects the residual class ambiguity in the 5-axis embedding.
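
Expected Calibration Error is commonly computed by binning predictions by confidence and taking the sample-weighted mean gap between accuracy and average confidence in each bin. The sketch below assumes a binary classifier that outputs the toxic-class probability, ten equal-width bins, and a 0.5 decision threshold; none of these settings are stated in the caption.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(p_toxic: Sequence[float], y_true: Sequence[int],
                               n_bins: int = 10) -> float:
    """ECE = sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, label in zip(p_toxic, y_true):
        predicted = 1 if p >= 0.5 else 0
        confidence = max(p, 1.0 - p)  # probability of the predicted class
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, predicted == label))
    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        acc = sum(1 for _, correct in members if correct) / len(members)
        ece += len(members) / total * abs(acc - mean_conf)
    return ece


# Toy usage: 90 % confidence everywhere but only half the predictions correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 0, 0, 1])
```

The per-bin `(mean_conf, acc)` pairs are also the points a reliability diagram plots against the diagonal.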
Figure 10 · Hardest misclassifications in PCA space
Validation set projected onto PC₁–PC₂; correct predictions are faded, and misclassifications are coloured by predicted class with marker size proportional to confidence. High-confidence misses concentrate near the decision boundary (PC₁ ≈ 0), indicating that residual errors stem from genuine class overlap in the embedding rather than isolated outliers.
Caveats & honest limitations
  • Length confounder is not fully neutralised. The per-axis ablation shows that ≈ 26.5 pp of the 76.5 % accuracy depends on the Topological axis, which is co-linear with sequence length. The length-stratified accuracy in the middle and longer bins (63–77 %, Figure 3) is the more honest estimate of discriminative ability.
  • v0.1 is peptide-level only. Synthesis-order screening requires ORF-level reasoning on DNA, which is v0.2 scope (CARD, VFDB, IGSC test sets).
  • AnankeProtocol verdicts are uniformly allowed at this stage. The protocol is wired in but has no non-trivial state-transition model to evaluate against until the screening service is connected to actual order traffic.
  • No external pretrained protein model is used. ESM-2 and similar are deliberately excluded from v0.1 to keep the perception layer interpretable. They remain available as a v0.2 ablation if needed.

References & data provenance

· Sharma, N. et al. (2022). ToxinPred-2: an improved method for predicting toxicity of proteins. Briefings in Bioinformatics 23 (5), bbac174.
· Kyte, J. & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
· Edelsbrunner, H. & Harer, J. (2010). Computational Topology: An Introduction. AMS.
· Guo, C. et al. (2017). On Calibration of Modern Neural Networks. arXiv:1706.04599.
· UniProt Consortium (2024). SwissProt release. Background non-toxic corpus.
· Code, ETHICS statement, and full pipeline at experiments/biomolecular_safety in the U-CogNet repository.