01
Agentic workflow
PCEC is a multi-agent repair pipeline. Four specialized agents collaborate, with configurable human-in-the-loop oversight.
02
Proprietary data moat
Gene Map is domain-specific RAG. Every repair generates data that doesn't exist in any LLM's training set.
03
Progressive automation
Day 1: LLM handles 100%. Day 30: Gene Map 90%. Day 180: 99%. Cost decreases with usage.
Perceive → Construct → Evaluate → Commit. Three safety modes provide human-in-the-loop: observe = human reviews, auto = bounded execution, full = human sets cost ceiling.
Gene Map lookup is essentially domain-specific RAG: embed error → retrieve similar repair experience → apply strategy. More agents = more data = better repairs = more agents. The flywheel that creates the moat.
Cost per repair: Day 1 = $0.001 (LLM) → Day 30 = $0.0001 → Day 180 ≈ $0. LLM only called for genuinely novel errors. Gene Map caches everything. Cost approaches zero.
Immune fast lane
Gene Map hit → skip pipeline → <1ms → $0. Handles 90%+ of recurring errors.
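The fast lane is, at its core, a signature-keyed cache in front of the pipeline. A minimal sketch (class and function names here are illustrative, not the real API):

```python
import hashlib

class GeneMap:
    """Sketch of the immune fast lane: a cache of previously
    successful repairs, keyed by error signature."""

    def __init__(self):
        self._genes = {}  # signature -> repair strategy

    @staticmethod
    def signature(failure_code: str, platform: str) -> str:
        # Deterministic key derived from platform + failure code
        return hashlib.sha256(f"{platform}:{failure_code}".encode()).hexdigest()

    def store(self, failure_code, platform, strategy):
        self._genes[self.signature(failure_code, platform)] = strategy

    def lookup(self, failure_code, platform):
        return self._genes.get(self.signature(failure_code, platform))

def handle_error(gene_map, failure_code, platform, run_pipeline):
    gene = gene_map.lookup(failure_code, platform)
    if gene is not None:
        return gene  # immune fast lane: cache hit, no LLM call
    # Miss: run the full Perceive → Construct → Evaluate → Commit pipeline,
    # then store the result so the next identical error is immune
    strategy = run_pipeline(failure_code)
    gene_map.store(failure_code, platform, strategy)
    return strategy
```

The second identical error never touches the pipeline, which is why marginal cost trends toward zero.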
Stage 1 · <0.1ms for layers 1–2
Perceive — 4-layer error classification
Each layer is tried in order; the first match wins. 90% of errors are resolved before the LLM is called.
Adapters cover Tempo (13 patterns), Coinbase (8), Privy (7), Generic HTTP (3). Embedding uses 28 token-weighted signatures for fuzzy matching. LLM result cached in Gene Map — called only once per novel error. Output: failureCode + category + severity + rootCause.
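The layered dispatch above can be sketched as a cascade: exact adapter patterns first, fuzzy signature matching second, and the LLM last, with its verdict cached so it is consulted only once per novel error. Function names and signatures are assumptions for illustration:

```python
def classify(error_msg, adapters, fuzzy_match, llm_classify, cache):
    """4-layer classification sketch: each layer is tried in order,
    first match wins. Not the real adapter API."""
    # Layer 1: exact adapter patterns (Tempo, Coinbase, Privy, generic HTTP)
    for pattern, failure_code in adapters:
        if pattern in error_msg:
            return failure_code
    # Layer 2: fuzzy match against token-weighted signatures
    code = fuzzy_match(error_msg)
    if code:
        return code
    # Layer 3: cached LLM verdicts from earlier novel errors
    if error_msg in cache:
        return cache[error_msg]
    # Layer 4: LLM fallback; cache the result so the cost is paid once
    code = llm_classify(error_msg)
    cache[error_msg] = code
    return code
```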
Stage 2 · <1ms
Construct — generate repair candidates
Gene Map history · 26-strategy library · LLM novel strategies · chain detection for compound repairs (nonce+gas → sequential fix).
Strategy chains detected automatically: refresh_nonce → speed_up_transaction. Each candidate has cost estimate, speed score, safety rating, and platform compatibility.
Stage 3 · <1ms
Evaluate — Bayesian Q-value ranking
Q ± σ uncertainty · Thompson Sampling · Adaptive α (new α≈0.15, old α≈0.01) · Context-aware by gas/time/chain · A/B test override.
Score = w_q·Q(s) + w_c·(1-cost/budget) + w_s·speed + w_h·safety. Thompson Sampling draws from N(Q, σ²) — balances exploit (high-Q) with explore (high-σ). A/B tests use 90/10 split with auto-evaluation after 20 trials.
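The ranking formula and Thompson Sampling step are simple to state in code. Weights and the adaptive-α cutoff below are illustrative placeholders:

```python
import random

def composite_score(q, cost, budget, speed, safety,
                    w_q=0.5, w_c=0.2, w_s=0.15, w_h=0.15):
    """Score = w_q·Q + w_c·(1 - cost/budget) + w_s·speed + w_h·safety.
    Weight values here are placeholders."""
    return w_q * q + w_c * (1 - cost / budget) + w_s * speed + w_h * safety

def thompson_pick(candidates, rng=random):
    """Each candidate carries Q and sigma; draw from N(Q, sigma^2) and take
    the highest draw: exploits high-Q, explores high-sigma strategies."""
    return max(candidates, key=lambda c: rng.gauss(c["q"], c["sigma"]))

def update_q(q, reward, n_trials):
    """Adaptive learning rate: new genes adapt fast (alpha ~0.15),
    established genes stay stable (alpha ~0.01). Cutoff is illustrative."""
    alpha = 0.15 if n_trials < 10 else 0.01
    return q + alpha * (reward - q)
```

With sigma = 0 the pick is pure exploitation; a large sigma gives an uncertain strategy a real chance to be tried.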
Stage 4
Commit — execute + verify
Three safety modes control execution scope. A SAGE-inspired verify step validates each repair; a failed verification re-enters the pipeline with that strategy blocked.
Observe · diagnose only, zero risk
Auto · parameter repairs, no fund moves
Full · cost-capped + whitelisted
Formal Safety Verification (in development): SMT-based constraint checking — balance ≥ min, gas ≤ cap, recipient ∈ whitelist. If constraints fail → auto-downgrade to observe. Aerospace-grade safety for agent payments.
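The roadmap version uses an SMT solver; the same constraints and the auto-downgrade rule can be illustrated with plain predicates (a sketch, not the planned implementation):

```python
from dataclasses import dataclass

@dataclass
class Repair:
    post_balance: float   # wallet balance after the repair executes
    gas_cost: float
    recipient: str

def check_safety(repair, min_balance, max_gas, whitelist):
    """Pre-execution check: balance >= min, gas <= cap, recipient in
    whitelist. Any violation auto-downgrades the repair to observe mode."""
    violations = []
    if not repair.post_balance >= min_balance:
        violations.append("balance >= min")
    if not repair.gas_cost <= max_gas:
        violations.append("gas <= cap")
    if repair.recipient not in whitelist:
        violations.append("recipient in whitelist")
    return ("full", []) if not violations else ("observe", violations)
```

Because the downgrade path is the default on any failed constraint, no unsafe repair reaches execution.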
Feedback loop
Gene Map — store + learn + predict
Successful repair → Gene Capsule stored. Next identical error → immune. Cross-platform: Tempo Gene protects Coinbase agents. Predictive graph preloads likely next failure.
Bayesian Q ± σ
Adaptive α
Context-aware
Cross-platform
Predictive graph
Strategy chains
Gene Registry
Audit log
OTEL tracing
↻ Same error next time = <1ms immune, bypassing the entire pipeline
↻ Predictive graph preloads next failure → <0.1ms if prediction hits
↑ Gene Registry syncs across instances → Agent A learns → Agent B immune
Causal repair graph
Beyond correlation → causal inference. Traces root cause chains. Enables preventive repairs before symptoms appear.
Federated Gene learning
Agents share Q-value gradients, never raw data. Differential privacy. Distributed RL across the network.
Formal safety verification
SMT constraint checking pre-execution. balance ≥ min, gas ≤ cap, recipient ∈ whitelist. Aerospace-grade.
Meta-learning repair
Few-shot: 3 nonce errors → learns the pattern → 4th variant fixed with 1 example. MAML-inspired transfer.
Adversarial robustness
4-layer defense: reputation scoring · multi-agent verification (3 validators) · anomaly detection on Q-trajectories · automatic rollback to safe state.
When PCEC repairs a nonce error, the Causal Graph traces: concurrent wallet access → shared nonce pool → nonce conflict. This enables fixing the root cause (add nonce queue) not just the symptom (refresh nonce).
Each agent trains Q-values locally. Only gradient updates are shared via differential privacy. The Registry aggregates into a global model. Privacy-preserving collective intelligence — like Google's Federated Learning for keyboard prediction, but for repair strategies.
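The share-gradients-not-data step can be sketched as clip-then-noise on each agent's local Q-value update, with the Registry averaging the noisy results. Clip norm and noise scale are illustrative parameters:

```python
import random

def dp_gradient(local_grad, clip=1.0, noise_scale=0.1, rng=random):
    """Agent side: clip the local Q-value gradient and add Gaussian noise
    before sharing. Raw repair records never leave the agent."""
    norm = max(abs(g) for g in local_grad) or 1.0
    scale = min(1.0, clip / norm)
    return [g * scale + rng.gauss(0, noise_scale) for g in local_grad]

def aggregate(grads):
    """Registry side: average the noisy gradients into a global update,
    federated-averaging style."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]
```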
Before executing any repair, an SMT solver verifies safety constraints: balance ≥ minSafe, gasCost ≤ maxBudget, recipient ∈ approvedList. If any constraint fails → auto-downgrade to observe mode. No unsafe repair ever executes.
After seeing 3 variants of nonce errors repaired, the meta-learner extracts the "nonce error repair pattern" — the common structure across variants. The 4th variant is fixed with a single example because the pattern is already learned. Built on Context-Aware Gene Map's cross-platform transfer mechanism.
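One way to picture "extracting the common structure": keep the tokens shared by every variant and wildcard the rest, then match new errors against the template. This is a stand-in illustration, not the MAML-style mechanism itself:

```python
def extract_pattern(examples):
    """Few-shot sketch: tokens shared by every variant stay literal,
    tokens that differ become wildcards."""
    token_lists = [e.split() for e in examples]
    width = min(map(len, token_lists))
    pattern = []
    for i in range(width):
        tokens = {t[i] for t in token_lists}
        pattern.append(tokens.pop() if len(tokens) == 1 else "*")
    return pattern

def matches(pattern, candidate):
    """A new variant matches if every literal token lines up."""
    tokens = candidate.split()
    return len(tokens) >= len(pattern) and all(
        p == "*" or p == t for p, t in zip(pattern, tokens))
```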
Gene Registry poisoning defense: (1) Push threshold Q>0.7 + 3 successes. (2) Pull discount 20%. (3) Multi-agent verification — Gene must be validated by 3 independent agents. (4) Anomaly detection — sudden Q-value spikes flagged. (5) Auto-rollback to last known safe state if poisoning detected.
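The push-side checks compose into one gate. Thresholds for Q and success count come from the list above; the spike rule is an illustrative assumption:

```python
def accept_gene(q, successes, validators, q_history=()):
    """Sketch of the Registry's poisoning gate: push threshold Q > 0.7
    with >= 3 successes, 3 independent validators, and an anomaly check
    on the Q trajectory (spike rule illustrative)."""
    if q <= 0.7 or successes < 3:
        return False          # push threshold not met
    if len(set(validators)) < 3:
        return False          # multi-agent verification failed
    if q_history and q - max(q_history) > 0.3:
        return False          # sudden Q-value spike -> flagged
    return True
```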
κ > 0
self-improvement coefficient
G(generate) → V(verify) → U(update) → κ > 0 → ∞ improvement
GVU Operator (arXiv:2512.02731) — mathematically proven: if signal > noise, self-improvement is guaranteed. PCEC is a GVU instance.
Recursive evolution loop
Five methods, four papers, one closed loop. Each cycle produces a strictly stronger system.
① Memory-driven learning
Gene Map as episodic memory. Q-value tracks utility. High-utility memories retrieved more, low-utility forgotten. The system remembers what works.
Gene Map = MemRL's episodic memory. Each Gene Capsule stores: error pattern + strategy + Q-value + context + platform. Q-value acts as utility score — high-Q genes are retrieved first, low-Q genes decay via natural selection. This is exactly the mechanism described in the MemRL paper, applied to error repair.
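Utility-weighted retrieval and decay can be sketched directly. The decay rate and forgetting floor are illustrative parameters:

```python
def retrieve(capsules, context_sim, k=3):
    """Rank capsules by context similarity x Q-value, so high-utility
    memories surface first. `context_sim` scores relevance in [0, 1]."""
    ranked = sorted(capsules, key=lambda c: context_sim(c) * c["q"], reverse=True)
    return ranked[:k]

def decay(capsules, rate=0.99, floor=0.05):
    """Natural selection: each cycle every gene's Q decays; genes that
    fall below the floor are forgotten."""
    return [dict(c, q=c["q"] * rate) for c in capsules if c["q"] * rate >= floor]
```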
② Experience distillation
Raw repair records → abstract strategic principles. Not "nonce → refresh" but "state desync pattern → refresh from source of truth". Learns the pattern, not the instance.
EvolveR's three-phase lifecycle maps perfectly: Online (PCEC repairs in production) → Offline Self-Distillation (Gene Map records → abstract principles) → Policy Update (Q-values + strategy ranking). The key insight: distilled principles transfer across platforms. A "state desync" principle learned on Tempo applies to Coinbase without re-learning.
③ Self-play evolution
Challenger generates increasingly hard failures. Repair agent attempts fix. Verifier validates. No real users needed — 24/7 autonomous evolution toward superintelligent repair.
The Self-Play SWE-RL paper shows a single agent trained to both inject and repair bugs of increasing complexity — no human-labeled data needed. For Helix: a Challenger agent generates novel payment failures (e.g. "what if gas spikes 10x mid-transaction and the nonce rotates?"), the PCEC Repair agent attempts to fix it, and a Verifier checks correctness. Each round, the Challenger generates harder scenarios. This is AlphaGo Zero for error repair.
④ Self-distillation
No external teacher needed. Model sees its own failures + rich feedback from Gene Map → teaches itself. Converts error feedback into dense learning signal. Cost → $0.
SDPO (2026) — RL via self-distillation, no reward model
SDPO (Self-Distillation Policy Optimization) eliminates the need for GPT-4 as teacher. Gene Map's repair records provide "rich feedback" — not just success/fail, but error details, context, timing. The model conditioned on this feedback becomes its own teacher. This is how Helix's cost approaches zero: no external LLM calls, pure self-teaching from accumulated experience.
⑤ Recursive loop
Better model → better self-play → better data → better distillation → better model. Each cycle surpasses the last. This is the recursive self-improvement condition: κ > 0.
The GVU paper proves that any system with Generator (PCEC Construct), Verifier (PCEC Evaluate), and Updater (Gene Map) will continuously improve as long as the self-improvement coefficient κ > 0. The "Second Law of AGI Dynamics": entropy increases UNLESS signal > noise. Our Gene Map ensures signal accumulates faster than noise — each successful repair is a verified data point with ground truth.
↻
Step ⑤ output feeds back to step ① — each cycle the system is strictly better. Not a promise. A theorem (κ > 0).
Distillation · Teacher→Student · DeepSeek style · ★★★★☆
Self-Play ⭐ · Generate→Fix · AlphaGo style · ★★★★★
EvolveR · Experience→Principle · Reflect + RL · ★★★★☆
SDPO · Self-teach · No reward model · ★★★★☆
GVU · Math proof · κ > 0 theorem · ★★★☆☆
Stars = fit for Helix. We combine all five — not pick one.
v1.5 ✓ · MemRL · Gene Map Q-value
Q2 2026 · EvolveR · Experience distill
Q3 2026 · Self-Play · 24/7 auto-evolve
Q4 2026 · SDPO · Self-distillation
2027 · Full loop · κ > 0 recursive