01
Agentic workflow
PCEC is a multi-agent repair pipeline. Four specialized agents collaborate, with configurable human-in-the-loop oversight.
02
Proprietary data moat
Gene Map is domain-specific RAG. Every repair generates data that doesn't exist in any LLM's training set.
03
Progressive automation
Day 1: LLM handles 100%. Day 30: Gene Map 90%. Day 180: 99%. Cost decreases with usage.
Perceive → Construct → Evaluate → Commit. Three safety modes provide human-in-the-loop: observe = human reviews, auto = bounded execution, full = human sets cost ceiling.
Gene Map lookup is essentially domain-specific RAG: embed error → retrieve similar repair experience → apply strategy. More agents = more data = better repairs = more agents. The flywheel that creates the moat.
Cost per repair: Day 1 = $0.001 (LLM) → Day 30 = $0.0001 → Day 180 ≈ $0. LLM only called for genuinely novel errors. Gene Map caches everything. Cost approaches zero.
Immune fast lane
Gene Map hit → skip pipeline → <1ms → $0. Handles 90%+ of recurring errors.
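The fast lane is, at its core, a signature-keyed cache in front of the pipeline. A minimal sketch (class and function names here are illustrative, not the real API):

```python
import hashlib

class GeneMap:
    """Sketch of the immune fast lane: a cache of previously
    successful repairs, keyed by error signature."""

    def __init__(self):
        self._genes = {}  # signature -> repair strategy

    @staticmethod
    def signature(failure_code: str, platform: str) -> str:
        # Deterministic key derived from platform + failure code
        return hashlib.sha256(f"{platform}:{failure_code}".encode()).hexdigest()

    def store(self, failure_code, platform, strategy):
        self._genes[self.signature(failure_code, platform)] = strategy

    def lookup(self, failure_code, platform):
        return self._genes.get(self.signature(failure_code, platform))

def handle_error(gene_map, failure_code, platform, run_pipeline):
    gene = gene_map.lookup(failure_code, platform)
    if gene is not None:
        return gene  # immune fast lane: cache hit, no LLM call
    # Miss: run the full Perceive → Construct → Evaluate → Commit pipeline,
    # then store the result so the next identical error is immune
    strategy = run_pipeline(failure_code)
    gene_map.store(failure_code, platform, strategy)
    return strategy
```

The second identical error never touches the pipeline, which is why marginal cost trends toward zero.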
Stage 1 · <0.1ms for layers 1–2
Perceive — 4-layer error classification
Each layer is tried in order; the first match wins. 90% of errors are resolved before the LLM is called.
Adapters cover Tempo (13 patterns), Coinbase (8), Privy (7), Generic HTTP (3). Embedding uses 28 token-weighted signatures for fuzzy matching. LLM result cached in Gene Map — called only once per novel error. Output: failureCode + category + severity + rootCause.
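The layered dispatch above can be sketched as a cascade: exact adapter patterns first, fuzzy signature matching second, and the LLM last, with its verdict cached so it is consulted only once per novel error. Function names and signatures are assumptions for illustration:

```python
def classify(error_msg, adapters, fuzzy_match, llm_classify, cache):
    """4-layer classification sketch: each layer is tried in order,
    first match wins. Not the real adapter API."""
    # Layer 1: exact adapter patterns (Tempo, Coinbase, Privy, generic HTTP)
    for pattern, failure_code in adapters:
        if pattern in error_msg:
            return failure_code
    # Layer 2: fuzzy match against token-weighted signatures
    code = fuzzy_match(error_msg)
    if code:
        return code
    # Layer 3: cached LLM verdicts from earlier novel errors
    if error_msg in cache:
        return cache[error_msg]
    # Layer 4: LLM fallback; cache the result so the cost is paid once
    code = llm_classify(error_msg)
    cache[error_msg] = code
    return code
```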
Stage 2 · <1ms
Construct — generate repair candidates
Gene Map history · 26-strategy library · LLM novel strategies · chain detection for compound repairs (nonce+gas → sequential fix).
Strategy chains detected automatically: refresh_nonce → speed_up_transaction. Each candidate has cost estimate, speed score, safety rating, and platform compatibility.
Stage 3 · <1ms
Evaluate — Bayesian Q-value ranking
Q ± σ uncertainty · Thompson Sampling · Adaptive α (new α≈0.15, old α≈0.01) · Context-aware by gas/time/chain · A/B test override.
Score = w_q·Q(s) + w_c·(1-cost/budget) + w_s·speed + w_h·safety. Thompson Sampling draws from N(Q, σ²) — balances exploit (high-Q) with explore (high-σ). A/B tests use 90/10 split with auto-evaluation after 20 trials.
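The ranking formula and Thompson Sampling step are simple to state in code. Weights and the adaptive-α cutoff below are illustrative placeholders:

```python
import random

def composite_score(q, cost, budget, speed, safety,
                    w_q=0.5, w_c=0.2, w_s=0.15, w_h=0.15):
    """Score = w_q·Q + w_c·(1 - cost/budget) + w_s·speed + w_h·safety.
    Weight values here are placeholders."""
    return w_q * q + w_c * (1 - cost / budget) + w_s * speed + w_h * safety

def thompson_pick(candidates, rng=random):
    """Each candidate carries Q and sigma; draw from N(Q, sigma^2) and take
    the highest draw: exploits high-Q, explores high-sigma strategies."""
    return max(candidates, key=lambda c: rng.gauss(c["q"], c["sigma"]))

def update_q(q, reward, n_trials):
    """Adaptive learning rate: new genes adapt fast (alpha ~0.15),
    established genes stay stable (alpha ~0.01). Cutoff is illustrative."""
    alpha = 0.15 if n_trials < 10 else 0.01
    return q + alpha * (reward - q)
```

With sigma = 0 the pick is pure exploitation; a large sigma gives an uncertain strategy a real chance to be tried.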
Stage 4
Commit — execute + verify
Three safety modes control execution scope. A SAGE-inspired verify step validates each repair; a failed verification re-enters the pipeline with that strategy blocked.
Observe · diagnose only, zero risk
Auto · parameter repairs, no fund moves
Full · cost-capped + whitelisted
Formal Safety Verification (in development): SMT-based constraint checking — balance ≥ min, gas ≤ cap, recipient ∈ whitelist. If constraints fail → auto-downgrade to observe. Aerospace-grade safety for agent payments.
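The roadmap version uses an SMT solver; the same constraints and the auto-downgrade rule can be illustrated with plain predicates (a sketch, not the planned implementation):

```python
from dataclasses import dataclass

@dataclass
class Repair:
    post_balance: float   # wallet balance after the repair executes
    gas_cost: float
    recipient: str

def check_safety(repair, min_balance, max_gas, whitelist):
    """Pre-execution check: balance >= min, gas <= cap, recipient in
    whitelist. Any violation auto-downgrades the repair to observe mode."""
    violations = []
    if not repair.post_balance >= min_balance:
        violations.append("balance >= min")
    if not repair.gas_cost <= max_gas:
        violations.append("gas <= cap")
    if repair.recipient not in whitelist:
        violations.append("recipient in whitelist")
    return ("full", []) if not violations else ("observe", violations)
```

Because the downgrade path is the default on any failed constraint, no unsafe repair reaches execution.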
Feedback loop
Gene Map — store + learn + predict
Successful repair → Gene Capsule stored. Next identical error → immune. Cross-platform: Tempo Gene protects Coinbase agents. Predictive graph preloads likely next failure.
Bayesian Q ± σ
Adaptive α
Context-aware
Cross-platform
Predictive graph
Strategy chains
Gene Registry
Audit log
OTEL tracing
↻ Same error next time = <1ms immune, bypassing the entire pipeline
↻ Predictive graph preloads next failure → <0.1ms if prediction hits
↑ Gene Registry syncs across instances → Agent A learns → Agent B immune
Causal repair graph
Beyond correlation → causal inference. Traces root cause chains. Enables preventive repairs before symptoms appear.
Federated Gene learning
Agents share Q-value gradients, never raw data. Differential privacy. Distributed RL across the network.
Formal safety verification
SMT constraint checking pre-execution. balance ≥ min, gas ≤ cap, recipient ∈ whitelist. Aerospace-grade.
Meta-learning repair
Few-shot: 3 nonce errors → learns the pattern → 4th variant fixed with 1 example. MAML-inspired transfer.
Adversarial robustness
4-layer defense: reputation scoring · multi-agent verification (3 validators) · anomaly detection on Q-trajectories · automatic rollback to safe state.
When PCEC repairs a nonce error, the Causal Graph traces: concurrent wallet access → shared nonce pool → nonce conflict. This enables fixing the root cause (add nonce queue) not just the symptom (refresh nonce).
Each agent trains Q-values locally. Only gradient updates are shared via differential privacy. The Registry aggregates into a global model. Privacy-preserving collective intelligence — like Google's Federated Learning for keyboard prediction, but for repair strategies.
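The share-gradients-not-data step can be sketched as clip-then-noise on each agent's local Q-value update, with the Registry averaging the noisy results. Clip norm and noise scale are illustrative parameters:

```python
import random

def dp_gradient(local_grad, clip=1.0, noise_scale=0.1, rng=random):
    """Agent side: clip the local Q-value gradient and add Gaussian noise
    before sharing. Raw repair records never leave the agent."""
    norm = max(abs(g) for g in local_grad) or 1.0
    scale = min(1.0, clip / norm)
    return [g * scale + rng.gauss(0, noise_scale) for g in local_grad]

def aggregate(grads):
    """Registry side: average the noisy gradients into a global update,
    federated-averaging style."""
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]
```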
Before executing any repair, an SMT solver verifies safety constraints: balance ≥ minSafe, gasCost ≤ maxBudget, recipient ∈ approvedList. If any constraint fails → auto-downgrade to observe mode. No unsafe repair ever executes.
After seeing 3 variants of nonce errors repaired, the meta-learner extracts the "nonce error repair pattern" — the common structure across variants. The 4th variant is fixed with a single example because the pattern is already learned. Built on Context-Aware Gene Map's cross-platform transfer mechanism.
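One way to picture "extracting the common structure": keep the tokens shared by every variant and wildcard the rest, then match new errors against the template. This is a stand-in illustration, not the MAML-style mechanism itself:

```python
def extract_pattern(examples):
    """Few-shot sketch: tokens shared by every variant stay literal,
    tokens that differ become wildcards."""
    token_lists = [e.split() for e in examples]
    width = min(map(len, token_lists))
    pattern = []
    for i in range(width):
        tokens = {t[i] for t in token_lists}
        pattern.append(tokens.pop() if len(tokens) == 1 else "*")
    return pattern

def matches(pattern, candidate):
    """A new variant matches if every literal token lines up."""
    tokens = candidate.split()
    return len(tokens) >= len(pattern) and all(
        p == "*" or p == t for p, t in zip(pattern, tokens))
```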
Gene Registry poisoning defense: (1) Push threshold Q>0.7 + 3 successes. (2) Pull discount 20%. (3) Multi-agent verification — Gene must be validated by 3 independent agents. (4) Anomaly detection — sudden Q-value spikes flagged. (5) Auto-rollback to last known safe state if poisoning detected.
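The push-side checks compose into one gate. Thresholds for Q and success count come from the list above; the spike rule is an illustrative assumption:

```python
def accept_gene(q, successes, validators, q_history=()):
    """Sketch of the Registry's poisoning gate: push threshold Q > 0.7
    with >= 3 successes, 3 independent validators, and an anomaly check
    on the Q trajectory (spike rule illustrative)."""
    if q <= 0.7 or successes < 3:
        return False          # push threshold not met
    if len(set(validators)) < 3:
        return False          # multi-agent verification failed
    if q_history and q - max(q_history) > 0.3:
        return False          # sudden Q-value spike -> flagged
    return True
```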
κ > 0
self-improvement coefficient
G(generate) → V(verify) → U(update) → κ > 0 → ∞ improvement
GVU Operator (arXiv:2512.02731) — mathematically proven: if signal > noise, self-improvement is guaranteed. PCEC is a GVU instance.
Recursive evolution loop
Five methods, four papers, one closed loop. Each cycle produces a strictly stronger system.
① Memory-driven learning
Gene Map as episodic memory. Q-value tracks utility. High-utility memories retrieved more, low-utility forgotten. The system remembers what works.
Gene Map = MemRL's episodic memory. Each Gene Capsule stores: error pattern + strategy + Q-value + context + platform. Q-value acts as utility score — high-Q genes are retrieved first, low-Q genes decay via natural selection. This is exactly the mechanism described in the MemRL paper, applied to error repair.
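Utility-weighted retrieval and decay can be sketched directly. The decay rate and forgetting floor are illustrative parameters:

```python
def retrieve(capsules, context_sim, k=3):
    """Rank capsules by context similarity x Q-value, so high-utility
    memories surface first. `context_sim` scores relevance in [0, 1]."""
    ranked = sorted(capsules, key=lambda c: context_sim(c) * c["q"], reverse=True)
    return ranked[:k]

def decay(capsules, rate=0.99, floor=0.05):
    """Natural selection: each cycle every gene's Q decays; genes that
    fall below the floor are forgotten."""
    return [dict(c, q=c["q"] * rate) for c in capsules if c["q"] * rate >= floor]
```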
② Experience distillation
Raw repair records → abstract strategic principles. Not "nonce → refresh" but "state desync pattern → refresh from source of truth". Learns the pattern, not the instance.
EvolveR's three-phase lifecycle maps perfectly: Online (PCEC repairs in production) → Offline Self-Distillation (Gene Map records → abstract principles) → Policy Update (Q-values + strategy ranking). The key insight: distilled principles transfer across platforms. A "state desync" principle learned on Tempo applies to Coinbase without re-learning.
③ Self-play evolution
Challenger generates increasingly hard failures. Repair agent attempts fix. Verifier validates. No real users needed — 24/7 autonomous evolution toward superintelligent repair.
The Self-Play SWE-RL paper shows a single agent trained to both inject and repair bugs of increasing complexity — no human-labeled data needed. For Helix: a Challenger agent generates novel payment failures (e.g. "what if gas spikes 10x mid-transaction and the nonce rotates?"), the PCEC Repair agent attempts to fix it, and a Verifier checks correctness. Each round, the Challenger generates harder scenarios. This is AlphaGo Zero for error repair.
④ Self-distillation
No external teacher needed. Model sees its own failures + rich feedback from Gene Map → teaches itself. Converts error feedback into dense learning signal. Cost → $0.
SDPO (2026) — RL via self-distillation, no reward model
SDPO (Self-Distillation Policy Optimization) eliminates the need for GPT-4 as teacher. Gene Map's repair records provide "rich feedback" — not just success/fail, but error details, context, timing. The model conditioned on this feedback becomes its own teacher. This is how Helix's cost approaches zero: no external LLM calls, pure self-teaching from accumulated experience.
⑤ Recursive loop
Better model → better self-play → better data → better distillation → better model. Each cycle surpasses the last. This is the recursive self-improvement condition: κ > 0.
The GVU paper proves that any system with Generator (PCEC Construct), Verifier (PCEC Evaluate), and Updater (Gene Map) will continuously improve as long as the self-improvement coefficient κ > 0. The "Second Law of AGI Dynamics": entropy increases UNLESS signal > noise. Our Gene Map ensures signal accumulates faster than noise — each successful repair is a verified data point with ground truth.
↻
Step ⑤ output feeds back to step ① — each cycle the system is strictly better. Not a promise. A theorem (κ > 0).
Distillation · Teacher→Student · DeepSeek style · ★★★★☆
Self-Play ⭐ · Generate→Fix · AlphaGo style · ★★★★★
EvolveR · Experience→Principle · Reflect + RL · ★★★★☆
SDPO · Self-teach · No reward model · ★★★★☆
GVU · Math proof · κ > 0 theorem · ★★★☆☆
Stars = fit for Helix. We combine all five — not pick one.
v1.5 ✓ · MemRL · Gene Map Q-value
Q2 2026 · EvolveR · Experience distill
Q3 2026 · Self-Play · 24/7 auto-evolve
Q4 2026 · SDPO · Self-distillation
2027 · Full loop · κ > 0 recursive