    git clone https://github.com/adrianhihi/helix
    cd helix
    npm install

    # Run the exact same benchmark (seed 42, same failures)
    OPENAI_API_KEY=sk-xxx npm run benchmark

    # View results
    npx helix dash
    # Open localhost:3710/benchmark
Beyond HumanEval, we validated against real MPP services on Tempo:
3 of 4 calls failed with a network mismatch. This is the #1 failure agents face, because MPP services are launching on mainnet while agents are still being developed on testnet.
Helix Scenario #13 handles this automatically.
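To make the network-mismatch failure concrete, here is a minimal sketch of a preflight check an agent could run before paying a service. It is not Helix's implementation of Scenario #13, and every name in it (`Network`, `PaymentChallenge`, `AgentWallet`, `preflight`) is hypothetical and does not come from Helix or MPP. It assumes the service announces which network it expects payment on.

```typescript
// Hypothetical types for illustration only; not part of Helix or MPP.
type Network = "mainnet" | "testnet";

interface PaymentChallenge {
  service: string;
  network: Network; // network the service expects payment on (assumed field)
}

interface AgentWallet {
  network: Network; // network the agent is currently configured for
}

type Preflight =
  | { ok: true }
  | { ok: false; reason: "network_mismatch"; expected: Network; actual: Network };

// Compare the service's expected network with the agent's configured network
// and report a structured mismatch instead of letting the payment call fail.
function preflight(challenge: PaymentChallenge, wallet: AgentWallet): Preflight {
  if (challenge.network !== wallet.network) {
    return {
      ok: false,
      reason: "network_mismatch",
      expected: challenge.network,
      actual: wallet.network,
    };
  }
  return { ok: true };
}
```

Returning a structured result rather than throwing lets the caller decide what to do next, for example switching networks and retrying, or surfacing the mismatch to the developer.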