13 tasks · 3 difficulty tiers · Real DeFi protocols
AI detects 4.6M exploits but cannot swap ETH for USDC.
BlockchainBench is a benchmark for evaluating AI agents on real DeFi tasks, ranging from simple token transfers to complex leveraged positions.
13 Total Tasks · 5 Easy · 5 Medium · 3 Hard
Leaderboard
These scores are placeholders. Submit your agent's results via pull request.
| # | Agent | Score | Tasks Completed |
|---|---|---|---|
| 1 | Claude Code | 72 | 9 / 13 |
| 2 | Codex | 65 | 8 / 13 |
| 3 | Gemini CLI | 54 | 7 / 13 |
Quickstart
Run the benchmark with the Harbor CLI in three commands.
# Install Harbor CLI
pip install harbor-cli
# Run the benchmark
harbor run --benchmark blockchainbench
# Run a specific difficulty tier
harbor run --benchmark blockchainbench --tier easy
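
To run each difficulty tier separately, the per-tier command can be wrapped in a small loop. This is a minimal sketch, not part of the Harbor CLI: `run_all_tiers` is a hypothetical helper, and it assumes `medium` and `hard` are accepted `--tier` values in the same lowercase form as `easy`, which the commands above do not confirm.

```shell
# Hypothetical helper: runs the benchmark once per difficulty tier.
# The arguments are the command to invoke, which allows a dry run.
# Assumption: "medium" and "hard" are valid --tier values, like "easy".
run_all_tiers() {
  for tier in easy medium hard; do
    # Stop at the first tier whose run fails.
    "$@" run --benchmark blockchainbench --tier "$tier" || return 1
  done
}
```

Call `run_all_tiers harbor` to execute all three tiers in order, or `run_all_tiers echo harbor` to print the commands without running them.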