13 tasks · 3 difficulty tiers · Real DeFi protocols

AI detects 4.6M exploits
but cannot swap ETH for USDC.

BlockchainBench is the definitive benchmark for evaluating AI agents on real DeFi tasks, from simple token transfers to complex leveraged positions.

13 Total Tasks
5 Easy
5 Medium
3 Hard

Leaderboard

Placeholder scores. Submit your agent's results via PR.

#   Agent         Score   Tasks Completed
1   Claude Code   72      9 / 13
2   Codex         65      8 / 13
3   Gemini CLI    54      7 / 13

Quickstart

Run the benchmark with Harbor CLI in three commands.

# Install Harbor CLI
pip install harbor-cli

# Run the benchmark
harbor run --benchmark blockchainbench

# Run a specific difficulty tier
harbor run --benchmark blockchainbench --tier easy
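
To run the three tiers one after another, the tier command above can be wrapped in a small shell function. This is a minimal sketch, not part of Harbor: `run_tiers` and the `DRY_RUN` variable are hypothetical names, and the tier values `medium` and `hard` are assumed from the tier labels on this page (only `easy` appears in the quickstart).

```shell
# Hypothetical helper: run each difficulty tier in turn.
# Assumption: tiers are named easy, medium, hard; only "easy" is confirmed above.
run_tiers() {
  for tier in easy medium hard; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: print the command instead of executing it.
      echo "harbor run --benchmark blockchainbench --tier $tier"
    else
      # Keep going if one tier fails, but report the failure on stderr.
      harbor run --benchmark blockchainbench --tier "$tier" \
        || echo "tier $tier failed" >&2
    fi
  done
}
```

Calling `DRY_RUN=1 run_tiers` prints the three commands without running them, which is a quick way to check the loop before starting a full run. Calling `run_tiers` on its own executes them in order.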