Can your agent swap ETH for USDC?

We built the test.

BlockchainBench is the open evaluation suite for AI agent DeFi capability. 13 tasks across real protocols. Machine-verifiable results. Zero wrappers.

Tasks: 13 across real DeFi protocols
Tiers: Easy (5) · Medium (5) · Hard (3)
Top score: 72 / 100 (Claude Code)
Perfect scores: Zero. The gap is real.

Best agent: 72 / 100.

The scores below are placeholders; submit your agent's results via pull request. No agent has achieved a perfect score. The gap is real.

#    AGENT               SCORE   TASKS     EASY   MED   HARD
01   Claude Code (TOP)   72      9 / 13    5/5    3/5   1/3
02   Codex               65      8 / 13    5/5    3/5   0/3
03   Gemini CLI          54      7 / 13    4/5    3/5   0/3
Rankings are community-maintained. Submit your agent's results via pull request on GitHub.

13 tasks. Zero shortcuts.

Each task runs on a live mainnet fork. Agents interact via raw JSON-RPC. Rewards are computed from onchain state deltas.
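To make "raw JSON-RPC" concrete, here is a minimal sketch that builds JSON-RPC 2.0 request bodies with only the Python standard library. The endpoint URL (anvil's default local address), the helper names, and the placeholder address are assumptions for illustration, not part of BlockchainBench.

```python
import json
import urllib.request

# Assumption: a local anvil fork listening on its default address.
RPC_URL = "http://127.0.0.1:8545"


def rpc_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def send(body, url=RPC_URL):
    """POST a request body to the node and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical placeholder address; a real run would use the funded wallet.
WALLET = "0x" + "00" * 20

# Query the wallet's ETH balance at the latest block.
balance_query = rpc_request("eth_getBalance", [WALLET, "latest"])
```

Calling `send(balance_query)` against a running fork would return the balance as a hex string in the response's `result` field.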

EASY · 5 TASKS

E1 · Token Transfer
Send ERC-20 from funded wallet to target address via eth_sendTransaction.
Protocol: ERC-20 · Reward: Binary (pass / fail)

E2 · ETH → USDC Swap
Single-hop Uniswap V3 swap. Minimise slippage. Correct router calldata.
Protocol: Uniswap V3 · Reward: Output delta %

E3 · Token Approval
approve() Uniswap router for USDC spend. Correct amount, correct spender.
Protocol: ERC-20 · Reward: Binary (pass / fail)

E4 · Balance Query
Enumerate token balances and LP positions across 3 protocols using eth_call.
Protocol: Multi-protocol · Reward: Accuracy score

E5 · Gas Estimation
Estimate gas for a transaction bundle. Target: within 10% of actual on-chain cost.
Protocol: EVM core · Reward: Error margin
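
Without a wrapper library, tasks such as E1, E3, and E4 come down to hand-encoding ERC-20 calldata. The sketch below shows one way to do that with the standard ERC-20 function selectors. The helper names and example address are hypothetical, and nothing here is taken from the BlockchainBench harness.

```python
# Standard ERC-20 selectors: first 4 bytes of keccak256 of the signature.
TRANSFER = "a9059cbb"    # transfer(address,uint256)
APPROVE = "095ea7b3"     # approve(address,uint256)
BALANCE_OF = "70a08231"  # balanceOf(address)


def _address_word(addr):
    """Left-pad a 20-byte hex address to a 32-byte ABI word."""
    raw = addr[2:] if addr.startswith(("0x", "0X")) else addr
    if len(raw) != 40:
        raise ValueError("expected a 20-byte hex address")
    int(raw, 16)  # raises ValueError if the string is not valid hex
    return raw.lower().rjust(64, "0")


def _uint_word(value):
    """Encode a non-negative integer as a 32-byte big-endian ABI word."""
    if not 0 <= value < 2**256:
        raise ValueError("value out of uint256 range")
    return format(value, "064x")


def encode_transfer(to, amount):
    return "0x" + TRANSFER + _address_word(to) + _uint_word(amount)


def encode_approve(spender, amount):
    return "0x" + APPROVE + _address_word(spender) + _uint_word(amount)


def encode_balance_of(holder):
    return "0x" + BALANCE_OF + _address_word(holder)


# Hypothetical recipient; USDC uses 6 decimals, so 25 USDC = 25_000_000 units.
recipient = "0x" + "11" * 20
transfer_data = encode_transfer(recipient, 25_000_000)
```

The resulting hex string goes in the `data` field of an `eth_sendTransaction` or `eth_call` request whose `to` field is the token contract.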
MEDIUM · 5 TASKS

M1 · Multi-hop Swap
Route ETH → USDC → DAI via optimal path. Handle slippage across two pools.
Protocol: Uniswap V3 · Reward: P&L vs optimal

M2 · Aave Supply
Deposit USDC as collateral on Aave V3. Receive aUSDC. Track health factor post-deposit.
Protocol: Aave V3 · Reward: Health factor delta

M3 · Uniswap LP
Provide concentrated liquidity in a price range. Choose tick bounds to maximise fee capture.
Protocol: Uniswap V3 · Reward: Fee yield / IL

M4 · Aave Borrow
Borrow against collateral and manage LTV. Avoid liquidation by keeping health factor ≥ 1.5.
Protocol: Aave V3 · Reward: HF ≥ 1.5

M5 · MEV-safe Swap
Execute swap with correct deadline, slippage tolerance, and minimum output. Frontrun-resistant.
Protocol: Uniswap V3 · Reward: Execution quality
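
The slippage and deadline parameters named in M5 (and implied in E2 and M1) reduce to a small amount of integer arithmetic. This is a minimal sketch, assuming the agent already has a quoted output amount. The function names, the 50 bps tolerance, and the 120-second window are illustrative choices, not values specified by the benchmark.

```python
BPS = 10_000  # basis points in 100%


def min_amount_out(quoted_out, slippage_bps):
    """Smallest acceptable output in base token units, rounded down."""
    if not 0 <= slippage_bps <= BPS:
        raise ValueError("slippage must be between 0 and 10000 bps")
    return quoted_out * (BPS - slippage_bps) // BPS


def swap_deadline(block_timestamp, ttl_seconds=120):
    """Unix timestamp after which the swap should revert.

    Takes the chain's block timestamp as input because a forked chain's
    clock can differ from the wall clock.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return block_timestamp + ttl_seconds


# Illustrative numbers: a quote of 2,500 USDC (6 decimals), 0.5% tolerance.
quoted = 2_500 * 10**6
floor = min_amount_out(quoted, 50)
```

Integer math is used throughout so the result matches what the contract compares against, with no floating-point rounding.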
HARD · 3 TASKS

H1 · Leveraged Loop
Recursive borrow-deposit on Aave. Reach 3× leverage without triggering liquidation.
Protocol: Aave V3 · Reward: Leverage achieved

H2 · LP Rebalance
Monitor price drift. On a 2% price move, close the LP position and reopen it in a new range.
Protocol: Uniswap V3 · Reward: Capital efficiency

H3 · Flash Loan Arb
Identify arbitrage. Execute flash loan, multi-hop swap, repay, all in one transaction.
Protocol: Aave + Uniswap · Reward: Net profit after fees
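
The arithmetic behind H1 can be sketched with a simplified model: each round borrows a fixed fraction of the last deposit and redeposits it, ignoring prices, interest, and swap costs. Under that model leverage converges to 1 / (1 - borrow_fraction), so 3× is only reachable when the fraction exceeds 2/3. The function name and the parameter values below are illustrative assumptions, not Aave's actual risk parameters.

```python
import math


def leverage_loop(borrow_fraction, liquidation_threshold, target, max_rounds=50):
    """Simulate recursive borrow-deposit starting from 1 unit of equity.

    Returns (rounds, leverage, health_factor). Health factor is modelled as
    collateral * liquidation_threshold / debt.
    """
    if not 0 < borrow_fraction < 1:
        raise ValueError("borrow_fraction must be in (0, 1)")
    collateral, debt, last_deposit, rounds = 1.0, 0.0, 1.0, 0
    while collateral < target and rounds < max_rounds:
        borrowed = last_deposit * borrow_fraction  # borrow against newest deposit
        debt += borrowed
        collateral += borrowed                     # redeposit what was borrowed
        last_deposit = borrowed
        rounds += 1
    health = math.inf if debt == 0 else collateral * liquidation_threshold / debt
    # Equity stays at 1 unit in this model, so leverage equals total collateral.
    return rounds, collateral, health


# Illustrative parameters only.
rounds, leverage, health = leverage_loop(0.75, 0.80, target=3.0)
```

In this model, a position at exactly 3× has a health factor of 1.5 times the liquidation threshold, which shows how little margin the task leaves.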

Live forks. Verifiable rewards.

Chain: anvil --fork-url $ETH_RPC · mainnet @ block 21,000,000
Wallet: Funded · 100 ETH + 10,000 USDC per episode
Grader: reward = grader.score(state_pre, state_post)
Scoring: Easy ×1.0 · Medium ×1.5 · Hard ×2.0 · max 100
Access: MIT License · github.com/BlockchainBench
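
To illustrate the `grader.score(state_pre, state_post)` shape, here is a minimal sketch of a binary grader for a token-transfer task like E1. The class name, the state representation (a dict from holder address to token balance), and the pass rule are assumptions made for this example. The real graders may track different state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferGrader:
    """Hypothetical binary grader: pass if the recipient gained exactly `amount`."""

    recipient: str
    amount: int

    def score(self, state_pre, state_post):
        # Each state maps holder address -> token balance in base units.
        before = state_pre.get(self.recipient, 0)
        after = state_post.get(self.recipient, 0)
        return 1.0 if after - before == self.amount else 0.0


# Illustrative snapshots taken before and after the agent acts.
grader = TransferGrader(recipient="0xTARGET", amount=100)
state_pre = {"0xAGENT": 10_000, "0xTARGET": 0}
state_post = {"0xAGENT": 9_900, "0xTARGET": 100}
reward = grader.score(state_pre, state_post)
```

Because the reward depends only on the two state snapshots, it does not matter how the agent produced the transaction.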
$ harbor run --benchmark blockchainbench

OUTPUT
loading mainnet fork @ 21,000,000…
wallet 0xAGENT · 100 ETH · 10,000 USDC
task E2 ETH → USDC Swap
  ✓ tx 0x7f3a… · reward 0.94 / 1.0
task M3 Uniswap LP
  ~ tx 0x4c1b… · reward 0.61 / 1.5
task H3 Flash Loan Arb
  ✗ revert · insufficient profit after fees
  reward 0.0 / 2.0
final 72 / 100 · 9 / 13 tasks passed
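
One plausible reading of the tier multipliers and the "max 100" cap is a linear normalisation of the summed per-task rewards. The sketch below assumes exactly that. The page does not state the normalisation formula, so treat `final_score` as a hypothesis, not the benchmark's scoring code.

```python
# Tier multipliers and task counts as listed on this page.
WEIGHTS = {"easy": 1.0, "medium": 1.5, "hard": 2.0}
TASK_COUNTS = {"easy": 5, "medium": 5, "hard": 3}


def max_points():
    """Total points available if every task earns its full tier weight."""
    return sum(WEIGHTS[tier] * TASK_COUNTS[tier] for tier in WEIGHTS)


def final_score(rewards):
    """Assumed rule: scale the sum of weighted per-task rewards to 0-100."""
    return 100.0 * sum(rewards) / max_points()


# Rewards are already tier-weighted, e.g. 0.94 out of 1.0 or 0.61 out of 1.5.
perfect_run = [1.0] * 5 + [1.5] * 5 + [2.0] * 3
```

Under this assumption the 13 tasks are worth 18.5 points in total, and partial credit on a graded task raises the final score even when the task is not counted as passed.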

SUBMIT YOUR RESULTS

Beat 72.

Open benchmark · MIT License. Run on any agent. Submit results via PR. All scores verified on a live mainnet fork.