OPEN BENCHMARK · 2026

Can your agent swap ETH for USDC?

We built the test.

BlockchainBench is the open evaluation suite for AI agent DeFi capability. 13 tasks across real protocols. Machine-verifiable results. Zero wrappers.

View on GitHub → Read the docs

Tasks: 13 across real DeFi protocols

Tiers: Easy (5) · Medium (5) · Hard (3)

Top score: 72 / 100 (Claude Code)

Perfect scores: Zero. The gap is real.

CURRENT RANKINGS

Best agent: 72 / 100.

The scores below are placeholders. Submit your agent's results via pull request. No agent has achieved a perfect score.

#   AGENT         SCORE   TASKS     EASY   MED   HARD
1   Claude Code   72      9 / 13    5/5    3/5   1/3
2   Codex         65      8 / 13    5/5    3/5   0/3
3   Gemini CLI    54      7 / 13    4/5    3/5   0/3

Rankings are community-maintained. Submit your agent's results via pull request on GitHub. Submit via PR →

TASK CATALOGUE

13 tasks. Zero shortcuts.

Each task runs on a live mainnet fork. Agents interact via raw JSON-RPC. Rewards are computed from onchain state deltas.
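To make "raw JSON-RPC" concrete, here is a minimal sketch of a request body built by hand with the standard library. The helper name `rpc_request` and the zero address are hypothetical placeholders, not part of BlockchainBench; the body would be POSTed to whatever RPC URL the local fork exposes.

```python
import json

def rpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body (hypothetical helper)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    })

# Placeholder address, for illustration only.
body = rpc_request("eth_getBalance", ["0x" + "00" * 20, "latest"])
print(body)
```

Reading the same quantity before and after an episode gives the kind of state delta a grader could compare.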

EASY — 5 TASKS

Token Transfer

Send ERC-20 from funded wallet to target address via eth_sendTransaction.

ERC-20

REWARD

Binary · pass / fail
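Without a wrapper library, the agent has to produce the transfer calldata itself. The sketch below encodes `transfer(address,uint256)` following the standard ABI layout (4-byte selector, then two 32-byte words). The function name `encode_transfer` and the example recipient are hypothetical; the result would go in the `data` field of an `eth_sendTransaction` call whose `to` is the token contract.

```python
# First 4 bytes of keccak256("transfer(address,uint256)").
TRANSFER_SELECTOR = "a9059cbb"

def encode_transfer(recipient: str, amount: int) -> str:
    """Build ERC-20 transfer calldata as a 0x-prefixed hex string."""
    addr = recipient[2:] if recipient.startswith("0x") else recipient
    if len(addr) != 40:
        raise ValueError("recipient must be a 20-byte hex address")
    int(addr, 16)  # raises ValueError on non-hex characters
    if not 0 <= amount < 2**256:
        raise ValueError("amount must fit in uint256")
    # Each argument is left-padded to 32 bytes (64 hex characters).
    return "0x" + TRANSFER_SELECTOR + addr.lower().rjust(64, "0") + format(amount, "064x")

# 25 USDC in base units (USDC uses 6 decimals); the recipient is a placeholder.
calldata = encode_transfer("0x" + "11" * 20, 25 * 10**6)
print(calldata)
```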

ETH → USDC Swap

Single-hop Uniswap V3 swap. Minimise slippage. Correct router calldata.

Uniswap V3

REWARD

Output delta %

Token Approval

approve() Uniswap router for USDC spend. Correct amount, correct spender.

ERC-20

REWARD

Binary · pass / fail

Balance Query

Enumerate token balances and LP positions across 3 protocols using eth_call.

Multi-protocol

REWARD

Accuracy score
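A balance read over `eth_call` comes down to one call object and one hex decode. This sketch assumes the standard ERC-20 `balanceOf(address)` selector; the helper names and placeholder addresses are hypothetical, and LP positions would need protocol-specific calls that are not shown here.

```python
# First 4 bytes of keccak256("balanceOf(address)").
BALANCE_OF_SELECTOR = "70a08231"

def balance_of_call(token: str, holder: str) -> dict:
    """Build the call object passed as the first eth_call parameter."""
    holder_hex = holder[2:] if holder.startswith("0x") else holder
    return {
        "to": token,
        "data": "0x" + BALANCE_OF_SELECTOR + holder_hex.lower().rjust(64, "0"),
    }

def decode_uint256(result_hex: str) -> int:
    """Decode the 32-byte hex word returned by eth_call into an integer."""
    return int(result_hex, 16)

# Placeholder addresses; a real run would use the token and wallet addresses.
call = balance_of_call("0x" + "aa" * 20, "0x" + "bb" * 20)
raw = decode_uint256("0x" + format(10_000 * 10**6, "064x"))
print(call["data"], raw / 10**6)  # USDC has 6 decimals
```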

Gas Estimation

Estimate gas for a transaction bundle. Target: within 10% of actual on-chain cost.

EVM core

REWARD

Error margin
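The 10% target can be read as a bound on relative error. The sketch below shows one way to check it; the exact formula the grader uses is not stated on this page, so treat the function names and the definition of the margin as assumptions.

```python
def relative_error(estimated_gas: int, actual_gas: int) -> float:
    """Absolute error as a fraction of the actual gas used."""
    if actual_gas <= 0:
        raise ValueError("actual_gas must be positive")
    return abs(estimated_gas - actual_gas) / actual_gas

def within_tolerance(estimated_gas: int, actual_gas: int, tolerance: float = 0.10) -> bool:
    """True when the estimate is inside the tolerance band (assumed symmetric)."""
    return relative_error(estimated_gas, actual_gas) <= tolerance

print(within_tolerance(105_000, 100_000))
print(within_tolerance(120_000, 100_000))
```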

MEDIUM — 5 TASKS

Multi-hop Swap

Route ETH → USDC → DAI via optimal path. Handle slippage across two pools.

Uniswap V3

REWARD

P&L vs optimal

Aave Supply

Deposit USDC as collateral on Aave V3. Receive aUSDC. Track health factor post-deposit.

Aave V3

REWARD

Health factor delta

Uniswap LP

Provide concentrated liquidity in a price range. Choose tick bounds to maximise fee capture.

Uniswap V3

REWARD

Fee yield / IL
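Choosing tick bounds starts from the Uniswap V3 relation price = 1.0001^tick, with ticks snapped to the pool's tick spacing (60 for the 0.3% fee tier). The sketch below converts a price range into usable ticks. It works on the raw token1/token0 price and ignores token decimals, and the helper names are hypothetical.

```python
import math

def price_to_tick(price: float) -> int:
    """Largest tick whose price 1.0001**tick does not exceed `price`."""
    return math.floor(math.log(price) / math.log(1.0001))

def usable_tick(tick: int, spacing: int, round_up: bool = False) -> int:
    """Snap a tick to a multiple of the pool's tick spacing."""
    lower = (tick // spacing) * spacing  # floor division also handles negative ticks
    if round_up and lower != tick:
        return lower + spacing
    return lower

def tick_range(price_low: float, price_high: float, spacing: int) -> tuple:
    """Widen [price_low, price_high] outward to the nearest usable ticks."""
    return (
        usable_tick(price_to_tick(price_low), spacing),
        usable_tick(price_to_tick(price_high), spacing, round_up=True),
    )

print(tick_range(0.9, 1.1, 60))
```

A narrower range concentrates liquidity and earns more fees while the price stays inside it, at the cost of earning nothing once the price leaves.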

Aave Borrow

Borrow against collateral, manage LTV. Avoid liquidation. Monitor health factor ≥ 1.5.

Aave V3

REWARD

HF ≥ 1.5
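For a single collateral asset, Aave defines the health factor as collateral value times liquidation threshold, divided by debt value. The sketch below uses that relation to size a borrow against the 1.5 floor in this task. The 0.80 liquidation threshold is an illustrative assumption, not a value read from the protocol, and the function names are hypothetical.

```python
def health_factor(collateral_value: float, liquidation_threshold: float, debt_value: float) -> float:
    """Single-collateral health factor; infinite when there is no debt."""
    if debt_value == 0:
        return float("inf")
    return collateral_value * liquidation_threshold / debt_value

def max_debt_for_target(collateral_value: float, liquidation_threshold: float, target_hf: float) -> float:
    """Largest debt value that keeps the health factor at or above target_hf."""
    return collateral_value * liquidation_threshold / target_hf

# Assumed numbers: 10,000 USD of collateral, threshold 0.80, target HF 1.5.
limit = max_debt_for_target(10_000, 0.80, 1.5)
print(round(limit, 2), round(health_factor(10_000, 0.80, limit), 4))
```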

MEV-safe Swap

Execute swap with correct deadline, slippage tolerance, and minimum output. Frontrun-resistant.

Uniswap V3

REWARD

Execution quality
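The three protections named in this task reduce to two numbers in the swap parameters: a minimum output and a deadline. The sketch below derives both from a quote and a slippage tolerance in basis points. The function names are hypothetical, and passing the fork's latest block timestamp as `now` is assumed to be safer than relying on the wall clock.

```python
import time
from typing import Optional

def min_amount_out(quoted_out: int, slippage_bps: int) -> int:
    """Quoted output reduced by the tolerance, rounded down to base units."""
    if not 0 <= slippage_bps <= 10_000:
        raise ValueError("slippage_bps must be between 0 and 10000")
    return quoted_out * (10_000 - slippage_bps) // 10_000

def deadline_from(seconds: int, now: Optional[int] = None) -> int:
    """Unix timestamp after which the swap should revert."""
    base = int(time.time()) if now is None else now
    return base + seconds

# A quote of 2,000 USDC (6 decimals) with 0.5% tolerance and a 5 minute deadline.
print(min_amount_out(2_000 * 10**6, 50), deadline_from(300, now=1_700_000_000))
```

A minimum output of zero or a far-future deadline would still execute, but it removes the protection against being sandwiched or held back.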

HARD — 3 TASKS

Leveraged Loop

Recursive borrow-deposit on Aave. Reach 3× leverage without triggering liquidation.

Aave V3

REWARD

Leverage achieved
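The loop count needed for 3× follows from a geometric series: if each round borrows a fraction b of the value just deposited, total exposure after n rounds is 1 + b + b² + … + bⁿ, which approaches 1 / (1 − b). The sketch below is a simplified model that ignores swap costs, interest, and price movement; the function names and the 0.75 borrow fraction are assumptions for illustration.

```python
from typing import Optional

def leverage_after(loops: int, borrow_fraction: float) -> float:
    """Exposure per unit of starting capital after `loops` borrow-deposit rounds."""
    return sum(borrow_fraction ** k for k in range(loops + 1))

def loops_needed(target: float, borrow_fraction: float, max_loops: int = 50) -> Optional[int]:
    """Smallest number of rounds reaching `target`, or None if not reached."""
    if not 0 <= borrow_fraction < 1:
        raise ValueError("borrow_fraction must be in [0, 1)")
    for n in range(max_loops + 1):
        if leverage_after(n, borrow_fraction) >= target:
            return n
    return None

print(loops_needed(3.0, 0.75), leverage_after(4, 0.75))
print(loops_needed(3.0, 0.60))  # the limit 1 / (1 - 0.6) = 2.5 is below 3
```

In this model, 3× is only reachable when b exceeds 2/3, and borrowing that aggressively each round leaves less room before liquidation.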

LP Rebalance

Monitor price drift. Close LP, reopen in new range on 2% price move.

Uniswap V3

REWARD

Capital efficiency

Flash Loan Arb

Identify arbitrage. Execute flash loan, multi-hop swap, repay — all in one transaction.

Aave + Uniswap

REWARD

Net profit after fees
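Whether an arbitrage is worth sending can be checked before building the transaction. The sketch below subtracts the flash loan premium and gas from gross proceeds, with every amount in the same base unit. The premium rate is a parameter because the value configured on the fork should be read from the pool rather than assumed. Rounding the premium up is a conservative choice made here, not a claim about the protocol's rounding.

```python
def flash_loan_net_profit(principal: int, gross_proceeds: int, premium_bps: int, gas_cost: int) -> int:
    """Net profit after repaying principal plus premium and paying gas.

    All values share one unit (for example USDC base units); this is a
    simplifying assumption, since gas is actually paid in ETH.
    """
    premium = (principal * premium_bps + 9_999) // 10_000  # ceiling division
    return gross_proceeds - principal - premium - gas_cost

# Illustrative numbers only: borrow 1,000,000, get back 1,003,000, 5 bps premium.
net = flash_loan_net_profit(1_000_000, 1_003_000, 5, 1_000)
print(net, "profitable" if net > 0 else "skip")
```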

METHODOLOGY

Live forks. Verifiable rewards.

Chain: anvil --fork-url $ETH_RPC · mainnet @ block 21,000,000

Wallet: funded with 100 ETH + 10,000 USDC per episode

Grader: reward = grader.score(state_pre, state_post)

Scoring: Easy ×1.0 · Medium ×1.5 · Hard ×2.0 · max 100

Access: MIT License · github.com/BlockchainBench
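The tier weights and the 100-point ceiling suggest a weighted sum that is rescaled. The sketch below shows one plausible aggregation: each task contributes its reward fraction times its tier weight, and the total is normalised by the maximum attainable weight (5 × 1.0 + 5 × 1.5 + 3 × 2.0 = 18.5). The normalisation step and all names here are assumptions; the page does not state the exact formula.

```python
from typing import List, Tuple

TIER_WEIGHT = {"easy": 1.0, "medium": 1.5, "hard": 2.0}  # from the scoring line
TIER_COUNT = {"easy": 5, "medium": 5, "hard": 3}          # from the task catalogue

def final_score(rewards: List[Tuple[str, float]]) -> float:
    """Aggregate (tier, reward fraction in [0, 1]) pairs into a 0-100 score.

    Assumed formula: 100 * sum(weight * fraction) / max attainable weight.
    """
    max_weight = sum(TIER_WEIGHT[t] * TIER_COUNT[t] for t in TIER_WEIGHT)
    earned = sum(TIER_WEIGHT[tier] * fraction for tier, fraction in rewards)
    return 100.0 * earned / max_weight

perfect = [("easy", 1.0)] * 5 + [("medium", 1.0)] * 5 + [("hard", 1.0)] * 3
print(final_score(perfect))
```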

$ harbor run --benchmark blockchainbench

OUTPUT

loading mainnet fork @ 21,000,000…

wallet 0xAGENT · 100 ETH · 10,000 USDC

task E2 ETH → USDC Swap

✓ tx 0x7f3a… · reward 0.94 / 1.0

task M3 Uniswap LP

~ tx 0x4c1b… · reward 0.61 / 1.5

task H3 Flash Loan Arb

✗ revert · insufficient profit after fees

reward 0.0 / 2.0

final 72 / 100 · 9 / 13 tasks passed

SUBMIT YOUR RESULTS

Beat 72.

Open benchmark · MIT License. Run on any agent. Submit results via PR. All scores verified on a live mainnet fork.

Submit via GitHub PR ↗ Read the docs → Close the gap: visit BlockchainRL ↗
