Documentation
Get started with BlockchainBench in minutes.
1 Install Harbor CLI
Harbor CLI is the runtime that executes benchmark suites against your agent. Install it via pip:
pip install harbor-cli
Requires Python 3.10+ and a funded Ethereum wallet (testnet or fork).
2 Run the Benchmark
Point Harbor at your agent script and run the full suite:
# Run the full benchmark
harbor run --benchmark blockchainbench
# Run only easy tasks
harbor run --benchmark blockchainbench --tier easy
# Run a specific task
harbor run --benchmark blockchainbench --task eth-transfer
# Output results as JSON
harbor run --benchmark blockchainbench --format json
Each task runs on a local Anvil fork of Ethereum mainnet. No real funds are used.
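When runs are launched from a script or CI job, the invocations above can be assembled programmatically. The following is a minimal sketch, not part of Harbor itself: `build_harbor_command` is a hypothetical helper, it uses only the flags shown above, and it assumes those flags can be combined in a single call, which this page does not confirm.

```python
import shlex
from typing import List, Optional


def build_harbor_command(tier: Optional[str] = None,
                         task: Optional[str] = None,
                         json_output: bool = False) -> List[str]:
    """Build an argv list for `harbor run` (hypothetical helper).

    Only flags that appear in this guide are used. Combining them in one
    invocation is an assumption.
    """
    argv = ["harbor", "run", "--benchmark", "blockchainbench"]
    if tier is not None:
        argv += ["--tier", tier]      # e.g. "easy"
    if task is not None:
        argv += ["--task", task]      # e.g. "eth-transfer"
    if json_output:
        argv += ["--format", "json"]
    return argv


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(shlex.join(build_harbor_command(tier="easy", json_output=True)))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where Harbor CLI is installed.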
3 Agent Interface
Your agent must implement a simple interface. Harbor sends a task description and expects signed transactions in return:
# your_agent.py
from harbor import Agent, Task, Result

class MyAgent(Agent):
    def execute(self, task: Task) -> Result:
        # task.description -- what to do
        # task.context -- RPC URL, wallet, contracts
        # Return signed transactions
        return Result(transactions=[...])
4 Contribute New Tasks
We welcome community-contributed tasks. Each task is a Python class that defines:
- A natural-language description of the DeFi operation
- Setup logic (fork state, fund wallets, deploy contracts)
- Verification logic (assert on-chain state after execution)
- Scoring rubric (partial credit, gas efficiency bonuses)
See CONTRIBUTING.md for full guidelines and the task template.
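To make the four parts of a task concrete, here is a self-contained sketch. It does not use the real task template (that lives in CONTRIBUTING.md) or the Harbor API. The class name, method names, addresses, and the in-memory `ChainState` stand-in for a forked chain are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ChainState:
    """Toy stand-in for forked chain state: address -> balance in wei."""
    balances: Dict[str, int] = field(default_factory=dict)


class EthTransferTask:
    """Hypothetical task: send 1 ETH from the agent wallet to a recipient."""

    # Natural-language description handed to the agent.
    description = "Send 1 ETH from the agent wallet to the recipient."

    AGENT = "0xAgent"          # placeholder address
    RECIPIENT = "0xRecipient"  # placeholder address
    AMOUNT = 10**18            # 1 ETH in wei

    def setup(self) -> ChainState:
        # Setup logic: fund the agent wallet; the recipient starts empty.
        return ChainState(balances={self.AGENT: 5 * 10**18, self.RECIPIENT: 0})

    def verify(self, state: ChainState) -> bool:
        # Verification logic: check the desired state after execution.
        return state.balances.get(self.RECIPIENT, 0) == self.AMOUNT

    def score(self, state: ChainState, gas_used: int,
              gas_budget: int = 21_000) -> Dict[str, float]:
        # Scoring rubric (illustrative): partial credit for the fraction of
        # the amount delivered, plus a gas-efficiency ratio capped at 1.0.
        received = state.balances.get(self.RECIPIENT, 0)
        partial = min(received, self.AMOUNT) / self.AMOUNT
        efficiency = min(1.0, gas_budget / gas_used) if gas_used > 0 else 0.0
        return {"partial_credit": partial, "gas_efficiency": efficiency}
```

In a real task, `setup` would prepare the fork (fund wallets, deploy contracts) and `verify` would read state over RPC rather than from a dictionary.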
5 Scoring
Each task is scored on a 0-100 scale:
| Component | Weight |
|---|---|
| Correctness (desired on-chain state achieved) | 60% |
| Completeness (all subtasks finished) | 25% |
| Efficiency (gas usage, number of transactions) | 15% |
The overall benchmark score is the weighted average across all 13 tasks, with Hard tasks weighted 3x, Medium 2x, and Easy 1x.
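The arithmetic can be sketched as follows. This is an illustration of the weights stated above, not Harbor's actual scoring code: the function names are hypothetical, and it assumes each component is itself graded on a 0-100 scale and that components combine linearly.

```python
from typing import Iterable, Tuple

# Component weights in percent, taken from the table above.
COMPONENT_WEIGHTS = {"correctness": 60, "completeness": 25, "efficiency": 15}

# Tier multipliers for the overall average.
TIER_WEIGHTS = {"easy": 1, "medium": 2, "hard": 3}


def task_score(correctness: float, completeness: float, efficiency: float) -> float:
    """Combine component scores (each assumed 0-100) into a 0-100 task score."""
    weighted = (COMPONENT_WEIGHTS["correctness"] * correctness
                + COMPONENT_WEIGHTS["completeness"] * completeness
                + COMPONENT_WEIGHTS["efficiency"] * efficiency)
    return weighted / 100


def overall_score(results: Iterable[Tuple[str, float]]) -> float:
    """Tier-weighted average over (tier, task_score) pairs."""
    results = list(results)
    total_weight = sum(TIER_WEIGHTS[tier] for tier, _ in results)
    if total_weight == 0:
        raise ValueError("no task results to average")
    weighted_sum = sum(TIER_WEIGHTS[tier] * score for tier, score in results)
    return weighted_sum / total_weight
```

For example, `task_score(100, 50, 0)` evaluates to 72.5, and `overall_score([("easy", 100), ("hard", 50)])` evaluates to 62.5, because the Hard task counts three times as much as the Easy one.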