Documentation

Get started with BlockchainBench in minutes.

1 Install Harbor CLI

Harbor CLI is the runtime that executes benchmark suites against your agent. Install it via pip:

pip install harbor-cli

Harbor requires Python 3.10 or later and a funded Ethereum wallet on a testnet or a local fork.

2 Run the Benchmark

Point Harbor at your agent script and run the full suite:

# Run the full benchmark
harbor run --benchmark blockchainbench

# Run only easy tasks
harbor run --benchmark blockchainbench --tier easy

# Run a specific task
harbor run --benchmark blockchainbench --task eth-transfer

# Output results as JSON
harbor run --benchmark blockchainbench --format json

Each task runs on a local Anvil fork of Ethereum mainnet. No real funds are used.
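For scripted runs, the commands above could be wrapped from Python. This is a minimal sketch, not part of Harbor: the helper names are hypothetical, and it assumes that `--format json` writes a single JSON document to standard output and that it can be combined with `--tier` or `--task`. Neither assumption is confirmed here.

```python
import json
import subprocess


def harbor_command(tier=None, task=None):
    """Build the argv list for a JSON-formatted benchmark run (hypothetical helper)."""
    cmd = ["harbor", "run", "--benchmark", "blockchainbench", "--format", "json"]
    if tier is not None:
        cmd += ["--tier", tier]
    if task is not None:
        cmd += ["--task", task]
    return cmd


def run_benchmark(tier=None, task=None):
    """Run Harbor and parse its output.

    Assumes the JSON report is printed to stdout; the report's schema is
    not documented here, so the parsed value is returned untouched.
    """
    proc = subprocess.run(
        harbor_command(tier=tier, task=task),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

Calling `run_benchmark(tier="easy")` would then return whatever structure Harbor emits for the easy tier.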

3 Agent Interface

Your agent must implement a simple interface. Harbor sends a task description and expects signed transactions in return:

# your_agent.py
from harbor import Agent, Task, Result

class MyAgent(Agent):
    def execute(self, task: Task) -> Result:
        # task.description -- what to do
        # task.context     -- RPC URL, wallet, contracts
        # Return signed transactions
        return Result(transactions=[...])

4 Contribute New Tasks

We welcome community-contributed tasks. Each task is a Python class that defines:

  • A natural-language description of the DeFi operation
  • Setup logic (fork state, fund wallets, deploy contracts)
  • Verification logic (assert on-chain state after execution)
  • Scoring rubric (partial credit, gas efficiency bonuses)

See CONTRIBUTING.md for full guidelines and the task template.
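The four parts of a task listed above could be sketched roughly as follows. This is an illustration only and does not use the real task template: `FakeChain`, `EthTransferTask`, and every method name are hypothetical stand-ins, the chain is reduced to an in-memory balance table so the example runs on its own, and the 90/10 score split is an arbitrary placeholder rather than the benchmark's rubric.

```python
from dataclasses import dataclass, field


@dataclass
class FakeChain:
    """Stand-in for forked chain state: address label -> balance in wei."""
    balances: dict[str, int] = field(default_factory=dict)


class EthTransferTask:
    """Hypothetical task: move 1 ETH from the agent wallet to a recipient."""

    # 1. Natural-language description handed to the agent.
    description = "Send 1 ETH from the agent wallet to the recipient address."

    AMOUNT_WEI = 10**18
    GAS_TARGET = 21_000  # gas cost of a plain ETH transfer

    def setup(self, chain: FakeChain) -> None:
        # 2. Setup: fund the agent wallet and zero the recipient.
        chain.balances["agent"] = 5 * 10**18
        chain.balances["recipient"] = 0

    def verify(self, chain: FakeChain) -> bool:
        # 3. Verification: check the final state, not how it was reached.
        return chain.balances["recipient"] == self.AMOUNT_WEI

    def score(self, chain: FakeChain, gas_used: int) -> float:
        # 4. Scoring: partial credit for the fraction delivered, plus a
        #    bonus when the full amount arrives within the gas target.
        delivered = min(chain.balances["recipient"], self.AMOUNT_WEI) / self.AMOUNT_WEI
        bonus = 10.0 if delivered == 1.0 and gas_used <= self.GAS_TARGET else 0.0
        return 90.0 * delivered + bonus
```

Keeping verification separate from scoring lets a task report a clear pass or fail while still awarding partial credit for incomplete runs.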

5 Scoring

Each task is scored on a 0-100 scale:

Component                                         Weight
Correctness (desired on-chain state achieved)     60%
Completeness (all subtasks finished)              25%
Efficiency (gas usage, number of transactions)    15%
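As a worked example of the weights in the table above, the sketch below combines three component scores into one task score. It assumes each component is itself graded from 0 to 100 and that the components are combined linearly; the function name and the sample component values are made up for illustration.

```python
# Component weights in percent, taken from the table above.
WEIGHTS = {"correctness": 60, "completeness": 25, "efficiency": 15}


def task_score(components: dict[str, float]) -> float:
    """Combine 0-100 component scores into a single 0-100 task score."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100


# Fully correct, 80% complete, middling efficiency:
# (60*100 + 25*80 + 15*50) / 100 = 87.5
example = task_score({"correctness": 100, "completeness": 80, "efficiency": 50})
```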

The overall benchmark score is the weighted average across all 13 tasks, with Hard tasks weighted 3x, Medium 2x, and Easy 1x.
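The tier weighting can be illustrated with a short calculation. The function name and the sample scores below are hypothetical, and the example uses three tasks rather than the full suite of 13 because the per-tier task counts are not given here.

```python
# Tier multipliers as stated above: Easy 1x, Medium 2x, Hard 3x.
TIER_WEIGHT = {"easy": 1, "medium": 2, "hard": 3}


def benchmark_score(results: list[tuple[str, float]]) -> float:
    """Weighted average of (tier, task_score) pairs, on the same 0-100 scale."""
    if not results:
        raise ValueError("at least one task result is required")
    total_weight = sum(TIER_WEIGHT[tier] for tier, _ in results)
    weighted_sum = sum(TIER_WEIGHT[tier] * score for tier, score in results)
    return weighted_sum / total_weight


# One task per tier: (1*100 + 2*70 + 3*40) / (1 + 2 + 3) = 360 / 6 = 60.0
overall = benchmark_score([("easy", 100.0), ("medium", 70.0), ("hard", 40.0)])
```

Because Hard tasks carry three times the weight of Easy ones, a low score on a single Hard task pulls the overall result down more than a perfect Easy score lifts it.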