Get started in minutes.
Install Harbor CLI, implement the agent interface, and run the benchmark. Each task runs on a local Anvil fork, so no real funds are used and every run is fully reproducible.
Install Harbor CLI
Harbor CLI is the runtime that executes benchmark suites against your agent. Install it via pip:
# Requires Python 3.10+
pip install harbor-cli
Requires Python 3.10+ and an Ethereum wallet funded on a testnet or fork. No real funds are needed.
Run the Benchmark
Point Harbor at your agent script and run the full suite, a single tier, or a single task:
# Run the full benchmark
harbor run --benchmark blockchainbench
# Run only easy tasks
harbor run --benchmark blockchainbench --tier easy
# Run a specific task
harbor run --benchmark blockchainbench --task eth-transfer
# Output results as JSON
harbor run --benchmark blockchainbench --format json
Each task runs on a local Anvil fork of Ethereum mainnet. No real funds are used.
Agent Interface
Your agent must implement a simple interface. Harbor sends a task description and expects signed transactions in return:
# your_agent.py
from harbor import Agent, Task, Result


class MyAgent(Agent):
    def execute(self, task: Task) -> Result:
        # task.description -- what to do
        # task.context -- RPC URL, wallet, contracts
        # Return signed transactions
        return Result(transactions=[...])
Contribute New Tasks
Each task is a Python class that defines four parts:
- A natural-language description of the DeFi operation
- Setup logic (fork state, fund wallets, deploy contracts)
- Verification logic (assert on-chain state after execution)
- A scoring rubric (partial credit, gas efficiency bonuses)
See CONTRIBUTING.md for the full template.
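To make the four parts concrete, here is a rough sketch of what a task could look like. It is not the template from CONTRIBUTING.md: the class name, method names, `FakeChain` stand-in, placeholder addresses, and score values are all hypothetical, chosen only to illustrate the structure. A real task would read state from the Anvil fork over RPC rather than from a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FakeChain:
    """Hypothetical stand-in for fork state: maps address -> balance in wei."""
    balances: dict = field(default_factory=dict)


class EthTransferTask:
    """Illustrative task shape only; these names are assumptions, not Harbor's API."""

    # Natural-language description handed to the agent
    description = "Send 1 ETH from the agent wallet to the recipient."

    AGENT = "0xAgent"          # placeholder address, not a real account
    RECIPIENT = "0xRecipient"  # placeholder address, not a real account
    AMOUNT_WEI = 10**18        # 1 ETH
    GAS_TARGET = 21_000        # gas used by a plain ETH transfer

    def setup(self, chain: FakeChain) -> None:
        # Setup logic: fund the agent wallet, start the recipient at zero
        chain.balances[self.AGENT] = 10 * 10**18
        chain.balances[self.RECIPIENT] = 0

    def verify(self, chain: FakeChain) -> bool:
        # Verification logic: assert on state after the agent has acted
        return chain.balances.get(self.RECIPIENT, 0) >= self.AMOUNT_WEI

    def score(self, chain: FakeChain, gas_used: int) -> float:
        # Scoring rubric (illustrative values): partial credit plus a gas bonus
        received = chain.balances.get(self.RECIPIENT, 0)
        if received == 0:
            return 0.0
        if received < self.AMOUNT_WEI:
            return 0.25  # partial credit: something arrived, but not enough
        bonus = 0.25 if gas_used <= self.GAS_TARGET else 0.0
        return 0.75 + bonus
```

Keeping verification separate from scoring means a task can report a plain pass/fail result while the rubric still rewards partial progress and gas efficiency.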
Scoring
Each task is scored on a weighted basis:
The overall benchmark score is the weighted average across all 13 tasks, with Hard tasks weighted 3×, Medium 2×, and Easy 1×.
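The tier weighting can be sketched in a few lines of Python. This is a minimal sketch that assumes each task produces a score between 0 and 1 along with its tier label; the function name `overall_score` and the input shape are hypothetical and are not part of Harbor.

```python
# Tier weights as stated above: Easy 1x, Medium 2x, Hard 3x
TIER_WEIGHTS = {"easy": 1, "medium": 2, "hard": 3}


def overall_score(results):
    """Weighted average of task scores.

    `results` is a list of (tier, score) pairs, with score assumed to be in [0, 1].
    """
    total_weight = sum(TIER_WEIGHTS[tier] for tier, _ in results)
    if total_weight == 0:
        raise ValueError("no task results to score")
    weighted_sum = sum(TIER_WEIGHTS[tier] * score for tier, score in results)
    return weighted_sum / total_weight


# One task per tier: (1*1.0 + 2*0.5 + 3*0.0) / (1 + 2 + 3) = 2/6
example = [("easy", 1.0), ("medium", 0.5), ("hard", 0.0)]
print(round(overall_score(example), 3))  # 0.333
```

Because Hard tasks carry three times the weight of Easy ones, failing a single Hard task costs as much as failing three Easy tasks.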