DOCUMENTATION

Get started in minutes.

Five steps.

Install Harbor CLI, implement the agent interface, run the benchmark. Each task runs on a local Anvil fork — no real funds, fully reproducible.

STEP 01

Install Harbor CLI

Harbor CLI is the runtime that executes benchmark suites against your agent. Install it via pip:

    # Requires Python 3.10+
    pip install harbor-cli

Requires Python 3.10+ and a funded Ethereum wallet (testnet or fork).

STEP 02

Run the Benchmark

Point Harbor at your agent script and run the full suite:

    # Run the full benchmark
    harbor run --benchmark blockchainbench

    # Run only easy tasks
    harbor run --benchmark blockchainbench --tier easy

    # Run a specific task
    harbor run --benchmark blockchainbench --task eth-transfer

    # Output results as JSON
    harbor run --benchmark blockchainbench --format json

Each task runs on a local Anvil fork of Ethereum mainnet. No real funds are used.
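For scripted runs, the commands above can be assembled from Python. This is a minimal sketch that uses only the flags shown in this step (`--benchmark`, `--tier`, `--task`, `--format`). The helper names `build_harbor_command` and `run_harbor` are hypothetical and not part of Harbor, and the sketch makes no assumption about what the JSON output contains.

```python
import shlex
import subprocess
from typing import Optional


def build_harbor_command(
    benchmark: str = "blockchainbench",
    tier: Optional[str] = None,
    task: Optional[str] = None,
    output_format: Optional[str] = None,
) -> list[str]:
    """Assemble a `harbor run` argument list from the flags documented above."""
    cmd = ["harbor", "run", "--benchmark", benchmark]
    if tier is not None:
        cmd += ["--tier", tier]
    if task is not None:
        cmd += ["--task", task]
    if output_format is not None:
        cmd += ["--format", output_format]
    return cmd


def run_harbor(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run the command and capture its output. Requires Harbor CLI on PATH."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Print the command rather than running it, so this works without Harbor installed.
    print(shlex.join(build_harbor_command(tier="easy", output_format="json")))
```

Passing the arguments as a list avoids shell quoting issues when task names come from user input.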

STEP 03

Agent Interface

Your agent must implement a simple interface. Harbor sends a task description and expects signed transactions in return:

    # your_agent.py
    from harbor import Agent, Task, Result

    class MyAgent(Agent):
        def execute(self, task: Task) -> Result:
            # task.description -- what to do
            # task.context -- RPC URL, wallet, contracts
            # Return signed transactions
            return Result(transactions=[...])
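To make the control flow concrete, here is a self-contained sketch of an agent that builds one transfer and hands it to a signer. Everything except the `execute(task) -> Result` shape is an assumption: `Task` and `Result` below are local stand-ins for the Harbor classes, the context keys (`recipient`, `amount_wei`, `chain_id`) are hypothetical, and signing is delegated to an injected callable so that no particular wallet library is implied.

```python
from dataclasses import dataclass, field
from typing import Callable


# Stand-ins for harbor.Task and harbor.Result so the sketch runs on its own.
# The real classes may have different fields; these are assumptions.
@dataclass
class Task:
    description: str
    context: dict = field(default_factory=dict)


@dataclass
class Result:
    transactions: list = field(default_factory=list)


class TransferAgent:
    """Builds one unsigned transfer and signs it with an injected signer."""

    def __init__(self, sign: Callable[[dict], str]):
        # `sign` maps an unsigned transaction dict to a signed payload,
        # for example a thin wrapper around the wallet library you use.
        self.sign = sign

    def execute(self, task: Task) -> Result:
        ctx = task.context
        unsigned = {
            "to": ctx["recipient"],      # hypothetical context key
            "value": ctx["amount_wei"],  # hypothetical context key
            "chainId": ctx["chain_id"],  # hypothetical context key
        }
        return Result(transactions=[self.sign(unsigned)])


if __name__ == "__main__":
    # Placeholder signer for demonstration only; it does not produce a real signature.
    def fake_sign(tx: dict) -> str:
        return f"signed:{tx['to']}:{tx['value']}"

    task = Task(
        description="Send 1 ETH to the recipient",
        context={"recipient": "0xRecipient", "amount_wei": 10**18, "chain_id": 1},
    )
    print(TransferAgent(fake_sign).execute(task).transactions)
```

Injecting the signer keeps the decision logic testable without a wallet or an RPC connection.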

STEP 04

Contribute New Tasks

Each task is a Python class made up of the four parts listed below. See CONTRIBUTING.md for the full template.

A natural-language description of the DeFi operation
Setup logic (fork state, fund wallets, deploy contracts)
Verification logic (assert on-chain state after execution)
Scoring rubric (partial credit, gas efficiency bonuses)
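The four parts above could be laid out roughly as follows. This is an illustrative sketch, not the template from CONTRIBUTING.md: the class name, method names, addresses, and the in-memory `ChainState` stand-in for a fork are all hypothetical, and the rubric numbers are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ChainState:
    """In-memory stand-in for forked chain state (address -> balance in wei)."""
    balances: dict = field(default_factory=dict)


class EthTransferTask:
    """Hypothetical task: move 1 ETH from a funded sender to a recipient."""

    # Natural-language description handed to the agent.
    description = "Send 1 ETH from the funded wallet to the recipient address."
    sender = "0xSender"
    recipient = "0xRecipient"
    amount_wei = 10**18

    def setup(self, state: ChainState) -> None:
        # Setup logic: fund the sender and start the recipient at zero.
        state.balances[self.sender] = 10 * self.amount_wei
        state.balances[self.recipient] = 0

    def verify(self, state: ChainState) -> bool:
        # Verification logic: check the state after the agent has acted.
        return state.balances.get(self.recipient, 0) >= self.amount_wei

    def score(self, state: ChainState, tx_count: int) -> float:
        # Scoring rubric (illustrative numbers): 1.0 for a correct
        # single-transaction solution, minus 0.1 per extra transaction.
        if not self.verify(state):
            return 0.0
        return max(0.0, 1.0 - 0.1 * (tx_count - 1))
```

Keeping `verify` separate from `score` lets a harness report pass or fail independently of efficiency bonuses.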

STEP 05

Scoring

Each task is scored on a weighted basis:

COMPONENT                                         WEIGHT
Correctness (desired on-chain state achieved)     60%
Completeness (all subtasks finished)              25%
Efficiency (gas usage, number of transactions)    15%

The overall benchmark score is the weighted average across all 13 tasks, with Hard tasks weighted 3×, Medium 2×, and Easy 1×.
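The arithmetic can be sketched in a few lines. The weights come from the table and the tier multipliers above. The assumptions are that each component is reported on a 0 to 1 scale and that "weighted average" means dividing by the sum of tier weights. The function names are hypothetical.

```python
COMPONENT_WEIGHTS = {"correctness": 0.60, "completeness": 0.25, "efficiency": 0.15}
TIER_WEIGHTS = {"easy": 1, "medium": 2, "hard": 3}


def task_score(components: dict) -> float:
    """Weighted sum of the three components, each assumed to lie in [0, 1]."""
    return sum(COMPONENT_WEIGHTS[name] * components[name] for name in COMPONENT_WEIGHTS)


def benchmark_score(results: list) -> float:
    """Tier-weighted average over (tier, task_score) pairs."""
    total_weight = sum(TIER_WEIGHTS[tier] for tier, _ in results)
    weighted_sum = sum(TIER_WEIGHTS[tier] * score for tier, score in results)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Fully correct, half complete, no efficiency credit: 0.60 + 0.125 + 0.0
    print(round(task_score({"correctness": 1.0, "completeness": 0.5, "efficiency": 0.0}), 3))
    # One task per tier: (1*1.0 + 2*0.5 + 3*0.0) / (1 + 2 + 3)
    print(round(benchmark_score([("easy", 1.0), ("medium", 0.5), ("hard", 0.0)]), 3))
```

Under these assumptions, a perfect Easy task cannot offset a failed Hard task: the second example scores about 0.333 even though two of three tasks earned credit.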

RESOURCES

Links.

GitHub Repository
CONTRIBUTING.md
Task Catalogue
BlockchainRL: close the gap
