Build a solution
End-to-end: write a solver against an existing task, run it locally, push it to GitHub, submit your score.
We're going to write a solver for the sum-two-numbers task from the Build a task guide. It hands you two ints and expects their sum back. It takes about 5 minutes once you have uv and tp installed.
What you're making
A folder with two files:
- `solve.py`: your program. Reads inputs, writes outputs.
- `trap.yaml`: points at the task and declares which files come in and go out.
That's it. No framework code. tp run orchestrates each case for you.
Step 1 — write solve.py
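```python
# my-solution/solve.py
import json
import os
from pathlib import Path

# tp passes two JSON maps of filename -> absolute path via env vars.
inputs = json.loads(os.environ["INPUTS"])
outputs = json.loads(os.environ["OUTPUTS"])

# Read the case input, then write the declared output file.
nums = json.loads(Path(inputs["nums.json"]).read_text())
Path(outputs["sum.json"]).write_text(json.dumps({"sum": nums["a"] + nums["b"]}))
```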
Two environment variables are all you need:
| Env var | What it holds | Example |
|---|---|---|
| `INPUTS` | JSON dict mapping filename → absolute path for each case input | `{"nums.json": "/tmp/.trap/.../inputs/basic/nums.json"}` |
| `OUTPUTS` | JSON dict mapping filename → absolute path for each declared file output | `{"sum.json": "/tmp/.trap/.../basic/sum.json"}` |
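To try `solve.py` without tp, you can build the two variables yourself. The sketch below makes a few assumptions: `nums.json` holds `{"a": ..., "b": ...}` as above, and `run_solver` is a hypothetical helper, not part of tp. It only imitates the filename → absolute path shape shown in the table.

```python
import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_solver(script: Path, nums: dict) -> dict:
    """Hypothetical helper: run a solver once with hand-built INPUTS/OUTPUTS."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp).resolve()
        in_file = tmp_dir / "nums.json"
        out_file = tmp_dir / "sum.json"
        in_file.write_text(json.dumps(nums))

        # Same shape as the table above: filename -> absolute path.
        env = dict(os.environ)
        env["INPUTS"] = json.dumps({"nums.json": str(in_file)})
        env["OUTPUTS"] = json.dumps({"sum.json": str(out_file)})

        # Fail loudly if the solver exits non-zero.
        subprocess.run([sys.executable, str(script)], env=env, check=True)
        return json.loads(out_file.read_text())
```

From inside `my-solution`, `run_solver(Path("solve.py"), {"a": 2, "b": 3})` should return `{"sum": 5}`.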
You can also use stdin and stdout instead; tp captures both automatically.
Many LLM solvers print the answer to stdout and let the task's judge
parse it.
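Here is a minimal sketch of the stdout style, written for a hypothetical task whose judge parses stdout. sum-two-numbers itself declares `sum.json` as a file output, so keep writing that file for this task. Splitting the logic into `answer()` and `main()` is an assumption made here so the logic can be called and tested on its own.

```python
import json
import os
from pathlib import Path
from typing import Mapping


def answer(env: Mapping[str, str]) -> str:
    # Same INPUTS map as solve.py: filename -> absolute path.
    inputs = json.loads(env["INPUTS"])
    nums = json.loads(Path(inputs["nums.json"]).read_text())
    return json.dumps({"sum": nums["a"] + nums["b"]})


def main() -> None:
    # Print the answer; tp captures stdout, and the judge parses it.
    print(answer(os.environ))
```

To use it as a standalone script, end the file with a call to `main()`.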
Step 2 — write trap.yaml
```yaml
# my-solution/trap.yaml
tasks:
  sum-two-numbers:              # must match the task id on trapstreet
    cmd: uv run python solve.py
    traptask: ../sum-task       # path to the cloned task folder
    inputs:
      files: [nums.json]
    file_outputs: [sum.json]
    metadata:
      framework: stdlib-python
      model: hand-written
```
The `metadata:` block is self-reported and flows through to the
leaderboard row. `tp run` auto-fills `repo:` from your git remote if
the folder is a git repo with a remote configured. For public tasks, trapstreet rejects
your submission unless `metadata.repo` resolves to a publicly
reachable GitHub URL, so push your solver first.
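This guide doesn't describe how tp reads your remote. You can still check that your remote has the right shape. The sketch below is hypothetical and is not tp's detection logic. It turns an SSH or HTTPS GitHub remote into the `https://github.com/<owner>/<repo>` form. It only checks the URL's shape, not whether the repo is actually public.

```python
import re
from typing import Optional

# Matches git@github.com:owner/repo(.git) and https://github.com/owner/repo(.git)
_GITHUB_REMOTE = re.compile(
    r"^(?:git@github\.com:|https://github\.com/)"
    r"(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+?)(?:\.git)?/?$"
)


def github_web_url(remote: str) -> Optional[str]:
    """Return https://github.com/<owner>/<repo> for a GitHub remote, else None."""
    match = _GITHUB_REMOTE.match(remote.strip())
    if match is None:
        return None
    return f"https://github.com/{match['owner']}/{match['repo']}"
```

To test it, pass in the output of `git remote get-url origin`. A `None` result means the remote isn't a GitHub URL at all.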
Step 3 — run it locally
```shell
cd my-solution
tp run
```
uv builds a venv from your pyproject.toml (any will do, even an
empty one); tp then runs each case, scores it with the task's judge, and prints a summary.
Every score should be 1.0 on the basic, negatives, and zero cases.
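The real scoring lives in the task's `judge.py`, and this guide doesn't cover its interface. The sketch below is a hypothetical stand-in. It scores a `sum.json` payload as 1.0 or 0.0 by exact match against the expected sum, then runs over made-up inputs named after the basic, negatives, and zero cases. The case values are illustrative, not the task's actual data.

```python
def solve(nums: dict) -> dict:
    # Same logic as solve.py, minus the file I/O.
    return {"sum": nums["a"] + nums["b"]}


def judge(expected: int, output: dict) -> float:
    # Hypothetical exact-match judge: full credit only for the right sum.
    return 1.0 if output.get("sum") == expected else 0.0


# Illustrative (input, expected) pairs named after the task's cases.
CASES = {
    "basic": ({"a": 2, "b": 3}, 5),
    "negatives": ({"a": -7, "b": -5}, -12),
    "zero": ({"a": 0, "b": 0}, 0),
}

scores = {name: judge(expected, solve(nums)) for name, (nums, expected) in CASES.items()}
print(scores)  # {'basic': 1.0, 'negatives': 1.0, 'zero': 1.0}
```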
Step 4 — submit
```shell
tp auth login                 # one-time browser OAuth, see quick start
tp submit sum-two-numbers
```
The CLI prints a `view_url`. Open it to see your row on the leaderboard.
What you didn't have to think about
- HTTP, auth, retries: `tp submit` handles them.
- Per-case scoring: the task author already wrote `judge.py`. You just hand back the right output.
- Capturing stdout/stderr/latency/exit_code: `tp` records it all.
- Submitting from the same machine you ran on: you can copy `.trap/<task>/<ts>/report.json` anywhere and submit from there.
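This guide doesn't describe tp's internals. Recording those fields per case is simple in principle, though. The sketch below is hypothetical and uses only `subprocess` and `time` to capture stdout, stderr, latency, and exit code for one command. The `CaseRecord` name and its fields are illustrative, not the `report.json` schema.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CaseRecord:
    stdout: str
    stderr: str
    latency_s: float
    exit_code: int


def run_case(cmd: List[str], env: Optional[Dict[str, str]] = None) -> CaseRecord:
    # Time the whole process, from launch to exit.
    start = time.perf_counter()
    # No check=True: a non-zero exit is recorded, not raised.
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    latency = time.perf_counter() - start
    return CaseRecord(proc.stdout, proc.stderr, latency, proc.returncode)
```

For this solver the command would be `["uv", "run", "python", "solve.py"]`. Pass an `env` that copies `os.environ` and adds `INPUTS` and `OUTPUTS`.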
Gotchas worth remembering
- `INPUTS` keys are filenames, not paths. Use `INPUTS["nums.json"]`, not `INPUTS["inputs/basic/nums.json"]`.
- The task id must match in three places: the key under `tasks:` in your trap.yaml, the trapstreet task id, and the argument to `tp submit`.
- Use `uv run python ...` in your `cmd`, not `.venv/bin/python`. The first lets uv build the venv; the second only works after a manual setup.
- Public tasks require a public repo. Push your code to GitHub before `tp submit` or it's rejected. `tp run` auto-detects the remote URL into `metadata.repo`, or you can set it explicitly in trap.yaml.
