run · run-mpj06plf-3
docling-claude
- status: scored · failed
- total score: 0.632
- cases: 12 passed · 7 failed
- latency: 224.17s
- cost: $3.636
- tokens: —
- scored: 2026-05-23 23:52:19
- duration: 225s
Run summary
- score: 0.632
- passed: ✗
- total cases: 19
- passed cases: 12
- skipped cases: 0
- pass threshold: 0.800
- tokens total: null
- cost total: $3.636
- latency p95: 94.97 s
- latency total: 224.17 s
- latency median: 5.92 s
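
The summary figures can be reproduced from the per-case results. The sketch below is an illustration only. It assumes the score is the fraction of cases that scored 1, that a run passes when the score is at least the pass threshold, and that p95 uses the nearest-rank method. None of these definitions is stated on the page, although each one reproduces the reported number. All variable and function names are hypothetical.

```python
import math
import statistics

# Per-case durations in milliseconds, copied from the per-case results table.
durations_ms = [
    5465, 5770, 10356, 5879, 5580, 10333, 5668, 6717, 5923, 6689,
    5726, 6743, 5905, 5883, 94966, 8183, 15574, 5527, 7279,
]

passed_cases = 12
total_cases = 19
pass_threshold = 0.800

# Assumption: every case scores 0 or 1, so the run score is the pass fraction.
score = passed_cases / total_cases
# Assumption: the run passes when the score meets or exceeds the threshold.
run_passed = score >= pass_threshold


def nearest_rank_percentile(values, pct):
    """Return the nearest-rank percentile (an assumed definition of p95)."""
    ordered = sorted(values)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


latency_total_s = sum(durations_ms) / 1000
latency_median_s = statistics.median(durations_ms) / 1000
latency_p95_s = nearest_rank_percentile(durations_ms, 95) / 1000

print(f"score={score:.3f} passed={run_passed}")
print(f"total={latency_total_s:.2f}s median={latency_median_s:.2f}s p95={latency_p95_s:.2f}s")
```

With only 19 cases, the nearest-rank p95 is simply the slowest case (rent_year2 at 94966 ms), which is why p95 sits so far above the median.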
score by category
- dates: 0%
- money: 29%
- clauses: 100%
- deposit: 100%
- scenario: 0%
- scenario_reasoning: 100%
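
The category percentages are consistent with a plain mean of the per-case scores within each category. The sketch below recomputes them from the `category` and `score` values in the per-case results table. The averaging rule and the rounding to a whole percent are assumptions, and the variable names are hypothetical.

```python
from collections import defaultdict

# (category, score) pairs copied from the per-case results table, in row order.
case_scores = [
    ("clauses", 1), ("money", 0), ("deposit", 1), ("clauses", 1),
    ("clauses", 1), ("scenario_reasoning", 1), ("clauses", 1),
    ("clauses", 1), ("clauses", 1), ("money", 1), ("clauses", 1),
    ("clauses", 1), ("money", 1), ("money", 0), ("money", 0),
    ("money", 0), ("scenario", 0), ("dates", 0), ("money", 0),
]

# Group the scores by category.
scores_by_category = defaultdict(list)
for category, score in case_scores:
    scores_by_category[category].append(score)

# Assumption: a category score is the mean case score, shown as a whole percent.
category_pct = {
    category: round(100 * sum(scores) / len(scores))
    for category, scores in scores_by_category.items()
}

for category, pct in sorted(category_pct.items()):
    print(f"{category}: {pct}% ({len(scores_by_category[category])} cases)")
```

The money category accounts for 7 of the 19 cases and 5 of the 7 failures, so it dominates the gap between the 0.632 score and the 0.800 threshold.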
extras
- n_scored=19
- n_skipped_no_gold=0
Solution metadata
Self-reported by the solution. Not validated.
- repo: https://github.com/Ruqii/trapstreet-solutions
- engine: docling + claude-opus-4-7
Per-case results
| case | exit | duration | metrics |
|---|---|---|---|
| break_clause | 0 | 5465 ms | id=break_clause · type=boolean · model=claude-opus-4-7 · score=1 · category=clauses · usd_cost=0.085681 · difficulty=medium · agent_answer=No · input_tokens=66 · output_tokens=5 · expected_answer=no · matcher_results=[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| deposit_amount | 0 | 5770 ms | id=deposit_amount · type=numeric · model=claude-opus-4-7 · score=0 · category=money · usd_cost=0.086342 · difficulty=easy · agent_answer=The deposit amount is not legible in the extracted text. · input_tokens=30 · output_tokens=21 · expected_answer=2250 · matcher_results=[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| deposit_dispute_escalation | 0 | 10356 ms | id=deposit_dispute_escalation · type=text · model=claude-opus-4-7 · score=1 · category=deposit · usd_cost=0.089251 · difficulty=hard · agent_answer=The dispute will be submitted to the Independent Case Examiner (ICE) for adjudication, and all parties agree to cooperate with the adjudication. · input_tokens=64 · output_tokens=53 · expected_answer=The dispute may be submitted to the Independent Case Examiner (ICE) for adjudication. · matcher_results=[object Object],[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| deposit_scheme | 0 | 5879 ms | id=deposit_scheme · type=text · model=claude-opus-4-7 · score=1 · category=clauses · usd_cost=0.087391 · difficulty=medium · agent_answer=The Tenancy Deposit Scheme, administered by The Dispute Service Limited. · input_tokens=40 · output_tokens=33 · expected_answer=TDS · matcher_results=[object Object],[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| early_surrender | 0 | 5580 ms | id=early_surrender · type=boolean · model=claude-opus-4-7 · score=1 · category=clauses · usd_cost=0.085696 · difficulty=hard · agent_answer=Yes · input_tokens=57 · output_tokens=7 · expected_answer=yes · matcher_results=[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| early_surrender_economic_incentive | 0 | 10333 ms | id=early_surrender_economic_incentive · type=boolean · model=claude-opus-4-7 · score=1 · category=scenario_reasoning · usd_cost=0.093721 · difficulty=expert · agent_answer=No. The clause only requires the tenant to pay "any difference in rental payments that the outgoing Tenant should have paid to the extent that the replacement Tenant will be paying a lower amount" — it provides no corresponding benefit, refund, or credit if the replacement tenant pays a higher rent. · input_tokens=127 · output_tokens=100 · expected_answer=no · matcher_results=[object Object],[object Object],[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| fixed_term_departure_notice | 0 | 5668 ms | id=fixed_term_departure_notice · type=boolean · model=claude-opus-4-7 · score=1 · category=clauses · usd_cost=0.085637 · difficulty=hard · agent_answer=Yes · input_tokens=53 · output_tokens=7 · expected_answer=yes · matcher_results=[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| governing_act | 0 | 6717 ms | id=governing_act · type=text · model=claude-opus-4-7 · score=1 · category=clauses · usd_cost=0.086416 · difficulty=hard · agent_answer=Housing Act 1988 · input_tokens=80 · output_tokens=12 · expected_answer=Housing Act 1988 · matcher_results=[object Object],[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| inventory_referenced | 0 | 5923 ms | id=inventory_referenced · type=boolean · model=claude-opus-4-7 · score=1 · category=clauses · usd_cost=0.085741 · difficulty=hard · agent_answer=yes · input_tokens=65 · output_tokens=6 · expected_answer=yes · matcher_results=[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| late_rent_interest_rate | 0 | 6689 ms | id=late_rent_interest_rate · type=text · model=claude-opus-4-7 · score=1 · category=money · usd_cost=0.086597 · difficulty=hard · agent_answer=3% above Bank of England base rate · input_tokens=72 · output_tokens=16 · expected_answer=3% per annum above Bank of England base rate · matcher_results=[object Object],[object Object],[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| pets_allowed | 0 | 5726 ms | id=pets_allowed · type=text · model=claude-opus-4-7 · score=1 · category=clauses · usd_cost=0.085772 · difficulty=medium · agent_answer=with landlord consent · input_tokens=47 · output_tokens=10 · expected_answer=with landlord consent · matcher_results=[object Object],[object Object],[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| post_fixed_term_extension | 0 | 6743 ms | id=post_fixed_term_extension · type=text · model=claude-opus-4-7 · score=1 · category=clauses · usd_cost=0.090602 · difficulty=hard · agent_answer=The agreement automatically extends for a further six months, with the rent increased by 5% from the rent in the agreement; during this extended period the tenant may end the tenancy by giving the landlord not less than two months' notice in writing. · input_tokens=49 · output_tokens=74 · expected_answer=the tenancy automatically extends for six months · matcher_results=[object Object],[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| rent_increase_scope | 0 | 5905 ms | id=rent_increase_scope · type=text · model=claude-opus-4-7 · score=1 · category=money · usd_cost=0.086537 · difficulty=hard · agent_answer=The automatic extension period (the extended six-month period). · input_tokens=48 · output_tokens=20 · expected_answer=the automatic extension period only · matcher_results=[object Object],[object Object],[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| rent_payment_day | 0 | 5883 ms | id=rent_payment_day · type=text · model=claude-opus-4-7 · score=0 · category=money · usd_cost=0.085667 · difficulty=medium · agent_answer=The 1st of the month. · input_tokens=30 · output_tokens=12 · expected_answer=on or prior 5th of the month · matcher_results=[object Object],[object Object],[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| rent_year2 | 0 | 94966 ms | id=rent_year2 · type=numeric · model=claude-opus-4-7 · score=0 · category=money · usd_cost=1.058231 · difficulty=medium · agent_answer=The document references rent increases but the actual figures appear as garbled OCR placeholders, so I cannot extract a clean GBP amount. · input_tokens=50 · output_tokens=47 · expected_answer=2100 · matcher_results=[object Object] · cache_read_input_tokens=0 · cache_creation_input_tokens=56211 |
| rent_year3 | 0 | 8183 ms | id=rent_year3 · type=numeric · model=claude-opus-4-7 · score=0 · category=money · usd_cost=1.055606 · difficulty=medium · agent_answer=N/A · input_tokens=75 · output_tokens=7 · expected_answer=2400 · matcher_results=[object Object] · cache_read_input_tokens=0 · cache_creation_input_tokens=56211 |
| scenario_leave_22mo_replacement_1mo_gap | 0 | 15574 ms | id=scenario_leave_22mo_replacement_1mo_gap · type=numeric · model=claude-opus-4-7 · score=0 · category=scenario · usd_cost=0.124816 · difficulty=hard · agent_answer=# Calculation Months surrendered early = 36 − 22 = 14 months Gap before replacement = 1 month; remaining months replacement pays = 13 months **Rent figures from the agreement:** - Months 1–12: rent applies - Months 13–24 (period after first increase): a higher rent - Months 25–36 (period after second increase): a higher rent still The agreement specifies the rent amounts but the figures shown in the extracted markdown are blank (the £ figures did not OCR). I'll use the rent labels as given: - (a) Gap rent = 1 × (rent for month 23) = **rent month 23** - (b) Letting fee = 13.2% × (sum of rents for months 24–36) - (c) Inventory: £144 × 14 / 36 = **£56.00** - (d) Admin: £480 × 14 / 36 = **£186.67** Fixed components (c) + (d) = £56.00 + £186.67 = **£242.67** # Final total Because the rent amounts in clauses 6(a) and 6(b) are not legible in the supplied markdown (the £ figures are blank), the rent-dependent parts (a) and (b) cannot be computed to a GBP figure from the document. **Total = £242.67 + 1 month's rent (month 23, at the first-increase rate) + 13.2% × (sum of rents for months 24–36)** The non-rent portion totals **£242.67**. · input_tokens=275 · output_tokens=485 · expected_answer=£6,421.47 · matcher_results=[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| term_start | 0 | 5527 ms | id=term_start · type=date · model=claude-opus-4-7 · score=0 · category=dates · usd_cost=0.085756 · difficulty=easy · agent_answer=30/09/2019 · input_tokens=41 · output_tokens=11 · expected_answer=05/09/2022 · matcher_results=[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |
| total_rent_fixed_term | 0 | 7279 ms | id=total_rent_fixed_term · type=numeric · model=claude-opus-4-7 · score=0 · category=money · usd_cost=0.091037 · difficulty=hard · agent_answer=The document's rent figures (amounts) did not come through the OCR — the GBP values in clauses 6(a) and 6(b) are blank/garbled, so the monthly rent and increased rents cannot be read from the markdown. · input_tokens=68 · output_tokens=76 · expected_answer=77400 · matcher_results=[object Object] · cache_read_input_tokens=56211 · cache_creation_input_tokens=0 |

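The per-case `usd_cost` values add up to the reported cost total. The two cases that report `cache_creation_input_tokens=56211` (rent_year2 and rent_year3) are also by far the most expensive. The sketch below sums the costs and separates those two rows from the rest. Treating the split as "cache creation versus cache read" is an interpretation of the metrics fields and is not stated on the page. The variable names are hypothetical.

```python
# (case id, usd_cost, cache_creation_input_tokens) copied from the table above.
rows = [
    ("break_clause", 0.085681, 0),
    ("deposit_amount", 0.086342, 0),
    ("deposit_dispute_escalation", 0.089251, 0),
    ("deposit_scheme", 0.087391, 0),
    ("early_surrender", 0.085696, 0),
    ("early_surrender_economic_incentive", 0.093721, 0),
    ("fixed_term_departure_notice", 0.085637, 0),
    ("governing_act", 0.086416, 0),
    ("inventory_referenced", 0.085741, 0),
    ("late_rent_interest_rate", 0.086597, 0),
    ("pets_allowed", 0.085772, 0),
    ("post_fixed_term_extension", 0.090602, 0),
    ("rent_increase_scope", 0.086537, 0),
    ("rent_payment_day", 0.085667, 0),
    ("rent_year2", 1.058231, 56211),
    ("rent_year3", 1.055606, 56211),
    ("scenario_leave_22mo_replacement_1mo_gap", 0.124816, 0),
    ("term_start", 0.085756, 0),
    ("total_rent_fixed_term", 0.091037, 0),
]

total_cost = sum(cost for _, cost, _ in rows)

# Rows that created cache entries instead of reading them (assumed reading).
creation_cost = sum(cost for _, cost, created in rows if created > 0)
creation_share = creation_cost / total_cost

print(f"total cost: ${total_cost:.3f}")
print(f"cache-creation rows: ${creation_cost:.3f} ({creation_share:.0%} of total)")
```

On these numbers, the two cache-creation rows account for more than half of the run's cost, while the other 17 cases together cost about $1.52.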