financebench

5 closed-book numeric questions on SEC 10-K filings: Netflix 2017, AES 2022, 3M 2018, Walmart 2018, Block 2016. Each case ships the question as `question.txt` **plus the relevant 10-K excerpt inline** as `doc.txt`, so solvers don't need to fetch PDFs or hit external services.

Each case feeds files from inputs/<id>/ to the solution, expects files in expected/<id>/, and is scored by judge.py.

cases (5)

netflix_2017_current_liab

Netflix FY2017 total current liabilities (USD millions). Single-statement extraction from the consolidated balance sheet; convert thousands → millions.

input

doc.txt

Table of Contents
NETFLIX, INC.
CONSOLIDATED BALANCE SHEETS
(in thousands, except share and per share data)
 
 
 
As of December 31,
 
 
2017
 
2016
Assets
 
 
Current assets:
 
 
Cash and cash equivalents
 $
2,822,795 $
1,467,576
Short-term investments
 
 
266,206
Current content assets, net
 
4,310,934 
3,726,307
Other current assets
 
536,245 
260,202
Total current assets
 
7,669,974 
5,720,291
Non-current content assets, net
 
10,371,055 
7,274,501
Property and equipment, net
 
319,404 
250,395
Other non-current assets
 
652,309 
341,423
Total assets
 $
19,012,742 $
13,586,610
Liabilities and Stockholders Equity
 
 
Current liabilities:
 
 
Current content liabilities
 $
4,173,041 $
3,632,711
Accounts payable
 
359,555 
312,842
Accrued expenses
 
315,094 
197,632
Deferred revenue
 
618,622 
443,472
Total current liabilities
 
5,466,312 
4,586,657
Non-current content liabilities
 
3,329,796 
2,894,654
Long-term debt
 
6,499,432 
3,364,311
Other non-current liabilities
 
135,246 
61,188
Total liabilities
 
15,430,786 
10,906,810
Commitments and contingencies (Note 5)
 
 
Stockholders equity:
 
 
Preferred stock, $0.001 par value; 10,000,000 shares authorized at December 31, 2017 and 2016; no shares
issued and outstanding at December 31, 2017 and 2016
 
 

Common stock, $0.001 par value; 4,990,000,000 shares authorized at December 31, 2017 and December 31,
2016, respectively; 433,392,686 and 430,054,212 issued and outstanding at December 31, 2017 and
December 31, 2016, respectively
 
1,871,396 
1,599,762
Accumulated other comprehensive loss
 
(20,557) 
(48,565)
Retained earnings
 
1,731,117 
1,128,603
Total stockholders equity
 
3,581,956 
2,679,800
Total liabilities and stockholders equity
 $
19,012,742 $
13,586,610
See accompanying notes to consolidated financial statements.
43

question.txt

What is Netflix's year end FY2017 total current liabilities (in USD millions)? Base your judgments on the information provided primarily in the balance sheet.

expected output

answer.json

{
  "financebench_id": "financebench_id_03282",
  "company": "Netflix",
  "doc": "NETFLIX_2017_10K",
  "gold": "$5466.00"
}

Scored by judge.py — see Scoring logic below for the full rule.
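
The balance sheet reports thousands while the question asks for millions, so this case needs one unit conversion. A minimal sketch of that arithmetic follows; the variable names are illustrative and not part of the benchmark, and the tolerance check mirrors the relative comparison in judge.py.

```python
# Figure copied from the 2017 "Total current liabilities" row (in thousands).
total_current_liabilities_thousands = 5_466_312

# Convert thousands to millions, the unit the question asks for.
total_current_liabilities_millions = total_current_liabilities_thousands / 1_000

# Relative gap against the gold value, using the larger magnitude as the
# denominator, as judge.py's numeric_close does.
gold = 5466.00
rel_gap = abs(total_current_liabilities_millions - gold) / max(
    abs(total_current_liabilities_millions), abs(gold)
)

print(f"{total_current_liabilities_millions:.2f}")  # 5466.31
print(rel_gap <= 0.01)  # True
```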

aes_2022_roa

AES FY2022 return on assets, derived as net income / avg(total assets 2021, 2022). Requires balance-sheet + income-statement cross-reference.

input

doc.txt

128 
Consolidated Balance Sheets
December 31, 2022 and 2021
2022
2021
(in millions, except share and per share data)
ASSETS
CURRENT ASSETS
Cash and cash equivalents
$
1,374 
$
943 
Restricted cash
536 
304 
Short-term investments
730 
232 
Accounts receivable, net of allowance for doubtful accounts of $5 and $5, respectively
1,799 
1,418 
Inventory
1,055 
604 
Prepaid expenses
98 
142 
Other current assets, net of CECL allowance of $2 and $0, respectively
1,533 
897 
Current held-for-sale assets
518 
816 
Total current assets
7,643 
5,356 
NONCURRENT ASSETS
Property, Plant and Equipment:
Land
470 
426 
Electric generation, distribution assets and other
26,599 
25,552 
Accumulated depreciation
(8,651)
(8,486)
Construction in progress
4,621 
2,414 
Property, plant and equipment, net
23,039 
19,906 
Other Assets:
Investments in and advances to affiliates
952 
1,080 
Debt service reserves and other deposits
177 
237 
Goodwill
362 
1,177 
Other intangible assets, net of accumulated amortization of $434 and $385, respectively
1,841 
1,450 
Deferred income taxes
319 
409 
Loan receivable, net of allowance of $26
1,051 
 
Other noncurrent assets, net of allowance of $51 and $23, respectively
2,979 
2,188 
Noncurrent held-for-sale assets
 
1,160 
Total other assets
7,681 
7,701 
TOTAL ASSETS
$
38,363 
$
32,963 
LIABILITIES AND EQUITY
CURRENT LIABILITIES
Accounts payable
$
1,730 
$
1,153 
Accrued interest
249 
182 
Accrued non-income taxes
249 
266 
Accrued and other liabilities
2,151 
1,205 
Non-recourse debt, including $416 and $302, respectively, related to variable interest entities
1,758 
1,367 
Current held-for-sale liabilities
354 
559 
Total current liabilities
6,491 
4,732 
NONCURRENT LIABILITIES
Recourse debt
3,894 
3,729 
Non-recourse debt, including $2,295 and $2,223, respectively, related to variable interest entities
17,846 
13,603 
Deferred income taxes
1,139 
977 
Other noncurrent liabilities
3,168 
3,358 
Noncurrent held-for-sale liabilities
 
740 
Total noncurrent liabilities
26,047 
22,407 
Commitments and Contingencies (see Notes 12 and 13)
Redeemable stock of subsidiaries
1,321 
1,257 
EQUITY
THE AES CORPORATION STOCKHOLDERS EQUITY
Preferred stock (without par value, 50,000,000 shares authorized; 1,043,050 issued and outstanding at December 31, 2022 and
December 31, 2021)
838 
838 
Common stock ($0.01 par value, 1,200,000,000 shares authorized; 818,790,001 issued and 668,743,464 outstanding at December
31, 2022 and 818,717,043 issued and 666,793,625 outstanding at December 31, 2021)
8 
8 
Additional paid-in capital
6,688 
7,106 
Accumulated deficit
(1,635)
(1,089)
Accumulated other comprehensive loss
(1,640)
(2,220)
Treasury stock, at cost (150,046,537 and 151,923,418 shares at December 31, 2022 and December 31, 2021, respectively)
(1,822)
(1,845)
Total AES Corporation stockholders equity
2,437 
2,798 
NONCONTROLLING INTERESTS
2,067 
1,769 
Total equity
4,504 
4,567 
TOTAL LIABILITIES AND EQUITY
$
38,363 
$
32,963 
See Accompanying Notes to Consolidated Financial Statements.

---

129 
Consolidated Statements of Operations
Years ended December 31, 2022, 2021, and 2020
2022
2021
2020
(in millions, except per share amounts)
Revenue:
Regulated
$
3,538 
$
2,868 
$
2,661 
Non-Regulated
9,079 
8,273 
6,999 
Total revenue
12,617 
11,141 
9,660 
Cost of Sales:
Regulated
(3,162)
(2,448)
(2,235)
Non-Regulated
(6,907)
(5,982)
(4,732)
Total cost of sales
(10,069)
(8,430)
(6,967)
Operating margin
2,548 
2,711 
2,693 
General and administrative expenses
(207)
(166)
(165)
Interest expense
(1,117)
(911)
(1,038)
Interest income
389 
298 
268 
Loss on extinguishment of debt
(15)
(78)
(186)
Other expense
(68)
(60)
(53)
Other income
102 
410 
75 
Loss on disposal and sale of business interests
(9)
(1,683)
(95)
Goodwill impairment expense
(777)
 
 
Asset impairment expense
(763)
(1,575)
(864)
Foreign currency transaction gains (losses)
(77)
(10)
55 
Other non-operating expense
(175)
 
(202)
INCOME (LOSS) FROM CONTINUING OPERATIONS BEFORE TAXES AND EQUITY IN EARNINGS OF AFFILIATES
(169)
(1,064)
488 
Income tax benefit (expense)
(265)
133 
(216)
Net equity in losses of affiliates
(71)
(24)
(123)
INCOME (LOSS) FROM CONTINUING OPERATIONS
(505)
(955)
149 
Gain from disposal of discontinued businesses, net of income tax expense of $0, $1, and $0, respectively
 
4 
3 
NET INCOME (LOSS)
(505)
(951)
152 
Less: Net loss (income) attributable to noncontrolling interests and redeemable stock of subsidiaries
(41)
542 
(106)
NET INCOME (LOSS) ATTRIBUTABLE TO THE AES CORPORATION
$
(546)
$
(409)
$
46 
AMOUNTS ATTRIBUTABLE TO THE AES CORPORATION COMMON STOCKHOLDERS:
Income (loss) from continuing operations, net of tax
$
(546)
$
(413)
$
43 
Income from discontinued operations, net of tax
 
4 
3 
NET INCOME (LOSS) ATTRIBUTABLE TO THE AES CORPORATION
$
(546)
$
(409)
$
46 
BASIC EARNINGS PER SHARE:
Income (loss) from continuing operations attributable to The AES Corporation common stockholders, net of tax
$
(0.82)
$
(0.62)
$
0.06 
Income from discontinued operations attributable to The AES Corporation common stockholders, net of tax
 
0.01 
0.01 
NET INCOME (LOSS) ATTRIBUTABLE TO THE AES CORPORATION COMMON STOCKHOLDERS
$
(0.82)
$
(0.61)
$
0.07 
DILUTED EARNINGS PER SHARE:
Income (loss) from continuing operations attributable to The AES Corporation common stockholders, net of tax
$
(0.82)
$
(0.62)
$
0.06 
Income from discontinued operations attributable to The AES Corporation common stockholders, net of tax
 
0.01 
0.01 
NET INCOME (LOSS) ATTRIBUTABLE TO THE AES CORPORATION COMMON STOCKHOLDERS
$
(0.82)
$
(0.61)
$
0.07 
See Accompanying Notes to Consolidated Financial Statements.

question.txt

Based on the information provided primarily in the statement of financial position and the statement of income, what is AES's FY2022 return on assets (ROA)? ROA is defined as: FY2022 net income / (average total assets between FY2021 and FY2022). Round your answer to two decimal places.

expected output

answer.json

{
  "financebench_id": "financebench_id_10420",
  "company": "AES Corporation",
  "doc": "AES_2022_10K",
  "gold": "-0.02"
}
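
The ROA definition in the question can be worked through from the two statements. The sketch below is illustrative arithmetic only. It assumes "net income" means the net loss attributable to The AES Corporation, (546), because that choice reproduces the gold value; the consolidated net loss of (505) would round to -0.01 instead.

```python
# All figures in USD millions, copied from the excerpt.
net_income_2022 = -546        # net loss attributable to The AES Corporation (assumed numerator)
consolidated_net_income_2022 = -505  # shown for comparison only
total_assets_2022 = 38_363
total_assets_2021 = 32_963

# Average total assets across the two year-end balance sheets.
average_assets = (total_assets_2022 + total_assets_2021) / 2

roa = net_income_2022 / average_assets
roa_rounded = round(roa, 2)
alternative_rounded = round(consolidated_net_income_2022 / average_assets, 2)

print(roa_rounded)          # -0.02
print(alternative_rounded)  # -0.01
```

Only the rounded value matches the gold answer: the unrounded ratio of roughly -0.0153 differs from -0.02 by more than the judge's 1% relative tolerance.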

threem_2018_net_ppne

3M FY2018 net property, plant & equipment (USD billions). Single-statement extraction; convert millions → billions.

input

doc.txt

Table of Contents 
3M Company and Subsidiaries
Consolidated Balance Shee t
At December 31
 
 
 
December 31,
 
December 31,
 
(Dollars in millions, except per share amount)
 
2018
 
2017
 
Assets
 
 
 
 
 
Current assets
 
 
 
 
 
Cash and cash equivalents
 
$
2,853 
$
3,053 
Marketable securities current
 
 
380 
 
1,076 
Accounts receivable net of allowances of $95 and $103
 
 
5,020 
 
4,911 
Inventories
 
 
 
 
 
Finished goods
 
 
2,120 
 
1,915 
Work in process
 
 
1,292 
 
1,218 
Raw materials and supplies
 
 
954 
 
901 
Total inventories
 
 
4,366 
 
4,034 
Prepaids
 
 
741 
 
937 
Other current assets
 
 
349 
 
266 
Total current assets
 
 
13,709 
 
14,277 
Property, plant and equipment
 
 
24,873 
 
24,914 
Less: Accumulated depreciation
 
 
(16,135) 
 
(16,048) 
Property, plant and equipment net
 
 
8,738 
 
8,866 
Goodwill
 
 
10,051 
 
10,513 
Intangible assets net
 
 
2,657 
 
2,936 
Other assets
 
 
1,345 
 
1,395 
Total assets
 
$
36,500 
$
37,987 
Liabilities
 
 
 
 
 
Current liabilities
 
 
 
 
 
Short-term borrowings and current portion of long-term debt
 
$
1,211 
$
1,853 
Accounts payable
 
 
2,266 
 
1,945 
Accrued payroll
 
 
749 
 
870 
Accrued income taxes
 
 
243 
 
310 
Other current liabilities
 
 
2,775 
 
2,709 
Total current liabilities
 
 
7,244 
 
7,687 
 
 
 
 
 
 
Long-term debt
 
 
13,411 
 
12,096 
Pension and postretirement benefits
 
 
2,987 
 
3,620 
Other liabilities
 
 
3,010 
 
2,962 
Total liabilities
 
$
26,652 
$
26,365 
Commitments and contingencies (Note 16)
 
 
 
 
 
Equity
 
 
 
 
 
3M Company shareholders equity:
 
 
 
 
 
Common stock par value, $.01 par value
 
$
 9 
$
 9 
Shares outstanding - 2018: 576,575,168
 
 
 
 
 
Shares outstanding - 2017: 594,884,237
 
 
 
 
 
Additional paid-in capital
 
 
5,643 
 
5,352 
Retained earnings
 
 
40,636 
 
39,115 
Treasury stock
 
 
(29,626) 
 
(25,887) 
Accumulated other comprehensive income (loss)
 
 
(6,866) 
 
(7,026) 
Total 3M Company shareholders equity
 
 
9,796 
 
11,563 
Noncontrolling interest
 
 
52 
 
59 
Total equity
 
$
9,848 
$
11,622 
Total liabilities and equity
 
$
36,500 
$
37,987 
 
The accompanying Notes to Consolidated Financial Statements are an integral part of this statement.
58

question.txt

Assume that you are a public equities analyst. Answer the following question by primarily using information that is shown in the balance sheet: what is the year end FY2018 net PPNE for 3M? Answer in USD billions.

expected output

answer.json

{
  "financebench_id": "financebench_id_04672",
  "company": "3M",
  "doc": "3M_2018_10K",
  "gold": "$8.70"
}
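
A short sketch of the conversion for this case, with illustrative variable names. It also shows that the gold value of 8.70 is not an exact rounding of the balance-sheet figure, yet an answer of 8.74 would still fall inside the judge's 1% relative tolerance.

```python
# "Property, plant and equipment net" for 2018, in USD millions.
net_ppe_millions = 8_738

# Convert millions to billions, the unit the question asks for.
net_ppe_billions = net_ppe_millions / 1_000

# Relative gap against the gold value, measured the way judge.py does.
gold = 8.70
rel_gap = abs(net_ppe_billions - gold) / max(abs(net_ppe_billions), abs(gold))

print(f"{net_ppe_billions:.2f}")  # 8.74
print(rel_gap <= 0.01)  # True
```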

walmart_2018_dpo

Walmart FY2018 days payable outstanding, derived as 365 * avg(AP) / (COGS + Δinventory). Multi-statement, tricky averaging.

input

doc.txt

Walmart Inc.
Consolidated Statements of Income
 
 
Fiscal Years Ended January 31,
(Amounts in millions, except per share data)
 
2018
 
2017
 
2016
Revenues:
 
 
 
Net sales
 $
495,761
 $
481,317 $
478,614
Membership and other income
 
4,582
 
4,556 
3,516
Total revenues
 
500,343
 
485,873 
482,130
Costs and expenses:
 
 
 
Cost of sales
 
373,396
 
361,256 
360,984
Operating, selling, general and administrative expenses
 
106,510
 
101,853 
97,041
Operating income
 
20,437
 
22,764 
24,105
Interest:
 
 
 
Debt
 
1,978
 
2,044 
2,027
Capital lease and financing obligations
 
352
 
323 
521
Interest income
 
(152) 
(100) 
(81)
Interest, net
 
2,178
 
2,267 
2,467
Loss on extinguishment of debt
 
3,136
 
 

Income before income taxes
 
15,123
 
20,497 
21,638
Provision for income taxes
 
4,600
 
6,204 
6,558
Consolidated net income
 
10,523
 
14,293 
15,080
Consolidated net income attributable to noncontrolling interest
 
(661) 
(650) 
(386)
Consolidated net income attributable to Walmart
 $
9,862
 $
13,643 $
14,694
 
 
 
 
Net income per common share:
 
 
 
Basic net income per common share attributable to Walmart
 $
3.29
 $
4.40 $
4.58
Diluted net income per common share attributable to Walmart
 
3.28
 
4.38 
4.57
 
 
 
 
Weighted-average common shares outstanding:
 
 
 
Basic
 
2,995
 
3,101 
3,207
Diluted
 
3,010
 
3,112 
3,217
 
 
 
 
Dividends declared per common share
 $
2.04
 $
2.00 $
1.96
See accompanying notes.
55

---

Walmart Inc.
Consolidated Balance Sheets
 
 
As of January 31,
(Amounts in millions)
 
2018
 
2017
ASSETS
 
 
Current assets:
 
 
Cash and cash equivalents
 $
6,756
 $
6,867
Receivables, net
 
5,614
 
5,835
Inventories
 
43,783
 
43,046
Prepaid expenses and other
 
3,511
 
1,941
Total current assets
 
59,664
 
57,689
Property and equipment:
 
 
Property and equipment
 
185,154
 
179,492
Less accumulated depreciation
 
(77,479) 
(71,782)
Property and equipment, net
 
107,675
 
107,710
Property under capital lease and financing obligations:
 
 
Property under capital lease and financing obligations
 
12,703
 
11,637
Less accumulated amortization
 
(5,560) 
(5,169)
Property under capital lease and financing obligations, net
 
7,143
 
6,468
 
 
 
Goodwill
 
18,242
 
17,037
Other assets and deferred charges
 
11,798
 
9,921
Total assets
 $
204,522
 $
198,825
 
 
 
LIABILITIES AND EQUITY
 
 
Current liabilities:
 
 
Short-term borrowings
 $
5,257
 $
1,099
Accounts payable
 
46,092
 
41,433
Accrued liabilities
 
22,122
 
20,654
Accrued income taxes
 
645
 
921
Long-term debt due within one year
 
3,738
 
2,256
Capital lease and financing obligations due within one year
 
667
 
565
Total current liabilities
 
78,521
 
66,928
 
 
 
Long-term debt
 
30,045
 
36,015
Long-term capital lease and financing obligations
 
6,780
 
6,003
Deferred income taxes and other
 
8,354
 
9,344
 
 
 
Commitments and contingencies
 
 
 
 
 
Equity:
 
 
Common stock
 
295
 
305
Capital in excess of par value
 
2,648
 
2,371
Retained earnings
 
85,107
 
89,354
Accumulated other comprehensive loss
 
(10,181) 
(14,232)
Total Walmart shareholders' equity
 
77,869
 
77,798
Noncontrolling interest
 
2,953
 
2,737
Total equity
 
80,822
 
80,535
Total liabilities and equity
 $
204,522
 $
198,825
See accompanying notes.
57

question.txt

What is FY2018 days payable outstanding (DPO) for Walmart? DPO is defined as: 365 * (average accounts payable between FY2017 and FY2018) / (FY2018 COGS + change in inventory between FY2017 and FY2018). Round your answer to two decimal places. Please base your judgments on the information provided primarily in the statement of financial position and the P&L statement.

expected output

answer.json

{
  "financebench_id": "financebench_id_06247",
  "company": "Walmart",
  "doc": "WALMART_2018_10K",
  "gold": "42.69"
}
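
The DPO formula in the question combines two balance-sheet rows with one income-statement row. The following sketch works through it with figures copied from the excerpt; the variable names are illustrative.

```python
# All figures in USD millions.
accounts_payable_2018 = 46_092
accounts_payable_2017 = 41_433
cost_of_sales_2018 = 373_396
inventories_2018 = 43_783
inventories_2017 = 43_046

# Numerator: average accounts payable across the two fiscal year-ends.
average_accounts_payable = (accounts_payable_2018 + accounts_payable_2017) / 2

# Denominator: COGS plus the change in inventory (2018 minus 2017).
purchases = cost_of_sales_2018 + (inventories_2018 - inventories_2017)

dpo = round(365 * average_accounts_payable / purchases, 2)
print(dpo)  # 42.69
```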

block_2016_working_capital

Block (Square) FY2016 working capital ratio = total current assets / total current liabilities.

input

doc.txt

SQUARE,INC.
CONSOLIDATEDBALANCESHEETS
(In thousands, except share and per share data)

December31,

2016

2015
Assets

 
Currentassets:

 
Cashandcashequivalents
$
452,030 $
461,329
Short-terminvestments
59,901 

Restrictedcash
22,131 
13,537
Settlementsreceivable
321,102 
142,727
Customerfundsheld
43,574 
9,446
Loansheldforsale
42,144 
604
Merchantcashadvancereceivable,net
4,212 
36,473
Othercurrentassets
56,331 
41,447
Totalcurrentassets
1,001,425 
705,563
Propertyandequipment,net
88,328 
87,222
Goodwill
57,173 
56,699
Acquiredintangibleassets,net
19,292 
26,776
Long-terminvestments
27,366 

Restrictedcash
14,584 
14,686
Otherassets
3,194 
3,826
Totalassets
$
1,211,362 $
894,772
LiabilitiesandStockholdersEquity

 
Currentliabilities:

 
Accountspayable
$
12,602 $
18,869
Customerspayable
388,058 
215,365
Customerfundsobligation
43,574 
9,446
Accruedtransactionlosses
20,064 
17,176
Accruedexpenses
39,543 
44,401
Othercurrentliabilities
73,623 
28,945
Totalcurrentliabilities
577,464 
334,202
Debt(Note11)
 

Otherliabilities
57,745 
52,522
Totalliabilities
635,209 
386,724
Commitmentsandcontingencies(Note16)

Stockholdersequity:
 
Preferredstock,$0.0000001parvalue:100,000,000sharesauthorizedatDecember31,2016andDecember31,2015.None
issuedandoutstandingatDecember31,2016andDecember31,2015.
 

ClassAcommonstock,$0.0000001parvalue:1,000,000,000sharesauthorizedatDecember31,2016andDecember31,2015;
198,746,620and31,717,133issuedandoutstandingatDecember31,2016andDecember31,2015,respectively.
 

ClassBcommonstock,$0.0000001parvalue:500,000,000sharesauthorizedatDecember31,2016andDecember31,2015;
165,800,756and303,232,312issuedandoutstandingatDecember31,2016andDecember31,2015,respectively.
 

Additionalpaid-incapital
1,357,381 
1,116,882
Accumulatedothercomprehensiveloss
(1,989) 
(1,185)
Accumulateddeficit
(779,239) 
(607,649)
Totalstockholdersequity
576,153 
508,048
Totalliabilitiesandstockholdersequity
$
1,211,362 $
894,772
Seeaccompanyingnotestoconsolidatedfinancialstatements.
68

question.txt

Considering the data in the balance sheet, what is Block's (formerly known as Square) FY2016 working capital ratio? Define working capital ratio as total current assets divided by total current liabilities. Round your answer to two decimal places.

expected output

answer.json

{
  "financebench_id": "financebench_id_04660",
  "company": "Block",
  "doc": "BLOCK_2016_10K",
  "gold": "1.73"
}
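
The ratio needs no unit conversion, because both rows are reported in thousands and the units cancel. A minimal sketch with illustrative variable names:

```python
# Both figures are 2016 values in thousands; the units cancel in the ratio.
total_current_assets = 1_001_425
total_current_liabilities = 577_464

working_capital_ratio = round(total_current_assets / total_current_liabilities, 2)
print(working_capital_ratio)  # 1.73
```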

scoring logic

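judge.py runs once per case and prints a JSON object containing that case's score (0.0 or 1.0). Without a grader.py, the server averages the case scores and marks the run as passed when the average is 0.8 or higher, which for these 5 cases means at least 4 correct.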
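
The server-side aggregation is not shown on this page, so the following is only a sketch of the rule as described. The function name, the inclusive comparison, and the handling of an empty run are assumptions.

```python
from statistics import mean

PASS_THRESHOLD = 0.8  # from "passed at 0.8+"; treating it as inclusive is an assumption


def run_passed(case_scores: list[float], threshold: float = PASS_THRESHOLD) -> bool:
    """Average the per-case scores and compare against the pass threshold."""
    if not case_scores:
        return False  # assumption: a run with no scored cases does not pass
    return mean(case_scores) >= threshold


print(run_passed([1.0, 1.0, 1.0, 1.0, 0.0]))  # True: 4 of 5 correct averages 0.8
print(run_passed([1.0, 1.0, 1.0, 0.0, 0.0]))  # False: 3 of 5 correct averages 0.6
```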

judge.py
#!/usr/bin/env python3
"""FinanceBench per-case judge — runs once per case in trap's judge protocol.

Reads:
  - the agent's captured stdout (the model's answer to this case's question)
  - the case's gold answer from expected/answer.json

Compares with 1% relative tolerance for numerics; falls back to exact / substring
string match. Adapted from the original `grade.py` in the trapstreet-eval-demo
skill (https://github.com/AntiNoise-ai/trapstreet-eval-demo).

Outputs a JSON object to stdout that trap stores as the case's `metrics`.
"""

from __future__ import annotations

import json
import os
import re
import sys
from pathlib import Path
from typing import Any

REL_TOL = 0.01

# Numeric-magnitude suffix table — handles "1.2 billion", "$1.2B", "12K", etc.
SCALE = [
    ("trillion", 1e12), ("trillions", 1e12), ("tn", 1e12), ("t", 1e12),
    ("billion", 1e9),  ("billions", 1e9),  ("bn", 1e9),  ("b", 1e9),
    ("million", 1e6),  ("millions", 1e6),  ("mn", 1e6),  ("mm", 1e6), ("m", 1e6),
    ("thousand", 1e3), ("thousands", 1e3), ("k", 1e3),
]

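# Require at least one digit so stray punctuation (e.g. a lone ",") never matches
# ahead of the real number and forces a None result.
NUMBER_RE = re.compile(r"\(?-?\$?\s*\d[\d,]*(?:\.\d+)?\)?")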


def parse_number(text: str) -> float | None:
    """Extract the first number-like token from `text`. Handles $, commas,
    accounting parentheses for negatives, % suffix, and magnitude suffixes.
    Returns None if no number found."""
    if not text:
        return None
    s = text.strip().lower()
    is_pct = "%" in s or " percent" in s
    m = NUMBER_RE.search(s)
    if not m:
        return None
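    raw, sign = m.group(0), 1
    if raw.startswith("(") and raw.endswith(")"):
        sign = -1  # accounting notation: "(546)" means -546
    # Strip parentheses (balanced or not), currency symbol, and digit grouping,
    # so a token like "(5,466" from "(5,466 million)" still parses.
    raw = raw.strip("()").replace("$", "").replace(",", "").replace(" ", "").strip()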
    try:
        value = float(raw) * sign
    except ValueError:
        return None
    tail = s[m.end():].lstrip()
    for unit, mult in SCALE:
        if re.match(rf"\b{unit}\b", tail):
            value *= mult
            break
    if is_pct:
        value /= 100.0
    return value


def numeric_close(a: float, b: float) -> bool:
    if a == b:
        return True
    if a == 0 or b == 0:
        return abs(a - b) < 1e-9
    return abs(a - b) / max(abs(a), abs(b)) <= REL_TOL


def normalize_string(s: str) -> str:
    return re.sub(r"\s+", " ", s.strip().lower()).strip(".!?,;:")


def score_one(pred: str, gold: str) -> tuple[float, str]:
    """Return (score in {0.0, 1.0}, human-readable reason)."""
    if not pred.strip():
        return 0.0, "empty prediction"

    p_num = parse_number(pred)
    g_num = parse_number(gold)
    if p_num is not None and g_num is not None:
        if numeric_close(p_num, g_num):
            return 1.0, f"numeric match (pred={p_num:.6g} gold={g_num:.6g})"
        return 0.0, f"numeric mismatch (pred={p_num:.6g} gold={g_num:.6g})"

    if normalize_string(pred) == normalize_string(gold):
        return 1.0, "string exact match"
    if len(gold) <= 40 and normalize_string(gold) in normalize_string(pred):
        return 1.0, "substring match"
    return 0.0, f"string mismatch (pred={pred[:80]!r} gold={gold[:80]!r})"


def main() -> int:
    payload: dict[str, Any] = json.loads(os.environ["TRAPTASK_PAYLOAD"])

    # Solver writes its answer to stdout; trap captures into `outputs.stdout`.
    pred_path = payload.get("outputs", {}).get("stdout")
    pred = Path(pred_path).read_text() if pred_path else ""

    gold_path = payload["expected"]["answer.json"]
    gold_obj = json.loads(Path(gold_path).read_text())
    gold = gold_obj["gold"]

    s, reason = score_one(pred, gold)
    print(json.dumps({
        "score": s,
        "correct": s == 1.0,
        # Truncate at 500 chars so we don't store entire LLM monologues.
        "agent_answer": pred.strip()[:500],
        "expected_answer": gold,
        "reason": reason,
        "company": gold_obj.get("company"),
        "doc": gold_obj.get("doc"),
        "financebench_id": gold_obj.get("financebench_id"),
    }))
    return 0


if __name__ == "__main__":
    sys.exit(main())
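
To try the judge outside of trap, one can build the payload it expects by hand. The sketch below relies only on what `main()` reads: a `TRAPTASK_PAYLOAD` environment variable holding JSON with `outputs.stdout` and `expected["answer.json"]` file paths. The helper names, the `stdout.txt` file name, and the assumption that `judge.py` sits in the current directory are illustrative; the real harness may pass additional fields.

```python
import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_payload(workdir: Path, prediction: str, gold: dict) -> dict:
    """Write the two files judge.py reads and return a payload pointing at them."""
    stdout_path = workdir / "stdout.txt"  # hypothetical file name
    answer_path = workdir / "answer.json"
    stdout_path.write_text(prediction)
    answer_path.write_text(json.dumps(gold))
    return {
        "outputs": {"stdout": str(stdout_path)},
        "expected": {"answer.json": str(answer_path)},
    }


def run_judge(judge_script: Path, payload: dict) -> dict:
    """Run judge.py with the payload in its environment and parse its JSON output."""
    env = dict(os.environ, TRAPTASK_PAYLOAD=json.dumps(payload))
    result = subprocess.run(
        [sys.executable, str(judge_script)],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        payload = build_payload(Path(tmp), "1.73", {"gold": "1.73", "company": "Block"})
        judge = Path("judge.py")  # assumed location of the script
        if judge.exists():
            print(run_judge(judge, payload))
        else:
            print(json.dumps(payload, indent=2))
```

When experimenting this way, note that `parse_number` takes the first number-like token in the prediction and applies any magnitude suffix that follows it. A prediction such as "5,466 million" therefore parses to 5.466e9 and does not match the gold "$5466.00", and a prediction that opens with "FY2017 ..." parses as 2017. Printing only the bare number, in the units the question asks for, avoids both problems.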