Tau3 Bench - dataset notes

Dataset Notes

Small dataset facts we do not want to forget.

This page is for dataset oddities, caveats, and little observations that matter when we build training data or an RL harness around Tau3 banking.

Observation 001

User ID format drift

In the users table, record_id and user_id match in every sampled row, but the ID format changes partway through. The first three IDs are short numeric strings. The later ones are generated alphanumeric strings.

Observation 002

Account ID format drift

The accounts table has the same kind of drift. Early account rows use short zero-padded IDs like 01 and 02. Later rows use generated, typed IDs like chk_e9d195fe8e and biz_chk_b5e8f2a1c9.

Observation 003

Debit card ID format drift

The debit_cards table uses dbc_ IDs, but the pattern is mixed: some encode the user and tier, while others look like generated IDs. The linked account_id can also be a legacy numeric ID like 03.

Observation 004

CVV should be hidden

The raw table includes cvv. That is okay for us as dataset inspectors, but an agent-facing banking harness should redact it or collect it only through a secure user-side verification flow.

Banking DB / users

User ID Format Drift

What we noticed

The first three user rows use numeric IDs: 123, 125, and 126. The rows after that use generated-looking alphanumeric IDs, such as 6680a37184, af0581dcbf, and 86e92f639e.

Why it matters

A parser, join rule, reward checker, or synthetic trace generator should treat these IDs as opaque strings. Do not assume user IDs are numeric, ordered, or fixed-width.

record_id | address | date_of_birth | email | name | phone_number | user_id | rho_bank_plus_subscription
123 | 742 Maple Street, Austin, TX 78701 | 03/15/1992 | mia.smith8574@gmail.com | Mia Smith | 512-555-0147 | 123 |
125 | 2793 Frum Street, Nashville, TN | 02/23/1998 | ayan.wang@gmail.com | Ayan Wang | 920-029-2039 | 125 |
126 | 24 Beacon Street, Boston, MA, 02114 | 09/21/1996 | alex_alex@gmail.com | Alex Riviera | 930-102-1332 | 126 |
6680a37184 | 1847 Cherry Blossom Lane, Seattle, WA 98101 | 07/22/1985 | kenji.tanaka@outlook.com | Kenji Tanaka | 206-555-0293 | 6680a37184 |
af0581dcbf | 562 Riverside Drive, Chicago, IL 60611 | 11/03/1990 | sparkle_queen_99@yahoo.com | Priya Sharma | 312-555-0481 | af0581dcbf |
86e92f639e | 2201 Peachtree Road NE, Atlanta, GA 30309 | 02/14/1988 | oadeyemi@gmail.com | Oluwaseun Adeyemi | 404-555-0672 | 86e92f639e |
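
A minimal sketch of the opaque-string rule for user IDs, assuming a hypothetical CSV export that carries a subset of the users columns. The names USERS_CSV, load_users, and find_user are illustrative and are not part of the Tau3 tooling. The point is that IDs stay exactly as read, and lookups use exact string matches.

```python
import csv
import io

# Hypothetical CSV export of a few users rows (subset of columns, for illustration).
USERS_CSV = """record_id,user_id,name
123,123,Mia Smith
125,125,Ayan Wang
6680a37184,6680a37184,Kenji Tanaka
af0581dcbf,af0581dcbf,Priya Sharma
"""


def load_users(text):
    """Index user rows by user_id, keeping every ID as the exact string read."""
    reader = csv.DictReader(io.StringIO(text))  # csv yields strings; nothing is cast
    return {row["user_id"]: row for row in reader}


def find_user(users, user_id):
    """Look up a user by exact string match: no casting, padding, or case folding."""
    return users.get(user_id)


users = load_users(USERS_CSV)
print(find_user(users, "123")["name"])         # numeric-looking ID, still a string key
print(find_user(users, "6680a37184")["name"])  # generated alphanumeric ID
print(find_user(users, 123))                   # an int key does not match the string "123"
```

If the rows were loaded through a tool that infers column types, the ID columns would need to be pinned to strings explicitly; otherwise the numeric-looking IDs and the generated ones could end up with different types in the same column.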

Banking DB / accounts

Account ID Format Drift

What we noticed

The first account rows use zero-padded numeric strings: 01 through 08. Later rows switch to generated IDs with account-type prefixes, like chk_e9d195fe8e, chk_b5e8f2a1c9, and biz_chk_b5e8f2a1c9.

Why it matters

Account IDs should also be treated as opaque strings. Any code that casts them to numbers, strips leading zeroes, or assumes checking accounts always start with chk_ will be brittle.

record_id | account_id | class | current_holdings | date_opened | level | status | user_id
01 | 01 | checking | 0 | 11/08/2025 | Blue Account | OPEN | 123
02 | 02 | saving | 129.02 | 11/08/2025 | Bronze Account | OPEN | 123
03 | 03 | checking | 3400 | 09/02/2023 | Green Account | OPEN | 125
04 | 04 | saving | 900 | 09/02/2023 | Silver Account | OPEN | 125
05 | 05 | checking | 2850.00 | 11/10/2022 | Green Account | OPEN | 224959b99e
06 | 06 | checking | 4250.00 | 03/10/2024 | Blue Account | OPEN | c7d8e9f0a1
07 | 07 | checking | 3650.00 | 01/15/2024 | Green Account | OPEN | e3f4a5b6c7
08 | 08 | checking | 50.00 | 02/01/2024 | Green Account | OPEN | h1i2j3k4l5
chk_e9d195fe8e | chk_e9d195fe8e | checking | 8500.00 | 11/15/2024 | Blue Account | OPEN | e9d195fe8e
chk_b5e8f2a1c9 | chk_b5e8f2a1c9 | checking | 6500.00 | 09/10/2023 | Green Account | OPEN | b5e8f2a1c9
biz_chk_b5e8f2a1c9 | biz_chk_b5e8f2a1c9 | business_checking | 3200.00 | 08/01/2025 | Navy Blue | OPEN | b5e8f2a1c9
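
The two failure modes named above can be shown on a few rows copied from the accounts table. This is a sketch only: the function names are hypothetical, and the two anti-pattern helpers exist purely to show what breaks.

```python
# A few rows copied from the accounts table above (subset of columns).
ACCOUNTS = [
    {"account_id": "01", "class": "checking"},
    {"account_id": "02", "class": "saving"},
    {"account_id": "chk_e9d195fe8e", "class": "checking"},
    {"account_id": "biz_chk_b5e8f2a1c9", "class": "business_checking"},
]

known_ids = {account["account_id"] for account in ACCOUNTS}


def lossy_normalize(account_id):
    """Anti-pattern: round-trip numeric-looking IDs through int, dropping the zero padding."""
    return str(int(account_id)) if account_id.isdigit() else account_id


def is_checking_by_prefix(account_id):
    """Anti-pattern: infer the account type from the ID prefix."""
    return account_id.startswith("chk_")


def is_checking(account):
    """Preferred: read the account type from the class column."""
    return account["class"] == "checking"


print(lossy_normalize("01") in known_ids)  # the normalized ID no longer matches any row
print(is_checking_by_prefix("01"))         # the prefix check misses legacy checking account 01
print([a["account_id"] for a in ACCOUNTS if is_checking(a)])
```

Reading the type from the class column keeps legacy account 01 and generated account chk_e9d195fe8e in the same bucket, which the prefix check cannot do.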

Banking DB / debit cards

Debit Card ID Format Drift

What we noticed

record_id and card_id match in these rows, but the style changes. Some IDs encode a user and card tier, like dbc_lj82d4f1a9_bluest. Others are generated-looking IDs, like dbc_7a3f9c2b1e. The linked account_id also mixes generated account IDs with legacy numeric account IDs.

Why it matters

Do not parse meaning from debit card IDs unless the harness explicitly owns that convention. Join through exact string equality and treat IDs as opaque.

record_id | account_id | card_id | cardholder_name | cvv | expiration_date | issue_date | issue_reason
dbc_lj82d4f1a9_bluest | chk_lj82d4f1a9 | dbc_lj82d4f1a9_bluest | LIANG JINHAI | 739 | 03/31/2028 | 03/20/2024 | new_account
dbc_538bfb9cba | chk_538bfb9cba | dbc_538bfb9cba | LIANG JINHAI | 451 | 06/30/2026 | 06/15/2018 | expired
dbc_kj93a7b2e1_blue | chk_kj93a7b2e1_1 | dbc_kj93a7b2e1_blue | KIM JUNHO | 628 | 01/31/2029 | 01/20/2025 | new_account
dbc_kj93a7b2e1_green | chk_kj93a7b2e1_2 | dbc_kj93a7b2e1_green | KIM JUNHO | 384 | 03/31/2029 | 03/25/2025 | lost
dbc_kj93a7b2e1_lightgreen | chk_kj93a7b2e1_3 | dbc_kj93a7b2e1_lightgreen | KIM JUNHO | 912 | 08/31/2029 | 08/15/2025 | damaged
dbc_7a3f9c2b1e | 03 | dbc_7a3f9c2b1e | AYAN WANG | 516 | 01/31/2030 | 01/05/2026 | new_account
dbc_4e8d2a6f9c | 05 | dbc_4e8d2a6f9c | YUKI NAKAMURA | 432 | 01/31/2030 | 01/05/2026 | stolen
dbc_9b1c5d7e3a | 06 | dbc_9b1c5d7e3a | MARGOT BELLAMY | 837 | 01/31/2030 | 01/05/2026 | expired
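
One way to apply the exact-equality join is sketched below, using a few card and account rows copied from the tables on this page. The function name and the matched/unmatched split are assumptions made for illustration. Cards whose account_id has no exact match are reported rather than repaired by guessing at the ID structure.

```python
# Accounts keyed by the exact account_id string (rows copied from the accounts table).
ACCOUNTS = {
    "03": {"class": "checking", "user_id": "125"},
    "05": {"class": "checking", "user_id": "224959b99e"},
}

# Card rows copied from the debit_cards table; cvv is deliberately left out.
CARDS = [
    {"card_id": "dbc_7a3f9c2b1e", "account_id": "03"},
    {"card_id": "dbc_4e8d2a6f9c", "account_id": "05"},
    {"card_id": "dbc_lj82d4f1a9_bluest", "account_id": "chk_lj82d4f1a9"},
]


def join_cards_to_accounts(cards, accounts):
    """Join on exact account_id equality; never derive a key by parsing card_id."""
    matched, unmatched = [], []
    for card in cards:
        account = accounts.get(card["account_id"])
        if account is None:
            unmatched.append(card["card_id"])
        else:
            matched.append((card["card_id"], account["user_id"]))
    return matched, unmatched


matched, unmatched = join_cards_to_accounts(CARDS, ACCOUNTS)
print(matched)
print(unmatched)
```

An unmatched card here only means the account row is not in this small excerpt. It says nothing about whether the full dataset contains it.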

Banking DB / debit cards

CVV Should Not Be Agent-Visible

What we noticed

The raw debit_cards table includes cvv values. This website is a dataset-inspection tool, so it shows raw rows, but an agent in a production-style harness should not see those values.

Harness rule

Hide cvv from agent-facing tools and traces. If a task truly needs CVV verification, the simulated user should provide it through a secure user tool or verification step, and logs should redact it.
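
A minimal sketch of the harness rule, under two stated assumptions: that cvv is the only field being hidden, and that logs write it as cvv=123, cvv: 123, or "cvv": "123". The names SENSITIVE_FIELDS, redact_row, and redact_log_line are hypothetical, and a real harness would need to match its own log format.

```python
import re

# Assumption: cvv is the only sensitive field handled here; extend as needed.
SENSITIVE_FIELDS = frozenset({"cvv"})

# Assumed log shapes: cvv=123, cvv: 123, or "cvv": "123" (3 or 4 digits).
_CVV_IN_LOG = re.compile(r'(cvv"?\s*[:=]\s*"?)\d{3,4}', re.IGNORECASE)


def redact_row(row, sensitive=SENSITIVE_FIELDS):
    """Return a copy of a table row with sensitive fields removed."""
    return {key: value for key, value in row.items() if key not in sensitive}


def redact_log_line(line):
    """Mask CVV digits in a log line while keeping the field name visible."""
    return _CVV_IN_LOG.sub(r"\1***", line)


raw = {"card_id": "dbc_7a3f9c2b1e", "cardholder_name": "AYAN WANG", "cvv": "516"}
print(redact_row(raw))  # what an agent-facing tool would return
print(redact_log_line("lookup card_id=dbc_7a3f9c2b1e cvv=516"))
```

Removing the field before the row reaches the tool layer is the safer default. Log masking is a second line of defense for cases where a verification step has to handle the value.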