Small dataset facts we do not want to forget.
This page is for dataset oddities, caveats, and little observations that matter when
we build training data or an RL harness around Tau3 banking.
Observation 001
User ID format drift
In the users table, record_id and user_id match each other in the rows inspected,
but the ID format changes partway through. The first three rows use short numeric
strings. The later rows use generated alphanumeric strings.
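A minimal sketch of how we might check both facts in one pass, assuming the users rows are loaded as dicts with string values. The function names and format labels are our own, and the sample IDs are made-up placeholders, not real dataset values.

```python
from collections import Counter


def user_id_format(user_id: str) -> str:
    """Label an ID by shape. The two labels are our own naming, not dataset fields."""
    # Caveat: a generated ID that happens to contain only digits would be
    # mislabeled as legacy; we assume that is rare enough to ignore here.
    return "legacy_numeric" if user_id.isdigit() else "generated"


def check_user_rows(rows):
    """Return (rows where record_id != user_id, count of rows per ID format)."""
    mismatched = [r for r in rows if r["record_id"] != r["user_id"]]
    counts = Counter(user_id_format(r["user_id"]) for r in rows)
    return mismatched, counts


# Placeholder rows for illustration only.
sample_users = [
    {"record_id": "1", "user_id": "1"},
    {"record_id": "2", "user_id": "2"},
    {"record_id": "a1b2c3d4", "user_id": "a1b2c3d4"},
]
mismatched, counts = check_user_rows(sample_users)
```

Keeping the IDs as strings throughout matters here, since the check compares them character for character.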
Observation 002
Account ID format drift
The accounts table shows the same kind of drift. Early account rows use
short zero-padded IDs like 01 and 02. Later rows use generated IDs with a
type prefix, like chk_e9d195fe8e and biz_chk_b5e8f2a1c9.
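One way to handle both account ID styles is to split on the last underscore. This is a sketch under the assumption that everything before the last underscore is a type prefix and everything after it is the generated part; that reading comes from the two examples above, not from a documented spec. The function name and the "legacy" / "generated" labels are ours.

```python
def parse_account_id(account_id: str):
    """Return (style, type_prefix) for an account ID.

    style is "legacy", "generated", or "unknown"; type_prefix is None
    unless the ID looks like <prefix>_<generated part>.
    """
    if account_id.isdigit():
        # Legacy IDs such as "01": keep them as strings so the zero padding survives.
        return ("legacy", None)
    prefix, sep, suffix = account_id.rpartition("_")
    if sep and prefix and suffix:
        return ("generated", prefix)
    return ("unknown", None)


print(parse_account_id("01"))
print(parse_account_id("chk_e9d195fe8e"))
print(parse_account_id("biz_chk_b5e8f2a1c9"))
```

Splitting on the last underscore rather than the first keeps multi-part prefixes like biz_chk intact.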
Observation 003
Debit card ID format drift
The debit_cards table uses dbc_-prefixed IDs, but the rest of the pattern is
mixed: some encode the user and tier, while others look like generated IDs. The linked
account_id can also be a legacy numeric ID like 03.
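Because a card can point at a legacy ID like 03, any join from debit_cards to accounts should compare IDs as strings. The sketch below shows the failure mode when a loader turns 03 into the integer 3. The card_id column name and the dbc_example_* values are hypothetical placeholders; only account_id and the two account IDs come from the notes above.

```python
def cards_with_unknown_account(cards, accounts):
    """Return cards whose account_id does not appear in accounts.

    IDs are compared as strings, so "03" and 3 are treated as different.
    """
    known = {str(a["account_id"]) for a in accounts}
    return [c for c in cards if str(c["account_id"]) not in known]


accounts = [{"account_id": "03"}, {"account_id": "chk_e9d195fe8e"}]
cards = [
    {"card_id": "dbc_example_1", "account_id": "03"},
    {"card_id": "dbc_example_2", "account_id": "chk_e9d195fe8e"},
    # Simulates a loader that coerced "03" to an integer and lost the padding.
    {"card_id": "dbc_example_3", "account_id": 3},
]
orphans = cards_with_unknown_account(cards, accounts)
```

Here only dbc_example_3 comes back as an orphan, which is the signal that a numeric coercion happened somewhere upstream.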
Observation 004
CVV should be hidden
The raw table includes cvv. That is okay for us as dataset inspectors,
but an agent-facing banking harness should redact it or collect it only through a
secure user-side verification flow.
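A minimal sketch of the redaction step, assuming card rows reach the harness as plain dicts. The helper name and the SENSITIVE_FIELDS set are ours; cvv is the only field these notes identify as sensitive, and the sample values are placeholders.

```python
SENSITIVE_FIELDS = frozenset({"cvv"})


def redact_card(row, sensitive=SENSITIVE_FIELDS):
    """Return a copy of row without the sensitive fields; the input is not modified."""
    return {key: value for key, value in row.items() if key not in sensitive}


# Placeholder row for illustration only.
raw_card = {"card_id": "dbc_example_1", "account_id": "03", "cvv": "000"}
agent_view = redact_card(raw_card)
```

Dropping the key entirely, rather than masking the value, means the agent never sees that the field exists. Masking would work too if downstream code expects the column to be present.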