How Telecom Tasks Are Generated
The telecom generator is the cleanest public example of synthetic task creation in Tau2.
It does not ask an LLM to invent tasks. It composes small simulator mutations, creates
an oracle fix path, then verifies that path in a fresh telecom environment.
3 task families: mobile data, service, and MMS
2,285 generated telecom tasks (tasks_full.json in this checkout)
20 single-issue tasks (tasks_small.json)
32 expected-transfer tasks (unfixable cases add ACTION to the reward basis)
What is open
The banking knowledge construction pipeline is described in the paper, but the public
repo ships only the finished banking tasks and documents. Telecom is different: its task
composition code is open under src/tau2/domains/telecom/tasks/.
The mental model
A telecom task is built from root-cause atoms. Each atom says: here is how to break the
simulator, here is the expected way to fix it, and here are any extra checks needed to
prove the fix. Multi-issue tasks are just multiple atoms applied to the same clean env.
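A toy sketch of the atom idea, assuming the simulator is just a dict of device settings. Atom, setter, and the airplane_mode_on atom are hypothetical names, not Tau2 code; only the break/fix/check split comes from the description above. It shows that a two-issue task is nothing more than two atoms applied to one clean env.

```python
from dataclasses import dataclass
from typing import Callable

Env = dict  # hypothetical toy: the "simulator" is a dict of device settings


@dataclass
class Atom:
    name: str
    breaks: Callable[[Env], None]  # mutate a clean env into a failure
    fixes: Callable[[Env], None]   # apply the expected repair
    check: Callable[[Env], bool]   # extra assertion proving the repair worked


def clean_env() -> Env:
    return {"mobile_data": True, "airplane_mode": False}


def setter(key: str, value: bool) -> Callable[[Env], None]:
    """Build a function that sets one setting; used for both breaks and fixes."""
    def apply(env: Env) -> None:
        env[key] = value
    return apply


data_off = Atom("data_mode_off", setter("mobile_data", False),
                setter("mobile_data", True), lambda env: env["mobile_data"])
airplane_on = Atom("airplane_mode_on", setter("airplane_mode", True),
                   setter("airplane_mode", False), lambda env: not env["airplane_mode"])

# A two-issue task: both atoms break the same clean env, then both fixes run.
env = clean_env()
atoms = [data_off, airplane_on]
for atom in atoms:
    atom.breaks(env)
broken = dict(env)
for atom in atoms:
    atom.fixes(env)
print(broken)                                  # {'mobile_data': False, 'airplane_mode': True}
print(all(atom.check(env) for atom in atoms))  # True
```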
The data objects
BaseTask: one root cause, with init funcs and fix funcs.
SelectionSet: a group where the composer chooses one option or none.
ComposedTask: flattened root causes, init funcs, fixes, and assertions.
Task: the final benchmark JSON the harness can run.
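The first three objects can be sketched as small dataclasses. This is a hypothetical shape, not Tau2's actual class definitions: the family field, the compose function, and every atom name are assumptions, and the final Task is left out because it is just the serialized JSON. What it shows is the composition rule: each SelectionSet contributes one of its tasks or None, the selected atoms are sorted by name and flattened into a ComposedTask, and a family validator drops combinations that do not belong.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical shapes for illustration only; Tau2's real classes may differ.


@dataclass
class BaseTask:
    name: str
    family: str                                     # assumed: "mobile_data", "service", or "mms"
    init_funcs: list = field(default_factory=list)  # break the simulator
    fix_funcs: list = field(default_factory=list)   # None marks an unfixable cause
    assertions: list = field(default_factory=list)  # extra checks that prove the fix


@dataclass
class SelectionSet:
    tasks: list  # the composer picks exactly one of these, or none


@dataclass
class ComposedTask:
    root_causes: list
    init_funcs: list
    fix_funcs: list
    assertions: list


def compose(sets: list, required_family: str) -> list:
    """Cartesian product over the sets (each extended with None), validated and flattened."""
    composed = []
    for choice in product(*(s.tasks + [None] for s in sets)):
        atoms = sorted((t for t in choice if t is not None), key=lambda t: t.name)
        # Family validator: at least one selected atom must belong to the family.
        if not any(t.family == required_family for t in atoms):
            continue
        composed.append(ComposedTask(
            root_causes=[t.name for t in atoms],
            init_funcs=[f for t in atoms for f in t.init_funcs],
            fix_funcs=[f for t in atoms for f in t.fix_funcs],
            assertions=[f for t in atoms for f in t.assertions],
        ))
    return composed


def noop(env: dict) -> None:
    """Placeholder init/fix func for the demo."""


data_a = BaseTask("data_issue_a", "mobile_data", [noop], [noop])
data_b = BaseTask("data_issue_b", "mobile_data", [noop], [None])
svc = BaseTask("service_issue_a", "service", [noop], [noop])

tasks = compose([SelectionSet([data_a, data_b]), SelectionSet([svc])], "mobile_data")
for t in tasks:
    print(t.root_causes)
```

With two mobile-data atoms and one service atom, the product has 3 × 2 = 6 combinations (each slot also offers None); the mobile-data validator keeps the 4 that contain a mobile-data atom.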
Important files
create_tasks.py merges all generated task families and samples the base split.
manager.py turns composed root causes into final task JSON.
mobile_data_issues.py, service_issues.py, and mms_issues.py define issue families.
utils.py contains composition helpers like compose_tasks.
One subtle repo detail
create_tasks.py contains code that writes a sampled tasks.json split drawn from multi-issue
bins. In the local checkout, however, tasks.json and tasks_full.json are identical, both
with 2,285 tasks. Read the code path and the checked-in artifact separately: do not assume
the shipped tasks.json is the output of that sampling step.
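One way to check this in your own checkout is to load both files and compare them. This sketch assumes each file is a top-level JSON list of task objects with an "id" key; adjust it if the schema differs. compare_splits is a hypothetical helper, not part of Tau2.

```python
import json
from pathlib import Path


def load_tasks(path: Path) -> list:
    """Load a task file; assumes the top level is a JSON list of task objects."""
    with path.open() as f:
        return json.load(f)


def compare_splits(split: Path, full: Path) -> dict:
    """Hypothetical helper: summarize how a split file relates to the full file."""
    a, b = load_tasks(split), load_tasks(full)
    ids_a = {task["id"] for task in a}  # assumes every task has an "id" key
    ids_b = {task["id"] for task in b}
    return {
        "split_count": len(a),
        "full_count": len(b),
        "identical": a == b,
        "split_is_subset": ids_a <= ids_b,
    }


if __name__ == "__main__":
    # Run from the directory that holds both files (their location is not given here).
    split, full = Path("tasks.json"), Path("tasks_full.json")
    if split.exists() and full.exists():
        print(compare_splits(split, full))
```

A sampled split would show split_is_subset as True with a smaller split_count; an identical pair shows identical as True.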
Exact pipeline
- Issue files define atoms.
service_issues.py, mobile_data_issues.py, and mms_issues.py define BaseTask objects. Each
one has init_funcs that create the failure and fix_funcs that return the expected tool
calls.
- Selection sets create the combinatorics.
compose_tasks takes the Cartesian product over all selection sets, with None added to each
set, so every issue slot contributes either one root cause or nothing. Validators then keep
each family honest: mobile-data tasks must include at least one mobile-data issue, and MMS
tasks must include at least one MMS issue.
- The manager starts from a clean simulator.
TaskManager.create_task calls get_environment(), applies set_surrounding to set the user
name, phone, and location, then runs every init func. Init assertions are checked during
generation, but only real env mutations are saved into initial_state.initialization_actions.
- The oracle fix path is generated, not guessed.
Fix funcs inspect the broken env and return the expected ToolCalls. A tool with
requestor: "assistant" is an agent-side backend tool. A tool with requestor: "user" is
something the simulated user can do after the agent asks them, like toggling roaming or
rebooting the phone.
- Unfixable cases are represented explicitly.
If any fix func is None, the task becomes an expected failure: the answer key becomes
transfer_to_human_agents, and the reward basis becomes ENV_ASSERTION + ACTION.
- The final task is assembled.
The manager fills in the scenario text, ticket text, persona, task id, initialization
actions, expected actions, env assertions, and reward basis. Personas are assigned by
cycling through None, Easy, and Hard.
- The generated task is verified.
verify_task creates a fresh telecom env, checks that it starts in a fixed state, applies the
saved initialization actions, replays the expected fix actions, and finally checks is_fixed
plus all env assertions. This is the guardrail that keeps synthetic tasks from becoming
nonsense JSON.
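The replay check can be sketched on a toy env. verify_task and is_fixed take their names from the text, but their bodies here are illustrative; fresh_env and the airplane-mode actions are hypothetical stand-ins for the telecom environment and its tools. A wrong or incomplete oracle path fails, which is exactly what the guardrail is for.

```python
from typing import Callable

Env = dict
Action = Callable[[Env], None]
Check = Callable[[Env], bool]


def fresh_env() -> Env:
    """Stand-in for building a brand-new, healthy telecom environment."""
    return {"airplane_mode": False}


def is_fixed(env: Env) -> bool:
    return not env["airplane_mode"]


def verify_task(init_actions: list[Action], expected_actions: list[Action],
                assertions: list[Check]) -> bool:
    env = fresh_env()
    if not is_fixed(env):
        return False            # the clean env must start in a fixed state
    for action in init_actions:
        action(env)             # rebuild the saved broken state
    for action in expected_actions:
        action(env)             # replay the oracle fix path
    return is_fixed(env) and all(check(env) for check in assertions)


def turn_on_airplane_mode(env: Env) -> None:  # hypothetical init action
    env["airplane_mode"] = True


def toggle_airplane_mode(env: Env) -> None:   # hypothetical user-side fix
    env["airplane_mode"] = not env["airplane_mode"]


airplane_off = [lambda env: env["airplane_mode"] is False]
print(verify_task([turn_on_airplane_mode], [toggle_airplane_mode], airplane_off))      # True
print(verify_task([turn_on_airplane_mode], [], airplane_off))                          # False
print(verify_task([turn_on_airplane_mode], [toggle_airplane_mode] * 2, airplane_off))  # False
```

Because the env is rebuilt from scratch on every call, a task passes only if its saved initialization actions and expected actions are enough on their own; nothing leaks in from generation time.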
BaseTask(
    name="data_mode_off",
    init_funcs=[data_mode_off],     # break the simulator
    fix_funcs=[fix_data_mode_off],  # oracle expected action
)

compose_tasks(selection_sets):
    for each combination in product(selection_set.tasks + [None] for each selection set):
        skip the combination if a validator rejects it
        sort the selected atoms by name
        flatten their init funcs, fix funcs, and extra assertions
        emit ComposedTask(...)

TaskManager.create_task(composed_task):
    env = get_environment()
    run set_surrounding(env)
    run init funcs and save only the non-assertion EnvFunctionCalls
    run fix funcs to create the expected ToolCalls
    if any fix func is None, expect transfer_to_human_agents instead
    fill the Task JSON
    verify by replaying init + expected actions in a fresh env
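The init-recording and persona steps of create_task can be sketched as follows. EnvCall, set_data, assert_data, and the dict-shaped env are hypothetical; the real manager works on the telecom environment and EnvFunctionCall objects. The sketch shows why an init assertion runs during generation but never reaches initialization_actions, and uses a shared itertools.cycle for round-robin personas; whether Tau2 cycles in exactly this way is an assumption.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class EnvCall:
    """Hypothetical record of one init step, kept so it can be replayed later."""
    func_name: str
    args: dict
    is_assertion: bool = False


def set_data(env: dict, enabled: bool) -> None:
    env["mobile_data"] = enabled


def assert_data(env: dict, enabled: bool) -> None:
    if env["mobile_data"] != enabled:
        raise RuntimeError("init assertion failed: the break did not take effect")


FUNCS = {"set_data": set_data, "assert_data": assert_data}
PERSONAS = cycle([None, "Easy", "Hard"])  # round-robin, shared across all tasks


def create_task(task_id: str, init_calls: list, surrounding: dict) -> dict:
    env = {"mobile_data": True}  # stand-in for get_environment(): a clean simulator
    env.update(surrounding)      # stand-in for set_surrounding: name, phone, location
    saved = []
    for call in init_calls:
        FUNCS[call.func_name](env, **call.args)  # assertions run during generation...
        if not call.is_assertion:
            saved.append(call)                   # ...but only mutations are saved
    return {
        "id": task_id,
        "persona": next(PERSONAS),
        "initialization_actions": [(c.func_name, c.args) for c in saved],
    }


init = [
    EnvCall("set_data", {"enabled": False}),
    EnvCall("assert_data", {"enabled": False}, is_assertion=True),
]
where = {"user_name": "Alex", "phone": "555-0100", "location": "home"}
first = create_task("task_0", init, where)
second = create_task("task_1", init, where)
print(first)
# {'id': 'task_0', 'persona': None, 'initialization_actions': [('set_data', {'enabled': False})]}
print(second["persona"])  # Easy
```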
Concrete example: fixable
data_mode_off_task calls a user-env function that turns mobile data off.
Its expected fix is the user tool toggle_data. The final assertions require
mobile data to be enabled and the speed test to report excellent internet.
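A toy version of this example, assuming a phone with only two settings. data_mode_off, fix_data_mode_off, and toggle_data are the names used in the text, but their bodies here, along with ToolCall, ToyPhone, run_user_tool, and the speed-test rule, are illustrative stand-ins rather than Tau2's implementation. The second run shows why the speed-test assertion is worth having: in this toy, a phone still in airplane mode passes the mobile-data check but fails the speed test.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Illustrative answer-key entry; requestor says who executes the tool."""
    name: str
    requestor: str  # "assistant" (backend tool) or "user" (action on the phone)
    arguments: dict = field(default_factory=dict)


@dataclass
class ToyPhone:
    """Toy device state; the real telecom user env tracks much more."""
    mobile_data: bool = True
    airplane_mode: bool = False

    def run_speed_test(self) -> str:
        # Toy rule: a connection needs mobile data on and airplane mode off.
        if self.mobile_data and not self.airplane_mode:
            return "excellent"
        return "no connection"


def data_mode_off(phone: ToyPhone) -> None:
    """Init func: break the simulator."""
    phone.mobile_data = False


def fix_data_mode_off(phone: ToyPhone) -> list:
    """Fix func: inspect the broken state, then prescribe the expected calls."""
    return [] if phone.mobile_data else [ToolCall("toggle_data", "user")]


def run_user_tool(phone: ToyPhone, call: ToolCall) -> None:
    """Toy executor standing in for the simulated user performing the action."""
    if call.name == "toggle_data":
        phone.mobile_data = not phone.mobile_data


ASSERTIONS = [
    lambda p: p.mobile_data,                      # mobile data is enabled
    lambda p: p.run_speed_test() == "excellent",  # and the speed test proves it
]


def solve(phone: ToyPhone) -> list:
    """Break the phone, replay the oracle path, and return each assertion's result."""
    data_mode_off(phone)
    for call in fix_data_mode_off(phone):
        run_user_tool(phone, call)
    return [check(phone) for check in ASSERTIONS]


print(solve(ToyPhone()))                    # [True, True]
print(solve(ToyPhone(airplane_mode=True)))  # [True, False]
```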
Concrete example: unfixable
lock_sim_card_pin_task has fix_funcs=[None]. Tau2 still saves the broken starting state,
but the answer key is for the assistant to transfer to human support instead of pretending
it can unlock the SIM.
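The answer-key rule for unfixable causes can be sketched as a small function. answer_key and fix_data_off are hypothetical, actions are plain strings instead of ToolCalls, and the ENV_ASSERTION-only basis for fixable tasks is an assumption inferred from unfixable cases "adding" ACTION. The sketch also takes the simplest reading of the rule: one None anywhere replaces the whole answer key with the transfer.

```python
from typing import Callable, Optional

FixFunc = Optional[Callable[[dict], list]]  # None means "this cause cannot be fixed"


def answer_key(fix_funcs: list[FixFunc], env: dict) -> tuple[list, list]:
    """Return (expected actions, reward basis) for a composed task."""
    if any(f is None for f in fix_funcs):
        # Expected failure: the correct move is a hand-off, graded as an action.
        return ["transfer_to_human_agents"], ["ENV_ASSERTION", "ACTION"]
    # Assumption: fixable tasks are graded on env assertions alone.
    return [call for f in fix_funcs for call in f(env)], ["ENV_ASSERTION"]


def fix_data_off(env: dict) -> list:
    """Hypothetical fixable cause; actions are plain strings here, not ToolCalls."""
    return ["toggle_data"]


lock_sim_card_pin_fixes = [None]  # mirrors fix_funcs=[None] in the text

print(answer_key([fix_data_off], {}))
# (['toggle_data'], ['ENV_ASSERTION'])
print(answer_key([fix_data_off] + lock_sim_card_pin_fixes, {}))
# (['transfer_to_human_agents'], ['ENV_ASSERTION', 'ACTION'])
```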
Why this matters for our RL generator
This is the recipe we would reuse for a banking-style generator: define small failure
atoms, compose them, let code produce the oracle action path, and verify every task by
replaying it against the environment. Banking is the richer trace target; telecom is the
public blueprint for building reliable synthetic tasks.