{"tasks":[{"id":"task_1","name":"Basic Ad Triage","difficulty":"easy","queue_size":5,"action_budget":25,"description":"Learn the investigation loop. Queue of 5 ads with obviously fraudulent or clearly legitimate signals. Generous budget of 25 actions (5 per ad). Novice Fraudster: only fake-giveaway and miracle-cure templates allowed. Capped at 3 proposals so the queue never exceeds 8 ads (~3 actions per ad even after the Fraudster maxes out)."},{"id":"task_2","name":"Sophisticated Fraud Under Budget Pressure","difficulty":"medium","queue_size":12,"action_budget":30,"description":"Triage under budget constraints. Mix of legit ads, sophisticated scams, and gray-area cases. 12 ads but only 30 actions (~2.5 per ad). Agent must prioritize which ads to investigate deeply. Mid-tier Fraudster: adds counterfeit, clone-brand, advance-fee, crypto, celebrity-endorsement, and gray-area supplement templates."},{"id":"task_3","name":"Coordinated Fraud Network Detection","difficulty":"hard","queue_size":20,"action_budget":35,"description":"Full challenge including coordinated fraud rings. 20 ads with 3 hidden fraud networks using varied topologies (cliques, chains, hub-and-spoke). Budget of 35 actions (~1.75 per ad). Ring member ads look borderline individually — the agent must cross-reference investigation data across ads to detect shared signals. Sophisticated Fraudster: 5 rounds, 7 proposals, full category palette including network_* ring templates."},{"id":"task_3_unseen","name":"Networks Under Tighter Budget (Held-out Eval)","difficulty":"hard","queue_size":25,"action_budget":30,"description":"Held-out generalisation eval. Same fraud + escalate templates and ring topologies as task_3, but the budget regime is deliberately unseen: 25 ads with only 30 actions (~1.2/ad vs task_3's ~1.75) and 4 hidden rings instead of 3. Used by eval_suite.run_before_after to test whether the Investigator learned the underlying detection skill or just over-fit to the training budget distribution. Never appears in TRAINING_SEED_TIERS."}],"action_schema":{"additionalProperties":false,"description":"Action space for the ad fraud investigation agent.\n\nThree action types:\n- investigate: Spend budget to reveal information about an ad\n- verdict: Approve, reject, or escalate an ad\n- link_accounts: Flag two ads as part of the same fraud network","properties":{"metadata":{"additionalProperties":true,"description":"Additional metadata for the action","title":"Metadata","type":"object"},"action_type":{"enum":["investigate","verdict","link_accounts"],"title":"Action Type","type":"string"},"ad_id":{"description":"Target ad identifier (e.g. 'ad_001')","title":"Ad Id","type":"string"},"investigation_target":{"anyOf":[{"enum":["advertiser_history","landing_page","payment_method","targeting_overlap","campaign_structure","policy_classifier"],"type":"string"},{"type":"null"}],"default":null,"description":"What to investigate (required for action_type='investigate')","title":"Investigation Target"},"verdict":{"anyOf":[{"enum":["approve","reject","escalate"],"type":"string"},{"type":"null"}],"default":null,"description":"Verdict decision (required for action_type='verdict')","title":"Verdict"},"confidence":{"anyOf":[{"maximum":1.0,"minimum":0.0,"type":"number"},{"type":"null"}],"default":null,"description":"Agent's confidence in verdict (0.0-1.0)","title":"Confidence"},"rationale":{"anyOf":[{"maxLength":2000,"type":"string"},{"type":"null"}],"default":null,"description":"Optional natural-language reason for the verdict (consumed by the Auditor)","title":"Rationale"},"linked_ad_id":{"anyOf":[{"type":"string"},{"type":"null"}],"default":null,"description":"Other ad in suspected fraud ring (required for action_type='link_accounts')","title":"Linked Ad Id"},"link_reason":{"anyOf":[{"type":"string"},{"type":"null"}],"default":null,"description":"Why the agent believes these ads are connected","title":"Link Reason"}},"required":["action_type","ad_id"],"title":"AdReviewAction","type":"object"},"roles":{"fraudster":{"description":"Adversarial agent. Proposes and mutates ads into the shared queue during its turn, reacting to Investigator feedback.","ws":"/ws/fraudster","action_schema":{"additionalProperties":false,"description":"Reactive turn-based action space for the Fraudster.\n\nWithin a single Fraudster turn the agent may issue multiple actions\n(typically `propose_ad` and/or `modify_pending_ad`) before finishing\nthe turn with `end_turn` (control flips to the Investigator) or\n`commit_final` (no more changes ever; episode fast-tracks to audit).\n\nHard caps (configurable on the Referee):\n  - max_proposals_per_episode  (default: 5)\n  - max_actions_per_turn       (default: 3)","properties":{"metadata":{"additionalProperties":true,"description":"Additional metadata for the action","title":"Metadata","type":"object"},"action_type":{"enum":["propose_ad","modify_pending_ad","end_turn","commit_final"],"title":"Action Type","type":"string"},"ad_copy":{"anyOf":[{"maxLength":2000,"type":"string"},{"type":"null"}],"default":null,"description":"Surface text of the proposed ad (required for propose_ad)","title":"Ad Copy"},"landing_page_blurb":{"anyOf":[{"maxLength":2000,"type":"string"},{"type":"null"}],"default":null,"description":"Optional landing-page summary the Fraudster wants the ad to advertise","title":"Landing Page Blurb"},"category":{"anyOf":[{"maxLength":64,"type":"string"},{"type":"null"}],"default":null,"description":"Self-declared ad category (must be one of the categories advertised in /tasks)","title":"Category"},"targeting_summary":{"anyOf":[{"maxLength":512,"type":"string"},{"type":"null"}],"default":null,"description":"Audience the Fraudster claims to target (e.g. 'Adults 25-45, US, interests: investing')","title":"Targeting Summary"},"slot_index":{"anyOf":[{"minimum":0,"type":"integer"},{"type":"null"}],"default":null,"description":"Index into the Fraudster's own proposals list (0-based)","title":"Slot Index"},"new_ad_copy":{"anyOf":[{"maxLength":2000,"type":"string"},{"type":"null"}],"default":null,"description":"Replacement ad copy","title":"New Ad Copy"},"new_landing_page_blurb":{"anyOf":[{"maxLength":2000,"type":"string"},{"type":"null"}],"default":null,"description":"Replacement landing page blurb","title":"New Landing Page Blurb"},"rationale":{"anyOf":[{"maxLength":2000,"type":"string"},{"type":"null"}],"default":null,"description":"Optional natural-language reason for this action (consumed by the Auditor)","title":"Rationale"}},"required":["action_type"],"title":"FraudsterAction","type":"object"},"observation_schema":{"additionalProperties":false,"description":"Reactive observation for the Fraudster.\n\nThe Fraudster sees the Investigator's verdicts and which investigation\ntargets the Investigator pulled, so it can adapt within the same episode\n(e.g. 'they keep checking landing_page → improve my landing page blurbs',\nor 'category=fake_crypto keeps getting rejected → try gray_area_supplements').","properties":{"done":{"default":false,"description":"Whether the episode has terminated","title":"Done","type":"boolean"},"reward":{"anyOf":[{"type":"boolean"},{"type":"integer"},{"type":"number"},{"type":"null"}],"default":null,"description":"Reward signal from the last action","title":"Reward"},"metadata":{"additionalProperties":true,"description":"Additional metadata for the observation","title":"Metadata","type":"object"},"feedback":{"default":"","description":"Free-form feedback on the last action","title":"Feedback","type":"string"},"phase":{"default":"fraudster_turn","description":"Global state-machine phase","enum":["fraudster_turn","investigator_turn","audit_phase","done"],"title":"Phase","type":"string"},"task_id":{"default":"","description":"Currently-running task id (e.g. 'task_1', 'task_3_unseen'). Surfaced so the Fraudster can scale its stealth posture per task tier without the Referee having to mutate its system prompt: easy tiers want louder fraud cues so the Investigator can succeed; hard tiers want subtler fraud cues so the trained Investigator's evaluation gain is meaningful.","title":"Task Id","type":"string"},"round_number":{"default":0,"description":"1-based round counter","minimum":0,"title":"Round Number","type":"integer"},"rounds_remaining":{"default":0,"description":"Rounds left before audit_phase","minimum":0,"title":"Rounds Remaining","type":"integer"},"proposals_used":{"default":0,"minimum":0,"title":"Proposals Used","type":"integer"},"proposals_remaining":{"default":0,"minimum":0,"title":"Proposals Remaining","type":"integer"},"actions_left_this_turn":{"default":0,"minimum":0,"title":"Actions Left This Turn","type":"integer"},"current_queue":{"description":"Current ad queue: [{ad_id, ad_copy, category, status, is_my_proposal, slot_index?}]. status ∈ {pending, investigating, approved, rejected, escalated}.","items":{"additionalProperties":true,"type":"object"},"title":"Current Queue","type":"array"},"prior_verdicts":{"description":"All verdicts rendered so far by the Investigator: [{ad_id, verdict, confidence, rationale, was_my_proposal}]","items":{"additionalProperties":true,"type":"object"},"title":"Prior Verdicts","type":"array"},"investigation_targets_used":{"additionalProperties":{"items":{"type":"string"},"type":"array"},"description":"Per-ad list of investigation_target names already pulled by the Investigator","title":"Investigation Targets Used","type":"object"},"allowed_categories":{"description":"Whitelist of category strings the Fraudster may declare","items":{"type":"string"},"title":"Allowed Categories","type":"array"},"my_proposal_signals":{"additionalProperties":{"additionalProperties":true,"type":"object"},"description":"For each Fraudster-proposed ad still on the queue, the auto-assigned underlying signals (payment_id, registrar, domain, country, account_age_days, targeting_fingerprint). These fields are NOT settable by `propose_ad` — the env samples them from the fraud-mode distribution. Surfacing them lets the Fraudster react via `modify_pending_ad` (e.g. soften the landing page on ad_004 because Investigator rejected ad_002 which shares its registrar) and reason about ring-style cross-ad collisions in its own slate.","title":"My Proposal Signals","type":"object"}},"title":"FraudsterObservation","type":"object"}},"investigator":{"description":"Review agent. Investigates ads via sub-tools and renders verdicts (approve/reject/escalate). Cannot see Fraudster intent — only the growing queue.","ws":"/ws/investigator","action_schema":{"additionalProperties":false,"description":"Action space for the ad fraud investigation agent.\n\nThree action types:\n- investigate: Spend budget to reveal information about an ad\n- verdict: Approve, reject, or escalate an ad\n- link_accounts: Flag two ads as part of the same fraud network","properties":{"metadata":{"additionalProperties":true,"description":"Additional metadata for the action","title":"Metadata","type":"object"},"action_type":{"enum":["investigate","verdict","link_accounts"],"title":"Action Type","type":"string"},"ad_id":{"description":"Target ad identifier (e.g. 'ad_001')","title":"Ad Id","type":"string"},"investigation_target":{"anyOf":[{"enum":["advertiser_history","landing_page","payment_method","targeting_overlap","campaign_structure","policy_classifier"],"type":"string"},{"type":"null"}],"default":null,"description":"What to investigate (required for action_type='investigate')","title":"Investigation Target"},"verdict":{"anyOf":[{"enum":["approve","reject","escalate"],"type":"string"},{"type":"null"}],"default":null,"description":"Verdict decision (required for action_type='verdict')","title":"Verdict"},"confidence":{"anyOf":[{"maximum":1.0,"minimum":0.0,"type":"number"},{"type":"null"}],"default":null,"description":"Agent's confidence in verdict (0.0-1.0)","title":"Confidence"},"rationale":{"anyOf":[{"maxLength":2000,"type":"string"},{"type":"null"}],"default":null,"description":"Optional natural-language reason for the verdict (consumed by the Auditor)","title":"Rationale"},"linked_ad_id":{"anyOf":[{"type":"string"},{"type":"null"}],"default":null,"description":"Other ad in suspected fraud ring (required for action_type='link_accounts')","title":"Linked Ad Id"},"link_reason":{"anyOf":[{"type":"string"},{"type":"null"}],"default":null,"description":"Why the agent believes these ads are connected","title":"Link Reason"}},"required":["action_type","ad_id"],"title":"AdReviewAction","type":"object"},"observation_schema":{"additionalProperties":false,"description":"Observation returned after each Investigator step.\n\nText-heavy by design so LLM agents can reason about the content naturally.\nStructured data is in queue_status for programmatic access.","properties":{"done":{"default":false,"description":"Whether the episode has terminated","title":"Done","type":"boolean"},"reward":{"anyOf":[{"type":"boolean"},{"type":"integer"},{"type":"number"},{"type":"null"}],"default":null,"description":"Reward signal from the last action","title":"Reward"},"metadata":{"additionalProperties":true,"description":"Additional metadata for the observation","title":"Metadata","type":"object"},"queue_summary":{"default":"","description":"Natural language overview of the ad queue","title":"Queue Summary","type":"string"},"current_ad_info":{"default":"","description":"Details of the ad currently in focus","title":"Current Ad Info","type":"string"},"investigation_findings":{"default":"","description":"Accumulated investigation results","title":"Investigation Findings","type":"string"},"verdict_history_summary":{"default":"","description":"Summary of verdicts rendered so far","title":"Verdict History Summary","type":"string"},"feedback":{"default":"","description":"Natural language feedback on the last action taken","title":"Feedback","type":"string"},"available_ads":{"description":"Ad IDs still pending review","items":{"type":"string"},"title":"Available Ads","type":"array"},"queue_status":{"additionalProperties":true,"description":"Structured status: total_ads, reviewed, pending, budget, step","title":"Queue Status","type":"object"},"queue_may_grow":{"default":false,"description":"True when running inside the Referee — Fraudster can still add ads","title":"Queue May Grow","type":"boolean"},"evidence_ledger":{"additionalProperties":{"additionalProperties":true,"type":"object"},"description":"Per-ad structured evidence accumulated across investigations. Surface fields (category, country, account_age_days) are always present once an ad has been touched; investigation-only fields (payment_id, registrar, domain, targeting_fingerprint, advertiser_id) appear only after the corresponding `investigate` target has been pulled. Cross-ad collisions on a SUBSET of these fields indicate fraud rings — the policy must learn which fields are discriminative (payment_id collisions matter, country collisions usually don't).","title":"Evidence Ledger","type":"object"},"queue_digest":{"description":"One-row-per-pending-ad summary surfaced WITHOUT requiring an investigation. Each row carries a curated subset of fields from the ad + advertiser_profile: a small set of potentially-discriminative columns (payment_type, registrar, domain) the Investigator can use as a pre-investigation ring-detection hint, plus a handful of decoy columns (category, country, account_age_days) that are intentionally non-discriminative so the policy must learn which collisions matter. Capped to ~12 ads to keep the prompt budget bounded.","items":{"additionalProperties":true,"type":"object"},"title":"Queue Digest","type":"array"},"decided_ads":{"description":"Per-decided-ad summary: verdict + confidence + a curated mix of discriminative (payment_id, registrar, domain, targeting_fingerprint) and decoy (category, country, account_age_days) signals from the evidence ledger. Gives the Investigator memory of past decisions for link_accounts.","items":{"additionalProperties":true,"type":"object"},"title":"Decided Ads","type":"array"}},"title":"AdReviewObservation","type":"object"}},"auditor":{"description":"Third-agent arbiter. After the match ends, audits the Investigator's reasoning (Track A) and the Fraudster's ad plausibility (Track B). Emits flags + a final audit report.","ws":"/ws/auditor","action_schema":{"additionalProperties":false,"description":"Post-hoc audit actions.\n\nTrack A audits the Investigator's *reasoning* (rationale coherence,\ncitation, calibration, consistency, bias).  Track B audits the\nFraudster's *output plausibility* (template diversity, parameter\nrealism, market fit, etc.).  The Auditor accumulates flags and then\nsubmits a final report.","properties":{"metadata":{"additionalProperties":true,"description":"Additional metadata for the action","title":"Metadata","type":"object"},"action_type":{"enum":["flag_investigator","flag_fraudster","submit_audit_report"],"title":"Action Type","type":"string"},"target_ad_id":{"anyOf":[{"type":"string"},{"type":"null"}],"default":null,"description":"Ad the flag applies to (required for flag_* actions)","title":"Target Ad Id"},"flag_type":{"anyOf":[{"maxLength":64,"type":"string"},{"type":"null"}],"default":null,"description":"Track A flag types: miscalibration, missing_citation, incoherent_rationale, inconsistency, bias. Track B flag types: gibberish, parameter_mismatch, template_repetition, market_implausible, branding_anomaly.","title":"Flag Type"},"severity":{"anyOf":[{"maximum":1.0,"minimum":0.0,"type":"number"},{"type":"null"}],"default":null,"description":"0.0 = warning, 1.0 = critical","title":"Severity"},"note":{"anyOf":[{"maxLength":2000,"type":"string"},{"type":"null"}],"default":null,"description":"Free-form auditor note","title":"Note"},"audit_report":{"anyOf":[{"additionalProperties":true,"type":"object"},{"type":"null"}],"default":null,"description":"Final report payload for action_type='submit_audit_report'","title":"Audit Report"}},"required":["action_type"],"title":"AuditorAction","type":"object"},"observation_schema":{"additionalProperties":false,"description":"Post-hoc observation for the Auditor.\n\nContains the full episode trace: every Fraudster proposal, every\nInvestigator action+rationale, all verdicts, and the synthesized\ninvestigation data the Investigator saw.","properties":{"done":{"default":false,"description":"Whether the episode has terminated","title":"Done","type":"boolean"},"reward":{"anyOf":[{"type":"boolean"},{"type":"integer"},{"type":"number"},{"type":"null"}],"default":null,"description":"Reward signal from the last action","title":"Reward"},"metadata":{"additionalProperties":true,"description":"Additional metadata for the observation","title":"Metadata","type":"object"},"feedback":{"default":"","title":"Feedback","type":"string"},"phase":{"default":"audit_phase","enum":["fraudster_turn","investigator_turn","audit_phase","done"],"title":"Phase","type":"string"},"full_episode_record":{"additionalProperties":true,"description":"Serialized record of the entire episode","title":"Full Episode Record","type":"object"},"investigator_actions":{"description":"Ordered log of every Investigator action with rationales","items":{"additionalProperties":true,"type":"object"},"title":"Investigator Actions","type":"array"},"fraudster_proposals":{"description":"Ordered log of every Fraudster proposal/modification","items":{"additionalProperties":true,"type":"object"},"title":"Fraudster Proposals","type":"array"},"investigation_data_seen":{"additionalProperties":{"additionalProperties":{"type":"string"},"type":"object"},"description":"The actual findings text the Investigator pulled per (ad_id, target)","title":"Investigation Data Seen","type":"object"},"pending_flags":{"description":"Flags accumulated so far in this audit","items":{"additionalProperties":true,"type":"object"},"title":"Pending Flags","type":"array"}},"title":"AuditorObservation","type":"object"}}},"multi_agent_endpoints":{"fraudster_ws":"/ws/fraudster","investigator_ws":"/ws/investigator","auditor_ws":"/ws/auditor","matches":"/matches","grader":"/grader"}}