Compliance-grade tax-filing agent — Brazilian IRPF
A typed-tool LLM agent for regulated tax filings. One hallucinated number is a compliance failure, so every step is schema-validated and every figure cites its source.
The problem
A regulated tax-filing agent cannot afford to hallucinate even one numerical field. Every output must be replayable throughout a five-year audit window, and the system must run on-premise under LGPD with no data egress. General-purpose RAG fails on two counts: it cannot prove where a number came from, and it cannot bound how long the agent 'thinks'.
The solution
The loop is capped at 40 turns per filing. The router rejects any cross-year retrieval hit. Forced citation plus post-hoc span validation means the model's arithmetic is never trusted blind. A deterministic anomaly-rule layer catches out-of-policy values before they land in the return, and an append-only audit ledger is anchored to a transparency log. Retrying a single failed field, rather than re-running the whole filing, avoids redundant LLM calls. Rejected: a single large-context prompt (no provenance) and an open-ended agent loop (no audit ceiling).
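The two structural bounds above, the turn ceiling and the year-scoped router, can be sketched in a few lines. This is a minimal illustration, not the production code: `Hit`, `route`, `run_filing`, and the field names are hypothetical, and it assumes a rejected hit is set aside for the audit trail rather than silently dropped.

```python
from dataclasses import dataclass

MAX_TURNS = 40  # per-filing ceiling stated in the text


@dataclass(frozen=True)
class Hit:
    """A retrieval hit. `doc_id` and `tax_year` are hypothetical field names."""
    doc_id: str
    tax_year: int
    text: str


def route(hits, filing_year):
    """Split hits into (accepted, rejected) by exact match on the filing year."""
    accepted = [h for h in hits if h.tax_year == filing_year]
    rejected = [h for h in hits if h.tax_year != filing_year]
    return accepted, rejected


class TurnBudgetExceeded(Exception):
    """Raised when a filing does not finish within the turn ceiling."""


def run_filing(step, max_turns=MAX_TURNS):
    """Call `step(turn)` until it returns True; fail hard once the budget is spent."""
    for turn in range(1, max_turns + 1):
        if step(turn):
            return turn  # number of turns actually used
    raise TurnBudgetExceeded(f"no result within {max_turns} turns")
```

Raising on budget exhaustion, instead of returning a partial filing, keeps the ceiling an audit property: a filing either completes within 40 turns or it fails visibly.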
- Constraint: LGPD plus a five-year audit-replay mandate, on-premise, zero data egress. One hallucinated numerical field is a regulatory failure.
- Decision: Bound the agent at ≤40 typed-tool turns per filing. Scope retrieval per filing year and reject cross-year hits at the router. Force every number to cite, then re-validate it against its source span before it lands. Rejected a single large-context prompt (no provenance) and an unbounded loop (no audit ceiling).
- Outcome: By construction, the agent has no path to emit an unsourced number: every figure is re-validated against its source span before it lands, and a deterministic anomaly-rule layer gates out-of-policy values. The replay guarantee is a property of the design, not a production-tenure claim.
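The cite-then-re-validate gate can be illustrated as follows. This is a sketch under stated assumptions: citations are character offsets into a source document, amounts use Brazilian formatting (`12.345,67`), and the anomaly rule is a simple per-field range. `CitedValue`, `parse_brl`, `validate_span`, and `within_policy` are hypothetical names. It covers directly extracted values only; derived figures such as sums would need their operands validated the same way.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class CitedValue:
    """A number emitted by the model together with the span it claims as source."""
    field: str
    value: Decimal
    doc_id: str
    start: int  # character offsets into the source document (assumed scheme)
    end: int


def parse_brl(text):
    """Parse a Brazilian-formatted amount such as 'R$ 12.345,67' into a Decimal."""
    cleaned = text.replace("R$", "").strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def validate_span(cited, sources):
    """Return True only if the cited span re-parses to exactly the emitted value."""
    doc = sources.get(cited.doc_id)
    if doc is None or not (0 <= cited.start < cited.end <= len(doc)):
        return False
    try:
        return parse_brl(doc[cited.start:cited.end]) == cited.value
    except InvalidOperation:
        return False  # the span is not a number at all


def within_policy(cited, bounds):
    """Deterministic anomaly rule: the value must sit inside a per-field range."""
    low, high = bounds[cited.field]
    return low <= cited.value <= high
```

Using `Decimal` rather than `float` keeps the comparison exact, so a figure that differs from its source by a single centavo fails validation.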
Overview
This is a Brazilian income-tax (IRPF) filing agent built by a six-engineer team. The loop is bounded at ≤40 turns per filing; each step is a schema-validated tool call; retrieval is scoped to the filing year. Every emitted number cites its source span and is re-validated against that span before it lands in the return. The decision log is append-only, so a filing can be replayed. My part was field extraction and structured-output validation for individual filing sections.
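One common way to make an append-only log tamper-evident, and therefore replayable with confidence, is a hash chain. The sketch below assumes that approach; the source does not say which mechanism the real ledger uses, and `Ledger`, `entry_hash`, and `verify` are hypothetical names. Anchoring to a transparency log is out of scope here: the hash returned by the latest `append` is the value one would publish.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class Ledger:
    """In-memory append-only log; there is deliberately no update or delete."""

    def __init__(self):
        self._entries = []

    def append(self, payload):
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        digest = entry_hash(prev, payload)
        self._entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest  # head hash after this append

    def entries(self):
        return [dict(e) for e in self._entries]  # shallow copies for readers


def verify(entries):
    """Recompute the chain from the start; any edited entry breaks it."""
    prev = GENESIS
    for e in entries:
        if e["prev"] != prev or entry_hash(prev, e["payload"]) != e["hash"]:
            return False
        prev = e["hash"]
    return True
```

Replaying a filing then amounts to verifying the chain and re-reading the payloads in order; an edit to any earlier step changes every later hash.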