Legal-domain RAG — per-jurisdiction retrieval, citations that hold
Per-jurisdiction retrieval with a deterministic citation validator. Every answer grounds in a real source. AWS Bedrock + Lambda + pgvector.
The problem
In legal work a fabricated or mis-attributed citation is malpractice. An answer grounded in the wrong jurisdiction is worse than no answer. Doing it by hand doesn't scale.
The solution
Per-jurisdiction pgvector indexes keep retrieval inside the correct legal regime. A deterministic citation validator verifies every cited span exists and supports the statement before generation returns. The model is never trusted to cite from memory. AWS Bedrock handles inference; Lambda keeps it serverless and cost-bounded.
- Constraint
- A fabricated citation is malpractice. A cross-jurisdiction answer is worse than none. Manual documentation doesn't scale.
- Decision
- Isolate retrieval per jurisdiction so a query can't pull from the wrong regime. Gate generation behind a deterministic citation validator that confirms every cited span exists and supports the claim. Rejected a single shared index (cross-regime bleed) and trusting model-generated citations (hallucination risk).
- Outcome
- A working proof-of-concept where every citation is deterministically validated against the source, never generated from memory, and per-jurisdiction isolation keeps answers inside the correct legal regime.
Overview
A legal-tech RAG proof-of-concept (2024) over a multi-jurisdiction corpus, which I contributed to. Each jurisdiction gets its own pgvector index, so retrieval never bleeds across legal regimes. A deterministic citation validator checks that every cited passage exists and supports the claim before the answer returns. Serverless on AWS Bedrock + Lambda, so cost tracks usage instead of idle capacity.