01
Problem
Business problem
Enterprise compliance programs sit on hundreds of regulations, SOPs, and historical audit findings. Internal Q&A is slow, tribal-knowledge dependent, and turns every regulator question into a fire drill. Generic LLM chat doesn't clear the legal bar.
User problem
Compliance managers Ctrl-F across PDF libraries to back up every answer. Generic chatbots hallucinate citations - a non-starter in a regulated workflow. Users need answers they can defend in front of an auditor, not just answers that sound right.
02
User journey
03
RAG architecture


04
PRD excerpt
- User
  - Compliance manager, audit program owner, or operations leader who must back every answer with a defensible source.
- Problem
  - Critical requirements are buried across long regulatory documents and SOPs; manual lookup is slow and error-prone.
- Goal
  - Help users find accurate, source-grounded answers quickly - with citations they can click into.
- Non-goal
  - Replace legal or regulatory interpretation; the assistant surfaces sources, not adjudications.
- Success metrics
  - Answer groundedness (% of claims backed by retrieved passages)
  - Citation precision (cited passage actually supports the claim)
  - Retrieval relevance (top-k recall against golden Q&A set)
  - User trust score (post-answer survey)
  - Escalation rate to human reviewer
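The first three metrics can be scored offline against the golden Q&A set. This is a minimal sketch under two assumptions. First, each golden question lists the IDs of its relevant passages. Second, a reviewer (or a grading model) has labeled each answer claim on two points: whether some retrieved passage backs it, and whether its own citation supports it. The `Claim` schema and the function names are hypothetical and do not come from the product's actual eval code.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One factual claim in an answer, with reviewer labels (hypothetical schema)."""
    cited_passage_ids: list[str]  # passages the answer cites for this claim
    backed_by_retrieval: bool     # label: some retrieved passage supports the claim
    citation_supports: bool       # label: the cited passage itself supports the claim


def top_k_recall(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the golden relevant passages that appear in the top-k results."""
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(relevant.intersection(retrieved[:k])) / len(relevant)


def groundedness(claims: list[Claim]) -> float:
    """Share of all claims backed by at least one retrieved passage."""
    if not claims:
        return 0.0
    return sum(c.backed_by_retrieval for c in claims) / len(claims)


def citation_precision(claims: list[Claim]) -> float:
    """Among claims that carry a citation, share whose citation supports the claim."""
    cited = [c for c in claims if c.cited_passage_ids]
    if not cited:
        return 0.0
    return sum(c.citation_supports for c in cited) / len(cited)
```

Groundedness and citation precision are kept separate on purpose. A claim can be backed by something in the retrieved context and still cite the wrong passage.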
05
Example answer
Citation-first. Refusal-aware. Human-reviewed.
User question
"What inspection records must be retained for audit readiness?"
Assistant answer
High confidence
Inspection records covered by the program SOP must be retained for a minimum of five (5) years, including completed checklists, photo evidence, corrective-action records, and verification signatures. Records older than the retention period may be archived but must remain retrievable for the duration of any open finding.
Citations
SOP-EHS-014 · §3.2 Records retention
"All completed inspection records, including supporting evidence, shall be retained for no less than five (5) years from the date of completion…"
Internal Audit Manual · §7.4
"Open findings extend the retention obligation for all linked inspection records until formal closure is verified."
Design principle
No citation, no answer.
If retrieval can't anchor a claim, the assistant says so and either asks a clarifying question or routes the user to a human reviewer. Precision over recall: a confident answer with no source behind it is a release blocker.
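In practice the principle works as a filter applied after generation. This sketch assumes that an upstream attribution step, which is hypothetical here, has already attached supporting passage IDs to each drafted claim. Uncited claims are dropped. If nothing survives, the response becomes a refusal that offers a clarifying question or a human reviewer. The filter only checks that a citation exists. Whether the passage actually supports the claim is the separate citation-precision check.

```python
from dataclasses import dataclass, field


@dataclass
class DraftClaim:
    text: str
    citations: list[str] = field(default_factory=list)  # supporting passage IDs (hypothetical)


@dataclass
class Response:
    kind: str                 # "answer" or "refusal"
    claims: list[DraftClaim]
    message: str = ""


# Hypothetical refusal copy; the real wording would come from the product team.
REFUSAL_MESSAGE = (
    "I couldn't find a source in your library that supports an answer. "
    "You can rephrase the question or send it to a human reviewer."
)


def enforce_citations(draft: list[DraftClaim]) -> Response:
    """No citation, no answer: keep only cited claims, refuse if none survive."""
    cited = [c for c in draft if c.citations]
    if not cited:
        return Response(kind="refusal", claims=[], message=REFUSAL_MESSAGE)
    return Response(kind="answer", claims=cited)
```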
06
Trust, safety & quality gates
| Gate | What it checks | Pass criteria | Human fallback |
|---|---|---|---|
| Citation requirement | Every factual claim maps to a retrieved passage | 100% of claims have ≥ 1 supporting citation | Strip uncited claims; if none remain, refuse and explain |
| Retrieval confidence | Top-k rerank score against query | ≥ threshold; otherwise low-confidence path | Trigger clarification mode instead of guessing |
| Conflicting sources | Detect contradiction across retrieved passages | No conflict, or conflict explicitly surfaced | Switch to comparison mode showing both sources |
| Sensitive / regulatory topic | Topic classifier flags regulatory adjudication asks | Answer framed as informational + human-review banner | Route to human reviewer queue |
| Tenant scoping | Retrieval bound to caller's tenant document space | Zero cross-tenant passages in context | Hard block; log and alert |
| Eval log review | Pre-release eval suite + sampled prod review | All eval dimensions ≥ baseline before rollout | Gate release; iterate on retrieval or prompt |
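The four runtime gates between retrieval and answer can be read as one routing decision. This sketch rests on three assumptions:
- Conflict detection and the adjudication topic classifier are upstream components (hypothetical here) that hand over booleans.
- Confidence is the best rerank score in the top-k.
- `min_score=0.5` is a placeholder threshold, not a value from this project.

Tenant scoping is checked first because it is a hard block, not a degraded path.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    BLOCK = "hard_block"            # cross-tenant passage in context: stop, log, alert
    CLARIFY = "clarification_mode"  # retrieval too weak to answer
    COMPARE = "comparison_mode"     # sources disagree: show both
    REVIEW = "human_review_banner"  # adjudication ask: informational answer + reviewer queue
    ANSWER = "answer"


@dataclass
class Passage:
    passage_id: str
    tenant_id: str
    rerank_score: float


def route_query(
    caller_tenant: str,
    passages: list[Passage],
    conflict_detected: bool,
    adjudication_flagged: bool,
    min_score: float = 0.5,  # placeholder threshold; tune against the golden set
) -> Route:
    # Tenant scoping: any foreign passage in context is a hard block.
    if any(p.tenant_id != caller_tenant for p in passages):
        return Route.BLOCK
    # Retrieval confidence: the best top-k rerank score must clear the threshold.
    if not passages or max(p.rerank_score for p in passages) < min_score:
        return Route.CLARIFY
    # Conflicting sources are surfaced side by side rather than resolved silently.
    if conflict_detected:
        return Route.COMPARE
    if adjudication_flagged:
        return Route.REVIEW
    return Route.ANSWER
```

The ordering is a design choice. Here a source conflict takes precedence over the adjudication banner. A fuller implementation might apply both rather than pick one.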
07
Evaluation
Every answer was evaluated against a six-dimension rubric. A failure in any single dimension halted release.
| Dimension | What we checked | Pass | Fail |
|---|---|---|---|
| Retrieval accuracy | Did it pull the right requirement? | ✓ Correct requirement surfaced | ✗ Wrong or irrelevant requirement returned |
| Summary fidelity | Does the summary reflect the requirement? | ✓ Faithful to source meaning | ✗ Overstates, omits, or misreads |
| Citation correctness | Is the citation tied to that requirement? | ✓ Citation matches | ✗ Mismatched or missing citation |
| Grounding | Is every answer traceable to a real requirement? | ✓ Links back to source | ✗ Unsupported claim / no source |
| Tenant isolation | Did the user see only their own content? | ✓ Correctly scoped | ✗ Leakage across subscribers |
| Refusal behavior | Does it decline when content isn't in the library? | ✓ “Not found in library” | ✗ Fabricates an answer anyway |
Quality gates (all must be met for release)
- Summary fidelity verified against source on a review sample
- Zero citation mismatches in the tested set
- No cross-tenant content leakage
- Graceful refusal when a requirement isn't in the subscriber's library
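The release rule is strict enough to express as a single check over the rubric results. This sketch assumes each eval case records pass/fail per dimension. The dimension keys are hypothetical snake_case versions of the table rows. An unscored dimension counts as a failure, and an empty eval set never clears the gate.

```python
from dataclasses import dataclass

# Hypothetical keys mirroring the six rubric rows above.
DIMENSIONS = (
    "retrieval_accuracy",
    "summary_fidelity",
    "citation_correctness",
    "grounding",
    "tenant_isolation",
    "refusal_behavior",
)


@dataclass
class EvalCase:
    case_id: str
    results: dict[str, bool]  # dimension -> pass (True) / fail (False)


def release_gate(cases: list[EvalCase]) -> tuple[bool, list[tuple[str, str]]]:
    """Return (release_ok, failures); one failed or missing dimension blocks release."""
    failures = []
    for case in cases:
        for dim in DIMENSIONS:
            # A dimension that was never scored is treated as failed, not passed.
            if not case.results.get(dim, False):
                failures.append((case.case_id, dim))
    # An empty eval set proves nothing, so it cannot clear the gate.
    return (bool(cases) and not failures, failures)
```

Returning the failing (case, dimension) pairs rather than a bare boolean tells the team whether to iterate on retrieval, prompting, or scoping.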
08
Tradeoffs, outcome, next
Tradeoffs
Precision over magic
- Refuses rather than guesses - slower to feel magical, far safer to defend
- Tight tenant scoping over cross-tenant 'global wisdom' features
- Full-text + metadata retrieval over compliance tables costs more than pure vector search - the price of precision
Outcome
Expected impact
- 30+ minute document hunts collapse to 1–2 minute grounded answers
- Defensible citation trail for every answer surfaced
- Cleared the bar for pilots inside regulated GRC and EHS programs
What I'd improve next
Next bets
- Per-tenant eval dashboards visible to compliance owners
- Feedback capture (👍 / 👎 + reason) wired into the eval loop
- Agentic follow-up: 'draft the response to the auditor' as a guarded action