Product stories

Product Story 01 · Multimodal AI · OCR · Human-in-the-loop

AI/OCR Checklist Digitization: Turning paper inspections into configurable digital workflows

Thesis: A multimodal AI workflow that helps safety and compliance teams convert PDF and photo-based inspection checklists into structured digital checklists with human review, classification, and assignment logic.

Multimodal AI · OCR · Human-in-the-loop · Evals · Enterprise Workflow
Year: 2025
Role: Senior PM (owner)
Stack: Vision LLM · OCR · HITL
Users: Enterprise EHS admins

01

Problem

Business problem

Onboarding new enterprise tenants stalls because admins arrive with hundreds of legacy PDF and paper checklists. Manual digitization is the single biggest source of implementation friction and a leading driver of stalled rollouts.

User problem

Site safety managers and compliance admins re-key checklists by hand into the platform. Field types are inconsistent, safety-critical wording gets paraphrased away, and an inspection program that should launch in a week takes a month.

02

Before / after workflow

Before
  • Paper or PDF checklist arrives from site team
  • Admin re-keys questions into the platform manually
  • Inconsistent response types, missed required fields
  • Inspection setup delayed by days or weeks
After
  • Upload PDF or photo of checklist
  • OCR extracts text + layout, LLM structures questions
  • Human reviews flagged fields, accepts or edits
  • Digital checklist scheduled and assigned the same day

03

System flow

AI/OCR Checklist Digitization system flow diagram
Two-stage AI pipeline on Amazon Bedrock + Amazon Textract, with a confidence gate and human-in-the-loop review.
  • 01 Upload: PDF · photo
  • 02 OCR extraction: Layout-aware
  • 03 Document parsing: Sections + tables
  • 04 LLM structuring: Vision model
  • 05 Field mapping: Question schema
  • 06 Confidence scoring: Per field
  • 07 Human validation: Review queue
  • 08 Digital checklist: Schedule + assign
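The flow above can be sketched as a small composition of stages followed by the confidence gate. This is a minimal illustration, not the production code: the `Field` shape, the `REVIEW_THRESHOLD` value, and the `ocr` / `structure` callables are all assumptions introduced here, and the stubs stand in for the real OCR and model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Field:
    """One extracted checklist question (hypothetical shape)."""
    text: str
    response_type: str
    confidence: float            # per-field score from step 06
    safety_critical: bool = False


# Hypothetical threshold, chosen only for illustration.
REVIEW_THRESHOLD = 0.8


def confidence_gate(
    fields: List[Field], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Field], List[Field]]:
    """Split fields into (unflagged, flagged for human review)."""
    unflagged, flagged = [], []
    for f in fields:
        # Safety-critical fields always go to a human, whatever the score.
        if f.safety_critical or f.confidence < threshold:
            flagged.append(f)
        else:
            unflagged.append(f)
    return unflagged, flagged


def run_pipeline(
    document: bytes,
    ocr: Callable[[bytes], str],
    structure: Callable[[str], List[Field]],
) -> Tuple[List[Field], List[Field]]:
    """Steps 02 to 06: OCR, then LLM structuring, then the gate."""
    layout_text = ocr(document)
    fields = structure(layout_text)
    return confidence_gate(fields)


# Usage with stub stages standing in for the real OCR and model calls.
def stub_ocr(document: bytes) -> str:
    return document.decode("utf-8")


def stub_structure(text: str) -> List[Field]:
    return [
        Field("Guard rails in place?", "yes_no", 0.95),
        Field("Gas reading (ppm)", "numeric", 0.62),
        Field("Lockout applied before servicing?", "yes_no", 0.97,
              safety_critical=True),
    ]


unflagged, flagged = run_pipeline(b"scanned checklist", stub_ocr, stub_structure)
```

Passing the OCR and structuring stages in as callables keeps the gate logic testable without touching any vendor service.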

04

My PM role & product decisions

My PM role

End-to-end owner

  • Problem framing + opportunity sizing with implementation team
  • Vendor and model trade-off analysis (OCR vs custom ML vs managed multimodal)
  • PRD, UX flow with design, eval criteria, rollout plan
  • Customer-council validation and adoption playbook

Product decisions

Why managed multimodal + HITL

  • Managed multimodal: ship sooner, no in-house model maintenance
  • Two-stage pipeline: layout-aware OCR feeds vision LLM for structure
  • Confidence-scored review queue instead of autonomous publish
  • Map output into existing checklist schema, not a parallel data model
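To illustrate the last decision, here is a rough sketch of mapping one item of model output onto an existing question record instead of a parallel data model. The `Question` dataclass, its field names, and the raw dictionary keys are hypothetical stand-ins; the platform's real schema is not shown in this story.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Response types named in this story, normalized to identifiers.
ALLOWED_TYPES = {"yes_no", "numeric", "text", "date", "photo", "signature"}


@dataclass
class Question:
    """Stand-in for the platform's existing question record (hypothetical)."""
    section: str
    prompt: str
    response_type: str
    required: bool


def to_question(raw: Dict[str, Any]) -> Question:
    """Map one item of model output onto the existing schema."""
    rtype = str(raw.get("response_type", "")).strip().lower().replace("/", "_")
    if rtype not in ALLOWED_TYPES:
        rtype = "text"  # unknown types fall back to free text
    return Question(
        section=str(raw.get("section", "")).strip(),
        prompt=str(raw["prompt"]).strip(),  # wording kept, only whitespace trimmed
        response_type=rtype,
        required=bool(raw.get("required", False)),
    )
```

Because the output is an ordinary `Question`, scheduling and assignment can reuse whatever already works for hand-built checklists.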

05

PRD excerpt

PRD excerpt · v1.0 · draft
User: Site safety manager / compliance admin onboarding a new inspection program.
Problem: Manual checklist digitization is slow, inconsistent, and error-prone, and it blocks tenant onboarding.
Goal: Reduce setup time per checklist and improve structural quality of the digital output.
Non-goal: Fully autonomous publishing without human review of safety-critical content.
Success metrics
  • Extraction accuracy (field-level F1)
  • Time-to-publish per checklist (target: < 10 min, baseline: hours)
  • Review acceptance rate without edits
  • Rework rate after publish
  • Time-to-first-inspection for a new tenant
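As a rough illustration of the first metric, field-level F1 can be computed by comparing the set of extracted fields with a hand-labeled set. The unit of comparison used here, a (question text, response type) pair, is an assumption for the sketch; the PRD does not define it.

```python
from typing import Set, Tuple

# Hypothetical unit of comparison: (question text, response type).
FieldKey = Tuple[str, str]


def field_f1(predicted: Set[FieldKey], gold: Set[FieldKey]) -> float:
    """Harmonic mean of precision and recall over exact field matches."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

With exact matching, a field whose response type is wrong counts as both a false positive and a false negative, which makes the score deliberately strict.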

06

Trust, safety & quality gates

| Gate | What it checks | Pass criteria | Human fallback |
| --- | --- | --- | --- |
| OCR confidence | Per-block recognition confidence from the OCR engine | ≥ 0.85 average; ≥ 0.70 per safety-critical block | Flag block for manual transcription in review queue |
| Question extraction completeness | Number of detected questions vs. layout-implied count | Detected ≥ 95% of expected questions | Show side-by-side comparison; admin adds missing items |
| Response type classification | Yes/No, numeric, text, date, photo, signature mapping | Model confidence ≥ 0.8 AND matches expected pattern | Default to text + surface suggested type for confirmation |
| Duplicate question detection | Semantic similarity across extracted questions | No two questions within similarity threshold of each other | Highlight duplicates and prompt admin to merge or keep |
| Required field validation | Owner, frequency, response type, scoring rule presence | All required metadata populated | Block publish; route to admin to complete missing fields |
| Safety-critical wording preservation | Regulated phrases preserved verbatim, not paraphrased | Exact-match against protected terminology dictionary | Force original-wording mode; require explicit override |
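Four of these gates reduce to simple threshold checks, sketched below with the pass criteria from the table. The function names, the `OcrBlock` shape, and the metadata keys are hypothetical, and treating the document average and the per-block floor as a single pass/fail result is one possible reading of the OCR gate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

REQUIRED_METADATA = ("owner", "frequency", "response_type", "scoring_rule")


@dataclass
class OcrBlock:
    confidence: float
    safety_critical: bool = False


def ocr_gate(blocks: List[OcrBlock]) -> Tuple[bool, List[int]]:
    """Pass if the average is >= 0.85 and every safety-critical block is >= 0.70.

    Also returns the indexes of blocks to flag for manual transcription.
    """
    if not blocks:
        return False, []
    average = sum(b.confidence for b in blocks) / len(blocks)
    flagged = [i for i, b in enumerate(blocks)
               if b.safety_critical and b.confidence < 0.70]
    return average >= 0.85 and not flagged, flagged


def completeness_gate(detected: int, expected: int) -> bool:
    """Pass if at least 95% of expected questions were detected."""
    # Integer arithmetic avoids float rounding at the boundary.
    return detected * 100 >= 95 * expected


def response_type_gate(
    suggested: str, confidence: float, matches_pattern: bool
) -> Tuple[str, Optional[str]]:
    """Return (type to apply, suggestion to confirm or None)."""
    if confidence >= 0.8 and matches_pattern:
        return suggested, None
    return "text", suggested  # default to text, surface the suggestion


def missing_metadata(meta: Dict[str, Optional[str]]) -> List[str]:
    """Required keys that are absent or empty; any entry blocks publish."""
    return [key for key in REQUIRED_METADATA if not meta.get(key)]
```

The duplicate-detection and wording-preservation gates are left out here because they depend on a similarity model and a terminology dictionary that this story does not specify.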

The review experience

Flagged and safety-critical fields route to a confidence-scored review queue before anything publishes.

Figure: Confidence-scored review queue interface, where humans verify flagged and safety-critical fields.

07

Sample evaluation framework

Eval 01

Structure accuracy

Did the AI preserve sections, questions, ordering, and hierarchy? Measured against a hand-labeled gold set of 200 enterprise checklists and scored as section-level F1 plus Kendall's tau on question ordering.
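A minimal sketch of the two scores, assuming sections and questions carry stable identifiers that can be matched between the model output and the gold label. The tau shown is the plain tau-a variant computed over questions present in both orderings; the actual eval may use a different variant or a library implementation.

```python
from itertools import combinations
from typing import Sequence


def set_f1(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """F1 over section identifiers, ignoring order."""
    p, g = set(predicted), set(gold)
    true_positives = len(p & g)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(p)
    recall = true_positives / len(g)
    return 2 * precision * recall / (precision + recall)


def kendall_tau(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Kendall tau-a over question IDs found in both orderings.

    Assumes IDs are unique within each list, so there are no ties.
    """
    gold_rank = {q: i for i, q in enumerate(gold)}
    shared = [q for q in predicted if q in gold_rank]
    pairs = list(combinations(shared, 2))  # (a, b) with a before b in predicted
    if not pairs:
        return 0.0  # undefined with fewer than two shared items; reported as 0.0
    concordant = sum(1 for a, b in pairs if gold_rank[a] < gold_rank[b])
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)
```

A tau of 1.0 means the extracted order matches the gold order exactly, and -1.0 means it is fully reversed.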

Eval 02

Field classification

Did the model correctly classify response types (yes/no, numeric, text, date, photo, signature)? Scored as per-class precision and recall, tracked over time as new checklist domains are added.
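Per-class precision and recall can be computed from aligned gold and predicted labels, as in the sketch below. The label strings are illustrative, and the alignment of predictions to gold questions is assumed to have happened upstream.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_class_pr(
    gold: List[str], predicted: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {label: (precision, recall)} for aligned label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned one-to-one")
    gold_count, pred_count = Counter(gold), Counter(predicted)
    true_positives: Counter = Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_positives[g] += 1
    result = {}
    for label in sorted(set(gold) | set(predicted)):
        # A class with no predictions (or no gold items) scores 0.0 here.
        precision = true_positives[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_positives[label] / gold_count[label] if gold_count[label] else 0.0
        result[label] = (precision, recall)
    return result
```

Reporting the classes separately keeps a weak class such as signature from being hidden by strong yes/no performance.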

Eval 03

Safety integrity

Did the output preserve required compliance wording without unsafe summarization? Verified against a regulated-terminology dictionary; any miss is a release blocker.
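The check can be pictured as a verbatim, case-sensitive substring test: any protected phrase that appears in the source but not in the output is a miss. The function name and the sample phrases below are invented for illustration and are not taken from any real regulated-terminology dictionary.

```python
from typing import Iterable, List


def missing_protected_phrases(
    source_text: str, output_text: str, protected: Iterable[str]
) -> List[str]:
    """Protected phrases present in the source but not verbatim in the output."""
    return [
        phrase for phrase in protected
        if phrase in source_text and phrase not in output_text
    ]


# Illustrative inputs only; the phrases are made up for this example.
source = "Lockout/tagout must be applied before servicing. Wear fall protection above 6 feet."
output = "Apply LOTO before servicing. Wear fall protection above 6 feet."
misses = missing_protected_phrases(
    source, output, ["Lockout/tagout must be applied", "fall protection"]
)
release_blocked = bool(misses)  # any miss blocks the release
```

Exact matching is intentionally unforgiving: a paraphrase that a reader would consider equivalent still counts as a miss.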

08

Tradeoffs, outcome, next

Tradeoffs

What we deliberately accepted

  • Shipped on managed multimodal instead of a custom-trained model: faster time to value, at the cost of a lower accuracy ceiling
  • Kept human review in the loop instead of pursuing autonomous publish
  • Mapped into existing schema, deferring a richer checklist data model

Outcome

What changed

  • Checklist setup dropped from hours to minutes per checklist
  • Removed the single largest implementation barrier for new tenants
  • Repeatable pattern for embedding multimodal AI in existing config flows

What I'd improve next

Next bets

  • Confidence-prioritized review queue to focus admin attention
  • Feedback loop: capture review edits to improve few-shot prompts
  • Extend pattern to JSAs, permits, and audit checklists
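One way the first bet could work is a simple sort that puts safety-critical items first and then orders the rest by ascending confidence. The `ReviewItem` shape and the ordering rule are assumptions for this sketch, not a committed design.

```python
from typing import List, NamedTuple


class ReviewItem(NamedTuple):
    question: str
    confidence: float
    safety_critical: bool


def prioritize(items: List[ReviewItem]) -> List[ReviewItem]:
    """Safety-critical items first, then lowest confidence first."""
    # False sorts before True, so negate the flag to bring critical items forward.
    return sorted(items, key=lambda it: (not it.safety_critical, it.confidence))
```

Python's `sorted` is stable, so items with identical priority keep their original document order.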