8 weeks. 200+ agents. 94% extraction accuracy. Zero data leakage incidents. How codetoday.io automated 20 manual clinical workflows for a mid-size pharma company — with full HIPAA compliance baked in from day one.
Our client — a mid-size pharmaceutical company running drug discovery and clinical trials — had 20 separate manual workflows for extracting structured data from research documents: trial reports, lab results, regulatory filings, and clinical notes. Twelve full-time employees were spending 60% of their working hours on copy-paste extraction tasks.
The pain was compounding: document volume was growing 40% year-over-year as the pipeline expanded, while headcount was frozen. The team was falling further behind each quarter. Manual errors in extraction were also creating downstream issues in regulatory submissions, requiring expensive re-review cycles.
The constraint: every solution had to be HIPAA-compliant, with a full audit trail, no data leaving the AWS environment, and documented evidence for SOC 2 Type II review. Their previous attempt at automation, using a SaaS NLP vendor, had stalled in compliance review for 11 months.
The full architecture runs entirely within the client's AWS VPC. No data egresses to external services. All model inference runs through Amazon Bedrock with VPC endpoints — no public internet traffic.
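One common way to enforce the "VPC-only" property on the ingestion bucket is a bucket policy that denies any request not arriving through a specific VPC endpoint. The sketch below only builds that policy document as JSON. The bucket name and endpoint ID are hypothetical placeholders, and this is an illustration of the general pattern rather than the policy used in this engagement.

```python
import json


def vpc_only_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Build an S3 bucket policy that denies all access from outside one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",    # every object in it
                ],
                # aws:SourceVpce is the condition key for the calling VPC endpoint ID.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder names, not values from the real deployment.
policy = vpc_only_bucket_policy("example-clinical-docs", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

A blanket deny like this also blocks console and CI access that does not go through the endpoint, so it is usually rolled out with care and tested on a non-production bucket first.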
┌─────────────────────────────────────────────────────────────────┐
│  DOCUMENT INGESTION                                             │
│  S3 Bucket (encrypted, VPC-only) ──► Bedrock Knowledge Base     │
└────────────────────────┬────────────────────────────────────────┘
                         │ Retrieval
┌────────────────────────▼────────────────────────────────────────┐
│  ORCHESTRATOR AGENT (Claude 3.5 Sonnet via Bedrock)             │
│  - Classifies document type                                     │
│  - Routes to appropriate sub-agent                              │
│  - Tracks extraction confidence                                 │
└──┬──────────┬──────────┬──────────┬──────────┬──────────────────┘
   │          │          │          │          │
┌──▼──┐   ┌───▼───┐  ┌───▼───┐   ┌──▼───┐  ┌───▼──────────────┐
│Doc  │   │Regex  │  │Clin-  │   │Valid-│  │Audit             │
│Class│   │Extract│  │NLP    │   │ation │  │Logger            │
│-ify │   │-or    │  │Agent  │   │Agent │  │(CloudTrail+DDB)  │
└──┬──┘   └───┬───┘  └───┬───┘   └──┬───┘  └──────────────────┘
   └──────────┴──────────┼──────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│  DynamoDB: Agent State + Extraction Results                     │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│  REPORT GENERATOR AGENT                                         │
│  - Assembles structured output                                  │
│  - Confidence scoring                                           │
│  - Routes <85% confidence to human review queue                 │
└────────────────────────┬────────────────────────────────────────┘
                         │
                ┌────────▼──────────┐
                │ S3 Output Bucket  │
                │ (structured JSON) │
                └───────────────────┘
Step Functions Express Workflows orchestrate the entire pipeline.
All agent-to-agent calls are synchronous invocations via Bedrock.
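The routing and review logic in the diagram can be sketched in a few lines. This is a minimal illustration, not the production orchestrator: the keyword classifier, the two handler functions, and their fixed confidence scores are hypothetical stand-ins for the model-backed agents. Only the 85% review threshold comes from the diagram.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

REVIEW_THRESHOLD = 0.85  # the diagram routes results under 85% confidence to human review


@dataclass
class Result:
    doc_id: str
    doc_type: str
    fields: Dict[str, str]
    confidence: float
    destination: str  # "output_bucket" or "human_review"


def classify(text: str) -> str:
    """Keyword stand-in for the model-based document classifier (hypothetical)."""
    lowered = text.lower()
    if "adverse event" in lowered:
        return "clinical_note"
    if "assay" in lowered:
        return "lab_result"
    return "unknown"


# Placeholder sub-agents: a real one would call the model and score its own output.
def extract_lab_result(text: str) -> Tuple[Dict[str, str], float]:
    return {"section": "assay"}, 0.93


def extract_clinical_note(text: str) -> Tuple[Dict[str, str], float]:
    return {"section": "adverse_event"}, 0.71


HANDLERS: Dict[str, Callable[[str], Tuple[Dict[str, str], float]]] = {
    "lab_result": extract_lab_result,
    "clinical_note": extract_clinical_note,
}


def process(doc_id: str, text: str) -> Result:
    """Classify, route to a sub-agent, then decide where the result goes."""
    doc_type = classify(text)
    handler = HANDLERS.get(doc_type)
    if handler is None:
        # Unrecognised documents are never auto-accepted.
        return Result(doc_id, doc_type, {}, 0.0, "human_review")
    fields, confidence = handler(text)
    destination = "output_bucket" if confidence >= REVIEW_THRESHOLD else "human_review"
    return Result(doc_id, doc_type, fields, confidence, destination)


for doc_id, text in [("d1", "Assay panel A"), ("d2", "Adverse event reported"), ("d3", "Memo")]:
    r = process(doc_id, text)
    print(r.doc_id, r.doc_type, r.confidence, r.destination)
```

Sending unknown document types straight to the review queue keeps the failure mode conservative: nothing reaches the output bucket without either a confident extraction or a human decision.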
We chose Step Functions Express Workflows as the backbone rather than direct agent-to-agent chaining. This gave us a full execution history viewable in the AWS console (for Express Workflows, this relies on CloudWatch Logs being enabled on the state machine), automatic retries on transient Bedrock API errors, and per-execution cost visibility. The roughly 50 ms of overhead per step is easily worth the operational benefit in a regulated environment.
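To make the retry behaviour concrete, here is a sketch of a two-step state machine definition in Amazon States Language, built as a Python dictionary. The state names, JSON paths, and retry numbers are illustrative assumptions, not the client's configuration. The Express type itself is chosen when the state machine is created, not inside the definition.

```python
import json


def extraction_state_machine(model_id: str) -> dict:
    """Build an Amazon States Language definition with retries on each Bedrock call."""
    # Retry task failures and timeouts with exponential backoff: 2s, 4s, 8s.
    retry = [
        {
            "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }
    ]

    def bedrock_task(request_path: str, result_path: str) -> dict:
        return {
            "Type": "Task",
            "Resource": "arn:aws:states:::bedrock:invokeModel",
            "Parameters": {"ModelId": model_id, "Body.$": request_path},
            "Retry": retry,
            "ResultPath": result_path,
        }

    classify = bedrock_task("$.classify_request", "$.classification")
    classify["Next"] = "ExtractFields"
    extract = bedrock_task("$.extract_request", "$.extraction")
    extract["End"] = True

    return {
        "Comment": "Illustrative two-step extraction pipeline",
        "StartAt": "ClassifyDocument",
        "States": {"ClassifyDocument": classify, "ExtractFields": extract},
    }


# Example model ID; substitute whichever model the account has enabled.
definition = extraction_state_machine("anthropic.claude-3-5-sonnet-20240620-v1:0")
print(json.dumps(definition, indent=2))
```

Keeping the retry policy in the definition rather than in agent code means every retry shows up in the workflow's execution record, which is the property that matters for audit.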
Configure Bedrock Guardrails PII redaction from day one — retrofitting it after the Knowledge Base is populated requires a full re-ingestion. We learned this the hard way during a pre-production audit where a test document containing synthetic patient IDs was indexed without redaction. Budget an extra sprint for compliance hardening.
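A cheap complement to this lesson is a pre-ingestion check that redacts known identifier formats before a document is ever written to the indexed bucket. The sketch below is not Bedrock Guardrails; it is a separate, regex-based gate, and the `PT-` patient ID format is a hypothetical example.

```python
import re
from typing import Dict, Tuple

# Hypothetical identifier formats; real ones would come from the data owners.
PATTERNS = {
    "PATIENT_ID": re.compile(r"\bPT-\d{6}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each matched identifier with a label and count the replacements."""
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


clean, found = redact("Subject PT-004211 enrolled; SSN 123-45-6789.")
print(clean)
print(found)
```

The returned counts can be written to the audit log, so a reviewer can see which documents contained identifiers without seeing the identifiers themselves.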
| Metric | Before | After |
|---|---|---|
| Extraction accuracy | 87% (manual, spot-checked) | 94% (automated, every document) |
| Documents processed/day | ~200 (12 FTE × 60%) | 3,000+ (fully automated) |
| Average extraction time | 8–25 minutes per document | 4.7 seconds per document |
| FTE hours on extraction | ~340 hrs/week | ~28 hrs/week (review queue only) |
| Annual labor cost (extraction) | ~$420K | ~$80K (review + oversight) |
| Regulatory re-review cycles | 14 per quarter | 2 per quarter |
| HIPAA audit readiness | Manual evidence collection, 3 weeks | Automated, continuous, <1 day |
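For readers who prefer relative changes, the before and after figures in the table can be turned into percentages with a few lines of arithmetic. The inputs below are copied from the table. The compute-time figure assumes documents are processed one after another, which is a simplification.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from before to after, rounded to one decimal place."""
    return round((before - after) / before * 100, 1)


hours_saved = pct_reduction(340, 28)     # FTE hours per week on extraction
cost_saved = pct_reduction(420, 80)      # annual labor cost, in $K
rereview_saved = pct_reduction(14, 2)    # regulatory re-review cycles per quarter
throughput_gain = 3000 / 200             # documents per day, after vs before

# Sequential processing time for a 3,000-document day at 4.7 s per document.
compute_hours = round(3000 * 4.7 / 3600, 1)

print(hours_saved, cost_saved, rereview_saved, throughput_gain, compute_hours)
```

That works out to roughly a 92% drop in extraction hours, an 81% drop in labor cost, an 86% drop in re-review cycles, and a 15x increase in daily throughput.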
We'd been blocked on AI adoption by compliance for nearly a year. Every vendor we evaluated either couldn't meet HIPAA requirements or couldn't demonstrate the audit trail our legal team needed. codetoday.io came in, understood the regulatory constraints immediately, and built the guardrails architecture first — before writing a single line of agent code. The result is a production system our compliance team actually trusts. The guardrails architecture alone was worth the entire engagement fee.
— CTO, Mid-Size Pharmaceutical Company (name withheld per NDA)