Epic Clarity — Agentic AI Layer
Purpose: Intelligence map of Epic Clarity’s data topology for teams building an agentic AI layer on top of clinical EHR data. Covers all 12 domains, agent use cases, investment wedges, build sequence, and architecture decisions.
Quick Orientation
| | Chronicles | Clarity | Caboodle |
|---|---|---|---|
| Type | Hierarchical MUMPS/Caché DB | Relational SQL (~30,000 tables) | Star schema EDW (~300 tables) |
| Purpose | Live clinical operations | Reporting & analytics | Dashboards & KPIs |
| Granularity | Full operational record | Record-level, full depth | Aggregated, curated subset |
| Freshness | Real-time | T-1 (nightly ETL) | T-1 (nightly ETL) |
| AI Surface | Not queryable directly | Primary agent surface | Secondary / faster reads |
| External data | No | No | Yes (claims, registries) |
The insight: Clarity’s complexity — 30,000 tables, ZC_ join dependencies, continuation tables — is the moat. Whoever builds schema intelligence on top of it first wins. The difficulty is the feature.
Critical Constraints — Read Before Building
These are not edge cases. Every one surfaces in the first month.
- T-1 freshness — All data is one day old. Agents must surface data-as-of timestamps on every output. Never use Clarity for real-time operational decisions.
- Date format — Dates stored as days since 12/31/1840 in _REAL columns (MUMPS heritage). All time-series logic requires conversion before use.
- ZC_ dependency — Every categorical field (gender, race, order status, encounter type) is a numeric code requiring a JOIN ZC_* for human-readable output. LLM-generated SQL that skips this produces unreadable garbage.
- Continuation tables — PATIENT, PATIENT_2, PATIENT_3 are one logical record. Same for CLARITY_SER, ORDER_MED, others. Querying only the base table silently loses data.
- LINE column pattern — One-to-many relationships use a LINE column (MUMPS inheritance). Appears in medications, flowsheets, diagnoses, charge lines. Agents that ignore this produce row-multiplied aggregations.
- 30K table surface — RAG over the data dictionary is the only viable architecture. No LLM can navigate this at context scale. Build schema retrieval before writing any agent logic.
- Multi-tenant schema variation — Health systems customize their Clarity schemas. An agent built against one instance will silently fail against another. Tenant-aware schema isolation is the core technical moat.
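The date-format constraint above can be sketched as a small conversion helper. This is a minimal sketch that assumes the raw value is a whole-day count (any fractional part is discarded); the function names are hypothetical and not part of any Epic API.

```python
from datetime import date, timedelta

# Epoch stated in the constraint above: day 0 is 12/31/1840.
EPIC_EPOCH = date(1840, 12, 31)


def epic_days_to_date(days):
    """Convert a days-since-epoch value (int or float) to a calendar date.

    Returns None for NULLs so column values can be passed straight through.
    Assumption: any fractional part is not needed and is dropped.
    """
    if days is None:
        return None
    return EPIC_EPOCH + timedelta(days=int(days))


def date_to_epic_days(d):
    """Inverse conversion, useful when building range filters on raw values."""
    return (d - EPIC_EPOCH).days


print(epic_days_to_date(1))  # 1841-01-01
```

Converting once at the query boundary, rather than inside each agent, keeps time-series logic working on ordinary dates.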
Domain Map — 12 Subject Areas
01 · Patient Identity & Demographics
Key Tables: PATIENT · PATIENT_2 · PATIENT_3
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Identity | MRN, DOB, gender, race, ethnicity |
| Contact | Language, address, zip code |
| Linkage | Guarantor ID, coverage ID → payer |
| Status | Patient status flags, merge history |
Agent Use Cases
- Identity resolution and deduplication across source systems
- SDOH risk stratification using zip-level poverty index and language barriers
- Patient cohort builder for population health programs
- Outreach eligibility filtering by demographics and payer
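Because PATIENT, PATIENT_2, and PATIENT_3 form one logical record, any of the use cases above needs the continuation tables joined back to the base table. The sketch below shows one way to generate that join, run against an in-memory SQLite database for illustration. The key column PAT_ID and the DEMO_FIELD_* columns are assumptions made for the example; check the local data dictionary for the real names.

```python
import sqlite3


def continuation_join_sql(base, continuations, key, columns):
    """Build a SELECT that LEFT JOINs a base table to its continuation tables.

    LEFT JOIN keeps base rows even when a continuation row is missing.
    """
    joins = "\n".join(
        f"LEFT JOIN {t} ON {t}.{key} = {base}.{key}" for t in continuations
    )
    return (
        f"SELECT {', '.join(columns)}\nFROM {base}\n{joins}\n"
        f"ORDER BY {base}.{key}"
    )


# Toy data only: column names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PATIENT   (PAT_ID TEXT PRIMARY KEY, PAT_NAME TEXT);
    CREATE TABLE PATIENT_2 (PAT_ID TEXT PRIMARY KEY, DEMO_FIELD_A TEXT);
    CREATE TABLE PATIENT_3 (PAT_ID TEXT PRIMARY KEY, DEMO_FIELD_B TEXT);
    INSERT INTO PATIENT   VALUES ('Z1', 'DOE,JANE'), ('Z2', 'ROE,RICHARD');
    INSERT INTO PATIENT_2 VALUES ('Z1', 'a-value');
    INSERT INTO PATIENT_3 VALUES ('Z1', 'b-value'), ('Z2', 'b-other');
""")

sql = continuation_join_sql(
    "PATIENT",
    ["PATIENT_2", "PATIENT_3"],
    "PAT_ID",
    ["PATIENT.PAT_ID", "PATIENT.PAT_NAME", "DEMO_FIELD_A", "DEMO_FIELD_B"],
)
rows = conn.execute(sql).fetchall()
```

Patient Z2 has no PATIENT_2 row in the toy data, so that column comes back as NULL rather than the patient disappearing from the result.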
02 · Encounters & Visits
Key Tables: PAT_ENC · PAT_ENC_2 · PAT_ENC_DX · HSP_ACCOUNT
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Core | Visit date/time, encounter type, department |
| Settings | Ambulatory · Inpatient · ED · Telehealth · Telephone · MyChart |
| Inpatient | LOS, discharge disposition, ADT events (admit-discharge-transfer) |
| Diagnosis | Encounter-level ICD-10 linkage |
Agent Use Cases
- Readmission prediction via encounter sequence modeling
- Care pathway anomaly detection across patient timelines
- No-show and cancellation propensity scoring
- ED-to-inpatient conversion analysis
- LOS benchmarking against cohort and DRG norms
03 · Clinical Documentation
Key Tables: PROBLEM_LIST · ALLERGY · IP_FLWSHT_REC · HNO_NOTE_TEXT · PAT_ENC_DX
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Diagnoses | ICD-10 codes, problem list, onset date |
| Notes | Progress notes, discharge summaries, surgical notes, consult notes — free text |
| Observations | Vital signs, flowsheet rows, measurements |
| History | Immunizations, social history, SDOH fields |
HNO_NOTE_TEXT is the highest-value unstructured table in Clarity. This is where the clinical story lives. Every NLP use case runs through here.
Agent Use Cases
- NLP / LLM extraction from note text — chief complaint, HPI, assessment, plan
- Diagnosis coding accuracy audit — ICD vs. free-text alignment
- Clinical decision support triggers from flowsheet deviations
- Automated prior auth documentation generation
- Allergy-drug interaction alerting
04 · Orders & Medications
Key Tables: ORDER_PROC · ORDER_MED · MAR_ADMIN_DOSES · RX_PHR_ORDER
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Orders | Order type, datetime, ordering provider, priority, status |
| Medications | Route, dose, frequency, administration record (MAR) |
| Pharmacy | Dispenses, refills, renewals, IV infusions |
| Protocol | Order set linkage, standing orders, protocol flags |
Agent Use Cases
- Polypharmacy risk detection across multi-drug regimens
- Medication adherence inference from refill gap analysis
- Order set optimization — co-ordered items correlated against outcomes
- High-cost drug utilization monitoring
- Formulary compliance enforcement and alerting
05 · Labs & Results
Key Tables: ORDER_RESULTS · LNS_RESULT · LNS_SPECIMEN · CLARITY_COMPONENT
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Results | Value, unit, reference range, abnormal flag, status |
| Specimen | Collection datetime, specimen type, accession number |
| Categories | Chemistry · Hematology · Microbiology / cultures · Pathology |
| POC | Point-of-care and rapid testing |
Agent Use Cases
- Trend-based early warning from deteriorating lab trajectories over time
- Sepsis, AKI, and deterioration alerting from multi-lab sequences
- Critical value notification gap detection
- Lab utilization and redundancy reduction by ordering pattern
- Research cohort identification by lab phenotype
06 · Imaging & Radiology
Key Tables: ORDER_PROC (rad) · RAD_EXAM · CLARITY_PROC · Cupid (cardiology)
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Orders | CPT codes, modality, ordering indication, priority |
| Results | Report text (free text), read datetime, radiologist |
| Modalities | X-ray · CT · MRI · Ultrasound · Echo · Cath · Nuclear |
| Procedures | Interventional, procedural imaging |
Agent Use Cases
- LLM extraction from radiology report text — incidental finding triage
- Imaging appropriateness scoring against clinical guidelines
- Repeat imaging detection and cost flagging
- Order-to-read turnaround time SLA monitoring
07 · Scheduling & Access
Key Tables: PAT_ENC (appt view) · CLARITY_DEP · CLARITY_SER_DEPT
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Appointments | Datetime, type, status, slot, provider, location |
| Outcomes | No-show flag, cancellation reason, reschedule history |
| Access | Wait time, referral source, waitlist position |
| Type | New patient · Follow-up · Procedure · Urgent · Waitlist |
Agent Use Cases
- Intelligent slot fill and open access optimization
- No-show prediction driving proactive outreach
- Referral leakage detection — did the patient follow through?
- Access equity analysis by zip code, race, and payer
- Demand forecasting for capacity planning by service line
08 · Billing & Revenue Cycle
Key Tables: HSP_ACCOUNT · ARPB_TRANSACTIONS · CLARITY_EDI · COVERAGE · ACCOUNT
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Charges | CPT / DRG codes, charge amount, charge drop timing |
| Claims | Payer, claim status, denial reason, remittance |
| AR | AR aging, patient balance, co-pay, write-offs |
| Auth | Authorization records, referral linkage |
Agent Use Cases
- Denial prediction before claim submission — intercept at charge entry
- Undercoding / upcoding audit — ICD vs. charges alignment
- Prior authorization automation from clinical documentation
- Revenue leakage detection — charges not dropped post-procedure
- Payer contract performance analysis and renegotiation intelligence
09 · Operational & Capacity
Key Tables: ADT_ARRIVAL · ADT_DAYS · OT_CASE_RECORD · BED_CENSUS
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Beds | Status, unit, service line, census by hour |
| OR | Case start/end, surgeon, turnover time, cancellation reason |
| Flow | Patient transport, transfer events, escalation flags |
| Staffing | Assignments, coverage gaps, float pool |
Agent Use Cases
- Next-day bed demand prediction and proactive discharge planning (Clarity is T-1, so this cannot be real-time)
- OR block utilization optimization — recover unused time
- LOS outlier detection triggering case management workflow
- Patient flow bottleneck analysis — ED through inpatient
- Staffing demand-supply mismatch alerting by unit and shift
10 · Providers & Workforce
Key Tables: CLARITY_SER · CLARITY_SER_2 · CLARITY_EMP · SER_DEPT_ATTND
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Identity | NPI, specialty, department, credentials, DEA |
| Attribution | Ordering, attending, referring, and cosigning linkage |
| Care team | Composition, role, shift, assignment |
| Activity | User access logs, documentation timestamps, after-hours patterns |
Agent Use Cases
- Provider performance benchmarking across outcomes, utilization, and coding
- Referral pattern analysis and out-of-network leakage detection
- Care team attribution for value-based contract performance
- Clinical variation analysis by provider and service line
- Burnout proxy signals from documentation burden and after-hours login patterns
11 · Population Health & Quality
Key Tables: REGISTRY_PATIENT · QM_RESULT · CARE_PLAN · HEALTHY_PLANET_*
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Measures | HEDIS, CMS Stars, MIPS, MACRA result values |
| Gaps | Care gap flags, measure numerator/denominator status |
| Risk | Patient risk scores, registry membership, tier |
| SDOH | Social risk factors, community resource linkage |
Agent Use Cases
- Automated care gap closure via outreach agent
- Risk stratification driving proactive intervention prioritization
- Quality measure trajectory forecasting for month-end performance
- SDOH-adjusted outcome benchmarking for equity reporting
- Value-based contract P&L attribution by attributed population
12 · System & Configuration (Foundation Layer)
Key Tables: ZC_* (all category master files) · CLARITY_DEP · CLARITY_POS
Fields & Subtypes
| Field Group | Detail |
|---|---|
| Code maps | Numeric code → display value for every categorical field |
| Structure | Department hierarchy, facility structure, POS codes |
| Security | User roles, access levels, build configuration |
| Metadata | Epic version, module activation, local customizations |
Every domain query depends on this layer. ZC_ joins are a precondition for human-readable output across all 11 other domains. Build the ZC_ resolver before writing any agent logic.
Agent Use Cases
- Schema-aware SQL generation — LLM grounding on ZC_ maps is mandatory
- Data dictionary embedding for RAG-based analytics assistant
- Role-based query scoping in the agent access control layer
- Cross-domain lineage mapping for compliance audit trails
Agent Architecture — How the Layer Stacks
USER / WORKFLOW TRIGGER
Natural language query · scheduled job · API event · alert threshold
↓
[ 01 ] ORCHESTRATION LAYER
Intent classification
Domain router → selects one of 12 Clarity domains
ReAct planning loop → multi-step decomposition
Memory / state management across turns
↓
[ 02a ] NLP / LLM MODULE (parallel with 02b)
Free-text extraction
HNO_NOTE_TEXT parsing
Summarisation
Structured field population
[ 02b ] SQL GENERATION MODULE (parallel with 02a)
Schema RAG over data dictionary
ZC_ map grounding (mandatory)
Domain-scoped query builder
Query validation + explain plan
↓
[ 03 ] CLARITY DATABASE LAYER
~30,000 SQL tables · 12 domains
Nightly ETL · T-1 freshness
Domain-scoped views recommended over raw tables
↓
[ 04 ] OUTPUT LAYER
Insight cards · alerts · triggers
Draft documents (prior auth, appeal letters)
Downstream API calls (RCM system, payer portal, care management)
Data-as-of timestamp on every output ← non-negotiable
The domain router is the most consequential design decision. Intent misclassification routes a billing query to clinical tables and produces numerically plausible wrong answers — which is worse than an obvious error. The router must be purpose-built, not a general-purpose prompt.
High-Value Wedges — VC / PE Lens
Ranked by combined score: willingness to pay + time-to-ROI + structural moat.
| Domain | Agent Wedge | TAM Signal | Moat | Sharp Insight |
|---|---|---|---|---|
| Billing / RCM | Denial prediction + auto-appeal | $20B+ RCM market | Payer-specific model training | CFO-level pain. Denial rate is a visible P&L line. Fastest path to a signed contract. |
| Scheduling | No-show prediction + slot fill | Access crisis is acute | Cross-system network effect | Every unfilled slot is quantifiable lost revenue. Easy to instrument ROI from day one. |
| Clinical Docs | Prior auth automation via NLP | $35B admin waste annually | Fine-tuned on Epic note schemas | HNO_NOTE_TEXT is the crown jewel. Unstructured data no one else has modeled well. |
| Labs | Deterioration early warning | Patient safety liability | Outcome-labeled training data | Sepsis and AKI are the two lead use cases. Direct patient harm reduction narrative. |
| Population Health | Care gap closure + VBC attribution | VBC contract penetration growing | Measure-specific logic + payer rules | HEDIS and CMS Stars drive the ROI. Requires deepest Epic integration to win. |
| OR / Capacity | Block utilization optimizer | $150B surgical market | OR data is notoriously siloed | 30% of block time goes unused industry-wide. Direct surgical margin impact. |
| Provider Analytics | Clinical variation + benchmarking | MIPS / value-based pressure | Multi-system benchmarking data | Requires multi-system data to build the comparison set — the network is the moat. |
Architecture Decisions to Lock Early
These five decisions have the highest cost if deferred. Each becomes exponentially harder to change once agent logic is built on top of it.
01 · Schema Retrieval Strategy
Vector store over the Clarity data dictionary is the only viable architecture at 30,000 tables. The embedding corpus must include table names, column names, descriptions, ZC_ join dependencies, and sample values. Build this before writing a single agent. Everything else depends on it.
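To make the retrieval step concrete, here is a toy stand-in that ranks data-dictionary entries against a question using bag-of-words cosine similarity. It is not a vector store and uses no learned embeddings; the dictionary entries and their descriptions are hand-written assumptions for illustration. A real corpus would also carry column names, ZC_ join dependencies, and sample values, as described above.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-written dictionary entries (illustration only).
DICTIONARY = [
    {"table": "PATIENT", "text": "patient demographics birth date sex race ethnicity language zip"},
    {"table": "PAT_ENC", "text": "encounter visit date department encounter type appointment status"},
    {"table": "ORDER_MED", "text": "medication order dose route frequency ordering provider"},
    {"table": "HSP_ACCOUNT", "text": "hospital account billing charges payer claim denial"},
]


def vectorize(text):
    """Lowercase, tokenize, and count terms."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return up to k table names with non-zero similarity, best first."""
    q = vectorize(query)
    scored = [
        (cosine(q, vectorize(e["table"] + " " + e["text"])), e["table"])
        for e in DICTIONARY
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [table for score, table in scored[:k] if score > 0]


print(retrieve("medication dose and route", k=1))  # ['ORDER_MED']
```

The point of the sketch is the contract: SQL generation only ever sees the few tables that retrieval returns, never the full 30,000-table surface.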
02 · Domain Router Design
Fine-tune a small classifier on domain-labeled Clarity queries. Do not rely on a general LLM’s prompt-following for routing. Routing errors produce plausible wrong answers with no visible error signal — the worst failure mode in an analytics agent.
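One property worth fixing in the router's interface, whatever model sits behind it, is the ability to abstain. The sketch below uses simple keyword scoring as a placeholder for the fine-tuned classifier recommended above; the keyword lists, the `route` function, and the margin threshold are all hypothetical. It returns no domain when the top two candidates are too close, rather than guessing.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical keyword lists; a fine-tuned classifier would replace score_domains().
DOMAIN_KEYWORDS = {
    "billing": {"claim", "denial", "charge", "payer", "remittance"},
    "labs": {"lab", "result", "specimen", "creatinine", "culture"},
    "scheduling": {"appointment", "no-show", "slot", "waitlist", "referral"},
}


@dataclass
class RoutingDecision:
    domain: Optional[str]   # None means "abstain and ask for clarification"
    scores: Dict[str, int]  # kept for audit logging


def score_domains(query):
    """Count keyword hits per domain (placeholder for a trained model)."""
    tokens = set(re.findall(r"[a-z0-9-]+", query.lower()))
    return {d: len(tokens & kws) for d, kws in DOMAIN_KEYWORDS.items()}


def route(query, min_margin=1):
    """Pick a domain only when it beats the runner-up by at least min_margin."""
    scores = score_domains(query)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, runner_up) = ranked[0], ranked[1]
    if best_score == 0 or best_score - runner_up < min_margin:
        return RoutingDecision(domain=None, scores=scores)
    return RoutingDecision(domain=best, scores=scores)


print(route("denial rate by payer last month").domain)  # billing
print(route("charge for lab").domain)                   # None (tie, so abstain)
```

An explicit abstain path turns a silent misroute into a visible clarification request, which addresses the failure mode described above.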
03 · ZC_ Grounding Layer
Build a static ZC_ resolver that maps every categorical field to its human-readable lookup at query construction time. Bake it into the SQL generation layer as a mandatory post-processing step. Without it, every categorical output is a numeric code that users cannot interpret.
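A minimal sketch of such a resolver follows, assuming a static map from (table, code column) to its lookup table. The specific mapping shown (PATIENT.SEX_C to ZC_SEX, keyed on RCPT_MEM_SEX_C with a NAME display column) is an illustrative assumption to verify against the local data dictionary, and the SQLite database holds toy data only.

```python
import sqlite3

# Hypothetical static map:
# (table, code column) -> (lookup table, lookup key column, display column)
ZC_MAP = {
    ("PATIENT", "SEX_C"): ("ZC_SEX", "RCPT_MEM_SEX_C", "NAME"),
}


def resolve_categoricals(table, columns):
    """Build a SELECT that swaps each mapped code column for its display name."""
    select_parts, joins = [], []
    for col in columns:
        target = ZC_MAP.get((table, col))
        if target is None:
            # Not a categorical column: select it unchanged.
            select_parts.append(f"{table}.{col}")
            continue
        zc_table, zc_key, zc_name = target
        alias = f"zc_{col.lower()}"
        select_parts.append(f"{alias}.{zc_name} AS {col}_NAME")
        joins.append(
            f"LEFT JOIN {zc_table} {alias} ON {alias}.{zc_key} = {table}.{col}"
        )
    return "SELECT " + ", ".join(select_parts) + f"\nFROM {table}\n" + "\n".join(joins)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PATIENT (PAT_ID TEXT, SEX_C INTEGER);
    CREATE TABLE ZC_SEX (RCPT_MEM_SEX_C INTEGER, NAME TEXT);
    INSERT INTO ZC_SEX VALUES (1, 'Female'), (2, 'Male');
    INSERT INTO PATIENT VALUES ('Z1', 1), ('Z2', 2), ('Z3', NULL);
""")

sql = resolve_categoricals("PATIENT", ["PAT_ID", "SEX_C"])
rows = conn.execute(sql + "\nORDER BY PATIENT.PAT_ID").fetchall()
```

A LEFT JOIN is used so that rows with a NULL or unmapped code still appear in the output, with a NULL display value, instead of being dropped.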
04 · Data Freshness Contract
Every agent output must carry a data_as_of timestamp. Build this into the output schema from day one — not as a UI afterthought. Clinical users who do not understand T-1 lag will use agent outputs in operational contexts where stale data causes patient harm.
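One way to enforce this at the schema level is to make the timestamp a required, validated field on the output type, so an output without it cannot be constructed. The `AgentOutput` class and its field names other than data_as_of are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentOutput:
    summary: str
    data_as_of: datetime  # no default: omitting it raises TypeError

    def __post_init__(self):
        if not isinstance(self.data_as_of, datetime):
            raise TypeError("data_as_of must be a datetime")
        if self.data_as_of.tzinfo is None:
            # Reject naive timestamps so "as of" is never ambiguous.
            raise ValueError("data_as_of must be timezone-aware")

    def render(self):
        """Render the output with the freshness stamp always attached."""
        stamp = self.data_as_of.strftime("%Y-%m-%d %H:%M %Z")
        return f"{self.summary}\n(Data as of {stamp}; reporting data, not real-time)"


out = AgentOutput(
    summary="14 open care gaps in the diabetes registry",  # made-up example text
    data_as_of=datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
)
print(out.render())
```

Because rendering goes through a single method, a UI cannot display the summary without the stamp unless it bypasses the type entirely.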
05 · Multi-Tenant Schema Isolation
Design the vector store and SQL generation module to be tenant-aware from the start. Health systems customize their Clarity schemas. An agent built against one instance silently fails against another. This is your core technical moat if you scale across health systems — and your deepest risk if you ignore it.
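A small sketch of tenant-aware validation follows, assuming each tenant's schema is snapshotted as a table-to-columns map. The tenant ids, the CUSTOM_FLAG_A column, and the `validate_columns` helper are hypothetical. The aim is to convert a silent failure into an explicit error before any SQL runs.

```python
# Hypothetical per-tenant schema snapshots: tenant id -> table -> set of columns.
TENANT_SCHEMAS = {
    "health_system_a": {"PATIENT": {"PAT_ID", "SEX_C", "CUSTOM_FLAG_A"}},
    "health_system_b": {"PATIENT": {"PAT_ID", "SEX_C"}},
}


class SchemaMismatch(Exception):
    """Raised when a query references objects the tenant's schema lacks."""


def validate_columns(tenant, table, columns):
    """Check that every referenced column exists for this tenant."""
    schema = TENANT_SCHEMAS.get(tenant)
    if schema is None:
        raise SchemaMismatch(f"unknown tenant: {tenant}")
    available = schema.get(table)
    if available is None:
        raise SchemaMismatch(f"{tenant}: table {table} not present")
    missing = sorted(set(columns) - available)
    if missing:
        raise SchemaMismatch(f"{tenant}: {table} is missing columns {missing}")


# Passes for tenant A, which has the custom column.
validate_columns("health_system_a", "PATIENT", ["PAT_ID", "CUSTOM_FLAG_A"])

# Fails loudly for tenant B instead of producing a broken query.
try:
    validate_columns("health_system_b", "PATIENT", ["PAT_ID", "CUSTOM_FLAG_A"])
except SchemaMismatch as exc:
    print(exc)
```

The same tenant key would scope the schema retrieval index, so a query for one health system never retrieves another system's customizations.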
Epic Clarity — Agentic AI Layer · Internal Brainstorm · Confidential