Architecture
System design overview: NL2SQL engine, document processing, agent framework, and database schema.
System Overview
┌─────────────────────────────────────────────────────────┐
│ Next.js App Router │
│ ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ Dashboard │ │ Query │ │ Documents │ │ Agents │ │
│ │ Pages │ │ Engine │ │ Upload │ │Marketplace│ │
│ └─────┬────┘ └─────┬────┘ └─────┬─────┘ └─────┬────┘ │
│ └─────────────┼───────────┼──────────────┘ │
│ ▼ ▼ │
│ API Routes (/api/*) |
| | | |
| +-----------------┼-----------┼------------------+ |
| | Service Layer | |
| | NL2SQL | Doc Processors | Agent RT | 3WM | |
| +-----------------┼-----------┼------------------+ |
| | | |
| +-----------------┼-----------┼------------------+ |
| | Connector Layer (DB + Storage) | |
| | Postgres | Mock | SharePoint | S3 | Email | |
| +-----------------┼-----------┼------------------+ |
+----------------------┼-----------┼-----------------------+
▼ ▼
+---------------------------------+
| Supabase PostgreSQL + pgvector |
| LLM Providers (Claude + GPT) |
+---------------------------------+NL2SQL Engine — 5-Layer Grep Architecture
The core NL2SQL engine uses a 5-layer pipeline that greps before it generates, achieving 95%+ accuracy vs 60-70% for pure LLM approaches.
User Query: "Which vendors overcharge the most?"
│
▼
┌──────────────────────────────────────────┐
│ Layer 0: QME (Query Memory Engine) │
│ Check if we've seen this query before. │
│ If cached with high confidence → skip. │
└──────────────────────────────────────────┘
│ (cache miss)
▼
┌──────────────────────────────────────────┐
│ Layer 1: RETRIEVAL │
│ • Extract keywords: "vendors","overcharge│
│ • Full-text search (pg_trgm on catalog) │
│ • Vector search (pgvector embeddings) │
│ • Output: Top 10 relevant tables/columns │
└──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ Layer 2: SEMANTIC INFERENCE │
│ • Map "vendors" → suppliers table │
│ • Map "overcharge" → price mismatch │
│ • Infer joins: invoices ← suppliers │
│ • Resolve via Entity Graph if needed │
└──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ Layer 3: SQL GENERATION (Multi-Path) │
│ Generate 3 candidate SQLs (Claude): │
│ Path A: Aggregation-first │
│ Path B: Join-first │
│ Path C: Subquery approach │
└──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ Layer 4: VALIDATION & RANKING │
│ • Safety check (no DDL/DML) │
│ • Syntax validation │
│ • EXPLAIN plan analysis │
│ • Rank: correctness > perf > simplicity │
│ • Select best SQL │
└──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ Layer 5: EXECUTION & REASONING │
│ • Execute via execute_readonly_query RPC │
│ (10s timeout, 500 row limit) │
│ • Generate NL explanation (Claude) │
│ • Store in Query Memory for learning │
│ • Return results + insights to user │
└──────────────────────────────────────────┘Key files:
src/lib/nl2sql/five-layer-engine.ts— Orchestratorsrc/lib/nl2sql/retrieval.ts— Layer 1 (grep + vector)src/lib/nl2sql/semantic-inference.ts— Layer 2src/lib/nl2sql/multi-path-generator.ts— Layer 3src/lib/nl2sql/sql-validator.ts— Layer 4src/lib/nl2sql/query-memory.ts— QME cache
3-Way Match — 7-Phase Pipeline
Hierarchical invoice matching designed to process every invoice path with auditability and review controls.
Phase 1: Metadata Extraction (Claude Sonnet)
→ Invoice number, date, vendor, totals
Phase 2: Line Item Extraction (GPT-4o Mini, batch)
→ Per-line: description, qty, price, SKU
Phase 3: PO Matching
→ Find matching purchase order by PO number
Phase 4: Receipt Matching
→ Match goods receipts / delivery confirmations
Phase 5: Discrepancy Detection
→ Qty, price, and amount mismatches flagged
Phase 6: Auto-Resolution
→ Tolerance rules, auto-approve small diffs
Phase 7: Human Review Queue
→ Escalate unresolvable discrepanciesKey file: src/lib/processors/three-way-match.ts
Document Processing Pipeline
Upload → SHA-256 Hash → Dedup Check
│ │
│ (duplicate) │ (new)
▼ ▼
Create Version Store in Supabase Storage
│ │
└────────────────────────┘
│
▼
Classify Document Type
(invoice/contract/PO/receipt)
│
▼
Route to Processor
(InvoiceProcessor / ContractProcessor)
│
▼
Extract Metadata + Store
Create document_versions recordAgent Runtime
12 pre-built agents + Custom Agent Builder. Each agent wraps the NL2SQL engine with domain context.
User Query → Agent Context Enrichment
│
▼
"[Agent: Invoice Intelligence]
[Context: Related queries: unpaid invoices, spend by vendor]
Show me overdue invoices"
│
▼
NL2SQL 5-Layer Engine (with agent's data sources)
│
▼
Apply Business Rules (IF/THEN)
│
▼
Return: answer + data + insights + alertsKey file: src/lib/services/agent-runtime.ts
Database Schema
85+ tables across 8 migration files. Key categories:
| Category | Tables | Key Tables |
|---|---|---|
| Core Platform | 15 | customers, users, workspaces, connections, api_keys, audit_log |
| Intelligence Modules | 24 | invoices, purchase_orders, goods_receipts, contracts, clauses |
| Document Management | 10 | documents, document_versions, document_sync_status, connector_configurations |
| Entity Graph | 8 | entities, entity_mappings, entity_relationships |
| NL2SQL Engine | 8 | query_history, query_patterns, catalog_assets, nl2sql_query_log |
| License Metering | 12 | licenses, license_modules, usage_counters, license_violations |
| AI Insights | 10 | insight_detection_jobs, detected_insights, insight_detection_runs |
| Data Governance | 15 | catalog_assets, lineage_edges, quality_monitors, data_profiles |
Connector Framework
Pluggable connectors for databases and document storage:
Database Connectors
- PostgreSQL — Full implementation
- Mock — Demo/testing
- Snowflake, BigQuery, Oracle, MySQL — Extensible framework
Document Connectors
- SharePoint — Microsoft Graph API
- S3 — AWS S3-compatible
- Email — Gmail / Exchange (OAuth2)
- Google Drive, Box, Dropbox — Extensible
LLM Strategy (Cost-Optimized Hybrid)
| Model | Use Case | Cost/1M tokens |
|---|---|---|
| Claude Sonnet 4 | Complex NL2SQL, contract analysis, semantic matching | $3.00 |
| Claude Haiku 3.5 | Simple matching, classification, exception detection | $0.25 |
| GPT-4o Mini | High-volume line item extraction, batch processing | $0.15 |
Target: 80–88% gross margins. Average cost: ~$0.45 per invoice processed.
Security Model
- Multi-tenancy: Row Level Security (RLS) on all tables. Every query is scoped by customer_id.
- BYOC: Platform runs in customer's infrastructure. No data leaves their network.
- Auth: Supabase Auth (JWT) + API key auth for programmatic access.
- Readonly queries: NL2SQL executes via
execute_readonly_queryRPC — blocks INSERT/UPDATE/DELETE/DDL. - Query limits: 10-second timeout, 500-row maximum per query.
- Secrets: All credentials are environment variables. Never stored in code or database.
- Storage: Document storage uses content-addressed hashing (SHA-256). RLS on storage buckets.