Text-to-SQL · Security Architecture

Audit and Harden Text-to-SQL Pipelines

Defense-in-depth guide: schema grounding, query validation, read-only enforcement, row-level security, and observability. 180 citations across 7 sub-topics. The threat model is not crashes — it's silent plausible wrong answers.

97% of wrong queries
execute cleanly

180 citations 7 sub-topics 48 min read expedition depth Sonnet 4.6 · 2026-06-09 cost: $19.69

Defense Stack — Pipeline Order

↓

Schema Grounding expedition probabilistic

Stops Hallucinated tables, columns, and foreign-key joins

How RAG over schema embeddings · CHESS adaptive column pruning · function-call schema injection · LLM self-reflection correction pass

Finding Providing the LLM with the full schema adds noise and hallucinations; selective retrieval is crucial. ^[16] Schema linking is even more critical in the LLM era due to input-length limits. ^[–]

97.12% CHESS schema
recall rate 30.57% FPR ^[2]

↓

Query Validation survey deterministic

Stops DDL, disallowed statement types, unbound references, structural attacks

How Parse to AST (not regex) · SELECT-only allowlist · schema-binding gate · sqlglot ⭐ 9.3k (31 dialects) · evidence log with signed timestamps ^[5]

Warning Classical SQL injection WAFs degrade from 98% to 60% accuracy against LLM-generated queries. ^[1] Rule must be code-based, not another LLM call. ^[18]

60% WAF accuracy vs.
LLM-generated SQL down from 98% ^[1]

↓

Read-Only Enforcement survey physical

Stops All writes, DDL, DROP, DELETE, INSERT — regardless of query content or injection

How pg_read_all_data role (PG 14+) · read replica as physical barrier · schema pruning (omit sensitive columns from LLM prompt)

Caution default_transaction_read_only can be overridden in-session — role-level GRANT is the only true enforcement. ^[7] Deploy first: cheapest, most reliable boundary.

0 writes possible on
read replica structural impossibility ^[12]

↓

Row-Level Security + Column Masking recon deterministic

Stops Cross-tenant row reads, PII column exposure, WHERE-clause bypass attempts

How PostgreSQL ALTER TABLE … ENABLE ROW SECURITY · session variable policies · column-level GRANT SELECT (col) (revoke table-level first)

Gaps CVE-2024-10976 (RLS bypass below subqueries) · CVE-2025-8713 (statistics leaking RLS-hidden rows). ^[13] Non-owner roles and rigorous testing required.

2 active CVEs in
PostgreSQL RLS as of 2025 ^[13]

↓

Observability & Audit Logging recon detective

Role Only mechanism to detect silent correctness failures in production; proves compliance

How Structured query logs with NL question + generated SQL + result hash · signed-timestamp evidence records ^[5] · anomaly detection over query patterns

Gap Researched at recon depth only (7 citations, 2 min read) — dedicated expedition run recommended, especially for structured logging schemas and RLS policy testing frameworks.

7 citations gathered
depth: recon ⚠ needs deeper run

↓

Threat ✕ Layer Coverage

Attack / Failure Mode	Schema Grounding	Query Validation	Read-Only Role	RLS / Column	Observ- ability
Hallucinated table or column	✓ stops	⚠ partial	—	—	⚠ detects
DDL (DROP / ALTER / TRUNCATE)	—	✓ blocks	✓ blocks	—	—
SQL injection via NL input	—	⚠ partial	✓ limits	✓ limits	⚠ detects
Prompt injection (embedded SQL fragments)	✗ passes	⚠ partial	✓ limits	✓ limits	⚠ detects
ToxicSQL backdoor (training poison) ^[14]	✗ passes	⚠ partial	✓ limits	✓ limits	⚠ detects
Cross-tenant row read	—	—	—	✓ blocks	⚠ detects
Silent semantic correctness error	✗ passes	✗ passes	—	—	⚠ detects

Residual Risk — What Survives All Layers

⚠ Unverifiable: Semantic Correctness

A well-grounded, structurally valid query can silently aggregate on the wrong time window, join on the wrong key, or misinterpret an ambiguous column name. Crashes are not the threat model. ^[15]

The NL2SQL-Bugs benchmark found 75.16% average LLM detection accuracy across a 9-category semantic error taxonomy — and 6.91% of BIRD's own gold-standard benchmark contains previously undetected semantic errors. ^[3] Whether LLM self-grading of result plausibility is reliable enough for production quality gates remains the most consequential unresolved question in the text-to-SQL safety literature.

Key Tools & Libraries

sqlglot ⭐ 9.3k

Python SQL parser supporting 31 dialects with full AST traversal, transformation, and type inference. Recommended engine for query validation.

pgvector ⭐ 21k

PostgreSQL vector extension for ANN search. Supports HNSW and IVFFlat, 16k-dim vectors via halfvec, hybrid SQL+vector with WHERE filtering.

Vanna.AI ⭐ 23.5k

RAG text-to-SQL framework that embeds DDL, documentation, and example SQL queries, retrieving the 10 most relevant pieces per query for grounded prompts.

Semantic Search Augmentation

Augments the pipeline · Does not replace SQL

pgvector Semantic Row Search expedition augments

Role Grounds filter values and answers questions SQL cannot express compactly — extends, not replaces, structured queries

Pattern Hybrid SQL + ANN: WHERE category = $1 ORDER BY embedding <=> $2 LIMIT k · HNSW index with RRF fusion for keyword+semantic merging · iterative scans (pgvector 0.8+) prevent overfiltering

Tuning ef_construction ≥ 2×m, ef_search ≥ k — defaults are rarely appropriate at scale. HNSW needs to fit in RAM; above 5–10M vectors use DiskANN.

3× HNSW vs IVFFlat
at 0.99 accuracy Supabase bench ^[6]

Sub-Research Topics — 7 Pages · 180 Citations

Naive Text-to-SQL Failure Modes expedition

~97% of wrong SQL executes cleanly. A field guide to the failure modes and the hardening stack that contains them.

41 citations8 min

Schema Grounding Strategies expedition

The layered stack of schema linking, representation, retrieval, and execution feedback that grounds text-to-SQL in a real database.

42 citations11 min

Read-Only Enforcement survey

Purpose-built read-only roles, column/row filtering, read replicas as physical write boundaries for LLM-generated SQL.

25 citations8 min

Query Validation & Safety Gates survey

Five-layer pipeline: allowlists, AST structural checks, semantic schema binding, policy injection, and DB-level enforcement with evidence logging.

20 citations8 min

pgvector Semantic Row Search expedition

Nearest-neighbor row retrieval, index parameter tuning without wrecking recall, and grounding a text-to-SQL pipeline with hybrid search.

40 citations8 min

Observability & Audit Logging recon

Observability helps teams debug production; audit logging proves compliance and accountability.

7 citations2 min

Multi-tenant & Row-Level Security recon

PostgreSQL RLS is the most cost-efficient multi-tenant isolation pattern, but requires non-owner roles and rigorous testing to prevent data leaks.

5 citations3 min