← Default view

Text-to-SQL · Security Architecture

Audit and Harden Text-to-SQL Pipelines

Defense-in-depth guide: schema grounding, query validation, read-only enforcement, row-level security, and observability. 180 citations across 7 sub-topics. The threat model is not crashes — it's silent plausible wrong answers.

97% of wrong queries
execute cleanly
180 citations 7 sub-topics 48 min read expedition depth Sonnet 4.6 · 2026-06-09 cost: $19.69

Defense Stack — Pipeline Order

▶ NL INPUT   "Show total revenue by customer for Q1 of last year"
L1

Schema Grounding expedition probabilistic

Stops Hallucinated tables, columns, and foreign-key joins
How RAG over schema embeddings · CHESS adaptive column pruning · function-call schema injection · LLM self-reflection correction pass
Finding Providing the LLM with the full schema adds noise and hallucinations; selective retrieval is crucial. [16] Schema linking is even more critical in the LLM era due to input-length limits. [–]
97.12% CHESS schema
recall rate
30.57% FPR [2]
L2

Query Validation survey deterministic

Stops DDL, disallowed statement types, unbound references, structural attacks
How Parse to AST (not regex) · SELECT-only allowlist · schema-binding gate · sqlglot ⭐ 9.3k (31 dialects) · evidence log with signed timestamps [5]
Warning Classical SQL injection WAFs degrade from 98% to 60% accuracy against LLM-generated queries. [1] Rule must be code-based, not another LLM call. [18]
60% WAF accuracy vs.
LLM-generated SQL
down from 98% [1]
L3

Read-Only Enforcement survey physical

Stops All writes, DDL, DROP, DELETE, INSERT — regardless of query content or injection
How pg_read_all_data role (PG 14+) · read replica as physical barrier · schema pruning (omit sensitive columns from LLM prompt)
Caution default_transaction_read_only can be overridden in-session — role-level GRANT is the only true enforcement. [7] Deploy first: cheapest, most reliable boundary.
0 writes possible on
read replica
structural impossibility [12]
L4

Row-Level Security + Column Masking recon deterministic

Stops Cross-tenant row reads, PII column exposure, WHERE-clause bypass attempts
How PostgreSQL ALTER TABLE … ENABLE ROW SECURITY · session variable policies · column-level GRANT SELECT (col) (revoke table-level first)
Gaps CVE-2024-10976 (RLS bypass below subqueries) · CVE-2025-8713 (statistics leaking RLS-hidden rows). [13] Non-owner roles and rigorous testing required.
2 active CVEs in
PostgreSQL RLS
as of 2025 [13]
L5

Observability & Audit Logging recon detective

Role Only mechanism to detect silent correctness failures in production; proves compliance
How Structured query logs with NL question + generated SQL + result hash · signed-timestamp evidence records [5] · anomaly detection over query patterns
Gap Researched at recon depth only (7 citations, 2 min read) — dedicated expedition run recommended, especially for structured logging schemas and RLS policy testing frameworks.
7 citations gathered
depth: recon
⚠ needs deeper run
⬛ DATABASE   PostgreSQL — read-only role active · RLS policies enforced · audit log streaming

Threat ✕ Layer Coverage

Attack / Failure Mode Schema
Grounding
Query
Validation
Read-Only
Role
RLS /
Column
Observ-
ability
Hallucinated table or column ✓ stops ⚠ partial ⚠ detects
DDL (DROP / ALTER / TRUNCATE) ✓ blocks ✓ blocks
SQL injection via NL input ⚠ partial ✓ limits ✓ limits ⚠ detects
Prompt injection (embedded SQL fragments) ✗ passes ⚠ partial ✓ limits ✓ limits ⚠ detects
ToxicSQL backdoor (training poison) [14] ✗ passes ⚠ partial ✓ limits ✓ limits ⚠ detects
Cross-tenant row read ✓ blocks ⚠ detects
Silent semantic correctness error ✗ passes ✗ passes ⚠ detects

Residual Risk — What Survives All Layers

⚠ Unverifiable: Semantic Correctness

A well-grounded, structurally valid query can silently aggregate on the wrong time window, join on the wrong key, or misinterpret an ambiguous column name. Crashes are not the threat model. [15]

The NL2SQL-Bugs benchmark found 75.16% average LLM detection accuracy across a 9-category semantic error taxonomy — and 6.91% of BIRD's own gold-standard benchmark contains previously undetected semantic errors. [3] Whether LLM self-grading of result plausibility is reliable enough for production quality gates remains the most consequential unresolved question in the text-to-SQL safety literature.

Key Tools & Libraries

sqlglot on GitHub
sqlglot ⭐ 9.3k

Python SQL parser supporting 31 dialects with full AST traversal, transformation, and type inference. Recommended engine for query validation.

pgvector on GitHub
pgvector ⭐ 21k

PostgreSQL vector extension for ANN search. Supports HNSW and IVFFlat, 16k-dim vectors via halfvec, hybrid SQL+vector with WHERE filtering.

Vanna.AI on GitHub
Vanna.AI ⭐ 23.5k

RAG text-to-SQL framework that embeds DDL, documentation, and example SQL queries, retrieving the 10 most relevant pieces per query for grounded prompts.

Semantic Search Augmentation

Augments the pipeline · Does not replace SQL

pgvector Semantic Row Search expedition augments

Role Grounds filter values and answers questions SQL cannot express compactly — extends, not replaces, structured queries
Pattern Hybrid SQL + ANN: WHERE category = $1 ORDER BY embedding <=> $2 LIMIT k · HNSW index with RRF fusion for keyword+semantic merging · iterative scans (pgvector 0.8+) prevent overfiltering
Tuning ef_construction ≥ 2×m, ef_search ≥ k — defaults are rarely appropriate at scale. HNSW needs to fit in RAM; above 5–10M vectors use DiskANN.
HNSW vs IVFFlat
at 0.99 accuracy
Supabase bench [6]

Sub-Research Topics — 7 Pages · 180 Citations

Naive Text-to-SQL Failure Modes expedition

~97% of wrong SQL executes cleanly. A field guide to the failure modes and the hardening stack that contains them.

41 citations8 min
Schema Grounding Strategies expedition

The layered stack of schema linking, representation, retrieval, and execution feedback that grounds text-to-SQL in a real database.

42 citations11 min
Read-Only Enforcement survey

Purpose-built read-only roles, column/row filtering, read replicas as physical write boundaries for LLM-generated SQL.

25 citations8 min
Query Validation & Safety Gates survey

Five-layer pipeline: allowlists, AST structural checks, semantic schema binding, policy injection, and DB-level enforcement with evidence logging.

20 citations8 min
pgvector Semantic Row Search expedition

Nearest-neighbor row retrieval, index parameter tuning without wrecking recall, and grounding a text-to-SQL pipeline with hybrid search.

40 citations8 min
Observability & Audit Logging recon

Observability helps teams debug production; audit logging proves compliance and accountability.

7 citations2 min
Multi-tenant & Row-Level Security recon

PostgreSQL RLS is the most cost-efficient multi-tenant isolation pattern, but requires non-owner roles and rigorous testing to prevent data leaks.

5 citations3 min