Atlas survey

Query Validation and Safety Gates for Text-to-SQL Pipelines

A five-layer defense-in-depth pipeline for validating LLM-generated SQL — from statement allowlists and AST structural checks through semantic schema binding, policy injection, and database-level enforcement — with evidence logging and emerging threat coverage.

20 sources ~8 min read #209 sql · security · llm · text-to-sql · validation · safety · rag

TL;DR — Regex blocklists are bypassed by comment wrapping and Unicode tricks; a single validation layer never covers all failure classes. Chain five deterministic layers: statement-type allowlist → AST structural check → semantic schema binding → policy injection → read-only DB connection. Add a SQL Compile Gate (PARSEONLY / EXPLAIN) before execution and cryptographically-signed audit logs for SOC 2 / HIPAA. All validation must run after LLM generation, in code — never delegated to a second LLM call [1] [3].


Why Single-Layer Validation Fails

Regex keyword blocklists are the most common first approach — and the most commonly bypassed. Classical SQL injection detection techniques drop from ~98% accuracy on standard benchmarks to 60% against LLM-generated queries [1], because comment-wrapped commands (/* DROP */ TABLE users), Unicode homoglyphs, and semicolon-delimited multi-statement sequences all evade pattern matching.

A 50,000-query production evaluation found a more fundamental problem: most broken LLM queries execute successfully and return data [2]. The failure distribution:

Failure class Description Caught by
Schema hallucination Column/table exists in training data, not in this DB Semantic binding
Wrong join path Fabricated FK relationships between unrelated tables Semantic binding
Missing required filter Omits status='active', tenant_id, date range Policy injection
Wrong table selection payments instead of revenue_recognition Semantic binding
Unsafe statement type DELETE, DROP, ALTER generated from ambiguous intent Statement allowlist
Syntax errors Unbalanced parens, invalid keywords Parser / Compile Gate

Source: [2]

Syntax errors — the only class reliably caught by a basic parser — are the least common. Intent errors masquerading as working queries are the dominant risk [2]. Each layer below targets a distinct failure class.


The Five-Layer Validation Pipeline

Layer 0 — Intent Classification (Pre-SQL)

Before generating SQL at all, classify whether the user’s message warrants a database query. Off-topic, ambiguous, or malformed inputs should be refused early [3].

The LatentRefusal mechanism [4] pushes this further: a lightweight probe on the LLM’s intermediate hidden activations intercepts unanswerable or schema-mismatched queries before any SQL tokens are emitted. Its Tri-Residual Gated Encoder suppresses schema noise and amplifies question-schema mismatch signals, adding ~2ms overhead while achieving 88.5% refusal F1 across four benchmarks [4]. The complementary “Query Carefully” [18] approach detects ambiguous queries at parse time before execution. Both gates are single-pass, pre-generation, and execution-free.

Layer 1 — Statement-Type Allowlist

Parse the generated SQL and enforce a default-deny policy on statement types [5]:

  • Block: DROP, TRUNCATE, ALTER, GRANT, REVOKE, CREATE, INSERT, UPDATE, DELETE
  • Block: multi-statement queries (semicolon-delimited sequences)
  • Block: stored procedure calls, EXEC, dynamic SQL constructors
  • Allow: SELECT only (for read-only agents)

Regex is insufficient here because of the bypass patterns above. Use a real parser: sqlglot ⭐ 9.3k (Jun 2026) traverses the AST to identify top-level statement type programmatically [6].

Layer 2 — AST Structural Validation

Traverse the parse tree to validate structure against what this user is authorized to see [7]:

  • Table allowlist: every FROM / JOIN reference must match the user’s role-scoped table set
  • Column allowlist: every column in SELECT, WHERE, JOIN ON, subqueries, and CTEs must be in the user’s per-role column list — catches hallucinated fields immediately
  • Sensitive column blocklist: columns tagged PII / financial are rejected at projection level, even when buried in CTEs or derived expressions [8]
# Minimal example: block forbidden types, check allowed tables via sqlglot AST
import sqlglot

ALLOWED_TABLES = {"orders", "products", "customers"}
FORBIDDEN_TYPES = {
    sqlglot.exp.Drop, sqlglot.exp.Delete,
    sqlglot.exp.Update, sqlglot.exp.Insert,
}

def validate(sql: str) -> tuple[bool, str]:
    parsed = sqlglot.parse_one(sql)
    if type(parsed) in FORBIDDEN_TYPES:
        return False, f"Forbidden statement type: {type(parsed).__name__}"
    tables = {t.name.lower() for t in parsed.find_all(sqlglot.exp.Table)}
    if not tables.issubset(ALLOWED_TABLES):
        blocked = tables - ALLOWED_TABLES
        return False, f"Unauthorized tables: {blocked}"
    return True, "ok"

sqlglot ⭐ 9.3k handles 31 SQL dialects and supports AST mutation via transform() [6]. For simpler tokenization, sqlparse ⭐ 3.9k is lighter but is explicitly a non-validating parser that will miss edge-case constructs [9]. For PostgreSQL-native pipelines, pg_query wraps libpg_query for exact PG parse trees [19].

Layer 3 — Semantic / Schema-Binding Validation

The AST check validates shape; semantic validation validates meaning against the real schema [10]. A catalog-aware validator resolves every reference against live database metadata:

Semantic check Error code What it catches
Column resolution UNKNOWN_COLUMN Hallucinated fields not in current schema
Ambiguity detection AMBIGUOUS_COLUMN Unqualified names present in multiple joined tables
Type compatibility TYPE_MISMATCH Function args with wrong types, dialect mismatches
Required filter enforcement MISSING_REQUIRED_FILTER Missing tenant_id, status, date range
Join predicate check CARTESIAN_JOIN Many-to-many cross joins without predicates
Sensitive field exposure RESTRICTED_COLUMN PII in projections, filters, or expressions

Source: [10]

The validator returns structured codes — allow, deny, warn, or repair — with repair hints the LLM can safely act on [10].

SQL Compile Gate — engine-native dry-run before execution [7] [11]:

  • SQL Server: SET PARSEONLY ON (parse phase only) or SET SHOWPLAN_XML ON (compile + plan, no execute) — validates column existence and user permissions without running the query
  • PostgreSQL / BigQuery / Snowflake: EXPLAIN — rejects insane cost estimates and full scans on large tables
  • Engine-native validation catches edge cases (dialect-specific syntax, permission grants) that custom parsers miss

Layer 4 — Policy Injection

Do not trust the LLM to consistently include required business-logic filters. Mechanically rewrite the AST to inject non-negotiable constraints before execution [7]:

  • Tenant isolation: append WHERE tenant_id = :ctx.tenant to every top-level query
  • Time-window filters: inject AND created_at >= :ctx.start_date
  • Status filters: add AND status = 'active' where schema requires it

Policies are evaluated against the parsed AST, never against the raw prompt text — a regex-matched tenant_id in the user’s message is not the same as an enforced predicate in the generated SQL [5]. sqlglot’s transform() API enables programmatic AST mutation that survives alias and CTE resolution [6].

Layer 5 — Database-Level Enforcement

Application layers 1–4 are the primary controls. Layer 5 is the backstop if any prior layer has a bug [12]:

  • Read-only DB user: GRANT SELECT only — cannot execute INSERT, UPDATE, DELETE regardless of LLM output
  • Row-Level Security: database-enforced tenant isolation regardless of query structure (PostgreSQL RLS, SQL Server RLS) [17]
  • Read replica: route LLM queries to an analytics replica; production writes are physically isolated [3]
  • Result row cap: LIMIT 1000 prevents resource exhaustion from missing LIMIT clauses [3]

RLS limitations — understand the ceiling [5] [13]: RLS cannot enforce column-level redaction on SELECT *, cannot block DDL statements when the user has valid grants, and has exploitable CVEs — CVE-2024-10976 (RLS bypass below subqueries in PostgreSQL) and CVE-2025-8713 (optimizer statistics leaking RLS-hidden rows) [13]. RLS is defense-in-depth, not the primary control.


Evidence Logging

Every validation decision — accepted or rejected — must produce a cryptographically-signed, timestamped record [5]:

  • User identity and role context
  • Natural language input
  • Generated SQL (as-generated, before policy injection)
  • Policy decisions, violation codes, and repair attempts
  • Final executed SQL and result row count
  • Which layers accepted / rejected the query

Required for SOC 2, HIPAA, and forensic analysis. Log the LLM prompt alongside the SQL so backdoor attacks can be traced to their trigger inputs [16].


Emerging Threats

Backdoor Attacks (ToxicSQL)

The ToxicSQL framework [14] (SIGMOD 2026 [15]) shows that poisoning 0.44% of training data achieves a 79.41% attack success rate for embedding backdoors that generate predefined malicious SQL on trigger inputs while maintaining normal behavior on clean queries [14]. Stealthy semantic and character-level triggers make poisoned samples hard to detect in training pipelines.

⚠ Implication: fine-tuned models are not inherently safer than base models. Application-layer validation remains mandatory regardless of model provenance or fine-tuning source.

Repair Loop Escapes

When generated SQL fails validation, returning the error to the LLM for repair can produce queries that remove safety filters to avoid the error. Mitigations [8]:

  • Re-validate every repaired query with identical strictness — no exceptions for repair attempts
  • Treat policy violations and syntax errors as separate error classes: return policy violations to the application logic, not to the LLM
  • Cap repair iterations (2–3 max); escalate to human review if retries fail

Schema Curation as Proactive Reduction

Validation is easier if the LLM cannot reference unauthorized objects in the first place [7]:

  • Strip sensitive columns from DDL in the system prompt — email, hashed_password, ssn never appear in schema context [12]
  • Use a vector index of table definitions tagged by access role; the model only receives schema for tables the user’s role can see [7]
  • Never pass production row data to the LLM — only format specifications and low-cardinality vocabularies

Schema curation reduces obvious hallucinations but does not replace post-generation validation: LLMs still produce probabilistic output, and that output still requires all five layers.


Tools Reference

Library AST support Dialects Stars Best for
sqlglot Full 31 dialects ⭐ 9.3k Cross-dialect traversal, mutation, type inference
sqlparse Partial DB-agnostic ⭐ 3.9k Lightweight tokenization; non-validating
pg_query Full (PG) PostgreSQL only Exact PG parse tree via libpg_query
sqlfluff No Linting only SQL style / dialect linting, not safety gates

Implementation: Crawl → Walk → Run

Start constrained and expand based on observed patterns — the use cases you don’t anticipate are the ones that will expose you [3] [20]:

  1. Crawl — expose 2–3 specific business queries, hard-code allowed tables/columns, manually review generated SQL in staging for 2 weeks
  2. Walk — expand to a role-scoped schema subset, add automated AST validation and semantic binding, enable audit logging
  3. Run — roll out across roles with monitoring; add the SQL Compile Gate and anomaly alerts on the audit log stream

Do not skip the Crawl phase. A narrow, manually-reviewed allow-set is far safer than a broad schema with “should-be-fine” LLM constraints.

Citations · 20 sources

Click the Citations tab to load…