Session arc & live-coding scaffold

TL;DR Open with a ~40-line working RAG demo in 5 minutes [1], then break it with real production demands; each failure becomes the next live-coding step. Script the repo as `step-00` through `step-05` git branches so anyone who falls behind runs `git checkout step-0N` and rejoins instantly. Pre-script every keystroke with Demo Time [2] to survive stage typos.

Session arc

The 90-minute structure follows the hook → crack → fix loop validated by Packt’s 6-hour production RAG workshop [3] and PyCon US 2026’s 3.5-hour tutorial [4]. For a 45-minute slot: drop segments 6–7 and compress 4+5 into one “retrieval upgrades” block (15 min). At the timings listed below that still totals 60 minutes, so the remaining segments need trimming as well.

#	Segment	Minutes	Goal
0	Setup	5	Audience clones starter repo; speaker defines RAG in one sentence
1	Minimal demo	10	~40 lines live-coded (Qdrant + FastEmbed + LLM); query returns an answer; audience confident
2	Reality check	5	Show failing test or prod log: “80% of RAG failures trace to ingestion, not the LLM” [5]
3	Chunking	15	Fixed splits → recursive/semantic; 60–70% retrieval accuracy gain [6]
4	Query transform	10	HyDE expansion; “20–40% precision boost for one extra LLM call” [6]
5	Hybrid + rerank	15	BM25 + vector fusion + cross-encoder; “single biggest quality improvement” [5]
6	Observability	10	Chunk-level trace dict; per-chunk source attribution (drop first if overtime)
7	Guardrail	10	Confidence gate at 0.65; “prevent confident hallucinations from irrelevant context” [7]
8	Q&A	10	Open floor
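
Segments 6 and 7 can be previewed together in a few lines. What follows is a minimal sketch, not the workshop's actual code: the `Hit` dataclass, the `build_trace` function, and the trace field names are hypothetical, the sample hits are made up, and it assumes retrieval scores are normalized to [0, 1] with higher meaning more relevant. Only the 0.65 threshold comes from the table.

```python
from dataclasses import dataclass

CONFIDENCE_GATE = 0.65  # gate value from segment 7


@dataclass
class Hit:
    """One retrieved chunk (hypothetical shape, for illustration only)."""
    chunk_id: str
    source: str   # e.g. file name and page, used for attribution
    score: float  # assumed normalized to [0, 1], higher is better
    text: str


def build_trace(query: str, hits: list[Hit], gate: float = CONFIDENCE_GATE) -> dict:
    """Return a chunk-level trace and decide whether to call the LLM at all."""
    top_score = max((h.score for h in hits), default=0.0)
    return {
        "query": query,
        "chunks": [
            {"chunk_id": h.chunk_id, "source": h.source, "score": h.score}
            for h in hits
        ],
        "top_score": top_score,
        # Below the gate, the pipeline should refuse instead of generating.
        "answerable": top_score >= gate,
    }


# Made-up sample data for the demo.
hits = [
    Hit("c1", "handbook.pdf#p3", 0.82, "Chunking splits documents into passages."),
    Hit("c2", "handbook.pdf#p9", 0.41, "Unrelated appendix text."),
]
trace = build_trace("What is chunking?", hits)
weak = build_trace("Who won the 1998 final?", [hits[1]])
```

Printing `trace` on stage gives the per-chunk attribution for segment 6, and `weak["answerable"]` shows the gate refusing a query whose best chunk scores below the threshold.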

Micro-arc per segment

Each segment repeats the same three-beat pattern: show the failure → add the fix → run one query and let the output speak. Presenting the solution before the audience has felt the failure raises extraneous cognitive load with no benefit [8]. Display a running architecture diagram at the start of each segment, highlighting the newly added component; after five additions it is the production system.
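
As an illustration of the three-beat pattern applied to segment 3, the sketch below first shows a fixed-size splitter cutting a word in half (the failure), then a simplified recursive splitter that prefers paragraph and sentence boundaries (the fix). It is a stdlib-only approximation of the idea behind `RecursiveCharacterTextSplitter`, not that class's implementation; the function names, the separator list, and the sample text are assumptions.

```python
def fixed_split(text: str, size: int) -> list[str]:
    """Naive baseline: cut every `size` characters, ignoring boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def recursive_split(text: str, size: int, seps=("\n\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first; recurse only into oversized pieces.

    Simplification: a separator that falls on a chunk boundary is dropped.
    """
    if len(text) <= size:
        return [text] if text.strip() else []
    if not seps:
        return fixed_split(text, size)  # nothing left to split on
    sep, rest = seps[0], seps[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= size:
            current = candidate  # keep packing pieces into the current chunk
        else:
            if current:
                chunks.append(current)
            if len(piece) > size:
                # Piece is still too large: retry with finer separators.
                chunks.extend(recursive_split(piece, size, rest))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks


doc = (
    "RAG retrieves passages. It then generates an answer.\n\n"
    "Chunking decides what a passage is."
)
broken = fixed_split(doc, 40)     # beat 1: show the failure
fixed = recursive_split(doc, 40)  # beat 2: add the fix
```

Here `broken[0]` ends in the middle of "generates", while every element of `fixed` is a whole sentence of at most 40 characters, which is the contrast the third beat lets the output show.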

Live-coding scaffold

Git branch structure

Workshops at AI Coding Summit 2026 with the strongest participant ratings [9] used the same recovery pattern: participants clone the starter repo, the speaker codes forward, and anyone who falls behind runs `git checkout step-0N` and rejoins.

Branch	Contains
`step-00-start`	`rag.py` skeleton with `# TODO` stubs; `requirements.txt`; `.env.example`
`step-01-minimal`	~40-line naive RAG: Qdrant + FastEmbed [1]
`step-02-chunking`	`RecursiveCharacterTextSplitter` → semantic chunker [10]
`step-03-hybrid`	BM25 + vector fusion (α param); Cohere Rerank 3.5 [11]
`step-04-query`	HyDE query expansion via one extra LLM call [6]
`step-05-prod`	Observability trace dict; confidence gate; incremental ingestion loop [5] [12]

Each branch is self-contained: it installs cleanly, all tests pass, and running `python rag.py "What is chunking?"` returns a valid answer without additional setup. That is the recovery guarantee.
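
The guarantee holds only if it is rechecked after every change to the repo. A rough sketch of such a check follows, assuming the branch names from the table, a `pip`-installable `requirements.txt`, and `pytest` as the test runner; `checks_for` and `verify` are hypothetical helper names. The `run` parameter is injectable so the logic can be dry-run without touching git.

```python
import subprocess

BRANCHES = [
    "step-00-start", "step-01-minimal", "step-02-chunking",
    "step-03-hybrid", "step-04-query", "step-05-prod",
]


def checks_for(branch: str) -> list[list[str]]:
    """Commands that must succeed for a branch to count as a safe checkpoint."""
    return [
        ["git", "checkout", branch],
        ["python", "-m", "pip", "install", "-r", "requirements.txt"],
        ["python", "-m", "pytest", "-q"],
        ["python", "rag.py", "What is chunking?"],
    ]


def verify(branches=BRANCHES, run=subprocess.run) -> dict[str, bool]:
    """Run the checks per branch; all() stops at a branch's first failure."""
    results = {}
    for branch in branches:
        results[branch] = all(
            run(cmd, check=False).returncode == 0 for cmd in checks_for(branch)
        )
    return results
```

Calling `verify()` from the repo root switches branches as it goes, so it is best run in a clean clone; any branch mapped to `False` is not yet a safe checkpoint.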

Starter repo layout

rag-workshop/
├── rag.py                  # pipeline file (grows each step)
├── requirements.txt        # pinned versions
├── .env.example            # QDRANT_URL, OPENAI_API_KEY, DEEPSEEK_API_KEY
├── data/                   # sample PDF corpus ≤5 MB
├── tests/
│   └── test_pipeline.py    # pytest; new assertions added each step
└── STEPS.md                # one-line diff summary per branch (put on a slide)

STEPS.md is the single most useful file for participants — one line per branch stating exactly what changed. Put it on a slide before each live-coding segment.

Embedding model choice

Use a local, no-API-key model for the opening demo to eliminate setup friction. `BAAI/bge-small-en-v1.5` is available through `qdrant-client[fastembed]` and runs in <100 ms per query on any laptop [1]. FastEmbed downloads the model weights on first use, so run the demo once before the session to cache them. The switch to a production model is a one-line change, which makes a strong “what’s next” beat at the end of the session.

Model	Max input tokens	Where	Workshop role
`BAAI/bge-small-en-v1.5`	512	Local/FastEmbed	Steps 0–2 (no API key needed)
`text-embedding-3-small`	8 191	OpenAI API	Steps 3–4 (upgrade beat)
Gemini Embedding 2	32 000	Google API	“Production” reference slide [13]

Tooling

Demo Time [2] — VS Code extension used at NDC, Microsoft Ignite, and React Summit. Script every keystroke in advance; trigger each step with one hotkey; zero typo risk under pressure. Walk through all five steps the day before the talk.

Recovery protocol: Keep `git log --oneline` visible on screen while coding. If a live step fails and cannot be fixed in 60 seconds, say “Let’s jump to the checkpoint” and run `git checkout step-0N`. Transparency is more professional than a silent panic fix.

Cognitive load management

The Gradual Release of Responsibility model (I do → we do → you do) [8] [14] maps cleanly onto the two delivery modes. For a 3–6 hour workshop (e.g. PyCon US 2026 format [4] or LangChain-based tutorials [15]), defer the “you do” phase until participants have seen all five layers:

Mode	Segments	Pattern
Talk (45–90 min)	1–7	I do: speaker codes, audience watches; repo published post-session
Workshop (3–6 h)	1–2	I do: speaker models; audience watches
Workshop (3–6 h)	3–4	We do: `git checkout` each branch; participants code alongside speaker
Workshop (3–6 h)	5–7	You do: participants code a variant; speaker reviews live

One new concept per segment. Maximum two new library imports per step. The running architecture diagram is the scaffolding — it shows where each new piece fits without requiring participants to hold the entire mental model unaided.
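
The two-import limit can be checked mechanically while preparing the branches. The sketch below uses the standard-library `ast` module to compare the modules imported by two versions of `rag.py`; the helper names and the two sample source strings are illustrative assumptions, not the contents of the real branches. The sources are only parsed, never executed, so the imported packages do not need to be installed.

```python
import ast


def top_level_modules(source: str) -> set[str]:
    """Collect the root module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def new_imports(previous: str, current: str) -> set[str]:
    """Modules imported in `current` that `previous` did not import."""
    return top_level_modules(current) - top_level_modules(previous)


# Illustrative stand-ins for rag.py on two consecutive branches.
step_01 = "from qdrant_client import QdrantClient\nimport os\n"
step_02 = step_01 + (
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n"
)
added = new_imports(step_01, step_02)
assert len(added) <= 2, f"too many new imports for one step: {added}"
```

In practice the two strings would come from `git show <branch>:rag.py`, and the assertion could live in `tests/test_pipeline.py` so an overloaded step fails before it reaches the stage.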