Workshop Exercises & Scaffolding

Exercise Catalog

AI-Assisted TDD · Expert Developers · Pick one warm-up + one main exercise per session. All exercises use pre-written failing tests; participants implement with their AI tool. The golden constraint that prevents AI cheating:

You may not modify the test file.
Session formats
90-minute session (recommended)
String Calculator · 15–20 min · warm-up
Tetris or Goose Game · 45–60 min · medium
Reserve 15 min for debrief. Leave autonomy level choice open — comparing Level A vs C in discussion is richer than mandating one.
Half-day session
String Calculator × 2 · 30 min · run without AI first, then with
Goose Game · 45–60 min · prompt log debrief
Gilded Rose · 40–60 min · legacy refactoring
Anti-pattern demo · 10–15 min · vibe-coding failure mode
The before/after delta on String Calculator is the sharpest data point in the room. It persuades more than any slide on AI productivity.
Exercises
Difficulty: Warm-up · Medium · Medium-High · High · Expert
Warm-up · Greenfield TDD
String Calculator
⏱ 15–20 min · no repo required
Incremental growth from add("") == 0 through comma delimiters, newlines, custom delimiters, and negatives. Primary use: calibrate the room. Run it twice — once without AI, once with — to make the productivity delta concrete. Expert developers internalize the point without further argument. [16]

See also garora/TDD-Katas ⭐ 735 for multi-language kata alternatives. [22]
Prerequisites
none
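For facilitators who want a reference implementation on hand, the increments above can be sketched as follows. This is a minimal sketch, not the workshop's official solution: the `//;\n1;2` custom-delimiter header and the choice of `ValueError` for negatives are assumptions taken from the common form of the kata.

```python
import re


def add(numbers: str) -> int:
    """Sum the integers in `numbers`, following the usual kata increments."""
    if numbers == "":
        return 0  # first increment: add("") == 0

    delimiters = [",", "\n"]  # commas and newlines both separate values
    body = numbers
    if numbers.startswith("//"):
        # Assumed header format: "//<delimiter>\n<numbers>", e.g. "//;\n1;2"
        header, body = numbers.split("\n", 1)
        delimiters.append(header[2:])

    pattern = "|".join(re.escape(d) for d in delimiters)
    values = [int(token) for token in re.split(pattern, body)]

    negatives = [v for v in values if v < 0]
    if negatives:
        # Assumed behaviour: reject the whole input and report every negative
        raise ValueError(f"negatives not allowed: {negatives}")
    return sum(values)
```

Each branch maps to one increment, which makes it easy to see which test a participant's AI tool was answering at any point in the session.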
Medium · Greenfield AI TDD · ⭐ 0
Tetris Skeleton
⏱ 45–60 min · Eficode skeleton repo · github.com
Full test suite pre-written — board init → movement → line clearing → scoring. Participants use Copilot to make failing tests pass. The "you may not modify the test file" constraint prevents AI from gaming assertions. CI runs on every push, removing the facilitator as a validation bottleneck. [2]
Prerequisites
AI tool installed
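To give a feel for the kind of pre-written test participants work against, here is a sketch of the line-clearing step. Everything in it is hypothetical: the board representation, the `clear_full_rows` name, and the test are illustrative and are not taken from the Eficode skeleton repo.

```python
from typing import List, Tuple

Board = List[List[int]]  # hypothetical representation: 0 = empty cell, 1 = filled cell


def clear_full_rows(board: Board) -> Tuple[Board, int]:
    """Remove completely filled rows and pad with empty rows at the top."""
    width = len(board[0])
    remaining = [row for row in board if not all(row)]
    cleared = len(board) - len(remaining)
    padding = [[0] * width for _ in range(cleared)]
    return padding + remaining, cleared


def test_full_bottom_row_is_cleared():
    # The kind of assertion participants must satisfy without editing it
    board = [
        [0, 0, 0],
        [0, 1, 0],
        [1, 1, 1],
    ]
    new_board, cleared = clear_full_rows(board)
    assert cleared == 1
    assert new_board == [
        [0, 0, 0],
        [0, 0, 0],
        [0, 1, 0],
    ]
```

Returning the number of cleared rows alongside the new board keeps scoring, the next stage in the suite, a pure function of this result.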
Medium · Greenfield AI TDD · ⭐ 3
Goose Game
⏱ 45–60 min · Kotlin · xpepper · github.com
Full test suite pre-written. Adds a prompts/ folder where participants log each AI prompt chronologically as they work. This log becomes the primary debrief artifact: what context the AI needed, where it hallucinated, where it outperformed. Expert developers find the prompt-quality discussion more valuable than the code itself. [4]
Prerequisites
AI tool installed · TDD basics
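Logging by hand tends to slip under time pressure, so a small helper can keep the prompts/ log chronological. The sketch below is an assumption about how such a log could be kept: the `log_prompt` name, the numbered `.md` file naming, and the entry layout are hypothetical and are not part of the Goose Game repo.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_prompt(prompt: str, note: str = "", folder: Path = Path("prompts")) -> Path:
    """Write one prompt to a new numbered file so the log stays in order."""
    folder.mkdir(parents=True, exist_ok=True)
    # Number entries by counting the existing ones: 001.md, 002.md, ...
    sequence = len(list(folder.glob("*.md"))) + 1
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = folder / f"{sequence:03d}.md"
    entry.write_text(
        f"# Prompt {sequence} ({timestamp})\n\n{prompt}\n\n## Observations\n\n{note}\n",
        encoding="utf-8",
    )
    return entry
```

The Observations section is where participants record what the AI got wrong or right, which is the raw material for the debrief.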
Medium-High · Legacy refactoring
Gilded Rose
⏱ 40–60 min · pure nested conditionals · no external deps
Write characterization tests before touching any logic. No HTTP, no DB — all complexity is cognitive. Participants experience the AI's tendency to "refactor" without tests and observe architectural degradation in real time. Entry point to Bourgau's 4-stage legacy code progression. [9] [8]
Prerequisites
TDD basics · Legacy code concepts
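A characterization test records what the code does today, not what it should do. The sketch below illustrates the idea under stated assumptions: `legacy_update` is a small stand-in with nested conditionals, not the real Gilded Rose code, and the input grid is arbitrary.

```python
from itertools import product


def legacy_update(name: str, sell_in: int, quality: int) -> tuple:
    """Stand-in for tangled legacy logic; NOT the real Gilded Rose code."""
    if name != "Aged Brie":
        if quality > 0:
            quality -= 1
            if sell_in <= 0:
                if quality > 0:
                    quality -= 1
    else:
        if quality < 50:
            quality += 1
    return sell_in - 1, quality


def snapshot(update) -> dict:
    """Record the output of `update` for a grid of inputs."""
    names = ["Aged Brie", "Elixir"]
    sell_ins = [-1, 0, 5]
    qualities = [0, 1, 49, 50]
    return {
        (n, s, q): update(n, s, q)
        for n, s, q in product(names, sell_ins, qualities)
    }


GOLDEN = snapshot(legacy_update)  # captured once, before any refactoring


def refactored_update(name: str, sell_in: int, quality: int) -> tuple:
    """A candidate refactoring that must reproduce the recorded behaviour."""
    if name == "Aged Brie":
        quality = min(50, quality + 1)
    else:
        step = 2 if sell_in <= 0 else 1
        quality = max(0, quality - step)
    return sell_in - 1, quality


# The safety net: any behavioural drift in the refactoring fails here.
assert snapshot(refactored_update) == GOLDEN
```

In a real session the snapshot would be written to a file and committed before the AI is allowed to touch the logic.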
High · Dependency breaking
Trip Service
⏱ 45–60 min · HTTP + DB seam design
Break HTTP and DB dependencies to get code under test before writing a single assertion. AI alone cannot reliably design seams — this is the exercise where human architectural judgment is the critical input, not prompt quality. The sharpest demonstration that TDD-with-AI is developer-led, not AI-delegated. [9]
Prerequisites
Dependency injection · Gilded Rose first
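One common way to introduce a seam is constructor injection: the service receives its HTTP-session and database collaborators instead of reaching for them directly. The sketch below shows that one technique only; the class shape, the `current_user` and `find_trips` parameters, and the friendship rule are assumptions for illustration, not the kata's actual code.

```python
from typing import Callable, List, Optional


class User:
    def __init__(self, name: str, friends: Optional[List["User"]] = None):
        self.name = name
        self.friends = friends or []


class TripService:
    """Collaborators arrive through the constructor, so tests can substitute them."""

    def __init__(
        self,
        current_user: Callable[[], Optional[User]],  # seam over the HTTP session
        find_trips: Callable[[User], List[str]],     # seam over the database
    ):
        self._current_user = current_user
        self._find_trips = find_trips

    def trips_for(self, user: User) -> List[str]:
        viewer = self._current_user()
        if viewer is None:
            raise PermissionError("user not logged in")
        if viewer in user.friends:
            return self._find_trips(user)
        return []


# In a test, plain functions stand in for the session and the database.
alice = User("alice")
bob = User("bob", friends=[alice])
service = TripService(current_user=lambda: alice, find_trips=lambda u: ["Lisbon"])
assert service.trips_for(bob) == ["Lisbon"]
```

Deciding where these two parameters go, and what they hide, is the human judgment the exercise is designed to surface.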
Expert · Full EXACT workflow
EXACT Mini-Project
⏱ 60–90 min · facilitator-required
Start with Example Mapping (story → rules → examples → open questions), then run EXACT with Level B as the default: pause after each Red-Green-Refactor cycle for human review. The exercise synthesises everything: prompt engineering, autonomy control, and test-as-spec discipline. Participants who prefer another autonomy level may choose it; comparing Level A and Level C choices in the debrief is the workshop's richest discussion. [12]
Prerequisites
All prior exercises · EXACT framework · Facilitator required
Scaffolding checklist
📦
Dev Container / GitHub Codespaces
One-click identical runtime + AI extension. Eliminates environment setup as a time sink before the first exercise starts. Centric offers a choice of .NET, Spring Boot, or bilingual containers. [5]
🔒
.github/copilot-instructions.md
Encodes TDD rules so the AI tool itself enforces the discipline: "write the failing test first; never implement without a red test; you may not modify the test file." [18]
🔴
Failing tests pre-written
Skeleton repo with tests present, implementation stubs empty. Enforces TDD discipline without requiring participants to write test specs under time pressure. [2]
🌿
Reference solution branch
Unblocks stuck participants without spoiling the exercise for everyone else. Centric names it solution/. Participants reach for it when needed — not on a timer. [5]
📝
prompts/ folder
Participants log each AI prompt chronologically. Makes debrief on prompt quality concrete rather than hypothetical: what context AI needed, where it hallucinated. [4]
CI on every push
Instant green/red via GitHub Actions. Removes the facilitator as a validation bottleneck — participants self-verify. Caltech's format adds a take-home checklist that outlasts the session. [2] [6]
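The "you may not modify the test file" rule can also be checked mechanically, for example as an extra step in the CI job. The sketch below is one possible approach, not something the cited repos are known to ship: the `fingerprint` and `changed_files` helpers are hypothetical, and the baseline is assumed to be recorded by the facilitator when the skeleton is published.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def fingerprint(test_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to test_dir) to its SHA-256 digest."""
    return {
        path.relative_to(test_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(test_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(expected: Dict[str, str], test_dir: Path) -> List[str]:
    """List files edited, added, or deleted since `expected` was recorded."""
    actual = fingerprint(test_dir)
    return sorted(
        name
        for name in expected.keys() | actual.keys()
        if expected.get(name) != actual.get(name)
    )
```

A CI step would fail the build whenever `changed_files` returns a non-empty list, so a gamed assertion shows up as red without facilitator involvement.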
Frameworks
TDAID
Test-Driven AI Development
1 Plan — AI generates a structured implementation roadmap as a comment block before any code is written
2 Red — write the failing test
3 Green — AI drives minimal implementation
4 Refactor — clean up with AI assistance
5 Validate — human reviews the git diff; confirms tests don't verify broken behavior [3] [1]
Workshop shape: write the Plan as a comment block, let AI drive Red → Green → Refactor, then human-review the diff before moving to the next increment.
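One increment of that shape might look like the sketch below. The `slugify` feature, the plan wording, and the step annotations are invented for illustration; TDAID does not prescribe a particular comment format.

```python
# PLAN (step 1: drafted by the AI, reviewed before any code exists)
#   1. slugify("") returns ""
#   2. lowercase the input and join words with "-"
#   3. drop characters that are not letters, digits, or spaces
# Increment in progress: 2


def test_words_are_joined_with_hyphens():
    # Step 2, Red: written first and seen failing
    assert slugify("Hello World") == "hello-world"


def slugify(text: str) -> str:
    # Step 3, Green: the least code that satisfies the current test
    return "-".join(text.lower().split())


# Step 4, Refactor: nothing to clean up yet.
# Step 5, Validate: the human reads the git diff and checks that the test
# still asserts the intended behaviour before starting increment 3.
```

Keeping the plan in the file means the diff reviewed in the Validate step shows the intent next to the change.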
EXACT
Example-guided AI-Collaborative Test-driven Coding
0 Example Mapping — story → rules → examples → open questions. Runs before the first test is written.
Level        AI pauses after…     Best for
A            End of feature       Speed · experienced users
B (default)  End of RGR cycle     Workshop default · balanced
C            End of each phase    Learning mode · max oversight
Leave autonomy level open during exercises — participant choice becomes the richest debrief material. The GitHub Copilot Workshop maps a similar three-path progression onto beginner-to-expert cohorts. [12] [15]
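The three levels amount to a rule about which checkpoints trigger a pause, which can be written down as a small lookup. This is a sketch under assumptions: the `Checkpoint` and `Level` names are hypothetical, and treating a coarser checkpoint as also triggering a pause (Level C pausing at the end of a cycle as well as each phase) is an interpretation, not something the EXACT description states.

```python
from enum import Enum, IntEnum


class Checkpoint(IntEnum):
    PHASE = 1    # one of Red, Green, or Refactor finished
    CYCLE = 2    # a full Red-Green-Refactor cycle finished
    FEATURE = 3  # every cycle for the feature finished


class Level(Enum):
    A = "A"
    B = "B"  # workshop default
    C = "C"


# Finest checkpoint at which each level hands control back to the human
PAUSES_AT = {
    Level.A: Checkpoint.FEATURE,
    Level.B: Checkpoint.CYCLE,
    Level.C: Checkpoint.PHASE,
}


def should_pause(level: Level, reached: Checkpoint) -> bool:
    """Pause at the level's own checkpoint and at any coarser one (assumed)."""
    return reached >= PAUSES_AT[level]
```

Writing the rule out this way can help participants state precisely which level they chose when comparing notes in the debrief.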
Expert audience notes
Skip TDD theory. They know red-green-refactor. Spend that time on what changes with AI in the loop: the Validate phase, autonomy levels, and prompt engineering.
Debrief the prompts, not the code. The prompts/ log makes this concrete. The most valuable expert discussion is about prompt quality — not implementation choices. [4]
Pair strategically. Architects own test strategy and system design; devs drive agent prompts and the RGR cycle. Knowledge transfer surfaces naturally without making it the explicit goal. [19]
Use real-world complexity. Greenfield toys disengage senior developers. Working on existing codebases is more relevant and more engaging for the cohort. [7]
Demo the failure mode. Implement a feature without tests via AI, add a second feature, observe architectural degradation live. AI never spontaneously suggests refactoring without test constraints. Show the diff. [20]
Plan before coding. A SPEC.md or mini-PRD before prompting shifts AI from free-wheeling generator to constrained implementation engine — pairs naturally with EXACT's Example Mapping step. [13]