Decision. GitHub Actions is already running Claude Code [27] ⭐ 7.3k, so make Actions the orchestrator and stash correlation state in one small key-value store keyed by Slack
thread_ts(Redis, SQLite, or Cloudflare Durable Object [16]). Reach for a durable-execution engine — DBOS [7] ⭐ 1.2k for Postgres-only shops, Cloudflare Workflows [14] for Workers shops, Temporal [4] ⭐ 20k only when the flow truly needsstep.waitForEventsemantics, multi-day pauses, or fan-out beyond what Actions matrix gives you. For this pipeline (Slack → Claude → PR → Synology preview), the lightweight path wins — Claude Code’s GHA jobs themselves cap at 6 h [17], and the human approval gate is a Slack button +repository_dispatch, not a multi-day suspended workflow.
What state is actually moving
Before picking an orchestrator, name the bytes. Across this pipeline four kinds of state need to survive between steps:
| State | Lifetime | Producer | Consumer |
|---|---|---|---|
Correlation key (thread_ts ↔ run ID ↔ branch ↔ PR # ↔ preview URL) |
Until the PR closes | Slack bot at task-start | Every later step that posts back to Slack |
| Approval status (pending / approved / rejected) | Seconds–hours; bounded by reviewer SLA | Slack interaction handler | The runner that promotes the preview, or repository_dispatch |
| Run progress (current step, last log line, cost so far) | Job lifetime (≤6 h on hosted, ≤5 d self-hosted [17]) | Claude Code action steps | Slack progress messages |
| Artifacts (diff, screenshots, build images) | Until PR merge or 90 d retention | Claude Code action / build job | PR comments, preview deployment |
The first two need a small persistent store; the third is ephemeral and lives in the runner; the fourth uses GitHub Actions artifacts or GHCR. Every claim in the rest of this doc rolls up to one of these four.
Two orchestrator shapes
Shape A — GitHub Actions as the orchestrator
The default. The trigger graph already exists: repository_dispatch from Slack → claude-code-action job [27] → peter-evans/create-pull-request step → pull_request event → preview-deploy workflow → status-back-to-Slack step. State that has to outlive a single job goes into one external KV; everything else flows as job outputs and artifacts.
Within Actions’s native budget. Job outputs cap at 1 MB per job, 50 MB per run [19] and reusable workflows can chain 10 deep [18] — fine for IDs and URLs, hopeless for diffs. Anything bigger goes to artifacts. Secrets do not auto-propagate down a workflow_call chain — pass each one explicitly [18]. A workflow run can total 35 days including approval waits [17], but a single job is still capped at 6 h hosted / 5 d self-hosted.
The KV store. Pick one of three by team’s existing infra:
| Store | Where it runs | Sweet spot | Notes |
|---|---|---|---|
| Redis | Anywhere (Synology container, Upstash, Fly) | Lots of ephemeral keys, distributed locks per Slack thread [21] | Leases prevent two workers replying to the same thread; TTL deletes stale state |
| SQLite + Litestream | One small VM or container | Pure simplicity, queryable history | Single-writer; not for multi-region |
| Cloudflare Durable Object | Cloudflare edge | Slack bot already on Workers | One DO per thread_ts, strongly-consistent SQLite per object [16] |
Slack Bolt itself ships no durable state [22] ⭐ 2.9k — token storage, modal state, and per-thread leases are the developer’s problem. That’s the gap the KV fills.
Shape B — A durable-execution engine
Worth it when the flow needs to pause for hours, retry across process restarts, or fan out beyond the Actions matrix. The shape: the Slack bot starts a workflow; each step (call Claude Code, open PR, build preview, await approval, promote) is a checkpoint; if the worker crashes, the engine replays from the last successful step. Every option below saves the result of every completed step so retries skip them [1].
| Engine | Storage | Hosting model | Step semantics | Where it shines | ⭐ Stars |
|---|---|---|---|---|---|
| Temporal [4] | Cassandra/MySQL/Postgres cluster | Managed (Temporal Cloud) or self-host cluster | Signal (async write), Query (read), Update (sync write w/ validation) [2] | Mission-critical, audited, multi-language [3] | ⭐ 20k |
| Restate [5] | Embedded RocksDB | Single binary sidecar in front of HTTP services | Journaled handlers; no separate cluster [3] | Greenfield microservices, low ops | ⭐ 3.8k |
| DBOS [7] | Postgres only | Library, no server | Each step + checkpoint is one Postgres transaction [6] | “Postgres is enough” shops; air-gapped on-prem | ⭐ 1.2k (TS) / ⭐ 1.3k (Py) [8] |
| Cloudflare Workflows [13] | Workers platform | Managed only | step.do, step.sleep, step.sleepUntil, step.waitForEvent [15] |
Already on Workers; want managed | n/a (closed) |
| Inngest [9] | Cloud (proprietary) or self-host single-node SQLite [24] | Calls into your serverless endpoints | Step-level retries; event-driven first | Event-driven backends; serverless | ⭐ 5.3k |
| Trigger.dev v3 [10] | Postgres + Redis | Cloud or self-host (Apache 2.0) | Checkpoint/resume on dedicated compute, no serverless timeout [12] | TS-first teams that want best DX | ⭐ 15k |
| Hatchet [11] | Postgres | Self-host with one DB | DAGs, rate limits, concurrency keys, >100 jobs/s | High-throughput task queues | ⭐ 7.0k |
| Windmill [25] | Postgres | Self-host single binary or Docker | One ACID statement per state transition | Internal-tools / scripts platform | ⭐ 16k |
| Dagu [23] | None required (filesystem) | Single binary, no DB | DAGs over scripts, containers, SQL, HTTP | Cron + approvals on a homelab | ⭐ — |
Decision shortcut. If the team already has Postgres → DBOS or Hatchet. Already on Workers → Cloudflare Workflows. Already polyglot at scale → Temporal. Already TypeScript-everywhere → Trigger.dev. None of those describe a Slack→Claude→PR pipeline that runs in single-digit minutes — which is why Shape A wins for this specific pipeline.
Where the patterns differ on human-in-the-loop
The “wait for the reviewer to click Approve” gate is the one place a durable-execution engine is qualitatively better than Actions. Compare:
| Approach | How the wait works | Time bound |
|---|---|---|
| Actions + KV | First job ends; Slack button → webhook → second repository_dispatch triggers the next workflow |
Bounded by KV retention; Actions run cap is 35 d [17] |
| Temporal Update / Signal | Workflow blocks on a signal handler; Slack button → signalWorkflow(); replay-safe, exactly-once [2] |
Months — but cap at 50K events / 50 MB history; use Continue-As-New [20] |
Cloudflare step.waitForEvent |
Workflow hibernates until the event is sent to the run [15] | Cloud-managed; no server to babysit |
| Postgres FSM | Reviewer click writes a row; a periodic worker / LISTEN/NOTIFY advances state [26] |
As long as the DB is up |
For a homelab pipeline where the reviewer is one of two humans and replies in minutes, the Actions+KV path is the simplest correct thing. For a regulated org where the reviewer might be on PTO, durable-execution earns its complexity tax.
State store cookbook for the lightweight path
KEY VALUE (json)
slack:thread:{thread_ts} { team_id, user_id, repo, branch, run_id, pr_number, preview_url, status }
slack:lock:{thread_ts} "<worker-id>" (TTL 15s, lease for processing) [[21]](https://redis.io/tutorials/chat-sdk-slackbot-distributed-locking/)
approval:{pr_number} { state: "pending"|"approved"|"rejected", approver, ts }
run:{run_id} → {thread_ts} reverse index for callbacks from Actions
Two endpoints close the loop: a Slack button handler that flips approval:* and emits repository_dispatch, and a workflow_run listener that reads run:* and posts to Slack. Every other piece of state stays inside one Actions run.
Pitfalls
- Don’t use Actions outputs for diffs. 1 MB cap [19] — push diffs as artifacts and link from the PR comment.
- Slack
thread_tsis your only stable correlation key before a PR exists. Generate it client-side and pass it through every dispatch payload. - Default
GITHUB_TOKENdoesn’t trigger downstream workflows. Use a GitHub App installation token to push the branch, otherwise the preview deploy never runs. - Don’t pick Temporal for a flow whose longest wait is a Slack approval. The 50K-event / 50 MB-history cap [20] is generous, but the operational tax is real and the human gate fits a row in a table [26].
- Don’t double-process Slack events. Slack retries within 3 s if no 200; a Redis lease keyed on
thread_tsis one line and prevents two runners from racing on the same task [21]. - DEEs and event logs are complements, not substitutes. If Kafka is already in the stack, the engine handles the workflow lifecycle on top of the event log [28].