Orchestration and state: tying Slack, Claude Code, GitHub, and Synology together

Decision. GitHub Actions is already running Claude Code [27] ⭐ 7.3k, so make Actions the orchestrator and keep correlation state in one small key-value store keyed by Slack thread_ts (Redis, SQLite, or a Cloudflare Durable Object [16]). Reach for a durable-execution engine (DBOS [7] ⭐ 1.2k for Postgres-only shops, Cloudflare Workflows [14] for Workers shops, Temporal [4] ⭐ 20k for polyglot teams at scale) only when the flow truly needs step.waitForEvent-style semantics, multi-day pauses, or fan-out beyond what an Actions matrix gives you. For this pipeline (Slack → Claude → PR → Synology preview), the lightweight path wins: Claude Code’s GHA jobs themselves cap at 6 h [17], and the human approval gate is a Slack button plus repository_dispatch, not a multi-day suspended workflow.

What state is actually moving

Before picking an orchestrator, name the bytes. Across this pipeline four kinds of state need to survive between steps:

State	Lifetime	Producer	Consumer
Correlation key (`thread_ts` ↔ run ID ↔ branch ↔ PR # ↔ preview URL)	Until the PR closes	Slack bot at task-start	Every later step that posts back to Slack
Approval status (pending / approved / rejected)	Seconds–hours; bounded by reviewer SLA	Slack interaction handler	The runner that promotes the preview, or `repository_dispatch`
Run progress (current step, last log line, cost so far)	Job lifetime (≤6 h on hosted, ≤5 d self-hosted [17])	Claude Code action steps	Slack progress messages
Artifacts (diff, screenshots, build images)	Until PR merge or 90 d retention	Claude Code action / build job	PR comments, preview deployment

The first two need a small persistent store; the third is ephemeral and lives in the runner; the fourth uses GitHub Actions artifacts or GHCR. Every claim in the rest of this doc rolls up to one of these four.

Two orchestrator shapes

Shape A — GitHub Actions as the orchestrator

The default. The trigger graph already exists: repository_dispatch from Slack → claude-code-action job [27] → peter-evans/create-pull-request step → pull_request event → preview-deploy workflow → status-back-to-Slack step. State that has to outlive a single job goes into one external KV; everything else flows as job outputs and artifacts.

Within Actions’ native budget. Job outputs cap at 1 MB per job and 50 MB per run [19], and reusable workflows can chain 10 deep [18]: fine for IDs and URLs, hopeless for diffs. Anything bigger goes to artifacts. Secrets do not auto-propagate down a workflow_call chain, so pass each one explicitly [18]. A workflow run can total 35 days including approval waits [17], but a single job is still capped at 6 h hosted / 5 d self-hosted.
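A minimal sketch of keeping job outputs inside that budget, assuming a Python step in the workflow. The helper names (outputs_size, write_job_outputs) are hypothetical, and the size estimate is an approximation of how the outputs serialize, not GitHub's own accounting. The file at $GITHUB_OUTPUT and its name=value line format are the documented mechanism; rejecting multi-line values is a simplification chosen here so that large or structured data is pushed to artifacts.

```python
MAX_JOB_OUTPUT_BYTES = 1024 * 1024  # 1 MB per-job cap cited above [19]


def outputs_size(outputs: dict[str, str]) -> int:
    """Approximate serialized size: one 'name=value' line per output."""
    return sum(len(f"{k}={v}\n".encode("utf-8")) for k, v in outputs.items())


def write_job_outputs(outputs: dict[str, str], path: str) -> None:
    """Append single-line outputs to the file GitHub exposes as $GITHUB_OUTPUT.

    Raises ValueError for anything that should travel as an artifact instead.
    """
    for name, value in outputs.items():
        if "\n" in value:
            # Simplification for this sketch: no multi-line outputs.
            raise ValueError(f"{name}: multi-line value, upload an artifact")
    if outputs_size(outputs) > MAX_JOB_OUTPUT_BYTES:
        raise ValueError("outputs exceed the 1 MB job cap, upload an artifact")
    with open(path, "a", encoding="utf-8") as fh:
        for name, value in outputs.items():
            fh.write(f"{name}={value}\n")
```

Inside a job this would be called as write_job_outputs({"pr_number": "42"}, os.environ["GITHUB_OUTPUT"]); a diff would fail the check and go through the artifact upload step instead.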

The KV store. Pick one of three based on the team’s existing infrastructure:

Store	Where it runs	Sweet spot	Notes
Redis	Anywhere (Synology container, Upstash, Fly)	Lots of ephemeral keys, distributed locks per Slack thread [21]	Leases prevent two workers replying to the same thread; TTL deletes stale state
SQLite + Litestream	One small VM or container	Pure simplicity, queryable history	Single-writer; not for multi-region
Cloudflare Durable Object	Cloudflare edge	Slack bot already on Workers	One DO per `thread_ts`, strongly-consistent SQLite per object [16]

Slack Bolt itself ships no durable state [22] ⭐ 2.9k — token storage, modal state, and per-thread leases are the developer’s problem. That’s the gap the KV fills.
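As an illustration of how small that KV can be, here is a sketch of the SQLite option using only the standard library. The class name ThreadStore, the table layout, and the TTL handling are assumptions made for this example; Litestream replication would sit outside this code. Expiry is checked on read, so stale rows are ignored rather than actively deleted.

```python
import json
import sqlite3
import time
from typing import Any, Optional


class ThreadStore:
    """Tiny JSON key-value store with optional per-key TTL (hypothetical helper)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS kv ("
                "key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL)"
            )

    def put(self, key: str, value: Any, ttl_s: Optional[float] = None,
            now: Optional[float] = None) -> None:
        """Insert or overwrite a key; ttl_s=None means the key never expires."""
        now = time.time() if now is None else now
        expires_at = None if ttl_s is None else now + ttl_s
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value, expires_at) VALUES (?, ?, ?)",
                (key, json.dumps(value), expires_at),
            )

    def get(self, key: str, now: Optional[float] = None) -> Any:
        """Return the stored value, or None if the key is missing or expired."""
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT value, expires_at FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        value, expires_at = row
        if expires_at is not None and expires_at <= now:
            return None
        return json.loads(value)
```

The now parameter exists only to make expiry testable without sleeping; production callers would leave it unset.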

Shape B — A durable-execution engine

Worth it when the flow needs to pause for hours, retry across process restarts, or fan out beyond the Actions matrix. The shape: the Slack bot starts a workflow; each step (call Claude Code, open PR, build preview, await approval, promote) is a checkpoint; if the worker crashes, the engine replays from the last successful step. Every option below saves the result of every completed step so retries skip them [1].
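The checkpoint-and-replay idea can be sketched in a few lines. This is a toy illustration of the pattern, not the API of any engine in the table: the DurableRun class, its step method, and the SQLite table are all invented for the example, and real engines add timeouts, retries with backoff, and determinism checks on top.

```python
import json
import sqlite3
from typing import Any, Callable


class DurableRun:
    """Records each completed step's result so a re-run skips finished steps."""

    def __init__(self, conn: sqlite3.Connection, run_id: str) -> None:
        self.conn = conn
        self.run_id = run_id
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS steps ("
                "run_id TEXT, name TEXT, result TEXT, PRIMARY KEY (run_id, name))"
            )

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        """Run fn once per (run_id, name); later calls return the saved result."""
        row = self.conn.execute(
            "SELECT result FROM steps WHERE run_id = ? AND name = ?",
            (self.run_id, name),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # replay: step already completed
        result = fn()  # if this raises, nothing is recorded and a retry re-runs it
        with self.conn:
            self.conn.execute(
                "INSERT INTO steps (run_id, name, result) VALUES (?, ?, ?)",
                (self.run_id, name, json.dumps(result)),
            )
        return result
```

A workflow is then ordinary code that calls run.step("open_pr", ...), run.step("build_preview", ...) in order; restarting it with the same run_id resumes after the last recorded step. Step results must be JSON-serializable in this sketch.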

Engine	Storage	Hosting model	Step semantics	Where it shines	⭐ Stars
Temporal [4]	Cassandra/MySQL/Postgres cluster	Managed (Temporal Cloud) or self-host cluster	Signal (async write), Query (read), Update (sync write w/ validation) [2]	Mission-critical, audited, multi-language [3]	⭐ 20k
Restate [5]	Embedded RocksDB	Single binary sidecar in front of HTTP services	Journaled handlers; no separate cluster [3]	Greenfield microservices, low ops	⭐ 3.8k
DBOS [7]	Postgres only	Library, no server	Each step + checkpoint is one Postgres transaction [6]	“Postgres is enough” shops; air-gapped on-prem	⭐ 1.2k (TS) / ⭐ 1.3k (Py) [8]
Cloudflare Workflows [13]	Workers platform	Managed only	`step.do`, `step.sleep`, `step.sleepUntil`, `step.waitForEvent` [15]	Already on Workers; want managed	n/a (closed)
Inngest [9]	Cloud (proprietary) or self-host single-node SQLite [24]	Calls into your serverless endpoints	Step-level retries; event-driven first	Event-driven backends; serverless	⭐ 5.3k
Trigger.dev v3 [10]	Postgres + Redis	Cloud or self-host (Apache 2.0)	Checkpoint/resume on dedicated compute, no serverless timeout [12]	TS-first teams that want best DX	⭐ 15k
Hatchet [11]	Postgres	Self-host with one DB	DAGs, rate limits, concurrency keys, >100 jobs/s	High-throughput task queues	⭐ 7.0k
Windmill [25]	Postgres	Self-host single binary or Docker	One ACID statement per state transition	Internal-tools / scripts platform	⭐ 16k
Dagu [23]	None required (filesystem)	Single binary, no DB	DAGs over scripts, containers, SQL, HTTP	Cron + approvals on a homelab	⭐ —

Decision shortcut. If the team already has Postgres → DBOS or Hatchet. Already on Workers → Cloudflare Workflows. Already polyglot at scale → Temporal. Already TypeScript-everywhere → Trigger.dev. None of those describe a Slack→Claude→PR pipeline that runs in single-digit minutes — which is why Shape A wins for this specific pipeline.

Where the patterns differ on human-in-the-loop

The “wait for the reviewer to click Approve” gate is the one place a durable-execution engine is qualitatively better than Actions. Compare:

Approach	How the wait works	Time bound
Actions + KV	First job ends; Slack button → webhook → second `repository_dispatch` triggers the next workflow	Bounded by KV retention; Actions run cap is 35 d [17]
Temporal Update / Signal	Workflow blocks on a signal handler; Slack button → `signalWorkflow()`; replay-safe, exactly-once [2]	Months — but cap at 50K events / 50 MB history; use Continue-As-New [20]
Cloudflare `step.waitForEvent`	Workflow hibernates until the event is sent to the run [15]	Cloud-managed; no server to babysit
Postgres FSM	Reviewer click writes a row; a periodic worker / `LISTEN/NOTIFY` advances state [26]	As long as the DB is up

For a homelab pipeline where the reviewer is one of two humans and replies in minutes, the Actions+KV path is the simplest correct thing. For a regulated org where the reviewer might be on PTO, durable-execution earns its complexity tax.
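The row-in-a-table gate can be sketched as a compare-and-set update: a click only counts if the row is still pending, so duplicate or conflicting clicks are no-ops. SQLite stands in for Postgres here to keep the example self-contained; the table name, columns, and function names are assumptions for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS approvals ("
            "pr_number INTEGER PRIMARY KEY, "
            "state TEXT NOT NULL DEFAULT 'pending', "
            "approver TEXT)"
        )


def request_approval(conn: sqlite3.Connection, pr_number: int) -> None:
    """Create the pending row; calling it again for the same PR changes nothing."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO approvals (pr_number) VALUES (?)", (pr_number,)
        )


def decide(conn: sqlite3.Connection, pr_number: int, decision: str,
           approver: str) -> bool:
    """Move pending -> approved/rejected. Returns True only for the first click."""
    if decision not in ("approved", "rejected"):
        raise ValueError(f"unknown decision: {decision}")
    with conn:
        cur = conn.execute(
            "UPDATE approvals SET state = ?, approver = ? "
            "WHERE pr_number = ? AND state = 'pending'",
            (decision, approver, pr_number),
        )
    return cur.rowcount == 1  # 0 rows means already decided or unknown PR
```

Only the caller that gets True should go on to trigger the promotion; everyone else can reply in the thread that the decision was already made.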

State store cookbook for the lightweight path

KEY                                        VALUE (json)
slack:thread:{thread_ts}                   { team_id, user_id, repo, branch, run_id, pr_number, preview_url, status }
slack:lock:{thread_ts}                     "<worker-id>"          (TTL 15s, lease for processing) [[21]](https://redis.io/tutorials/chat-sdk-slackbot-distributed-locking/)
approval:{pr_number}                       { state: "pending"|"approved"|"rejected", approver, ts }
run:{run_id}                               "{thread_ts}"          (reverse index for callbacks from Actions)
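A sketch of how the thread record and the reverse index might be written together, using a plain dict as a stand-in for whichever store is chosen. The function names are hypothetical; the key formats follow the layout above.

```python
import json
from typing import MutableMapping, Optional


def thread_key(thread_ts: str) -> str:
    return f"slack:thread:{thread_ts}"


def run_key(run_id: int) -> str:
    return f"run:{run_id}"


def register_run(store: MutableMapping[str, str], thread_ts: str,
                 run_id: int, **fields: object) -> None:
    """Merge fields into the thread record and point run:{run_id} back at it."""
    record = json.loads(store.get(thread_key(thread_ts), "{}"))
    record.update(fields, run_id=run_id)
    store[thread_key(thread_ts)] = json.dumps(record)
    store[run_key(run_id)] = thread_ts


def thread_for_run(store: MutableMapping[str, str], run_id: int) -> Optional[dict]:
    """Resolve an Actions callback (which only knows run_id) to its Slack thread."""
    thread_ts = store.get(run_key(run_id))
    if thread_ts is None:
        return None
    return {"thread_ts": thread_ts, **json.loads(store[thread_key(thread_ts)])}
```

With a real Redis or Durable Object backend the two writes in register_run would ideally go in one transaction, so a callback never finds a reverse index without its record.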

Two endpoints close the loop: a Slack button handler that flips approval:* and emits repository_dispatch, and a workflow_run listener that reads run:* and posts to Slack. Every other piece of state stays inside one Actions run.
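The dispatch half of the button handler is a single REST call. The sketch below builds the request with the standard library; the endpoint (POST /repos/{owner}/{repo}/dispatches) and the event_type / client_payload body fields are GitHub's documented API, while the function name and the example event name "preview-approved" are assumptions for illustration.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(owner: str, repo: str, token: str, event_type: str,
                           client_payload: dict) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request."""
    body = json.dumps(
        {"event_type": event_type, "client_payload": client_payload}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{owner}/{repo}/dispatches",
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is urllib.request.urlopen(req); GitHub answers 204 No Content on success. Carrying thread_ts and pr_number in client_payload lets the triggered workflow post back to the right thread without another lookup.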

Pitfalls

Don’t use Actions outputs for diffs. 1 MB cap [19] — push diffs as artifacts and link from the PR comment.
Slack thread_ts is your only stable correlation key before a PR exists. Slack assigns it (it is the ts of the thread’s parent message), so capture it from the triggering event and pass it through every dispatch payload.
The default GITHUB_TOKEN doesn’t trigger downstream workflows. Push the branch and open the PR with a GitHub App installation token; otherwise the pull_request event starts no run and the preview deploy never happens.
Don’t pick Temporal for a flow whose longest wait is a Slack approval. The 50K-event / 50 MB-history cap [20] is generous, but the operational tax is real and the human gate fits a row in a table [26].
Don’t double-process Slack events. Slack retries an event when it gets no 2xx response within 3 s; a Redis lease keyed on thread_ts is one line and prevents two runners from racing on the same task [21].
Durable-execution engines (DEEs) and event logs are complements, not substitutes. If Kafka is already in the stack, the engine handles the workflow lifecycle on top of the event log [28].
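The Redis lease from the double-processing pitfall can be sketched as a single SET with NX and EX. The function is written against the set(name, value, nx=..., ex=...) signature that redis-py exposes, so a redis.Redis client can be passed in directly; the function name and the 15 s default (taken from the cookbook above) are assumptions for this example.

```python
from typing import Optional, Protocol

LEASE_TTL_S = 15  # matches the slack:lock:{thread_ts} TTL in the cookbook


class SetNxClient(Protocol):
    """The one method this sketch needs; redis.Redis satisfies it."""

    def set(self, name: str, value: str, *, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]: ...


def try_acquire_thread(client: SetNxClient, thread_ts: str, worker_id: str,
                       ttl_s: int = LEASE_TTL_S) -> bool:
    """Return True if this worker now holds the lease for the thread.

    SET ... NX EX only writes when the key is absent, so exactly one of
    several racing workers succeeds; the TTL frees the lease if it crashes.
    """
    return bool(client.set(f"slack:lock:{thread_ts}", worker_id, nx=True, ex=ttl_s))
```

A handler would acknowledge Slack first, then do the work only if try_acquire_thread returns True. This sketch relies on the TTL to release the lease; an explicit release would need a compare-and-delete so a worker never removes a lease it no longer owns.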