# Runner Scaling Model

## Assumptions

- Runner state (saga state, dedupe markers, checkpoints, outbox, schedules) is stored in a local MDBX database via `edge_storage`.
- Correctness for a given tenant+saga depends on reading/writing the same storage instance over time.

## Practical Scaling Model

### 1) Scale by tenant partitioning (recommended)

Run multiple Runner instances, each responsible for a disjoint set of tenants, and give each instance its own storage volume.

- Use `RUNNER_TENANT_ALLOWLIST` to bind an instance to a fixed set of tenants.
- Or use NATS KV placement: set `RUNNER_TENANT_PLACEMENT_BUCKET` and `RUNNER_SHARD_ID`.
- Streams and consumers can be shared; subjects are tenant-qualified, and each instance's consumers filter by its tenants' subjects.

Example:

- Runner A: `RUNNER_TENANT_ALLOWLIST=t1,t2`
- Runner B: `RUNNER_TENANT_ALLOWLIST=t3,t4`

### NATS KV Placement (optional)

If `RUNNER_TENANT_PLACEMENT_BUCKET` and `RUNNER_SHARD_ID` are set, the Runner watches a NATS KV bucket where:

- key = tenant_id
- value = shard_id

and dynamically updates the set of per-tenant consumers it polls, without restarting.

### 2) Multiple replicas for the same tenant (not supported with local storage)

If two replicas for the same tenant use different local storage, they will not share:

- dedupe markers
- checkpoints
- saga state

and can duplicate work. Supporting same-tenant replicas would require shared or replicated storage, which is not implemented here.

## Rollout/Drain Strategy

Drain a Runner before stopping its process:

1. `POST /admin/drain` to stop accepting new work.
2. Then stop the container/process.

## Replay

Controlled replay exists for operational/debug use:

- `POST /admin/replay` with `tenant_id`, `saga_name`, and `mode`.
- Modes:
  - `checkpoint_only`
  - `checkpoint_and_dedupe`
  - `full_reset`
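The tenant-selection rules above (a static allowlist, or matching placement entries against this instance's shard id) can be sketched as two pure functions. This is an illustrative sketch, not the Runner's actual implementation; the function names are hypothetical, and only the env var names come from this document.

```python
def allowlisted_tenants(env: dict[str, str]) -> set[str]:
    """Parse RUNNER_TENANT_ALLOWLIST, a comma-separated list of tenant ids."""
    raw = env.get("RUNNER_TENANT_ALLOWLIST", "")
    return {t.strip() for t in raw.split(",") if t.strip()}


def owned_tenants(placement: dict[str, str], shard_id: str) -> set[str]:
    """Given a snapshot of the placement KV bucket (tenant_id -> shard_id),
    return the tenants this instance should poll per-tenant consumers for."""
    return {tenant for tenant, shard in placement.items() if shard == shard_id}
```

In KV-placement mode, the Runner would recompute `owned_tenants` on each watch update and add or remove per-tenant consumers to match, which is what allows rebalancing without a restart.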
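One plausible reading of the three replay modes is that each clears progressively more per-saga state. The mapping below is an assumption inferred from the mode names alone, not confirmed behavior; verify against the Runner's actual replay handler before relying on it.

```python
def replay_reset_scope(mode: str) -> set[str]:
    """Map a replay mode to the state it clears before replaying.

    ASSUMPTION: semantics inferred from the mode names in the docs,
    not from the Runner's source.
    """
    scopes = {
        "checkpoint_only": {"checkpoints"},
        "checkpoint_and_dedupe": {"checkpoints", "dedupe_markers"},
        "full_reset": {"checkpoints", "dedupe_markers", "saga_state"},
    }
    if mode not in scopes:
        raise ValueError(f"unknown replay mode: {mode}")
    return scopes[mode]
```

Under this reading, `checkpoint_and_dedupe` re-delivers work that dedupe markers would otherwise suppress, so it is the mode most likely to cause intentional duplicate processing during a debug replay.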