# Load and Failure Testing Strategy

## Goals

- Verify the Gateway stays responsive under sustained traffic.
- Verify auth flows behave correctly under concurrency.
- Verify routing reloads are atomic and safe under load.
- Verify upstream failures are bounded (timeouts) and observable (metrics/logs).

## Scenarios

### AuthN

- Sign up once, then:
  - Burst sign-in attempts to verify rate limits and correct 401/429 behavior.
  - Parallel refresh calls to verify refresh rotation correctness.

### Routing Reload

- Run steady traffic to:
  - `POST /v1/query/{view_type}`
  - `POST /v1/commands/{aggregate_type}/{aggregate_id}`
- Trigger `POST /admin/routing/reload` repeatedly and verify:
  - No 500s from partial routing table reads.
  - Routing decisions switch only at revision boundaries.

### Upstream Failure Modes

- Configure routing to a shard endpoint that:
  - Refuses connections (ECONNREFUSED)
  - Hangs (no response)
  - Returns 5xx
- Verify:
  - Gateway timeouts are enforced.
  - Errors are surfaced as 5xx to callers.
  - `gateway_http_requests_total` and duration histograms capture the failures.

### HA Behavior (Swarm)

- Run `gateway` with 2 replicas and no sticky sessions.
- Verify:
  - Refresh works across replicas.
  - IAM updates become effective immediately on both replicas.
  - Rolling update keeps at least 1 replica ready.
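
### Example: Checking a Routing Reload Run

The two Routing Reload checks (no 500s, and routing decisions that switch only at revision boundaries) could be evaluated offline over samples recorded by the load generator. The sketch below is one possible approach, not the Gateway's actual tooling. It assumes each recorded response can be tagged with the routing revision that served it and with the upstream that handled it. This document does not say how the Gateway exposes either value, so the `Observation` shape and `check_reload_run` are hypothetical names used for illustration only.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Observation(NamedTuple):
    """One response recorded by the load generator (hypothetical shape)."""
    revision: int   # routing revision that served the request (assumed to be observable)
    route_key: str  # e.g. the requested view_type or aggregate_type
    upstream: str   # shard endpoint that handled the request (assumed to be observable)
    status: int     # HTTP status returned to the caller


def check_reload_run(observations: Iterable[Observation]) -> List[str]:
    """Return a list of violations; an empty list means the run passed."""
    violations: List[str] = []
    # First upstream seen for each (revision, route_key) pair.
    first_seen: Dict[Tuple[int, str], str] = {}

    for index, obs in enumerate(observations):
        # Check 1: no 500s while reloads are in flight.
        if obs.status == 500:
            violations.append(
                f"sample {index}: HTTP 500 at revision {obs.revision}"
            )

        # Check 2: within a single revision, a route key must always
        # resolve to the same upstream. A change is only allowed when
        # the revision itself changes.
        key = (obs.revision, obs.route_key)
        expected = first_seen.setdefault(key, obs.upstream)
        if obs.upstream != expected:
            violations.append(
                f"sample {index}: route '{obs.route_key}' went to "
                f"{obs.upstream} but revision {obs.revision} "
                f"previously used {expected}"
            )

    return violations


if __name__ == "__main__":
    samples = [
        Observation(1, "orders", "shard-a", 200),
        Observation(2, "orders", "shard-b", 200),  # switch at a revision boundary: fine
        Observation(2, "orders", "shard-a", 200),  # mixed upstreams within revision 2
    ]
    for line in check_reload_run(samples):
        print(line)
```

The checker only compares samples against each other, so it does not need to know the expected routing table for each revision. It would miss a reload that consistently routes to the wrong upstream, so it complements a direct assertion on the routing config rather than replacing one.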