# Load and Failure Testing Strategy

## Goals

- Verify the Gateway stays responsive under sustained traffic.
- Verify auth flows behave correctly under concurrency.
- Verify routing reloads are atomic and safe under load.
- Verify upstream failures are bounded (timeouts) and observable (metrics/logs).

## Scenarios

### AuthN

- Sign up once, then:
  - Burst sign-in attempts to verify rate limits and correct 401/429 behavior.
  - Parallel refresh calls to verify refresh rotation correctness.

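The sign-in burst could be driven by something like the sketch below. It is a minimal illustration, not the project's actual harness: `burst`, `unexpected_statuses`, and `make_fake_sign_in` are hypothetical names, and the fake endpoint (which allows a fixed number of calls and then answers 429) only stands in for a real HTTP call to the Gateway. The set of allowed statuses (200, 401, 429) is an assumption based on the behavior listed above.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, FrozenSet


def burst(sign_in: Callable[[], int], attempts: int, workers: int = 16) -> Counter:
    """Fire `attempts` sign-in calls concurrently and tally the status codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(sign_in) for _ in range(attempts)]
        return Counter(f.result() for f in futures)


def unexpected_statuses(
    tally: Counter, allowed: FrozenSet[int] = frozenset({200, 401, 429})
) -> Dict[int, int]:
    """Return the statuses (and counts) that fall outside the allowed set."""
    return {code: n for code, n in tally.items() if code not in allowed}


def make_fake_sign_in(limit: int) -> Callable[[], int]:
    """Hypothetical stand-in for the endpoint: `limit` successes, then 429."""
    lock = threading.Lock()
    seen = [0]

    def sign_in() -> int:
        with lock:  # keep the counter consistent across worker threads
            seen[0] += 1
            return 200 if seen[0] <= limit else 429

    return sign_in


if __name__ == "__main__":
    tally = burst(make_fake_sign_in(limit=5), attempts=20)
    print(dict(tally), unexpected_statuses(tally))
```

In a real run, `sign_in` would wrap an HTTP request to the Gateway's sign-in route and return the response status; any entry in `unexpected_statuses` (for example a 500) would fail the test.
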
### Routing Reload

- Run steady traffic to:
  - `POST /v1/query/{view_type}`
  - `POST /v1/commands/{aggregate_type}/{aggregate_id}`
- Trigger `POST /admin/routing/reload` repeatedly and verify:
  - No 500s from partial routing table reads.
  - Routing decisions switch only at revision boundaries.

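One way to check the revision-boundary property is sketched below. It assumes, hypothetically, that each response lets the test client recover the routing revision and the upstream that served it (for example through debug headers; this document does not specify such a mechanism). It also assumes all observations come from one sequential client hitting one Gateway instance with the same route key, so the revision should never move backwards and each revision should map to exactly one upstream.

```python
from typing import Dict, List, Optional, Set, Tuple

# (routing revision, upstream chosen), in the order one client observed them.
Observation = Tuple[int, str]


def revision_violations(observations: List[Observation]) -> List[str]:
    """Report observations that contradict an atomic routing reload."""
    problems: List[str] = []
    targets: Dict[int, Set[str]] = {}
    highest: Optional[int] = None

    for revision, upstream in observations:
        targets.setdefault(revision, set()).add(upstream)
        # A sequential client should never see an older revision again.
        if highest is not None and revision < highest:
            problems.append(f"revision went backwards: {highest} -> {revision}")
        highest = revision if highest is None else max(highest, revision)

    # Within one revision, the same route key must resolve to one upstream.
    for revision, seen in sorted(targets.items()):
        if len(seen) > 1:
            problems.append(f"revision {revision} routed to {sorted(seen)}")
    return problems


if __name__ == "__main__":
    print(revision_violations([(1, "shard-a"), (2, "shard-b"), (1, "shard-a")]))
```

An empty result means every switch in routing coincided with a revision change; the shard names here are placeholders.
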
### Upstream Failure Modes

- Configure routing to a shard endpoint that:
  - Refuses connections (ECONNREFUSED)
  - Hangs (no response)
  - Returns 5xx
- Verify:
  - Gateway timeouts are enforced.
  - Errors are surfaced as 5xx to callers.
  - `gateway_http_requests_total` and duration histograms capture the failures.

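The first two failure modes can be reproduced locally with plain sockets, as in the sketch below. It is an illustration under stated assumptions: `hanging_listener`, `refused_port`, and `probe` are hypothetical helpers, the stubs bind to `127.0.0.1`, and the probe talks to the stub directly only to show that the wait is bounded. In an actual test the routing table would point at the stub and the probe would target the Gateway. The 5xx stub is omitted for brevity.

```python
import socket
import time
from typing import Tuple


def hanging_listener() -> socket.socket:
    """Listen on an ephemeral port but never accept or reply (a hung shard)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv  # caller keeps the reference and closes it when done


def refused_port() -> int:
    """Return a port that was just released, so connections are refused."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port


def probe(port: int, timeout: float) -> Tuple[str, float]:
    """Classify the endpoint and report how long the attempt took."""
    start = time.monotonic()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as conn:
            conn.sendall(b"GET / HTTP/1.0\r\n\r\n")
            data = conn.recv(1)  # blocks until data, close, or timeout
            outcome = "responded" if data else "closed"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:
        outcome = "timeout"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    srv = hanging_listener()
    print(probe(srv.getsockname()[1], timeout=0.3))
    srv.close()
    print(probe(refused_port(), timeout=0.3))
```

The elapsed time returned by `probe` is what a timeout assertion would compare against the Gateway's configured upstream timeout.
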
### HA Behavior (Swarm)

- Run `gateway` with 2 replicas and no sticky sessions.
- Verify:
  - Refresh works across replicas.
  - IAM updates become effective immediately on both replicas.
  - Rolling update keeps at least 1 replica ready.
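
The cross-replica refresh check can be modeled as a chain in which each refreshed token is presented to the next replica. The sketch below is hypothetical throughout: the replica names, the `refresh` callable, and the fake token store are illustrative, and the single-use rotation rule is an assumption about how refresh rotation works here. The fake shows why the check matters: replicas that share token state pass, while replicas with local-only state reject a token issued by their peer.

```python
from itertools import cycle
from typing import Callable, Dict, List, Optional, Sequence, Set

# refresh(replica, token) -> new token, or None if the token was rejected.
RefreshFn = Callable[[str, str], Optional[str]]


def chain_refresh(
    refresh: RefreshFn, replicas: Sequence[str], first_token: str, rounds: int
) -> List[str]:
    """Refresh `rounds` times, alternating replicas; return those that rejected."""
    rejected: List[str] = []
    token = first_token
    for _, replica in zip(range(rounds), cycle(replicas)):
        new_token = refresh(replica, token)
        if new_token is None:
            rejected.append(replica)
            break  # the chain is broken; later calls would fail too
        token = new_token
    return rejected


def make_fake_refresh(shared: bool) -> RefreshFn:
    """Fake replicas with either one shared token store or one store each."""
    common: Set[str] = {"t0"}
    local: Dict[str, Set[str]] = {}
    counter = [0]

    def refresh(replica: str, token: str) -> Optional[str]:
        store = common if shared else local.setdefault(replica, {"t0"})
        if token not in store:
            return None
        store.remove(token)  # assumed rotation rule: old token is single-use
        counter[0] += 1
        new_token = f"t{counter[0]}"
        store.add(new_token)
        return new_token

    return refresh


if __name__ == "__main__":
    replicas = ["gateway-1", "gateway-2"]
    print(chain_refresh(make_fake_refresh(shared=True), replicas, "t0", 6))
    print(chain_refresh(make_fake_refresh(shared=False), replicas, "t0", 6))
```

Against a real Swarm service without sticky sessions, the load balancer does the alternating, so the test only needs to issue the refresh calls in sequence and assert that none is rejected.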