### 🧱 Component: Gateway

**Definition:**  
The Gateway is the single ingress for the platform. It provides:
- **Tenant-aware routing** to the node services: **Aggregate** (write/commands), **Projection** (read/queries), and **Runner** (workflow/saga + effects admin).
- Centralized **authn** (passwords hashed with Argon2id, plus Google OIDC; extensible to more providers) and **authz** (tenant-scoped RBAC).
- Cross-cutting concerns: request validation, rate limiting, observability, and consistent error semantics.

The Gateway is responsible for enforcing multi-tenancy at the edge: it treats `x-tenant-id` as the tenant selection signal, validates it against the caller identity, and routes requests to the correct tenant shard/node.

---

## **Context: Existing Nodes**

This PRD is based on the currently implemented node repositories:
- **Aggregate**: defines gRPC Command API `aggregate.gateway.v1.CommandService/SubmitCommand` in [aggregate.proto](file:///Users/vlad/Developer/cloudlysis/aggregate/proto/aggregate.proto#L1-L31). Aggregate’s PRD explicitly expects the Gateway to route by `x-tenant-id` ([aggregate/prd.md](file:///Users/vlad/Developer/cloudlysis/aggregate/prd.md#L5-L12)).
- **Projection**: provides health/admin HTTP endpoints and implements an in-process UQF query engine as `QueryService` but does not currently expose it over HTTP/gRPC ([uqf.rs](file:///Users/vlad/Developer/cloudlysis/projection/src/query/uqf.rs#L8-L162)).
- **Runner**: uses a gRPC client to submit aggregate commands “through the gateway” (config key `aggregate_gateway_url`), propagating `x-tenant-id` as gRPC metadata ([GatewayClient](file:///Users/vlad/Developer/cloudlysis/runner/src/gateway/mod.rs#L1-L47), [OutboxRelay](file:///Users/vlad/Developer/cloudlysis/runner/src/outbox/relay.rs#L37-L110)).
- **Tenant placement**: there is precedent for **NATS JetStream KV** as a control plane for tenant placement/sharding (Runner tenant filter watcher: [tenant_placement.rs](file:///Users/vlad/Developer/cloudlysis/runner/src/tenant_placement.rs#L8-L100); Aggregate KV client helper: [swarm.rs](file:///Users/vlad/Developer/cloudlysis/aggregate/src/swarm.rs#L79-L227)). There is also a simple static mapping example in [gateway-routing.yaml](file:///Users/vlad/Developer/cloudlysis/aggregate/gateway-routing.yaml#L1-L3).

---

## **Problem Statement**

Clients (and internal workers like Runner) need a stable, secure entrypoint that:
- Authenticates identities (humans and services)
- Authorizes actions per tenant
- Routes requests to the correct node(s) for the selected tenant
- Provides consistent APIs independent of the underlying shard topology and service discovery

Without a Gateway, each node would need to re-implement auth, tenant enforcement, rate limiting, and topology discovery, increasing security risk and operational complexity.

---

## **Goals**

- Provide one entrypoint for **command submission** (Aggregate) and **query execution** (Projection), and an authenticated entrypoint for **workflow/admin actions** (Runner).
- Enforce tenant isolation using `x-tenant-id`:
  - Validate tenant selection is allowed for the caller
  - Prevent tenant spoofing
- Prioritize **independent scalability** of Aggregate, Projection, and Runner:
  - Scale each service horizontally without requiring the others to scale
  - Allow tenant assignments for each service to be rebalanced independently
- Support **authn**:
  - Username/password with Argon2id password hashing
  - Google OIDC login (future providers supported)
- Support **authz**:
  - Tenant-scoped RBAC with explicit permissions
  - Service identities for internal traffic (Runner → Gateway)
- Provide operational endpoints: `/health`, `/ready`, `/metrics`, config/routing introspection (admin-only).

---

## **Non-Goals**

- Implement the Aggregate/Projection/Runner business logic.
- Replace NATS JetStream as the event bus or the storage responsibilities of nodes.
- Provide a general-purpose API gateway for arbitrary upstreams; this Gateway is purpose-built for platform nodes.
- Provide UI/console; the Gateway only exposes APIs.

---

## **Primary Users**

- **External clients**: applications submitting commands and running queries.
- **Internal services**: Runner submitting commands on behalf of sagas.
- **Operators**: managing tenant placement and observing health/metrics.

---

## **Key Concepts**

### Tenant Selection and Enforcement

- `x-tenant-id` is the canonical tenant selector for all tenant-scoped requests.
- The Gateway MUST reject requests when:
  - The endpoint is tenant-scoped and `x-tenant-id` is missing (unless explicitly configured as single-tenant default).
  - The caller is not authorized for that tenant.
- The Gateway SHOULD normalize and validate tenant IDs using the same constraints the nodes already use (alphanumeric characters plus `-` and `_`).
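
The validation rule above can be sketched as follows. This is a minimal illustration, not the nodes' actual implementation: the function name, the 64-character cap, and the choice to limit normalization to trimming whitespace are all assumptions.

```python
import re

# Assumed constraint: ASCII letters, digits, '-' and '_'. The 64-character cap
# is a hypothetical limit, not taken from the node implementations.
_TENANT_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


class TenantIdError(ValueError):
    """Raised when a tenant-scoped request carries no usable tenant ID."""


def normalize_tenant_id(raw, single_tenant_default=None):
    """Return a validated tenant ID, or raise TenantIdError."""
    value = (raw or "").strip()  # normalization here is whitespace trimming only
    if not value:
        if single_tenant_default is not None:
            return single_tenant_default  # explicitly configured single-tenant mode
        raise TenantIdError("missing x-tenant-id")
    if not _TENANT_ID_RE.fullmatch(value):
        raise TenantIdError("invalid tenant id format")
    return value
```

Both failure cases would map to a `400` response; whether the caller is allowed to use the tenant is a separate authorization check.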

### Node Types and Traffic Classes

- **Aggregate (write path)**: synchronous command submission; returns events.
- **Projection (read path)**: query execution; returns query results; eventual consistency is expected.
- **Runner (workflow/admin path)**: operational endpoints for runner configuration, drain, reload, and diagnostics; access is admin-only.

### Tenant-Aware Routing

- Routing decision is primarily based on `tenant_id`, and secondarily on request kind (aggregate vs projection vs runner).
- The Gateway abstracts the topology: clients do not need to know which node hosts their tenant.

### Independent Scalability and Rebalancing

- Each service (Aggregate, Projection, Runner) can have its own tenant-to-shard placement. The Gateway resolves routing per `(tenant_id, service_kind)`.
- Rebalancing is defined as moving a tenant’s assignment for a specific service from one shard to another with bounded disruption.

---

## **Functional Requirements**

### 1) Authentication (AuthN)

- **AuthN surface area**:
  - Signup, signin, signout
  - Forgot password, reset password
  - MFA enrollment and MFA challenge (step-up)
  - Google OIDC login (and future providers)
  - Service identities (internal callers)

- **Password-based accounts**:
  - Store passwords hashed with **Argon2id** using per-user random salts and parameters suitable for production.
  - Signup MUST support email verification before the account becomes active (configurable per environment).
  - Signin MUST support MFA when required by policy.
  - Signout MUST revoke refresh tokens; a short-lived access-token denylist MAY be maintained if immediate access-token revocation is needed.

- **Sessions and tokens**:
  - Issue a short-lived access token and a refresh token with rotation.
  - Refresh tokens MUST be stored server-side (hashed at rest) to support revocation and rotation.
  - Support both browser and API clients:
    - Browser: refresh token in an HttpOnly cookie with CSRF protections.
    - API clients: refresh token presented in an authorization header and held in secure client storage (this PRD does not prescribe the storage mechanism, including whether `localStorage` is acceptable; the implementation chooses).
- **OIDC (Google)**:
  - Support Authorization Code flow with PKCE.
  - Map OIDC identities to internal users; allow linking multiple providers per user.
  - Future providers (e.g., GitHub, Azure AD) should fit the same model.
- **Service auth** (internal):
  - Support service identities for Runner → Gateway and other future internal callers.
  - Recommended approach: mTLS and/or signed JWTs with a `sub` of `service:<name>` plus explicit RBAC grants.
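
As an illustration of the refresh token requirements above (server-side storage, hashed at rest, rotation, revocation), the sketch below uses an in-memory store. The class and method names are hypothetical, and revoking all of a user's tokens when a rotated token is replayed is an assumed policy that this PRD does not mandate.

```python
import hashlib
import secrets


def _digest(token):
    # Only this SHA-256 digest is persisted; the raw token is never stored.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class RefreshTokenStore:
    """In-memory stand-in for the server-side refresh token table."""

    def __init__(self):
        self._active = {}   # digest -> user_id
        self._retired = {}  # digest of an already-rotated token -> user_id

    def issue(self, user_id):
        token = secrets.token_urlsafe(32)
        self._active[_digest(token)] = user_id
        return token

    def rotate(self, presented):
        """Swap a valid refresh token for a new one; return None on failure."""
        digest = _digest(presented)
        user_id = self._active.pop(digest, None)
        if user_id is None:
            reused_by = self._retired.get(digest)
            if reused_by is not None:
                # Replay of a rotated token: treat as theft (assumed policy).
                self.revoke_all(reused_by)
            return None
        self._retired[digest] = user_id
        return self.issue(user_id)

    def revoke_all(self, user_id):
        """Signout or password reset: drop every active token for the user."""
        self._active = {d: u for d, u in self._active.items() if u != user_id}
```

A production store would also record expiry and client metadata per token; those fields are omitted here for brevity.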

- **Forgot / reset password**:
  - Forgot password MUST create a one-time reset token with an expiry and store only a hash of it.
  - Reset password MUST verify the token, enforce password policy, rotate credentials, and revoke all refresh tokens for the user.
  - Sending reset links/codes is a side effect; the Gateway SHOULD trigger it via the platform’s effect execution path (Runner effect providers) rather than embedding SMTP credentials in the Gateway.
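
The one-time reset token described above could look like the following sketch. The 15-minute expiry, the class name, and the injectable clock are illustrative assumptions; delivery of the token is out of scope and would go through the effect execution path.

```python
import hashlib
import secrets
import time

RESET_TTL_SECONDS = 15 * 60  # hypothetical expiry window


def _digest(token):
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class ResetTokenStore:
    def __init__(self, clock=time.time):
        self._clock = clock
        self._pending = {}  # digest -> (user_id, expires_at)

    def create(self, user_id):
        token = secrets.token_urlsafe(32)
        expires_at = self._clock() + RESET_TTL_SECONDS
        self._pending[_digest(token)] = (user_id, expires_at)
        return token  # delivered out of band; only the digest is stored

    def consume(self, token):
        """Return the user ID once for a valid, unexpired token, else None."""
        entry = self._pending.pop(_digest(token), None)
        if entry is None:
            return None
        user_id, expires_at = entry
        return user_id if self._clock() < expires_at else None
```

After a successful `consume`, the reset handler would enforce the password policy, store the new Argon2id hash, and revoke the user's refresh tokens.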

- **MFA**:
  - Support TOTP (authenticator apps) as the default MFA method.
  - Support recovery codes (one-time use) for account recovery.
  - MFA enrollment MUST require a recent primary authentication (step-up).
  - MFA challenges MUST be bound to an auth session and have short expiration.
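
For reference, TOTP verification follows RFC 6238 and can be sketched with the standard library alone. The function names and the one-step drift window are assumptions; binding the challenge to an auth session and rate limiting attempts are not shown.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at=None, step=30, digits=6):
    """RFC 6238 TOTP over HMAC-SHA1 (the common authenticator-app default)."""
    now = time.time() if at is None else at
    counter = struct.pack(">Q", int(now // step))
    mac = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret, submitted, at=None, step=30, window=1):
    """Accept codes from adjacent time steps to tolerate clock drift."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        expected = totp(secret, now + drift * step, step)
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8")):
            return True
    return False
```

With the RFC 6238 test secret `b"12345678901234567890"` and `at=59`, `totp(..., digits=8)` yields `94287082`, which matches the published SHA-1 test vector.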

### 2) Authorization (AuthZ / RBAC)

- RBAC entities:
  - **User** (human identity)
  - **Service** (machine identity)
  - **Tenant**
  - **Role** (set of permissions)
  - **Assignment** (principal ↔ tenant ↔ role)
- Authorization checks:
  - Command submission permissions: per tenant, optionally scoped by `aggregate_type`.
  - Query permissions: per tenant, optionally scoped by `view_type`.
  - Admin permissions: routing/config endpoints, runner admin passthrough, tenant placement changes.
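
A minimal sketch of how the RBAC entities and checks above might fit together. The `"<action>:<resource_type>"` permission encoding, the role names, and the `*` wildcard are assumptions made for illustration; the PRD does not define a permission syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    principal: str  # e.g. "user:42" or "service:runner"
    tenant_id: str
    role: str


# Hypothetical role catalog. A permission is "<action>:<resource_type>",
# where "*" grants the action for every aggregate_type or view_type.
ROLE_PERMISSIONS = {
    "order-writer": {"command:order"},
    "analyst": {"query:*"},
}


def is_allowed(assignments, principal, tenant_id, action, resource_type):
    """True if any role the principal holds in this tenant grants the action."""
    for a in assignments:
        if a.principal != principal or a.tenant_id != tenant_id:
            continue  # assignments never apply across tenants
        granted = ROLE_PERMISSIONS.get(a.role, set())
        if f"{action}:{resource_type}" in granted or f"{action}:*" in granted:
            return True
    return False
```

Because the tenant is part of every assignment, a caller who sends an `x-tenant-id` they hold no role in is denied by the same check, which is what prevents tenant spoofing.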

### 3) Routing to Nodes

The Gateway MUST route to:
- **Aggregate nodes** for command submission.
- **Projection nodes** for query execution.
- **Runner nodes** for admin/ops passthrough.

Routing inputs:
- `tenant_id` (from `x-tenant-id` or request body for internal gRPC; header is authoritative for external HTTP).
- A routing table defining tenant → shard/node → service endpoint(s), where placement MAY differ per service kind.

Routing behavior:
- The Gateway MUST be able to hot-reload routing configuration without restart.
- The Gateway SHOULD support both:
  - **Static config** (file-based mapping for development)
  - **Dynamic config** (NATS KV-based control plane for production)
- The Gateway MUST support routing when placements are independent:
  - `aggregate_placement[tenant_id] -> aggregate_shard_id`
  - `projection_placement[tenant_id] -> projection_shard_id`
  - `runner_placement[tenant_id] -> runner_shard_id`
- The Gateway SHOULD expose placement revisions and effective routing decisions for debugging (admin-only).

### 4) Public APIs (Initial)

The Gateway exposes two public surface areas:

#### Command Submission (Write)

- **gRPC**: implement `aggregate.gateway.v1.CommandService/SubmitCommand` for internal callers (Runner) and optional external clients.
- **HTTP**: provide a simple REST wrapper to allow browser and non-gRPC clients.

HTTP sketch:
- `POST /v1/commands/{aggregate_type}/{aggregate_id}`
  - Headers: `Authorization`, `x-tenant-id`
  - Body: JSON command payload
  - Response: JSON containing events (mirrors the gRPC response shape)

#### Query Execution (Read)

Because Projection currently implements UQF query logic but does not expose it, the Gateway defines a stable API and routes to a Projection query endpoint once it exists.

HTTP sketch:
- `POST /v1/query/{view_type}`
  - Headers: `Authorization`, `x-tenant-id`
  - Body: `{ "uqf": "<json-string>" }`
  - Response: `{ "mode": "find" | "count", ... }` compatible with Projection’s `QueryResponse` shape.

### 5) Operational APIs

- `GET /health` and `GET /ready` for load balancers.
- `GET /metrics` for Prometheus/VictoriaMetrics.
- Admin-only:
  - `GET /admin/routing` (current effective routing table and revision)
  - `POST /admin/routing/reload` (force reload; MUST remain safe to call while a config watcher is active)
  - Runner passthrough under `/admin/runner/*` (authenticated + authorized)

### 6) AuthN Endpoints (HTTP)

The Gateway SHOULD expose a stable HTTP AuthN API (exact payloads may evolve; semantics should not):
- `POST /v1/auth/signup`
- `POST /v1/auth/signin`
- `POST /v1/auth/signout`
- `POST /v1/auth/refresh`
- `POST /v1/auth/forgot`
- `POST /v1/auth/reset`
- `POST /v1/auth/mfa/enroll/start`
- `POST /v1/auth/mfa/enroll/confirm`
- `POST /v1/auth/mfa/challenge`
- `POST /v1/auth/oidc/google/start`
- `GET /v1/auth/oidc/google/callback`

The Gateway MUST enforce rate limits on signin/forgot/reset and MUST apply abuse protections (generic error responses for account existence, IP/device throttling).
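
One way to approach the throttling and generic-response requirements is sketched below. The sliding-window algorithm, the key format, and the response body are assumptions; a multi-instance Gateway would need a shared counter store rather than process memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` attempts per key within `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self._clock()
        hits = self._hits[key]
        while hits and now - hits[0] >= self._window:
            hits.popleft()  # forget attempts that left the window
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


# Hypothetical body returned by /v1/auth/forgot whether or not the account
# exists, so the response cannot be used to enumerate accounts.
GENERIC_FORGOT_RESPONSE = {
    "status": "ok",
    "message": "If the account exists, a reset link was sent.",
}
```

Keys such as `"ip:203.0.113.7"` and `"account:<email hash>"` could be checked independently so that both per-source and per-target throttling apply.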

### 7) Admin IAM APIs (HTTP)

The Gateway MUST expose an admin-facing API surface for the Admin UI node to manage authentication + authorization:
- **Users**: create, read, update, disable, delete
- **Identities**: link/unlink OIDC identities, manage password credentials, enforce email verification status
- **Roles and Rights**: define permissions (rights), create/update roles, assign rights to roles
- **Assignments**: assign roles to principals (users/services) scoped to a tenant
- **Service Accounts**: create/rotate credentials for internal callers, assign tenant roles
- **MFA Admin Actions**: reset MFA for a user, revoke recovery codes, force re-enrollment
- **Sessions**: revoke refresh tokens for a user (global signout)

Endpoint sketch (admin-only, audited, paginated):
- `GET /v1/admin/iam/users`
- `POST /v1/admin/iam/users`
- `GET /v1/admin/iam/users/{user_id}`
- `PATCH /v1/admin/iam/users/{user_id}`
- `POST /v1/admin/iam/users/{user_id}/disable`
- `POST /v1/admin/iam/users/{user_id}/sessions/revoke`
- `POST /v1/admin/iam/users/{user_id}/mfa/reset`
- `GET /v1/admin/iam/rights`
- `POST /v1/admin/iam/rights`
- `GET /v1/admin/iam/roles`
- `POST /v1/admin/iam/roles`
- `GET /v1/admin/iam/roles/{role_id}`
- `PATCH /v1/admin/iam/roles/{role_id}`
- `POST /v1/admin/iam/roles/{role_id}/rights`
- `GET /v1/admin/iam/assignments`
- `POST /v1/admin/iam/assignments`
- `DELETE /v1/admin/iam/assignments/{assignment_id}`

Tenant scoping rules:
- Tenant-scoped operations MUST require `x-tenant-id` and apply within that tenant (role assignments, tenant membership, tenant admin).
- Platform-scoped operations MUST NOT depend on `x-tenant-id` (right/permission catalog, platform admins, global user search).

All admin IAM endpoints MUST require strong authorization (platform admin or tenant admin depending on the resource) and MUST produce an immutable audit trail (who changed what, from where, and when).
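
The PRD does not prescribe how the audit trail is made immutable. One possible technique, shown here purely as an assumption, is a hash-chained append-only log: each entry commits to the previous entry's hash, so later edits become detectable. Field names are illustrative.

```python
import hashlib
import json


class AuditLog:
    """Append-only log where each entry commits to the one before it."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _hash(record):
        body = {k: v for k, v in record.items() if k != "hash"}
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def append(self, actor, action, target, source_ip, at):
        prev = self._entries[-1]["hash"] if self._entries else ""
        record = {"actor": actor, "action": action, "target": target,
                  "source_ip": source_ip, "at": at, "prev": prev}
        record["hash"] = self._hash(record)
        self._entries.append(record)
        return record

    def verify(self):
        """Recompute the chain; False means an entry was altered after writing."""
        prev = ""
        for record in self._entries:
            if record["prev"] != prev or record["hash"] != self._hash(record):
                return False
            prev = record["hash"]
        return True
```

This makes tampering evident rather than impossible; write-once storage or an external log sink would still be needed for stronger guarantees.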

---

## **Non-Functional Requirements**

- **Security**
  - Reject requests missing tenant context when required.
  - Do not trust `x-tenant-id` unless it is authorized by the caller identity.
  - Rate limit authentication endpoints and command submission endpoints.
  - Ensure secrets never appear in logs (tokens, OIDC codes, passwords).
  - Enforce secure defaults for sessions:
    - HttpOnly + Secure cookies where applicable, explicit CSRF protections for browser flows.
    - Access token TTLs and refresh token rotation with revocation.
    - Account lockout / progressive throttling for credential stuffing.
  - Require key management and rotation:
    - JWT signing keys MUST support rotation; old keys remain valid only for bounded overlap.
    - Password reset tokens, email verification tokens, and refresh tokens MUST be stored as hashes.
  - Require transport security:
    - mTLS between Gateway and internal nodes (or an equivalent, explicit service-to-service auth boundary).
  - Produce auditable, immutable logs for admin IAM actions and tenant placement changes.
- **Reliability**
  - Timeouts for upstream calls; bounded retries only when safe (idempotency key present).
  - Circuit breaking per upstream endpoint.
  - Graceful degradation when routing config control plane is temporarily unavailable (serve last known good config).
- **Observability**
  - Correlate requests with `request_id` and `trace_id`.
  - Emit structured logs and Prometheus metrics (request counts, latency histograms, auth failures, upstream errors).
  - Emit security signals (failed signins, MFA failures, suspicious IP/device patterns) suitable for alerting.
- **Performance**
  - Minimize per-request allocations; use connection pools for upstreams.
  - Cache routing decisions keyed by `(tenant_id, service_kind)` with small TTL and invalidation on routing config change.
- **Compatibility**
  - Support single-tenant mode (empty tenant id) for development and early environments, without changing client code.
  - Define API versioning rules and a consistent error envelope for HTTP APIs.
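
The reliability rule "bounded retries only when safe" can be illustrated with a small wrapper. The exception type, parameter names, and backoff values are hypothetical; the point is that a request without an idempotency key gets exactly one attempt.

```python
import time


class UpstreamUnavailable(Exception):
    """Hypothetical transient upstream failure (timeout, connection reset)."""


def call_upstream(send, idempotency_key=None, max_attempts=3,
                  base_delay=0.05, sleep=time.sleep):
    """Call `send()`; retry transient failures only for idempotent requests."""
    attempts = max_attempts if idempotency_key else 1
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except UpstreamUnavailable:
            if attempt == attempts:
                raise  # retry budget exhausted, surface the failure
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```

Per-call timeouts and the circuit breaker would wrap `send` itself and are left out of this sketch.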

---

## **Proposed Architecture**

### High-Level Flow

```
Client / Runner
  |
  |  (Authorization, x-tenant-id)
  v
Gateway
  | 1) AuthN (password/OIDC/service)
  | 2) AuthZ (RBAC per tenant + permission)
  | 3) Tenant routing (tenant_id -> node -> endpoint)
  v
Aggregate / Projection / Runner nodes
```

### Components Inside the Gateway

- **API Layer**
  - HTTP server for REST endpoints
  - gRPC server implementing `aggregate.gateway.v1.CommandService` for Runner compatibility
- **Identity Layer**
  - Credential verification (Argon2id)
  - OIDC provider integration (Google)
  - Token issuance and verification (JWT access + refresh token rotation)
- **Authorization Layer**
  - RBAC policy evaluation for each request
  - Tenant membership validation for `x-tenant-id`
- **Routing Layer**
  - Routing config loader: file + NATS KV watcher
  - Routing decision: `(tenant_id, service_kind) -> endpoint` with independent placement per service kind
  - Health-aware endpoint selection (optional phase): avoid unhealthy endpoints when multiple replicas exist
- **Upstream Clients**
  - Aggregate upstream: gRPC client (forward SubmitCommand)
  - Projection upstream: HTTP or gRPC client (forward Query)
  - Runner upstream: HTTP client for admin passthrough (restricted)

### Routing Config Model (Recommended)

Represent routing as two layers:
- **Placement maps** (tenant → shard), per service kind:
  - `aggregate_placement[tenant_id] -> aggregate_shard_id`
  - `projection_placement[tenant_id] -> projection_shard_id`
  - `runner_placement[tenant_id] -> runner_shard_id`
- **Shard directory** (shard → endpoints), per service kind:
  - `aggregate_shards[aggregate_shard_id] -> { grpc_endpoint, http_endpoint, admin_endpoint? }`
  - `projection_shards[projection_shard_id] -> { http_endpoint, admin_endpoint? }`
  - `runner_shards[runner_shard_id] -> { http_endpoint, admin_endpoint }`

This supports both:
- Static YAML/JSON config files for local runs.
- Dynamic updates via NATS KV:
  - Keys like `aggregate/tenants/<tenant_id>`, `projection/tenants/<tenant_id>`, `runner/tenants/<tenant_id>`
  - Keys like `aggregate/shards/<shard_id>`, `projection/shards/<shard_id>`, `runner/shards/<shard_id>`

The Gateway keeps:
- **Last known good** routing config
- A **revision** number (KV revision or monotonic local revision) for observability/debugging
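
The two-layer model and the last-known-good behavior might be represented as in the sketch below. The class names, the validation rule (every placement must reference a known shard), and the use of a single endpoint string per shard are simplifying assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoutingConfig:
    revision: int
    # service_kind -> {tenant_id: shard_id}
    placement: dict = field(default_factory=dict)
    # service_kind -> {shard_id: endpoint}
    shards: dict = field(default_factory=dict)


class Router:
    def __init__(self, initial):
        self._current = initial  # last known good config

    @property
    def revision(self):
        return self._current.revision

    def apply(self, candidate):
        """Swap in a new config only if it is newer and internally consistent."""
        if candidate.revision <= self._current.revision:
            return False  # ignore stale or replayed updates
        for kind, tenants in candidate.placement.items():
            known = candidate.shards.get(kind, {})
            if any(shard not in known for shard in tenants.values()):
                return False  # keep serving the last known good config
        self._current = candidate
        return True

    def resolve(self, tenant_id, service_kind):
        """Return the endpoint for (tenant_id, service_kind), or None if unplaced."""
        shard = self._current.placement.get(service_kind, {}).get(tenant_id)
        if shard is None:
            return None
        return self._current.shards.get(service_kind, {}).get(shard)
```

A `None` result corresponds to the "unknown tenant assignment" routing failure; both the file loader and the KV watcher would feed candidates into `apply`.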

### Rebalancing Mechanism (Control Plane)

Rebalancing is driven by a small control plane that updates placement and coordinates safe handoff:
- **Placement Store**: NATS JetStream KV buckets holding placement maps and shard directory entries.
- **Rebalancer** (operator-driven initially, automated later):
  - Reads load signals (Gateway/Node metrics) and proposes moves: `(service_kind, tenant_id, from_shard, to_shard)`
  - Applies moves by writing to KV and orchestrating drain/warmup as needed
  - Provides audit trail: who moved what, when, and why

Rebalance flow (per service kind):
1. Update placement (KV) to include the target shard assignment with a revision.
2. Ensure the target shard is ready for the tenant (service-specific warmup).
3. Drain the tenant on the old shard (stop accepting new work for that tenant, finish in-flight work).
4. Finalize by removing or overwriting the old assignment and triggering config reload/watchers.

Service-specific notes:
- **Projection**: can rebuild from JetStream; rebalancing can be “cold” (new shard catches up) with minimal coordination beyond tenant filtering.
- **Runner**: must stop acquiring new work for a tenant, flush outbox dispatch, and persist checkpoints before handing off.
- **Aggregate**: must ensure single-writer semantics per aggregate instance; tenant drain should block new commands during handoff, and the target shard must have state (snapshot transfer) or accept a cold rehydrate from JetStream.
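
The rebalance flow above is strictly ordered, which suggests tracking each move as a small state machine. The state names and the `TenantMove` class below are hypothetical; the sketch only enforces step order and leaves the KV writes and drain signals to the caller.

```python
from enum import Enum


class MoveState(Enum):
    PROPOSED = "proposed"
    TARGET_ASSIGNED = "target_assigned"  # placement updated in KV
    TARGET_READY = "target_ready"        # warmup finished on the new shard
    SOURCE_DRAINED = "source_drained"    # old shard finished in-flight work
    FINALIZED = "finalized"              # old assignment removed


_NEXT = {
    MoveState.PROPOSED: MoveState.TARGET_ASSIGNED,
    MoveState.TARGET_ASSIGNED: MoveState.TARGET_READY,
    MoveState.TARGET_READY: MoveState.SOURCE_DRAINED,
    MoveState.SOURCE_DRAINED: MoveState.FINALIZED,
}


class TenantMove:
    def __init__(self, service_kind, tenant_id, from_shard, to_shard):
        self.key = (service_kind, tenant_id, from_shard, to_shard)
        self.state = MoveState.PROPOSED
        self.history = [MoveState.PROPOSED]  # operator-visible record of steps

    def advance(self, to_state):
        """Move one step forward; skipping or repeating a step is an error."""
        if _NEXT.get(self.state) is not to_state:
            raise ValueError(
                f"cannot go from {self.state.value} to {to_state.value}")
        self.state = to_state
        self.history.append(to_state)
```

Rejecting out-of-order transitions means, for example, that a tenant cannot be drained from the old shard before the target shard has reported readiness.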

---

## **Error Semantics**

- Auth failures: `401` (unauthenticated) or `403` (forbidden)
- Tenant header issues:
  - Missing `x-tenant-id` on tenant-scoped routes: `400`
  - Invalid tenant format: `400`
  - Tenant not permitted for principal: `403`
- Routing failures:
  - Unknown tenant assignment: `503` with retriable hint
  - No healthy upstream endpoints: `503`
- Upstream errors:
  - Preserve upstream error category when safe; normalize into a consistent error envelope.
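
The status codes above could be centralized in one table so every handler emits the same envelope. The error code strings and the envelope fields are assumptions, since the PRD leaves the envelope shape open; marking "no healthy upstream" as retriable is also an assumption.

```python
# error code -> (HTTP status, retriable hint)
ERROR_TABLE = {
    "unauthenticated": (401, False),
    "forbidden": (403, False),
    "tenant_missing": (400, False),
    "tenant_invalid": (400, False),
    "tenant_forbidden": (403, False),
    "tenant_unassigned": (503, True),
    "no_healthy_upstream": (503, True),  # retriable hint assumed here
}


def error_response(code, request_id, message=""):
    """Build (status, body) for a Gateway error using one envelope shape."""
    status, retriable = ERROR_TABLE[code]
    body = {
        "error": {
            "code": code,
            "message": message,
            "retriable": retriable,
            "request_id": request_id,  # lets clients quote it in support requests
        }
    }
    return status, body
```

Upstream errors would be translated into one of these codes, or a small set of additional ones, before being returned to the client.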

---

## **Rollout Plan**

Phase 1 (Minimum viable ingress)
- Implement tenant-aware routing for Aggregate command submission.
- Implement gRPC `SubmitCommand` compatible with Runner.
- Add HTTP wrapper for command submission.
- Introduce basic authn/authz (service identity + a minimal RBAC model).

Phase 2 (Read path + OIDC)
- Add query API and route to a Projection query endpoint (Projection must first expose its query engine over HTTP or gRPC).
- Add Google OIDC login and account linking.
- Harden RBAC and permissions by resource type (`aggregate_type`, `view_type`).

Phase 3 (Operations + topology)
- NATS KV routing config watcher (hot reload).
- Admin APIs for routing inspection and controlled updates.
- Health-aware routing and per-tenant rate limits.
- Introduce placement maps per service kind (independent scaling).
- Introduce a rebalancer workflow (manual first) to move tenant placements safely.

---

## **Gaps / Opportunities**

- **Tenant lifecycle APIs**: tenant creation, tenant metadata, domain verification, invite flows, default roles, and bootstrap of the first tenant admin.
- **API conventions**: standard error envelope, pagination/cursors, request IDs, idempotency semantics for command submission retries.
- **Identity hardening**: password policy, breached-password checks, device/session management, step-up authentication rules, and admin break-glass procedures.
- **SSO / enterprise**: SCIM provisioning and additional OIDC/SAML providers as a future track.
- **Audit & compliance**: immutable audit log schema, export/retention policies, and per-tenant data access trails.
- **Rebalancer safety**: explicit two-phase cutover semantics (warmup readiness gates + drain completion signals) with operator-visible status.