# Tenant Subscriptions Plan (1 Tenant = 1 Subscription)

## Principles

- Tenant-based billing is built-in and enforced consistently:
  - Exactly one “primary” subscription per tenant.
  - Subscription state is authoritative for entitlements.
- Provider-agnostic core with a single “billing provider” adapter:
  - Stripe or Polar can be plugged in without rewriting the rest of the platform.
- Tasks are prioritized by ordering:
  - Within each milestone, tasks are listed top-to-bottom in priority order.
- Each milestone is stop-the-line gated:
  - All tasks completed
  - All milestone tests pass
  - Workspace verification commands pass
- Webhooks are treated as untrusted input:
  - Verified signatures
  - Idempotent processing
- No secrets are ever committed or logged
- Fluent development progression:
  - Start with local-only, file-backed state + mocked provider
  - Add real provider sandbox integration behind env-gated tests
  - Add UI self-service once the state machine is stable
  - Enforce entitlements only after billing state is reliable

## Goals

- Allow a tenant admin to self-serve billing:
  - Start a subscription (checkout)
  - Manage subscription and payment method (customer portal)
  - View current plan and billing status
- Support Stripe or Polar as the billing backend.
- Provide a strict, test-gated integration that is safe to deploy incrementally.
- Keep API routes consistent with existing Control API conventions:
  - Tenant-scoped routes are under `/admin/v1/tenants/{tenant_id}/...` and require auth + tenant header.
  - Provider webhooks are unauthenticated but signature-verified.

## Non-Goals (Initial)

- Multiple subscriptions per tenant.
- Per-seat billing.
- Multiple concurrent plans per tenant.
- Usage-based metered billing (can be added later as a separate plan).

## Definitions

### Tenant

A logical customer boundary identified by `tenant_id` (UUID) and carried via the tenant header already used by Control API endpoints.

### Tenant Admin (Actor)

An authenticated principal with permission to manage billing for a tenant:

- Read: requires `control:read`
- Mutate (checkout/portal): requires `control:write`

### Subscription

The provider subscription object mapped 1:1 to a tenant, with a local cached state:

- `status`: `trialing | active | past_due | paused | canceled | incomplete`
- `plan`: internal plan identifier (maps to provider price/product)
- `current_period_end` / `cancel_at_period_end`

### Entitlements

An internal set of feature gates derived from the subscription plan and status:

- Examples: max deployments, max runners, S3 docs enabled, support tier, etc.

### Billing Provider

An adapter that supplies:

- Checkout session creation
- Portal session creation
- Webhook event verification + parsing
- Optional reconciliation reads (fetch subscription/customer state)

## Configuration Contract (Control API)

### Common Settings

- `CONTROL_BILLING_PROVIDER` = `stripe | polar`
- `CONTROL_BILLING_STATE_PATH` (default `billing/dev.json`)
- `CONTROL_BILLING_SELF_URL` (default `CONTROL_SELF_URL`, used for return URLs)
- `CONTROL_BILLING_ENFORCEMENT` = `0 | 1` (default `0`, gates entitlement enforcement)
- `CONTROL_BILLING_WEBHOOK_PUBLIC_URL` (optional; if unset, derive from `CONTROL_BILLING_SELF_URL`)
- `CONTROL_BILLING_ALLOWED_RETURN_ORIGINS` (comma-separated; optional safety check for return URLs)

### Stripe Settings (if provider = stripe)

- `CONTROL_STRIPE_SECRET_KEY` (secret)
- `CONTROL_STRIPE_WEBHOOK_SECRET` (secret)
- `CONTROL_STRIPE_PRICE_ID_` (e.g.
  `CONTROL_STRIPE_PRICE_ID_PRO`, env mapping per plan)
- Optional:
  - `CONTROL_STRIPE_CUSTOMER_PORTAL_CONFIGURATION_ID`

### Polar Settings (if provider = polar)

- `CONTROL_POLAR_ACCESS_TOKEN` (secret)
- `CONTROL_POLAR_WEBHOOK_SECRET` (secret, if Polar provides webhook signing secret)
- `CONTROL_POLAR_PRODUCT_ID_` or equivalent plan mapping

## Data Model (MVP: File-Backed, Tenant-Scoped)

Persist subscription mappings in a JSON file, similar to `PlacementStore`’s atomic write pattern, to support:

- Local development without requiring a database
- Deterministic integration tests
- Simple operational inspection

*Note: For production, this should eventually adopt the `ConfigRegistry` pattern (e.g. backed by NATS KV) to avoid reliance on persistent file storage in Docker Swarm.*

Suggested persisted structure:

- `BillingStateFile`:
  - `revision` (uuid-based)
  - `tenants: { : TenantBillingState }`
- `TenantBillingState`:
  - `provider: stripe | polar`
  - `provider_customer_id`
  - `provider_subscription_id`
  - `provider_checkout_session_id` (last initiated; optional)
  - `status`
  - `plan`
  - `current_period_end`
  - `cancel_at_period_end`
  - `processed_webhook_event_ids` (bounded set; for idempotency)
  - `updated_at`

Idempotency constraints:

- Webhook event IDs are stored per tenant, capped to a fixed size (e.g. last 256 IDs) to prevent unbounded growth.
- Updates are monotonic:
  - prefer provider event timestamps to ignore out-of-order “older” state transitions.
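
The two idempotency constraints above can be sketched in a few lines of Rust. This is a minimal illustration, not the actual `BillingStore` code: the `ApplyOutcome` enum, the `last_event_at` field, the `apply_event` signature, and the use of a plain string for `status` are all assumptions made for the example, and the cap of 256 is taken from the "last 256 IDs" suggestion.

```rust
use std::collections::{HashSet, VecDeque};

/// Cap on remembered webhook event IDs (assumed from the "last 256 IDs" example).
const MAX_PROCESSED_EVENT_IDS: usize = 256;

/// Hypothetical result of offering one webhook event to a tenant's cached state.
#[derive(Debug, PartialEq, Eq)]
enum ApplyOutcome {
    Applied,
    Duplicate,
    Stale,
}

/// Simplified stand-in for the persisted per-tenant state.
#[derive(Debug, Default)]
struct TenantBillingState {
    status: String,
    /// Provider timestamp (Unix seconds) of the newest applied event (assumed field).
    last_event_at: u64,
    /// Event IDs in insertion order, so the oldest can be evicted first.
    processed_order: VecDeque<String>,
    /// The same IDs, kept for constant-time membership checks.
    processed_ids: HashSet<String>,
}

impl TenantBillingState {
    fn apply_event(&mut self, event_id: &str, event_at: u64, status: &str) -> ApplyOutcome {
        // Idempotency: an event ID that was already seen is never applied again.
        if self.processed_ids.contains(event_id) {
            return ApplyOutcome::Duplicate;
        }
        self.remember(event_id);
        // Monotonic updates: an event older than the newest applied one is ignored.
        if event_at < self.last_event_at {
            return ApplyOutcome::Stale;
        }
        self.status = status.to_string();
        self.last_event_at = event_at;
        ApplyOutcome::Applied
    }

    /// Records an event ID, evicting the oldest one once the cap is reached.
    fn remember(&mut self, event_id: &str) {
        if self.processed_order.len() == MAX_PROCESSED_EVENT_IDS {
            if let Some(oldest) = self.processed_order.pop_front() {
                self.processed_ids.remove(&oldest);
            }
        }
        self.processed_order.push_back(event_id.to_string());
        self.processed_ids.insert(event_id.to_string());
    }
}
```

In this sketch a stale event is still remembered, so a redelivery of it is reported as a duplicate rather than re-evaluated. Whether stale events should count against the bounded set is a design choice the plan leaves open.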

## Target Architecture

### Control API (Rust)

- New billing routes:
  - `GET /admin/v1/tenants/{tenant_id}/billing` (read current billing + entitlements)
  - `POST /admin/v1/tenants/{tenant_id}/billing/checkout` (create checkout session URL)
  - `POST /admin/v1/tenants/{tenant_id}/billing/portal` (create portal session URL)
  - `POST /billing/v1/webhooks/{provider}` (provider webhook ingress; does not require auth)
- Billing policy enforcement:
  - Entitlements derived server-side
  - Per-endpoint enforcement can be introduced gradually behind a feature flag

### Control UI (Vite + React)

- New “Billing” page scoped to a tenant:
  - Current plan + status
  - “Upgrade / Subscribe” (checkout)
  - “Manage billing” (portal)
  - Clear error states when billing is not configured

## Provider Contract (Adapter Surface)

Define a small provider interface so the platform remains stable even if switching providers:

- `create_checkout_session(tenant_id, plan, return_url) -> url`
- `create_portal_session(tenant_id, return_url) -> url`
- `verify_and_parse_webhook(headers, body) -> BillingEvent`
- `apply_event(event) -> TenantBillingState mutation`
- Optional: `reconcile(tenant_id) -> TenantBillingState` (periodic correction)

Provider mapping requirements:

- Persist tenant identity at the provider level:
  - Prefer setting `tenant_id` as provider customer metadata.
  - If customer metadata is not available, store an internal mapping from `provider_customer_id -> tenant_id`.
- Ensure subscription creation is single-flight per tenant:
  - Prevent duplicate active subscriptions by checking local state before creating new sessions.
  - Use provider idempotency keys where supported (or internal idempotency per tenant+plan).

## Security & Abuse Controls

- AuthZ:
  - Tenant routes require the existing tenant header to match the path tenant ID.
  - `control:read` required for viewing billing status.
  - `control:write` required for checkout and portal actions.
- Return URL safety:
  - Only allow return URLs whose origin is in `CONTROL_BILLING_ALLOWED_RETURN_ORIGINS`.
  - Default return URL points to Control UI, derived from `CONTROL_BILLING_SELF_URL`.
- Webhook safety & observability:
  - Verify signatures before parsing payloads.
  - Enforce JSON size limits on webhook bodies.
  - Always return `2xx` for already-processed events (idempotency).
  - Never log full webhook payloads.
  - Propagate provider event IDs as `x-correlation-id` in logs and spans to integrate seamlessly with the platform's VictoriaMetrics/Loki/Tempo observability stack (as standard in `DEVELOPMENT_PLAN.md`).

## API Contract (MVP)

### GET /admin/v1/tenants/{tenant_id}/billing

Returns a stable shape whether billing is configured or not:

- `configured: bool`
- `provider: stripe | polar | null`
- `plan: string | null`
- `status: string | null`
- `current_period_end: string | null`
- `cancel_at_period_end: bool | null`
- `entitlements: { ... }`

### POST /admin/v1/tenants/{tenant_id}/billing/checkout

Request:

- `plan: string`
- `return_path: string` (optional; appended to `CONTROL_BILLING_SELF_URL`)

Response:

- `url: string`

### POST /admin/v1/tenants/{tenant_id}/billing/portal

Request:

- `return_path: string` (optional)

Response:

- `url: string`

### POST /billing/v1/webhooks/{provider}

Provider-defined payload; must:

- verify signature
- map to internal events
- update local billing state atomically

## Development Plan (Milestones by Dependency)

## Milestone 0: Billing Domain + Storage + Read API

### Dependencies

- None

### Goal

Ship a provider-agnostic billing domain model and a safe persistence mechanism without contacting Stripe/Polar yet.
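
To make the domain model and the stable read shape of `GET /admin/v1/tenants/{tenant_id}/billing` more concrete, here is a rough Rust sketch. It is illustrative only: the `BillingView` struct, the `derive_entitlements` and `billing_view` functions, the `"pro"` plan name, and the numeric limits are assumptions for the example, and JSON serialization is left out to keep it dependency-free.

```rust
/// Internal subscription status, mirroring the values listed under Definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubscriptionStatus {
    Trialing,
    Active,
    PastDue,
    Paused,
    Canceled,
    Incomplete,
}

/// Example feature gates; the fields and their limits are hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entitlements {
    max_deployments: u32,
    s3_docs_enabled: bool,
}

/// Simplified stand-in for the persisted per-tenant record.
#[derive(Debug, Clone)]
struct TenantBillingState {
    provider: String,
    plan: String,
    status: SubscriptionStatus,
    current_period_end: String,
    cancel_at_period_end: bool,
}

/// Response shape: every field is always present, nullable ones are `Option`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct BillingView {
    configured: bool,
    provider: Option<String>,
    plan: Option<String>,
    status: Option<SubscriptionStatus>,
    current_period_end: Option<String>,
    cancel_at_period_end: Option<bool>,
    entitlements: Entitlements,
}

/// Derives entitlements from `status + plan`; anything unpaid gets the baseline.
fn derive_entitlements(plan: Option<&str>, status: Option<SubscriptionStatus>) -> Entitlements {
    use SubscriptionStatus::*;
    let in_good_standing = matches!(status, Some(Trialing | Active));
    match (plan, in_good_standing) {
        (Some("pro"), true) => Entitlements { max_deployments: 20, s3_docs_enabled: true },
        _ => Entitlements { max_deployments: 1, s3_docs_enabled: false },
    }
}

/// Builds the same shape whether or not a subscription record exists.
fn billing_view(state: Option<&TenantBillingState>) -> BillingView {
    let plan = state.map(|s| s.plan.as_str());
    let status = state.map(|s| s.status);
    BillingView {
        configured: state.is_some(),
        provider: state.map(|s| s.provider.clone()),
        plan: plan.map(str::to_string),
        status,
        current_period_end: state.map(|s| s.current_period_end.clone()),
        cancel_at_period_end: state.map(|s| s.cancel_at_period_end),
        entitlements: derive_entitlements(plan, status),
    }
}
```

Here `configured` simply reflects whether a record exists for the tenant. A real handler would likely also account for the provider being disabled entirely.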

### Tasks

- [x] Add billing domain types in Control API:
  - [x] `Plan`, `SubscriptionStatus`, `Entitlements`
  - [x] provider-agnostic `BillingEvent` enum for webhook mapping
- [x] Add `BillingStore` patterned after `PlacementStore`/`ConfigRegistry`:
  - [x] atomic write (tmp + rename) for dev file fallback
  - [x] in-process locking
  - [x] stable JSON schema + `revision`
- [x] Add `GET /admin/v1/tenants/{tenant_id}/billing`:
  - [x] permission gate: requires `control:read`
  - [x] tenant header enforcement consistent with existing routes
  - [x] returns “not configured” when no subscription exists
- [x] Add a mock billing provider for tests:
  - [x] deterministic checkout/portal URLs
  - [x] deterministic webhook events without real signatures

### Required Tests (Gate)

- [x] Workspace verification commands
- [x] Unit tests (Control API):
  - [x] billing state read/write roundtrip (atomic update)
  - [x] entitlement derivation from `status + plan`
  - [x] tenant isolation checks for billing routes (header vs path mismatch)
  - [x] permission gates: `control:read` vs `control:write`

## Milestone 1: Checkout Flow (Create Subscription)

### Dependencies

- Milestone 0

### Goal

Allow tenant admins to initiate a subscription via the provider’s hosted checkout.

### Tasks

- [x] Add provider configuration parsing and validation:
  - [x] strict env parsing with actionable errors
  - [x] plan-to-price/product mapping via env
- [x] Add `POST /admin/v1/tenants/{tenant_id}/billing/checkout`:
  - [x] permission gate: requires `control:write`
  - [x] create or reuse provider customer for the tenant
  - [x] create checkout session and return redirect URL
  - [x] include tenant identifier in provider metadata (for webhook routing)
  - [x] internal idempotency: do not create a new checkout if tenant already has an active/trialing subscription
- [x] Define return URL contract:
  - [x] checkout success/cancel landing routes in Control UI
  - [x] validate `return_path` against `CONTROL_BILLING_ALLOWED_RETURN_ORIGINS`

### Required Tests (Gate)

- [x] Workspace verification commands
- [x] Unit tests (Control API):
  - [x] config validation (missing keys, invalid mapping)
  - [x] provider request construction (return URLs, metadata)
  - [x] checkout idempotency rules per tenant
- [x] Env-gated integration tests (sandbox; auto-skip unless env vars are set):
  - [x] `CONTROL_TEST_STRIPE=1` or `CONTROL_TEST_POLAR=1` starts checkout and returns a valid URL
  - [x] tenant metadata roundtrips through the provider (where supported)

## Milestone 2: Webhook Ingestion + Subscription State Sync

### Dependencies

- Milestone 1

### Goal

Make subscription state reliable and idempotent by processing provider webhooks.

### Tasks

- [x] Add `POST /billing/v1/webhooks/{provider}` endpoint:
  - [x] signature verification
  - [x] event parsing to `BillingEvent`
  - [x] idempotency by provider event ID
  - [x] tenant mapping via provider metadata or stored `provider_customer_id`
- [x] Map provider statuses to internal `SubscriptionStatus`:
  - [x] `trialing`, `active`, `past_due`, `canceled`, etc.
- [x] Store updates in `BillingStore` and expose via `GET /admin/v1/tenants/{tenant_id}/billing`
  - [x] ensure updates are monotonic (ignore older provider event timestamps)

### Required Tests (Gate)

- [x] Workspace verification commands
- [x] Unit tests (Control API):
  - [x] webhook signature verification (good/bad signatures)
  - [x] idempotency behavior (same event twice does not double-apply)
  - [x] status mapping tables are stable
  - [x] out-of-order events do not regress state
- [x] Docker/local integration (optional, if a provider CLI is used; env-gated):
  - [x] `CONTROL_TEST_STRIPE_CLI=1` runs a local webhook-forward flow and verifies state update

## Milestone 3: Customer Portal (Self-Management)

### Dependencies

- Milestone 2

### Goal

Provide a “Manage billing” path for tenants to self-serve changes without operator involvement.

### Tasks

- [x] Add `POST /admin/v1/tenants/{tenant_id}/billing/portal`:
  - [x] create provider portal session and return URL
  - [x] ensure tenant ownership checks (header vs path)
  - [x] permission gate: requires `control:write`
- [ ] Add Control UI billing page:
  - [ ] show plan/status + renewal date
  - [ ] “Subscribe / Upgrade” and “Manage billing” actions
  - [ ] show “Billing not configured” when provider is disabled

### Required Tests (Gate)

- [x] Workspace verification commands
- [ ] UI unit tests (Vitest):
  - [ ] billing page renders from mocked API state
  - [ ] action buttons call the expected API endpoints
- [x] Env-gated integration tests:
  - [x] portal session URL is generated and is HTTPS

## Milestone 4: Entitlements + Enforcement (Controlled Rollout)

### Dependencies

- Milestone 2 (Milestone 3 recommended for admin UX)

### Goal

Gate selected platform capabilities by tenant subscription state while maintaining a safe rollout path.
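
One possible shape for this gate, combining the `CONTROL_BILLING_ENFORCEMENT` flag with a grace period for `past_due`, is sketched below. The `Decision` enum, the function names, the `past_due_since` input, and the seven-day grace window are assumptions made for illustration; the plan does not fix any of them.

```rust
/// Internal subscription status, mirroring the values listed under Definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubscriptionStatus {
    Trialing,
    Active,
    PastDue,
    Paused,
    Canceled,
    Incomplete,
}

/// Hypothetical result of an entitlement check for a gated write operation.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    Deny,
}

/// Assumed grace window for `past_due` tenants (seven days, in seconds).
const PAST_DUE_GRACE_SECS: u64 = 7 * 24 * 60 * 60;

/// Interprets the raw `CONTROL_BILLING_ENFORCEMENT` value; unset or `0` means off.
fn enforcement_enabled(raw: Option<&str>) -> bool {
    raw == Some("1")
}

/// Decides whether a tenant may perform the gated operation at time `now`.
/// `past_due_since` is an assumed Unix timestamp of when the tenant became past due.
fn check_write_access(
    enforce: bool,
    status: Option<SubscriptionStatus>,
    past_due_since: Option<u64>,
    now: u64,
) -> Decision {
    // Rollout safety: with the flag off, nothing is ever denied.
    if !enforce {
        return Decision::Allow;
    }
    match status {
        Some(SubscriptionStatus::Trialing | SubscriptionStatus::Active) => Decision::Allow,
        // Past-due tenants keep access until the grace window has elapsed.
        Some(SubscriptionStatus::PastDue) => match past_due_since {
            Some(since) if now.saturating_sub(since) <= PAST_DUE_GRACE_SECS => Decision::Allow,
            _ => Decision::Deny,
        },
        // No subscription, or paused/canceled/incomplete: deny the gated write.
        _ => Decision::Deny,
    }
}
```

Keeping the decision a pure function of flag, status, and time makes the grace-period rules easy to unit test and leaves the mapping from `Deny` to an HTTP status and audit log entry to the calling middleware.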

### Tasks

- [x] Define initial entitlement set and defaults:
  - [x] choose “free/trial” behavior (read-only vs limited capability)
  - [x] define grace period behavior for `past_due`
- [x] Add enforcement points in Control API:
  - [x] middleware/helper to require entitlement per route
  - [x] first enforcement target: a low-risk, tenant-scoped “write” capability
  - [x] feature flag to disable enforcement globally during rollout
- [x] Add audit log entries for billing enforcement denials (no PII, no secrets)

### Required Tests (Gate)

- [x] Workspace verification commands
- [x] Unit tests (Control API):
  - [x] entitlement checks per route return correct HTTP status
  - [x] grace period handling
- [x] Integration tests:
  - [x] a tenant without active subscription cannot perform the gated operation
  - [x] an active tenant can perform the same operation

## Milestone 5: Reconciliation + Operational Hardening

### Dependencies

- Milestone 2

### Goal

Make billing state resilient against missed webhooks and operational drift.

### Tasks

- [x] Add a reconciliation job:
  - [x] periodically fetch subscription state from provider for tenants
  - [x] correct local state and emit audit entries
- [x] Add metrics:
  - [x] webhook processing latency, verification failures, idempotency hits
  - [x] tenant count by subscription status
- [x] Add robust error handling:
  - [x] structured errors with safe messages
  - [x] no provider payloads logged verbatim
- [x] Add provider API timeout/retry policy:
  - [x] short timeouts with bounded retries
  - [x] no retries on webhook signature failures

### Required Tests (Gate)

- [x] Workspace verification commands
- [x] Unit tests:
  - [x] reconciliation updates state correctly
  - [x] provider errors do not corrupt local state

## Milestone 6: Production Rollout

### Dependencies

- Milestone 3 (recommended), Milestone 4 (if enforcing)

### Goal

Deploy billing in production with safe secret handling and verifiable smoke checks.

### Tasks

- [x] Provision provider configuration (operator):
  - [x] create products/prices (Stripe) or products/plans (Polar)
  - [x] configure webhook endpoint + secret
  - [x] set up customer portal settings (Stripe) if used
- [x] Configure Swarm secrets and stack env:
  - [x] provider API keys and webhook secret stored as Swarm secrets
  - [x] `CONTROL_BILLING_PROVIDER`, `CONTROL_BILLING_STATE_PATH`
  - [x] `CONTROL_BILLING_ALLOWED_RETURN_ORIGINS` set to production UI origins
- [x] Define rollback plan:
  - [x] disable enforcement feature flag
  - [x] keep billing read-only operational

### Required Tests (Gate)

- [x] Workspace verification commands
- [x] Production smoke (env-gated):
  - [x] create checkout session for a test tenant
  - [x] process a webhook event and verify tenant state updates
  - [x] generate a portal session URL

## Workspace Verification Commands

- `cargo fmt --check`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo test --workspace`
- `cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build`