Tenant Subscriptions Plan (1 Tenant = 1 Subscription)
Principles
- Tenant-based billing is built-in and enforced consistently:
- Exactly one “primary” subscription per tenant.
- Subscription state is authoritative for entitlements.
- Provider-agnostic core with a single “billing provider” adapter:
- Stripe or Polar can be plugged in without rewriting the rest of the platform.
- Tasks are prioritized by ordering:
- Within each milestone, tasks are listed top-to-bottom in priority order.
- Each milestone is stop-the-line gated:
- All tasks completed
- All milestone tests pass
- Workspace verification commands pass
- Webhooks are treated as untrusted input:
- Verified signatures
- Idempotent processing
- No secrets are ever committed or logged
- Incremental development progression:
- Start with local-only, file-backed state + mocked provider
- Add real provider sandbox integration behind env-gated tests
- Add UI self-service once the state machine is stable
- Enforce entitlements only after billing state is reliable
Goals
- Allow a tenant admin to self-serve billing:
- Start a subscription (checkout)
- Manage subscription and payment method (customer portal)
- View current plan and billing status
- Support Stripe or Polar as the billing backend.
- Provide a strict, test-gated integration that is safe to deploy incrementally.
- Keep API routes consistent with existing Control API conventions:
- Tenant-scoped routes are under /admin/v1/tenants/{tenant_id}/... and require auth + tenant header.
- Provider webhooks are unauthenticated but signature-verified.
Non-Goals (Initial)
- Multiple subscriptions per tenant.
- Per-seat billing.
- Multiple concurrent plans per tenant.
- Usage-based metered billing (can be added later as a separate plan).
Definitions
Tenant
A logical customer boundary identified by tenant_id (UUID) and carried via the tenant header already used by Control API endpoints.
Tenant Admin (Actor)
An authenticated principal with permission to manage billing for a tenant:
- Read: requires control:read
- Mutate (checkout/portal): requires control:write
Subscription
The provider subscription object mapped 1:1 to a tenant, with a local cached state:
- status: trialing | active | past_due | paused | canceled | incomplete
- plan: internal plan identifier (maps to provider price/product)
- current_period_end / cancel_at_period_end
Entitlements
An internal set of feature gates derived from the subscription plan and status:
- Examples: max deployments, max runners, S3 docs enabled, support tier, etc.
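As a rough illustration of how entitlements might be derived from plan and status, here is a minimal sketch. The plan tiers (Free, Pro), the limit values, and the function name derive_entitlements are all assumptions made for this example; the plan itself does not fix concrete tiers or numbers. Grace handling for past_due is left out here and belongs to the enforcement milestone.

```rust
// Hypothetical plan tiers and limits; the plan does not define concrete values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Plan {
    Free,
    Pro,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubscriptionStatus {
    Trialing,
    Active,
    PastDue,
    Paused,
    Canceled,
    Incomplete,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entitlements {
    pub max_deployments: u32,
    pub max_runners: u32,
    pub s3_docs_enabled: bool,
}

/// Derive feature gates from plan + status. Only trialing or active
/// subscriptions unlock the plan's limits; every other status falls back
/// to the free baseline (an assumption of this sketch).
pub fn derive_entitlements(plan: Plan, status: SubscriptionStatus) -> Entitlements {
    let free = Entitlements {
        max_deployments: 1,
        max_runners: 1,
        s3_docs_enabled: false,
    };
    let entitled = matches!(
        status,
        SubscriptionStatus::Trialing | SubscriptionStatus::Active
    );
    match (plan, entitled) {
        (Plan::Pro, true) => Entitlements {
            max_deployments: 20,
            max_runners: 10,
            s3_docs_enabled: true,
        },
        _ => free,
    }
}
```

Keeping the derivation a pure function of (plan, status) makes the "entitlement derivation" unit test in Milestone 0 a simple table check.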
Billing Provider
An adapter that supplies:
- Checkout session creation
- Portal session creation
- Webhook event verification + parsing
- Optional reconciliation reads (fetch subscription/customer state)
Configuration Contract (Control API)
Common Settings
- CONTROL_BILLING_PROVIDER=stripe | polar
- CONTROL_BILLING_STATE_PATH (default billing/dev.json)
- CONTROL_BILLING_SELF_URL (default CONTROL_SELF_URL, used for return URLs)
- CONTROL_BILLING_ENFORCEMENT=0 | 1 (default 0, gates entitlement enforcement)
- CONTROL_BILLING_WEBHOOK_PUBLIC_URL (optional; if unset, derive from CONTROL_BILLING_SELF_URL)
- CONTROL_BILLING_ALLOWED_RETURN_ORIGINS (comma-separated; optional safety check for return URLs)
Stripe Settings (if provider = stripe)
- CONTROL_STRIPE_SECRET_KEY (secret)
- CONTROL_STRIPE_WEBHOOK_SECRET (secret)
- CONTROL_STRIPE_PRICE_ID_<PLAN> (e.g. CONTROL_STRIPE_PRICE_ID_PRO, env mapping per plan)
- Optional: CONTROL_STRIPE_CUSTOMER_PORTAL_CONFIGURATION_ID
Polar Settings (if provider = polar)
- CONTROL_POLAR_ACCESS_TOKEN (secret)
- CONTROL_POLAR_WEBHOOK_SECRET (secret, if Polar provides a webhook signing secret)
- CONTROL_POLAR_PRODUCT_ID_<PLAN> or equivalent plan mapping
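The settings above could be validated at startup roughly as follows. This is only a sketch: the type names BillingConfig and Provider, the function parse_billing_config, and the choice to pass the environment as a map (so tests stay deterministic) are assumptions, not part of the plan. Per-plan price/product mappings are omitted for brevity.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Provider {
    Stripe,
    Polar,
}

#[derive(Debug, PartialEq, Eq)]
pub struct BillingConfig {
    pub provider: Provider,
    pub state_path: String,
    pub enforcement: bool,
    pub allowed_return_origins: Vec<String>,
}

/// Parse billing settings from an env-like map. Errors name the offending
/// key but never echo a secret value.
pub fn parse_billing_config(env: &HashMap<String, String>) -> Result<BillingConfig, String> {
    let provider = match env.get("CONTROL_BILLING_PROVIDER").map(String::as_str) {
        Some("stripe") => Provider::Stripe,
        Some("polar") => Provider::Polar,
        Some(other) => {
            return Err(format!(
                "CONTROL_BILLING_PROVIDER must be 'stripe' or 'polar', got '{other}'"
            ))
        }
        None => return Err("CONTROL_BILLING_PROVIDER is not set".to_string()),
    };
    // Required secrets per provider. The Polar webhook secret is treated as
    // optional here because the plan marks it as conditional.
    let required: Vec<&str> = match provider {
        Provider::Stripe => vec!["CONTROL_STRIPE_SECRET_KEY", "CONTROL_STRIPE_WEBHOOK_SECRET"],
        Provider::Polar => vec!["CONTROL_POLAR_ACCESS_TOKEN"],
    };
    for key in required {
        if env.get(key).map_or(true, |v| v.is_empty()) {
            return Err(format!("{key} is required when provider is {provider:?}"));
        }
    }
    let enforcement = match env.get("CONTROL_BILLING_ENFORCEMENT").map(String::as_str) {
        None | Some("0") => false,
        Some("1") => true,
        Some(other) => {
            return Err(format!(
                "CONTROL_BILLING_ENFORCEMENT must be 0 or 1, got '{other}'"
            ))
        }
    };
    let allowed_return_origins: Vec<String> = env
        .get("CONTROL_BILLING_ALLOWED_RETURN_ORIGINS")
        .map(|v| {
            v.split(',')
                .map(|s| s.trim().to_string())
                .filter(|s| !s.is_empty())
                .collect()
        })
        .unwrap_or_default();
    Ok(BillingConfig {
        provider,
        state_path: env
            .get("CONTROL_BILLING_STATE_PATH")
            .cloned()
            .unwrap_or_else(|| "billing/dev.json".to_string()),
        enforcement,
        allowed_return_origins,
    })
}
```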
Data Model (MVP: File-Backed, Tenant-Scoped)
Persist subscription mappings in a JSON file, similar to PlacementStore’s atomic write pattern, to support:
- Local development without requiring a database
- Deterministic integration tests
- Simple operational inspection
Note: For production, this should eventually adopt the ConfigRegistry pattern (e.g. backed by NATS KV) to avoid reliance on persistent file storage in Docker Swarm.
Suggested persisted structure:
BillingStateFile:
- revision (uuid-based)
- tenants: { <tenant_id>: TenantBillingState }
TenantBillingState:
- provider: stripe | polar
- provider_customer_id
- provider_subscription_id
- provider_checkout_session_id (last initiated; optional)
- status
- plan
- current_period_end
- cancel_at_period_end
- processed_webhook_event_ids (bounded set; for idempotency)
- updated_at
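The atomic write pattern referred to in this section (temp file + rename) might look like the sketch below. The function name write_atomic and the ".json.tmp" temp suffix are assumptions for illustration; PlacementStore's actual implementation is not shown in this plan and may differ.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Write `contents` to `path` atomically: write a sibling temp file, flush it
/// to disk, then rename it over the target. On POSIX filesystems a rename
/// within one filesystem is atomic, so readers see either the old file or the
/// new one, never a partial write.
pub fn write_atomic(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    if let Some(parent) = path.parent() {
        if !parent.as_os_str().is_empty() {
            fs::create_dir_all(parent)?;
        }
    }
    // Hypothetical temp naming: "dev.json" becomes "dev.json.tmp".
    let tmp = path.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents)?;
        file.sync_all()?; // make sure bytes hit disk before the rename
    }
    fs::rename(&tmp, path)
}
```

This covers only the file swap; the in-process locking needed to serialize concurrent writers would sit around the call.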
Idempotency constraints:
- Webhook event IDs are stored per tenant, capped to a fixed size (e.g. last 256 IDs) to prevent unbounded growth.
- Updates are monotonic:
- prefer provider event timestamps to ignore out-of-order “older” state transitions.
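The two constraints above (a bounded set of processed event IDs and monotonic updates) can be sketched together. The field last_event_ts, the ApplyOutcome enum, and the apply method are hypothetical names introduced for this example; status is kept as a plain string to keep the sketch short.

```rust
use std::collections::VecDeque;

/// Cap taken from the "last 256 IDs" example in the plan.
const MAX_EVENT_IDS: usize = 256;

#[derive(Debug, Default)]
pub struct TenantBillingState {
    pub status: String,
    /// Timestamp of the newest provider event applied so far (hypothetical field).
    pub last_event_ts: u64,
    /// Oldest IDs sit at the front and are evicted first.
    pub processed_webhook_event_ids: VecDeque<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyOutcome {
    Applied,
    Duplicate,
    Stale,
}

impl TenantBillingState {
    pub fn apply(&mut self, event_id: &str, event_ts: u64, new_status: &str) -> ApplyOutcome {
        // Idempotency: an already-seen event ID is a no-op.
        if self.processed_webhook_event_ids.iter().any(|id| id == event_id) {
            return ApplyOutcome::Duplicate;
        }
        // Record the ID even if the event turns out to be stale, so provider
        // retries of the same event short-circuit above.
        self.processed_webhook_event_ids.push_back(event_id.to_string());
        while self.processed_webhook_event_ids.len() > MAX_EVENT_IDS {
            self.processed_webhook_event_ids.pop_front();
        }
        // Monotonicity: ignore events older than the last applied one.
        if event_ts < self.last_event_ts {
            return ApplyOutcome::Stale;
        }
        self.status = new_status.to_string();
        self.last_event_ts = event_ts;
        ApplyOutcome::Applied
    }
}
```

A linear scan over at most 256 IDs is cheap; a webhook handler would map both Duplicate and Stale to a 2xx response so the provider stops retrying.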
Target Architecture
Control API (Rust)
- New billing routes:
- GET /admin/v1/tenants/{tenant_id}/billing (read current billing + entitlements)
- POST /admin/v1/tenants/{tenant_id}/billing/checkout (create checkout session URL)
- POST /admin/v1/tenants/{tenant_id}/billing/portal (create portal session URL)
- POST /billing/v1/webhooks/{provider} (provider webhook ingress; does not require auth)
- Billing policy enforcement:
- Entitlements derived server-side
- Per-endpoint enforcement can be introduced gradually behind a feature flag
Control UI (Vite + React)
- New “Billing” page scoped to a tenant:
- Current plan + status
- “Upgrade / Subscribe” (checkout)
- “Manage billing” (portal)
- Clear error states when billing is not configured
Provider Contract (Adapter Surface)
Define a small provider interface so the platform remains stable even if switching providers:
- create_checkout_session(tenant_id, plan, return_url) -> url
- create_portal_session(tenant_id, return_url) -> url
- verify_and_parse_webhook(headers, body) -> BillingEvent
- apply_event(event) -> TenantBillingState mutation
- Optional: reconcile(tenant_id) -> TenantBillingState (periodic correction)
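One way to express this adapter surface is a small trait plus the mock provider planned for Milestone 0. This is a sketch under assumptions: the error type ProviderError, the single BillingEvent variant, the mock's "x-mock-signature" header, and its "tenant:status" body format are all invented for illustration. apply_event and reconcile are left out to keep the example short.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BillingEvent {
    SubscriptionUpdated { tenant_id: String, status: String },
}

#[derive(Debug, PartialEq, Eq)]
pub enum ProviderError {
    InvalidSignature,
    Malformed,
}

/// Provider-agnostic surface; Stripe and Polar adapters would each implement it.
pub trait BillingProvider {
    fn create_checkout_session(
        &self,
        tenant_id: &str,
        plan: &str,
        return_url: &str,
    ) -> Result<String, ProviderError>;
    fn create_portal_session(&self, tenant_id: &str, return_url: &str)
        -> Result<String, ProviderError>;
    fn verify_and_parse_webhook(
        &self,
        headers: &[(String, String)],
        body: &[u8],
    ) -> Result<BillingEvent, ProviderError>;
}

/// Test-only mock: deterministic URLs and a fake "signature" header.
pub struct MockProvider;

impl BillingProvider for MockProvider {
    fn create_checkout_session(
        &self,
        tenant_id: &str,
        plan: &str,
        return_url: &str,
    ) -> Result<String, ProviderError> {
        Ok(format!(
            "https://mock.invalid/checkout/{tenant_id}/{plan}?return={return_url}"
        ))
    }

    fn create_portal_session(
        &self,
        tenant_id: &str,
        return_url: &str,
    ) -> Result<String, ProviderError> {
        Ok(format!("https://mock.invalid/portal/{tenant_id}?return={return_url}"))
    }

    fn verify_and_parse_webhook(
        &self,
        headers: &[(String, String)],
        body: &[u8],
    ) -> Result<BillingEvent, ProviderError> {
        // "Verify" before parsing, mirroring the order real adapters must follow.
        let signed = headers
            .iter()
            .any(|(k, v)| k.eq_ignore_ascii_case("x-mock-signature") && v == "ok");
        if !signed {
            return Err(ProviderError::InvalidSignature);
        }
        // Mock body format: "<tenant_id>:<status>".
        let text = std::str::from_utf8(body).map_err(|_| ProviderError::Malformed)?;
        let (tenant_id, status) = text.split_once(':').ok_or(ProviderError::Malformed)?;
        Ok(BillingEvent::SubscriptionUpdated {
            tenant_id: tenant_id.to_string(),
            status: status.to_string(),
        })
    }
}
```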
Provider mapping requirements:
- Persist tenant identity at the provider level:
- Prefer setting tenant_id as provider customer metadata.
- If customer metadata is not available, store an internal mapping from provider_customer_id -> tenant_id.
- Ensure subscription creation is single-flight per tenant:
- Prevent duplicate active subscriptions by checking local state before creating new sessions.
- Use provider idempotency keys where supported (or internal idempotency per tenant+plan).
Security & Abuse Controls
- AuthZ:
- Tenant routes require the existing tenant header to match the path tenant ID.
- control:read required for viewing billing status.
- control:write required for checkout and portal actions.
- Return URL safety:
- Only allow return URLs whose origin is in CONTROL_BILLING_ALLOWED_RETURN_ORIGINS.
- Default return URL points to Control UI, derived from CONTROL_BILLING_SELF_URL.
- Webhook safety & observability:
- Verify signatures before parsing payloads.
- Enforce JSON size limits on webhook bodies.
- Always return 2xx for already-processed events (idempotency).
- Never log full webhook payloads.
- Propagate provider event IDs as x-correlation-id in logs and spans to integrate with the platform's VictoriaMetrics/Loki/Tempo observability stack (as standard in DEVELOPMENT_PLAN.md).
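The return URL rule from the list above could be enforced along these lines. This is a simplified sketch: origin_of is a hand-rolled origin extractor (a real implementation would more likely use a URL parsing library), build_return_url is a hypothetical helper, and it assumes that an empty allowlist rejects everything and that only the origin of CONTROL_BILLING_SELF_URL is kept (any path on it is dropped).

```rust
/// Extract "scheme://host[:port]" from an absolute http(s) URL.
/// Simplified on purpose; rejects URLs that carry userinfo ("user@host").
fn origin_of(url: &str) -> Option<&str> {
    let scheme_end = url.find("://")?;
    let scheme = &url[..scheme_end];
    if scheme != "https" && scheme != "http" {
        return None;
    }
    let rest = &url[scheme_end + 3..];
    let authority_len = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    if authority_len == 0 || rest[..authority_len].contains('@') {
        return None;
    }
    Some(&url[..scheme_end + 3 + authority_len])
}

/// Build the return URL from the self URL and an optional return_path,
/// refusing anything that could redirect to a different origin.
pub fn build_return_url(
    self_url: &str,
    return_path: Option<&str>,
    allowed_origins: &[&str],
) -> Result<String, String> {
    let path = return_path.unwrap_or("/");
    // Must be a local absolute path: rejects "//evil.example",
    // "https://evil.example", and backslash variants.
    if !path.starts_with('/') || path.starts_with("//") || path.contains('\\') {
        return Err("return_path must be an absolute path on this origin".to_string());
    }
    let origin = origin_of(self_url)
        .ok_or_else(|| "CONTROL_BILLING_SELF_URL is not a valid http(s) URL".to_string())?;
    if !allowed_origins.iter().any(|o| o.trim_end_matches('/') == origin) {
        return Err(format!(
            "origin {origin} is not in CONTROL_BILLING_ALLOWED_RETURN_ORIGINS"
        ));
    }
    Ok(format!("{origin}{path}"))
}
```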
API Contract (MVP)
GET /admin/v1/tenants/{tenant_id}/billing
Returns a stable shape whether billing is configured or not:
- configured: bool
- provider: stripe | polar | null
- plan: string | null
- status: string | null
- current_period_end: string | null
- cancel_at_period_end: bool | null
- entitlements: { ... }
POST /admin/v1/tenants/{tenant_id}/billing/checkout
Request:
- plan: string
- return_path: string (optional; appended to CONTROL_BILLING_SELF_URL)
Response:
- url: string
POST /admin/v1/tenants/{tenant_id}/billing/portal
Request:
- return_path: string (optional)
Response:
- url: string
POST /billing/v1/webhooks/{provider}
Provider-defined payload; must:
- verify signature
- map to internal events
- update local billing state atomically
Development Plan (Milestones by Dependency)
Milestone 0: Billing Domain + Storage + Read API
Dependencies
- None
Goal
Ship a provider-agnostic billing domain model and a safe persistence mechanism without contacting Stripe/Polar yet.
Tasks
- Add billing domain types in Control API:
- Plan, SubscriptionStatus, Entitlements
- provider-agnostic BillingEvent enum for webhook mapping
- Add BillingStore patterned after PlacementStore/ConfigRegistry:
- atomic write (tmp + rename) for dev file fallback
- in-process locking
- stable JSON schema + revision
- Add GET /admin/v1/tenants/{tenant_id}/billing:
- permission gate: requires control:read
- tenant header enforcement consistent with existing routes
- returns “not configured” when no subscription exists
- Add a mock billing provider for tests:
- deterministic checkout/portal URLs
- deterministic webhook events without real signatures
Required Tests (Gate)
- Workspace verification commands
- Unit tests (Control API):
- billing state read/write roundtrip (atomic update)
- entitlement derivation from status + plan
- tenant isolation checks for billing routes (header vs path mismatch)
- permission gates: control:read vs control:write
Milestone 1: Checkout Flow (Create Subscription)
Dependencies
- Milestone 0
Goal
Allow tenant admins to initiate a subscription via the provider’s hosted checkout.
Tasks
- Add provider configuration parsing and validation:
- strict env parsing with actionable errors
- plan-to-price/product mapping via env
- Add POST /admin/v1/tenants/{tenant_id}/billing/checkout:
- permission gate: requires control:write
- create or reuse provider customer for the tenant
- create checkout session and return redirect URL
- include tenant identifier in provider metadata (for webhook routing)
- internal idempotency: do not create a new checkout if tenant already has an active/trialing subscription
- Define return URL contract:
- checkout success/cancel landing routes in Control UI
- validate return_path against CONTROL_BILLING_ALLOWED_RETURN_ORIGINS
Required Tests (Gate)
- Workspace verification commands
- Unit tests (Control API):
- config validation (missing keys, invalid mapping)
- provider request construction (return URLs, metadata)
- checkout idempotency rules per tenant
- Env-gated integration tests (sandbox; auto-skip unless env vars are set):
- CONTROL_TEST_STRIPE=1 or CONTROL_TEST_POLAR=1 starts checkout and returns a valid URL
- tenant metadata roundtrips through the provider (where supported)
Milestone 2: Webhook Ingestion + Subscription State Sync
Dependencies
- Milestone 1
Goal
Make subscription state reliable and idempotent by processing provider webhooks.
Tasks
- Add POST /billing/v1/webhooks/{provider} endpoint:
- signature verification
- event parsing to BillingEvent
- idempotency by provider event ID
- tenant mapping via provider metadata or stored provider_customer_id
- Map provider statuses to internal SubscriptionStatus:
- trialing, active, past_due, canceled, etc.
- Store updates in BillingStore and expose via GET /admin/v1/tenants/{tenant_id}/billing
- ensure updates are monotonic (ignore older provider event timestamps)
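A status mapping table for the Stripe adapter might look like the sketch below. The input strings are Stripe's subscription status values; folding "unpaid" into PastDue and "incomplete_expired" into Canceled is an assumption of this example, since the plan only lists the internal statuses. The function name map_stripe_status is hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubscriptionStatus {
    Trialing,
    Active,
    PastDue,
    Paused,
    Canceled,
    Incomplete,
}

/// Map a raw Stripe subscription status string to the internal status.
/// Returns None for unknown values so callers can reject or log them
/// instead of silently guessing.
pub fn map_stripe_status(raw: &str) -> Option<SubscriptionStatus> {
    match raw {
        "trialing" => Some(SubscriptionStatus::Trialing),
        "active" => Some(SubscriptionStatus::Active),
        // Assumption: "unpaid" is treated like past_due.
        "past_due" | "unpaid" => Some(SubscriptionStatus::PastDue),
        "paused" => Some(SubscriptionStatus::Paused),
        // Assumption: an expired incomplete subscription counts as canceled.
        "canceled" | "incomplete_expired" => Some(SubscriptionStatus::Canceled),
        "incomplete" => Some(SubscriptionStatus::Incomplete),
        _ => None,
    }
}
```

Keeping the mapping in one exhaustive function gives the "status mapping tables are stable" test a single place to pin down.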
Required Tests (Gate)
- Workspace verification commands
- Unit tests (Control API):
- webhook signature verification (good/bad signatures)
- idempotency behavior (same event twice does not double-apply)
- status mapping tables are stable
- out-of-order events do not regress state
- Docker/local integration (optional, if a provider CLI is used; env-gated):
- CONTROL_TEST_STRIPE_CLI=1 runs a local webhook-forward flow and verifies state update
Milestone 3: Customer Portal (Self-Management)
Dependencies
- Milestone 2
Goal
Provide a “Manage billing” path for tenants to self-serve changes without operator involvement.
Tasks
- Add POST /admin/v1/tenants/{tenant_id}/billing/portal:
- create provider portal session and return URL
- ensure tenant ownership checks (header vs path)
- permission gate: requires control:write
- Add Control UI billing page:
- show plan/status + renewal date
- “Subscribe / Upgrade” and “Manage billing” actions
- show “Billing not configured” when provider is disabled
Required Tests (Gate)
- Workspace verification commands
- UI unit tests (Vitest):
- billing page renders from mocked API state
- action buttons call the expected API endpoints
- Env-gated integration tests:
- portal session URL is generated and is HTTPS
Milestone 4: Entitlements + Enforcement (Controlled Rollout)
Dependencies
- Milestone 2 (Milestone 3 recommended for admin UX)
Goal
Gate selected platform capabilities by tenant subscription state while maintaining a safe rollout path.
Tasks
- Define initial entitlement set and defaults:
- choose “free/trial” behavior (read-only vs limited capability)
- define grace period behavior for past_due
- Add enforcement points in Control API:
- middleware/helper to require entitlement per route
- first enforcement target: a low-risk, tenant-scoped “write” capability
- feature flag to disable enforcement globally during rollout
- Add audit log entries for billing enforcement denials (no PII, no secrets)
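The enforcement tasks above (a global feature flag plus grace handling for past_due) can be sketched as a pure decision function that a middleware or helper would call. The names Decision and check_write_entitlement, the denial reason strings, and the idea of passing "how long the tenant has been past due" as a duration are assumptions for illustration; the plan leaves the grace period length open.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubscriptionStatus {
    Trialing,
    Active,
    PastDue,
    Paused,
    Canceled,
    Incomplete,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    /// Machine-readable reason, safe to put in an audit entry (no PII).
    Deny(&'static str),
}

/// Decide whether a gated write operation may proceed.
/// `enforcement_enabled` mirrors CONTROL_BILLING_ENFORCEMENT; when it is off,
/// everything is allowed so rollout (and rollback) stays safe.
pub fn check_write_entitlement(
    enforcement_enabled: bool,
    status: Option<SubscriptionStatus>,
    past_due_for: Duration,
    grace: Duration,
) -> Decision {
    if !enforcement_enabled {
        return Decision::Allow;
    }
    match status {
        Some(SubscriptionStatus::Trialing) | Some(SubscriptionStatus::Active) => Decision::Allow,
        // Within the grace window a past_due tenant keeps access.
        Some(SubscriptionStatus::PastDue) if past_due_for <= grace => Decision::Allow,
        Some(SubscriptionStatus::PastDue) => Decision::Deny("subscription_past_due"),
        // No subscription, or paused/canceled/incomplete.
        _ => Decision::Deny("subscription_required"),
    }
}
```

Because the function takes the flag as an argument, the rollback step "disable enforcement feature flag" needs no code change.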
Required Tests (Gate)
- Workspace verification commands
- Unit tests (Control API):
- entitlement checks per route return correct HTTP status
- grace period handling
- Integration tests:
- a tenant without active subscription cannot perform the gated operation
- an active tenant can perform the same operation
Milestone 5: Reconciliation + Operational Hardening
Dependencies
- Milestone 2
Goal
Make billing state resilient against missed webhooks and operational drift.
Tasks
- Add a reconciliation job:
- periodically fetch subscription state from provider for tenants
- correct local state and emit audit entries
- Add metrics:
- webhook processing latency, verification failures, idempotency hits
- tenant count by subscription status
- Add robust error handling:
- structured errors with safe messages
- no provider payloads logged verbatim
- Add provider API timeout/retry policy:
- short timeouts with bounded retries
- no retries on webhook signature failures
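The retry policy in the last task could be sketched as a small helper with bounded attempts and exponential backoff that never retries permanent failures such as a bad webhook signature. CallError and call_with_retry are hypothetical names, and the per-request timeout itself is assumed to live inside the operation (for example in the HTTP client configuration), which this sketch does not show.

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum CallError {
    /// Worth retrying: timeouts, connection resets, provider 5xx.
    Transient(String),
    /// Never retried: signature failures, validation errors, provider 4xx.
    Permanent(String),
}

/// Run `op` up to `max_attempts` times, sleeping base_delay, 2x, 4x, ...
/// between attempts. Permanent errors return immediately.
pub fn call_with_retry<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, CallError>,
) -> Result<T, CallError> {
    let mut attempt: u32 = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(CallError::Permanent(msg)) => return Err(CallError::Permanent(msg)),
            // Attempts exhausted: surface the last transient error.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(base_delay * 2u32.pow(attempt - 1));
                attempt += 1;
            }
        }
    }
}
```

A blocking sleep keeps the sketch dependency-free; an async Control API would use its runtime's timer instead.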
Required Tests (Gate)
- Workspace verification commands
- Unit tests:
- reconciliation updates state correctly
- provider errors do not corrupt local state
Milestone 6: Production Rollout
Dependencies
- Milestone 3 (recommended), Milestone 4 (if enforcing)
Goal
Deploy billing in production with safe secret handling and verifiable smoke checks.
Tasks
- Provision provider configuration (operator):
- create products/prices (Stripe) or products/plans (Polar)
- configure webhook endpoint + secret
- set up customer portal settings (Stripe) if used
- Configure Swarm secrets and stack env:
- provider API keys and webhook secret stored as Swarm secrets
- CONTROL_BILLING_PROVIDER, CONTROL_BILLING_STATE_PATH
- CONTROL_BILLING_ALLOWED_RETURN_ORIGINS set to production UI origins
- Define rollback plan:
- disable enforcement feature flag
- keep billing read-only operational
Required Tests (Gate)
- Workspace verification commands
- Production smoke (env-gated):
- create checkout session for a test tenant
- process a webhook event and verify tenant state updates
- generate a portal session URL
Workspace Verification Commands
- cargo fmt --check
- cargo clippy --workspace --all-targets -- -D warnings
- cargo test --workspace
- cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build