feat(billing): implement tenant subscription entitlements system (milestones 0-6)
@@ -339,3 +339,119 @@ This plan is intentionally aligned with the style and gating discipline used in
  - verify Grafana dashboards provisioned and VictoriaMetrics receives samples
- [x] **T7.3** End-to-end “control plane can see the fleet” test (requires docker)
  - UI/API can query placement + health snapshots for all services

---

## Milestone 8: Config Registry + Safe Change Management (Plan/Apply/Rollback)

**Goal:** Make configuration first-class, versioned, validated, and safely mutable from the control plane, while keeping production and development sources consistent.

### Dependencies
- Milestone 2 (Control Plane API foundation)
- Milestone 5 (safe mutations baseline)
- Milestone 7 (Swarm deployment baseline)

### Exit Criteria
- Operators can list, view, validate, and safely apply config changes with audit + idempotent jobs
- Config changes have revision semantics and can be rolled back
- Gatekeeper safety checks prevent applying invalid or unsafe configs
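
The gatekeeper criterion above could be met by a validation step that returns stable, machine-readable error codes. The following is a minimal Python sketch, not the project's implementation: the field names (`targets`, `endpoints`) and every error code are hypothetical, chosen only to illustrate schema-level and semantic checks for the `routing` and `placement` domains.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ValidationIssue:
    code: str     # stable, machine-readable code (hypothetical naming)
    message: str  # human-readable detail


def _is_http_url(value) -> bool:
    """True if value is a string that parses as an http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def validate_config(domain: str, raw: str) -> list[ValidationIssue]:
    """Return every issue found; an empty list means the config may be applied."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [ValidationIssue("CONFIG_DECODE_ERROR", f"invalid JSON: {exc.msg}")]
    if not isinstance(doc, dict):
        return [ValidationIssue("CONFIG_SCHEMA_ERROR", "top-level value must be an object")]

    issues: list[ValidationIssue] = []
    if domain == "placement":
        # Assumed field: a non-empty list of placement targets.
        targets = doc.get("targets")
        if not isinstance(targets, list) or not targets:
            issues.append(ValidationIssue("PLACEMENT_TARGETS_EMPTY", "targets must be a non-empty list"))
    elif domain == "routing":
        # Assumed field: a list of endpoint URLs.
        endpoints = doc.get("endpoints")
        if not isinstance(endpoints, list):
            issues.append(ValidationIssue("ROUTING_ENDPOINTS_MISSING", "endpoints must be a list"))
        else:
            for index, url in enumerate(endpoints):
                if not _is_http_url(url):
                    issues.append(ValidationIssue("ROUTING_ENDPOINT_INVALID", f"endpoints[{index}] is not an http(s) URL"))
    else:
        issues.append(ValidationIssue("CONFIG_UNKNOWN_DOMAIN", f"unknown domain: {domain}"))
    return issues
```

Returning a list of issues rather than raising on the first one lets a validate job report everything wrong with a config in a single pass, and keeping the codes separate from the messages leaves the messages free to change without breaking callers.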

### Tasks
- [x] **8.1** Inventory and classify configuration surfaces (platform-wide)
  - classify as: static boot config (env/secrets), dynamic runtime config (KV), large immutable artifacts (S3/docs)
  - map current sources per domain:
    - Gateway routing config (`config/routing/dev.json` / production KV)
    - Placement config (`config/placement/dev.json` / production KV)
    - Runner definitions (effects/sagas) (documents/S3) and activation config (KV)
    - Observability provisioning (Swarm configs + repo-managed assets)
    - Control plane feature flags (KV)
- [~] **8.2** Define a Config Registry contract in the Control API
  - **Implemented (initial)**:
    - config identity: `{domain}` (routing|placement)
    - metadata: `revision` (KV revision when using NATS), and `source` info (file vs nats)
    - storage policy per config: `source=dev_file | nats_kv`
  - **Still needed**:
    - `{domain, name, scope}` and richer metadata (`updated_at`, `updated_by`, `sha256`)
    - history API for KV-backed configs
- [x] **8.3** Implement config storage abstraction (dev + prod)
  - dev: file-backed, atomic write (tmp + rename), hot-reload where applicable
  - prod: NATS KV for dynamic configs (revisioned values + watch streams)
  - consistent error model: decode/validate/source errors are distinguishable and safe
- [x] **8.4** Add read-only config APIs
  - `GET /admin/v1/config` list domains
  - `GET /admin/v1/config/{domain}` fetch current value + revision + source
  - (history not implemented yet)
- [~] **8.5** Add validate/plan/apply/rollback mutation workflows as jobs
  - **Implemented**:
    - `POST /admin/v1/jobs/config/validate` (job, idempotency key required)
    - `POST /admin/v1/jobs/config/apply` (job, idempotency key required, backup + apply)
    - `POST /admin/v1/jobs/config/rollback` (job, idempotency key required, restore last backup)
    - per-domain locking to avoid concurrent config mutations
  - **Still needed**:
    - `POST /admin/v1/plan/config/apply` deterministic plan (diff + impacted services)
    - richer post-conditions (routing resolution sampling, fleet consistency checks, etc.)
- [~] **8.6** Implement initial config domains end-to-end
  - **Gateway routing config**:
    - implemented: schema validation via JSON decode
    - still needed: semantic validation (tenant entries/shard directories/endpoints URL parsing) + sampled routing verification
  - **Placement config**:
    - implemented: schema validation via JSON decode
    - still needed: semantic validation (targets non-empty, etc.) + fleet snapshot consistency checks
- [x] **8.7** Implement Admin UI “Config” page for safe operations
  - list + view configs with revision/sha/audit linkage
  - editor for JSON (and YAML when supported by the domain)
  - validate button (server-side) and apply/rollback flows as jobs with reason required
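
The dev-mode storage behavior described in 8.3 and the backup-then-apply and restore-last-backup semantics of 8.5 can be illustrated with a short sketch. This is a hedged Python example under stated assumptions, not the repository's code: the class name `DevFileConfigStore` and the `.bak` backup naming are hypothetical, and it omits locking, revisions, and hot-reload.

```python
import json
import os
import tempfile
from pathlib import Path


class DevFileConfigStore:
    """File-backed store for a single config domain (illustrative only)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Hypothetical convention: keep the last applied value next to the file.
        self.backup_path = path.with_name(path.name + ".bak")

    def read(self) -> dict:
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _atomic_write(self, target: Path, text: str) -> None:
        # Write to a temp file in the same directory, then rename over the target,
        # so readers see either the old file or the new one, never a partial write.
        fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                tmp.write(text)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_name, target)
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise

    def apply(self, new_value: dict) -> None:
        """Back up the current value (if any), then write the new one."""
        encoded = json.dumps(new_value, indent=2, sort_keys=True)
        if self.path.exists():
            self._atomic_write(self.backup_path, self.path.read_text(encoding="utf-8"))
        self._atomic_write(self.path, encoded)

    def rollback(self) -> None:
        """Restore the last backup taken by apply()."""
        if not self.backup_path.exists():
            raise FileNotFoundError("no backup to restore")
        self._atomic_write(self.path, self.backup_path.read_text(encoding="utf-8"))
```

The temp file is created in the target's own directory because a rename is only atomic within one filesystem. A production NATS KV backend would rely on KV revisions instead of a sidecar backup file.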

### Tests
- [x] **T8.1** Unit tests: config decode/encode stability for each config domain
  - routing/placement decode is enforced by server-side validate job (schema-level)
- [ ] **T8.2** Unit tests: validation rejects unsafe configs with stable error codes/messages
- [ ] **T8.3** Unit tests: plan generation is deterministic for same inputs
- [x] **T8.4** Integration tests (env-gated):
  - NATS KV config apply + rollback via Control API (requires `CONTROL_TEST_NATS=1` + `CONTROL_TEST_NATS_URL`)
  - (Gateway route-resolution E2E verification still pending)
- [x] **T8.5** UI tests: config page renders, validate/apply/rollback flows navigate to job progress
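
The determinism property that T8.3 asks for can be made concrete with a small sketch. The example below is an assumption-laden illustration in Python, not the planned `POST /admin/v1/plan/config/apply` implementation: the function names and the plan's shape are hypothetical, it diffs only top-level keys, and it leaves out the "impacted services" part of the plan.

```python
import hashlib
import json


def canonical_sha256(value) -> str:
    """Hash a JSON value in canonical form (sorted keys, fixed separators)."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_plan(domain: str, current: dict, proposed: dict) -> dict:
    """Describe top-level key changes between the current and proposed config."""
    changes = []
    # Iterating keys in sorted order keeps the output independent of dict ordering.
    for key in sorted(set(current) | set(proposed)):
        if key not in current:
            changes.append({"key": key, "op": "add"})
        elif key not in proposed:
            changes.append({"key": key, "op": "remove"})
        elif current[key] != proposed[key]:
            changes.append({"key": key, "op": "replace"})
    return {
        "domain": domain,
        "from_sha256": canonical_sha256(current),
        "to_sha256": canonical_sha256(proposed),
        "changes": changes,
    }
```

A unit test for determinism can then assert that two calls with semantically equal inputs, for example the same object with keys in a different order, produce identical plans.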

---

## Milestone 9: Control Node Management (Inventory, Drift, and Safer Ops)

**Goal:** Improve how the control plane understands and manages the live control node and platform state: node inventory, config drift detection, and safer operational guardrails.

### Dependencies
- Milestone 7 (Swarm deployment baseline)
- Milestone 8 (config registry + safe change management)

### Exit Criteria
- Control plane provides a reliable “what is running vs what should be running” view
- Config drift is detectable and actionable
- Core operational actions are guarded by preflight checks and produce audit trails

### Tasks
- [x] **9.1** Define a “desired vs observed” model for platform state
  - desired: Swarm stacks + config registry revisions
  - observed: live service/task state + effective runtime configs
  - drift categories: missing, extra, version mismatch, config mismatch, unhealthy
- [~] **9.2** Improve Swarm observation fidelity
  - implemented (initial): docker-cli-backed Swarm observation (`CONTROL_SWARM_MODE=docker`)
  - still needed: direct Docker API client (avoid shelling out), richer normalization, and wiring into production stacks
  - keep file source as a dev fallback for deterministic tests
  - normalize service identity: `{service, image_tag, git_sha, updated_at}`
- [x] **9.3** Add drift APIs and UI views
  - `GET /admin/v1/platform/drift` returns drift summary + actionable items
  - UI: “Platform Drift” page with filters and links to remediation jobs
- [ ] **9.4** Add safer operational guardrails as reusable checks
  - preflight checks for:
    - service unhealthy / crashloop
    - tenant migration safety thresholds (lag/inflight)
    - config apply safety (impact radius, sampled verify)
  - consistent failure modes: clear reason + audit entry, no partial side effects
- [ ] **9.5** Add operational playbooks as executable checks
  - post-deploy verification suite callable as an idempotent job
  - rollback verification suite callable as an idempotent job
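
The guardrail pattern in 9.4 (reusable preflight checks, a clear reason plus an audit entry on failure, and no partial side effects) might look roughly like the following. This is a minimal Python sketch under assumptions: the snapshot fields (`services`, `restarts`, `migration_lag`), the thresholds, and the audit entry shape are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A check inspects a read-only snapshot and returns a failure reason, or None if it passes.
Check = Callable[[dict], Optional[str]]

CRASHLOOP_RESTARTS = 3   # hypothetical threshold
MAX_MIGRATION_LAG = 100  # hypothetical threshold


@dataclass(frozen=True)
class PreflightFailure:
    check: str
    reason: str


def no_crashlooping_services(snapshot: dict) -> Optional[str]:
    bad = sorted(
        name for name, svc in snapshot["services"].items()
        if svc["restarts"] >= CRASHLOOP_RESTARTS
    )
    return f"crashlooping services: {', '.join(bad)}" if bad else None


def migration_lag_within_threshold(snapshot: dict) -> Optional[str]:
    lag = snapshot["migration_lag"]
    return f"migration lag {lag} exceeds {MAX_MIGRATION_LAG}" if lag > MAX_MIGRATION_LAG else None


def run_guarded(action_name: str, checks: list, snapshot: dict,
                action: Callable[[], None], audit_log: list) -> bool:
    """Run every check first; perform the action only if all of them pass."""
    failures = []
    for check in checks:
        reason = check(snapshot)
        if reason is not None:
            failures.append(PreflightFailure(check.__name__, reason))
    if failures:
        # Rejected before the action starts, so there are no partial side effects.
        audit_log.append({
            "action": action_name,
            "outcome": "rejected",
            "reasons": [f"{f.check}: {f.reason}" for f in failures],
        })
        return False
    action()
    audit_log.append({"action": action_name, "outcome": "applied", "reasons": []})
    return True
```

Evaluating all checks before touching anything is what makes the "no partial side effects" guarantee cheap to uphold: the action either runs in full or is never started.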

### Tests
- [x] **T9.1** Unit tests: drift classification for synthetic desired/observed fixtures
- [x] **T9.2** Integration tests (docker-gated): drift view detects intentional mismatches in a local Swarm
  - requires `CONTROL_TEST_DOCKER=1` and an active local Swarm node
- [x] **T9.3** UI tests: drift page renders in route smoke test
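
To show what a T9.1-style fixture test exercises, here is a small sketch of drift classification over the five categories named in task 9.1. It is a simplified Python illustration, not the project's model: the `ServiceState` fields and the category strings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceState:
    image_tag: str
    config_revision: int
    healthy: bool = True  # only meaningful for observed state


def classify_drift(desired: dict[str, ServiceState],
                   observed: dict[str, ServiceState]) -> list[tuple[str, str]]:
    """Return (service, category) pairs, sorted by service name for stable output."""
    items = []
    for name in sorted(set(desired) | set(observed)):
        want, have = desired.get(name), observed.get(name)
        if have is None:
            items.append((name, "missing"))  # desired but not running
        elif want is None:
            items.append((name, "extra"))    # running but not desired
        else:
            # A single service can drift in more than one way at once.
            if want.image_tag != have.image_tag:
                items.append((name, "version_mismatch"))
            if want.config_revision != have.config_revision:
                items.append((name, "config_mismatch"))
            if not have.healthy:
                items.append((name, "unhealthy"))
    return items
```

A synthetic fixture might pair a desired set of `gateway`, `control`, and `runner` with an observed set in which `gateway` runs an older tag, `control` is unhealthy, `runner` is absent, and an undeclared `legacy` service is present, then assert on the returned pairs.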