# Milestone 9: Control Plane Consolidation

**Goal:** One control plane, one API, one source of truth for project and infrastructure management.

**Depends on:** M0 (Security), M1 (Foundation), M7 (CI/CD)

---

## 9.1 — Merge the Two Control Planes

### Current state

There are two parallel control plane implementations:

| | In-gateway `control_plane/` | Standalone `control-plane-api/` |
|---|---|---|
| **Binary** | Part of `control` binary | Separate `control-plane-api` binary |
| **Auth** | Admin cookie (broken, fixed in M0) | None |
| **API prefix** | `/platform/v1/*` | `/api/v1/*` |
| **Features** | Project CRUD, user mgmt, key rotation, DB browser | Server provisioning, scaling, health, templates |
| **Database** | Control DB (projects table) | Separate DB (servers, scaling_operations tables) |
| **UI** | `web/admin.html` (Vue) | `control-plane-ui/` (React/MUI) |

### Recommended approach

Merge `control-plane-api` server management into the gateway's control mode:

1. **Move server management routes** from `control-plane-api/src/lib.rs` to `control_plane/src/lib.rs` under `/platform/v1/servers`, `/platform/v1/scaling`, etc.
2. **Move the `ServerManager`** from `control-plane-api/src/server_manager.rs` into a new `control_plane/src/server_manager.rs`.
3. **Move provider code** from `control-plane-api/src/providers/` into `control_plane/src/providers/`.
4. **Consolidate the database schema.** Merge the `control-plane-api/migrations/001_initial.sql` tables (`servers`, `scaling_operations`, `cluster_events`, `server_metrics`) into the main migrations directory.
5. **Deprecate the standalone binary.** Remove `control-plane-api` from `Cargo.toml` workspace members. Keep the React UI if desired, but point it at the consolidated API.
6. **Use the admin auth** (fixed in M0) for all server management routes.

### Migration steps

```bash
# 1. Copy server management code
cp control-plane-api/src/server_manager.rs control_plane/src/
# No trailing slash on the source directory: BSD/macOS cp would copy only its contents
cp -r control-plane-api/src/providers control_plane/src/
cp control-plane-api/src/templates.rs control_plane/src/
cp control-plane-api/src/docker.rs control_plane/src/

# 2. Copy and merge migrations
cp control-plane-api/migrations/001_initial.sql migrations/20260320000000_server_management.sql

# 3. Update control_plane/src/lib.rs to add new routes
# 4. Update control_plane/Cargo.toml for new dependencies (reqwest, ssh2, etc.)
# 5. Remove control-plane-api from workspace
```

---

## 9.2 — Fix Server Provisioning

### 9.2.1 Implement provision_server

The current `provision_server` in `server_manager.rs` is a no-op. Wire it up:

1. Call `provider.create_server()` to create the VM
2. Wait for the VM to be reachable via SSH
3. Run bootstrap script (install Docker, pull images, configure services)
4. Register the server with the cluster
5. Update server status to "active"

### 9.2.2 Implement remove_server

1. Drain the server (remove from load balancer, wait for in-flight requests)
2. Stop services
3. Call `provider.delete_server()` to destroy the VM
4. Remove from database

### 9.2.3 Fix SQL parameter binding

**File:** `server_manager.rs` — search for `$2` and verify each query has matching `.bind()` calls.
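One way to make this check mechanical is to compare the highest `$n` placeholder in a query with the number of `.bind()` calls that follow it. The sketch below is a hypothetical, std-only helper: `max_placeholder` and `binds_match` are illustrative names that do not exist in the codebase, and the scan is deliberately naive (it does not skip `$` inside string literals).

```rust
/// Highest `$n` placeholder index found in a SQL string (0 when there are none).
fn max_placeholder(sql: &str) -> usize {
    let bytes = sql.as_bytes();
    let mut highest: usize = 0;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'$' {
            let start = i + 1;
            let mut end = start;
            // Consume the run of ASCII digits that follows the `$`.
            while end < bytes.len() && bytes[end].is_ascii_digit() {
                end += 1;
            }
            if end > start {
                // The slice holds ASCII digits only, so byte offsets are valid char boundaries.
                if let Ok(n) = sql[start..end].parse::<usize>() {
                    highest = highest.max(n);
                }
            }
            i = end;
        } else {
            i += 1;
        }
    }
    highest
}

/// True when the number of `.bind()` calls equals the highest placeholder index.
fn binds_match(sql: &str, bind_count: usize) -> bool {
    max_placeholder(sql) == bind_count
}
```

For a query shaped like the reported bug, `binds_match("... WHERE id = $2", 1)` is `false`, while the corrected `binds_match("... WHERE id = $1", 1)` is `true`.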
The known bugs:

- Line ~595: `WHERE id = $2` with only one `.bind(operation_id)` → should be `$1`
- Line ~610: Same issue

### 9.2.4 Real health data

Replace hardcoded `cluster_health()` and `get_pillar_stats()` with queries to VictoriaMetrics:

```rust
// Assumed return type: `PillarStats` and the boxed error are placeholders.
async fn get_pillar_stats(&self) -> Result<PillarStats, Box<dyn std::error::Error + Send + Sync>> {
    let vm_url = std::env::var("VICTORIA_METRICS_URL")?;
    let client = reqwest::Client::new();
    // `.query()` URL-encodes the PromQL expression (brackets, parentheses).
    let resp = client
        .get(format!("{}/api/v1/query", vm_url))
        .query(&[("query", "avg(rate(process_cpu_seconds_total[5m]))")])
        .send()
        .await?;
    // Parse Prometheus response format: data.result[0].value = [timestamp, "<value>"]
    let body: serde_json::Value = resp.json().await?;
    todo!("map `body` into the stats struct")
}
```

---

## 9.3 — Multi-Provider

### 9.3.1 DigitalOcean provider

**File:** `control_plane/src/providers/digitalocean.rs`

Implement using the DigitalOcean API v2:

- `create_server`: POST /v2/droplets
- `delete_server`: DELETE /v2/droplets/{id}
- `get_server`: GET /v2/droplets/{id}
- `list_servers`: GET /v2/droplets

### 9.3.2 Fix Hetzner plan validation

**File:** `control_plane/src/providers/mod.rs` — `validate_plan` (line ~134)

Correct the RAM mapping:

- CX11: 2GB (not 4GB)
- CX21: 4GB (not 8GB)
- CX31: 8GB
- CX41: 16GB

### 9.3.3 Add pagination to Hetzner list_servers

The Hetzner API returns 25 results per page by default and at most 50 when `per_page` is set, so a single request can miss servers. Implement pagination:

```rust
let mut all_servers = Vec::new();
let mut page = 1;
loop {
    // Attach the provider's existing auth header to this request; it is elided here.
    let resp = client
        .get(format!("{}/servers?page={}&per_page=50", api_url, page))
        .send()
        .await?;
    let page_data: HetznerListResponse = resp.json().await?;
    all_servers.extend(page_data.servers);
    // `next_page` is null on the last page.
    if page_data.meta.pagination.next_page.is_none() {
        break;
    }
    page += 1;
}
```

---

## Completion Requirements

This milestone is **not complete** until every item below is satisfied.

### 1. Full Test Suite — All Green

- [ ] `cargo test --workspace` passes with **zero failures**
- [ ] All **pre-existing tests** still pass (no regressions)
- [ ] **New tests** are written for the consolidated control plane:

| Test | Location | What it validates |
|------|----------|-------------------|
| `test_list_servers` | `control_plane/src/server_manager.rs` | `GET /platform/v1/servers` returns server list |
| `test_create_server_hetzner` | `control_plane/src/providers/hetzner.rs` | `provision_server` sends correct API payload (mock HTTP) |
| `test_delete_server_hetzner` | `control_plane/src/providers/hetzner.rs` | `remove_server` sends DELETE to correct API endpoint (mock HTTP) |
| `test_create_server_digitalocean` | `control_plane/src/providers/digitalocean.rs` | `provision_server` sends correct Droplet payload (mock HTTP) |
| `test_hetzner_plan_validation` | `control_plane/src/providers/hetzner.rs` | Correct RAM mapping: CX11=2GB, CX21=4GB, CX31=8GB, CX41=16GB |
| `test_hetzner_pagination` | `control_plane/src/providers/hetzner.rs` | `list_servers` paginates through multiple pages |
| `test_cluster_health_real_metrics` | `control_plane/src/lib.rs` | Health endpoint queries VictoriaMetrics (mock) and returns real CPU/mem |
| `test_sql_parameter_binding` | `control_plane/src/lib.rs` | Every `$n` placeholder has a matching `.bind()` call; no string interpolation |
| `test_admin_auth_on_server_routes` | `control_plane/src/lib.rs` | `GET /platform/v1/servers` without admin auth returns 401 |
| `test_old_control_plane_api_removed` | workspace | `control-plane-api` is not in `Cargo.toml` workspace members |

### 2. Integration Verification

- [ ] All `/platform/v1/*` routes work through the consolidated control plane
- [ ] Server provisioning creates a real Hetzner VM (integration test with API key)
- [ ] Server removal destroys the VM
- [ ] Cluster health returns real CPU/memory metrics (not hardcoded)
- [ ] The old `control-plane-api` binary is no longer needed and has been removed from the workspace
- [ ] Admin auth protects all server management routes
- [ ] Scaling operations are recorded in the `scaling_operations` table

### 3. CI Gate

- [ ] All unit tests (with mocked HTTP) run in `cargo test --workspace`
- [ ] Integration tests against real cloud providers are gated behind `#[ignore]` and require `HETZNER_API_TOKEN` / `DO_API_TOKEN` env vars
- [ ] `cargo build --workspace` succeeds without the old `control-plane-api` crate
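
The `#[ignore]` gating in the CI gate could look roughly like the sketch below. It is a minimal, std-only illustration: `usable_token`, `provider_token`, and the test name are hypothetical, and the provider call itself is left as a comment because its real signature is not shown in this document.

```rust
use std::env;

/// Treats an unset, empty, or whitespace-only value as "no credentials".
fn usable_token(raw: Option<String>) -> Option<String> {
    raw.map(|t| t.trim().to_string()).filter(|t| !t.is_empty())
}

/// Reads a provider token such as `HETZNER_API_TOKEN` from the environment.
fn provider_token(var: &str) -> Option<String> {
    usable_token(env::var(var).ok())
}

#[cfg(test)]
mod tests {
    use super::*;

    // `#[ignore]` keeps this test out of a plain `cargo test --workspace` run.
    #[test]
    #[ignore = "requires HETZNER_API_TOKEN and creates a real VM"]
    fn provisions_real_hetzner_vm() {
        let _token = match provider_token("HETZNER_API_TOKEN") {
            Some(token) => token,
            None => {
                // Return early rather than fail when credentials are absent.
                eprintln!("HETZNER_API_TOKEN not set; skipping");
                return;
            }
        };
        // Call the real provider here (create the VM, assert on it, then delete it).
    }
}
```

With this layout, `cargo test --workspace` runs only the mocked unit tests, and `cargo test --workspace -- --ignored` runs the gated provider tests when the token is exported.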