Milestone 9: Control Plane Consolidation
Goal: One control plane, one API, one source of truth for project and infrastructure management.
Depends on: M0 (Security), M1 (Foundation), M7 (CI/CD)
9.1 — Merge the Two Control Planes
Current state
There are two parallel control plane implementations:
| | In-gateway `control_plane/` | Standalone `control-plane-api/` |
|---|---|---|
| Binary | Part of the control binary | Separate `control-plane-api` binary |
| Auth | Admin cookie (broken, fixed in M0) | None |
| API prefix | `/platform/v1/*` | `/api/v1/*` |
| Features | Project CRUD, user mgmt, key rotation, DB browser | Server provisioning, scaling, health, templates |
| Database | Control DB (`projects` table) | Separate DB (`servers`, `scaling_operations` tables) |
| UI | `web/admin.html` (Vue) | `control-plane-ui/` (React/MUI) |
Recommended approach
Merge control-plane-api server management into the gateway's control mode:
- Move the server management routes from `control-plane-api/src/lib.rs` to `control_plane/src/lib.rs` under `/platform/v1/servers`, `/platform/v1/scaling`, etc.
- Move the `ServerManager` from `control-plane-api/src/server_manager.rs` into a new `control_plane/src/server_manager.rs`.
- Move the provider code from `control-plane-api/src/providers/` into `control_plane/src/providers/`.
- Consolidate the database schema. Merge the `control-plane-api/migrations/001_initial.sql` tables (`servers`, `scaling_operations`, `cluster_events`, `server_metrics`) into the main migrations directory.
- Deprecate the standalone binary. Remove `control-plane-api` from the `Cargo.toml` workspace members. Keep the React UI if desired, but point it at the consolidated API.
- Use the admin auth (fixed in M0) for all server management routes.
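A minimal, framework-free sketch of the merged route table, assuming the M0 admin check can be reduced to a boolean on the request. `Req`, `route`, and the handler names are hypothetical; the real gateway would express this with its own router and auth middleware. It shows the intended behavior: server routes live under `/platform/v1/`, all of them sit behind the same admin check, and the old `/api/v1/*` prefix is no longer served.

```rust
/// Hypothetical, framework-free view of an incoming request.
pub struct Req<'a> {
    pub method: &'a str,
    pub path: &'a str,
    /// True when the request carries a valid admin session (the M0 admin auth).
    pub is_admin: bool,
}

/// Maps a request to a handler name, or to an HTTP status code on rejection.
pub fn route(req: &Req) -> Result<&'static str, u16> {
    // Only the consolidated prefix is served; the old /api/v1/* routes are gone.
    if !req.path.starts_with("/platform/v1/") {
        return Err(404);
    }
    // One admin check in front of every /platform/v1/* route, including the
    // server and scaling routes merged in from control-plane-api.
    if !req.is_admin {
        return Err(401);
    }
    match (req.method, req.path) {
        ("GET", "/platform/v1/servers") => Ok("list_servers"),
        ("POST", "/platform/v1/servers") => Ok("provision_server"),
        ("DELETE", p) if p.starts_with("/platform/v1/servers/") => Ok("remove_server"),
        ("GET", "/platform/v1/scaling") => Ok("list_scaling_operations"),
        _ => Err(404),
    }
}
```

Checking auth before matching the route means unauthenticated callers get 401 for any path under the prefix, which avoids revealing which routes exist. This is also the behavior `test_admin_auth_on_server_routes` expects.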
Migration steps
```bash
# 1. Copy server management code
cp control-plane-api/src/server_manager.rs control_plane/src/
# No trailing slash on the source: BSD/macOS cp would copy only the directory's contents
cp -r control-plane-api/src/providers control_plane/src/
cp control-plane-api/src/templates.rs control_plane/src/
cp control-plane-api/src/docker.rs control_plane/src/
# 2. Copy and merge migrations
cp control-plane-api/migrations/001_initial.sql migrations/20260320000000_server_management.sql
# 3. Update control_plane/src/lib.rs: declare the new modules (server_manager,
#    providers, templates, docker) and add the new routes
# 4. Update control_plane/Cargo.toml for new dependencies (reqwest, ssh2, etc.)
# 5. Remove control-plane-api from workspace
```
9.2 — Fix Server Provisioning
9.2.1 Implement provision_server
The current provision_server in server_manager.rs is a no-op. Wire it up:
- Call `provider.create_server()` to create the VM.
- Wait for the VM to be reachable via SSH.
- Run the bootstrap script (install Docker, pull images, configure services).
- Register the server with the cluster.
- Update the server status to `"active"`.
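The five steps above could be sequenced roughly as follows. This is a std-only sketch: `Provisioner`, `Status`, and the `set_status` callback are hypothetical stand-ins for the provider trait, the `servers.status` column, and the database update. The SSH check is only a TCP connect to the address the provider returns, not a full SSH handshake.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical status values; the real `servers.status` column may use other strings.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Provisioning,
    Bootstrapping,
    Active,
    Failed,
}

/// Hypothetical hooks for the provider API, the bootstrap script, and cluster registration.
pub trait Provisioner {
    /// Creates the VM and returns (provider server ID, SSH address).
    fn create_server(&self, name: &str) -> Result<(String, SocketAddr), String>;
    fn run_bootstrap(&self, ssh: SocketAddr) -> Result<(), String>;
    fn register(&self, server_id: &str) -> Result<(), String>;
}

/// Polls a TCP connect to the SSH address. Success only shows that something
/// is listening on the port; it does not prove SSH login works.
pub fn wait_for_ssh(addr: SocketAddr, attempts: u32, delay: Duration) -> bool {
    for i in 0..attempts {
        if TcpStream::connect_timeout(&addr, Duration::from_secs(3)).is_ok() {
            return true;
        }
        if i + 1 < attempts {
            sleep(delay);
        }
    }
    false
}

/// Runs the provisioning steps in order and reports each status change
/// through `set_status` (in real code, an UPDATE on the `servers` table).
pub fn provision_server<P: Provisioner>(
    p: &P,
    name: &str,
    ssh_attempts: u32,
    mut set_status: impl FnMut(Status),
) -> Result<String, String> {
    set_status(Status::Provisioning);
    let (id, ssh) = match p.create_server(name) {
        Ok(created) => created,
        Err(e) => {
            set_status(Status::Failed);
            return Err(e);
        }
    };
    if !wait_for_ssh(ssh, ssh_attempts, Duration::from_secs(5)) {
        set_status(Status::Failed);
        return Err(format!("server {} never became reachable at {}", id, ssh));
    }
    set_status(Status::Bootstrapping);
    if let Err(e) = p.run_bootstrap(ssh).and_then(|()| p.register(&id)) {
        set_status(Status::Failed);
        return Err(e);
    }
    set_status(Status::Active);
    Ok(id)
}
```

If a step fails after `create_server` succeeds, the VM still exists and is billed. Whether to call `remove_server` automatically or keep the VM for inspection is a policy decision this sketch leaves open.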
9.2.2 Implement remove_server
- Drain the server (remove it from the load balancer and wait for in-flight requests to finish).
- Stop services.
- Call `provider.delete_server()` to destroy the VM.
- Remove the server from the database.
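A sketch of the teardown order, assuming the load balancer exposes an "accepting" flag and an in-flight request counter per server. `Backend`, `Teardown`, and the timeout policy (abort rather than force-delete when draining times out) are assumptions, not existing code.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical per-server load-balancer state.
pub struct Backend {
    pub accepting: AtomicBool,
    pub in_flight: AtomicUsize,
}

/// Stops routing new requests to the server, then waits for in-flight
/// requests to reach zero. Returns false if the deadline passes first.
pub fn drain(b: &Backend, timeout: Duration, poll: Duration) -> bool {
    b.accepting.store(false, Ordering::SeqCst);
    let deadline = Instant::now() + timeout;
    while b.in_flight.load(Ordering::SeqCst) > 0 {
        if Instant::now() >= deadline {
            return false;
        }
        sleep(poll);
    }
    true
}

/// Hypothetical hooks for the remaining teardown steps.
pub trait Teardown {
    fn stop_services(&self, id: &str) -> Result<(), String>;
    fn delete_server(&self, id: &str) -> Result<(), String>;
    fn delete_row(&self, id: &str) -> Result<(), String>;
}

/// Runs the remove_server steps in order. The database row goes last, so a
/// failed provider call still leaves a record of the VM that is running.
pub fn remove_server<T: Teardown>(
    t: &T,
    id: &str,
    lb: &Backend,
    drain_timeout: Duration,
) -> Result<(), String> {
    if !drain(lb, drain_timeout, Duration::from_millis(50)) {
        return Err(format!("{} still has in-flight requests; not deleting", id));
    }
    t.stop_services(id)?;
    t.delete_server(id)?;
    t.delete_row(id)
}
```

When draining times out, the server stays out of the load balancer but keeps running, so an operator can retry or force removal.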
9.2.3 Fix SQL parameter binding
File: `server_manager.rs`. Search for `$2` and verify that each query has a `.bind()` call for every placeholder index it uses. Known bugs:
- Line ~595: `WHERE id = $2` with only one `.bind(operation_id)`; the placeholder should be `$1`.
- Line ~610: same issue.
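In sqlx's Postgres driver the N-th `.bind()` call supplies `$N`, so `WHERE id = $2` with a single bind references a parameter that is never provided. A small std-only checker like the hypothetical `check_binding` below could back `test_sql_parameter_binding`, given each query string and the number of `.bind()` calls that follow it. It does not skip string literals or `$$`-quoted bodies, so it is a lint aid, not a SQL parser.

```rust
use std::collections::BTreeSet;

/// Collects the positional placeholder indices ($1, $2, ...) found in `sql`.
pub fn placeholders(sql: &str) -> BTreeSet<usize> {
    let bytes = sql.as_bytes();
    let mut found = BTreeSet::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'$' {
            let mut end = i + 1;
            while end < bytes.len() && bytes[end].is_ascii_digit() {
                end += 1;
            }
            // `$` and ASCII digits are single bytes, so these bounds are char boundaries.
            if let Ok(n) = sql[i + 1..end].parse::<usize>() {
                found.insert(n);
            }
            i = end;
        } else {
            i += 1;
        }
    }
    found
}

/// Passes only if the placeholders are exactly $1..=$n and n equals the bind count.
pub fn check_binding(sql: &str, bind_count: usize) -> Result<(), String> {
    let found = placeholders(sql);
    let expected: BTreeSet<usize> = (1..=bind_count).collect();
    if found == expected {
        Ok(())
    } else {
        Err(format!("placeholders {:?} do not match {} bind(s)", found, bind_count))
    }
}
```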
9.2.4 Real health data
Replace hardcoded cluster_health() and get_pillar_stats() with queries to VictoriaMetrics:
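```rust
async fn get_pillar_stats(&self) -> Result<PillarStats> {
    let vm_url = std::env::var("VICTORIA_METRICS_URL")?;
    let client = reqwest::Client::new();
    // Pass the PromQL as a query parameter so reqwest URL-encodes it (`+`, spaces, etc.).
    let resp: serde_json::Value = client
        .get(format!("{}/api/v1/query", vm_url))
        .query(&[("query", "avg(rate(process_cpu_seconds_total[5m]))")])
        .send().await?
        .error_for_status()?
        .json().await?;
    // Prometheus instant-query format: data.result[i].value = [<unix_ts>, "<value>"]
    let cpu: Option<f64> = resp["data"]["result"][0]["value"][1].as_str().and_then(|v| v.parse().ok());
    // Repeat for memory, then build PillarStats from the parsed values (None = no samples).
}
```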
9.3 — Multi-Provider
9.3.1 DigitalOcean provider
File: control_plane/src/providers/digitalocean.rs
Implement using the DigitalOcean API v2:
- `create_server`: `POST /v2/droplets`
- `delete_server`: `DELETE /v2/droplets/{id}`
- `get_server`: `GET /v2/droplets/{id}`
- `list_servers`: `GET /v2/droplets`
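A transport-agnostic sketch of the four DigitalOcean calls, which a mock-HTTP test such as `test_create_server_digitalocean` could assert against. `ApiCall`, `DropletSpec`, and the `*_request` function names are hypothetical. The create body carries the core Droplet fields `name`, `region`, `size`, and `image`; the real client would send these requests to `https://api.digitalocean.com` with an `Authorization: Bearer` header carrying `DO_API_TOKEN`. The hand-rolled JSON escaping only keeps the sketch dependency-free; `serde_json` would be the normal choice.

```rust
/// Hypothetical description of one API call, independent of the HTTP client.
#[derive(Debug, PartialEq)]
pub struct ApiCall {
    pub method: &'static str,
    pub path: String,
    pub body: Option<String>,
}

pub struct DropletSpec<'a> {
    pub name: &'a str,
    pub region: &'a str,
    pub size: &'a str,
    pub image: &'a str,
}

/// Minimal JSON string encoder: escapes quotes, backslashes, and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

pub fn create_server_request(spec: &DropletSpec) -> ApiCall {
    let body = format!(
        "{{\"name\":{},\"region\":{},\"size\":{},\"image\":{}}}",
        json_str(spec.name),
        json_str(spec.region),
        json_str(spec.size),
        json_str(spec.image)
    );
    ApiCall { method: "POST", path: "/v2/droplets".to_string(), body: Some(body) }
}

pub fn delete_server_request(id: u64) -> ApiCall {
    ApiCall { method: "DELETE", path: format!("/v2/droplets/{}", id), body: None }
}

pub fn get_server_request(id: u64) -> ApiCall {
    ApiCall { method: "GET", path: format!("/v2/droplets/{}", id), body: None }
}

pub fn list_servers_request() -> ApiCall {
    ApiCall { method: "GET", path: "/v2/droplets".to_string(), body: None }
}
```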
9.3.2 Fix Hetzner plan validation
File: control_plane/src/providers/mod.rs — validate_plan (line ~134)
Correct the RAM mapping:
- CX11: 2GB (not 4GB)
- CX21: 4GB (not 8GB)
- CX31: 8GB
- CX41: 16GB
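A table-driven version of the corrected mapping, keeping the RAM values in one `match` that `test_hetzner_plan_validation` can assert against. The `validate_plan(plan, min_ram_gb)` signature is hypothetical, since the existing function's parameters are not shown here. Plan names are lowercased before matching because the Hetzner API spells server types in lowercase (`cx11`) while this document uses uppercase.

```rust
/// RAM in GB for the Hetzner CX plans listed above; None for unknown plans.
pub fn hetzner_ram_gb(plan: &str) -> Option<u32> {
    match plan.to_ascii_lowercase().as_str() {
        "cx11" => Some(2),
        "cx21" => Some(4),
        "cx31" => Some(8),
        "cx41" => Some(16),
        _ => None,
    }
}

/// Hypothetical shape: rejects unknown plans and plans below a RAM floor,
/// returning the plan's RAM when it is acceptable.
pub fn validate_plan(plan: &str, min_ram_gb: u32) -> Result<u32, String> {
    match hetzner_ram_gb(plan) {
        None => Err(format!("unknown Hetzner plan: {}", plan)),
        Some(ram) if ram < min_ram_gb => Err(format!(
            "{} has {} GB RAM, need at least {} GB",
            plan, ram, min_ram_gb
        )),
        Some(ram) => Ok(ram),
    }
}
```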
9.3.3 Add pagination to Hetzner list_servers
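The Hetzner Cloud API returns 25 results per page by default and at most 50 per page. Implement pagination by following `meta.pagination.next_page` until it is `null`:
```rust
let mut all_servers = Vec::new();
let mut page = 1;
loop {
    // `api_token` (assumed name) holds HETZNER_API_TOKEN; Hetzner Cloud uses Bearer auth.
    let resp = client
        .get(format!("{}/servers?page={}&per_page=50", api_url, page))
        .bearer_auth(&api_token)
        .send().await?
        .error_for_status()?;
    let page_data: HetznerListResponse = resp.json().await?;
    all_servers.extend(page_data.servers);
    // `next_page` is null on the last page.
    match page_data.meta.pagination.next_page {
        Some(next) => page = next,
        None => break,
    }
}
```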
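The same loop with the HTTP call factored out into a closure, so `test_hetzner_pagination` can feed it canned pages without a mock server. `fetch_all` and its `max_pages` guard are hypothetical additions; the guard stops an API response (or a bug) that never reports a last page from looping forever.

```rust
/// Collects every item from a paginated API. `fetch_page(page)` returns the
/// items on that page plus the next page number, or None on the last page,
/// mirroring Hetzner's `meta.pagination.next_page`.
pub fn fetch_all<T>(
    mut fetch_page: impl FnMut(u32) -> Result<(Vec<T>, Option<u32>), String>,
    max_pages: u32,
) -> Result<Vec<T>, String> {
    let mut all = Vec::new();
    let mut page = 1;
    for _ in 0..max_pages {
        let (items, next) = fetch_page(page)?;
        all.extend(items);
        match next {
            Some(n) => page = n,
            None => return Ok(all),
        }
    }
    Err(format!("gave up after {} pages; pagination never ended", max_pages))
}
```

With 120 servers and `per_page=50`, the closure is called three times (50, 50, then 20 items).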
Completion Requirements
This milestone is not complete until every item below is satisfied.
1. Full Test Suite — All Green
- `cargo test --workspace` passes with zero failures.
- All pre-existing tests still pass (no regressions).
- New tests are written for the consolidated control plane:
| Test | Location | What it validates |
|---|---|---|
| `test_list_servers` | `control_plane/src/server_manager.rs` | `GET /platform/v1/servers` returns the server list |
| `test_create_server_hetzner` | `control_plane/src/providers/hetzner.rs` | `provision_server` sends the correct API payload (mock HTTP) |
| `test_delete_server_hetzner` | `control_plane/src/providers/hetzner.rs` | `remove_server` sends `DELETE` to the correct API endpoint (mock HTTP) |
| `test_create_server_digitalocean` | `control_plane/src/providers/digitalocean.rs` | `provision_server` sends the correct Droplet payload (mock HTTP) |
| `test_hetzner_plan_validation` | `control_plane/src/providers/hetzner.rs` | Correct RAM mapping: CX11=2GB, CX21=4GB, CX31=8GB, CX41=16GB |
| `test_hetzner_pagination` | `control_plane/src/providers/hetzner.rs` | `list_servers` paginates through multiple pages |
| `test_cluster_health_real_metrics` | `control_plane/src/lib.rs` | Health endpoint queries VictoriaMetrics (mock) and returns real CPU/memory values |
| `test_sql_parameter_binding` | `control_plane/src/lib.rs` | Every query uses positional `$N` placeholders with a matching `.bind()` for each index, not string interpolation |
| `test_admin_auth_on_server_routes` | `control_plane/src/lib.rs` | `GET /platform/v1/servers` without admin auth returns 401 |
| `test_old_control_plane_api_removed` | workspace | `control-plane-api` is not in the `Cargo.toml` workspace members |
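A dependency-free helper that `test_old_control_plane_api_removed` could use. `workspace_members` is hypothetical and deliberately naive: it assumes `members` is a plain array of quoted strings with no comments inside, so the `toml` crate would be the robust choice. A real test would read the file with `std::fs::read_to_string` using a path built from `env!("CARGO_MANIFEST_DIR")`; the relative path to the workspace root depends on the repository layout.

```rust
/// Returns the entries of the workspace `members = [...]` array.
pub fn workspace_members(cargo_toml: &str) -> Vec<String> {
    // Find the line that starts the `members` key (not `default-members`).
    let mut offset = 0;
    let mut start = None;
    for line in cargo_toml.split_inclusive('\n') {
        let t = line.trim_start();
        if let Some(after) = t.strip_prefix("members") {
            if after.trim_start().starts_with('=') {
                start = Some(offset);
                break;
            }
        }
        offset += line.len();
    }
    let Some(start) = start else { return Vec::new() };
    let rest = &cargo_toml[start..];
    let (Some(open), Some(close)) = (rest.find('['), rest.find(']')) else {
        return Vec::new();
    };
    if close < open {
        return Vec::new();
    }
    rest[open + 1..close]
        .split(',')
        .map(|s| s.trim().trim_matches('"').to_string())
        .filter(|s| !s.is_empty())
        .collect()
}
```

Members are compared as written, so a path-style entry such as `crates/control-plane-api` would need a path comparison rather than equality.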
2. Integration Verification
- All `/platform/v1/*` routes work through the consolidated control plane.
- Server provisioning creates a real Hetzner VM (integration test with API key).
- Server removal destroys the VM.
- Cluster health returns real CPU/memory metrics (not hardcoded).
- The old `control-plane-api` binary is no longer needed and has been removed from the workspace.
- Admin auth protects all server management routes.
- Scaling operations are recorded in the `scaling_operations` table.
3. CI Gate
- All unit tests (with mocked HTTP) run in `cargo test --workspace`.
- Integration tests against real cloud providers are gated behind `#[ignore]` and require the `HETZNER_API_TOKEN` / `DO_API_TOKEN` env vars.
- `cargo build --workspace` succeeds without the old `control-plane-api` crate.
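One way to wire the gate, assuming the standard libtest harness: integration tests carry `#[ignore]`, so `cargo test --workspace` skips them, and a separate CI job that has the secrets runs `cargo test --workspace -- --ignored`. `require_token` and the test name are hypothetical, and the test body is only an outline; a DigitalOcean test would follow the same pattern with `DO_API_TOKEN`.

```rust
/// Reads a provider token for a gated integration test, failing with a clear
/// message instead of an opaque 401 from the provider when it is missing.
pub fn require_token(var: &str) -> Result<String, String> {
    match std::env::var(var) {
        Ok(v) if !v.trim().is_empty() => Ok(v),
        _ => Err(format!("set {} to run this #[ignore]d integration test", var)),
    }
}

// Skipped by `cargo test --workspace`; run explicitly with
// `cargo test --workspace -- --ignored` in a job that has the secrets.
#[test]
#[ignore = "creates a billable Hetzner VM; needs HETZNER_API_TOKEN"]
fn hetzner_provision_and_remove_roundtrip() {
    let token = require_token("HETZNER_API_TOKEN").unwrap();
    // Hypothetical outline: build the consolidated ServerManager with a Hetzner
    // provider using `token`, call provision_server and assert the status is
    // "active", then call remove_server and assert list_servers no longer
    // returns the VM. Clean up the VM even if an assertion fails.
    let _ = token;
}
```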