madbase/_milestones/M9_control_plane_consolidation.md
Vlad Durnea cffdf8af86
wip:milestone 0 fixes
2026-03-15 12:35:42 +02:00

Milestone 9: Control Plane Consolidation

Goal: One control plane, one API, one source of truth for project and infrastructure management.

Depends on: M0 (Security), M1 (Foundation), M7 (CI/CD)


9.1 — Merge the Two Control Planes

Current state

There are two parallel control plane implementations:

| Aspect | In-gateway control_plane/ | Standalone control-plane-api/ |
|---|---|---|
| Binary | Part of control binary | Separate control-plane-api binary |
| Auth | Admin cookie (broken, fixed in M0) | None |
| API prefix | /platform/v1/* | /api/v1/* |
| Features | Project CRUD, user mgmt, key rotation, DB browser | Server provisioning, scaling, health, templates |
| Database | Control DB (projects table) | Separate DB (servers, scaling_operations tables) |
| UI | web/admin.html (Vue) | control-plane-ui/ (React/MUI) |

Merge control-plane-api server management into the gateway's control mode:

  1. Move server management routes from control-plane-api/src/lib.rs to control_plane/src/lib.rs under /platform/v1/servers, /platform/v1/scaling, etc.

  2. Move the ServerManager from control-plane-api/src/server_manager.rs into a new control_plane/src/server_manager.rs.

  3. Move provider code from control-plane-api/src/providers/ into control_plane/src/providers/.

  4. Consolidate the database schema. Merge the control-plane-api/migrations/001_initial.sql tables (servers, scaling_operations, cluster_events, server_metrics) into the main migrations directory.

  5. Deprecate the standalone binary. Remove control-plane-api from Cargo.toml workspace members. Keep the React UI if desired, but point it at the consolidated API.

  6. Use the admin auth (fixed in M0) for all server management routes.

Migration steps

# 1. Copy server management code
cp control-plane-api/src/server_manager.rs control_plane/src/
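# No trailing slash on the source: BSD/macOS cp would otherwise copy the directory's contents, not the directory
cp -r control-plane-api/src/providers control_plane/src/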
cp control-plane-api/src/templates.rs control_plane/src/
cp control-plane-api/src/docker.rs control_plane/src/

# 2. Copy and merge migrations
cp control-plane-api/migrations/001_initial.sql migrations/20260320000000_server_management.sql

# 3. Update control_plane/src/lib.rs to add new routes
# 4. Update control_plane/Cargo.toml for new dependencies (reqwest, ssh2, etc.)
# 5. Remove control-plane-api from workspace

9.2 — Fix Server Provisioning

9.2.1 Implement provision_server

The current provision_server in server_manager.rs is a no-op. Wire it up:

  1. Call provider.create_server() to create the VM
  2. Wait for the VM to be reachable via SSH
  3. Run bootstrap script (install Docker, pull images, configure services)
  4. Register the server with the cluster
  5. Update server status to "active"
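
The five steps above can be sketched as a small orchestration function. This is a minimal, synchronous sketch under stated assumptions, not the project's implementation: the `ProvisionSteps` trait, the `FakeSteps` test double, and every `ServerStatus` variant other than `Active` are hypothetical names introduced here, and the real `provision_server` would presumably be async and talk to the provider, SSH, and the database.

```rust
/// Status values written to the server row. Only "active" comes from the plan;
/// the other variants are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerStatus {
    Creating,
    Bootstrapping,
    Registering,
    Active,
    Failed,
}

/// Hypothetical seam over the side effects of provisioning, so the ordering
/// can be tested without a cloud account.
pub trait ProvisionSteps {
    /// Creates the VM and returns its IP address.
    fn create_server(&mut self, name: &str) -> Result<String, String>;
    fn wait_for_ssh(&mut self, ip: &str) -> Result<(), String>;
    fn run_bootstrap(&mut self, ip: &str) -> Result<(), String>;
    fn register_with_cluster(&mut self, ip: &str) -> Result<(), String>;
    fn set_status(&mut self, status: ServerStatus);
}

/// Runs the steps in order; records `Active` on success and `Failed` otherwise.
pub fn provision_server<S: ProvisionSteps>(steps: &mut S, name: &str) -> Result<String, String> {
    steps.set_status(ServerStatus::Creating);
    let result = run_steps(steps, name);
    steps.set_status(if result.is_ok() { ServerStatus::Active } else { ServerStatus::Failed });
    result
}

fn run_steps<S: ProvisionSteps>(steps: &mut S, name: &str) -> Result<String, String> {
    let ip = steps.create_server(name)?;
    steps.wait_for_ssh(&ip)?;
    steps.set_status(ServerStatus::Bootstrapping);
    steps.run_bootstrap(&ip)?;
    steps.set_status(ServerStatus::Registering);
    steps.register_with_cluster(&ip)?;
    Ok(ip)
}

/// In-memory fake that logs each call and can fail at the bootstrap step.
#[derive(Default)]
pub struct FakeSteps {
    pub log: Vec<&'static str>,
    pub fail_bootstrap: bool,
    pub status: Option<ServerStatus>,
}

impl ProvisionSteps for FakeSteps {
    fn create_server(&mut self, _name: &str) -> Result<String, String> {
        self.log.push("create");
        Ok("10.0.0.5".to_string())
    }
    fn wait_for_ssh(&mut self, _ip: &str) -> Result<(), String> {
        self.log.push("ssh");
        Ok(())
    }
    fn run_bootstrap(&mut self, _ip: &str) -> Result<(), String> {
        self.log.push("bootstrap");
        if self.fail_bootstrap { Err("bootstrap failed".to_string()) } else { Ok(()) }
    }
    fn register_with_cluster(&mut self, _ip: &str) -> Result<(), String> {
        self.log.push("register");
        Ok(())
    }
    fn set_status(&mut self, status: ServerStatus) {
        self.status = Some(status);
    }
}
```

Note that a failure after `create_server` leaves a VM behind in this sketch; a real implementation would likely need to decide whether to destroy it or keep it for debugging.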

9.2.2 Implement remove_server

  1. Drain the server (remove from load balancer, wait for in-flight requests)
  2. Stop services
  3. Call provider.delete_server() to destroy the VM
  4. Remove from database
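
The drain step is the part of removal most likely to hang, so it helps to bound it. The sketch below polls an in-flight request counter until it reaches zero or a deadline passes. `wait_for_drain`, `DrainOutcome`, and the idea of a pollable in-flight counter are assumptions for illustration; the plan does not say how in-flight requests are observed, and async code would use a non-blocking sleep instead of `std::thread::sleep`.

```rust
use std::time::{Duration, Instant};

/// Result of waiting for a server to finish its in-flight requests.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DrainOutcome {
    /// No requests remain; it is safe to stop services.
    Drained,
    /// The deadline passed with requests still running.
    TimedOut { remaining: u32 },
}

/// Polls `in_flight` until it reports zero or `timeout` elapses.
/// `in_flight` is a hypothetical probe, e.g. a load balancer or metrics query.
pub fn wait_for_drain<F>(mut in_flight: F, timeout: Duration, poll: Duration) -> DrainOutcome
where
    F: FnMut() -> u32,
{
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = in_flight();
        if remaining == 0 {
            return DrainOutcome::Drained;
        }
        if Instant::now() >= deadline {
            return DrainOutcome::TimedOut { remaining };
        }
        std::thread::sleep(poll);
    }
}
```

On `TimedOut`, the caller can choose between aborting the removal and forcing it; either way, the database row should only be removed after `provider.delete_server()` has succeeded, matching the step order above.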

9.2.3 Fix SQL parameter binding

File: server_manager.rs. Search for $2 and verify that every query has exactly one .bind() call per placeholder. The known bugs:

  • Line ~595: WHERE id = $2 with only one .bind(operation_id) → the placeholder should be $1
  • Line ~610: Same issue ($2 with a single bind; should be $1)
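
The placeholder check can be automated with a small helper that could back a test such as test_sql_parameter_binding. This is a rough sketch: `max_placeholder` and `binds_match` are hypothetical helpers, and the scan is deliberately naive (it does not skip string literals or dollar-quoted bodies, and it does not detect gaps such as $1 and $3 without $2).

```rust
/// Returns the highest `$N` placeholder number found in `sql`, or 0 if none.
/// Naive scan: `$N` inside string literals is counted too.
pub fn max_placeholder(sql: &str) -> u32 {
    let bytes = sql.as_bytes();
    let mut max: u32 = 0;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'$' {
            // Collect the run of ASCII digits that follows the dollar sign.
            let start = i + 1;
            let mut end = start;
            while end < bytes.len() && bytes[end].is_ascii_digit() {
                end += 1;
            }
            if end > start {
                if let Ok(n) = sql[start..end].parse::<u32>() {
                    max = max.max(n);
                }
            }
            i = end;
        } else {
            i += 1;
        }
    }
    max
}

/// True when the number of `.bind()` calls equals the highest placeholder.
pub fn binds_match(sql: &str, bind_count: usize) -> bool {
    max_placeholder(sql) as usize == bind_count
}
```

For the bug described above, a query ending in `WHERE id = $2` with a single bind fails the check, and the same query rewritten with `$1` passes.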

9.2.4 Real health data

Replace hardcoded cluster_health() and get_pillar_stats() with queries to VictoriaMetrics:

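async fn get_pillar_stats(&self) -> Result<PillarStats> {
    let vm_url = std::env::var("VICTORIA_METRICS_URL")?;
    let client = reqwest::Client::new();

    // Pass the PromQL expression via .query() so reqwest URL-encodes it.
    let resp = client
        .get(format!("{}/api/v1/query", vm_url))
        .query(&[("query", "avg(rate(process_cpu_seconds_total[5m]))")])
        .send()
        .await?
        .error_for_status()?;
    let body: serde_json::Value = resp.json().await?;
    // Prometheus instant-query format: data.result[0].value is [timestamp, "<value>"].
    let cpu = body["data"]["result"][0]["value"][1].as_str().and_then(|v| v.parse::<f64>().ok());
    todo!("map cpu = {:?} and the remaining metrics into PillarStats", cpu)
}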

9.3 — Multi-Provider

9.3.1 DigitalOcean provider

File: control_plane/src/providers/digitalocean.rs

Implement using the DigitalOcean API v2:

  • create_server: POST /v2/droplets
  • delete_server: DELETE /v2/droplets/{id}
  • get_server: GET /v2/droplets/{id}
  • list_servers: GET /v2/droplets
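
The endpoint mapping above can be captured in one place so each provider method only has to pick an operation. This is an illustrative sketch: `DropletOp` and `DO_API_BASE` are hypothetical names, and the request bodies and authentication (a bearer token in the real API) are left out.

```rust
/// Base URL of the DigitalOcean API.
pub const DO_API_BASE: &str = "https://api.digitalocean.com";

/// The four provider operations and the Droplet ID they act on, if any.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DropletOp {
    Create,
    Delete(u64),
    Get(u64),
    List,
}

impl DropletOp {
    /// HTTP method for the operation.
    pub fn method(self) -> &'static str {
        match self {
            DropletOp::Create => "POST",
            DropletOp::Delete(_) => "DELETE",
            DropletOp::Get(_) | DropletOp::List => "GET",
        }
    }

    /// Full request URL for the operation.
    pub fn url(self) -> String {
        match self {
            DropletOp::Create | DropletOp::List => format!("{}/v2/droplets", DO_API_BASE),
            DropletOp::Delete(id) | DropletOp::Get(id) => {
                format!("{}/v2/droplets/{}", DO_API_BASE, id)
            }
        }
    }
}
```

Keeping the method and path together makes the mock-HTTP tests straightforward: a test can assert on `method()` and `url()` without issuing a request.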

9.3.2 Fix Hetzner plan validation

File: control_plane/src/providers/mod.rs, validate_plan (line ~134)

Correct the RAM mapping:

  • CX11: 2GB (not 4GB)
  • CX21: 4GB (not 8GB)
  • CX31: 8GB
  • CX41: 16GB
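
A lookup table keeps the corrected mapping in one place. The sketch below is not the existing validate_plan, whose signature is not shown here; `hetzner_plan_ram_gb` and `check_plan_ram` are hypothetical helpers that only encode the four values listed above.

```rust
/// RAM in GB for the Hetzner plans listed in this milestone.
/// Returns `None` for plans that are not in the table.
pub fn hetzner_plan_ram_gb(plan: &str) -> Option<u32> {
    match plan.trim().to_ascii_lowercase().as_str() {
        "cx11" => Some(2),
        "cx21" => Some(4),
        "cx31" => Some(8),
        "cx41" => Some(16),
        _ => None,
    }
}

/// Rejects unknown plans and plans with less RAM than `required_gb`.
pub fn check_plan_ram(plan: &str, required_gb: u32) -> Result<(), String> {
    match hetzner_plan_ram_gb(plan) {
        None => Err(format!("unknown Hetzner plan: {}", plan)),
        Some(ram) if ram < required_gb => Err(format!(
            "{} has {}GB RAM, {}GB required",
            plan, ram, required_gb
        )),
        Some(_) => Ok(()),
    }
}
```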

9.3.3 Add pagination to Hetzner list_servers

The Hetzner API returns 25 results per page by default and at most 50 when per_page is set. Implement pagination:

let mut all_servers = Vec::new();
let mut page = 1;
loop {
    let resp = client.get(&format!("{}/servers?page={}&per_page=50", api_url, page))...;
    let page_data: HetznerListResponse = resp.json().await?;
    all_servers.extend(page_data.servers);
    if page_data.meta.pagination.next_page.is_none() { break; }
    page += 1;
}
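
To make the loop testable without HTTP (as test_hetzner_pagination needs), the page fetch can be passed in as a closure. This is a sketch under assumptions: `Page` and `fetch_all` are hypothetical names, and `next_page` stands in for `meta.pagination.next_page` from the response. The loop also stops if the server returns a `next_page` that does not advance, which guards against spinning forever.

```rust
/// One page of results plus the number of the next page, if any.
pub struct Page<T> {
    pub items: Vec<T>,
    pub next_page: Option<u32>,
}

/// Collects every item by calling `fetch_page` with page numbers starting at 1.
/// Stops when `next_page` is `None` or does not move past the current page.
pub fn fetch_all<T, E, F>(mut fetch_page: F) -> Result<Vec<T>, E>
where
    F: FnMut(u32) -> Result<Page<T>, E>,
{
    let mut all = Vec::new();
    let mut page: u32 = 1;
    loop {
        let data = fetch_page(page)?;
        all.extend(data.items);
        match data.next_page {
            Some(next) if next > page => page = next,
            _ => break,
        }
    }
    Ok(all)
}
```

In production code the closure would wrap the HTTP request; in a unit test it can serve canned pages from a vector.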

Completion Requirements

This milestone is not complete until every item below is satisfied.

1. Full Test Suite — All Green

  • cargo test --workspace passes with zero failures
  • All pre-existing tests still pass (no regressions)
  • New tests are written for the consolidated control plane:
| Test | Location | What it validates |
|---|---|---|
| test_list_servers | control_plane/src/server_manager.rs | GET /platform/v1/servers returns server list |
| test_create_server_hetzner | control_plane/src/providers/hetzner.rs | provision_server sends correct API payload (mock HTTP) |
| test_delete_server_hetzner | control_plane/src/providers/hetzner.rs | remove_server sends DELETE to correct API endpoint (mock HTTP) |
| test_create_server_digitalocean | control_plane/src/providers/digitalocean.rs | provision_server sends correct Droplet payload (mock HTTP) |
| test_hetzner_plan_validation | control_plane/src/providers/hetzner.rs | Correct RAM mapping: CX11=2GB, CX21=4GB, CX31=8GB |
| test_hetzner_pagination | control_plane/src/providers/hetzner.rs | list_servers paginates through multiple pages |
| test_cluster_health_real_metrics | control_plane/src/lib.rs | Health endpoint queries VictoriaMetrics (mock) and returns real CPU/mem |
| test_sql_parameter_binding | control_plane/src/lib.rs | All queries use $1 binding, not string interpolation |
| test_admin_auth_on_server_routes | control_plane/src/lib.rs | GET /platform/v1/servers without admin auth returns 401 |
| test_old_control_plane_api_removed | workspace | control-plane-api is not in Cargo.toml workspace members |

2. Integration Verification

  • All /platform/v1/* routes work through the consolidated control plane
  • Server provisioning creates a real Hetzner VM (integration test with API key)
  • Server removal destroys the VM
  • Cluster health returns real CPU/memory metrics (not hardcoded)
  • The old control-plane-api binary is no longer needed and has been removed from the workspace
  • Admin auth protects all server management routes
  • Scaling operations are recorded in the scaling_operations table

3. CI Gate

  • All unit tests (with mocked HTTP) run in cargo test --workspace
  • Integration tests against real cloud providers are gated behind #[ignore] and require HETZNER_API_TOKEN / DO_API_TOKEN env vars
  • cargo build --workspace succeeds without the old control-plane-api crate
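
The gating for live provider tests might look like the following. This is a sketch with hypothetical names (`normalize_token`, `provider_token`, and the test itself); only the `#[ignore]` attribute and the HETZNER_API_TOKEN variable come from the requirements above. Ignored tests are skipped by a plain `cargo test --workspace` and run only with `cargo test -- --ignored`.

```rust
use std::env;

/// Treats missing and blank values alike, and trims surrounding whitespace.
pub fn normalize_token(raw: Option<String>) -> Option<String> {
    raw.map(|v| v.trim().to_string()).filter(|v| !v.is_empty())
}

/// Reads a provider token such as HETZNER_API_TOKEN or DO_API_TOKEN.
pub fn provider_token(var: &str) -> Option<String> {
    normalize_token(env::var(var).ok())
}

#[cfg(test)]
mod live_provider_tests {
    use super::provider_token;

    #[test]
    #[ignore = "hits the real Hetzner API; run with `cargo test -- --ignored`"]
    fn provisions_and_removes_real_hetzner_vm() {
        // Fail loudly if someone opts in without supplying credentials.
        let token = provider_token("HETZNER_API_TOKEN")
            .expect("HETZNER_API_TOKEN must be set for live provider tests");
        assert!(!token.is_empty());
        // The provision and remove calls against the real API would go here.
    }
}
```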