# Milestone 9: Control Plane Consolidation

**Goal:** One control plane, one API, one source of truth for project and infrastructure management.

**Depends on:** M0 (Security), M1 (Foundation), M7 (CI/CD)

---

## 9.1 — Merge the Two Control Planes

### Current state

There are two parallel control plane implementations:

| | In-gateway `control_plane/` | Standalone `control-plane-api/` |
|---|---|---|
| **Binary** | Part of `control` binary | Separate `control-plane-api` binary |
| **Auth** | Admin cookie (broken, fixed in M0) | None |
| **API prefix** | `/platform/v1/*` | `/api/v1/*` |
| **Features** | Project CRUD, user mgmt, key rotation, DB browser | Server provisioning, scaling, health, templates |
| **Database** | Control DB (projects table) | Separate DB (servers, scaling_operations tables) |
| **UI** | `web/admin.html` (Vue) | `control-plane-ui/` (React/MUI) |

### Recommended approach

Merge `control-plane-api` server management into the gateway's control mode:

1. **Move server management routes** from `control-plane-api/src/lib.rs` to `control_plane/src/lib.rs` under `/platform/v1/servers`, `/platform/v1/scaling`, etc.

2. **Move the `ServerManager`** from `control-plane-api/src/server_manager.rs` into a new `control_plane/src/server_manager.rs`.

3. **Move provider code** from `control-plane-api/src/providers/` into `control_plane/src/providers/`.

4. **Consolidate the database schema.** Merge the `control-plane-api/migrations/001_initial.sql` tables (`servers`, `scaling_operations`, `cluster_events`, `server_metrics`) into the main migrations directory.

5. **Deprecate the standalone binary.** Remove `control-plane-api` from the `Cargo.toml` workspace members. Keep the React UI if desired, but point it at the consolidated API: its calls move from `/api/v1/*` to `/platform/v1/*` and must send admin credentials.

6. **Use the admin auth** (fixed in M0) for all server management routes.
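
A minimal, framework-agnostic sketch of item 6. It assumes the router can consult a hypothetical `reject_status` check before dispatching to a server-management handler. In the real gateway, this role would be filled by the M0 admin-auth middleware attached to the moved routes. The prefix list is illustrative and covers only the two prefixes named in item 1.

```rust
/// Server-management prefixes that require an admin session after consolidation.
/// Illustrative list: extend it with every route moved from control-plane-api.
const ADMIN_ONLY_PREFIXES: &[&str] = &["/platform/v1/servers", "/platform/v1/scaling"];

/// True if `path` is an admin-only prefix itself or a sub-path of one.
pub fn requires_admin(path: &str) -> bool {
    ADMIN_ONLY_PREFIXES.iter().any(|&prefix| {
        path == prefix
            || path
                .strip_prefix(prefix)
                .map_or(false, |rest| rest.starts_with('/'))
    })
}

/// `Some(401)` when an admin-only route is hit without an admin session;
/// `None` means the request may continue to its handler.
pub fn reject_status(path: &str, has_admin_session: bool) -> Option<u16> {
    if requires_admin(path) && !has_admin_session {
        Some(401)
    } else {
        None
    }
}
```

Matching on a `/` boundary covers sub-paths such as `/platform/v1/servers/42` without also capturing an unrelated route that merely shares the prefix string.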

### Migration steps

```bash
# 1. Copy server management code
cp control-plane-api/src/server_manager.rs control_plane/src/
# No trailing slash on the source: BSD/macOS cp would copy the directory's contents instead of the directory
cp -r control-plane-api/src/providers control_plane/src/
cp control-plane-api/src/templates.rs control_plane/src/
cp control-plane-api/src/docker.rs control_plane/src/

# 2. Copy and merge migrations
cp control-plane-api/migrations/001_initial.sql migrations/20260320000000_server_management.sql

# 3. Update control_plane/src/lib.rs: declare the new modules (server_manager, providers, templates, docker) and add the new routes
# 4. Update control_plane/Cargo.toml for new dependencies (reqwest, ssh2, etc.)
# 5. Remove control-plane-api from the workspace members in Cargo.toml
```

---

## 9.2 — Fix Server Provisioning

### 9.2.1 Implement `provision_server`

The current `provision_server` in `server_manager.rs` is a no-op. Wire it up:

1. Call `provider.create_server()` to create the VM
2. Wait for the VM to be reachable via SSH
3. Run the bootstrap script (install Docker, pull images, configure services)
4. Register the server with the cluster
5. Update server status to "active"
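
The five steps above can be sketched as a synchronous, std-only skeleton. `ProvisionSteps` is a hypothetical trait that stands in for the provider, SSH, bootstrap, and database calls; the real `server_manager.rs` code is async. The SSH wait is a bounded polling loop, so a VM that never comes up ends in `Failed` instead of hanging.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerStatus {
    Provisioning,
    Active,
    Failed,
}

/// Hypothetical seams for the real provider, SSH, bootstrap, and DB calls.
pub trait ProvisionSteps {
    /// Step 1: create the VM; returns its public IP.
    fn create_server(&mut self) -> Result<String, String>;
    /// Step 2 probe: can an SSH session be opened yet?
    fn ssh_reachable(&mut self, ip: &str) -> bool;
    /// Step 3: install Docker, pull images, configure services.
    fn bootstrap(&mut self, ip: &str) -> Result<(), String>;
    /// Step 4: register the server with the cluster.
    fn register(&mut self, ip: &str) -> Result<(), String>;
    /// Persist the server's status row.
    fn set_status(&mut self, status: ServerStatus);
}

/// Runs steps 1-4, then marks the server Active (step 5) or Failed.
pub fn provision_server<S: ProvisionSteps>(
    s: &mut S,
    ssh_timeout: Duration,
    poll_every: Duration,
) -> Result<(), String> {
    s.set_status(ServerStatus::Provisioning);
    let result = run_steps(s, ssh_timeout, poll_every);
    s.set_status(if result.is_ok() { ServerStatus::Active } else { ServerStatus::Failed });
    result
}

fn run_steps<S: ProvisionSteps>(
    s: &mut S,
    ssh_timeout: Duration,
    poll_every: Duration,
) -> Result<(), String> {
    let ip = s.create_server()?;
    // Poll SSH until it answers or the deadline passes.
    let deadline = Instant::now() + ssh_timeout;
    while !s.ssh_reachable(&ip) {
        if Instant::now() >= deadline {
            return Err(format!("{ip} not reachable over SSH within {ssh_timeout:?}"));
        }
        std::thread::sleep(poll_every);
    }
    s.bootstrap(&ip)?;
    s.register(&ip)
}
```

One open design choice: on failure, this sketch leaves the VM running. The real implementation should decide whether to call `provider.delete_server()` so orphaned VMs are not left billing.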

### 9.2.2 Implement `remove_server`

1. Drain the server (remove it from the load balancer, wait for in-flight requests to finish)
2. Stop services
3. Call `provider.delete_server()` to destroy the VM
4. Remove the server from the database
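
A std-only sketch of the removal sequence. It assumes each step is exposed as a fallible callback; the `RemovalStep` enum and the `remove_server` signature are hypothetical. Stopping at the first error means the database row survives if the provider refuses to delete the VM, so the server stays visible in the control plane. `wait_for_drain` models "wait for in-flight requests" as an atomic counter, polled until it reaches zero or a timeout expires.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::{Duration, Instant};

/// The four removal steps, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemovalStep {
    Drain,
    StopServices,
    DeleteVm,
    DeleteDbRow,
}

pub const REMOVAL_ORDER: [RemovalStep; 4] = [
    RemovalStep::Drain,
    RemovalStep::StopServices,
    RemovalStep::DeleteVm,
    RemovalStep::DeleteDbRow,
];

/// Runs each step via `run`, stopping at the first failure and reporting
/// which step failed.
pub fn remove_server<F>(mut run: F) -> Result<(), (RemovalStep, String)>
where
    F: FnMut(RemovalStep) -> Result<(), String>,
{
    for &step in REMOVAL_ORDER.iter() {
        run(step).map_err(|e| (step, e))?;
    }
    Ok(())
}

/// Returns true once `in_flight` reaches zero, false if `timeout` elapses first.
pub fn wait_for_drain(in_flight: &AtomicUsize, timeout: Duration, poll_every: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if in_flight.load(Ordering::SeqCst) == 0 {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(poll_every);
    }
}
```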

### 9.2.3 Fix SQL parameter binding

**File:** `server_manager.rs`. Search for `$2` (and higher placeholders) and verify that each query has exactly one `.bind()` call per placeholder. Known bugs:
- Line ~595: `WHERE id = $2` with only one `.bind(operation_id)` → should be `$1`
- Line ~610: same issue
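
A rough check that could back `test_sql_parameter_binding`. It finds the highest `$N` placeholder in a query string and compares it with the number of `.bind()` calls. The helper names are hypothetical. The scanner is deliberately naive: it does not skip string literals, comments, or `$$` dollar-quoted bodies.

```rust
/// Highest `$N` placeholder number in `sql` (0 if there are none).
/// Naive: does not skip string literals, comments, or `$$` dollar-quoted bodies.
pub fn max_placeholder(sql: &str) -> usize {
    let bytes = sql.as_bytes();
    let mut highest: usize = 0;
    let mut i: usize = 0;
    while i < bytes.len() {
        if bytes[i] != b'$' {
            i += 1;
            continue;
        }
        // Collect the run of ASCII digits directly after the `$`.
        let start = i + 1;
        let mut end = start;
        while end < bytes.len() && bytes[end].is_ascii_digit() {
            end += 1;
        }
        if end > start {
            // ASCII-only slice, so the byte offsets are valid char boundaries.
            highest = highest.max(sql[start..end].parse::<usize>().unwrap_or(0));
        }
        i = end; // end >= i + 1, so the scan always advances
    }
    highest
}

/// Err if the placeholder count and the `.bind()` count disagree.
pub fn check_binds(sql: &str, bind_count: usize) -> Result<(), String> {
    let needed = max_placeholder(sql);
    if needed == bind_count {
        Ok(())
    } else {
        Err(format!("query uses up to ${needed} but has {bind_count} .bind() call(s)"))
    }
}
```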
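
### 9.2.4 Real health data

Replace the hardcoded `cluster_health()` and `get_pillar_stats()` with queries to VictoriaMetrics, which serves the Prometheus-compatible `/api/v1/query` endpoint. The example assumes `Result` is `anyhow::Result` and that reqwest's `json` feature and `serde_json` are available:

```rust
async fn get_pillar_stats(&self) -> Result<PillarStats> {
    let vm_url = std::env::var("VICTORIA_METRICS_URL")?;
    let client = reqwest::Client::new();

    // .query() URL-encodes the PromQL brackets and parentheses.
    let resp: serde_json::Value = client.get(format!("{vm_url}/api/v1/query"))
        .query(&[("query", "avg(rate(process_cpu_seconds_total[5m]))")])
        .send().await?.error_for_status()?.json().await?;

    // Instant-query result: data.result[0].value = [<unix ts>, "<sample as string>"]
    let cpu: f64 = resp["data"]["result"][0]["value"][1].as_str()
        .ok_or_else(|| anyhow::anyhow!("no CPU sample in VictoriaMetrics response"))?
        .parse()?;
    // Query memory the same way, then build PillarStats from the real values.
    todo!("build PillarStats from cpu = {cpu} and the other metrics")
}
```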

---

## 9.3 — Multi-Provider

### 9.3.1 DigitalOcean provider

**File:** `control_plane/src/providers/digitalocean.rs`

Implement using the DigitalOcean API v2:
- `create_server`: `POST /v2/droplets`
- `delete_server`: `DELETE /v2/droplets/{id}`
- `get_server`: `GET /v2/droplets/{id}`
- `list_servers`: `GET /v2/droplets`
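
A std-only sketch of the method/URL mapping for the four operations. It assumes the public base URL `https://api.digitalocean.com/v2` and integer Droplet IDs. `DropletOp` and `droplet_request` are hypothetical names. The real provider would send these requests with an HTTP client and an `Authorization: Bearer` header carrying the DigitalOcean token. For `POST /v2/droplets`, it would also send a JSON body with at least `name`, `region`, `size`, and `image`.

```rust
/// DigitalOcean API v2 base URL.
pub const DO_API_BASE: &str = "https://api.digitalocean.com/v2";

/// The four provider operations (hypothetical enum; `id` is the Droplet ID).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DropletOp {
    Create,
    Delete { id: u64 },
    Get { id: u64 },
    List,
}

/// Maps an operation to its HTTP method and absolute URL.
pub fn droplet_request(op: DropletOp) -> (&'static str, String) {
    match op {
        DropletOp::Create => ("POST", format!("{}/droplets", DO_API_BASE)),
        DropletOp::Delete { id } => ("DELETE", format!("{}/droplets/{}", DO_API_BASE, id)),
        DropletOp::Get { id } => ("GET", format!("{}/droplets/{}", DO_API_BASE, id)),
        DropletOp::List => ("GET", format!("{}/droplets", DO_API_BASE)),
    }
}
```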

### 9.3.2 Fix Hetzner plan validation

**File:** `control_plane/src/providers/mod.rs` — `validate_plan` (line ~134)

Correct the RAM mapping:
- CX11: 2GB (not 4GB)
- CX21: 4GB (not 8GB)
- CX31: 8GB
- CX41: 16GB
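
A sketch of the corrected mapping: a lookup table plus a hypothetical `validate_plan` that enforces a minimum RAM size. The real function's signature in `providers/mod.rs` may differ. Plan names are matched case-insensitively, because the list above uses `CX11` while Hetzner's API uses lowercase server type names such as `cx11`.

```rust
/// RAM in GB for the Hetzner plans listed above (case-insensitive lookup).
pub fn hetzner_plan_ram_gb(plan: &str) -> Option<u32> {
    match plan.to_ascii_lowercase().as_str() {
        "cx11" => Some(2),
        "cx21" => Some(4),
        "cx31" => Some(8),
        "cx41" => Some(16),
        _ => None,
    }
}

/// Hypothetical validation: reject unknown plans and plans below `min_ram_gb`.
pub fn validate_plan(plan: &str, min_ram_gb: u32) -> Result<u32, String> {
    let ram = hetzner_plan_ram_gb(plan)
        .ok_or_else(|| format!("unknown Hetzner plan: {plan}"))?;
    if ram < min_ram_gb {
        return Err(format!("{plan} has {ram}GB RAM, need at least {min_ram_gb}GB"));
    }
    Ok(ram)
}
```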
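
### 9.3.3 Add pagination to Hetzner list_servers

The Hetzner API returns 25 results per page by default and at most 50 (`per_page=50`). Implement pagination by following `meta.pagination.next_page` until it is `null`:

```rust
// `client`, `api_url`, and `api_token` are assumed in scope; Hetzner uses Bearer auth.
let mut all_servers = Vec::new();
let mut page: u64 = 1;
loop {
    let page_data: HetznerListResponse = client
        .get(format!("{api_url}/servers"))
        .bearer_auth(&api_token)
        .query(&[("page", page), ("per_page", 50)]) // 50 is the API maximum
        .send().await?
        .error_for_status()?
        .json().await?;
    all_servers.extend(page_data.servers);
    // next_page is null on the last page; otherwise it is the page to request next.
    match page_data.meta.pagination.next_page {
        Some(next) => page = next,
        None => break,
    }
}
```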
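
To test the loop without HTTP (the mock-based `test_hetzner_pagination` in the completion table), the pagination logic can be factored into a generic helper that takes a page-fetching closure. `Page` and `collect_all_pages` are hypothetical names. Production code would wrap the reqwest call in the closure; tests would pass canned pages.

```rust
/// One page of a list response: items plus the next page number, if any.
/// Mirrors Hetzner's `meta.pagination.next_page` (null on the last page).
pub struct Page<T> {
    pub items: Vec<T>,
    pub next_page: Option<u64>,
}

/// Follows `next_page` until it is `None`, collecting every item.
/// `fetch` receives the 1-based page number to request.
pub fn collect_all_pages<T, E, F>(mut fetch: F) -> Result<Vec<T>, E>
where
    F: FnMut(u64) -> Result<Page<T>, E>,
{
    let mut all = Vec::new();
    let mut page = 1;
    loop {
        let Page { items, next_page } = fetch(page)?;
        all.extend(items);
        match next_page {
            Some(next) => page = next,
            None => return Ok(all),
        }
    }
}
```

If a misbehaving API kept returning the same `next_page`, this loop would never terminate. A production version might cap the number of pages it will request.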

---

## Completion Requirements

This milestone is **not complete** until every item below is satisfied.

### 1. Full Test Suite — All Green

- [ ] `cargo test --workspace` passes with **zero failures**
- [ ] All **pre-existing tests** still pass (no regressions)
- [ ] **New tests** are written for the consolidated control plane:

| Test | Location | What it validates |
|------|----------|-------------------|
| `test_list_servers` | `control_plane/src/server_manager.rs` | `GET /platform/v1/servers` returns the server list |
| `test_create_server_hetzner` | `control_plane/src/providers/hetzner.rs` | `provision_server` sends correct API payload (mock HTTP) |
| `test_delete_server_hetzner` | `control_plane/src/providers/hetzner.rs` | `remove_server` sends DELETE to correct API endpoint (mock HTTP) |
| `test_create_server_digitalocean` | `control_plane/src/providers/digitalocean.rs` | `provision_server` sends correct Droplet payload (mock HTTP) |
| `test_hetzner_plan_validation` | `control_plane/src/providers/hetzner.rs` | CX11=2GB, CX21=4GB, CX31=8GB: correct RAM mapping |
| `test_hetzner_pagination` | `control_plane/src/providers/hetzner.rs` | `list_servers` paginates through multiple pages |
| `test_cluster_health_real_metrics` | `control_plane/src/lib.rs` | Health endpoint queries VictoriaMetrics (mock) and returns real CPU/mem |
| `test_sql_parameter_binding` | `control_plane/src/lib.rs` | Every query's highest `$N` placeholder matches its number of `.bind()` calls, and no query uses string interpolation |
| `test_admin_auth_on_server_routes` | `control_plane/src/lib.rs` | `GET /platform/v1/servers` without admin auth returns 401 |
| `test_old_control_plane_api_removed` | workspace | `control-plane-api` is not in `Cargo.toml` workspace members |
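
One way `test_old_control_plane_api_removed` could work without extra dependencies: read the root `Cargo.toml` (for example, with `std::fs::read_to_string` on a path built from `env!("CARGO_MANIFEST_DIR")`) and check the `members` array. The parser below is a naive, hypothetical helper. It assumes a single `members = [...]` array of quoted strings with no comments inside; the `toml` crate would be more robust.

```rust
/// Naive extraction of `[workspace] members = [...]` entries from Cargo.toml text.
/// Caveats: matches the first occurrence of `members` (so a preceding
/// `default-members` would be picked up instead) and ignores TOML comments.
pub fn workspace_members(cargo_toml: &str) -> Vec<String> {
    let rest = match cargo_toml.find("members") {
        Some(start) => &cargo_toml[start..],
        None => return Vec::new(),
    };
    // The array spans from the first `[` to the first `]` after the key.
    let (open, close) = match (rest.find('['), rest.find(']')) {
        (Some(open), Some(close)) if open < close => (open, close),
        _ => return Vec::new(),
    };
    rest[open + 1..close]
        .split(',')
        .map(|entry| entry.trim().trim_matches('"').to_string())
        .filter(|entry| !entry.is_empty())
        .collect()
}
```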

### 2. Integration Verification

- [ ] All `/platform/v1/*` routes work through the consolidated control plane
- [ ] Server provisioning creates a real Hetzner VM (integration test with API key)
- [ ] Server removal destroys the VM
- [ ] Cluster health returns real CPU/memory metrics (not hardcoded)
- [ ] The old `control-plane-api` binary is no longer needed and has been removed from the workspace
- [ ] Admin auth protects all server management routes
- [ ] Scaling operations are recorded in the `scaling_operations` table

### 3. CI Gate

- [ ] All unit tests (with mocked HTTP) run in `cargo test --workspace`
- [ ] Integration tests against real cloud providers are gated behind `#[ignore]` and require `HETZNER_API_TOKEN` / `DO_API_TOKEN` env vars
- [ ] `cargo build --workspace` succeeds without the old `control-plane-api` crate
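
A sketch of the gating pattern. The real-provider test is marked `#[ignore]`, so `cargo test --workspace` skips it, and CI runs it only via `cargo test -- --ignored` when the token is present. The `api_token` helper and the test name are hypothetical; the same shape applies to `DO_API_TOKEN`.

```rust
/// Returns the token only if the env var is set and non-blank.
pub fn api_token(var: &str) -> Option<String> {
    std::env::var(var).ok().filter(|token| !token.trim().is_empty())
}

// Runs only with `cargo test -- --ignored`; plain `cargo test --workspace` skips it.
#[test]
#[ignore = "creates a real Hetzner VM; requires HETZNER_API_TOKEN"]
fn hetzner_provision_and_remove_real_vm() {
    let token = api_token("HETZNER_API_TOKEN")
        .expect("HETZNER_API_TOKEN must be set to run ignored integration tests");
    // Hypothetical body: provision a VM with `token`, wait for status "active",
    // call remove_server, then assert the provider no longer lists the VM.
    assert!(!token.is_empty());
}
```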