madbase/_milestones/M1_foundation.md
Vlad Durnea cffdf8af86
wip:milestone 0 fixes
2026-03-15 12:35:42 +02:00


# Milestone 1: Foundation — Make It Compile and Run Correctly
**Goal:** A developer can `docker compose up`, hit the API with supabase-js, and get correct behavior for basic flows.
**Depends on:** M0 (Security Hardening)
---
## 1.1 — Fix Critical Bugs
### 1.1.1 Fix proxy body forwarding
**File:** `gateway/src/proxy.rs`, `forward_request` function (line ~172)
The proxy builds a `reqwest` request with `.headers()` but never reads or forwards the request body. Every POST/PUT/PATCH through the proxy silently drops the body.
**Current code (broken):**
```rust
let request_builder = client
    .request(req.method().clone(), &target_url)
    .headers(req.headers().clone());
// Body is never set!
```
**Fix:** Read the body from the incoming axum `Request` and attach it to the outgoing `reqwest` request:
```rust
// Extract body before consuming the request
let (parts, body) = req.into_parts();
let body_bytes = axum::body::to_bytes(body, 1024 * 1024 * 100) // 100 MiB limit
    .await
    .map_err(|_| StatusCode::BAD_REQUEST)?;

let request_builder = client
    .request(parts.method.clone(), &target_url)
    .headers(parts.headers.clone())
    .body(body_bytes);
```
For large uploads, stream the body with `reqwest::Body::wrap_stream()` (available behind reqwest's `stream` feature) instead of buffering it in memory.
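The size cap in the buffered fix matters as much as the forwarding itself: without it, a single oversized request can exhaust the proxy's memory. The following is a minimal, std-only sketch of the same idea using a synchronous reader. `read_body_limited` is a hypothetical helper, not part of the gateway; it is meant to mirror how a limited read rejects bodies over the cap rather than truncating them.

```rust
use std::io::{self, Read};

/// Hypothetical helper: read at most `limit` bytes from `reader`.
/// Bodies larger than `limit` are rejected instead of being silently truncated.
fn read_body_limited<R: Read>(reader: R, limit: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one extra byte so an oversized body can be detected.
    let mut limited = reader.take(limit as u64 + 1);
    limited.read_to_end(&mut buf)?;
    if buf.len() > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "request body exceeds configured limit",
        ));
    }
    Ok(buf)
}
```

A caller would map the error case to an HTTP status of its choosing; the fix above uses `StatusCode::BAD_REQUEST`.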
### 1.1.2 Fix proxy round-robin
**File:** `gateway/src/proxy.rs`, `proxy_request` function (line ~147)
**Current broken logic:** `get_healthy_worker()` always returns the FIRST healthy worker. Round-robin (`get_next_worker()`) is only used as a fallback when NO workers are healthy.
**Fix:** Merge the two methods — round-robin among healthy workers:
```rust
async fn get_next_healthy_worker(&self) -> Option<Upstream> {
    let upstreams = self.worker_upstreams.read().await;
    let len = upstreams.len();
    if len == 0 {
        return None;
    }

    let mut index = self.current_worker_index.write().await;
    for _ in 0..len {
        let candidate = &upstreams[*index % len];
        *index = (*index + 1) % len;
        if *candidate.healthy.read().await {
            return Some(candidate.clone());
        }
    }

    // All unhealthy: return the next worker in rotation anyway
    let fallback = upstreams[*index % len].clone();
    *index = (*index + 1) % len;
    Some(fallback)
}
```
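To check the selection logic in isolation, it can help to model it without async locks. The sketch below is an assumption-laden simplification: `Worker`, `Pool`, and `next_healthy` are hypothetical names, and atomics stand in for the `RwLock`s used in the gateway. The rotation and fallback behavior are intended to match the method above.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical stand-in for the gateway's upstream type.
struct Worker {
    url: &'static str,
    healthy: AtomicBool,
}

/// Hypothetical pool holding the workers and the rotation cursor.
struct Pool {
    workers: Vec<Worker>,
    next: AtomicUsize,
}

impl Pool {
    fn new(urls: &[&'static str]) -> Self {
        let workers = urls
            .iter()
            .map(|&url| Worker { url, healthy: AtomicBool::new(true) })
            .collect();
        Pool { workers, next: AtomicUsize::new(0) }
    }

    /// Round-robin among healthy workers; if none is healthy,
    /// fall back to the next worker in rotation.
    fn next_healthy(&self) -> Option<&Worker> {
        let len = self.workers.len();
        if len == 0 {
            return None;
        }
        for _ in 0..len {
            let i = self.next.fetch_add(1, Ordering::Relaxed) % len;
            let candidate = &self.workers[i];
            if candidate.healthy.load(Ordering::Relaxed) {
                return Some(candidate);
            }
        }
        // Every worker was unhealthy: keep rotating rather than failing outright.
        let i = self.next.fetch_add(1, Ordering::Relaxed) % len;
        Some(&self.workers[i])
    }
}
```

With two healthy workers, four calls alternate between them; marking one unhealthy sends every subsequent call to the other.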
### 1.1.3 Fix proxy response streaming
**File:** `gateway/src/proxy.rs`, `forward_request` function (line ~200)
```rust
// BEFORE: loads the entire response into memory
let body_bytes = response.bytes().await.map_err(|e| { ... })?;
response_builder.body(Body::from(body_bytes.to_vec()))

// AFTER: stream the response
let stream = response.bytes_stream();
let body = Body::from_stream(stream);
response_builder.body(body)
```
This prevents OOM on large file downloads through the proxy.
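The memory argument can be illustrated with a synchronous analogue: copying through a fixed-size buffer keeps peak memory at the chunk size regardless of how large the payload is. This is only a sketch of the principle, not the async code path; `forward_chunked` is a hypothetical function and assumes `chunk_size > 0`.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: copy `upstream` to `client` in chunks of at most
/// `chunk_size` bytes. Returns (total bytes forwarded, number of chunks).
/// Assumes `chunk_size > 0`.
fn forward_chunked<R: Read, W: Write>(
    mut upstream: R,
    mut client: W,
    chunk_size: usize,
) -> io::Result<(u64, usize)> {
    // The only allocation: one reusable buffer, independent of payload size.
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    let mut chunks = 0usize;
    loop {
        let n = upstream.read(&mut buf)?;
        if n == 0 {
            break; // end of stream
        }
        client.write_all(&buf[..n])?;
        total += n as u64;
        chunks += 1;
    }
    client.flush()?;
    Ok((total, chunks))
}
```

The streamed `Body` in the fix plays the same role as the loop here: each chunk is handed to the client as it arrives instead of being accumulated first.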
### 1.1.4 Pool HTTP clients
**Files:** `gateway/src/proxy.rs`, `gateway/src/control.rs`
Create `reqwest::Client` once at startup and store it in state:
```rust
// In ProxyState::new()
let http_client = reqwest::Client::builder()
    .timeout(std::time::Duration::from_secs(30))
    .pool_max_idle_per_host(20)
    .build()
    .expect("failed to build reqwest client");
```
Store the client in `ProxyState { http_client, ... }` and pass it to `forward_request`. The health check loop should use the same shared client instead of creating one per iteration.
In `gateway/src/control.rs`, `logs_proxy_handler` (line 23): create the client once in `ControlState` and pass it via `State` instead of calling `reqwest::Client::new()` per request.
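The sharing pattern itself can be sketched without any HTTP library. In the example below, `HttpClient` is a hypothetical stand-in for `reqwest::Client`, and the `Arc` makes the sharing explicit; with the real `reqwest::Client`, cloning the client already shares its connection pool, so the extra `Arc` would not be required.

```rust
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical stand-in for `reqwest::Client` (built once, reused everywhere).
#[derive(Debug)]
struct HttpClient {
    timeout: Duration,
    max_idle_per_host: usize,
}

/// State cloned into every handler; the clone copies the pointer, not the client.
#[derive(Clone)]
struct ProxyState {
    http_client: Arc<HttpClient>,
}

impl ProxyState {
    fn new() -> Self {
        let client = HttpClient {
            timeout: Duration::from_secs(30),
            max_idle_per_host: 20,
        };
        ProxyState { http_client: Arc::new(client) }
    }
}

/// Handlers borrow the shared client instead of constructing their own.
fn client_for_request(state: &ProxyState) -> &HttpClient {
    &state.http_client
}
```

A test along the lines of `test_proxy_single_http_client` could assert pointer equality between the clients seen by two requests.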
### 1.1.5 Fix tracing in standalone binaries
**Files:** `gateway/src/bin/proxy.rs`, `bin/control.rs`, `bin/worker.rs`
All three have the same bug — `_rust_log` is unused:
```rust
// BEFORE
let _rust_log = std::env::var("RUST_LOG").unwrap_or_else(|_| "info".into());
tracing_subscriber::fmt::init();

// AFTER
tracing_subscriber::fmt()
    .with_env_filter(
        tracing_subscriber::EnvFilter::try_from_default_env()
            .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info")),
    )
    .init();
```
Also note `bin/worker.rs` has a typo: `RUST_log` instead of `RUST_LOG`.
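The fallback rule ("use `RUST_LOG` if set, otherwise `info`") is easy to pin down in a small pure function, which also shows why the `RUST_log` typo matters: environment variable names are case-sensitive in Linux containers, so the misspelled lookup never sees the real value. `log_filter_from` is a hypothetical helper that takes the lookup as a closure so it can be tested without touching the process environment; treating a blank value as unset is an assumption of this sketch, not documented `EnvFilter` behavior.

```rust
/// Hypothetical helper: resolve the log filter string from an environment lookup.
/// Falls back to "info" when RUST_LOG is unset or blank (the blank case is an
/// assumption of this sketch).
fn log_filter_from<F>(lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    lookup("RUST_LOG")
        .map(|v| v.trim().to_string())
        .filter(|v| !v.is_empty())
        .unwrap_or_else(|| "info".to_string())
}
```

In a binary this would be called as `log_filter_from(|k| std::env::var(k).ok())`.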
---
## 1.2 — Dev Stack That Actually Works
### 1.2.1 Updated docker-compose.yml
Add Redis, MinIO, health checks, and proper startup ordering:
```yaml
services:
  db:
    image: postgres:15-alpine
    container_name: madbase_dev_db
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
    ports:
      - "5432:5432"
    volumes:
      - dev_db_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10

  redis:
    image: redis:7-alpine
    container_name: madbase_dev_redis
    command: redis-server --appendonly yes
    ports:
      - "6379:6379"
    volumes:
      - dev_redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  minio:
    image: quay.io/minio/minio:RELEASE.2024-06-13T22-53-53Z
    container_name: madbase_dev_minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: ${S3_ACCESS_KEY:-minioadmin}
      MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY:-minioadmin}
    volumes:
      - dev_minio_data:/data
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 5s
      timeout: 3s
      retries: 5

  worker:
    build:
      context: .
      target: worker-runtime
    container_name: madbase_dev_worker
    ports:
      - "8002:8002"
    environment:
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      JWT_SECRET: ${JWT_SECRET}
      REDIS_URL: redis://redis:6379
      S3_ENDPOINT: http://minio:9000
      S3_ACCESS_KEY: ${S3_ACCESS_KEY:-minioadmin}
      S3_SECRET_KEY: ${S3_SECRET_KEY:-minioadmin}
      S3_BUCKET: madbase
      S3_REGION: us-east-1
      RUST_LOG: info
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
      minio:
        condition: service_healthy

  system:
    build:
      context: .
      target: control-runtime
    container_name: madbase_dev_system
    ports:
      - "8001:8001"
    environment:
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      JWT_SECRET: ${JWT_SECRET}
      ADMIN_PASSWORD: ${ADMIN_PASSWORD}
      RUST_LOG: info
    depends_on:
      db:
        condition: service_healthy

  proxy:
    build:
      context: .
      target: proxy-runtime
    container_name: madbase_dev_proxy
    ports:
      - "8000:8000"
    environment:
      CONTROL_UPSTREAM_URL: http://system:8001
      WORKER_UPSTREAM_URLS: http://worker:8002
      RUST_LOG: info
    depends_on:
      - system
      - worker

volumes:
  dev_db_data:
  dev_redis_data:
  dev_minio_data:
```
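The plural `WORKER_UPSTREAM_URLS` suggests the proxy accepts more than one worker. The sketch below shows one way such a value could be parsed, assuming a comma-separated list (the same convention `ALLOWED_ORIGINS` uses in `.env.example`); the separator and the `parse_upstreams` name are assumptions, not confirmed gateway behavior.

```rust
/// Hypothetical parser for a comma-separated upstream list.
/// Trims whitespace, drops empty entries, and strips trailing slashes so that
/// joining a request path later does not produce a double slash.
fn parse_upstreams(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(|entry| entry.trim_end_matches('/').to_string())
        .collect()
}
```

Under that assumption, scaling to two workers in Compose would mean setting the variable to something like `http://worker1:8002,http://worker2:8002`, where the service names are illustrative.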
### 1.2.2 Create .env.example
```env
# Required
JWT_SECRET=generate-with-openssl-rand-hex-32
ADMIN_PASSWORD=change-me-in-production
DATABASE_URL=postgres://postgres:postgres@localhost:5432/postgres
DEFAULT_TENANT_DB_URL=postgres://postgres:postgres@localhost:5432/postgres
# Storage (MinIO for dev, Hetzner/AWS for production)
S3_ENDPOINT=http://localhost:9000
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_BUCKET=madbase
S3_REGION=us-east-1
# Optional
REDIS_URL=redis://localhost:6379
RUST_LOG=info
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8000
```
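Because the Compose file passes `${JWT_SECRET}` and `${ADMIN_PASSWORD}` through with no default, a missing `.env` produces empty values rather than a startup failure. A fail-fast check at boot makes that visible. The following is a sketch only: `REQUIRED_VARS` mirrors the "Required" section above, and `missing_required` is a hypothetical helper that takes the lookup as a closure so it can be tested without mutating the process environment.

```rust
/// Variables listed under "Required" in `.env.example`.
const REQUIRED_VARS: &[&str] = &[
    "JWT_SECRET",
    "ADMIN_PASSWORD",
    "DATABASE_URL",
    "DEFAULT_TENANT_DB_URL",
];

/// Hypothetical helper: return the names of required variables that are
/// unset or blank according to `lookup`.
fn missing_required<F>(lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    REQUIRED_VARS
        .iter()
        .copied()
        .filter(|name| lookup(*name).map_or(true, |value| value.trim().is_empty()))
        .collect()
}
```

A binary could call `missing_required(|k| std::env::var(k).ok())` before binding its listener and exit with a clear message if the result is non-empty.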
### 1.2.3 Create missing config files
Create `config/prometheus.yml`:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'madbase-worker'
    static_configs:
      - targets: ['worker:8002']
    metrics_path: /metrics
  - job_name: 'madbase-control'
    static_configs:
      - targets: ['control:8001']
    metrics_path: /metrics
  - job_name: 'madbase-proxy'
    static_configs:
      - targets: ['proxy:8000']
    metrics_path: /metrics
```
Create `config/vmagent.yml` with the same content.
### 1.2.4 Fix Grafana port
**File:** `docker-compose.pillar-system.yml` (line 33)
```yaml
# BEFORE
ports:
  - "3030:3030"
# AFTER: Grafana listens on 3000 by default
ports:
  - "3030:3000"
```
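Alternatively, keep the `3030:3030` mapping and add `GF_SERVER_HTTP_PORT=3030` to the Grafana service's environment so it listens on 3030 inside the container.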
---
## 1.3 — Unified Error Handling
### 1.3.1 Create ApiError type
**File:** Create `common/src/error.rs`
```rust
use axum::http::StatusCode;
use axum::response::{IntoResponse, Json, Response};
use serde::Serialize;

#[derive(Debug)]
pub enum ApiError {
    BadRequest(String),
    Unauthorized(String),
    Forbidden(String),
    NotFound(String),
    Conflict(String),
    Internal(String),
    Database(sqlx::Error),
}

#[derive(Serialize)]
struct ErrorResponse {
    error: String,
    code: u16,
    #[serde(skip_serializing_if = "Option::is_none")]
    detail: Option<String>,
}

impl IntoResponse for ApiError {
    fn into_response(self) -> Response {
        let (status, message, detail) = match &self {
            ApiError::BadRequest(msg) => (StatusCode::BAD_REQUEST, msg.clone(), None),
            ApiError::Unauthorized(msg) => (StatusCode::UNAUTHORIZED, msg.clone(), None),
            ApiError::Forbidden(msg) => (StatusCode::FORBIDDEN, msg.clone(), None),
            ApiError::NotFound(msg) => (StatusCode::NOT_FOUND, msg.clone(), None),
            ApiError::Conflict(msg) => (StatusCode::CONFLICT, msg.clone(), None),
            ApiError::Internal(msg) => {
                // Log the real cause, but return a generic message to the client.
                tracing::error!("Internal error: {}", msg);
                (StatusCode::INTERNAL_SERVER_ERROR, "Internal server error".to_string(), None)
            }
            ApiError::Database(e) => {
                // Never echo SQL or driver errors back to the client.
                tracing::error!("Database error: {}", e);
                (StatusCode::INTERNAL_SERVER_ERROR, "Database error".to_string(), None)
            }
        };

        let body = ErrorResponse {
            error: message,
            code: status.as_u16(),
            detail,
        };
        (status, Json(body)).into_response()
    }
}

impl From<sqlx::Error> for ApiError {
    fn from(e: sqlx::Error) -> Self {
        ApiError::Database(e)
    }
}
```
Gradually replace `(StatusCode, String)` return types with `Result<T, ApiError>` across all handlers.
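The two properties worth locking down in tests are the JSON shape and the fact that internal details never reach the client. The sketch below models both with the standard library only, so it runs without axum, serde, or sqlx. It is a reduced, hypothetical version of the enum above (three variants, hand-written JSON), not the real `common/src/error.rs`.

```rust
/// Reduced, hypothetical model of the error type (no axum/serde/sqlx).
#[derive(Debug)]
enum ApiError {
    BadRequest(String),
    NotFound(String),
    Internal(String),
}

impl ApiError {
    fn status(&self) -> u16 {
        match self {
            ApiError::BadRequest(_) => 400,
            ApiError::NotFound(_) => 404,
            ApiError::Internal(_) => 500,
        }
    }

    /// Message that is safe to show to clients; internal causes are replaced.
    fn public_message(&self) -> &str {
        match self {
            ApiError::BadRequest(msg) | ApiError::NotFound(msg) => msg.as_str(),
            ApiError::Internal(_) => "Internal server error",
        }
    }

    /// Hand-rolled JSON body; `detail` is omitted, matching `skip_serializing_if`.
    fn to_json(&self) -> String {
        format!(
            "{{\"error\":\"{}\",\"code\":{}}}",
            json_escape(self.public_message()),
            self.status()
        )
    }
}

/// Minimal JSON string escaping for the sketch.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

The real implementation gets escaping for free from serde; the manual version here only exists to keep the example dependency-free.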
---
## 1.4 — Extract RLS Middleware
### 1.4.1 Create RLS transaction extractor
The `BEGIN tx → SET LOCAL role → set_config` block is repeated ~15 times. Create an extractor:
**File:** Create `common/src/rls.rs`
```rust
use auth::AuthContext;
use sqlx::{PgPool, Postgres, Transaction};

use crate::error::ApiError;

pub struct RlsTransaction {
    pub tx: Transaction<'static, Postgres>,
}

impl RlsTransaction {
    pub async fn begin(
        pool: &PgPool,
        auth_ctx: &AuthContext,
    ) -> Result<Self, ApiError> {
        let mut tx = pool.begin().await?;

        // Validate the role against an allow-list before interpolating it:
        // SET does not accept bind parameters.
        const ALLOWED_ROLES: &[&str] = &["anon", "authenticated", "service_role"];
        if !ALLOWED_ROLES.contains(&auth_ctx.role.as_str()) {
            return Err(ApiError::Forbidden("Invalid role".into()));
        }
        let role_query = format!("SET LOCAL role = '{}'", auth_ctx.role);
        sqlx::query(&role_query).execute(&mut *tx).await?;

        // Set JWT claims for RLS policies (transaction-local via `true`)
        if let Some(claims) = &auth_ctx.claims {
            sqlx::query("SELECT set_config('request.jwt.claim.sub', $1, true)")
                .bind(&claims.sub)
                .execute(&mut *tx)
                .await?;
        }

        Ok(Self { tx })
    }

    pub async fn commit(self) -> Result<(), ApiError> {
        self.tx.commit().await.map_err(ApiError::from)
    }
}
```
**Usage in handlers:**
```rust
pub async fn list_buckets(
    State(state): State<StorageState>,
    Extension(auth_ctx): Extension<AuthContext>,
    db: Option<Extension<PgPool>>,
) -> Result<Json<Vec<Bucket>>, ApiError> {
    let pool = db.map(|Extension(p)| p).unwrap_or_else(|| state.db.clone());
    let mut rls = RlsTransaction::begin(&pool, &auth_ctx).await?;

    let buckets = sqlx::query_as::<_, Bucket>("SELECT * FROM storage.buckets")
        .fetch_all(&mut *rls.tx)
        .await?;

    // The transaction rolls back automatically on drop, which is fine for a read-only query.
    Ok(Json(buckets))
}
```
This eliminates ~150 lines of duplicated error-mapping boilerplate.
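The security-critical part of `RlsTransaction::begin` is the allow-list: the role is interpolated into SQL, so anything outside the known set must be rejected before the string is built. That check can be isolated and tested without a database. In the sketch below, `set_role_statement` is a hypothetical helper and a plain `String` error stands in for `ApiError::Forbidden`.

```rust
/// Roles that may be used with `SET LOCAL role`.
const ALLOWED_ROLES: &[&str] = &["anon", "authenticated", "service_role"];

/// Hypothetical helper: build the `SET LOCAL role` statement, or reject the role.
/// The comparison is exact and case-sensitive, so anything that is not one of
/// the three literals above never reaches the format string.
fn set_role_statement(role: &str) -> Result<String, String> {
    if !ALLOWED_ROLES.contains(&role) {
        return Err(format!("invalid role: {:?}", role));
    }
    Ok(format!("SET LOCAL role = '{}'", role))
}
```

A database-backed test such as `test_rls_transaction_sets_role` would still be needed to confirm the statement takes effect inside the transaction; this only covers the validation step.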
---
## Completion Requirements
This milestone is **not complete** until every item below is satisfied.
### 1. Full Test Suite — All Green
- [ ] `cargo test --workspace` passes with **zero failures**
- [ ] All **pre-existing tests** still pass (no regressions)
- [ ] **New unit tests** are written for every fix in this milestone:
| Test | Location | What it validates |
|------|----------|-------------------|
| `test_proxy_forwards_body` | `gateway/src/proxy.rs` | POST with 1MB body reaches the upstream intact |
| `test_proxy_streams_response` | `gateway/src/proxy.rs` | Large response is streamed, not buffered entirely |
| `test_proxy_round_robin` | `gateway/src/proxy.rs` | 4 requests to 2 workers distribute 2+2 |
| `test_proxy_single_http_client` | `gateway/src/proxy.rs` | `reqwest::Client` is reused (shared state, not per-request) |
| `test_worker_tracing_init` | `gateway/src/bin/worker.rs` | `RUST_LOG=debug` produces debug-level spans |
| `test_api_error_json_format` | `common/src/error.rs` | `ApiError::BadRequest("x")` serializes to `{"error":"x","code":400}` |
| `test_api_error_hides_db_detail` | `common/src/error.rs` | `ApiError::Database(e)` does not leak SQL in the response body |
| `test_rls_transaction_sets_role` | `common/src/rls.rs` | `RlsTransaction::begin()` issues `SET LOCAL role` with the auth context role |
| `test_rls_transaction_rejects_bad_role` | `common/src/rls.rs` | Role outside `[anon, authenticated, service_role]` returns `Forbidden` |
| `test_rls_transaction_sets_claims` | `common/src/rls.rs` | JWT `sub` claim is available via `current_setting('request.jwt.claim.sub')` |
### 2. Integration Verification
- [ ] `docker compose up` starts all services (db, redis, minio, worker, system, proxy) without crash-loops
- [ ] `curl -X POST http://localhost:8000/auth/v1/signup -H "apikey: <anon_key>" -H "Content-Type: application/json" -d '{"email":"test@test.com","password":"password123"}'` returns a user (through the proxy)
- [ ] Large file upload (>5MB) through the proxy succeeds (body forwarding works)
- [ ] Proxy distributes requests across multiple workers (if configured)
- [ ] `RUST_LOG=debug` works in all three standalone binaries
- [ ] API errors return structured JSON, never raw SQL error messages
- [ ] `docker compose down && docker compose up` — idempotent restart with no data loss
### 3. CI Gate
- [ ] All of the above unit tests are included in `cargo test --workspace`
- [ ] No `#[ignore]` on any test added in this milestone unless it requires external services (and those must be documented)