# Milestone 1: Foundation — Make It Compile and Run Correctly

**Goal:** A developer can `docker compose up`, hit the API with supabase-js, and get correct behavior for basic flows.

**Depends on:** M0 (Security Hardening)

---

## 1.1 — Fix Critical Bugs

### 1.1.1 Fix proxy body forwarding

**File:** `gateway/src/proxy.rs` — `forward_request` function (line ~172)

The proxy builds a `reqwest` request with `.headers()` but never reads or forwards the request body. Every POST/PUT/PATCH through the proxy silently drops the body.

**Current code (broken):**

```rust
let request_builder = client
    .request(req.method().clone(), &target_url)
    .headers(req.headers().clone());
// Body is never set!
```

**Fix:** Read the body from the incoming axum `Request` and attach it to the outgoing `reqwest` request:

```rust
// Extract body before consuming the request
let (parts, body) = req.into_parts();
let body_bytes = axum::body::to_bytes(body, 1024 * 1024 * 100) // 100 MB limit
    .await
    .map_err(|_| StatusCode::BAD_REQUEST)?;

let request_builder = client
    .request(parts.method.clone(), &target_url)
    .headers(parts.headers.clone())
    .body(body_bytes);
```

For streaming (large uploads), use `reqwest::Body::wrap_stream()` instead of buffering.

### 1.1.2 Fix proxy round-robin

**File:** `gateway/src/proxy.rs` — `proxy_request` function (line ~147)

**Current broken logic:** `get_healthy_worker()` always returns the FIRST healthy worker. Round-robin (`get_next_worker()`) is only used as a fallback when NO workers are healthy.
**Fix:** Merge the two methods — round-robin among healthy workers:

```rust
async fn get_next_healthy_worker(&self) -> Option<WorkerUpstream> {
    let upstreams = self.worker_upstreams.read().await;
    let len = upstreams.len();
    if len == 0 {
        return None;
    }
    let mut index = self.current_worker_index.write().await;
    for _ in 0..len {
        let candidate = &upstreams[*index % len];
        *index = (*index + 1) % len;
        if *candidate.healthy.read().await {
            return Some(candidate.clone());
        }
    }
    // All unhealthy — return next in rotation anyway
    let fallback = upstreams[*index % len].clone();
    *index = (*index + 1) % len;
    Some(fallback)
}
```

### 1.1.3 Fix proxy response streaming

**File:** `gateway/src/proxy.rs` — `forward_request` function (line ~200)

```rust
// BEFORE — loads entire response into memory
let body_bytes = response.bytes().await.map_err(|e| { ... })?;
response_builder.body(Body::from(body_bytes.to_vec()))

// AFTER — stream the response
let stream = response.bytes_stream();
let body = Body::from_stream(stream);
response_builder.body(body)
```

This prevents OOM on large file downloads through the proxy.

### 1.1.4 Pool HTTP clients

**Files:** `gateway/src/proxy.rs`, `gateway/src/control.rs`

Create the `reqwest::Client` once at startup and store it in state:

```rust
// In ProxyState::new()
let http_client = reqwest::Client::builder()
    .timeout(std::time::Duration::from_secs(30))
    .pool_max_idle_per_host(20)
    .build()
    .unwrap();
```

Store it in `ProxyState { http_client, ... }` and pass it to `forward_request`. Do the same for the health-check loop — use the shared client instead of creating one per iteration.

In `gateway/src/control.rs` — `logs_proxy_handler` (line 23): create the client in `ControlState` and pass it via `State`, instead of calling `reqwest::Client::new()` per request.
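The rotation in 1.1.2 is easy to get subtly wrong (the cursor must advance even when a candidate is skipped). Here is a dependency-free sketch of the same selection rule over a plain slice of health flags — `next_healthy` and its arguments are illustrative names, not the actual gateway types:

```rust
// Sketch of the 1.1.2 selection rule: advance a shared cursor round-robin,
// skip unhealthy entries, and fall back to plain rotation when nothing is
// healthy. Returns the index of the chosen worker.
fn next_healthy(healthy: &[bool], cursor: &mut usize) -> Option<usize> {
    let len = healthy.len();
    if len == 0 {
        return None;
    }
    for _ in 0..len {
        let candidate = *cursor % len;
        // Advance the cursor whether or not the candidate is picked,
        // so the next call starts at the next slot.
        *cursor = (*cursor + 1) % len;
        if healthy[candidate] {
            return Some(candidate);
        }
    }
    // All unhealthy — return next in rotation anyway
    let fallback = *cursor % len;
    *cursor = (*cursor + 1) % len;
    Some(fallback)
}

fn main() {
    let mut cursor = 0;
    // Two healthy workers: four requests should distribute 2 + 2,
    // matching the `test_proxy_round_robin` requirement below.
    let picks: Vec<_> = (0..4)
        .map(|_| next_healthy(&[true, true], &mut cursor).unwrap())
        .collect();
    println!("{picks:?}"); // [0, 1, 0, 1]
}
```

The same index arithmetic drops straight into the async version, with the health flag read through each upstream's `RwLock`.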
### 1.1.5 Fix tracing in standalone binaries

**Files:** `gateway/src/bin/proxy.rs`, `bin/control.rs`, `bin/worker.rs`

All three have the same bug — `_rust_log` is read but never used:

```rust
// BEFORE
let _rust_log = std::env::var("RUST_LOG").unwrap_or_else(|_| "info".into());
tracing_subscriber::fmt::init();

// AFTER
tracing_subscriber::fmt()
    .with_env_filter(
        tracing_subscriber::EnvFilter::try_from_default_env()
            .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info")),
    )
    .init();
```

Also note `bin/worker.rs` has a typo: `RUST_log` instead of `RUST_LOG`.

---

## 1.2 — Dev Stack That Actually Works

### 1.2.1 Updated docker-compose.yml

Add Redis, MinIO, health checks, and proper startup ordering:

```yaml
services:
  db:
    image: postgres:15-alpine
    container_name: madbase_dev_db
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
    ports:
      - "5432:5432"
    volumes:
      - dev_db_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10

  redis:
    image: redis:7-alpine
    container_name: madbase_dev_redis
    command: redis-server --appendonly yes
    ports:
      - "6379:6379"
    volumes:
      - dev_redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  minio:
    image: quay.io/minio/minio:RELEASE.2024-06-13T22-53-53Z
    container_name: madbase_dev_minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: ${S3_ACCESS_KEY:-minioadmin}
      MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY:-minioadmin}
    volumes:
      - dev_minio_data:/data
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 5s
      timeout: 3s
      retries: 5

  worker:
    build:
      context: .
      target: worker-runtime
    container_name: madbase_dev_worker
    ports:
      - "8002:8002"
    environment:
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      JWT_SECRET: ${JWT_SECRET}
      REDIS_URL: redis://redis:6379
      S3_ENDPOINT: http://minio:9000
      S3_ACCESS_KEY: ${S3_ACCESS_KEY:-minioadmin}
      S3_SECRET_KEY: ${S3_SECRET_KEY:-minioadmin}
      S3_BUCKET: madbase
      S3_REGION: us-east-1
      RUST_LOG: info
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
      minio:
        condition: service_healthy

  system:
    build:
      context: .
      target: control-runtime
    container_name: madbase_dev_system
    ports:
      - "8001:8001"
    environment:
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      JWT_SECRET: ${JWT_SECRET}
      ADMIN_PASSWORD: ${ADMIN_PASSWORD}
      RUST_LOG: info
    depends_on:
      db:
        condition: service_healthy

  proxy:
    build:
      context: .
      target: proxy-runtime
    container_name: madbase_dev_proxy
    ports:
      - "8000:8000"
    environment:
      CONTROL_UPSTREAM_URL: http://system:8001
      WORKER_UPSTREAM_URLS: http://worker:8002
      RUST_LOG: info
    depends_on:
      - system
      - worker

volumes:
  dev_db_data:
  dev_redis_data:
  dev_minio_data:
```

### 1.2.2 Create .env.example

```env
# Required
JWT_SECRET=generate-with-openssl-rand-hex-32
ADMIN_PASSWORD=change-me-in-production
DATABASE_URL=postgres://postgres:postgres@localhost:5432/postgres
DEFAULT_TENANT_DB_URL=postgres://postgres:postgres@localhost:5432/postgres

# Storage (MinIO for dev, Hetzner/AWS for production)
S3_ENDPOINT=http://localhost:9000
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_BUCKET=madbase
S3_REGION=us-east-1

# Optional
REDIS_URL=redis://localhost:6379
RUST_LOG=info
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8000
```

### 1.2.3 Create missing config files

Create `config/prometheus.yml`:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'madbase-worker'
    static_configs:
      - targets: ['worker:8002']
    metrics_path: /metrics
  - job_name: 'madbase-control'
    static_configs:
      - targets: ['control:8001']
    metrics_path: /metrics
  - job_name: 'madbase-proxy'
    static_configs:
      - targets: ['proxy:8000']
    metrics_path: /metrics
```

Create `config/vmagent.yml` with the same content.

### 1.2.4 Fix Grafana port

**File:** `docker-compose.pillar-system.yml` line 33

```yaml
# BEFORE
ports:
  - "3030:3030"

# AFTER — Grafana listens on 3000 by default
ports:
  - "3030:3000"
```

Or add `GF_SERVER_HTTP_PORT=3030` to the environment.
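The `JWT_SECRET` placeholder in the 1.2.2 `.env.example` names its own generation command. As a sketch, assuming the `openssl` CLI is available:

```shell
# Generate the 32-byte JWT secret .env.example asks for
# (32 random bytes hex-encoded = 64 characters).
JWT_SECRET=$(openssl rand -hex 32)
echo "${#JWT_SECRET}"   # prints 64
```

The same command works for `ADMIN_PASSWORD` or any other value that only needs to be unguessable.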
---

## 1.3 — Unified Error Handling

### 1.3.1 Create ApiError type

**File:** Create `common/src/error.rs`

```rust
use axum::http::StatusCode;
use axum::response::{IntoResponse, Json, Response};
use serde::Serialize;

#[derive(Debug)]
pub enum ApiError {
    BadRequest(String),
    Unauthorized(String),
    Forbidden(String),
    NotFound(String),
    Conflict(String),
    Internal(String),
    Database(sqlx::Error),
}

#[derive(Serialize)]
struct ErrorResponse {
    error: String,
    code: u16,
    #[serde(skip_serializing_if = "Option::is_none")]
    detail: Option<String>,
}

impl IntoResponse for ApiError {
    fn into_response(self) -> Response {
        let (status, message, detail) = match &self {
            ApiError::BadRequest(msg) => (StatusCode::BAD_REQUEST, msg.clone(), None),
            ApiError::Unauthorized(msg) => (StatusCode::UNAUTHORIZED, msg.clone(), None),
            ApiError::Forbidden(msg) => (StatusCode::FORBIDDEN, msg.clone(), None),
            ApiError::NotFound(msg) => (StatusCode::NOT_FOUND, msg.clone(), None),
            ApiError::Conflict(msg) => (StatusCode::CONFLICT, msg.clone(), None),
            ApiError::Internal(msg) => {
                tracing::error!("Internal error: {}", msg);
                (StatusCode::INTERNAL_SERVER_ERROR, "Internal server error".to_string(), None)
            }
            ApiError::Database(e) => {
                tracing::error!("Database error: {}", e);
                (StatusCode::INTERNAL_SERVER_ERROR, "Database error".to_string(), None)
            }
        };

        let body = ErrorResponse {
            error: message,
            code: status.as_u16(),
            detail,
        };
        (status, Json(body)).into_response()
    }
}

impl From<sqlx::Error> for ApiError {
    fn from(e: sqlx::Error) -> Self {
        ApiError::Database(e)
    }
}
```

Gradually replace `(StatusCode, String)` return types with `Result<T, ApiError>` across all handlers.

---

## 1.4 — Extract RLS Middleware

### 1.4.1 Create RLS transaction extractor

The `BEGIN tx → SET LOCAL role → set_config` block is repeated ~15 times.
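The security-critical piece of that repeated block is the role switch: the role string is interpolated directly into SQL text, so it must be validated against a fixed allowlist first. A dependency-free sketch of just that check (`role_statement` is an illustrative name, not part of the codebase):

```rust
// Sketch of the allowlist check that must guard the `SET LOCAL role`
// interpolation. Because the role is spliced into SQL text, only exact
// matches against these three fixed literals are acceptable.
const ALLOWED_ROLES: &[&str] = &["anon", "authenticated", "service_role"];

fn role_statement(role: &str) -> Result<String, String> {
    if !ALLOWED_ROLES.contains(&role) {
        return Err(format!("invalid role: {role}"));
    }
    // Safe to interpolate: `role` is known to be one of the literals above.
    Ok(format!("SET LOCAL role = '{role}'"))
}

fn main() {
    println!("{}", role_statement("authenticated").unwrap());
    // Anything outside the allowlist — including injection attempts —
    // is rejected before any SQL is built.
    assert!(role_statement("postgres").is_err());
    assert!(role_statement("anon'; DROP TABLE users; --").is_err());
}
```

This is the check that `test_rls_transaction_rejects_bad_role` (listed under Completion Requirements) should exercise end to end.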
Create an extractor:

**File:** Create `common/src/rls.rs`

```rust
use axum::extract::{Extension, FromRequestParts};
use auth::AuthContext;
use sqlx::{PgPool, Postgres, Transaction};

use crate::error::ApiError;

pub struct RlsTransaction {
    pub tx: Transaction<'static, Postgres>,
}

impl RlsTransaction {
    pub async fn begin(
        pool: &PgPool,
        auth_ctx: &AuthContext,
    ) -> Result<Self, ApiError> {
        let mut tx = pool.begin().await?;

        // Validate and set role
        const ALLOWED_ROLES: &[&str] = &["anon", "authenticated", "service_role"];
        if !ALLOWED_ROLES.contains(&auth_ctx.role.as_str()) {
            return Err(ApiError::Forbidden("Invalid role".into()));
        }
        let role_query = format!("SET LOCAL role = '{}'", auth_ctx.role);
        sqlx::query(&role_query).execute(&mut *tx).await?;

        // Set JWT claims for RLS policies
        if let Some(claims) = &auth_ctx.claims {
            sqlx::query("SELECT set_config('request.jwt.claim.sub', $1, true)")
                .bind(&claims.sub)
                .execute(&mut *tx)
                .await?;
        }

        Ok(Self { tx })
    }

    pub async fn commit(self) -> Result<(), ApiError> {
        self.tx.commit().await.map_err(ApiError::from)
    }
}
```

**Usage in handlers:**

```rust
pub async fn list_buckets(
    State(state): State<AppState>,
    Extension(auth_ctx): Extension<AuthContext>,
    db: Option<Extension<PgPool>>,
) -> Result<Json<Vec<Bucket>>, ApiError> {
    let pool = db.map(|Extension(p)| p).unwrap_or_else(|| state.db.clone());
    let mut rls = RlsTransaction::begin(&pool, &auth_ctx).await?;
    let buckets = sqlx::query_as::<_, Bucket>("SELECT * FROM storage.buckets")
        .fetch_all(&mut *rls.tx)
        .await?;
    Ok(Json(buckets)) // tx auto-rolls back on drop (read-only is fine)
}
```

This eliminates ~150 lines of duplicated error-mapping boilerplate.

---

## Completion Requirements

This milestone is **not complete** until every item below is satisfied.

### 1. Full Test Suite — All Green

- [ ] `cargo test --workspace` passes with **zero failures**
- [ ] All **pre-existing tests** still pass (no regressions)
- [ ] **New unit tests** are written for every fix in this milestone:

| Test | Location | What it validates |
|------|----------|-------------------|
| `test_proxy_forwards_body` | `gateway/src/proxy.rs` | POST with 1MB body reaches the upstream intact |
| `test_proxy_streams_response` | `gateway/src/proxy.rs` | Large response is streamed, not buffered entirely |
| `test_proxy_round_robin` | `gateway/src/proxy.rs` | 4 requests to 2 workers distribute 2+2 |
| `test_proxy_single_http_client` | `gateway/src/proxy.rs` | `reqwest::Client` is reused (shared state, not per-request) |
| `test_worker_tracing_init` | `gateway/src/bin/worker.rs` | `RUST_LOG=debug` produces debug-level spans |
| `test_api_error_json_format` | `common/src/error.rs` | `ApiError::BadRequest("x")` serializes to `{"error":"x","code":400}` |
| `test_api_error_hides_db_detail` | `common/src/error.rs` | `ApiError::Database(e)` does not leak SQL in the response body |
| `test_rls_transaction_sets_role` | `common/src/rls.rs` | `RlsTransaction::begin()` issues `SET LOCAL role` with the auth context role |
| `test_rls_transaction_rejects_bad_role` | `common/src/rls.rs` | Role outside `[anon, authenticated, service_role]` returns `Forbidden` |
| `test_rls_transaction_sets_claims` | `common/src/rls.rs` | JWT `sub` claim is available via `current_setting('request.jwt.claim.sub')` |

### 2. Integration Verification

- [ ] `docker compose up` starts all services (db, redis, minio, worker, system, proxy) without crash-loops
- [ ] `curl -X POST http://localhost:8000/auth/v1/signup -H "apikey: <anon-key>" -d '{"email":"test@test.com","password":"password123"}'` returns a user (through the proxy)
- [ ] Large file upload (>5MB) through the proxy succeeds (body forwarding works)
- [ ] Proxy distributes requests across multiple workers (if configured)
- [ ] `RUST_LOG=debug` works in all three standalone binaries
- [ ] API errors return structured JSON, never raw SQL error messages
- [ ] `docker compose down && docker compose up` — idempotent restart with no data loss

### 3. CI Gate

- [ ] All of the above unit tests are included in `cargo test --workspace`
- [ ] No `#[ignore]` on any test added in this milestone unless it requires external services (and those must be documented)
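For reference, the wire shape that `test_api_error_json_format` pins down can be written out by hand. This is a dependency-free sketch only — `error_body` is an illustrative helper, and the real `ApiError` serializes through serde, not string formatting:

```rust
// Hand-rolled sketch of the body `ApiError::BadRequest("x")` should
// produce: {"error":"x","code":400}. Documents the expected JSON shape;
// the real implementation uses serde_json via axum's Json responder.
fn error_body(message: &str, code: u16) -> String {
    // Minimal escaping for the illustration only.
    let escaped = message.replace('\\', "\\\\").replace('"', "\\\"");
    format!("{{\"error\":\"{escaped}\",\"code\":{code}}}")
}

fn main() {
    println!("{}", error_body("x", 400)); // {"error":"x","code":400}
}
```

Note that `detail` is absent here: with `skip_serializing_if = "Option::is_none"`, a `None` detail must not appear in the body at all, which the format test should also assert.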