Milestone 1: Foundation — Make It Compile and Run Correctly
Goal: A developer can docker compose up, hit the API with supabase-js, and get correct behavior for basic flows.
Depends on: M0 (Security Hardening)
1.1 — Fix Critical Bugs
1.1.1 Fix proxy body forwarding
File: gateway/src/proxy.rs — forward_request function (line ~172)
The proxy builds a reqwest request with .headers() but never reads or forwards the request body. Every POST/PUT/PATCH through the proxy silently drops the body.
Current code (broken):
let request_builder = client
.request(req.method().clone(), &target_url)
.headers(req.headers().clone());
// Body is never set!
Fix: Read the body from the incoming axum Request and attach it to the outgoing reqwest request:
// Extract body before consuming the request
let (parts, body) = req.into_parts();
let body_bytes = axum::body::to_bytes(body, 1024 * 1024 * 100) // 100MB limit
.await
.map_err(|_| StatusCode::BAD_REQUEST)?;
let request_builder = client
.request(parts.method.clone(), &target_url)
.headers(parts.headers.clone())
.body(body_bytes);
For streaming (large uploads), use reqwest::Body::wrap_stream() instead of buffering.
1.1.2 Fix proxy round-robin
File: gateway/src/proxy.rs — proxy_request function (line ~147)
Current broken logic: get_healthy_worker() always returns the FIRST healthy worker. Round-robin (get_next_worker()) is only used as a fallback when NO workers are healthy.
Fix: Merge the two methods — round-robin among healthy workers:
async fn get_next_healthy_worker(&self) -> Option<Upstream> {
let upstreams = self.worker_upstreams.read().await;
let len = upstreams.len();
if len == 0 { return None; }
let mut index = self.current_worker_index.write().await;
for _ in 0..len {
let candidate = &upstreams[*index % len];
*index = (*index + 1) % len;
if *candidate.healthy.read().await {
return Some(candidate.clone());
}
}
// All unhealthy — return next in rotation anyway
let fallback = upstreams[*index % len].clone();
*index = (*index + 1) % len;
Some(fallback)
}
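The rotation logic above can be sanity-checked in isolation. Below is a hypothetical, synchronous std-only model of the same selection loop (`next_healthy` and the `(name, healthy)` tuples are stand-ins, not the gateway's actual `Upstream` type):

```rust
/// Synchronous sketch of get_next_healthy_worker: round-robin over the
/// healthy entries, falling back to plain rotation when none are healthy.
fn next_healthy<'a>(workers: &'a [(&'a str, bool)], index: &mut usize) -> Option<&'a str> {
    let len = workers.len();
    if len == 0 {
        return None;
    }
    for _ in 0..len {
        let (name, healthy) = workers[*index % len];
        *index = (*index + 1) % len;
        if healthy {
            return Some(name);
        }
    }
    // All unhealthy: return next in rotation anyway.
    let (name, _) = workers[*index % len];
    *index = (*index + 1) % len;
    Some(name)
}

fn main() {
    // Worker "b" is down: four requests should alternate between a and c.
    let workers = [("a", true), ("b", false), ("c", true)];
    let mut index = 0;
    let picks: Vec<_> = (0..4)
        .map(|_| next_healthy(&workers, &mut index).unwrap())
        .collect();
    println!("{:?}", picks); // ["a", "c", "a", "c"]
}
```

The key property to preserve in the real implementation: the cursor always advances past unhealthy entries instead of pinning traffic to the first healthy worker.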
1.1.3 Fix proxy response streaming
File: gateway/src/proxy.rs — forward_request function (line ~200)
// BEFORE — loads entire response into memory
let body_bytes = response.bytes().await.map_err(|e| { ... })?;
response_builder.body(Body::from(body_bytes.to_vec()))
// AFTER — stream the response
let stream = response.bytes_stream();
let body = Body::from_stream(stream);
response_builder.body(body)
This prevents OOM on large file downloads through the proxy.
1.1.4 Pool HTTP clients
Files: gateway/src/proxy.rs, gateway/src/control.rs
Create reqwest::Client once at startup and store it in state:
// In ProxyState::new()
let http_client = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(30))
.pool_max_idle_per_host(20)
.build()
.unwrap();
Store in ProxyState { http_client, ... }. Pass to forward_request. Same for health check loop — use the shared client instead of creating one per iteration.
In gateway/src/control.rs — logs_proxy_handler (line 23): create the client in ControlState and pass via State, not reqwest::Client::new() per request.
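The "build once, share everywhere" pattern can be illustrated with a std-only sketch (`AppState` and `HttpClient` here are hypothetical stand-ins; in the gateway the shared value is the `reqwest::Client`):

```rust
use std::sync::Arc;

// Stand-in for an expensive-to-build client (in the gateway this would be
// the reqwest::Client with its connection pool).
struct HttpClient;

// Hypothetical shared state, analogous to ProxyState { http_client, .. }.
#[derive(Clone)]
struct AppState {
    http_client: Arc<HttpClient>,
}

fn main() {
    // Build the client exactly once, at startup.
    let state = AppState { http_client: Arc::new(HttpClient) };

    // Every handler invocation clones the state, not the client: both
    // clones point at the same underlying instance (and its pool).
    let handler_a = state.clone();
    let handler_b = state.clone();
    assert!(Arc::ptr_eq(&handler_a.http_client, &handler_b.http_client));
    println!("one client instance shared across handlers");
}
```

Note that `reqwest::Client` already wraps its internals in an `Arc`, so cloning the client directly is also cheap; the explicit `Arc` above just makes the sharing visible in a crate-free sketch.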
1.1.5 Fix tracing in standalone binaries
Files: gateway/src/bin/proxy.rs, bin/control.rs, bin/worker.rs
All three have the same bug — _rust_log is unused:
// BEFORE
let _rust_log = std::env::var("RUST_LOG").unwrap_or_else(|_| "info".into());
tracing_subscriber::fmt::init();
// AFTER
tracing_subscriber::fmt()
.with_env_filter(
tracing_subscriber::EnvFilter::try_from_default_env()
.unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info"))
)
.init();
Also note bin/worker.rs has a typo: RUST_log instead of RUST_LOG.
1.2 — Dev Stack That Actually Works
1.2.1 Updated docker-compose.yml
Add Redis, MinIO, health checks, and proper startup ordering:
services:
db:
image: postgres:15-alpine
container_name: madbase_dev_db
environment:
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
ports:
- "5432:5432"
volumes:
- dev_db_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 3s
retries: 10
redis:
image: redis:7-alpine
container_name: madbase_dev_redis
command: redis-server --appendonly yes
ports:
- "6379:6379"
volumes:
- dev_redis_data:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 5
minio:
image: quay.io/minio/minio:RELEASE.2024-06-13T22-53-53Z
container_name: madbase_dev_minio
command: server /data --console-address ":9001"
ports:
- "9000:9000"
- "9001:9001"
environment:
MINIO_ROOT_USER: ${S3_ACCESS_KEY:-minioadmin}
MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY:-minioadmin}
volumes:
- dev_minio_data:/data
healthcheck:
test: ["CMD", "mc", "ready", "local"]
interval: 5s
timeout: 3s
retries: 5
worker:
build:
context: .
target: worker-runtime
container_name: madbase_dev_worker
ports:
- "8002:8002"
environment:
DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
JWT_SECRET: ${JWT_SECRET}
REDIS_URL: redis://redis:6379
S3_ENDPOINT: http://minio:9000
S3_ACCESS_KEY: ${S3_ACCESS_KEY:-minioadmin}
S3_SECRET_KEY: ${S3_SECRET_KEY:-minioadmin}
S3_BUCKET: madbase
S3_REGION: us-east-1
RUST_LOG: info
depends_on:
db:
condition: service_healthy
redis:
condition: service_healthy
minio:
condition: service_healthy
system:
build:
context: .
target: control-runtime
container_name: madbase_dev_system
ports:
- "8001:8001"
environment:
DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
JWT_SECRET: ${JWT_SECRET}
ADMIN_PASSWORD: ${ADMIN_PASSWORD}
RUST_LOG: info
depends_on:
db:
condition: service_healthy
proxy:
build:
context: .
target: proxy-runtime
container_name: madbase_dev_proxy
ports:
- "8000:8000"
environment:
CONTROL_UPSTREAM_URL: http://system:8001
WORKER_UPSTREAM_URLS: http://worker:8002
RUST_LOG: info
depends_on:
- system
- worker
volumes:
dev_db_data:
dev_redis_data:
dev_minio_data:
1.2.2 Create .env.example
# Required
JWT_SECRET=generate-with-openssl-rand-hex-32
ADMIN_PASSWORD=change-me-in-production
DATABASE_URL=postgres://postgres:postgres@localhost:5432/postgres
DEFAULT_TENANT_DB_URL=postgres://postgres:postgres@localhost:5432/postgres
# Storage (MinIO for dev, Hetzner/AWS for production)
S3_ENDPOINT=http://localhost:9000
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_BUCKET=madbase
S3_REGION=us-east-1
# Optional
REDIS_URL=redis://localhost:6379
RUST_LOG=info
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8000
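The `JWT_SECRET` placeholder above (`generate-with-openssl-rand-hex-32`) can be filled with a one-liner; a minimal sketch, assuming `openssl` is installed:

```shell
# Generate a 32-byte secret, rendered as 64 hex characters.
JWT_SECRET=$(openssl rand -hex 32)
echo "JWT_SECRET=${JWT_SECRET}"
test "${#JWT_SECRET}" -eq 64 && echo "secret length ok"
```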
1.2.3 Create missing config files
Create config/prometheus.yml:
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'madbase-worker'
static_configs:
- targets: ['worker:8002']
metrics_path: /metrics
- job_name: 'madbase-control'
static_configs:
- targets: ['control:8001']
metrics_path: /metrics
- job_name: 'madbase-proxy'
static_configs:
- targets: ['proxy:8000']
metrics_path: /metrics
Create config/vmagent.yml with the same content.
1.2.4 Fix Grafana port
File: docker-compose.pillar-system.yml line 33
# BEFORE
ports:
- "3030:3030"
# AFTER — Grafana listens on 3000 by default
ports:
- "3030:3000"
Or add GF_SERVER_HTTP_PORT=3030 to the environment.
1.3 — Unified Error Handling
1.3.1 Create ApiError type
File: Create common/src/error.rs
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response, Json};
use serde::Serialize;
#[derive(Debug)]
pub enum ApiError {
BadRequest(String),
Unauthorized(String),
Forbidden(String),
NotFound(String),
Conflict(String),
Internal(String),
Database(sqlx::Error),
}
#[derive(Serialize)]
struct ErrorResponse {
error: String,
code: u16,
#[serde(skip_serializing_if = "Option::is_none")]
detail: Option<String>,
}
impl IntoResponse for ApiError {
fn into_response(self) -> Response {
let (status, message, detail) = match &self {
ApiError::BadRequest(msg) => (StatusCode::BAD_REQUEST, msg.clone(), None),
ApiError::Unauthorized(msg) => (StatusCode::UNAUTHORIZED, msg.clone(), None),
ApiError::Forbidden(msg) => (StatusCode::FORBIDDEN, msg.clone(), None),
ApiError::NotFound(msg) => (StatusCode::NOT_FOUND, msg.clone(), None),
ApiError::Conflict(msg) => (StatusCode::CONFLICT, msg.clone(), None),
ApiError::Internal(msg) => {
tracing::error!("Internal error: {}", msg);
(StatusCode::INTERNAL_SERVER_ERROR, "Internal server error".to_string(), None)
}
ApiError::Database(e) => {
tracing::error!("Database error: {}", e);
(StatusCode::INTERNAL_SERVER_ERROR, "Database error".to_string(), None)
}
};
let body = ErrorResponse {
error: message,
code: status.as_u16(),
detail,
};
(status, Json(body)).into_response()
}
}
impl From<sqlx::Error> for ApiError {
fn from(e: sqlx::Error) -> Self {
ApiError::Database(e)
}
}
Gradually replace (StatusCode, String) return types with Result<T, ApiError> across all handlers.
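For reference, here is a std-only sketch of the JSON shape clients should expect from `ApiError`. The `render` helper is hypothetical and does no string escaping; the real type serializes through serde, so this is illustration only:

```rust
// Hand-rolled rendering of the ErrorResponse shape, for illustration only.
// The real implementation serializes via serde; this does no escaping.
fn render(error: &str, code: u16, detail: Option<&str>) -> String {
    match detail {
        // When `detail` is None it is omitted entirely (skip_serializing_if).
        None => format!(r#"{{"error":"{error}","code":{code}}}"#),
        Some(d) => format!(r#"{{"error":"{error}","code":{code},"detail":"{d}"}}"#),
    }
}

fn main() {
    // BadRequest("x") surfaces the caller's message and a 400 code.
    assert_eq!(render("x", 400, None), r#"{"error":"x","code":400}"#);
    // Database errors map to a generic message: the sqlx error text is
    // logged server-side but never serialized into the response.
    assert_eq!(
        render("Database error", 500, None),
        r#"{"error":"Database error","code":500}"#
    );
    println!("wire format ok");
}
```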
1.4 — Extract RLS Middleware
1.4.1 Create RLS transaction extractor
The BEGIN tx → SET LOCAL role → set_config block is repeated ~15 times. Create an extractor:
File: Create common/src/rls.rs
use crate::error::ApiError;
use auth::AuthContext;
use sqlx::{PgPool, Postgres, Transaction};
pub struct RlsTransaction {
pub tx: Transaction<'static, Postgres>,
}
impl RlsTransaction {
pub async fn begin(
pool: &PgPool,
auth_ctx: &AuthContext,
) -> Result<Self, ApiError> {
let mut tx = pool.begin().await?;
// Validate and set role
const ALLOWED_ROLES: &[&str] = &["anon", "authenticated", "service_role"];
if !ALLOWED_ROLES.contains(&auth_ctx.role.as_str()) {
return Err(ApiError::Forbidden("Invalid role".into()));
}
let role_query = format!("SET LOCAL role = '{}'", auth_ctx.role);
sqlx::query(&role_query).execute(&mut *tx).await?;
// Set JWT claims for RLS policies
if let Some(claims) = &auth_ctx.claims {
sqlx::query("SELECT set_config('request.jwt.claim.sub', $1, true)")
.bind(&claims.sub)
.execute(&mut *tx)
.await?;
}
Ok(Self { tx })
}
pub async fn commit(self) -> Result<(), ApiError> {
self.tx.commit().await.map_err(ApiError::from)
}
}
Usage in handlers:
pub async fn list_buckets(
State(state): State<StorageState>,
Extension(auth_ctx): Extension<AuthContext>,
db: Option<Extension<PgPool>>,
) -> Result<Json<Vec<Bucket>>, ApiError> {
let pool = db.map(|Extension(p)| p).unwrap_or_else(|| state.db.clone());
let mut rls = RlsTransaction::begin(&pool, &auth_ctx).await?;
let buckets = sqlx::query_as::<_, Bucket>("SELECT * FROM storage.buckets")
.fetch_all(&mut *rls.tx)
.await?;
Ok(Json(buckets))
// tx auto-rolls back on drop (read-only is fine)
}
This eliminates ~150 lines of duplicated error-mapping boilerplate.
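The role allowlist is what makes the `format!`-interpolated `SET LOCAL role` statement safe. A std-only sketch of that guard in isolation (`role_statement` is a hypothetical helper; in the real code the check lives inside `RlsTransaction::begin`):

```rust
/// Only these three roles may ever be interpolated into SET LOCAL.
const ALLOWED_ROLES: &[&str] = &["anon", "authenticated", "service_role"];

fn role_statement(role: &str) -> Result<String, String> {
    if !ALLOWED_ROLES.contains(&role) {
        // Maps to ApiError::Forbidden in the real code.
        return Err(format!("invalid role: {role}"));
    }
    Ok(format!("SET LOCAL role = '{role}'"))
}

fn main() {
    assert_eq!(role_statement("anon").unwrap(), "SET LOCAL role = 'anon'");
    // Injection attempts never reach the SQL string.
    assert!(role_statement("anon'; DROP TABLE storage.buckets; --").is_err());
    println!("allowlist guard ok");
}
```

Because the role can only ever be one of three fixed literals, the string interpolation cannot be steered by attacker-controlled input.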
Completion Requirements
This milestone is not complete until every item below is satisfied.
1. Full Test Suite — All Green
- `cargo test --workspace` passes with zero failures
- All pre-existing tests still pass (no regressions)
- New unit tests are written for every fix in this milestone:
| Test | Location | What it validates |
|---|---|---|
| `test_proxy_forwards_body` | `gateway/src/proxy.rs` | POST with 1MB body reaches the upstream intact |
| `test_proxy_streams_response` | `gateway/src/proxy.rs` | Large response is streamed, not buffered entirely |
| `test_proxy_round_robin` | `gateway/src/proxy.rs` | 4 requests to 2 workers distribute 2+2 |
| `test_proxy_single_http_client` | `gateway/src/proxy.rs` | `reqwest::Client` is reused (shared state, not per-request) |
| `test_worker_tracing_init` | `gateway/src/bin/worker.rs` | `RUST_LOG=debug` produces debug-level spans |
| `test_api_error_json_format` | `common/src/error.rs` | `ApiError::BadRequest("x")` serializes to `{"error":"x","code":400}` |
| `test_api_error_hides_db_detail` | `common/src/error.rs` | `ApiError::Database(e)` does not leak SQL in the response body |
| `test_rls_transaction_sets_role` | `common/src/rls.rs` | `RlsTransaction::begin()` issues `SET LOCAL role` with the auth context role |
| `test_rls_transaction_rejects_bad_role` | `common/src/rls.rs` | Role outside `[anon, authenticated, service_role]` returns `Forbidden` |
| `test_rls_transaction_sets_claims` | `common/src/rls.rs` | JWT `sub` claim is available via `current_setting('request.jwt.claim.sub')` |
2. Integration Verification
- `docker compose up` starts all services (db, redis, minio, worker, system, proxy) without crash-loops
- `curl -X POST http://localhost:8000/auth/v1/signup -H "apikey: <anon_key>" -d '{"email":"test@test.com","password":"password123"}'` returns a user (through the proxy)
- Large file upload (>5MB) through the proxy succeeds (body forwarding works)
- Proxy distributes requests across multiple workers (if configured)
- `RUST_LOG=debug` works in all three standalone binaries
- API errors return structured JSON, never raw SQL error messages
- `docker compose down && docker compose up` — idempotent restart with no data loss
3. CI Gate
- All of the above unit tests are included in `cargo test --workspace`
- No `#[ignore]` on any test added in this milestone unless it requires external services (and those must be documented)