Milestone 1: Foundation — Make It Compile and Run Correctly

Goal: A developer can docker compose up, hit the API with supabase-js, and get correct behavior for basic flows.

Depends on: M0 (Security Hardening)


1.1 — Fix Critical Bugs

1.1.1 Fix proxy body forwarding

File: gateway/src/proxy.rs, forward_request function (line ~172)

The proxy builds a reqwest request with .headers() but never reads or forwards the request body. Every POST/PUT/PATCH through the proxy silently drops the body.

Current code (broken):

let request_builder = client
    .request(req.method().clone(), &target_url)
    .headers(req.headers().clone());
// Body is never set!

Fix: Read the body from the incoming axum Request and attach it to the outgoing reqwest request:

// Extract body before consuming the request
let (parts, body) = req.into_parts();
let body_bytes = axum::body::to_bytes(body, 1024 * 1024 * 100) // 100MB limit
    .await
    .map_err(|_| StatusCode::BAD_REQUEST)?;

let request_builder = client
    .request(parts.method.clone(), &target_url)
    .headers(parts.headers.clone())
    .body(body_bytes);

For streaming (large uploads), use reqwest::Body::wrap_stream() instead of buffering.
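
A minimal sketch of the streaming variant, assuming axum 0.7 and a reqwest build with the "stream" feature enabled; forward_streaming and its parameters are illustrative stand-ins for the values already in scope inside forward_request:

use axum::{extract::Request, http::StatusCode};

// Illustrative helper: only the body handling is the point here.
async fn forward_streaming(
    client: &reqwest::Client,
    target_url: &str,
    req: Request,
) -> Result<reqwest::Response, StatusCode> {
    let (parts, body) = req.into_parts();
    client
        .request(parts.method, target_url)
        .headers(parts.headers)
        // Forward the incoming axum body as a stream instead of buffering it.
        .body(reqwest::Body::wrap_stream(body.into_data_stream()))
        .send()
        .await
        .map_err(|_| StatusCode::BAD_GATEWAY)
}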

1.1.2 Fix proxy round-robin

File: gateway/src/proxy.rs, proxy_request function (line ~147)

Current broken logic: get_healthy_worker() always returns the FIRST healthy worker. Round-robin (get_next_worker()) is only used as a fallback when NO workers are healthy.

Fix: Merge the two methods — round-robin among healthy workers:

async fn get_next_healthy_worker(&self) -> Option<Upstream> {
    let upstreams = self.worker_upstreams.read().await;
    let len = upstreams.len();
    if len == 0 { return None; }

    let mut index = self.current_worker_index.write().await;
    for _ in 0..len {
        let candidate = &upstreams[*index % len];
        *index = (*index + 1) % len;
        if *candidate.healthy.read().await {
            return Some(candidate.clone());
        }
    }
    // All unhealthy — return next in rotation anyway
    let fallback = upstreams[*index % len].clone();
    *index = (*index + 1) % len;
    Some(fallback)
}

1.1.3 Fix proxy response streaming

File: gateway/src/proxy.rs, forward_request function (line ~200)

// BEFORE — loads entire response into memory
let body_bytes = response.bytes().await.map_err(|e| { ... })?;
response_builder.body(Body::from(body_bytes.to_vec()))

// AFTER — stream the response
let stream = response.bytes_stream();
let body = Body::from_stream(stream);
response_builder.body(body)

This prevents OOM on large file downloads through the proxy.

1.1.4 Pool HTTP clients

Files: gateway/src/proxy.rs, gateway/src/control.rs

Create reqwest::Client once at startup and store it in state:

// In ProxyState::new()
let http_client = reqwest::Client::builder()
    .timeout(std::time::Duration::from_secs(30))
    .pool_max_idle_per_host(20)
    .build()
    .unwrap();

Store in ProxyState { http_client, ... }. Pass to forward_request. Same for health check loop — use the shared client instead of creating one per iteration.

In gateway/src/control.rs, logs_proxy_handler (line 23): create the client in ControlState and pass it via State instead of calling reqwest::Client::new() per request.
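
A rough sketch of the State-extractor side of this, assuming the ControlState shape above; the handler name and upstream URL are illustrative, not the existing code:

use axum::{extract::State, http::StatusCode};

#[derive(Clone)]
pub struct ControlState {
    pub http_client: reqwest::Client,
    // ... existing fields
}

// Any handler now reuses the pooled client via the State extractor
// instead of building a fresh reqwest::Client per request.
pub async fn fetch_upstream(State(state): State<ControlState>) -> Result<String, StatusCode> {
    state
        .http_client
        .get("http://worker:8002/logs") // illustrative upstream URL
        .send()
        .await
        .map_err(|_| StatusCode::BAD_GATEWAY)?
        .text()
        .await
        .map_err(|_| StatusCode::BAD_GATEWAY)
}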

1.1.5 Fix tracing in standalone binaries

Files: gateway/src/bin/proxy.rs, bin/control.rs, bin/worker.rs

All three have the same bug — _rust_log is unused:

// BEFORE
let _rust_log = std::env::var("RUST_LOG").unwrap_or_else(|_| "info".into());
tracing_subscriber::fmt::init();

// AFTER
tracing_subscriber::fmt()
    .with_env_filter(
        tracing_subscriber::EnvFilter::try_from_default_env()
            .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info"))
    )
    .init();

Also note bin/worker.rs has a typo: RUST_log instead of RUST_LOG.


1.2 — Dev Stack That Actually Works

1.2.1 Updated docker-compose.yml

Add Redis, MinIO, health checks, and proper startup ordering:

services:
  db:
    image: postgres:15-alpine
    container_name: madbase_dev_db
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
    ports:
      - "5432:5432"
    volumes:
      - dev_db_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10

  redis:
    image: redis:7-alpine
    container_name: madbase_dev_redis
    command: redis-server --appendonly yes
    ports:
      - "6379:6379"
    volumes:
      - dev_redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  minio:
    image: quay.io/minio/minio:RELEASE.2024-06-13T22-53-53Z
    container_name: madbase_dev_minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: ${S3_ACCESS_KEY:-minioadmin}
      MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY:-minioadmin}
    volumes:
      - dev_minio_data:/data
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 5s
      timeout: 3s
      retries: 5

  worker:
    build:
      context: .
      target: worker-runtime
    container_name: madbase_dev_worker
    ports:
      - "8002:8002"
    environment:
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      JWT_SECRET: ${JWT_SECRET}
      REDIS_URL: redis://redis:6379
      S3_ENDPOINT: http://minio:9000
      S3_ACCESS_KEY: ${S3_ACCESS_KEY:-minioadmin}
      S3_SECRET_KEY: ${S3_SECRET_KEY:-minioadmin}
      S3_BUCKET: madbase
      S3_REGION: us-east-1
      RUST_LOG: info
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
      minio:
        condition: service_healthy

  system:
    build:
      context: .
      target: control-runtime
    container_name: madbase_dev_system
    ports:
      - "8001:8001"
    environment:
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      JWT_SECRET: ${JWT_SECRET}
      ADMIN_PASSWORD: ${ADMIN_PASSWORD}
      RUST_LOG: info
    depends_on:
      db:
        condition: service_healthy

  proxy:
    build:
      context: .
      target: proxy-runtime
    container_name: madbase_dev_proxy
    ports:
      - "8000:8000"
    environment:
      CONTROL_UPSTREAM_URL: http://system:8001
      WORKER_UPSTREAM_URLS: http://worker:8002
      RUST_LOG: info
    depends_on:
      - system
      - worker

volumes:
  dev_db_data:
  dev_redis_data:
  dev_minio_data:

1.2.2 Create .env.example

# Required
JWT_SECRET=generate-with-openssl-rand-hex-32
ADMIN_PASSWORD=change-me-in-production
DATABASE_URL=postgres://postgres:postgres@localhost:5432/postgres
DEFAULT_TENANT_DB_URL=postgres://postgres:postgres@localhost:5432/postgres

# Storage (MinIO for dev, Hetzner/AWS for production)
S3_ENDPOINT=http://localhost:9000
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_BUCKET=madbase
S3_REGION=us-east-1

# Optional
REDIS_URL=redis://localhost:6379
RUST_LOG=info
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8000

1.2.3 Create missing config files

Create config/prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'madbase-worker'
    static_configs:
      - targets: ['worker:8002']
    metrics_path: /metrics

  - job_name: 'madbase-control'
    static_configs:
      - targets: ['control:8001']
    metrics_path: /metrics

  - job_name: 'madbase-proxy'
    static_configs:
      - targets: ['proxy:8000']
    metrics_path: /metrics

Create config/vmagent.yml with the same content.

1.2.4 Fix Grafana port

File: docker-compose.pillar-system.yml line 33

# BEFORE
ports:
  - "3030:3030"

# AFTER — Grafana listens on 3000 by default
ports:
  - "3030:3000"

Or add GF_SERVER_HTTP_PORT=3030 to the environment.


1.3 — Unified Error Handling

1.3.1 Create ApiError type

File: Create common/src/error.rs

use axum::http::StatusCode;
use axum::response::{IntoResponse, Response, Json};
use serde::Serialize;

#[derive(Debug)]
pub enum ApiError {
    BadRequest(String),
    Unauthorized(String),
    Forbidden(String),
    NotFound(String),
    Conflict(String),
    Internal(String),
    Database(sqlx::Error),
}

#[derive(Serialize)]
struct ErrorResponse {
    error: String,
    code: u16,
    #[serde(skip_serializing_if = "Option::is_none")]
    detail: Option<String>,
}

impl IntoResponse for ApiError {
    fn into_response(self) -> Response {
        let (status, message, detail) = match &self {
            ApiError::BadRequest(msg) => (StatusCode::BAD_REQUEST, msg.clone(), None),
            ApiError::Unauthorized(msg) => (StatusCode::UNAUTHORIZED, msg.clone(), None),
            ApiError::Forbidden(msg) => (StatusCode::FORBIDDEN, msg.clone(), None),
            ApiError::NotFound(msg) => (StatusCode::NOT_FOUND, msg.clone(), None),
            ApiError::Conflict(msg) => (StatusCode::CONFLICT, msg.clone(), None),
            ApiError::Internal(msg) => {
                tracing::error!("Internal error: {}", msg);
                (StatusCode::INTERNAL_SERVER_ERROR, "Internal server error".to_string(), None)
            }
            ApiError::Database(e) => {
                tracing::error!("Database error: {}", e);
                (StatusCode::INTERNAL_SERVER_ERROR, "Database error".to_string(), None)
            }
        };

        let body = ErrorResponse {
            error: message,
            code: status.as_u16(),
            detail,
        };

        (status, Json(body)).into_response()
    }
}

impl From<sqlx::Error> for ApiError {
    fn from(e: sqlx::Error) -> Self {
        ApiError::Database(e)
    }
}

Gradually replace (StatusCode, String) return types with Result<T, ApiError> across all handlers.
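
As an illustration of the target shape, a minimal sketch of a converted handler; Thing, the query, and the import path are hypothetical examples (assuming the crate is named common), not existing code:

use axum::{extract::{Path, State}, Json};
use common::error::ApiError; // path assumes the common crate layout above
use serde::Serialize;
use sqlx::PgPool;

#[derive(Serialize, sqlx::FromRow)]
struct Thing {
    id: i64,
    name: String,
}

async fn get_thing(
    State(pool): State<PgPool>,
    Path(id): Path<i64>,
) -> Result<Json<Thing>, ApiError> {
    let thing = sqlx::query_as::<_, Thing>("SELECT id, name FROM things WHERE id = $1")
        .bind(id)
        .fetch_optional(&pool)
        .await? // sqlx::Error converts automatically via From<sqlx::Error> for ApiError
        .ok_or_else(|| ApiError::NotFound(format!("thing {id} not found")))?;
    Ok(Json(thing))
}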


1.4 — Extract RLS Middleware

1.4.1 Create RLS transaction extractor

The BEGIN tx → SET LOCAL role → set_config block is repeated ~15 times. Create an extractor:

File: Create common/src/rls.rs

use auth::AuthContext;
use sqlx::{PgPool, Postgres, Transaction};

use crate::error::ApiError;

pub struct RlsTransaction {
    pub tx: Transaction<'static, Postgres>,
}

impl RlsTransaction {
    pub async fn begin(
        pool: &PgPool,
        auth_ctx: &AuthContext,
    ) -> Result<Self, ApiError> {
        let mut tx = pool.begin().await?;

        // Validate and set role
        const ALLOWED_ROLES: &[&str] = &["anon", "authenticated", "service_role"];
        if !ALLOWED_ROLES.contains(&auth_ctx.role.as_str()) {
            return Err(ApiError::Forbidden("Invalid role".into()));
        }
        let role_query = format!("SET LOCAL role = '{}'", auth_ctx.role);
        sqlx::query(&role_query).execute(&mut *tx).await?;

        // Set JWT claims for RLS policies
        if let Some(claims) = &auth_ctx.claims {
            sqlx::query("SELECT set_config('request.jwt.claim.sub', $1, true)")
                .bind(&claims.sub)
                .execute(&mut *tx)
                .await?;
        }

        Ok(Self { tx })
    }

    pub async fn commit(self) -> Result<(), ApiError> {
        self.tx.commit().await.map_err(ApiError::from)
    }
}

Usage in handlers:

pub async fn list_buckets(
    State(state): State<StorageState>,
    Extension(auth_ctx): Extension<AuthContext>,
    db: Option<Extension<PgPool>>,
) -> Result<Json<Vec<Bucket>>, ApiError> {
    let pool = db.map(|Extension(p)| p).unwrap_or_else(|| state.db.clone());
    let mut rls = RlsTransaction::begin(&pool, &auth_ctx).await?;

    let buckets = sqlx::query_as::<_, Bucket>("SELECT * FROM storage.buckets")
        .fetch_all(&mut *rls.tx)
        .await?;

    Ok(Json(buckets))
    // tx auto-rolls back on drop (read-only is fine)
}

This eliminates ~150 lines of duplicated error-mapping boilerplate.


Completion Requirements

This milestone is not complete until every item below is satisfied.

1. Full Test Suite — All Green

  • cargo test --workspace passes with zero failures
  • All pre-existing tests still pass (no regressions)
  • New unit tests are written for every fix in this milestone:
    • test_proxy_forwards_body (gateway/src/proxy.rs): a POST with a 1MB body reaches the upstream intact
    • test_proxy_streams_response (gateway/src/proxy.rs): a large response is streamed, not buffered entirely
    • test_proxy_round_robin (gateway/src/proxy.rs): 4 requests to 2 workers distribute 2+2
    • test_proxy_single_http_client (gateway/src/proxy.rs): reqwest::Client is reused (shared state, not per-request)
    • test_worker_tracing_init (gateway/src/bin/worker.rs): RUST_LOG=debug produces debug-level spans
    • test_api_error_json_format (common/src/error.rs): ApiError::BadRequest("x") serializes to {"error":"x","code":400}
    • test_api_error_hides_db_detail (common/src/error.rs): ApiError::Database(e) does not leak SQL in the response body
    • test_rls_transaction_sets_role (common/src/rls.rs): RlsTransaction::begin() issues SET LOCAL role with the auth context role
    • test_rls_transaction_rejects_bad_role (common/src/rls.rs): a role outside [anon, authenticated, service_role] returns Forbidden
    • test_rls_transaction_sets_claims (common/src/rls.rs): the JWT sub claim is available via current_setting('request.jwt.claim.sub')
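
As a sketch of what one of these might look like, a possible shape for test_api_error_json_format, assuming the ApiError type from 1.3.1, a tokio test runtime, and http-body-util as a dev-dependency:

// Inside a #[cfg(test)] mod tests block at the bottom of common/src/error.rs.
use super::ApiError;
use axum::response::IntoResponse;
use http_body_util::BodyExt;

#[tokio::test]
async fn test_api_error_json_format() {
    let resp = ApiError::BadRequest("x".to_string()).into_response();
    assert_eq!(resp.status(), axum::http::StatusCode::BAD_REQUEST);

    // Collect the response body and check the JSON error envelope.
    let bytes = resp.into_body().collect().await.unwrap().to_bytes();
    let json: serde_json::Value = serde_json::from_slice(&bytes).unwrap();
    assert_eq!(json["error"], "x");
    assert_eq!(json["code"], 400);
}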

2. Integration Verification

  • docker compose up starts all services (db, redis, minio, worker, system, proxy) without crash-loops
  • curl -X POST http://localhost:8000/auth/v1/signup -H "apikey: <anon_key>" -d '{"email":"test@test.com","password":"password123"}' returns a user (through the proxy)
  • Large file upload (>5MB) through the proxy succeeds (body forwarding works)
  • Proxy distributes requests across multiple workers (if configured)
  • RUST_LOG=debug works in all three standalone binaries
  • API errors return structured JSON, never raw SQL error messages
  • docker compose down && docker compose up — idempotent restart with no data loss

3. CI Gate

  • All of the above unit tests are included in cargo test --workspace
  • No #[ignore] on any test added in this milestone unless it requires external services (and those must be documented)