# Milestone 1: Foundation — Make It Compile and Run Correctly

**Goal:** A developer can `docker compose up`, hit the API with supabase-js, and get correct behavior for basic flows.

**Depends on:** M0 (Security Hardening)

---

## 1.1 — Fix Critical Bugs

### 1.1.1 Fix proxy body forwarding

**File:** `gateway/src/proxy.rs` — `forward_request` function (line ~172)

The proxy builds a `reqwest` request with `.headers()` but never reads or forwards the request body. Every POST/PUT/PATCH through the proxy silently drops its body.

**Current code (broken):**

```rust
let request_builder = client
    .request(req.method().clone(), &target_url)
    .headers(req.headers().clone());
// Body is never set!
```

**Fix:** Read the body from the incoming axum `Request` and attach it to the outgoing `reqwest` request:

```rust
// Extract the body before consuming the request
let (parts, body) = req.into_parts();
let body_bytes = axum::body::to_bytes(body, 1024 * 1024 * 100) // 100 MB limit
    .await
    .map_err(|_| StatusCode::BAD_REQUEST)?;

let request_builder = client
    .request(parts.method.clone(), &target_url)
    .headers(parts.headers.clone())
    .body(body_bytes);
```

For streaming (large uploads), use `reqwest::Body::wrap_stream()` instead of buffering.
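A minimal sketch of that streaming variant, assuming axum 0.7 (`Body::into_data_stream()`) and reqwest built with the `stream` feature — the surrounding names come from the buffered fix, and the exact error handling is illustrative:

```rust
// Sketch only — streaming variant of the body forwarding above.
// Assumes axum 0.7 and reqwest with the "stream" feature enabled.
let (parts, body) = req.into_parts();

// No buffering: each chunk of the incoming body is forwarded as it arrives.
let request_builder = client
    .request(parts.method.clone(), &target_url)
    .headers(parts.headers.clone())
    .body(reqwest::Body::wrap_stream(body.into_data_stream()));
```

This keeps proxy memory flat regardless of upload size, at the cost of losing automatic retries (the body can only be consumed once).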

### 1.1.2 Fix proxy round-robin

**File:** `gateway/src/proxy.rs` — `proxy_request` function (line ~147)

**Current broken logic:** `get_healthy_worker()` always returns the FIRST healthy worker. Round-robin (`get_next_worker()`) is only used as a fallback when NO workers are healthy.

**Fix:** Merge the two methods — round-robin among healthy workers:

```rust
async fn get_next_healthy_worker(&self) -> Option<Upstream> {
    let upstreams = self.worker_upstreams.read().await;
    let len = upstreams.len();
    if len == 0 { return None; }

    let mut index = self.current_worker_index.write().await;
    for _ in 0..len {
        let candidate = &upstreams[*index % len];
        *index = (*index + 1) % len;
        if *candidate.healthy.read().await {
            return Some(candidate.clone());
        }
    }
    // All unhealthy — return next in rotation anyway
    let fallback = upstreams[*index % len].clone();
    *index = (*index + 1) % len;
    Some(fallback)
}
```
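The rotation arithmetic can be sanity-checked in isolation. A pure-std sketch, with health reduced to a plain `bool` slice (a hypothetical simplification of `Upstream` and its locks):

```rust
// Standalone model of the rotation logic: skip unhealthy slots, advance the
// cursor on every probe, and fall back to the next slot if nothing is healthy.
fn next_healthy(healthy: &[bool], index: &mut usize) -> usize {
    let len = healthy.len();
    for _ in 0..len {
        let candidate = *index % len;
        *index = (*index + 1) % len;
        if healthy[candidate] {
            return candidate;
        }
    }
    // All unhealthy — hand out the next slot in rotation anyway.
    let fallback = *index % len;
    *index = (*index + 1) % len;
    fallback
}

fn main() {
    // Worker 1 is down, so the rotation alternates between workers 0 and 2.
    let healthy = [true, false, true];
    let mut index = 0;
    let picks: Vec<usize> = (0..4).map(|_| next_healthy(&healthy, &mut index)).collect();
    assert_eq!(picks, vec![0, 2, 0, 2]);
    println!("{:?}", picks);
}
```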

### 1.1.3 Fix proxy response streaming

**File:** `gateway/src/proxy.rs` — `forward_request` function (line ~200)

```rust
// BEFORE — loads the entire response into memory
let body_bytes = response.bytes().await.map_err(|e| { ... })?;
response_builder.body(Body::from(body_bytes.to_vec()))

// AFTER — stream the response
let stream = response.bytes_stream();
let body = Body::from_stream(stream);
response_builder.body(body)
```

This prevents OOM on large file downloads through the proxy.

### 1.1.4 Pool HTTP clients

**Files:** `gateway/src/proxy.rs`, `gateway/src/control.rs`

Create the `reqwest::Client` once at startup and store it in state:

```rust
// In ProxyState::new()
let http_client = reqwest::Client::builder()
    .timeout(std::time::Duration::from_secs(30))
    .pool_max_idle_per_host(20)
    .build()
    .expect("failed to build HTTP client");
```

Store it in `ProxyState { http_client, ... }` and pass it to `forward_request`. Use the same shared client in the health-check loop instead of creating a new one per iteration.

In `gateway/src/control.rs` — `logs_proxy_handler` (line 23): create the client in `ControlState` and pass it via `State` rather than calling `reqwest::Client::new()` on every request.
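The shared-state shape might look like this (a sketch — the real `ProxyState` carries more fields, and the names are only illustrative):

```rust
// Sketch — reqwest::Client is an Arc-backed handle, so cloning the state is
// cheap and every clone shares the same underlying connection pool.
#[derive(Clone)]
pub struct ProxyState {
    // Built once in ProxyState::new(); reused by handlers and the health loop.
    pub http_client: reqwest::Client,
}
```

Creating a fresh client per request defeats connection pooling entirely: every proxied call pays a new TCP (and possibly TLS) handshake.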

### 1.1.5 Fix tracing in standalone binaries

**Files:** `gateway/src/bin/proxy.rs`, `bin/control.rs`, `bin/worker.rs`

All three have the same bug — `_rust_log` is read but never used:

```rust
// BEFORE
let _rust_log = std::env::var("RUST_LOG").unwrap_or_else(|_| "info".into());
tracing_subscriber::fmt::init();

// AFTER
tracing_subscriber::fmt()
    .with_env_filter(
        tracing_subscriber::EnvFilter::try_from_default_env()
            .unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info")),
    )
    .init();
```

Also note `bin/worker.rs` has a typo: `RUST_log` instead of `RUST_LOG`.

---

## 1.2 — Dev Stack That Actually Works

### 1.2.1 Updated docker-compose.yml

Add Redis, MinIO, health checks, and proper startup ordering:

```yaml
services:
  db:
    image: postgres:15-alpine
    container_name: madbase_dev_db
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
    ports:
      - "5432:5432"
    volumes:
      - dev_db_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10

  redis:
    image: redis:7-alpine
    container_name: madbase_dev_redis
    command: redis-server --appendonly yes
    ports:
      - "6379:6379"
    volumes:
      - dev_redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  minio:
    image: quay.io/minio/minio:RELEASE.2024-06-13T22-53-53Z
    container_name: madbase_dev_minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: ${S3_ACCESS_KEY:-minioadmin}
      MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY:-minioadmin}
    volumes:
      - dev_minio_data:/data
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 5s
      timeout: 3s
      retries: 5

  worker:
    build:
      context: .
      target: worker-runtime
    container_name: madbase_dev_worker
    ports:
      - "8002:8002"
    environment:
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      JWT_SECRET: ${JWT_SECRET}
      REDIS_URL: redis://redis:6379
      S3_ENDPOINT: http://minio:9000
      S3_ACCESS_KEY: ${S3_ACCESS_KEY:-minioadmin}
      S3_SECRET_KEY: ${S3_SECRET_KEY:-minioadmin}
      S3_BUCKET: madbase
      S3_REGION: us-east-1
      RUST_LOG: info
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
      minio:
        condition: service_healthy

  system:
    build:
      context: .
      target: control-runtime
    container_name: madbase_dev_system
    ports:
      - "8001:8001"
    environment:
      DATABASE_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      DEFAULT_TENANT_DB_URL: postgres://postgres:${POSTGRES_PASSWORD:-postgres}@db:5432/postgres
      JWT_SECRET: ${JWT_SECRET}
      ADMIN_PASSWORD: ${ADMIN_PASSWORD}
      RUST_LOG: info
    depends_on:
      db:
        condition: service_healthy

  proxy:
    build:
      context: .
      target: proxy-runtime
    container_name: madbase_dev_proxy
    ports:
      - "8000:8000"
    environment:
      CONTROL_UPSTREAM_URL: http://system:8001
      WORKER_UPSTREAM_URLS: http://worker:8002
      RUST_LOG: info
    depends_on:
      - system
      - worker

volumes:
  dev_db_data:
  dev_redis_data:
  dev_minio_data:
```

### 1.2.2 Create .env.example

```env
# Required
JWT_SECRET=generate-with-openssl-rand-hex-32
ADMIN_PASSWORD=change-me-in-production
DATABASE_URL=postgres://postgres:postgres@localhost:5432/postgres
DEFAULT_TENANT_DB_URL=postgres://postgres:postgres@localhost:5432/postgres

# Storage (MinIO for dev, Hetzner/AWS for production)
S3_ENDPOINT=http://localhost:9000
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_BUCKET=madbase
S3_REGION=us-east-1

# Optional
REDIS_URL=redis://localhost:6379
RUST_LOG=info
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8000
```

### 1.2.3 Create missing config files

Create `config/prometheus.yml`:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'madbase-worker'
    static_configs:
      - targets: ['worker:8002']
    metrics_path: /metrics

  - job_name: 'madbase-control'
    static_configs:
      - targets: ['control:8001']
    metrics_path: /metrics

  - job_name: 'madbase-proxy'
    static_configs:
      - targets: ['proxy:8000']
    metrics_path: /metrics
```

Create `config/vmagent.yml` with the same content (vmagent accepts Prometheus-compatible scrape configs).

### 1.2.4 Fix Grafana port

**File:** `docker-compose.pillar-system.yml` line 33

```yaml
# BEFORE
ports:
  - "3030:3030"

# AFTER — Grafana listens on 3000 by default
ports:
  - "3030:3000"
```

Or add `GF_SERVER_HTTP_PORT=3030` to the environment.
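If you take the environment-variable route instead, the original symmetric mapping stays valid (a sketch — the compose service name is assumed to be `grafana`):

```yaml
# Alternative: keep the "3030:3030" mapping and move Grafana's own listen
# port to 3030 via Grafana's GF_<section>_<key> env-var override.
grafana:
  environment:
    - GF_SERVER_HTTP_PORT=3030
  ports:
    - "3030:3030"
```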

---

## 1.3 — Unified Error Handling

### 1.3.1 Create ApiError type

**File:** Create `common/src/error.rs`

```rust
use axum::http::StatusCode;
use axum::response::{IntoResponse, Json, Response};
use serde::Serialize;

#[derive(Debug)]
pub enum ApiError {
    BadRequest(String),
    Unauthorized(String),
    Forbidden(String),
    NotFound(String),
    Conflict(String),
    Internal(String),
    Database(sqlx::Error),
}

#[derive(Serialize)]
struct ErrorResponse {
    error: String,
    code: u16,
    #[serde(skip_serializing_if = "Option::is_none")]
    detail: Option<String>,
}

impl IntoResponse for ApiError {
    fn into_response(self) -> Response {
        let (status, message, detail) = match &self {
            ApiError::BadRequest(msg) => (StatusCode::BAD_REQUEST, msg.clone(), None),
            ApiError::Unauthorized(msg) => (StatusCode::UNAUTHORIZED, msg.clone(), None),
            ApiError::Forbidden(msg) => (StatusCode::FORBIDDEN, msg.clone(), None),
            ApiError::NotFound(msg) => (StatusCode::NOT_FOUND, msg.clone(), None),
            ApiError::Conflict(msg) => (StatusCode::CONFLICT, msg.clone(), None),
            ApiError::Internal(msg) => {
                tracing::error!("Internal error: {}", msg);
                (StatusCode::INTERNAL_SERVER_ERROR, "Internal server error".to_string(), None)
            }
            ApiError::Database(e) => {
                tracing::error!("Database error: {}", e);
                (StatusCode::INTERNAL_SERVER_ERROR, "Database error".to_string(), None)
            }
        };

        let body = ErrorResponse {
            error: message,
            code: status.as_u16(),
            detail,
        };

        (status, Json(body)).into_response()
    }
}

impl From<sqlx::Error> for ApiError {
    fn from(e: sqlx::Error) -> Self {
        ApiError::Database(e)
    }
}
```

Gradually replace `(StatusCode, String)` return types with `Result<T, ApiError>` across all handlers.
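A migration might look like this (a sketch — the handler shape, `User` type, and query are illustrative, not actual project code):

```rust
// BEFORE — ad-hoc tuple errors; easy to leak sqlx detail to clients
async fn get_user(pool: &sqlx::PgPool, id: i64) -> Result<Json<User>, (StatusCode, String)> {
    let user = sqlx::query_as::<_, User>("SELECT * FROM users WHERE id = $1")
        .bind(id)
        .fetch_one(pool)
        .await
        .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?; // leaks SQL
    Ok(Json(user))
}

// AFTER — `?` converts via From<sqlx::Error>; the error is logged server-side
// and the client sees only the generic {"error":"Database error","code":500} body
async fn get_user(pool: &sqlx::PgPool, id: i64) -> Result<Json<User>, ApiError> {
    let user = sqlx::query_as::<_, User>("SELECT * FROM users WHERE id = $1")
        .bind(id)
        .fetch_one(pool)
        .await?;
    Ok(Json(user))
}
```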

---

## 1.4 — Extract RLS Middleware

### 1.4.1 Create RLS transaction extractor

The `BEGIN tx → SET LOCAL role → set_config` block is repeated ~15 times. Create a reusable helper (a full axum extractor can wrap it later):

**File:** Create `common/src/rls.rs`

```rust
use crate::error::ApiError;
use auth::AuthContext;
use sqlx::{PgPool, Postgres, Transaction};

pub struct RlsTransaction {
    pub tx: Transaction<'static, Postgres>,
}

impl RlsTransaction {
    pub async fn begin(
        pool: &PgPool,
        auth_ctx: &AuthContext,
    ) -> Result<Self, ApiError> {
        let mut tx = pool.begin().await?;

        // Validate the role against an allowlist before interpolating it —
        // `SET LOCAL role` cannot take a bind parameter.
        const ALLOWED_ROLES: &[&str] = &["anon", "authenticated", "service_role"];
        if !ALLOWED_ROLES.contains(&auth_ctx.role.as_str()) {
            return Err(ApiError::Forbidden("Invalid role".into()));
        }
        let role_query = format!("SET LOCAL role = '{}'", auth_ctx.role);
        sqlx::query(&role_query).execute(&mut *tx).await?;

        // Set JWT claims for RLS policies
        if let Some(claims) = &auth_ctx.claims {
            sqlx::query("SELECT set_config('request.jwt.claim.sub', $1, true)")
                .bind(&claims.sub)
                .execute(&mut *tx)
                .await?;
        }

        Ok(Self { tx })
    }

    pub async fn commit(self) -> Result<(), ApiError> {
        self.tx.commit().await.map_err(ApiError::from)
    }
}
```

**Usage in handlers:**

```rust
pub async fn list_buckets(
    State(state): State<StorageState>,
    Extension(auth_ctx): Extension<AuthContext>,
    db: Option<Extension<PgPool>>,
) -> Result<Json<Vec<Bucket>>, ApiError> {
    let pool = db.map(|Extension(p)| p).unwrap_or_else(|| state.db.clone());
    let mut rls = RlsTransaction::begin(&pool, &auth_ctx).await?;

    let buckets = sqlx::query_as::<_, Bucket>("SELECT * FROM storage.buckets")
        .fetch_all(&mut *rls.tx)
        .await?;

    Ok(Json(buckets))
    // tx auto-rolls back on drop (fine for read-only queries)
}
```

This eliminates ~150 lines of duplicated error-mapping boilerplate.

---

## Completion Requirements

This milestone is **not complete** until every item below is satisfied.

### 1. Full Test Suite — All Green

- [ ] `cargo test --workspace` passes with **zero failures**
- [ ] All **pre-existing tests** still pass (no regressions)
- [ ] **New unit tests** are written for every fix in this milestone:

| Test | Location | What it validates |
|------|----------|-------------------|
| `test_proxy_forwards_body` | `gateway/src/proxy.rs` | POST with a 1 MB body reaches the upstream intact |
| `test_proxy_streams_response` | `gateway/src/proxy.rs` | Large response is streamed, not buffered entirely |
| `test_proxy_round_robin` | `gateway/src/proxy.rs` | 4 requests to 2 workers distribute 2+2 |
| `test_proxy_single_http_client` | `gateway/src/proxy.rs` | `reqwest::Client` is reused (shared state, not per-request) |
| `test_worker_tracing_init` | `gateway/src/bin/worker.rs` | `RUST_LOG=debug` produces debug-level spans |
| `test_api_error_json_format` | `common/src/error.rs` | `ApiError::BadRequest("x")` serializes to `{"error":"x","code":400}` |
| `test_api_error_hides_db_detail` | `common/src/error.rs` | `ApiError::Database(e)` does not leak SQL in the response body |
| `test_rls_transaction_sets_role` | `common/src/rls.rs` | `RlsTransaction::begin()` issues `SET LOCAL role` with the auth context role |
| `test_rls_transaction_rejects_bad_role` | `common/src/rls.rs` | Role outside `[anon, authenticated, service_role]` returns `Forbidden` |
| `test_rls_transaction_sets_claims` | `common/src/rls.rs` | JWT `sub` claim is available via `current_setting('request.jwt.claim.sub')` |

### 2. Integration Verification

- [ ] `docker compose up` starts all services (db, redis, minio, worker, system, proxy) without crash-loops
- [ ] `curl -X POST http://localhost:8000/auth/v1/signup -H "apikey: <anon_key>" -H "Content-Type: application/json" -d '{"email":"test@test.com","password":"password123"}'` returns a user (through the proxy)
- [ ] Large file upload (>5 MB) through the proxy succeeds (body forwarding works)
- [ ] Proxy distributes requests across multiple workers (if configured)
- [ ] `RUST_LOG=debug` works in all three standalone binaries
- [ ] API errors return structured JSON, never raw SQL error messages
- [ ] `docker compose down && docker compose up` — idempotent restart with no data loss

### 3. CI Gate

- [ ] All of the above unit tests are included in `cargo test --workspace`
- [ ] No `#[ignore]` on any test added in this milestone unless it requires external services (and those must be documented)