### /Users/vlad/Developer/madapes/madbase/_milestones/M7_cicd_operability.md

```markdown
# Milestone 7: CI/CD & Operability

**Goal:** Every commit is validated. Deployments are reproducible and observable.

**Infrastructure:**
- Container runtime: Podman
- Container orchestration: Podman Compose
- CI/CD platform: Gitea Actions (git.madapes.com)
- Container registry: git.madapes.com

**Depends on:** M0 (Security), M1 (Foundation)

---

## 7.1 — Rust CI Pipeline

### 7.1.1 Add Rust jobs to CI

**File:** `.gitea/workflows/ci.yml`

Add a new job before the existing frontend jobs:

```yaml
  rust:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt, clippy

      - name: Cache cargo registry and build
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

      - name: Check formatting
        run: cargo fmt --all --check

      - name: Run clippy
        run: cargo clippy --workspace -- -D warnings

      - name: Build workspace
        run: cargo build --workspace

      - name: Run tests
        run: cargo test --workspace
        env:
          DATABASE_URL: postgres://postgres:postgres@localhost:5432/postgres
          JWT_SECRET: test-secret-for-ci-only-not-production
          DEFAULT_TENANT_DB_URL: postgres://postgres:postgres@localhost:5432/postgres

      # `cargo sqlx` comes from sqlx-cli, which is not preinstalled on the runner
      - name: Install sqlx-cli
        run: cargo install sqlx-cli --no-default-features --features rustls,postgres

      - name: Verify sqlx offline data
        run: cargo sqlx prepare --check --workspace
        env:
          DATABASE_URL: postgres://postgres:postgres@localhost:5432/postgres
```

### 7.1.2 Enable sqlx offline mode

Run locally:
```bash
cargo sqlx prepare --workspace
```

This creates a `.sqlx/` directory containing query metadata. Commit it to git; the `Verify sqlx offline data` step in 7.1.1 fails the build whenever it drifts out of sync with the queries.

### 7.1.3 Fix the lint job

**File:** `.gitea/workflows/ci.yml`

The trailing `|| true` swallows lint failures, so the job always passes. Remove it:

```yaml
# BEFORE
run: npm run lint || true

# AFTER
run: npm run lint
```

### 7.1.4 Pin Gitea Actions

Update all `@v3` action references to `@v4` throughout the file:
- `actions/checkout@v3` → `@v4`
- `actions/setup-node@v3` → `@v4`
- `actions/upload-artifact@v3` → `@v4`
- `codecov/codecov-action@v3` → `@v4`

### 7.1.5 Add Podman build job

```yaml
  podman-build:
    runs-on: ubuntu-latest
    needs: rust
    container:
      image: quay.io/podman/stable:latest
    steps:
      - uses: actions/checkout@v4

      # --format docker keeps HEALTHCHECK, which Podman's default OCI format ignores
      - name: Build gateway-runtime
        run: podman build --format docker --target gateway-runtime -t git.madapes.com/madbase/gateway:ci .

      - name: Build worker-runtime
        run: podman build --format docker --target worker-runtime -t git.madapes.com/madbase/worker:ci .

      - name: Build control-runtime
        run: podman build --format docker --target control-runtime -t git.madapes.com/madbase/control:ci .

      - name: Build proxy-runtime
        run: podman build --format docker --target proxy-runtime -t git.madapes.com/madbase/proxy:ci .

      # Pass the password on stdin so it never appears in the process arguments
      - name: Login to registry
        if: github.ref == 'refs/heads/main'
        run: echo "${{ secrets.REGISTRY_PASSWORD }}" | podman login git.madapes.com -u "${{ secrets.REGISTRY_USER }}" --password-stdin

      - name: Push images
        if: github.ref == 'refs/heads/main'
        run: |
          podman push git.madapes.com/madbase/gateway:ci
          podman push git.madapes.com/madbase/worker:ci
          podman push git.madapes.com/madbase/control:ci
          podman push git.madapes.com/madbase/proxy:ci
```

---

## 7.2 — Container Improvements (Podman)

### 7.2.1 Slim runtime images

**File:** `Dockerfile` — all runtime stages (compatible with Podman)

```dockerfile
# BEFORE
FROM rust:latest AS worker-runtime

# AFTER — shared base
FROM debian:bookworm-slim AS runtime-base
# curl is required by the HEALTHCHECK below; bookworm-slim does not ship it
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl libssl3 \
    && rm -rf /var/lib/apt/lists/*
RUN useradd -r -s /bin/false madbase

FROM runtime-base AS worker-runtime
WORKDIR /app
COPY --from=builder /app/target/release/worker .
USER madbase
EXPOSE 8002
HEALTHCHECK --interval=10s --timeout=3s CMD curl -f http://localhost:8002/health || exit 1
CMD ["./worker"]
```

### 7.2.2 Create .containerignore

```
.git
target
docs
*.md
env
scripts
_milestones
.gitea
control-plane-ui/node_modules
control-plane-ui/dist
```

> **Note:** While `.dockerignore` also works with Podman, `.containerignore` is the modern standard that works across all OCI-compliant container runtimes.

### 7.2.3 Pin image tags

Replace all `:latest` tags:
- `cargo-chef:latest-rust-latest` → `cargo-chef:0.1.68-rust-1.77`
- `victoriametrics/victoria-metrics:latest` → `v1.101.0`
- `grafana/loki:latest` → `2.9.6`
- `grafana/grafana:latest` → `10.4.2`
- `victoriametrics/vmagent:latest` → `v1.101.0`

### 7.2.4 Update compose configuration for Podman Compose

**File:** `compose.yaml` (or `docker-compose.yaml`)

Ensure compatibility with Podman Compose:

```yaml
services:
  gateway:
    image: git.madapes.com/madbase/gateway:latest
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      - postgres
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 3s
      retries: 3

  # ... other services ...
```

Run with Podman Compose:
```bash
podman-compose up -d
```

---

## 7.3 — Observability

### 7.3.1 Create config files

See M1 for the contents of `config/prometheus.yml` and `config/vmagent.yml`.

### 7.3.2 Request correlation IDs

**File:** `gateway/src/proxy.rs` — `proxy_request` function

```rust
use uuid::Uuid;

// Generate or propagate request ID
let request_id = req.headers()
    .get("x-request-id")
    .and_then(|v| v.to_str().ok())
    .map(|s| s.to_string())
    .unwrap_or_else(|| Uuid::new_v4().to_string());

// Add to proxied request
request_builder = request_builder.header("x-request-id", &request_id);

// Add to response
response_builder = response_builder.header("x-request-id", &request_id);
```

Attach the request ID to a `tracing` span so that every log line emitted while handling the request carries it:
```rust
let span = tracing::info_span!("request", id = %request_id);
// In async code, attach the span with `future.instrument(span)`
// instead of holding `span.enter()` across an `.await`.
```

### 7.3.3 OpenTelemetry tracing

Add dependencies:
```toml
opentelemetry = "0.22"
opentelemetry_sdk = { version = "0.22", features = ["rt-tokio"] }  # provides runtime::Tokio
opentelemetry-otlp = "0.15"
tracing-opentelemetry = "0.23"
```

Initialize in `gateway/src/main.rs`:
```rust
use opentelemetry_otlp::WithExportConfig; // brings `with_endpoint` into scope

// Tracing export is opt-in: it is enabled only when the endpoint variable is set.
if let Ok(otlp_endpoint) = std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT") {
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(opentelemetry_otlp::new_exporter().tonic().with_endpoint(otlp_endpoint))
        .install_batch(opentelemetry_sdk::runtime::Tokio)?;

    let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);
    // Add to the subscriber registry
}
```

### 7.3.4 Alerting rules

Create `config/alerts.yml` for Grafana alerting or VictoriaMetrics vmalert:

```yaml
groups:
  - name: madbase
    rules:
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical

      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: warning

      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
```

---

## Completion Requirements

This milestone is **not complete** until every item below is satisfied.

### 1. Full Test Suite — All Green

- [ ] `cargo test --workspace` passes with **zero failures**
- [ ] `cargo fmt --all --check` passes (no formatting issues)
- [ ] `cargo clippy --workspace -- -D warnings` passes (no warnings)
- [ ] `cargo sqlx prepare --check --workspace` passes (offline query data is up to date)
- [ ] All **pre-existing tests** still pass (no regressions)
- [ ] **New tests** are written for CI/operability features:

| Test | Location | What it validates |
|------|----------|-------------------|
| `test_request_id_middleware` | `gateway/src/middleware.rs` | Request without `X-Request-Id` gets one generated; request with one keeps it |
| `test_request_id_propagated` | `gateway/src/proxy.rs` | `X-Request-Id` from proxy request appears in upstream headers |
| `test_health_endpoint_worker` | `gateway/src/bin/worker.rs` | `GET /health` returns 200 with JSON status |
| `test_health_endpoint_system` | `gateway/src/bin/system.rs` | `GET /health` returns 200 with JSON status |
| `test_health_endpoint_proxy` | `gateway/src/bin/proxy.rs` | `GET /health` returns 200 with JSON status |
| `test_podman_build_proxy` | `.gitea/workflows/ci.yml` | Podman build target `proxy-runtime` succeeds (CI job) |
| `test_podman_build_worker` | `.gitea/workflows/ci.yml` | Podman build target `worker-runtime` succeeds (CI job) |
| `test_podman_build_control` | `.gitea/workflows/ci.yml` | Podman build target `control-runtime` succeeds (CI job) |

### 2. CI Pipeline Verification

- [ ] CI passes on a clean PR: `cargo fmt`, `cargo clippy`, `cargo build`, `cargo test` all green
- [ ] `cargo sqlx prepare --check --workspace` passes in CI
- [ ] Podman build succeeds for all 4 targets (gateway, worker, control, proxy)
- [ ] CI caches Rust build artifacts (via `actions/cache` as in 7.1.1, or `Swatinem/rust-cache`)
- [ ] CI runs in under 15 minutes for a clean build
- [ ] Images are successfully pushed to `git.madapes.com` on main branch

### 3. Podman / Operability Verification

- [ ] Runtime images are under 200MB each (down from ~1.5GB)
- [ ] Containers run as non-root user (`USER madbase`)
- [ ] `podman inspect <image>` shows a `HEALTHCHECK` for each runtime image
- [ ] `.containerignore` exists and excludes `target/`, `.git/`, `env/`, `_milestones/`, `docs/`
- [ ] All container image tags are pinned (no `:latest` in Dockerfile)
- [ ] `podman-compose up -d` successfully starts all services
- [ ] Images can be pulled from `git.madapes.com` in production

### 4. Observability Verification

- [ ] `X-Request-Id` header appears in proxy responses
- [ ] Logs contain structured JSON with request IDs (verify via `podman logs proxy | jq .`)
- [ ] Prometheus/VictoriaMetrics scrapes metrics from all services
- [ ] Grafana dashboards show request rate, latency p50/p95/p99, error rate
- [ ] Alerting rules fire as defined in 7.3.4: service down >1min, 5xx rate >0.1 req/s for 5min, p95 latency >2s for 5min

### 5. CI Gate

- [ ] CI is the gate: once this milestone lands, every future milestone must pass the CI workflow before merging
- [ ] All milestones M0–M6 tests pass in the CI pipeline retroactively
- [ ] Gitea Actions workflows are properly configured with secrets for registry access
```
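
The generate-or-propagate rule from 7.3.2, which `test_request_id_middleware` is meant to cover, can be sketched as a pure function. This is a minimal, dependency-free illustration and not the gateway's actual middleware: `resolve_request_id`, the `(name, value)` header slice, and the injected `generate` closure are all hypothetical stand-ins for the real header map and `Uuid::new_v4()`.

```rust
/// Returns the caller-supplied `X-Request-Id` if present, otherwise a freshly generated one.
/// `generate` stands in for `Uuid::new_v4().to_string()` so the sketch needs no external crate.
fn resolve_request_id<F>(headers: &[(&str, &str)], generate: F) -> String
where
    F: FnOnce() -> String,
{
    headers
        .iter()
        // HTTP header names are case-insensitive.
        .find(|(name, _)| name.eq_ignore_ascii_case("x-request-id"))
        .map(|(_, value)| value.to_string())
        .unwrap_or_else(generate)
}

fn main() {
    // Hypothetical fixed generator; real code would produce a UUID here.
    let generate = || String::from("generated-id");

    // A request that already carries an ID keeps it.
    let incoming = [("Content-Type", "application/json"), ("X-Request-Id", "abc-123")];
    println!("{}", resolve_request_id(&incoming, generate));

    // A request without an ID gets the generated one.
    let without_id = [("Content-Type", "application/json")];
    println!("{}", resolve_request_id(&without_id, generate));
}
```

Injecting the generator keeps the function deterministic under test, so the two cases in the table (ID kept, ID generated) can be asserted without pattern-matching on random UUIDs.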