madbase/_milestones/M7_cicd_operability.md
Vlad Durnea a66d908eff
chore: full stack stability and migration fixes, plus react UI progress
2026-03-18 09:01:38 +02:00

# Milestone 7: CI/CD & Operability

**Goal:** Every commit is validated. Deployments are reproducible and observable.

**Infrastructure:**
- Container runtime: Podman
- Container orchestration: Podman Compose
- CI/CD platform: Gitea Actions (git.madapes.com)
- Container registry: git.madapes.com

**Depends on:** M0 (Security), M1 (Foundation)

---

## 7.1 — Rust CI Pipeline

### 7.1.1 Add Rust jobs to CI

**File:** `.gitea/workflows/ci.yml`

Add a new job before the existing frontend jobs:

```yaml
  rust:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt, clippy

      - name: Cache cargo registry and build
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

      - name: Check formatting
        run: cargo fmt --all -- --check

      - name: Run clippy
        run: cargo clippy --workspace -- -D warnings

      - name: Build workspace
        run: cargo build --workspace

      - name: Run tests
        run: cargo test --workspace
        env:
          DATABASE_URL: postgres://postgres:postgres@localhost:5432/postgres
          JWT_SECRET: test-secret-for-ci-only-not-production
          DEFAULT_TENANT_DB_URL: postgres://postgres:postgres@localhost:5432/postgres

      # `cargo sqlx` comes from sqlx-cli, which is not part of the Rust toolchain
      - name: Install sqlx-cli
        run: cargo install sqlx-cli --no-default-features --features native-tls,postgres

      - name: Verify sqlx offline data
        run: cargo sqlx prepare --check --workspace
        env:
          DATABASE_URL: postgres://postgres:postgres@localhost:5432/postgres
```

### 7.1.2 Enable sqlx offline mode

Run locally, with `DATABASE_URL` pointing at a database that has the current schema:
```bash
cargo sqlx prepare --workspace
```

This creates a `.sqlx/` directory containing query metadata. Check it into git. The "Verify sqlx offline data" CI step in 7.1.1 fails the build when the metadata drifts out of sync with the queries in the code.

### 7.1.3 Fix the lint job

**File:** `.gitea/workflows/ci.yml`

The `|| true` suffix discards the exit code, so lint errors never fail the job. Remove it:

```yaml
# BEFORE
run: npm run lint || true

# AFTER
run: npm run lint
```

### 7.1.4 Pin Gitea Actions

Update all `@v3` to `@v4` throughout the file:
- `actions/checkout@v3` → `@v4`
- `actions/setup-node@v3` → `@v4`
- `actions/upload-artifact@v3` → `@v4`
- `codecov/codecov-action@v3` → `@v4`

### 7.1.5 Add Podman build job

```yaml
  podman-build:
    runs-on: ubuntu-latest
    needs: rust
    container:
      # The upstream Podman image is published on quay.io; pin a specific tag (see 7.2.3)
      image: quay.io/podman/stable:latest
    steps:
      - uses: actions/checkout@v4

      # --format docker: Podman's default OCI image format does not keep HEALTHCHECK
      - name: Build gateway-runtime
        run: podman build --format docker --target gateway-runtime -t git.madapes.com/madbase/gateway:ci .

      - name: Build worker-runtime
        run: podman build --format docker --target worker-runtime -t git.madapes.com/madbase/worker:ci .

      - name: Build control-runtime
        run: podman build --format docker --target control-runtime -t git.madapes.com/madbase/control:ci .

      - name: Build proxy-runtime
        run: podman build --format docker --target proxy-runtime -t git.madapes.com/madbase/proxy:ci .

      # Pass the password on stdin so it does not show up in the process list
      - name: Login to registry
        if: github.ref == 'refs/heads/main'
        run: echo "${{ secrets.REGISTRY_PASSWORD }}" | podman login git.madapes.com -u "${{ secrets.REGISTRY_USER }}" --password-stdin

      - name: Push images
        if: github.ref == 'refs/heads/main'
        run: |
          podman push git.madapes.com/madbase/gateway:ci
          podman push git.madapes.com/madbase/worker:ci
          podman push git.madapes.com/madbase/control:ci
          podman push git.madapes.com/madbase/proxy:ci
```

---

## 7.2 — Container Improvements (Podman)

### 7.2.1 Slim runtime images

**File:** `Dockerfile` — all runtime stages (compatible with Podman)

```dockerfile
# BEFORE
FROM rust:latest AS worker-runtime

# AFTER — shared base
FROM debian:bookworm-slim AS runtime-base
# curl is needed by the HEALTHCHECK in each runtime stage
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates libssl3 curl \
    && rm -rf /var/lib/apt/lists/*
RUN useradd -r -s /bin/false madbase

FROM runtime-base AS worker-runtime
WORKDIR /app
COPY --from=builder /app/target/release/worker .
USER madbase
EXPOSE 8002
# Podman keeps HEALTHCHECK only when the image is built with `--format docker`
HEALTHCHECK --interval=10s --timeout=3s CMD curl -f http://localhost:8002/health || exit 1
CMD ["./worker"]
```

### 7.2.2 Create .containerignore

```
.git
target
docs
*.md
env
scripts
_milestones
.gitea
control-plane-ui/node_modules
control-plane-ui/dist
```

> **Note:** Podman and Buildah read `.containerignore` first and fall back to `.dockerignore`. Docker reads only `.dockerignore`, so use that name instead if the images must also build with Docker.

### 7.2.3 Pin image tags

Replace all `:latest` tags:
- `cargo-chef:latest-rust-latest` → `cargo-chef:0.1.68-rust-1.77`
- `victoriametrics/victoria-metrics:latest` → `v1.101.0`
- `grafana/loki:latest` → `2.9.6`
- `grafana/grafana:latest` → `10.4.2`
- `victoriametrics/vmagent:latest` → `v1.101.0`

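
The pinning rule above can be checked mechanically. The following is a minimal sketch, not part of the repository: the helper names `is_pinned` and `unpinned_images` are hypothetical, and the parsing only covers simple `FROM <image> [AS <stage>]` and `image: <image>` lines. It treats a missing tag, or any tag containing a `latest` component (as in `latest-rust-latest`), as unpinned. It cannot detect other moving tags.

```rust
/// Returns true when the image reference has an explicit tag (or digest)
/// and no dash-separated part of the tag is `latest`.
fn is_pinned(image: &str) -> bool {
    // A digest reference (`name@sha256:...`) is always pinned.
    if image.contains('@') {
        return true;
    }
    // Only look after the last '/', so a registry port such as
    // `localhost:5000/app` is not mistaken for a tag.
    let name = image.rsplit('/').next().unwrap_or(image);
    match name.rsplit_once(':') {
        Some((_, tag)) => !tag.is_empty() && !tag.split('-').any(|part| part == "latest"),
        None => false, // no tag means an implicit `latest`
    }
}

/// Scans Dockerfile or compose text and returns the unpinned image references.
fn unpinned_images(text: &str) -> Vec<String> {
    let mut stages: Vec<String> = Vec::new();
    let mut unpinned = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        let Some(rest) = line
            .strip_prefix("FROM ")
            .or_else(|| line.strip_prefix("image:"))
        else {
            continue;
        };
        let mut words = rest.split_whitespace();
        let Some(image) = words.next() else { continue };
        // `FROM runtime-base` may refer to an earlier build stage, not an image.
        if !stages.iter().any(|s| s == image) && !is_pinned(image) {
            unpinned.push(image.to_string());
        }
        // Remember `AS <stage>` aliases for later FROM lines.
        if let (Some(keyword), Some(alias)) = (words.next(), words.next()) {
            if keyword.eq_ignore_ascii_case("as") {
                stages.push(alias.to_string());
            }
        }
    }
    unpinned
}
```

Run against the `Dockerfile` and `compose.yaml`, an empty result from `unpinned_images` would correspond to the "all tags pinned" checklist item.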
### 7.2.4 Update compose configuration for Podman Compose

**File:** `compose.yaml` (or `docker-compose.yaml`)

Ensure compatibility with Podman Compose:

```yaml
services:
  gateway:
    # Tag pushed by the podman-build job (7.1.5); avoid `:latest` (see 7.2.3)
    image: git.madapes.com/madbase/gateway:ci
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      - postgres
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 3s
      retries: 3

  # ... other services ...
```

Run with Podman Compose:
```bash
podman-compose up -d
```

---

## 7.3 — Observability

### 7.3.1 Create config files

See M1 for `config/prometheus.yml` and `config/vmagent.yml` content.

### 7.3.2 Request correlation IDs

**File:** `gateway/src/proxy.rs` — `proxy_request` function

```rust
use uuid::Uuid; // `Uuid::new_v4` requires the crate's `v4` feature

// Reuse the caller's request ID if the header is present and printable ASCII,
// otherwise generate a fresh one
let request_id = req.headers()
    .get("x-request-id")
    .and_then(|v| v.to_str().ok())
    .map(str::to_owned)
    .unwrap_or_else(|| Uuid::new_v4().to_string());

// Add to proxied request
request_builder = request_builder.header("x-request-id", &request_id);

// Add to response
response_builder = response_builder.header("x-request-id", &request_id);
```

Use `tracing::Span` with the request ID for log correlation:
```rust
let span = tracing::info_span!("request", id = %request_id);
// Attach the span to the request future with `tracing::Instrument::instrument`
// so that every event logged while handling the request carries the ID.
```
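The propagate-or-generate decision can be isolated into a pure function, which makes the behaviour easy to unit test without an HTTP stack. This is a minimal sketch under assumptions: `resolve_request_id`, `is_acceptable`, and the 128-byte limit are hypothetical and not taken from the codebase. It is also stricter than the snippet above, because it rejects empty or oversized values instead of forwarding them.

```rust
/// Hypothetical upper bound on a client-supplied request ID.
const MAX_REQUEST_ID_LEN: usize = 128;

/// A usable ID is non-empty, bounded in length, and printable ASCII without spaces.
fn is_acceptable(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= MAX_REQUEST_ID_LEN
        && id.bytes().all(|b| b.is_ascii_graphic())
}

/// Keeps the incoming ID when it is usable, otherwise calls `generate`.
/// The generator is injected so tests do not depend on random UUIDs.
fn resolve_request_id(incoming: Option<&str>, generate: impl FnOnce() -> String) -> String {
    match incoming.map(str::trim) {
        Some(id) if is_acceptable(id) => id.to_owned(),
        _ => generate(),
    }
}
```

In the proxy handler, `generate` would be `|| Uuid::new_v4().to_string()`, and `incoming` the result of reading the `x-request-id` header as a string.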

### 7.3.3 OpenTelemetry tracing

Add dependencies:
```toml
opentelemetry = "0.22"
opentelemetry_sdk = { version = "0.22", features = ["rt-tokio"] }
opentelemetry-otlp = "0.15"
tracing-opentelemetry = "0.23"
```

Initialize in `gateway/src/main.rs`:
```rust
use opentelemetry_otlp::WithExportConfig; // brings `with_endpoint` into scope

// Tracing export is opt-in: it is only set up when the endpoint variable is present
if let Ok(otlp_endpoint) = std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT") {
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(opentelemetry_otlp::new_exporter().tonic().with_endpoint(otlp_endpoint))
        .install_batch(opentelemetry_sdk::runtime::Tokio)?;

    let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);
    // Add `telemetry` as a layer on the existing subscriber registry
}
```

### 7.3.4 Alerting rules

Create `config/alerts.yml` for Grafana alerting or VictoriaMetrics vmalert:

```yaml
groups:
  - name: madbase
    rules:
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical

      # Share of requests answered with 5xx, per service, over the last 5 minutes
      - alert: HighErrorRate
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning

      # p99 latency in seconds, per service
      - alert: HighLatency
        expr: histogram_quantile(0.99, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: warning
```
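To make the latency and error-rate thresholds concrete, the sketch below reproduces the arithmetic in plain Rust: an error ratio, and a simplified version of the linear interpolation that `histogram_quantile` applies to cumulative `le` buckets. The `Bucket` type and both function names are hypothetical. With buckets of 80, 90, 98, and 100 observations at `le` = 0.5, 1, 2, and `+Inf`, the p95 estimate is 1.625 s, while p99 falls into the `+Inf` bucket and is reported as the highest finite bound, 2 s.

```rust
/// One cumulative histogram bucket: `count` observations took at most `le` seconds.
struct Bucket {
    le: f64,
    count: f64,
}

/// Fraction of requests that failed; 0.0 when there was no traffic.
fn error_ratio(errors: f64, total: f64) -> f64 {
    if total > 0.0 {
        errors / total
    } else {
        0.0
    }
}

/// Estimates the `q`-quantile (0.0..=1.0) from buckets sorted by ascending `le`,
/// where the last bucket is expected to be `+Inf`.
fn histogram_quantile(q: f64, buckets: &[Bucket]) -> Option<f64> {
    let total = buckets.last()?.count;
    if total <= 0.0 || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let rank = q * total;
    let (mut lower_bound, mut lower_count) = (0.0, 0.0);
    for bucket in buckets {
        if bucket.count >= rank {
            let in_bucket = bucket.count - lower_count;
            // The +Inf bucket has no upper bound to interpolate towards.
            if bucket.le.is_infinite() || in_bucket <= 0.0 {
                return Some(lower_bound);
            }
            // Assume observations are spread evenly inside the bucket.
            let fraction = (rank - lower_count) / in_bucket;
            return Some(lower_bound + (bucket.le - lower_bound) * fraction);
        }
        lower_bound = bucket.le;
        lower_count = bucket.count;
    }
    None
}
```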

---

## Completion Requirements

This milestone is **not complete** until every item below is satisfied.

### 1. Full Test Suite — All Green

- [ ] `cargo test --workspace` passes with **zero failures**
- [ ] `cargo fmt --all -- --check` passes (no formatting issues)
- [ ] `cargo clippy --workspace -- -D warnings` passes (no warnings)
- [ ] `cargo sqlx prepare --check --workspace` passes (offline query data is up to date)
- [ ] All **pre-existing tests** still pass (no regressions)
- [ ] **New tests** are written for CI/operability features:

| Test | Location | What it validates |
|------|----------|-------------------|
| `test_request_id_middleware` | `gateway/src/middleware.rs` | Request without `X-Request-Id` gets one generated; request with one keeps it |
| `test_request_id_propagated` | `gateway/src/proxy.rs` | `X-Request-Id` from proxy request appears in upstream headers |
| `test_health_endpoint_worker` | `gateway/src/bin/worker.rs` | `GET /health` returns 200 with JSON status |
| `test_health_endpoint_system` | `gateway/src/bin/system.rs` | `GET /health` returns 200 with JSON status |
| `test_health_endpoint_proxy` | `gateway/src/bin/proxy.rs` | `GET /health` returns 200 with JSON status |
| `test_podman_build_proxy` | `.gitea/workflows/ci.yml` | Podman build target `proxy-runtime` succeeds (CI job) |
| `test_podman_build_worker` | `.gitea/workflows/ci.yml` | Podman build target `worker-runtime` succeeds (CI job) |
| `test_podman_build_control` | `.gitea/workflows/ci.yml` | Podman build target `control-runtime` succeeds (CI job) |

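
The health-endpoint tests in the table check two things: a 200 status and a JSON body. The framework-free sketch below illustrates that contract only. The function names and the `{"status":"ok"}` body are assumptions, and the real services would register the route through whatever HTTP framework they already use.

```rust
/// Builds a minimal HTTP/1.1 response with a JSON body.
fn http_response(status: u16, reason: &str, body: &str) -> String {
    format!(
        "HTTP/1.1 {status} {reason}\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{body}",
        body.len()
    )
}

/// Routes a request line such as `GET /health HTTP/1.1`.
fn route(request_line: &str) -> String {
    let mut parts = request_line.split_whitespace();
    match (parts.next(), parts.next()) {
        // `curl -f` treats any status below 400 as success.
        (Some("GET"), Some("/health")) => http_response(200, "OK", r#"{"status":"ok"}"#),
        _ => http_response(404, "Not Found", r#"{"error":"not found"}"#),
    }
}
```

A test in this style would assert on the status line and on the body, which is what both the container `HEALTHCHECK` and the `test_health_endpoint_*` cases depend on.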
### 2. CI Pipeline Verification

- [ ] CI passes on a clean PR: `cargo fmt`, `cargo clippy`, `cargo build`, `cargo test` all green
- [ ] `cargo sqlx prepare --check --workspace` passes in CI
- [ ] Podman build succeeds for all 4 targets (gateway, worker, control, proxy)
- [ ] CI caches Rust build artifacts (via `actions/cache` as in 7.1.1, or `Swatinem/rust-cache`)
- [ ] CI runs in under 15 minutes for a clean build
- [ ] Images are successfully pushed to `git.madapes.com` on the main branch

### 3. Podman / Operability Verification

- [ ] Runtime images are under 200 MB each (down from ~1.5 GB)
- [ ] Containers run as non-root user (`USER madbase`)
- [ ] `podman inspect <image>` shows a `HEALTHCHECK` for each runtime image (requires building with `--format docker`)
- [ ] `.containerignore` exists and excludes `target/`, `.git/`, `env/`, `_milestones/`, `docs/`
- [ ] All container image tags are pinned (no `:latest` in Dockerfile)
- [ ] `podman-compose up -d` successfully starts all services
- [ ] Images can be pulled from `git.madapes.com` in production

### 4. Observability Verification

- [ ] `X-Request-Id` header appears in proxy responses
- [ ] Logs contain structured JSON with request IDs (verify via `podman logs proxy | jq .`)
- [ ] Prometheus/VictoriaMetrics scrapes metrics from all services
- [ ] Grafana dashboards show request rate, latency p50/p95/p99, error rate
- [ ] Alerting rules fire for: service down >1min, error rate >5%, p99 latency >2s

### 5. CI Gate

- [ ] The CI workflow itself is the gate: once this milestone is done, CI is the gatekeeper for all future milestones
- [ ] Tests from milestones M0 through M6 pass in the CI pipeline retroactively
- [ ] Gitea Actions workflows are properly configured with secrets for registry access