Vlad Durnea a66d908eff
chore: full stack stability and migration fixes, plus react UI progress
2026-03-18 09:01:38 +02:00

### /Users/vlad/Developer/madapes/madbase/_milestones/M7_cicd_operability.md
```markdown
1: # Milestone 7: CI/CD & Operability
2:
3: **Goal:** Every commit is validated. Deployments are reproducible and observable.
4:
5: **Infrastructure:**
6: - Container runtime: Podman
7: - Container orchestration: Podman Compose
8: - CI/CD platform: Gitea Actions (git.madapes.com)
9: - Container registry: git.madapes.com
10:
11: **Depends on:** M0 (Security), M1 (Foundation)
12:
13: ---
14:
15: ## 7.1 — Rust CI Pipeline
16:
17: ### 7.1.1 Add Rust jobs to CI
18:
19: **File:** `.gitea/workflows/ci.yml`
20:
21: Add a new job before the existing frontend jobs:
22:
23: ```yaml
24:   rust:
25:     runs-on: ubuntu-latest
26:     services:
27:       postgres:
28:         image: postgres:15
29:         env:
30:           POSTGRES_PASSWORD: postgres
31:         ports:
32:           - 5432:5432
33:         options: >-
34:           --health-cmd pg_isready
35:           --health-interval 10s
36:           --health-timeout 5s
37:           --health-retries 5
38:     steps:
39:       - uses: actions/checkout@v4
40:
41:       - name: Install Rust toolchain
42:         uses: dtolnay/rust-toolchain@stable
43:         with:
44:           components: rustfmt, clippy
45:
46:       - name: Cache cargo registry and build
47:         uses: actions/cache@v4
48:         with:
49:           path: |
50:             ~/.cargo/registry
51:             ~/.cargo/git
52:             target
53:           key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
54:
55:       - name: Check formatting
56:         run: cargo fmt --all --check
57:
58:       - name: Run clippy
59:         run: cargo clippy --workspace -- -D warnings
60:
61:       - name: Build workspace
62:         run: cargo build --workspace
63:
64:       - name: Run tests
65:         run: cargo test --workspace
66:         env:
67:           DATABASE_URL: postgres://postgres:postgres@localhost:5432/postgres
68:           JWT_SECRET: test-secret-for-ci-only-not-production
69:           DEFAULT_TENANT_DB_URL: postgres://postgres:postgres@localhost:5432/postgres
70:
71:       - name: Verify sqlx offline data
72:         run: cargo install sqlx-cli --locked && cargo sqlx prepare --check --workspace
73:         env:
74:           DATABASE_URL: postgres://postgres:postgres@localhost:5432/postgres
75: ```
76:
77: ### 7.1.2 Enable sqlx offline mode
78:
79: Run locally:
80: ```bash
81: cargo sqlx prepare --workspace
82: ```
83:
84: This creates a `.sqlx/` directory with query metadata. Check it into git; the CI step above verifies that it stays in sync.
85:
86: ### 7.1.3 Fix the lint job
87:
88: **File:** `.gitea/workflows/ci.yml`
89:
90: ```yaml
91: # BEFORE
92: run: npm run lint || true
93:
94: # AFTER
95: run: npm run lint
96: ```
97:
98: ### 7.1.4 Pin Gitea Actions
99:
100: Update all `@v3` to `@v4` throughout the file:
101: - `actions/checkout@v3` → `@v4`
102: - `actions/setup-node@v3` → `@v4`
103: - `actions/upload-artifact@v3` → `@v4`
104: - `codecov/codecov-action@v3` → `@v4`
105:
106: ### 7.1.5 Add Podman build job
107:
108: ```yaml
109:   podman-build:
110:     runs-on: ubuntu-latest
111:     needs: rust
112:     container:
113:       image: docker.io/podman/stable:latest
114:     steps:
115:       - uses: actions/checkout@v4
116:
117:       - name: Build gateway-runtime
118:         run: podman build --format docker --target gateway-runtime -t git.madapes.com/madbase/gateway:ci .
119:
120:       - name: Build worker-runtime
121:         run: podman build --format docker --target worker-runtime -t git.madapes.com/madbase/worker:ci .
122:
123:       - name: Build control-runtime
124:         run: podman build --format docker --target control-runtime -t git.madapes.com/madbase/control:ci .
125:
126:       - name: Build proxy-runtime
127:         run: podman build --format docker --target proxy-runtime -t git.madapes.com/madbase/proxy:ci .
128:
129:       - name: Login to registry
130:         if: github.ref == 'refs/heads/main'
131:         run: echo "${{ secrets.REGISTRY_PASSWORD }}" | podman login git.madapes.com -u "${{ secrets.REGISTRY_USER }}" --password-stdin
132:
133:       - name: Push images
134:         if: github.ref == 'refs/heads/main'
135:         run: |
136:           podman push git.madapes.com/madbase/gateway:ci
137:           podman push git.madapes.com/madbase/worker:ci
138:           podman push git.madapes.com/madbase/control:ci
139:           podman push git.madapes.com/madbase/proxy:ci
140: ```
141:
142: ---
143:
144: ## 7.2 — Container Improvements (Podman)
145:
146: ### 7.2.1 Slim runtime images
147:
148: **File:** `Dockerfile` — all runtime stages (compatible with Podman)
149:
150: ```dockerfile
151: # BEFORE
152: FROM rust:latest AS worker-runtime
153:
154: # AFTER — shared base
155: FROM debian:bookworm-slim AS runtime-base
156: RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl libssl3 && rm -rf /var/lib/apt/lists/*
157: RUN useradd -r -s /bin/false madbase
158:
159: FROM runtime-base AS worker-runtime
160: WORKDIR /app
161: COPY --from=builder /app/target/release/worker .
162: USER madbase
163: EXPOSE 8002
164: HEALTHCHECK --interval=10s --timeout=3s CMD curl -f http://localhost:8002/health || exit 1
165: CMD ["./worker"]
166: ```
167:
168: ### 7.2.2 Create .containerignore
169:
170: ```
171: .git
172: target
173: docs
174: *.md
175: env
176: scripts
177: _milestones
178: .gitea
179: control-plane-ui/node_modules
180: control-plane-ui/dist
181: ```
182:
183: > **Note:** Podman and Buildah read `.containerignore` first and fall back to `.dockerignore`. Docker reads only `.dockerignore`, so keep a copy under that name if the images must also build with Docker.
184:
185: ### 7.2.3 Pin image tags
186:
187: Replace all `:latest` tags:
188: - `cargo-chef:latest-rust-latest` → `cargo-chef:0.1.68-rust-1.77`
189: - `victoriametrics/victoria-metrics:latest` → `v1.101.0`
190: - `grafana/loki:latest` → `2.9.6`
191: - `grafana/grafana:latest` → `10.4.2`
192: - `victoriametrics/vmagent:latest` → `v1.101.0`
193:
194: ### 7.2.4 Update compose configuration for Podman Compose
195:
196: **File:** `compose.yaml` (or `docker-compose.yaml`)
197:
198: Ensure compatibility with Podman Compose:
199:
200: ```yaml
201: services:
202:   gateway:
203:     image: git.madapes.com/madbase/gateway:ci  # tag pushed by the podman-build job (7.1.5)
204:     ports:
205:       - "8000:8000"
206:     environment:
207:       - DATABASE_URL=${DATABASE_URL}
208:       - JWT_SECRET=${JWT_SECRET}
209:     depends_on:
210:       - postgres
211:     restart: unless-stopped
212:     healthcheck:
213:       test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
214:       interval: 10s
215:       timeout: 3s
216:       retries: 3
217:
218:   # ... other services ...
219: ```
220:
221: Run with Podman Compose:
222: ```bash
223: podman-compose up -d
224: ```
225:
226: ---
227:
228: ## 7.3 — Observability
229:
230: ### 7.3.1 Create config files
231:
232: See M1 for `config/prometheus.yml` and `config/vmagent.yml` content.
233:
234: ### 7.3.2 Request correlation IDs
235:
236: **File:** `gateway/src/proxy.rs` — `proxy_request` function
237:
238: ```rust
239: use uuid::Uuid;
240:
241: // Generate or propagate request ID
242: let request_id = req.headers()
243:     .get("x-request-id")
244:     .and_then(|v| v.to_str().ok())
245:     .map(|s| s.to_string())
246:     .unwrap_or_else(|| Uuid::new_v4().to_string());
247:
248: // Add to proxied request
249: request_builder = request_builder.header("x-request-id", &request_id);
250:
251: // Add to response
252: response_builder = response_builder.header("x-request-id", &request_id);
253: ```
254:
255: Use `tracing::Span` with the request ID for log correlation:
256: ```rust
257: let span = tracing::info_span!("request", id = %request_id);
258: ```
259:
260: ### 7.3.3 OpenTelemetry tracing
261:
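262: Add dependencies (the initialization code below also uses `opentelemetry_sdk`, which needs to be added at `0.22` with the `rt-tokio` feature):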
263: ```toml
264: opentelemetry = "0.22"
265: opentelemetry-otlp = "0.15"
266: tracing-opentelemetry = "0.23"
267: ```
268:
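269: Initialize in `gateway/src/main.rs` (`with_endpoint` comes from the `opentelemetry_otlp::WithExportConfig` trait, which must be in scope):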
270: ```rust
271: if let Ok(otlp_endpoint) = std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT") {
272:     let tracer = opentelemetry_otlp::new_pipeline()
273:         .tracing()
274:         .with_exporter(opentelemetry_otlp::new_exporter().tonic().with_endpoint(otlp_endpoint))
275:         .install_batch(opentelemetry_sdk::runtime::Tokio)?;
276:
277:     let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);
278:     // Add to the subscriber registry
279: }
280: ```
281:
282: ### 7.3.4 Alerting rules
283:
284: Create `config/alerts.yml` for Grafana alerting or VictoriaMetrics vmalert:
285:
286: ```yaml
287: groups:
288:   - name: madbase
289:     rules:
290:       - alert: ServiceDown
291:         expr: up == 0
292:         for: 1m
293:         labels:
294:           severity: critical
295:
296:       - alert: HighErrorRate
297:         expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
298:         for: 5m
299:         labels:
300:           severity: warning
301:
302:       - alert: HighLatency
303:         expr: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
304:         for: 5m
305:         labels:
306:           severity: warning
307: ```
308:
309: ---
310:
311: ## Completion Requirements
312:
313: This milestone is **not complete** until every item below is satisfied.
314:
315: ### 1. Full Test Suite — All Green
316:
317: - [ ] `cargo test --workspace` passes with **zero failures**
318: - [ ] `cargo fmt --all -- --check` passes (no formatting issues)
319: - [ ] `cargo clippy --workspace -- -D warnings` passes (no warnings)
320: - [ ] `cargo sqlx prepare --check --workspace` passes (offline query data is up to date)
321: - [ ] All **pre-existing tests** still pass (no regressions)
322: - [ ] **New tests** are written for CI/operability features:
323:
324: | Test | Location | What it validates |
325: |------|----------|-------------------|
326: | `test_request_id_middleware` | `gateway/src/middleware.rs` | Request without `X-Request-Id` gets one generated; request with one keeps it |
327: | `test_request_id_propagated` | `gateway/src/proxy.rs` | `X-Request-Id` from proxy request appears in upstream headers |
328: | `test_health_endpoint_worker` | `gateway/src/bin/worker.rs` | `GET /health` returns 200 with JSON status |
329: | `test_health_endpoint_system` | `gateway/src/bin/system.rs` | `GET /health` returns 200 with JSON status |
330: | `test_health_endpoint_proxy` | `gateway/src/bin/proxy.rs` | `GET /health` returns 200 with JSON status |
331: | `test_podman_build_proxy` | `.gitea/workflows/ci.yml` | Podman build target `proxy-runtime` succeeds (CI job) |
332: | `test_podman_build_worker` | `.gitea/workflows/ci.yml` | Podman build target `worker-runtime` succeeds (CI job) |
333: | `test_podman_build_control` | `.gitea/workflows/ci.yml` | Podman build target `control-runtime` succeeds (CI job) |
334:
335: ### 2. CI Pipeline Verification
336:
337: - [ ] CI passes on a clean PR: `cargo fmt`, `cargo clippy`, `cargo build`, `cargo test` all green
338: - [ ] `cargo sqlx prepare --check` passes in CI
339: - [ ] Podman build succeeds for all 4 targets (gateway, worker, control, proxy)
340: - [ ] CI caches Rust build artifacts (via `actions/cache` as in 7.1.1, or alternatively `Swatinem/rust-cache` or `actions-rust-lang/setup-rust-toolchain`)
341: - [ ] CI runs in under 15 minutes for a clean build
342: - [ ] Images are successfully pushed to `git.madapes.com` on main branch
343:
344: ### 3. Podman / Operability Verification
345:
346: - [ ] Runtime images are under 200MB each (down from ~1.5GB)
347: - [ ] Containers run as non-root user (`USER madbase`)
348: - [ ] `podman inspect <image>` shows a `HEALTHCHECK` for each runtime image (build with `podman build --format docker`; Podman's default OCI format ignores `HEALTHCHECK`)
349: - [ ] `.containerignore` exists and excludes `target/`, `.git/`, `env/`, `_milestones/`, `docs/`
350: - [ ] All container image tags are pinned (no `:latest` in Dockerfile)
351: - [ ] `podman-compose up -d` successfully starts all services
352: - [ ] Images can be pulled from `git.madapes.com` in production
353:
354: ### 4. Observability Verification
355:
356: - [ ] `X-Request-Id` header appears in proxy responses
357: - [ ] Logs contain structured JSON with request IDs (verify via `podman logs proxy | jq .`)
358: - [ ] Prometheus/VictoriaMetrics scrapes metrics from all services
359: - [ ] Grafana dashboards show request rate, latency p50/p95/p99, error rate
360: - [ ] Alerting rules fire for: service down >1min, error rate >5%, p99 latency >2s
361:
362: ### 5. CI Gate
363:
364: - [ ] The CI workflow itself is the gate — this milestone's success means CI is the gatekeeper for all future milestones
365: - [ ] All tests from milestones M0 through M6 pass in the CI pipeline retroactively
366: - [ ] Gitea Actions workflows are properly configured with secrets for registry access
```
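
Section 7.3.2 describes propagating an inbound `x-request-id` or generating a new one. The decision can be isolated into a small function that is easy to unit test, in the spirit of `test_request_id_middleware`. The following is a minimal sketch using only the standard library. The names `resolve_request_id`, `fallback_request_id`, and `MAX_REQUEST_ID_LEN`, the length cap, and the rejection of non-printable values are assumptions made for illustration and do not come from the milestone. The gateway itself would generate IDs with `Uuid::new_v4()`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Header name used for correlation, as in section 7.3.2.
const REQUEST_ID_HEADER: &str = "x-request-id";

/// Upper bound on accepted inbound IDs (hypothetical limit, not from the milestone).
const MAX_REQUEST_ID_LEN: usize = 128;

/// Returns the inbound request ID if it is present and acceptable,
/// otherwise calls `generate` to mint a new one.
pub fn resolve_request_id<F>(headers: &[(String, String)], generate: F) -> String
where
    F: FnOnce() -> String,
{
    headers
        .iter()
        // HTTP header names are case-insensitive.
        .find(|(name, _)| name.eq_ignore_ascii_case(REQUEST_ID_HEADER))
        .map(|(_, value)| value.trim())
        .filter(|value| is_acceptable(value))
        .map(str::to_owned)
        .unwrap_or_else(generate)
}

/// Assumed policy: non-empty, bounded length, printable ASCII without spaces.
fn is_acceptable(value: &str) -> bool {
    !value.is_empty()
        && value.len() <= MAX_REQUEST_ID_LEN
        && value.bytes().all(|b| b.is_ascii_graphic())
}

/// Stand-in generator for this sketch only (timestamp plus process-local counter).
/// The gateway would call `Uuid::new_v4().to_string()` instead.
pub fn fallback_request_id() -> String {
    static COUNTER: AtomicU64 = AtomicU64::new(0);
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("{nanos:x}-{seq:x}")
}
```

Passing the generator as a closure keeps the function deterministic in tests. A test can supply a fixed string and assert that it is returned only when the inbound header is missing or rejected.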
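
The latency alert in section 7.3.4 and the p50/p95/p99 dashboards in the completion checklist both rely on `histogram_quantile` over `http_request_duration_seconds_bucket`. The sketch below shows roughly how such a quantile is estimated from cumulative buckets by linear interpolation, which is why the result depends on where the bucket boundaries sit. It is a simplified illustration, not the Prometheus or VictoriaMetrics implementation. The `Bucket` type and the sample bucket layout in the usage note are assumptions.

```rust
/// One cumulative histogram bucket: observations with duration <= `le`.
#[derive(Debug, Clone, Copy)]
pub struct Bucket {
    /// Upper bound in seconds; `f64::INFINITY` for the `+Inf` bucket.
    pub le: f64,
    /// Cumulative count (or per-second rate) of observations up to `le`.
    pub count: f64,
}

/// Estimates the `q`-quantile (0.0..=1.0) from buckets sorted by `le`,
/// where the last bucket must be `+Inf`. Returns `None` for unusable input.
pub fn histogram_quantile(q: f64, buckets: &[Bucket]) -> Option<f64> {
    if !(0.0..=1.0).contains(&q) || buckets.len() < 2 {
        return None;
    }
    let last = buckets.last()?;
    if !last.le.is_infinite() || last.count <= 0.0 {
        return None;
    }

    // Rank of the requested observation within the total count.
    let rank = q * last.count;
    // First bucket whose cumulative count reaches that rank.
    let idx = buckets.iter().position(|b| b.count >= rank)?;

    if idx == buckets.len() - 1 {
        // The rank falls into +Inf: report the highest finite bound.
        return Some(buckets[idx - 1].le);
    }

    // The lowest bucket is assumed to start at zero.
    let (lower_bound, lower_count) = if idx == 0 {
        (0.0, 0.0)
    } else {
        (buckets[idx - 1].le, buckets[idx - 1].count)
    };
    let upper = buckets[idx];
    let in_bucket = upper.count - lower_count;
    if in_bucket <= 0.0 {
        return Some(upper.le);
    }

    // Linear interpolation inside the selected bucket.
    Some(lower_bound + (upper.le - lower_bound) * (rank - lower_count) / in_bucket)
}
```

With hypothetical buckets at 0.5 s, 1 s, 2 s, 5 s, and `+Inf` holding cumulative counts of 50, 80, 95, 99, and 100, the estimate is about 2 s for `q = 0.95` and about 5 s for `q = 0.99`. The same traffic can therefore sit at a 2 s threshold on p95 while clearly exceeding it on p99, so the quantile used in the alert rule should match the one stated in the checklist.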
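
The checklist asks for an alert when the error rate exceeds 5%, which is a ratio of failed to total requests rather than an absolute number of 5xx responses per second. The following sketch works through that arithmetic on two samples of the `http_requests_total` counter. `CounterWindow` and `error_ratio` are hypothetical names, and the calculation deliberately ignores the extrapolation and counter-reset handling that PromQL `rate()` performs.

```rust
/// Two samples of a monotonically increasing counter, `window_secs` apart.
#[derive(Debug, Clone, Copy)]
pub struct CounterWindow {
    pub start: f64,
    pub end: f64,
    pub window_secs: f64,
}

impl CounterWindow {
    /// Average per-second increase over the window (simplified `rate()`).
    pub fn rate(&self) -> f64 {
        if self.window_secs <= 0.0 {
            return 0.0;
        }
        // A negative delta would indicate a counter reset; treat it as zero here.
        (self.end - self.start).max(0.0) / self.window_secs
    }
}

/// Fraction of requests that returned 5xx, or `None` when there was no traffic.
pub fn error_ratio(errors: CounterWindow, total: CounterWindow) -> Option<f64> {
    let total_rate = total.rate();
    if total_rate <= 0.0 {
        None
    } else {
        Some(errors.rate() / total_rate)
    }
}
```

For example, 30 errors out of 3000 requests in a 5-minute window give an absolute rate of 0.1 errors per second but a ratio of only 1%, well under a 5% threshold. A rule written against the absolute rate and a rule written against the ratio would therefore fire under quite different conditions.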