# S3-Compatible Object Storage Plan (Hetzner in Prod, MinIO Locally)

## Goals

- Add S3-compatible object storage as an optional infrastructure dependency.
- Use Hetzner Object Storage in production (S3 API compatible).
- Use MinIO for local development to mirror production behavior.
- Start by moving observability storage (Loki + Tempo) to object storage, keeping local filesystem as the default fallback.

## Scope (Phase 1)

### Observability (Primary)

- Loki: store chunks/index in S3-compatible object storage.
- Tempo: store traces in S3-compatible object storage.

### Local Dev Parity

- Add MinIO to local compose and provide a documented way to provision required buckets.

## Non-Goals (Phase 1)

- Replacing MDBX/KV primary service storage with S3.
- Implementing multi-region replication, object-lock governance, or WORM retention.
- Centralized artifact storage for deployments (can be a follow-on).

## Target Architecture

- Local:
  - `docker compose up` uses filesystem/local volumes by default.
  - `docker compose -f docker-compose.yml -f docker-compose.s3.yml -f observability/docker-compose.yml -f observability/docker-compose.s3.yml up` enables MinIO-backed Loki/Tempo.
- Production:
  - Loki + Tempo configured for S3 with Hetzner endpoint.
  - Credentials injected via Swarm secrets or environment injection (never committed).

## Configuration Model

Define a single configuration surface for “S3-compatible storage” and reuse it across Loki/Tempo and future features.
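
As a rough illustration of what a single configuration surface could look like in code, the sketch below loads the shared `S3_*` variables (the names listed under Common Settings) into one typed object per consumer. The `S3Settings` class, the `load_s3_settings` helper, and the `false` defaults for the two boolean flags are assumptions made for this example, not part of the plan.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class S3Settings:
    """Hypothetical container for the shared S3 settings of one consumer."""
    endpoint: str
    region: str
    access_key_id: str
    secret_access_key: str
    force_path_style: bool
    insecure: bool
    bucket: str
    prefix: str


def load_s3_settings(consumer: str, env: Mapping[str, str] = os.environ) -> S3Settings:
    """Build settings for one consumer ("loki" or "tempo") from the S3_* variables."""
    name = consumer.upper()
    required = [
        "S3_ENDPOINT",
        "S3_REGION",
        "S3_ACCESS_KEY_ID",
        "S3_SECRET_ACCESS_KEY",
        f"S3_BUCKET_{name}",
    ]
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError(f"missing S3 settings: {', '.join(missing)}")
    return S3Settings(
        endpoint=env["S3_ENDPOINT"],
        region=env["S3_REGION"],
        access_key_id=env["S3_ACCESS_KEY_ID"],
        secret_access_key=env["S3_SECRET_ACCESS_KEY"],
        # Assumed defaults: virtual-hosted style and TLS unless overridden.
        force_path_style=_parse_bool(env.get("S3_FORCE_PATH_STYLE", "false")),
        insecure=_parse_bool(env.get("S3_INSECURE", "false")),
        bucket=env[f"S3_BUCKET_{name}"],
        prefix=env.get(f"S3_PREFIX_{name}", ""),
    )
```

Calling `load_s3_settings("loki")` and `load_s3_settings("tempo")` would then yield two objects that share endpoint and credentials and differ only in bucket and prefix, which is the reuse the model aims for.
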
### Common Settings

- Endpoint: `S3_ENDPOINT` (e.g., `https://.your-objectstorage.com`)
- Region: `S3_REGION` (string; Hetzner typically requires a region value)
- Access key: `S3_ACCESS_KEY_ID` (secret)
- Secret key: `S3_SECRET_ACCESS_KEY` (secret)
- Force path-style: `S3_FORCE_PATH_STYLE` (`true`/`false`, depends on provider)
- TLS: enabled by default; allow `S3_INSECURE=true` only for local MinIO if needed
- Prefixes:
  - `S3_PREFIX_LOKI` (e.g., `loki/`)
  - `S3_PREFIX_TEMPO` (e.g., `tempo/`)

### Buckets

- `S3_BUCKET_LOKI`
- `S3_BUCKET_TEMPO`

## Local Dev: MinIO

### Compose Additions

- Add a `minio` service (console + API ports).
- Add a `minio-init` one-shot job (or `mc` container) to create buckets:
  - `cloudlysis-loki`
  - `cloudlysis-tempo`

### Developer Workflow

- Default (no S3):
  - `docker compose -f docker-compose.yml -f observability/docker-compose.yml up -d --build`
- S3-enabled (MinIO):
  - bring up MinIO + observability S3 overrides
  - verify Loki/Tempo can write objects (logs/traces show up and buckets have objects)

## Production: Hetzner Object Storage

### Provisioning

- Create buckets for Loki and Tempo (or a shared bucket with distinct prefixes).
- Enable bucket-level lifecycle policies:
  - Loki: retention aligned with schema/index period and desired log retention.
  - Tempo: retention aligned with `compactor.compaction.block_retention` and operational needs.

### Secrets

- Store `S3_ACCESS_KEY_ID` / `S3_SECRET_ACCESS_KEY` as Swarm secrets.
- Inject into Loki/Tempo containers as environment variables at runtime.

### Operational Considerations

- Timeouts and retries: rely on Loki/Tempo defaults; tune only after measuring.
- Cost controls: lifecycle rules and retention budgets.
- Failure mode: if S3 is unavailable, Loki/Tempo ingest may degrade; decide whether to fail closed (strict) or allow temporary local buffering.
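
To make the lifecycle rules mentioned under Provisioning and cost controls more concrete, here is a minimal sketch that builds a per-prefix expiration document in the shape used by the S3 `PutBucketLifecycleConfiguration` request. The helper names and the retention values are placeholders invented for this example; whether the provider accepts this exact document, and which retention periods are appropriate, would need to be checked against the actual Loki/Tempo settings.

```python
import json
from typing import Any, Dict


def expiration_rule(rule_id: str, prefix: str, days: int) -> Dict[str, Any]:
    """Build one rule that expires objects under `prefix` after `days` days."""
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def lifecycle_configuration(retention_days: Dict[str, int]) -> Dict[str, Any]:
    """Turn a prefix -> retention-days mapping into a lifecycle document."""
    rules = [
        expiration_rule(f"expire-{prefix.strip('/')}", prefix, days)
        for prefix, days in sorted(retention_days.items())
    ]
    return {"Rules": rules}


if __name__ == "__main__":
    # Placeholder retention values, not recommendations.
    document = lifecycle_configuration({"loki/": 30, "tempo/": 14})
    print(json.dumps(document, indent=2))
```

Keeping the bucket expiration somewhat longer than the retention configured inside Loki and Tempo is probably the safer choice, so that the lifecycle rule never removes objects the services still expect to read.
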
## Implementation Plan (Milestones)

### Milestone A: Local MinIO Baseline

- Add `docker-compose.s3.yml`:
  - MinIO
  - minio-init bucket provisioning
- Add `docs/` (or wiki) instructions for enabling S3 mode locally.
- Add a gated smoke test script (manual) verifying buckets exist and can be listed.

### Milestone B: Loki S3 Backend

- Add `observability/docker-compose.s3.yml` enabling Loki S3 config.
- Add `observability/loki/config.s3.yml` (separate from default filesystem config).
- Validate:
  - logs are queryable in Grafana
  - Loki writes objects to the bucket/prefix

### Milestone C: Tempo S3 Backend

- Extend `observability/docker-compose.s3.yml` to enable Tempo S3 config.
- Add `observability/tempo/config.s3.yml` (separate from default local config).
- Validate:
  - traces appear in Tempo
  - objects are written to the bucket/prefix

### Milestone D: Production Rollout

- Add Swarm stack overlays or configs for Loki/Tempo S3 mode:
  - S3 endpoint + region + credentials as secrets
  - bucket/prefix configuration
- Provide a rollback plan:
  - switch back to filesystem/local (requires persistent volumes for continuity)

## Testing and Verification

- Workspace:
  - `cargo fmt --check`
  - `cargo clippy --workspace --all-targets -- -D warnings`
  - `cargo test --workspace`
  - `cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build`
- Local S3 validation (manual, documented):
  - MinIO buckets created
  - Loki bucket contains objects after ingest
  - Tempo bucket contains objects after ingest

## Follow-On Opportunities (Phase 2)

- Backup/restore to S3 for MDBX data directories (Aggregate/Projection/Runner/Gateway).
- Artifact storage (projection programs, definitions, deployment bundles) via S3 with signed URLs.
- Multi-tenant isolation at the bucket/prefix policy level.
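
The manual local S3 validation listed under Testing and Verification could be scripted along the following lines. This is only a sketch: the `S3_SMOKE_TEST` gate variable and the `check_buckets` and `main` helpers are hypothetical, and `count_objects` stands in for whatever S3 client call the project ends up using to count objects in a bucket.

```python
import os
from typing import Callable, Dict, Iterable


def check_buckets(
    buckets: Iterable[str],
    count_objects: Callable[[str], int],
) -> Dict[str, str]:
    """Return a status per bucket: "ok", "empty", or "error: <reason>"."""
    report: Dict[str, str] = {}
    for bucket in buckets:
        try:
            count = count_objects(bucket)
        except Exception as exc:
            # A missing bucket or rejected credentials surfaces here.
            report[bucket] = f"error: {exc}"
            continue
        report[bucket] = "ok" if count > 0 else "empty"
    return report


def main(count_objects: Callable[[str], int]) -> int:
    """Run the check only when explicitly enabled; return a process exit code."""
    # Hypothetical gate so the check never runs by accident in the default workflow.
    if os.environ.get("S3_SMOKE_TEST") != "1":
        print("S3 smoke test skipped (set S3_SMOKE_TEST=1 to run)")
        return 0
    buckets = [
        os.environ.get("S3_BUCKET_LOKI", "cloudlysis-loki"),
        os.environ.get("S3_BUCKET_TEMPO", "cloudlysis-tempo"),
    ]
    report = check_buckets(buckets, count_objects)
    for bucket, status in report.items():
        print(f"{bucket}: {status}")
    return 0 if all(status == "ok" for status in report.values()) else 1
```

Passing the listing function in as an argument keeps the check independent of any particular client library, and an "empty" status distinguishes a bucket that exists but has received no Loki or Tempo writes from one that cannot be reached at all.
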