S3-Compatible Object Storage Plan (Hetzner in Prod, MinIO Locally)
Goals
- Add S3-compatible object storage as an optional infrastructure dependency.
- Use Hetzner Object Storage in production (S3 API compatible).
- Use MinIO for local development to mirror production behavior.
- Start by moving observability storage (Loki + Tempo) to object storage, keeping local filesystem as the default fallback.
Scope (Phase 1)
Observability (Primary)
- Loki: store chunks/index in S3-compatible object storage.
- Tempo: store traces in S3-compatible object storage.
Local Dev Parity
- Add MinIO to local compose and provide a documented way to provision required buckets.
Non-Goals (Phase 1)
- Replacing MDBX/KV primary service storage with S3.
- Implementing multi-region replication, object-lock governance, or WORM retention.
- Centralized artifact storage for deployments (can be a follow-on).
Target Architecture
- Local:
- docker compose up uses filesystem/local volumes by default.
- docker compose -f docker-compose.yml -f docker-compose.s3.yml -f observability/docker-compose.yml -f observability/docker-compose.s3.yml up enables MinIO-backed Loki/Tempo.
- Production:
- Loki + Tempo configured for S3 with Hetzner endpoint.
- Credentials injected via Swarm secrets or environment injection (never committed).
Configuration Model
Define a single configuration surface for “S3-compatible storage” and reuse it across Loki/Tempo and future features.
Common Settings
- Endpoint: S3_ENDPOINT (e.g., https://<region>.your-objectstorage.com)
- Region: S3_REGION (string; Hetzner typically requires a region value)
- Access key: S3_ACCESS_KEY_ID (secret)
- Secret key: S3_SECRET_ACCESS_KEY (secret)
- Force path-style: S3_FORCE_PATH_STYLE (true/false, depends on provider)
- TLS: enabled by default; allow S3_INSECURE=true only for local MinIO if needed
- Prefixes:
- S3_PREFIX_LOKI (e.g., loki/)
- S3_PREFIX_TEMPO (e.g., tempo/)
Buckets
- S3_BUCKET_LOKI
- S3_BUCKET_TEMPO
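As an illustration of the single configuration surface, the settings above could be loaded and validated in one place. This is a minimal sketch, not an existing module: the names S3Settings and load_s3_settings are hypothetical, the default prefixes reuse the examples given above, and defaulting S3_FORCE_PATH_STYLE to false is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = (
    "S3_ENDPOINT",
    "S3_REGION",
    "S3_ACCESS_KEY_ID",
    "S3_SECRET_ACCESS_KEY",
    "S3_BUCKET_LOKI",
    "S3_BUCKET_TEMPO",
)


def _parse_bool(raw: Optional[str], default: bool) -> bool:
    """Interpret common truthy spellings; fall back to the default when unset."""
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class S3Settings:
    endpoint: str
    region: str
    access_key_id: str
    secret_access_key: str
    bucket_loki: str
    bucket_tempo: str
    prefix_loki: str
    prefix_tempo: str
    force_path_style: bool
    insecure: bool


def load_s3_settings(env: Mapping[str, str] = os.environ) -> S3Settings:
    """Build settings from environment variables, failing fast on missing ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing S3 settings: " + ", ".join(missing))
    return S3Settings(
        endpoint=env["S3_ENDPOINT"],
        region=env["S3_REGION"],
        access_key_id=env["S3_ACCESS_KEY_ID"],
        secret_access_key=env["S3_SECRET_ACCESS_KEY"],
        bucket_loki=env["S3_BUCKET_LOKI"],
        bucket_tempo=env["S3_BUCKET_TEMPO"],
        # Assumed defaults, taken from the examples in the settings list.
        prefix_loki=env.get("S3_PREFIX_LOKI", "loki/"),
        prefix_tempo=env.get("S3_PREFIX_TEMPO", "tempo/"),
        # Assumption: path-style off unless requested; TLS on by default.
        force_path_style=_parse_bool(env.get("S3_FORCE_PATH_STYLE"), False),
        insecure=_parse_bool(env.get("S3_INSECURE"), False),
    )
```

Failing on the full list of missing variables, rather than the first one, tends to shorten the setup loop when enabling S3 mode for the first time.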
Local Dev: MinIO
Compose Additions
- Add a minio service (console + API ports).
- Add a minio-init one-shot job (or mc container) to create buckets:
- cloudlysis-loki
- cloudlysis-tempo
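The bucket provisioning step could also be expressed as a small idempotent routine. The sketch below assumes a client object shaped like a boto3 S3 client (it uses only list_buckets and create_bucket); the function name ensure_buckets is hypothetical, and whether the one-shot job uses Python or the mc CLI is left open by the plan.

```python
from typing import Iterable, List


def ensure_buckets(client, names: Iterable[str]) -> List[str]:
    """Create each bucket in `names` that is absent; return the names created.

    `client` is assumed to behave like a boto3 S3 client.
    """
    existing = {b["Name"] for b in client.list_buckets().get("Buckets", [])}
    created = []
    for name in names:
        if name not in existing:
            client.create_bucket(Bucket=name)
            created.append(name)
    return created


# Possible usage against local MinIO (endpoint and credentials are placeholders):
#   import boto3
#   client = boto3.client(
#       "s3",
#       endpoint_url="http://localhost:9000",
#       aws_access_key_id="...",
#       aws_secret_access_key="...",
#   )
#   ensure_buckets(client, ["cloudlysis-loki", "cloudlysis-tempo"])
```

Because existing buckets are skipped, the job can be rerun on every compose start without failing.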
Developer Workflow
- Default (no S3):
docker compose -f docker-compose.yml -f observability/docker-compose.yml up -d --build
- S3-enabled (MinIO):
- bring up MinIO + observability S3 overrides
- verify Loki/Tempo can write objects (logs/traces show up and buckets have objects)
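The two workflows differ only in which override files are layered on. A minimal sketch of a helper that assembles the command follows; the name compose_up_command is hypothetical, and appending -d --build in S3 mode is an assumption carried over from the default command.

```python
from typing import List

# Each base file is paired with its S3 override, in the order used in this plan.
STACKS = [
    ("docker-compose.yml", "docker-compose.s3.yml"),
    ("observability/docker-compose.yml", "observability/docker-compose.s3.yml"),
]


def compose_up_command(s3_enabled: bool) -> List[str]:
    """Return the docker compose argv for the default or the S3-enabled stack."""
    cmd = ["docker", "compose"]
    for base, s3_override in STACKS:
        cmd += ["-f", base]
        if s3_enabled:
            cmd += ["-f", s3_override]
    return cmd + ["up", "-d", "--build"]


if __name__ == "__main__":
    print(" ".join(compose_up_command(s3_enabled=True)))
```

Keeping each override next to its base file preserves the layering order, which matters because later -f files override earlier ones.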
Production: Hetzner Object Storage
Provisioning
- Create buckets for Loki and Tempo (or a shared bucket with distinct prefixes).
- Enable bucket-level lifecycle policies:
- Loki: retention aligned with schema/index period and desired log retention.
- Tempo: retention aligned with compactor.block_retention and operational needs.
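A lifecycle policy of this kind is typically an expiration rule scoped to a prefix. The sketch below builds the rule document in the shape the S3 API expects; the helper name expiration_lifecycle and the day counts in the usage comment are illustrative assumptions, and which lifecycle fields Hetzner Object Storage honors should be confirmed before relying on it.

```python
from typing import Any, Dict


def expiration_lifecycle(rule_id: str, prefix: str, days: int) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that expires objects under `prefix`."""
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Possible usage with a boto3 S3 client (retention values are placeholders):
#   client.put_bucket_lifecycle_configuration(
#       Bucket="cloudlysis-tempo",
#       LifecycleConfiguration=expiration_lifecycle("tempo-expiry", "tempo/", 14),
#   )
```

Setting the bucket expiration somewhat longer than the Loki/Tempo retention avoids the bucket deleting objects that the application still expects to find.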
Secrets
- Store S3_ACCESS_KEY_ID/S3_SECRET_ACCESS_KEY as Swarm secrets.
- Inject into Loki/Tempo containers as environment variables at runtime.
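Swarm mounts secrets as files under /run/secrets by default, so turning them into environment variables needs a small resolution step at container start. A minimal sketch, assuming the secrets are named after the lowercase variable names (for example s3_access_key_id); read_secret is a hypothetical helper, and the environment fallback is an assumption meant for local runs without Swarm.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_secret(
    name: str,
    secrets_dir: str = "/run/secrets",
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the secret from its mounted file, else from the matching env var.

    The env var is assumed to be the upper-cased secret name.
    """
    path = Path(secrets_dir) / name
    if path.is_file():
        # Trailing newlines are common in secret files and not part of the value.
        return path.read_text(encoding="utf-8").strip()
    return env.get(name.upper())


# Possible usage in a wrapper entrypoint before starting Loki or Tempo:
#   os.environ["S3_ACCESS_KEY_ID"] = read_secret("s3_access_key_id") or ""
```

Preferring the file over the environment keeps the production path explicit while still letting local MinIO runs pass credentials through plain variables.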
Operational Considerations
- Timeouts and retries: rely on Loki/Tempo defaults; tune only after measuring.
- Cost controls: lifecycle rules and retention budgets.
- Failure mode: if S3 is unavailable, Loki/Tempo ingest may degrade; decide whether to fail closed (strict) or allow temporary local buffering.
Implementation Plan (Milestones)
Milestone A: Local MinIO Baseline
- Add docker-compose.s3.yml:
- MinIO
- minio-init bucket provisioning
- Add docs/ (or wiki) instructions for enabling S3 mode locally.
- Add a gated smoke test script (manual) verifying buckets exist and can be listed.
Milestone B: Loki S3 Backend
- Add observability/docker-compose.s3.yml enabling Loki S3 config.
- Add observability/loki/config.s3.yml (separate from default filesystem config).
- Validate:
- logs are queryable in Grafana
- Loki writes objects to the bucket/prefix
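To make the shape of config.s3.yml concrete, the storage fragment could be rendered from the common settings. This is a sketch under assumptions: the key names follow the storage_config.aws block as commonly documented for Loki and should be checked against the deployed version, the endpoint is passed through unchanged, and loki_s3_storage is a hypothetical helper. The schema_config entry that selects S3 as the object store is not shown.

```python
import json
from typing import Any, Dict


def loki_s3_storage(
    bucket: str,
    endpoint: str,
    region: str,
    access_key_id: str,
    secret_access_key: str,
    force_path_style: bool = False,
    insecure: bool = False,
) -> Dict[str, Any]:
    """Return a storage_config fragment pointing Loki at an S3-compatible bucket."""
    return {
        "storage_config": {
            "aws": {
                "endpoint": endpoint,
                "region": region,
                "bucketnames": bucket,
                "access_key_id": access_key_id,
                "secret_access_key": secret_access_key,
                "s3forcepathstyle": force_path_style,
                "insecure": insecure,
            }
        }
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be pasted into a config for review.
    fragment = loki_s3_storage(
        "cloudlysis-loki", "minio:9000", "us-east-1", "KEY", "SECRET",
        force_path_style=True, insecure=True,
    )
    print(json.dumps(fragment, indent=2))
```

A rendered file containing real credentials should stay out of version control, in line with the secrets handling described for production.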
Milestone C: Tempo S3 Backend
- Add observability/docker-compose.s3.yml enabling Tempo S3 config.
- Add observability/tempo/config.s3.yml (separate from default local config).
- Validate:
- traces appear in Tempo
- objects are written to the bucket/prefix
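The Tempo side can be sketched the same way. The key names below follow the storage.trace.s3 block as commonly documented for Tempo and should be verified against the deployed version. Two things here are assumptions rather than part of the plan: that Tempo wants the endpoint as host[:port] without a scheme, and that the insecure flag may be derived from an http:// scheme instead of a separate S3_INSECURE value. Both helper names are hypothetical.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlsplit


def split_endpoint(endpoint: str) -> Tuple[str, bool]:
    """Return (host[:port], insecure) for an endpoint with or without a scheme."""
    # Endpoints without a scheme are treated as TLS endpoints.
    parts = urlsplit(endpoint if "://" in endpoint else "https://" + endpoint)
    return parts.netloc, parts.scheme == "http"


def tempo_s3_storage(
    bucket: str,
    endpoint: str,
    region: str,
    access_key: str,
    secret_key: str,
    prefix: str = "",
    force_path_style: bool = False,
) -> Dict[str, Any]:
    """Return a storage fragment pointing Tempo at an S3-compatible bucket."""
    host, insecure = split_endpoint(endpoint)
    s3: Dict[str, Any] = {
        "bucket": bucket,
        "endpoint": host,
        "region": region,
        "access_key": access_key,
        "secret_key": secret_key,
        "insecure": insecure,
        "forcepathstyle": force_path_style,
    }
    if prefix:
        s3["prefix"] = prefix
    return {"storage": {"trace": {"backend": "s3", "s3": s3}}}
```

Deriving the host from S3_ENDPOINT lets one variable serve both tools even if they disagree on whether the scheme belongs in the endpoint.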
Milestone D: Production Rollout
- Add Swarm stack overlays or configs for Loki/Tempo S3 mode:
- S3 endpoint + region + credentials as secrets
- bucket/prefix configuration
- Provide a rollback plan:
- switch back to filesystem/local (requires persistent volumes for continuity)
Testing and Verification
- Workspace:
cargo fmt --check
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build
- Local S3 validation (manual, documented):
- MinIO buckets created
- Loki bucket contains objects after ingest
- Tempo bucket contains objects after ingest
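The manual checks above could be collected into one small script. The sketch assumes a client shaped like a boto3 S3 client and uses only list_objects_v2; the names bucket_has_objects and validate_buckets are hypothetical, and treating a bucket as valid once it holds at least one object is a deliberately loose criterion.

```python
from typing import Dict, Iterable, Tuple


def bucket_has_objects(client, bucket: str, prefix: str = "") -> bool:
    """Return True if at least one object exists under `prefix` in `bucket`."""
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0


def validate_buckets(client, checks: Iterable[Tuple[str, str]]) -> Dict[str, bool]:
    """Map each bucket name to whether it contains objects under its prefix."""
    return {bucket: bucket_has_objects(client, bucket, prefix) for bucket, prefix in checks}


# Possible usage after sending some logs and traces through the local stack:
#   results = validate_buckets(client, [("cloudlysis-loki", ""), ("cloudlysis-tempo", "")])
#   failed = [name for name, ok in results.items() if not ok]
```

Loki and Tempo flush to object storage on their own schedules, so an empty bucket shortly after startup does not necessarily indicate a misconfiguration.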
Follow-On Opportunities (Phase 2)
- Backup/restore to S3 for MDBX data directories (Aggregate/Projection/Runner/Gateway).
- Artifact storage (projection programs, definitions, deployment bundles) via S3 with signed URLs.
- Multi-tenant isolation at the bucket/prefix policy level.