docs: add S3_PLAN for Hetzner S3 + local MinIO
121
S3_PLAN.md
Normal file
@@ -0,0 +1,121 @@
# S3-Compatible Object Storage Plan (Hetzner in Prod, MinIO Locally)

## Goals
- Add S3-compatible object storage as an optional infrastructure dependency.
- Use Hetzner Object Storage in production (S3 API compatible).
- Use MinIO for local development to mirror production behavior.
- Start by moving observability storage (Loki + Tempo) to object storage, keeping local filesystem as the default fallback.

## Scope (Phase 1)
### Observability (Primary)
- Loki: store chunks/index in S3-compatible object storage.
- Tempo: store traces in S3-compatible object storage.

### Local Dev Parity
- Add MinIO to local compose and provide a documented way to provision required buckets.

## Non-Goals (Phase 1)
- Replacing MDBX/KV primary service storage with S3.
- Implementing multi-region replication, object-lock governance, or WORM retention.
- Centralized artifact storage for deployments (can be a follow-on).

## Target Architecture
- Local:
  - `docker compose up` uses filesystem/local volumes by default.
  - `docker compose -f docker-compose.yml -f docker-compose.s3.yml -f observability/docker-compose.yml -f observability/docker-compose.s3.yml up` enables MinIO-backed Loki/Tempo.
- Production:
  - Loki + Tempo configured for S3 with Hetzner endpoint.
  - Credentials injected via Swarm secrets or environment injection (never committed).
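
The local file layering above could be wrapped in a small helper so the long S3 invocation is not retyped by hand. This is only a sketch: the function name `compose_args` and the mode values `fs` and `s3` are hypothetical, while the file names are the ones listed in this plan.

```shell
#!/bin/sh
# Hypothetical helper: print the "-f" arguments for a given storage mode.
# "fs" selects the default filesystem setup, "s3" adds the MinIO overrides.
compose_args() {
  mode="${1:-fs}"
  case "$mode" in
    fs)
      echo "-f docker-compose.yml -f observability/docker-compose.yml"
      ;;
    s3)
      echo "-f docker-compose.yml -f docker-compose.s3.yml -f observability/docker-compose.yml -f observability/docker-compose.s3.yml"
      ;;
    *)
      echo "unknown mode: $mode (expected fs or s3)" >&2
      return 1
      ;;
  esac
}

# Example usage (relies on word splitting of the printed arguments):
#   docker compose $(compose_args s3) up -d
```

Keeping the override files separate means the default path never references MinIO, so S3 mode stays strictly opt-in.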

## Configuration Model
Define a single configuration surface for “S3-compatible storage” and reuse it across Loki/Tempo and future features.

### Common Settings
- Endpoint: `S3_ENDPOINT` (e.g., `https://<region>.your-objectstorage.com`)
- Region: `S3_REGION` (string; Hetzner typically requires a region value)
- Access key: `S3_ACCESS_KEY_ID` (secret)
- Secret key: `S3_SECRET_ACCESS_KEY` (secret)
- Force path-style: `S3_FORCE_PATH_STYLE` (`true`/`false`, depends on provider)
- TLS: enabled by default; allow `S3_INSECURE=true` only for local MinIO if needed
- Prefixes:
  - `S3_PREFIX_LOKI` (e.g., `loki/`)
  - `S3_PREFIX_TEMPO` (e.g., `tempo/`)

### Buckets
- `S3_BUCKET_LOKI`
- `S3_BUCKET_TEMPO`
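
A pre-flight check over these variables might look like the sketch below. The helper name `check_s3_env` is hypothetical, and treating a plain `http://` endpoint as an error unless `S3_INSECURE=true` is one possible reading of the TLS rule above, not a settled decision.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the shared S3_* settings.
check_s3_env() {
  missing=""
  for name in S3_ENDPOINT S3_REGION S3_ACCESS_KEY_ID S3_SECRET_ACCESS_KEY \
              S3_BUCKET_LOKI S3_BUCKET_TEMPO; do
    # Indirect lookup of the variable named in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing required settings:$missing" >&2
    return 1
  fi
  # Assumed policy: TLS is the default, so plain HTTP needs an explicit opt-in.
  case "$S3_ENDPOINT" in
    http://*)
      if [ "${S3_INSECURE:-false}" != "true" ]; then
        echo "S3_ENDPOINT is plain HTTP but S3_INSECURE is not true" >&2
        return 1
      fi
      ;;
  esac
  return 0
}
```

Running such a check before starting Loki or Tempo turns a missing credential into an immediate, readable error instead of a failed write later.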

## Local Dev: MinIO
### Compose Additions
- Add a `minio` service (console + API ports).
- Add a `minio-init` one-shot job (or `mc` container) to create buckets:
  - `cloudlysis-loki`
  - `cloudlysis-tempo`
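
The `minio-init` job could be little more than a short script around the MinIO client. The sketch below assumes the MinIO service is reachable at `http://minio:9000` and that `MINIO_ROOT_USER` / `MINIO_ROOT_PASSWORD` carry its root credentials; the alias name `local` and the `MINIO_URL` and `RUN_MINIO_INIT` variables are hypothetical.

```shell
#!/bin/sh
# Sketch of a minio-init entrypoint; bucket names are the ones listed above.
BUCKETS="cloudlysis-loki cloudlysis-tempo"

create_buckets() {
  # Register the MinIO endpoint under the (assumed) alias "local".
  mc alias set local "${MINIO_URL:-http://minio:9000}" \
    "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD" || return 1
  for bucket in $BUCKETS; do
    # --ignore-existing keeps the job idempotent across restarts.
    mc mb --ignore-existing "local/$bucket" || return 1
  done
}

# Only provision when explicitly asked, so sourcing this file has no side effects.
if [ "${RUN_MINIO_INIT:-0}" = "1" ]; then
  create_buckets
fi
```

Because bucket creation is idempotent, the job can run on every `docker compose up` without special handling for existing volumes.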

### Developer Workflow
- Default (no S3):
  - `docker compose -f docker-compose.yml -f observability/docker-compose.yml up -d --build`
- S3-enabled (MinIO):
  - bring up MinIO and the observability S3 overrides (the four-file `docker compose` invocation under Target Architecture)
  - verify Loki/Tempo can write objects (logs/traces show up and buckets have objects)

## Production: Hetzner Object Storage
### Provisioning
- Create buckets for Loki and Tempo (or a shared bucket with distinct prefixes).
- Enable bucket-level lifecycle policies:
  - Loki: retention aligned with schema/index period and desired log retention.
  - Tempo: retention aligned with `compactor.compaction.block_retention` and operational needs.

### Secrets
- Store `S3_ACCESS_KEY_ID` / `S3_SECRET_ACCESS_KEY` as Swarm secrets.
- Inject into Loki/Tempo containers as environment variables at runtime.
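
Swarm mounts each secret as a file under `/run/secrets/<name>`, so turning secrets into environment variables needs a small bridging step. One option is an entrypoint wrapper like the sketch below. It assumes the container image ships a POSIX shell (which may not hold for the stock Loki/Tempo images), and the lowercase secret names and the `SECRETS_DIR` override are hypothetical.

```shell
#!/bin/sh
# Sketch: export Swarm secret files as environment variables before exec'ing
# the real process. SECRETS_DIR is overridable for local testing.
SECRETS_DIR="${SECRETS_DIR:-/run/secrets}"

load_secret() {
  var="$1"
  file="$SECRETS_DIR/$2"
  [ -r "$file" ] || { echo "secret file not readable: $file" >&2; return 1; }
  # Command substitution drops the trailing newline many tools append.
  value="$(cat "$file")"
  export "$var=$value"
}

# Intended use inside an entrypoint (secret names are assumptions):
#   load_secret S3_ACCESS_KEY_ID s3_access_key_id || exit 1
#   load_secret S3_SECRET_ACCESS_KEY s3_secret_access_key || exit 1
#   exec "$@"
```

Both Loki and Tempo can expand `${VAR}` references in their config files when started with `-config.expand-env=true`, which is how exported values like these would reach the S3 settings.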

### Operational Considerations
- Timeouts and retries: rely on Loki/Tempo defaults; tune only after measuring.
- Cost controls: lifecycle rules and retention budgets.
- Failure mode: if S3 is unavailable, Loki/Tempo ingest may degrade; decide whether to fail-closed (strict) or allow temporary local buffering.

## Implementation Plan (Milestones)
### Milestone A: Local MinIO Baseline
- Add `docker-compose.s3.yml`:
  - MinIO
  - minio-init bucket provisioning
- Add `docs/` (or wiki) instructions for enabling S3 mode locally.
- Add a gated smoke test script (manual) verifying buckets exist and can be listed.
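
The gated smoke test could be as small as the following sketch. It assumes an `mc` alias named `local` already points at MinIO; the gate variable `S3_SMOKE` and the function name `smoke_buckets` are hypothetical.

```shell
#!/bin/sh
# Sketch of a manual smoke test: each bucket must exist and be listable.
smoke_buckets() {
  failed=0
  for bucket in "$@"; do
    if mc ls "local/$bucket" >/dev/null 2>&1; then
      echo "ok: $bucket"
    else
      echo "FAIL: $bucket" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Gate: does nothing unless the developer opts in explicitly.
if [ "${S3_SMOKE:-0}" = "1" ]; then
  smoke_buckets cloudlysis-loki cloudlysis-tempo
fi
```

Gating on an environment variable keeps the script harmless in environments where MinIO is not running.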

### Milestone B: Loki S3 Backend
- Add `observability/docker-compose.s3.yml` enabling Loki S3 config.
- Add `observability/loki/config.s3.yml` (separate from default filesystem config).
- Validate:
  - logs are queryable in Grafana
  - Loki writes objects to the bucket/prefix

### Milestone C: Tempo S3 Backend
- Extend `observability/docker-compose.s3.yml` (added in Milestone B) to enable the Tempo S3 config.
- Add `observability/tempo/config.s3.yml` (separate from default local config).
- Validate:
  - traces appear in Tempo
  - objects are written to the bucket/prefix

### Milestone D: Production Rollout
- Add Swarm stack overlays or configs for Loki/Tempo S3 mode:
  - S3 endpoint + region + credentials as secrets
  - bucket/prefix configuration
- Provide a rollback plan:
  - switch back to filesystem/local (requires persistent volumes for continuity)

## Testing and Verification
- Workspace:
  - `cargo fmt --check`
  - `cargo clippy --workspace --all-targets -- -D warnings`
  - `cargo test --workspace`
  - `cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build`
- Local S3 validation (manual, documented):
  - MinIO buckets created
  - Loki bucket contains objects after ingest
  - Tempo bucket contains objects after ingest
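
The "contains objects after ingest" checks can be approximated by counting the lines of a recursive listing. This is a rough sketch that again assumes an `mc` alias named `local`; the function names are hypothetical.

```shell
#!/bin/sh
# Count objects under a bucket (optionally bucket/prefix) via the MinIO client.
object_count() {
  # A recursive listing prints one line per object; errors count as zero.
  mc ls --recursive "local/$1" 2>/dev/null | wc -l | tr -d ' '
}

# Succeeds only if at least one object exists under the given path.
bucket_has_objects() {
  [ "$(object_count "$1")" -gt 0 ]
}

# Example after sending some logs and traces:
#   bucket_has_objects cloudlysis-loki  || echo "Loki bucket is empty" >&2
#   bucket_has_objects cloudlysis-tempo || echo "Tempo bucket is empty" >&2
```

Loki and Tempo flush to object storage on their own schedules, so an empty bucket right after ingest is not necessarily a failure; the check may need to be retried after a short wait.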

## Follow-On Opportunities (Phase 2)
- Backup/restore to S3 for MDBX data directories (Aggregate/Projection/Runner/Gateway).
- Artifact storage (projection programs, definitions, deployment bundles) via S3 with signed URLs.
- Multi-tenant isolation at the bucket/prefix policy level.