From 8f9713fb0e70662bfd3bcd373ee032612e0ffb82 Mon Sep 17 00:00:00 2001
From: Vlad Durnea
Date: Mon, 30 Mar 2026 14:44:07 +0300
Subject: [PATCH] docs: add S3_PLAN for Hetzner S3 + local MinIO

---
 S3_PLAN.md | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)
 create mode 100644 S3_PLAN.md

diff --git a/S3_PLAN.md b/S3_PLAN.md
new file mode 100644
index 0000000..f57b1de
--- /dev/null
+++ b/S3_PLAN.md
@@ -0,0 +1,121 @@
+# S3-Compatible Object Storage Plan (Hetzner in Prod, MinIO Locally)
+
+## Goals
+- Add S3-compatible object storage as an optional infrastructure dependency.
+- Use Hetzner Object Storage in production (S3 API compatible).
+- Use MinIO for local development to mirror production behavior.
+- Start by moving observability storage (Loki + Tempo) to object storage, keeping local filesystem as the default fallback.
+
+## Scope (Phase 1)
+### Observability (Primary)
+- Loki: store chunks/index in S3-compatible object storage.
+- Tempo: store traces in S3-compatible object storage.
+
+### Local Dev Parity
+- Add MinIO to local compose and provide a documented way to provision required buckets.
+
+## Non-Goals (Phase 1)
+- Replacing MDBX/KV primary service storage with S3.
+- Implementing multi-region replication, object-lock governance, or WORM retention.
+- Centralized artifact storage for deployments (can be a follow-on).
+
+## Target Architecture
+- Local:
+  - `docker compose up` uses filesystem/local volumes by default.
+  - `docker compose -f docker-compose.yml -f docker-compose.s3.yml -f observability/docker-compose.yml -f observability/docker-compose.s3.yml up` enables MinIO-backed Loki/Tempo.
+- Production:
+  - Loki + Tempo configured for S3 with Hetzner endpoint.
+  - Credentials injected via Swarm secrets or environment injection (never committed).
+
+## Configuration Model
+Define a single configuration surface for “S3-compatible storage” and reuse it across Loki/Tempo and future features.
+
+### Common Settings
+- Endpoint: `S3_ENDPOINT` (e.g., `https://.your-objectstorage.com`)
+- Region: `S3_REGION` (string; Hetzner typically requires a region value)
+- Access key: `S3_ACCESS_KEY_ID` (secret)
+- Secret key: `S3_SECRET_ACCESS_KEY` (secret)
+- Force path-style: `S3_FORCE_PATH_STYLE` (`true/false`, depends on provider)
+- TLS: enabled by default; allow `S3_INSECURE=true` only for local MinIO if needed
+- Prefixes:
+  - `S3_PREFIX_LOKI` (e.g., `loki/`)
+  - `S3_PREFIX_TEMPO` (e.g., `tempo/`)
+
+### Buckets
+- `S3_BUCKET_LOKI`
+- `S3_BUCKET_TEMPO`
+
+## Local Dev: MinIO
+### Compose Additions
+- Add a `minio` service (console + API ports).
+- Add a `minio-init` one-shot job (or `mc` container) to create buckets:
+  - `cloudlysis-loki`
+  - `cloudlysis-tempo`
+
+### Developer Workflow
+- Default (no S3):
+  - `docker compose -f docker-compose.yml -f observability/docker-compose.yml up -d --build`
+- S3-enabled (MinIO):
+  - bring up MinIO + observability S3 overrides
+  - verify Loki/Tempo can write objects (logs/traces show up and buckets have objects)
+
+## Production: Hetzner Object Storage
+### Provisioning
+- Create buckets for Loki and Tempo (or a shared bucket with distinct prefixes).
+- Enable bucket-level lifecycle policies:
+  - Loki: retention aligned with schema/index period and desired log retention.
+  - Tempo: retention aligned with `compactor.block_retention` and operational needs.
+
+### Secrets
+- Store `S3_ACCESS_KEY_ID` / `S3_SECRET_ACCESS_KEY` as Swarm secrets.
+- Inject into Loki/Tempo containers as environment variables at runtime.
+
+### Operational Considerations
+- Timeouts and retries: rely on Loki/Tempo defaults; tune only after measuring.
+- Cost controls: lifecycle rules and retention budgets.
+- Failure mode: if S3 is unavailable, Loki/Tempo ingest may degrade; decide whether to fail-closed (strict) or allow temporary local buffering.
+
+## Implementation Plan (Milestones)
+### Milestone A: Local MinIO Baseline
+- Add `docker-compose.s3.yml`:
+  - MinIO
+  - minio-init bucket provisioning
+- Add `docs/` (or wiki) instructions for enabling S3 mode locally.
+- Add a gated smoke test script (manual) verifying buckets exist and can be listed.
+
+### Milestone B: Loki S3 Backend
+- Add `observability/docker-compose.s3.yml` enabling Loki S3 config.
+- Add `observability/loki/config.s3.yml` (separate from default filesystem config).
+- Validate:
+  - logs are queryable in Grafana
+  - Loki writes objects to the bucket/prefix
+
+### Milestone C: Tempo S3 Backend
+- Add `observability/docker-compose.s3.yml` enabling Tempo S3 config.
+- Add `observability/tempo/config.s3.yml` (separate from default local config).
+- Validate:
+  - traces appear in Tempo
+  - objects are written to the bucket/prefix
+
+### Milestone D: Production Rollout
+- Add Swarm stack overlays or configs for Loki/Tempo S3 mode:
+  - S3 endpoint + region + credentials as secrets
+  - bucket/prefix configuration
+- Provide a rollback plan:
+  - switch back to filesystem/local (requires persistent volumes for continuity)
+
+## Testing and Verification
+- Workspace:
+  - `cargo fmt --check`
+  - `cargo clippy --workspace --all-targets -- -D warnings`
+  - `cargo test --workspace`
+  - `cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build`
+- Local S3 validation (manual, documented):
+  - MinIO buckets created
+  - Loki bucket contains objects after ingest
+  - Tempo bucket contains objects after ingest
+
+## Follow-On Opportunities (Phase 2)
+- Backup/restore to S3 for MDBX data directories (Aggregate/Projection/Runner/Gateway).
+- Artifact storage (projection programs, definitions, deployment bundles) via S3 with signed URLs.
+- Multi-tenant isolation at the bucket/prefix policy level.
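
The "Configuration Model" section of the plan names a single set of `S3_*` variables but does not say how a script would consume them. The sketch below is one possible reading, not part of the patch: a small Python loader that a smoke-test or provisioning helper might use. The variable names come from the plan; everything else is an assumption made for illustration, including the `S3Settings` and `load_s3_settings` names, which variables count as required, the `false` default for `S3_FORCE_PATH_STYLE`, the `loki/` and `tempo/` prefix defaults (taken from the plan's "e.g." values), and the rule that a plain-HTTP endpoint is rejected unless `S3_INSECURE=true`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse

# Variables the plan lists without a default; treated as required here (assumption).
REQUIRED = (
    "S3_ENDPOINT",
    "S3_REGION",
    "S3_ACCESS_KEY_ID",
    "S3_SECRET_ACCESS_KEY",
    "S3_BUCKET_LOKI",
    "S3_BUCKET_TEMPO",
)


def parse_bool(raw: Optional[str], default: bool = False) -> bool:
    """Parse a true/false flag; an unset or empty value yields the default."""
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {raw!r}")


@dataclass(frozen=True)
class S3Settings:
    """Hypothetical container for the plan's shared S3 configuration surface."""

    endpoint: str
    region: str
    access_key_id: str
    secret_access_key: str
    force_path_style: bool
    insecure: bool
    bucket_loki: str
    bucket_tempo: str
    prefix_loki: str
    prefix_tempo: str


def load_s3_settings(env: Mapping[str, str]) -> S3Settings:
    """Build S3Settings from an environment-like mapping, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing S3 settings: " + ", ".join(missing))

    insecure = parse_bool(env.get("S3_INSECURE"))
    scheme = urlparse(env["S3_ENDPOINT"]).scheme
    if scheme not in ("http", "https"):
        raise ValueError("S3_ENDPOINT must start with http:// or https://")
    # TLS is the default: plain HTTP is accepted only with an explicit opt-in.
    if scheme == "http" and not insecure:
        raise ValueError("plain-HTTP S3_ENDPOINT requires S3_INSECURE=true")

    return S3Settings(
        endpoint=env["S3_ENDPOINT"],
        region=env["S3_REGION"],
        access_key_id=env["S3_ACCESS_KEY_ID"],
        secret_access_key=env["S3_SECRET_ACCESS_KEY"],
        force_path_style=parse_bool(env.get("S3_FORCE_PATH_STYLE")),
        insecure=insecure,
        bucket_loki=env["S3_BUCKET_LOKI"],
        bucket_tempo=env["S3_BUCKET_TEMPO"],
        # Prefix defaults mirror the plan's example values (assumption).
        prefix_loki=env.get("S3_PREFIX_LOKI", "loki/"),
        prefix_tempo=env.get("S3_PREFIX_TEMPO", "tempo/"),
    )
```

A caller would typically pass `os.environ`, for example `settings = load_s3_settings(os.environ)`. Taking a mapping instead of reading the process environment directly keeps the loader easy to exercise with local MinIO values and with production-like values in the same test run.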