cloudlysis/S3_PLAN.md
Vlad Durnea 8f9713fb0e
docs: add S3_PLAN for Hetzner S3 + local MinIO
2026-03-30 14:44:07 +03:00

S3-Compatible Object Storage Plan (Hetzner in Prod, MinIO Locally)

Goals

  • Add S3-compatible object storage as an optional infrastructure dependency.
  • Use Hetzner Object Storage in production (S3 API compatible).
  • Use MinIO for local development to mirror production behavior.
  • Start by moving observability storage (Loki + Tempo) to object storage, keeping the local filesystem as the default backend and as the fallback.

Scope (Phase 1)

Observability (Primary)

  • Loki: store chunks/index in S3-compatible object storage.
  • Tempo: store traces in S3-compatible object storage.

Local Dev Parity

  • Add MinIO to local compose and provide a documented way to provision required buckets.

Non-Goals (Phase 1)

  • Replacing MDBX/KV primary service storage with S3.
  • Implementing multi-region replication, object-lock governance, or WORM retention.
  • Centralized artifact storage for deployments (can be a follow-on).

Target Architecture

  • Local:
    • docker compose up uses filesystem/local volumes by default.
    • docker compose -f docker-compose.yml -f docker-compose.s3.yml -f observability/docker-compose.yml -f observability/docker-compose.s3.yml up enables MinIO-backed Loki/Tempo.
  • Production:
    • Loki + Tempo configured for S3 with Hetzner endpoint.
    • Credentials injected via Swarm secrets or environment injection (never committed).

Configuration Model

Define a single configuration surface for “S3-compatible storage” and reuse it across Loki/Tempo and future features.

Common Settings

  • Endpoint: S3_ENDPOINT (e.g., https://<region>.your-objectstorage.com)
  • Region: S3_REGION (string; Hetzner typically requires a region value)
  • Access key: S3_ACCESS_KEY_ID (secret)
  • Secret key: S3_SECRET_ACCESS_KEY (secret)
  • Force path-style: S3_FORCE_PATH_STYLE (true/false, depends on provider)
  • TLS: enabled by default; allow S3_INSECURE=true only for local MinIO if needed
  • Prefixes:
    • S3_PREFIX_LOKI (e.g., loki/)
    • S3_PREFIX_TEMPO (e.g., tempo/)

Buckets

  • S3_BUCKET_LOKI
  • S3_BUCKET_TEMPO
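
The settings above can be read into one typed object so that Loki, Tempo, and later features share the same validation. The following is a minimal sketch, not the project's implementation: the `S3Settings` class, the `load_s3_settings` function, the default prefixes (taken from the examples above), and the rule that a plain-HTTP endpoint requires `S3_INSECURE=true` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse

# Variables treated as mandatory in this sketch (an assumption, not a project rule).
REQUIRED = (
    "S3_ENDPOINT",
    "S3_REGION",
    "S3_ACCESS_KEY_ID",
    "S3_SECRET_ACCESS_KEY",
    "S3_BUCKET_LOKI",
    "S3_BUCKET_TEMPO",
)


def _parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class S3Settings:
    endpoint: str
    region: str
    access_key_id: str
    secret_access_key: str
    force_path_style: bool
    insecure: bool
    bucket_loki: str
    bucket_tempo: str
    prefix_loki: str
    prefix_tempo: str


def load_s3_settings(env: Mapping[str, str]) -> S3Settings:
    """Build S3Settings from an environment-like mapping, failing early on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing S3 settings: " + ", ".join(missing))

    insecure = _parse_bool(env.get("S3_INSECURE", "false"))
    # TLS is the default: refuse http:// unless the caller opted in explicitly.
    if urlparse(env["S3_ENDPOINT"]).scheme == "http" and not insecure:
        raise ValueError("plain-HTTP endpoint requires S3_INSECURE=true")

    return S3Settings(
        endpoint=env["S3_ENDPOINT"],
        region=env["S3_REGION"],
        access_key_id=env["S3_ACCESS_KEY_ID"],
        secret_access_key=env["S3_SECRET_ACCESS_KEY"],
        force_path_style=_parse_bool(env.get("S3_FORCE_PATH_STYLE", "false")),
        insecure=insecure,
        bucket_loki=env["S3_BUCKET_LOKI"],
        bucket_tempo=env["S3_BUCKET_TEMPO"],
        prefix_loki=env.get("S3_PREFIX_LOKI", "loki/"),
        prefix_tempo=env.get("S3_PREFIX_TEMPO", "tempo/"),
    )
```

Calling `load_s3_settings(os.environ)` at startup would surface a missing variable before any container tries to write to the bucket.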

Local Dev: MinIO

Compose Additions

  • Add a minio service (console + API ports).
  • Add a minio-init one-shot job (or mc container) to create buckets:
    • cloudlysis-loki
    • cloudlysis-tempo
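
The bucket provisioning step can be made idempotent so the one-shot job is safe to re-run. The sketch below assumes a boto3-style S3 client (`list_buckets` and `create_bucket` are real boto3 calls); the helper names `ensure_buckets` and `make_local_minio_client`, the `localhost:9000` endpoint, and the `minioadmin` credentials (MinIO's out-of-the-box defaults) are illustrative assumptions, and an `mc`-based container would work equally well.

```python
from typing import Iterable, List


def ensure_buckets(client, names: Iterable[str]) -> List[str]:
    """Create any bucket in `names` that does not exist yet; return those created."""
    existing = {bucket["Name"] for bucket in client.list_buckets().get("Buckets", [])}
    created = []
    for name in names:
        if name not in existing:
            client.create_bucket(Bucket=name)
            created.append(name)
    return created


def make_local_minio_client():
    """Hypothetical helper: a boto3 client pointed at a local MinIO instance.

    Endpoint and credentials are assumptions for local development only.
    """
    import boto3  # third-party dependency, imported lazily

    return boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",
        aws_access_key_id="minioadmin",
        aws_secret_access_key="minioadmin",
        region_name="us-east-1",
    )
```

With MinIO running, `ensure_buckets(make_local_minio_client(), ["cloudlysis-loki", "cloudlysis-tempo"])` would create both buckets on the first run and return an empty list on later runs.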

Developer Workflow

  • Default (no S3):
    • docker compose -f docker-compose.yml -f observability/docker-compose.yml up -d --build
  • S3-enabled (MinIO):
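    • bring up MinIO and the observability S3 overrides: docker compose -f docker-compose.yml -f docker-compose.s3.yml -f observability/docker-compose.yml -f observability/docker-compose.s3.yml up -d --build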
    • verify Loki/Tempo can write objects (logs/traces show up and buckets have objects)
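
Because the two modes differ only in which compose files are layered, a small wrapper can keep the file order in one place. This is a sketch under assumptions: the function name `compose_up_command` is hypothetical, and the file order simply mirrors the command shown under Target Architecture (each S3 override directly after the file it overrides).

```python
from typing import List

# (base file, S3 override) pairs, in the order used by the plan's S3 command.
COMPOSE_LAYERS = [
    ("docker-compose.yml", "docker-compose.s3.yml"),
    ("observability/docker-compose.yml", "observability/docker-compose.s3.yml"),
]


def compose_up_command(s3: bool = False) -> List[str]:
    """Return the docker compose argument list for default or S3-enabled mode."""
    command = ["docker", "compose"]
    for base, override in COMPOSE_LAYERS:
        command += ["-f", base]
        if s3:
            # Compose merges files in order, so the override follows its base.
            command += ["-f", override]
    return command + ["up", "-d", "--build"]
```

The result can be passed to `subprocess.run(compose_up_command(s3=True), check=True)` from the repository root.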

Production: Hetzner Object Storage

Provisioning

  • Create buckets for Loki and Tempo (or a shared bucket with distinct prefixes).
  • Enable bucket-level lifecycle policies:
    • Loki: retention aligned with schema/index period and desired log retention.
    • Tempo: retention aligned with compactor.block_retention and operational needs.
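
A lifecycle policy in the S3 API is a list of rules, each with a prefix filter and an expiration. The sketch below builds that structure in the shape boto3's `put_bucket_lifecycle_configuration` expects; the helper names, rule IDs, and day counts are hypothetical, and whether Hetzner honours every field should be verified against the provider before relying on it.

```python
from typing import Dict, Iterable, List


def expiration_rule(rule_id: str, prefix: str, days: int) -> Dict:
    """One S3 lifecycle rule that expires objects under `prefix` after `days`."""
    if days <= 0:
        raise ValueError("days must be positive")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Expiration": {"Days": days},
    }


def lifecycle_configuration(rules: Iterable[Dict]) -> Dict[str, List[Dict]]:
    """Wrap rules in the top-level structure used by the S3 lifecycle API."""
    return {"Rules": list(rules)}


# Illustrative values only: the real numbers come from the retention decisions above.
LOKI_POLICY = lifecycle_configuration([expiration_rule("loki-expire", "loki/", 30)])
TEMPO_POLICY = lifecycle_configuration([expiration_rule("tempo-expire", "tempo/", 14)])
```

A policy would then be applied with `client.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=LOKI_POLICY)`. Keeping the bucket expiration no shorter than the retention configured in Loki/Tempo should avoid objects disappearing while the index or block list still references them.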

Secrets

  • Store S3_ACCESS_KEY_ID / S3_SECRET_ACCESS_KEY as Swarm secrets.
  • Inject into Loki/Tempo containers as environment variables at runtime.
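
Swarm mounts secrets as files under `/run/secrets/` rather than as environment variables, so the injection step usually means a thin entrypoint layer that reads the file and exports the value. The lookup order can be sketched as follows; `resolve_secret` and the convention of naming the secret file after the lowercased variable are assumptions for illustration.

```python
from pathlib import Path
from typing import Mapping

# Default mount point for Docker Swarm secrets in Linux containers.
DEFAULT_SECRETS_DIR = Path("/run/secrets")


def resolve_secret(
    name: str,
    env: Mapping[str, str],
    secrets_dir: Path = DEFAULT_SECRETS_DIR,
) -> str:
    """Prefer a mounted secret file, then fall back to the environment.

    The file name is assumed to be the lowercased variable name,
    e.g. S3_ACCESS_KEY_ID -> /run/secrets/s3_access_key_id.
    """
    secret_file = secrets_dir / name.lower()
    if secret_file.is_file():
        # Strip the trailing newline that editors and `echo` tend to add.
        return secret_file.read_text(encoding="utf-8").strip()
    value = env.get(name)
    if value:
        return value
    raise KeyError(f"secret {name} not found in {secrets_dir} or environment")
```

Preferring the file means production reads the Swarm secret, while local runs without a mounted secret can still fall back to an exported variable.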

Operational Considerations

  • Timeouts and retries: rely on Loki/Tempo defaults; tune only after measuring.
  • Cost controls: lifecycle rules and retention budgets.
  • Failure mode: if S3 is unavailable, Loki/Tempo ingestion may degrade. Decide up front whether to fail closed (strict) or to allow temporary local buffering until the store is reachable again.

Implementation Plan (Milestones)

Milestone A: Local MinIO Baseline

  • Add docker-compose.s3.yml:
    • MinIO
    • minio-init bucket provisioning
  • Add docs/ (or wiki) instructions for enabling S3 mode locally.
  • Add a gated smoke test script (manual) verifying buckets exist and can be listed.

Milestone B: Loki S3 Backend

  • Add observability/docker-compose.s3.yml enabling Loki S3 config.
  • Add observability/loki/config.s3.yml (separate from default filesystem config).
  • Validate:
    • logs are queryable in Grafana
    • Loki writes objects to the bucket/prefix

Milestone C: Tempo S3 Backend

  • Extend observability/docker-compose.s3.yml (added in Milestone B) to enable the Tempo S3 config.
  • Add observability/tempo/config.s3.yml (separate from default local config).
  • Validate:
    • traces appear in Tempo
    • objects are written to the bucket/prefix

Milestone D: Production Rollout

  • Add Swarm stack overlays or configs for Loki/Tempo S3 mode:
    • S3 endpoint and region as configuration; credentials as secrets
    • bucket/prefix configuration
  • Provide a rollback plan:
    • switch back to filesystem/local (requires persistent volumes for continuity)

Testing and Verification

  • Workspace:
    • cargo fmt --check
    • cargo clippy --workspace --all-targets -- -D warnings
    • cargo test --workspace
    • cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build
  • Local S3 validation (manual, documented):
    • MinIO buckets created
    • Loki bucket contains objects after ingest
    • Tempo bucket contains objects after ingest
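
The manual checks above could be scripted along these lines. This is a sketch assuming a boto3-style client (`list_objects_v2` and its `KeyCount` response field are part of the S3 API); the function names are hypothetical, and the bucket names and prefixes in the usage note are the example values from this plan.

```python
from typing import Dict


def bucket_has_objects(client, bucket: str, prefix: str = "") -> bool:
    """True if at least one object exists under `prefix` in `bucket`."""
    # MaxKeys=1 keeps the request cheap: only existence matters here.
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


def smoke_check(client, targets: Dict[str, str]) -> Dict[str, bool]:
    """Check each bucket -> prefix pair and report which ones contain objects."""
    return {
        bucket: bucket_has_objects(client, bucket, prefix)
        for bucket, prefix in targets.items()
    }
```

After sending some logs and traces, `smoke_check(client, {"cloudlysis-loki": "loki/", "cloudlysis-tempo": "tempo/"})` should report `True` for both buckets. A `False` suggests the service is still writing to the local filesystem, or has not flushed yet.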

Follow-On Opportunities (Phase 2)

  • Backup/restore to S3 for MDBX data directories (Aggregate/Projection/Runner/Gateway).
  • Artifact storage (projection programs, definitions, deployment bundles) via S3 with signed URLs.
  • Multi-tenant isolation at the bucket/prefix policy level.