diff --git a/S3_PLAN.md b/S3_PLAN.md
index f57b1de..4b1867e 100644
--- a/S3_PLAN.md
+++ b/S3_PLAN.md
@@ -1,121 +1,187 @@
 # S3-Compatible Object Storage Plan (Hetzner in Prod, MinIO Locally)
 
+## Principles
+- S3-compatible object storage is mandatory for platform document storage in every environment:
+  - Local development uses MinIO.
+  - Production uses Hetzner Object Storage (S3 API compatible).
+- Each milestone is stop-the-line gated:
+  - All tasks completed
+  - All milestone tests pass
+  - Workspace verification commands pass
+- Secrets are never committed and never logged:
+  - Access keys via Swarm secrets in production
+  - `.env` or compose env in local dev
+
 ## Goals
-- Add S3-compatible object storage as an optional infrastructure dependency.
-- Use Hetzner Object Storage in production (S3 API compatible).
-- Use MinIO for local development to mirror production behavior.
-- Start by moving observability storage (Loki + Tempo) to object storage, keeping local filesystem as the default fallback.
+- Introduce a single, shared S3-compatible configuration surface for the platform.
+- Make document storage always backed by S3 (no filesystem fallback for documents).
+- Keep the implementation incremental and test-gated per milestone.
+- Optionally expand to observability object storage after document storage is stable.
 
-## Scope (Phase 1)
-### Observability (Primary)
-- Loki: store chunks/index in S3-compatible object storage.
-- Tempo: store traces in S3-compatible object storage.
+## Definitions
+### Document Storage
+“Documents” are versioned blobs the platform needs to store and retrieve reliably:
+- Deployment bundles and artifacts
+- Definitions/manifests (projection programs, saga/effects definitions, schema bundles)
+- Exported audit/log bundles, diagnostics, or snapshots that are not part of the primary KV/MDBX state
 
-### Local Dev Parity
-- Add MinIO to local compose and provide a documented way to provision required buckets.
+Document storage must support:
+- Tenant-scoped namespaces (prefixes)
+- Content-addressed or versioned keys (immutability preferred)
+- Listing by prefix for admin workflows
 
-## Non-Goals (Phase 1)
-- Replacing MDBX/KV primary service storage with S3.
-- Implementing multi-region replication, object-lock governance, or WORM retention.
-- Centralized artifact storage for deployments (can be a follow-on).
+## Configuration Contract (Platform-Wide)
+### Common Settings
+- `S3_ENDPOINT` (Hetzner: HTTPS endpoint; MinIO: `http://minio:9000`)
+- `S3_REGION` (required; SigV4 request signing includes the region, and Hetzner expects a region value)
+- `S3_ACCESS_KEY_ID` (secret)
+- `S3_SECRET_ACCESS_KEY` (secret)
+- `S3_FORCE_PATH_STYLE` (`true/false`)
+- `S3_INSECURE` (`true/false`, only allowed for local MinIO)
+
+### Buckets and Prefixes
+- `S3_BUCKET_DOCS` (required everywhere)
+- `S3_PREFIX_DOCS` (default `docs/`)
+
+Optional (later milestones):
+- `S3_BUCKET_LOKI`, `S3_PREFIX_LOKI`
+- `S3_BUCKET_TEMPO`, `S3_PREFIX_TEMPO`
 
 ## Target Architecture
-- Local:
-  - `docker compose up` uses filesystem/local volumes by default.
-  - `docker compose -f docker-compose.yml -f docker-compose.s3.yml -f observability/docker-compose.yml -f observability/docker-compose.s3.yml up` enables MinIO-backed Loki/Tempo.
-- Production:
-  - Loki + Tempo configured for S3 with Hetzner endpoint.
-  - Credentials injected via Swarm secrets or environment injection (never committed).
+### Local Development
+- MinIO is part of the local stack for parity.
+- Control API is the document gateway:
+  - Upload/download via signed URLs or streamed proxy endpoints
+  - Metadata kept in existing storage/KV (as a document index) or derived from the key scheme
 
-## Configuration Model
-Define a single configuration surface for “S3-compatible storage” and reuse it across Loki/Tempo and future features.
+### Production
+- Hetzner Object Storage provides S3-compatible bucket(s).
+- Credentials and bucket details injected via Swarm secrets and stack env.
 
-### Common Settings
-- Endpoint: `S3_ENDPOINT` (e.g., `https://.your-objectstorage.com`)
-- Region: `S3_REGION` (string; Hetzner typically requires a region value)
-- Access key: `S3_ACCESS_KEY_ID` (secret)
-- Secret key: `S3_SECRET_ACCESS_KEY` (secret)
-- Force path-style: `S3_FORCE_PATH_STYLE` (`true/false`, depends on provider)
-- TLS: enabled by default; allow `S3_INSECURE=true` only for local MinIO if needed
-- Prefixes:
-  - `S3_PREFIX_LOKI` (e.g., `loki/`)
-  - `S3_PREFIX_TEMPO` (e.g., `tempo/`)
+## Development Plan (Milestones by Dependency)
 
-### Buckets
-- `S3_BUCKET_LOKI`
-- `S3_BUCKET_TEMPO`
+## Milestone 0: S3 Contract + Local MinIO Baseline
+### Dependencies
+- None
 
-## Local Dev: MinIO
-### Compose Additions
-- Add a `minio` service (console + API ports).
-- Add a `minio-init` one-shot job (or `mc` container) to create buckets:
-  - `cloudlysis-loki`
-  - `cloudlysis-tempo`
+### Goal
+Provide a consistent local S3-compatible endpoint and stable bucket naming to unblock later milestones.
 
-### Developer Workflow
-- Default (no S3):
-  - `docker compose -f docker-compose.yml -f observability/docker-compose.yml up -d --build`
-- S3-enabled (MinIO):
-  - bring up MinIO + observability S3 overrides
-  - verify Loki/Tempo can write objects (logs/traces show up and buckets have objects)
+### Tasks
+- [ ] Add MinIO to the local development stack:
+  - [ ] Add `minio` service to compose (API + console)
+  - [ ] Add `minio-init` job to create required buckets
+- [ ] Define standard bucket/prefix defaults for local dev:
+  - [ ] `S3_BUCKET_DOCS=cloudlysis-docs`
+  - [ ] `S3_PREFIX_DOCS=docs/`
+- [ ] Document the local workflow for enabling MinIO-backed document storage.
 
-## Production: Hetzner Object Storage
-### Provisioning
-- Create buckets for Loki and Tempo (or a shared bucket with distinct prefixes).
-- Enable bucket-level lifecycle policies:
-  - Loki: retention aligned with schema/index period and desired log retention.
-  - Tempo: retention aligned with `compactor.block_retention` and operational needs.
+### Required Tests (Gate)
+- [ ] Workspace verification commands
+- [ ] Local manual verification checklist:
+  - [ ] `cloudlysis-docs` bucket exists
+  - [ ] credentials work from a container in the compose network
 
-### Secrets
-- Store `S3_ACCESS_KEY_ID` / `S3_SECRET_ACCESS_KEY` as Swarm secrets.
-- Inject into Loki/Tempo containers as environment variables at runtime.
+## Milestone 1: Document Storage API (Control API)
+### Dependencies
+- Milestone 0
 
-### Operational Considerations
-- Timeouts and retries: rely on Loki/Tempo defaults; tune only after measuring.
-- Cost controls: lifecycle rules and retention budgets.
-- Failure mode: if S3 is unavailable, Loki/Tempo ingest may degrade; decide whether to fail-closed (strict) or allow temporary local buffering.
+### Goal
+Make document storage a first-class platform API and require it in all environments.
 
-## Implementation Plan (Milestones)
-### Milestone A: Local MinIO Baseline
-- Add `docker-compose.s3.yml`:
-  - MinIO
-  - minio-init bucket provisioning
-- Add `docs/` (or wiki) instructions for enabling S3 mode locally.
-- Add a gated smoke test script (manual) verifying buckets exist and can be listed.
+### Tasks
+- [ ] Add an S3 client module to Control API:
+  - [ ] parse config from env with strict validation (endpoint, bucket, keys)
+  - [ ] support path-style and TLS/insecure options
+- [ ] Implement document primitives:
+  - [ ] Put (upload) and Get (download)
+  - [ ] List by prefix (tenant + doc-type)
+  - [ ] Delete (admin-only) if needed
+- [ ] Decide and document a key scheme:
+  - [ ] tenant-scoped prefix
+  - [ ] immutable keys preferred (content hash + metadata)
+- [ ] Add authz rules for document operations (deny-by-default, tenant-scoped).
 
-### Milestone B: Loki S3 Backend
-- Add `observability/docker-compose.s3.yml` enabling Loki S3 config.
-- Add `observability/loki/config.s3.yml` (separate from default filesystem config).
-- Validate:
-  - logs are queryable in Grafana
-  - Loki writes objects to the bucket/prefix
+### Required Tests (Gate)
+- [ ] Workspace verification commands
+- [ ] Unit tests:
+  - [ ] config parsing/validation
+  - [ ] key generation stability
+- [ ] Gated integration tests (MinIO):
+  - [ ] put/get roundtrip
+  - [ ] list by prefix
+  - [ ] tenant isolation (cannot read another tenant's prefix)
 
-### Milestone C: Tempo S3 Backend
-- Add `observability/docker-compose.s3.yml` enabling Tempo S3 config.
-- Add `observability/tempo/config.s3.yml` (separate from default local config).
-- Validate:
-  - traces appear in Tempo
-  - objects are written to the bucket/prefix
+## Milestone 2: Control UI Integration (Upload/Download Flows)
+### Dependencies
+- Milestone 1
 
-### Milestone D: Production Rollout
-- Add Swarm stack overlays or configs for Loki/Tempo S3 mode:
-  - S3 endpoint + region + credentials as secrets
-  - bucket/prefix configuration
-- Provide a rollback plan:
-  - switch back to filesystem/local (requires persistent volumes for continuity)
+### Goal
+Make document workflows usable from the Control UI without leaking credentials.
 
-## Testing and Verification
-- Workspace:
-  - `cargo fmt --check`
-  - `cargo clippy --workspace --all-targets -- -D warnings`
-  - `cargo test --workspace`
-  - `cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build`
-- Local S3 validation (manual, documented):
-  - MinIO buckets created
-  - Loki bucket contains objects after ingest
-  - Tempo bucket contains objects after ingest
+### Tasks
+- [ ] Add Control API endpoints for signed URLs (recommended) or streamed proxy:
+  - [ ] create upload URL (PUT)
+  - [ ] create download URL (GET)
+- [ ] Implement Control UI flows for a first document type:
+  - [ ] upload
+  - [ ] list
+  - [ ] download
+- [ ] Ensure correlation/trace propagation on Control API operations.
 
-## Follow-On Opportunities (Phase 2)
-- Backup/restore to S3 for MDBX data directories (Aggregate/Projection/Runner/Gateway).
-- Artifact storage (projection programs, definitions, deployment bundles) via S3 with signed URLs.
-- Multi-tenant isolation at the bucket/prefix policy level.
+### Required Tests (Gate)
+- [ ] Workspace verification commands
+- [ ] Control UI unit tests for routing/component render stability
+- [ ] Gated end-to-end checklist (local):
+  - [ ] upload appears in list
+  - [ ] download returns expected bytes
+
+## Milestone 3: Production Rollout (Hetzner)
+### Dependencies
+- Milestone 2
+
+### Goal
+Deploy document storage on the Hetzner S3-compatible backend with production-grade secret handling.
+
+### Tasks
+- [ ] Provision buckets and lifecycle policies (docs bucket):
+  - [ ] retention rules appropriate to documents
+  - [ ] access policy scoped to required actions
+- [ ] Swarm deployment:
+  - [ ] add secrets for access keys
+  - [ ] configure Control API with endpoint/region/bucket/prefix
+- [ ] Rollback plan:
+  - [ ] switch to a fallback bucket or MinIO-on-prod if needed
+
+### Required Tests (Gate)
+- [ ] Workspace verification commands
+- [ ] Production smoke runbook:
+  - [ ] upload/list/download for a tenant
+  - [ ] verify objects exist under expected prefixes
+
+## Milestone 4 (Optional): Observability Storage on S3 (Loki + Tempo)
+### Dependencies
+- Milestone 3
+
+### Goal
+Store logs and traces in S3-compatible storage (MinIO locally; Hetzner in production).
+
+### Tasks
+- [ ] Loki:
+  - [ ] add S3 config variant and compose overlay
+  - [ ] validate log query and bucket objects
+- [ ] Tempo:
+  - [ ] add S3 config variant and compose overlay
+  - [ ] validate traces and bucket objects
+
+### Required Tests (Gate)
+- [ ] Workspace verification commands
+- [ ] Gated local validation:
+  - [ ] Loki writes objects to bucket/prefix after ingest
+  - [ ] Tempo writes objects to bucket/prefix after ingest
+
+## Workspace Verification Commands
+- `cargo fmt --check`
+- `cargo clippy --workspace --all-targets -- -D warnings`
+- `cargo test --workspace`
+- `cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build`
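The Configuration Contract in the plan above and the Milestone 1 task "parse config from env with strict validation" could look roughly like this. The sketch assumes the Control API lives in the Rust workspace that the verification commands build. `S3Config`, `from_lookup`, and the rule that an unset boolean means `false` are hypothetical choices, not part of the plan. Reading values through a lookup closure instead of calling `std::env` directly keeps unit tests hermetic. The hand-written `Debug` impl follows the "never logged" principle by redacting the secret key.

```rust
use std::fmt;

/// Validated document-storage settings (hypothetical type; sketch only).
pub struct S3Config {
    pub endpoint: String,
    pub region: String,
    pub access_key_id: String,
    secret_access_key: String, // private: reachable only via an explicit accessor
    pub force_path_style: bool,
    pub insecure: bool,
    pub bucket_docs: String,
    pub prefix_docs: String,
}

impl fmt::Debug for S3Config {
    // Hand-written so the secret key never appears in debug logs.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("S3Config")
            .field("endpoint", &self.endpoint)
            .field("region", &self.region)
            .field("access_key_id", &self.access_key_id)
            .field("secret_access_key", &"<redacted>")
            .field("force_path_style", &self.force_path_style)
            .field("insecure", &self.insecure)
            .field("bucket_docs", &self.bucket_docs)
            .field("prefix_docs", &self.prefix_docs)
            .finish()
    }
}

impl S3Config {
    pub fn secret_access_key(&self) -> &str {
        &self.secret_access_key
    }

    /// Reads settings through `get` and fails fast on missing or malformed values.
    pub fn from_lookup<F: Fn(&str) -> Option<String>>(get: F) -> Result<Self, String> {
        let required = |name: &str| -> Result<String, String> {
            match get(name) {
                Some(v) if !v.trim().is_empty() => Ok(v.trim().to_string()),
                _ => Err(format!("{name} is required")),
            }
        };
        // Booleans accept only "true"/"false"; unset counts as false (assumption).
        let flag = |name: &str| -> Result<bool, String> {
            match get(name).as_deref().map(str::trim) {
                None | Some("") | Some("false") => Ok(false),
                Some("true") => Ok(true),
                Some(other) => Err(format!("{name} must be true or false, got {other:?}")),
            }
        };

        let endpoint = required("S3_ENDPOINT")?;
        let insecure = flag("S3_INSECURE")?;
        // TLS by default; plain HTTP only when S3_INSECURE=true (local MinIO).
        let scheme_ok = endpoint.starts_with("https://")
            || (insecure && endpoint.starts_with("http://"));
        if !scheme_ok {
            return Err(format!(
                "S3_ENDPOINT must use https:// (http:// needs S3_INSECURE=true), got {endpoint:?}"
            ));
        }

        // Default "docs/"; normalized to no leading slash and exactly one trailing slash.
        let mut prefix_docs = get("S3_PREFIX_DOCS")
            .map(|p| p.trim().trim_start_matches('/').to_string())
            .filter(|p| !p.is_empty())
            .unwrap_or_else(|| "docs/".to_string());
        if !prefix_docs.ends_with('/') {
            prefix_docs.push('/');
        }

        Ok(S3Config {
            endpoint,
            region: required("S3_REGION")?,
            access_key_id: required("S3_ACCESS_KEY_ID")?,
            secret_access_key: required("S3_SECRET_ACCESS_KEY")?,
            force_path_style: flag("S3_FORCE_PATH_STYLE")?,
            insecure,
            bucket_docs: required("S3_BUCKET_DOCS")?,
            prefix_docs,
        })
    }
}
```

In the service, the call would be `S3Config::from_lookup(|k| std::env::var(k).ok())`. With the Milestone 0 MinIO endpoint (`http://minio:9000`), parsing succeeds only when `S3_INSECURE=true` is also set.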
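The plan leaves the key scheme undecided: Milestone 1 asks only for tenant-scoped prefixes and immutable content-hash keys, and the gate tests key stability and tenant isolation. One possible layout is `{S3_PREFIX_DOCS}{tenant}/{doc_type}/sha256/{digest}`. Everything below is an assumption for illustration: the layout, the segment rules (lowercase ASCII letters, digits, and `-`, at most 63 characters), and the function names. The SHA-256 digest is expected to come from a hashing crate and is only format-checked here. That keeps key generation deterministic: the same tenant, type, and content always produce the same key.

```rust
/// Errors from building document keys (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum KeyError {
    InvalidSegment(String),
    InvalidDigest,
}

/// Restricting segments to [a-z0-9-] means they can never contain '/', '.' or "..",
/// so a tenant or doc-type name cannot escape its prefix.
fn validate_segment(s: &str) -> Result<(), KeyError> {
    let ok = !s.is_empty()
        && s.len() <= 63
        && s.bytes().all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'-');
    if ok { Ok(()) } else { Err(KeyError::InvalidSegment(s.to_string())) }
}

/// Every object of one tenant lives under this prefix, e.g. "docs/acme/".
pub fn tenant_prefix(base: &str, tenant: &str) -> Result<String, KeyError> {
    validate_segment(tenant)?;
    Ok(format!("{base}{tenant}/"))
}

/// Prefix for "list by prefix (tenant + doc-type)", e.g. "docs/acme/bundle/".
pub fn list_prefix(base: &str, tenant: &str, doc_type: &str) -> Result<String, KeyError> {
    validate_segment(doc_type)?;
    Ok(format!("{}{doc_type}/", tenant_prefix(base, tenant)?))
}

/// Immutable, content-addressed key; `sha256_hex` must be 64 lowercase hex chars.
pub fn doc_key(base: &str, tenant: &str, doc_type: &str, sha256_hex: &str) -> Result<String, KeyError> {
    let is_hex = sha256_hex.len() == 64
        && sha256_hex.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
    if !is_hex {
        return Err(KeyError::InvalidDigest);
    }
    Ok(format!("{}sha256/{sha256_hex}", list_prefix(base, tenant, doc_type)?))
}

/// Deny-by-default check: a caller acting for `tenant` may only touch keys under
/// its own prefix, and keys with "." or ".." segments are always rejected.
pub fn key_belongs_to_tenant(base: &str, tenant: &str, key: &str) -> bool {
    match tenant_prefix(base, tenant) {
        Ok(prefix) => key.starts_with(&prefix) && !key.split('/').any(|seg| seg == ".." || seg == "."),
        Err(_) => false,
    }
}
```

The trailing `/` in the tenant prefix matters. Without it, a caller scoped to `acme` would also match keys belonging to a tenant named `acme-corp`.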
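The Milestone 1 document primitives (put, get, list by prefix, delete) could sit behind a small trait. Fast unit tests would then run against an in-memory fake, while the gated MinIO integration tests exercise the real S3 client. The trait, the error type, and the rule "identical re-upload is a no-op, different bytes conflict" are a hypothetical reading of "immutable keys preferred". The S3-backed implementation is not shown.

```rust
use std::collections::BTreeMap;

/// Errors surfaced by the document primitives (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum StoreError {
    NotFound,
    /// An immutable key already holds different bytes.
    Conflict,
}

/// Hypothetical interface mirroring the planned put/get/list/delete primitives.
pub trait DocumentStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>) -> Result<(), StoreError>;
    fn get(&self, key: &str) -> Result<Vec<u8>, StoreError>;
    fn list(&self, prefix: &str) -> Vec<String>;
    fn delete(&mut self, key: &str) -> Result<(), StoreError>;
}

/// In-memory fake for unit tests.
#[derive(Default)]
pub struct MemoryStore {
    objects: BTreeMap<String, Vec<u8>>, // sorted keys, so listings come back in key order
}

impl DocumentStore for MemoryStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>) -> Result<(), StoreError> {
        if let Some(existing) = self.objects.get(key) {
            // Keys are treated as immutable: identical re-uploads succeed,
            // different bytes under an existing key are rejected.
            return if existing == &bytes { Ok(()) } else { Err(StoreError::Conflict) };
        }
        self.objects.insert(key.to_string(), bytes);
        Ok(())
    }

    fn get(&self, key: &str) -> Result<Vec<u8>, StoreError> {
        self.objects.get(key).cloned().ok_or(StoreError::NotFound)
    }

    fn list(&self, prefix: &str) -> Vec<String> {
        self.objects.keys().filter(|k| k.starts_with(prefix)).cloned().collect()
    }

    fn delete(&mut self, key: &str) -> Result<(), StoreError> {
        self.objects.remove(key).map(|_| ()).ok_or(StoreError::NotFound)
    }
}
```

Because both implementations share one trait, the Control API handlers can be unit-tested without MinIO running. The gated integration tests then only need to confirm that the S3-backed version behaves the same way.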