docs: restructure S3 plan into dep-ordered milestones; make S3 mandatory for document storage
264
S3_PLAN.md
@@ -1,121 +1,187 @@
|
|||||||
# S3-Compatible Object Storage Plan (Hetzner in Prod, MinIO Locally)
|
# S3-Compatible Object Storage Plan (Hetzner in Prod, MinIO Locally)
|
||||||
|
|
||||||
|
## Principles
|
||||||
|
- S3-compatible object storage is mandatory for platform document storage in every environment:
|
||||||
|
- Local development uses MinIO.
|
||||||
|
- Production uses Hetzner Object Storage (S3 API compatible).
|
||||||
|
- Each milestone is stop-the-line gated:
|
||||||
|
- All tasks completed
|
||||||
|
- All milestone tests pass
|
||||||
|
- Workspace verification commands pass
|
||||||
|
- Secrets are never committed and never logged:
|
||||||
|
- Access keys via Swarm secrets in production
|
||||||
|
- `.env` or compose env in local dev
|
||||||
|
|
||||||
## Goals
|
## Goals
|
||||||
- Add S3-compatible object storage as an optional infrastructure dependency.
|
- Introduce a single, shared S3-compatible configuration surface for the platform.
|
||||||
- Use Hetzner Object Storage in production (S3 API compatible).
|
- Make document storage always backed by S3 (no filesystem fallback for documents).
|
||||||
- Use MinIO for local development to mirror production behavior.
|
- Keep the implementation incremental and test-gated per milestone.
|
||||||
- Start by moving observability storage (Loki + Tempo) to object storage, keeping local filesystem as the default fallback.
|
- Optionally expand to observability object storage after document storage is stable.
|
||||||
|
|
||||||
## Scope (Phase 1)
|
## Definitions
|
||||||
### Observability (Primary)
|
### Document Storage
|
||||||
- Loki: store chunks/index in S3-compatible object storage.
|
“Documents” are versioned blobs the platform needs to store and retrieve reliably:
|
||||||
- Tempo: store traces in S3-compatible object storage.
|
- Deployment bundles and artifacts
|
||||||
|
- Definitions/manifests (projection programs, saga/effects definitions, schema bundles)
|
||||||
|
- Exported audit/log bundles, diagnostics, or snapshots that are not part of the primary KV/MDBX state
|
||||||
|
|
||||||
### Local Dev Parity
|
Document storage must support:
|
||||||
- Add MinIO to local compose and provide a documented way to provision required buckets.
|
- Tenant-scoped namespaces (prefixes)
|
||||||
|
- Content-addressed or versioned keys (immutability preferred)
|
||||||
|
- Listing by prefix for admin workflows
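
The three requirements above can be illustrated with a minimal sketch of a key scheme. It is written in Python for brevity and is not the project's implementation: the `tenants/<tenant>/<doc-type>/sha256/<digest>` layout and the names `document_key` and `listing_prefix` are assumptions made for illustration. The plan itself leaves the key scheme to be decided in Milestone 1.

```python
import hashlib


def _check_segment(segment: str) -> str:
    """Reject empty segments and path tricks so a key cannot escape its tenant prefix."""
    if not segment or "/" in segment or segment in (".", ".."):
        raise ValueError(f"invalid key segment: {segment!r}")
    return segment


def listing_prefix(tenant_id: str, doc_type: str, prefix: str = "docs/") -> str:
    """Prefix used to list every document of one type for one tenant."""
    return f"{prefix}tenants/{_check_segment(tenant_id)}/{_check_segment(doc_type)}/"


def document_key(tenant_id: str, doc_type: str, content: bytes, prefix: str = "docs/") -> str:
    """Content-addressed key: identical bytes always map to the same immutable key."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{listing_prefix(tenant_id, doc_type, prefix)}sha256/{digest}"
```

Because the digest is derived from the content, re-uploading the same bytes is idempotent, and a changed document always gets a new key rather than overwriting an old one.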
|
||||||
|
|
||||||
## Non-Goals (Phase 1)
|
## Configuration Contract (Platform-Wide)
|
||||||
- Replacing MDBX/KV primary service storage with S3.
|
### Common Settings
|
||||||
- Implementing multi-region replication, object-lock governance, or WORM retention.
|
- `S3_ENDPOINT` (Hetzner: HTTPS endpoint; MinIO: `http://minio:9000`)
|
||||||
- Centralized artifact storage for deployments (can be a follow-on).
|
- `S3_REGION` (some S3-compatible providers also require a region value)
|
||||||
|
- `S3_ACCESS_KEY_ID` (secret)
|
||||||
|
- `S3_SECRET_ACCESS_KEY` (secret)
|
||||||
|
- `S3_FORCE_PATH_STYLE` (`true/false`)
|
||||||
|
- `S3_INSECURE` (`true/false`, only allowed for local MinIO)
|
||||||
|
|
||||||
|
### Buckets and Prefixes
|
||||||
|
- `S3_BUCKET_DOCS` (required everywhere)
|
||||||
|
- `S3_PREFIX_DOCS` (default `docs/`)
|
||||||
|
|
||||||
|
Optional (later milestones):
|
||||||
|
- `S3_BUCKET_LOKI`, `S3_PREFIX_LOKI`
|
||||||
|
- `S3_BUCKET_TEMPO`, `S3_PREFIX_TEMPO`
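
As a rough illustration of how this contract could be enforced at startup, the sketch below loads the settings from the environment and fails fast on anything missing or inconsistent. It is Python pseudocode for the idea, not the Control API's actual module. The names `S3Config` and `load_s3_config` are hypothetical, and treating `S3_REGION` as always required and `S3_FORCE_PATH_STYLE` as defaulting to `false` are assumptions. The sketch cannot tell MinIO from any other endpoint, so it only insists that plain HTTP is an explicit opt-in.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

REQUIRED = ("S3_ENDPOINT", "S3_REGION", "S3_ACCESS_KEY_ID",
            "S3_SECRET_ACCESS_KEY", "S3_BUCKET_DOCS")


@dataclass(frozen=True)
class S3Config:
    endpoint: str
    region: str
    bucket_docs: str
    prefix_docs: str
    force_path_style: bool
    insecure: bool
    # repr=False keeps credentials out of logs and tracebacks.
    access_key_id: str = field(repr=False)
    secret_access_key: str = field(repr=False)


def _parse_bool(name: str, raw: str) -> bool:
    value = raw.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"{name} must be 'true' or 'false'")
    return value == "true"


def load_s3_config(env: Mapping[str, str] = os.environ) -> S3Config:
    # Report missing names only, never values, so secrets cannot leak via errors.
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))

    endpoint = env["S3_ENDPOINT"].strip()
    insecure = _parse_bool("S3_INSECURE", env.get("S3_INSECURE", "false"))
    if not endpoint.startswith(("http://", "https://")):
        raise ValueError("S3_ENDPOINT must start with http:// or https://")
    if endpoint.startswith("http://") and not insecure:
        raise ValueError("plain HTTP endpoint requires S3_INSECURE=true")

    prefix = env.get("S3_PREFIX_DOCS", "docs/").strip() or "docs/"
    if not prefix.endswith("/"):
        prefix += "/"

    return S3Config(
        endpoint=endpoint,
        region=env["S3_REGION"].strip(),
        bucket_docs=env["S3_BUCKET_DOCS"].strip(),
        prefix_docs=prefix,
        force_path_style=_parse_bool(
            "S3_FORCE_PATH_STYLE", env.get("S3_FORCE_PATH_STYLE", "false")),
        insecure=insecure,
        access_key_id=env["S3_ACCESS_KEY_ID"],
        secret_access_key=env["S3_SECRET_ACCESS_KEY"],
    )
```

For local MinIO this would be called with `S3_ENDPOINT=http://minio:9000` and `S3_INSECURE=true`. The same endpoint without the opt-in is rejected, which keeps an accidental plain-HTTP configuration from reaching production unnoticed.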
|
||||||
|
|
||||||
## Target Architecture
|
## Target Architecture
|
||||||
- Local:
|
### Local Development
|
||||||
- `docker compose up` uses filesystem/local volumes by default.
|
- MinIO is part of the local stack for parity.
|
||||||
- `docker compose -f docker-compose.yml -f docker-compose.s3.yml -f observability/docker-compose.yml -f observability/docker-compose.s3.yml up` enables MinIO-backed Loki/Tempo.
|
- Control API is the document gateway:
|
||||||
- Production:
|
- Upload/download via signed URLs or streamed proxy endpoints
|
||||||
- Loki + Tempo configured for S3 with Hetzner endpoint.
|
- Metadata is stored in the existing storage/KV (document index) or derived from the key scheme
|
||||||
- Credentials injected via Swarm secrets or environment injection (never committed).
|
|
||||||
|
|
||||||
## Configuration Model
|
### Production
|
||||||
Define a single configuration surface for “S3-compatible storage” and reuse it across Loki/Tempo and future features.
|
- Hetzner Object Storage provides S3-compatible bucket(s).
|
||||||
|
- Credentials and bucket details are injected via Swarm secrets and stack env.
|
||||||
|
|
||||||
### Common Settings
|
## Development Plan (Milestones by Dependency)
|
||||||
- Endpoint: `S3_ENDPOINT` (e.g., `https://<region>.your-objectstorage.com`)
|
|
||||||
- Region: `S3_REGION` (string; Hetzner typically requires a region value)
|
|
||||||
- Access key: `S3_ACCESS_KEY_ID` (secret)
|
|
||||||
- Secret key: `S3_SECRET_ACCESS_KEY` (secret)
|
|
||||||
- Force path-style: `S3_FORCE_PATH_STYLE` (`true/false`, depends on provider)
|
|
||||||
- TLS: enabled by default; allow `S3_INSECURE=true` only for local MinIO if needed
|
|
||||||
- Prefixes:
|
|
||||||
- `S3_PREFIX_LOKI` (e.g., `loki/`)
|
|
||||||
- `S3_PREFIX_TEMPO` (e.g., `tempo/`)
|
|
||||||
|
|
||||||
### Buckets
|
## Milestone 0: S3 Contract + Local MinIO Baseline
|
||||||
- `S3_BUCKET_LOKI`
|
### Dependencies
|
||||||
- `S3_BUCKET_TEMPO`
|
- None
|
||||||
|
|
||||||
## Local Dev: MinIO
|
### Goal
|
||||||
### Compose Additions
|
Provide a consistent local S3-compatible endpoint and stable bucket naming to unblock later milestones.
|
||||||
- Add a `minio` service (console + API ports).
|
|
||||||
- Add a `minio-init` one-shot job (or `mc` container) to create buckets:
|
|
||||||
- `cloudlysis-loki`
|
|
||||||
- `cloudlysis-tempo`
|
|
||||||
|
|
||||||
### Developer Workflow
|
### Tasks
|
||||||
- Default (no S3):
|
- [ ] Add MinIO to local development stack:
|
||||||
- `docker compose -f docker-compose.yml -f observability/docker-compose.yml up -d --build`
|
- [ ] Add `minio` service to compose (API + console)
|
||||||
- S3-enabled (MinIO):
|
- [ ] Add `minio-init` job to create required buckets
|
||||||
- bring up MinIO + observability S3 overrides
|
- [ ] Define standard bucket/prefix defaults for local dev:
|
||||||
- verify Loki/Tempo can write objects (logs/traces show up and buckets have objects)
|
- [ ] `S3_BUCKET_DOCS=cloudlysis-docs`
|
||||||
|
- [ ] `S3_PREFIX_DOCS=docs/`
|
||||||
|
- [ ] Document local workflow to enable MinIO-backed document storage.
|
||||||
|
|
||||||
## Production: Hetzner Object Storage
|
### Required Tests (Gate)
|
||||||
### Provisioning
|
- [ ] Workspace verification commands
|
||||||
- Create buckets for Loki and Tempo (or a shared bucket with distinct prefixes).
|
- [ ] Local manual verification checklist:
|
||||||
- Enable bucket-level lifecycle policies:
|
- [ ] `cloudlysis-docs` bucket exists
|
||||||
- Loki: retention aligned with schema/index period and desired log retention.
|
- [ ] credentials work from a container in the compose network
|
||||||
- Tempo: retention aligned with `compactor.block_retention` and operational needs.
|
|
||||||
|
|
||||||
### Secrets
|
## Milestone 1: Document Storage API (Control API)
|
||||||
- Store `S3_ACCESS_KEY_ID` / `S3_SECRET_ACCESS_KEY` as Swarm secrets.
|
### Dependencies
|
||||||
- Inject into Loki/Tempo containers as environment variables at runtime.
|
- Milestone 0
|
||||||
|
|
||||||
### Operational Considerations
|
### Goal
|
||||||
- Timeouts and retries: rely on Loki/Tempo defaults; tune only after measuring.
|
Make document storage a first-class platform API and require it in all environments.
|
||||||
- Cost controls: lifecycle rules and retention budgets.
|
|
||||||
- Failure mode: if S3 is unavailable, Loki/Tempo ingest may degrade; decide whether to fail-closed (strict) or allow temporary local buffering.
|
|
||||||
|
|
||||||
## Implementation Plan (Milestones)
|
### Tasks
|
||||||
### Milestone A: Local MinIO Baseline
|
- [ ] Add an S3 client module to Control API:
|
||||||
- Add `docker-compose.s3.yml`:
|
- [ ] parse config from env with strict validation (endpoint, bucket, keys)
|
||||||
- MinIO
|
- [ ] support path-style and TLS/insecure options
|
||||||
- minio-init bucket provisioning
|
- [ ] Implement document primitives:
|
||||||
- Add `docs/` (or wiki) instructions for enabling S3 mode locally.
|
- [ ] Put (upload) and Get (download)
|
||||||
- Add a gated smoke test script (manual) verifying buckets exist and can be listed.
|
- [ ] List by prefix (tenant + doc-type)
|
||||||
|
- [ ] Delete (admin-only) if needed
|
||||||
|
- [ ] Decide and document a key scheme:
|
||||||
|
- [ ] tenant-scoped prefix
|
||||||
|
- [ ] immutable keys preferred (content hash + metadata)
|
||||||
|
- [ ] Add authz rules for document operations (deny-by-default, tenant-scoped).
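
A deny-by-default, tenant-scoped rule can be quite small. The following Python sketch shows one possible shape. `Principal`, `authorize`, the operation names, and the `docs/tenants/<tenant>/` key layout are all hypothetical, and the real Control API may model identities and permissions differently.

```python
from dataclasses import dataclass

TENANT_OPERATIONS = frozenset({"put", "get", "list"})
ADMIN_OPERATIONS = frozenset({"delete"})  # the task list marks Delete as admin-only


@dataclass(frozen=True)
class Principal:
    tenant_id: str
    is_admin: bool = False


def authorize(principal: Principal, operation: str, key: str, prefix: str = "docs/") -> bool:
    """Return True only when a rule explicitly allows the operation."""
    if not principal.tenant_id or "/" in principal.tenant_id:
        return False
    # The trailing slash matters: tenant "acme" must not match "acme2/...".
    tenant_root = f"{prefix}tenants/{principal.tenant_id}/"
    if not key.startswith(tenant_root):
        return False
    # Refuse path segments that could climb out of the tenant prefix.
    if any(segment in (".", "..") for segment in key.split("/")):
        return False
    if operation in TENANT_OPERATIONS:
        return True
    if operation in ADMIN_OPERATIONS:
        return principal.is_admin
    # Unknown operations fall through to the default: deny.
    return False
```

The function never raises for an unrecognized operation or a foreign key; it simply returns `False`, so adding a new operation later requires an explicit allow rule.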
|
||||||
|
|
||||||
### Milestone B: Loki S3 Backend
|
### Required Tests (Gate)
|
||||||
- Add `observability/docker-compose.s3.yml` enabling Loki S3 config.
|
- [ ] Workspace verification commands
|
||||||
- Add `observability/loki/config.s3.yml` (separate from default filesystem config).
|
- [ ] Unit tests:
|
||||||
- Validate:
|
- [ ] config parsing/validation
|
||||||
- logs are queryable in Grafana
|
- [ ] key generation stability
|
||||||
- Loki writes objects to the bucket/prefix
|
- [ ] Gated integration tests (MinIO):
|
||||||
|
- [ ] put/get roundtrip
|
||||||
|
- [ ] list by prefix
|
||||||
|
- [ ] tenant isolation (cannot read other tenant prefix)
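
One way to keep these tests "gated" is to skip them unless an explicit opt-in is set, so the default workspace test run does not need a running MinIO. The Python sketch below shows the pattern only. The `S3_INTEGRATION_TESTS` variable is a hypothetical name, and `InMemoryStore` is a stand-in so the example runs without a server; a real suite would point the same assertions at the MinIO-backed client.

```python
import os
import unittest


class InMemoryStore:
    """Stand-in with the put/get/list shape the document primitives call for."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = bytes(body)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str) -> list:
        return sorted(k for k in self._objects if k.startswith(prefix))


# Hypothetical opt-in switch; the suite is skipped unless it is set to "1".
RUN_S3_TESTS = os.environ.get("S3_INTEGRATION_TESTS") == "1"


@unittest.skipUnless(RUN_S3_TESTS, "set S3_INTEGRATION_TESTS=1 to run")
class DocumentStoreGate(unittest.TestCase):
    def setUp(self):
        self.store = InMemoryStore()
        self.store.put("docs/tenants/a/bundles/1", b"alpha")
        self.store.put("docs/tenants/b/bundles/1", b"beta")

    def test_put_get_roundtrip(self):
        self.assertEqual(self.store.get("docs/tenants/a/bundles/1"), b"alpha")

    def test_list_by_prefix_stays_within_tenant(self):
        keys = self.store.list("docs/tenants/a/")
        self.assertEqual(keys, ["docs/tenants/a/bundles/1"])
```

Saved as a test module, this could be run with `S3_INTEGRATION_TESTS=1 python -m unittest`; without the variable the tests are reported as skipped rather than failed.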
|
||||||
|
|
||||||
### Milestone C: Tempo S3 Backend
|
## Milestone 2: Control UI Integration (Upload/Download Flows)
|
||||||
- Add `observability/docker-compose.s3.yml` enabling Tempo S3 config.
|
### Dependencies
|
||||||
- Add `observability/tempo/config.s3.yml` (separate from default local config).
|
- Milestone 1
|
||||||
- Validate:
|
|
||||||
- traces appear in Tempo
|
|
||||||
- objects are written to the bucket/prefix
|
|
||||||
|
|
||||||
### Milestone D: Production Rollout
|
### Goal
|
||||||
- Add Swarm stack overlays or configs for Loki/Tempo S3 mode:
|
Make document workflows usable from the Control UI without leaking credentials.
|
||||||
- S3 endpoint + region + credentials as secrets
|
|
||||||
- bucket/prefix configuration
|
|
||||||
- Provide a rollback plan:
|
|
||||||
- switch back to filesystem/local (requires persistent volumes for continuity)
|
|
||||||
|
|
||||||
## Testing and Verification
|
### Tasks
|
||||||
- Workspace:
|
- [ ] Add Control API endpoints for signed URLs (recommended) or streamed proxy:
|
||||||
- `cargo fmt --check`
|
- [ ] create upload URL (PUT)
|
||||||
- `cargo clippy --workspace --all-targets -- -D warnings`
|
- [ ] create download URL (GET)
|
||||||
- `cargo test --workspace`
|
- [ ] Implement Control UI flows for a first document type:
|
||||||
- `cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build`
|
- [ ] upload
|
||||||
- Local S3 validation (manual, documented):
|
- [ ] list
|
||||||
- MinIO buckets created
|
- [ ] download
|
||||||
- Loki bucket contains objects after ingest
|
- [ ] Ensure correlation/trace propagation on Control API operations.
|
||||||
- Tempo bucket contains objects after ingest
|
|
||||||
|
|
||||||
## Follow-On Opportunities (Phase 2)
|
### Required Tests (Gate)
|
||||||
- Backup/restore to S3 for MDBX data directories (Aggregate/Projection/Runner/Gateway).
|
- [ ] Workspace verification commands
|
||||||
- Artifact storage (projection programs, definitions, deployment bundles) via S3 with signed URLs.
|
- [ ] Control UI unit tests for routing/component render stability
|
||||||
- Multi-tenant isolation at the bucket/prefix policy level.
|
- [ ] Gated end-to-end checklist (local):
|
||||||
|
- [ ] upload appears in list
|
||||||
|
- [ ] download returns expected bytes
|
||||||
|
|
||||||
|
## Milestone 3: Production Rollout (Hetzner)
|
||||||
|
### Dependencies
|
||||||
|
- Milestone 2
|
||||||
|
|
||||||
|
### Goal
|
||||||
|
Deploy document storage on the Hetzner S3-compatible backend with production-grade secret handling.
|
||||||
|
|
||||||
|
### Tasks
|
||||||
|
- [ ] Provision buckets and lifecycle policies (docs bucket):
|
||||||
|
- [ ] retention rules appropriate to documents
|
||||||
|
- [ ] access policy scoped to required actions
|
||||||
|
- [ ] Swarm deployment:
|
||||||
|
- [ ] add secrets for access keys
|
||||||
|
- [ ] configure Control API with endpoint/region/bucket/prefix
|
||||||
|
- [ ] Rollback plan:
|
||||||
|
- [ ] switch to a fallback bucket or a MinIO instance in production if needed
|
||||||
|
|
||||||
|
### Required Tests (Gate)
|
||||||
|
- [ ] Workspace verification commands
|
||||||
|
- [ ] Production smoke runbook:
|
||||||
|
- [ ] upload/list/download for a tenant
|
||||||
|
- [ ] verify objects exist under expected prefixes
|
||||||
|
|
||||||
|
## Milestone 4 (Optional): Observability Storage on S3 (Loki + Tempo)
|
||||||
|
### Dependencies
|
||||||
|
- Milestone 3
|
||||||
|
|
||||||
|
### Goal
|
||||||
|
Store logs and traces in S3-compatible storage (MinIO locally; Hetzner in production).
|
||||||
|
|
||||||
|
### Tasks
|
||||||
|
- [ ] Loki:
|
||||||
|
- [ ] add S3 config variant and compose overlay
|
||||||
|
- [ ] validate log query and bucket objects
|
||||||
|
- [ ] Tempo:
|
||||||
|
- [ ] add S3 config variant and compose overlay
|
||||||
|
- [ ] validate traces and bucket objects
|
||||||
|
|
||||||
|
### Required Tests (Gate)
|
||||||
|
- [ ] Workspace verification commands
|
||||||
|
- [ ] Gated local validation:
|
||||||
|
- [ ] Loki writes objects to bucket/prefix after ingest
|
||||||
|
- [ ] Tempo writes objects to bucket/prefix after ingest
|
||||||
|
|
||||||
|
## Workspace Verification Commands
|
||||||
|
- `cargo fmt --check`
|
||||||
|
- `cargo clippy --workspace --all-targets -- -D warnings`
|
||||||
|
- `cargo test --workspace`
|
||||||
|
- `cd control/ui && npm ci && npm run lint && npm run typecheck && npm run test && npm run build`
|
||||||
|
|||||||