cloudlysis/aggregate/DEVELOPMENT_PLAN.md
Vlad Durnea 1298d9a3df
Monorepo consolidation: workspace, shared types, transport plans, docker/swam assets
2026-03-30 11:40:42 +03:00


# Development Plan: Aggregate Container
## Overview
This plan breaks down the Aggregate container implementation into milestones ordered by dependency. Each milestone includes:
- **Tasks** with clear deliverables
- **Test Requirements** (unit tests + tautological tests)
- **Dependencies** on previous milestones
**Development Approach:**
1. Complete one milestone at a time
2. Write tests before implementation (TDD where applicable)
3. All tests must pass before moving to next milestone
4. Mark tasks complete with `[x]` as you progress
---
## Milestone 1: Project Foundation
**Goal:** Set up the Rust project with proper structure, dependencies, and basic tooling.
### Tasks
- [x] **1.1** Initialize Cargo project with workspace structure
```
cargo init --name aggregate
```
- Create `src/lib.rs` and `src/main.rs`
- Configure the madapes registry in `.cargo/config.toml` (Cargo reads registry definitions from its config file, not from `Cargo.toml`)
- [x] **1.2** Configure Cargo.toml with all dependencies
```toml
# .cargo/config.toml (registry definitions live in Cargo's config file)
[registries.madapes]
index = "sparse+https://git.madapes.com/api/packages/madapes/cargo/"

# Cargo.toml
[dependencies]
edge-storage = { version = "0.1", registry = "madapes" }
runtime-function = { version = "0.2", registry = "madapes" }
edge-logger = { version = "0.1", registry = "madapes" }
query-engine = { version = "0.1", registry = "madapes" }
async-nats = "0.39"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "2"
anyhow = "1"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["json", "env-filter"] }
uuid = { version = "1", features = ["v7", "serde"] }
chrono = { version = "0.4", features = ["serde"] }
```
- [x] **1.3** Set up project structure
```
src/
├── lib.rs
├── main.rs
├── types/
│ ├── mod.rs
│ ├── id.rs
│ ├── command.rs
│ ├── event.rs
│ ├── snapshot.rs
│ └── error.rs
├── config/
│ ├── mod.rs
│ └── settings.rs
├── aggregate/
│ ├── mod.rs
│ ├── state.rs
│ └── handler.rs
├── storage/
│ └── mod.rs
├── stream/
│ └── mod.rs
└── observability/
└── mod.rs
```
- [x] **1.4** Configure clippy and rustfmt
- Create `.clippy.toml` and `rustfmt.toml`
- Add CI-friendly lint rules
### Tests
- [x] **T1.1** Project compiles successfully
```rust
#[test]
fn project_compiles() {
assert!(true);
}
```
- [x] **T1.2** All dependencies resolve from madapes registry
```rust
#[test]
fn dependencies_resolve() {
assert!(true);
}
```
- [x] **T1.3** Clippy passes with no warnings
```rust
#[test]
fn clippy_clean() {
assert!(true);
}
```
---
## Milestone 2: Core Types
**Goal:** Define all core domain types with full serialization support.
### Dependencies
- Milestone 1 (project structure)
### Tasks
- [x] **2.1** Implement `TenantId` type
- String-based (e.g., "acme-corp", "tenant-123")
- Optional with default empty string for non-multi-tenant setups
- Display, FromStr, Serialize, Deserialize
- Type-safe wrapper
- [x] **2.2** Implement `AggregateId` type
- UUID v7 based
- Display, FromStr, Serialize, Deserialize
- Type-safe wrapper
- [x] **2.3** Implement `AggregateType` enum/string
- Represents business entity (Account, Order, etc.)
- Serialize as string
- [x] **2.4** Implement `Version` type
- Monotonically increasing u64
- Initial version is 0 (as asserted in T2.4)
- Increment operation
- [x] **2.5** Implement `Command` envelope
- `tenant_id`: TenantId (extracted from `x-tenant-id` header)
- `command_id`: UUID v7 (idempotency)
- `aggregate_id`: AggregateId
- `aggregate_type`: AggregateType
- `payload`: serde_json::Value
- `metadata`: HashMap<String, Value>
- [x] **2.6** Implement `Event` envelope
- `tenant_id`: TenantId
- `event_id`: UUID v7
- `aggregate_id`: AggregateId
- `aggregate_type`: AggregateType
- `version`: Version (after this event)
- `event_type`: String
- `payload`: serde_json::Value
- `command_id`: UUID (causation)
- `timestamp`: chrono::DateTime<Utc>
- [x] **2.7** Implement `Snapshot` envelope
- `tenant_id`: TenantId
- `aggregate_id`: AggregateId
- `aggregate_type`: AggregateType
- `version`: Version
- `state`: serde_json::Value
- `created_at`: chrono::DateTime<Utc>
- [x] **2.8** Implement `AggregateState` wrapper
- Holds current state + metadata
- Version tracking
- Tenant association
- [x] **2.9** Implement comprehensive `AggregateError` enum
- `TenantAccessDenied { tenant_id: TenantId }`
- `ValidationError(String)`
- `VersionConflict { expected: Version, actual: Version }`
- `StorageError(String)`
- `StreamError(String)`
- `RehydrationError(String)`
- `DecideError(String)`
- `ApplyError(String)`
- `NotFound(AggregateId)`
- [x] **2.10** Implement `AggregateManifest` type
- Aggregate type definitions with decide/apply program references
- Load from YAML/JSON config file
- Validate program references exist
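The newtype tasks above (2.1 and 2.4) could look roughly like the following. This is a minimal sketch using only the standard library; the real types would also derive `Serialize`/`Deserialize`, and the exact method names beyond those used in the tests below (`new`, `is_empty`, `initial`, `increment`, `as_u64`) are assumptions.

```rust
use std::fmt;
use std::str::FromStr;

/// Monotonically increasing aggregate version; `initial()` is 0 in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
pub struct Version(u64);

impl Version {
    pub fn initial() -> Self {
        Version(0)
    }

    pub fn as_u64(self) -> u64 {
        self.0
    }

    /// Returns the next version; the receiver is unchanged because `Version` is `Copy`.
    pub fn increment(self) -> Self {
        Version(self.0 + 1)
    }
}

impl From<u64> for Version {
    fn from(v: u64) -> Self {
        Version(v)
    }
}

/// String-based tenant identifier; the default (empty) value means "no tenant".
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
pub struct TenantId(String);

impl TenantId {
    pub fn new(s: impl Into<String>) -> Self {
        TenantId(s.into())
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for TenantId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl FromStr for TenantId {
    // Parsing cannot fail here; format validation is assumed to happen at the Gateway boundary.
    type Err = std::convert::Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(TenantId::new(s))
    }
}
```

Keeping `Version` as a `Copy` type is what lets T2.4 call `v.increment()` and still read `v` afterwards.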
### Tests
- [x] **T2.1** `TenantId` round-trips through serialization
```rust
#[test]
fn tenant_id_serialization_roundtrip() {
let id = TenantId::new("acme-corp");
let json = serde_json::to_string(&id).unwrap();
let decoded: TenantId = serde_json::from_str(&json).unwrap();
assert_eq!(id, decoded);
}
```
- [x] **T2.2** `TenantId` defaults to empty string
```rust
#[test]
fn tenant_id_default() {
let id = TenantId::default();
assert!(id.is_empty());
}
```
- [x] **T2.3** `AggregateId` round-trips through serialization
```rust
#[test]
fn aggregate_id_serialization_roundtrip() {
let id = AggregateId::new_v7();
let json = serde_json::to_string(&id).unwrap();
let decoded: AggregateId = serde_json::from_str(&json).unwrap();
assert_eq!(id, decoded);
}
```
- [x] **T2.4** `Version` increments correctly
```rust
#[test]
fn version_increment() {
let v = Version::initial();
assert_eq!(v.as_u64(), 0);
let v2 = v.increment();
assert_eq!(v2.as_u64(), 1);
assert_eq!(v.as_u64(), 0);
}
```
- [x] **T2.5** `Command` serializes/deserializes with all fields including tenant_id
```rust
#[test]
fn command_serialization() {
let cmd = Command::new_test();
let json = serde_json::to_string(&cmd).unwrap();
let decoded: Command = serde_json::from_str(&json).unwrap();
assert_eq!(cmd.command_id, decoded.command_id);
assert_eq!(cmd.aggregate_id, decoded.aggregate_id);
assert_eq!(cmd.tenant_id, decoded.tenant_id);
}
```
- [x] **T2.6** `Event` serializes/deserializes with all fields including tenant_id
```rust
#[test]
fn event_serialization() {
let event = Event::new_test();
let json = serde_json::to_string(&event).unwrap();
let decoded: Event = serde_json::from_str(&json).unwrap();
assert_eq!(event.event_id, decoded.event_id);
assert_eq!(event.version, decoded.version);
assert_eq!(event.tenant_id, decoded.tenant_id);
}
```
- [x] **T2.7** `Snapshot` serializes/deserializes with all fields including tenant_id
```rust
#[test]
fn snapshot_serialization() {
let snap = Snapshot::new_test();
let json = serde_json::to_string(&snap).unwrap();
let decoded: Snapshot = serde_json::from_str(&json).unwrap();
assert_eq!(snap.aggregate_id, decoded.aggregate_id);
assert_eq!(snap.version, decoded.version);
assert_eq!(snap.tenant_id, decoded.tenant_id);
}
```
- [x] **T2.8** `AggregateError` variants implement Display and std::error::Error
```rust
#[test]
fn error_implements_traits() {
let err = AggregateError::TenantAccessDenied { tenant_id: TenantId::new("other") };
let _ = format!("{}", err);
let _: &dyn std::error::Error = &err;
assert!(true);
}
```
- [x] **T2.9** Tautological test: types exist and are Send + Sync
```rust
#[test]
fn types_are_send_sync() {
fn assert_send_sync<T: Send + Sync>() {}
assert_send_sync::<TenantId>();
assert_send_sync::<AggregateId>();
assert_send_sync::<Command>();
assert_send_sync::<Event>();
assert_send_sync::<Snapshot>();
assert_send_sync::<AggregateError>();
}
```
---
## Milestone 3: Configuration
**Goal:** Implement configuration loading and validation.
### Dependencies
- Milestone 2 (core types)
### Tasks
- [x] **3.1** Define `Settings` struct
- NATS URL
- Storage path
- Logger socket path
- Snapshot threshold
- Retry limits
- Aggregate definitions (decide/apply program refs)
- Multi-tenancy enabled flag
- Default tenant_id (for non-multi-tenant mode)
- [x] **3.2** Implement config loading from environment
- `AGGREGATE_NATS_URL`
- `AGGREGATE_STORAGE_PATH`
- `AGGREGATE_LOGGER_SOCKET`
- `AGGREGATE_SNAPSHOT_THRESHOLD`
- `AGGREGATE_MAX_RETRIES`
- [x] **3.3** Implement config loading from a YAML or TOML file
- Support `aggregate.yaml` or `aggregate.toml`
- Environment variables override file
- [x] **3.4** Implement config validation
- Required fields present
- Paths are valid
- NATS URL is parseable
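The precedence rule in 3.3 (environment overrides file) and the checks in 3.4 can be sketched as a pure function over two key/value maps. The reduced `Settings` struct, the file key names, the default values, and the crude scheme check are all assumptions made for illustration; the real loader would parse the YAML/TOML file and read the process environment.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced view of the plan's `Settings`.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub nats_url: String,
    pub snapshot_threshold: u64,
    pub max_retries: u32,
}

/// Returns the environment value if present, otherwise the file value.
fn pick<'a>(
    env: &'a HashMap<String, String>,
    file: &'a HashMap<String, String>,
    env_key: &str,
    file_key: &str,
) -> Option<&'a str> {
    env.get(env_key)
        .or_else(|| file.get(file_key))
        .map(String::as_str)
}

/// Merges both sources and validates the result.
pub fn resolve(
    env: &HashMap<String, String>,
    file: &HashMap<String, String>,
) -> Result<Settings, String> {
    let nats_url = pick(env, file, "AGGREGATE_NATS_URL", "nats_url")
        .ok_or("nats_url is required")?
        .to_string();
    // Crude stand-in for "NATS URL is parseable": only the scheme is checked.
    if !(nats_url.starts_with("nats://") || nats_url.starts_with("tls://")) {
        return Err(format!("unsupported NATS URL scheme: {nats_url}"));
    }

    let snapshot_threshold =
        match pick(env, file, "AGGREGATE_SNAPSHOT_THRESHOLD", "snapshot_threshold") {
            Some(raw) => raw
                .parse::<u64>()
                .map_err(|e| format!("snapshot_threshold: {e}"))?,
            None => 100, // assumed default
        };

    let max_retries = match pick(env, file, "AGGREGATE_MAX_RETRIES", "max_retries") {
        Some(raw) => raw
            .parse::<u32>()
            .map_err(|e| format!("max_retries: {e}"))?,
        None => 3, // assumed default
    };

    Ok(Settings { nats_url, snapshot_threshold, max_retries })
}
```

Passing the environment in as a map, rather than reading `std::env` directly, keeps tests such as T3.1 from mutating process-wide state.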
### Tests
- [x] **T3.1** Settings loads from environment variables
```rust
#[test]
fn settings_from_env() {
std::env::set_var("AGGREGATE_NATS_URL", "nats://localhost:4222");
let settings = Settings::from_env().unwrap();
assert_eq!(settings.nats_url, "nats://localhost:4222");
}
```
- [x] **T3.2** Settings validates required fields
```rust
#[test]
fn settings_validation() {
let settings = Settings::default();
assert!(settings.validate().is_err());
}
```
- [x] **T3.3** Tautological test: Settings is Clone
```rust
#[test]
fn settings_is_clone() {
let s = Settings::default();
let _s2 = s.clone();
assert!(true);
}
```
---
## Milestone 4: Storage Layer
**Goal:** Integrate `edge-storage` for snapshot persistence.
### Dependencies
- Milestone 2 (core types)
- Milestone 3 (configuration)
### Tasks
- [x] **4.1** Create `StorageClient` wrapper
- Wraps `edge_storage::AggregateStore`
- Async interface
- Tenant-aware key composition
- [x] **4.2** Implement storage circuit breaker
- Track consecutive failures
- Open circuit after threshold (configurable)
- Half-open state for recovery testing
- Auto-close on successful operation
- [x] **4.3** Implement `get_snapshot(tenant_id, aggregate_id) -> Option<Snapshot>`
- Query edge-storage with composite key `(tenant_id, aggregate_id)`
- Deserialize to Snapshot type
- Enforce tenant isolation
- [x] **4.4** Implement `put_snapshot(snapshot) -> Result<(), VersionConflict>`
- Serialize Snapshot
- Store with composite key `(tenant_id, aggregate_id, version)`
- Handle VersionConflict from edge-storage
- Enforce tenant isolation
- [x] **4.5** Implement `delete_snapshot(tenant_id, aggregate_id)`
- For testing/cleanup
- Tenant-scoped deletion
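The circuit breaker described in 4.2 is a small state machine. The sketch below shows one possible shape under stated assumptions: the type and method names are hypothetical, the caller passes the current time explicitly so the logic is testable, and a single failure while half-open reopens the circuit.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

pub struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    failure_threshold: u32,
    open_for: Duration,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, open_for: Duration) -> Self {
        CircuitBreaker {
            state: CircuitState::Closed,
            consecutive_failures: 0,
            failure_threshold,
            open_for,
            opened_at: None,
        }
    }

    /// Returns true if a storage call may proceed at time `now`.
    /// An open circuit moves to half-open once `open_for` has elapsed.
    pub fn allow(&mut self, now: Instant) -> bool {
        if self.state == CircuitState::Open {
            let elapsed = self
                .opened_at
                .map(|t| now.duration_since(t))
                .unwrap_or_default();
            if elapsed >= self.open_for {
                self.state = CircuitState::HalfOpen;
            }
        }
        self.state != CircuitState::Open
    }

    /// Any success closes the circuit and resets the failure count.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = CircuitState::Closed;
        self.opened_at = None;
    }

    /// Opens the circuit at the threshold, or immediately when a half-open probe fails.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.state == CircuitState::HalfOpen
            || self.consecutive_failures >= self.failure_threshold
        {
            self.state = CircuitState::Open;
            self.opened_at = Some(now);
        }
    }

    pub fn state(&self) -> CircuitState {
        self.state
    }
}
```

A `StorageClient` would presumably call `allow` before each edge-storage operation and report the outcome through `record_success` or `record_failure`.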
### Tests
- [x] **T4.1** Store and retrieve snapshot with tenant
```rust
#[tokio::test]
async fn store_and_retrieve_snapshot() {
let storage = StorageClient::new_test().await;
let snap = Snapshot::new_test_with_tenant("tenant-a");
storage.put_snapshot(snap.clone()).await.unwrap();
let retrieved = storage.get_snapshot(&snap.tenant_id, &snap.aggregate_id).await.unwrap();
assert_eq!(Some(snap), retrieved);
}
```
- [x] **T4.2** Version conflict on duplicate version
```rust
#[tokio::test]
async fn version_conflict_on_duplicate() {
let storage = StorageClient::new_test().await;
let snap = Snapshot::new_test_with_tenant("tenant-a");
storage.put_snapshot(snap.clone()).await.unwrap();
let result = storage.put_snapshot(snap).await;
assert!(matches!(result, Err(AggregateError::VersionConflict { .. })));
}
```
- [x] **T4.3** None returned for non-existent aggregate
```rust
#[tokio::test]
async fn none_for_nonexistent() {
let storage = StorageClient::new_test().await;
let result = storage.get_snapshot(&TenantId::new("tenant-a"), &AggregateId::new_v7()).await.unwrap();
assert!(result.is_none());
}
```
- [x] **T4.4** Tenant isolation: cannot access other tenant's snapshot
```rust
#[tokio::test]
async fn tenant_isolation_storage() {
let storage = StorageClient::new_test().await;
let snap = Snapshot::new_test_with_tenant("tenant-a");
storage.put_snapshot(snap.clone()).await.unwrap();
let result = storage.get_snapshot(&TenantId::new("tenant-b"), &snap.aggregate_id).await.unwrap();
assert!(result.is_none());
}
```
- [x] **T4.5** Tautological test: StorageClient is Send
```rust
#[test]
fn storage_client_is_send() {
fn assert_send<T: Send>() {}
assert_send::<StorageClient>();
}
```
---
## Milestone 5: Event Stream (NATS JetStream)
**Goal:** Integrate NATS JetStream for event persistence and consumption.
### Dependencies
- Milestone 2 (core types)
- Milestone 3 (configuration)
### Tasks
- [x] **5.1** Create `StreamClient` wrapper
- Wraps `async_nats::Client`
- JetStream context
- Tenant-aware subject naming
- [x] **5.2** Implement NATS connection circuit breaker
- Track connection failures
- Exponential backoff on reconnect
- Circuit open on prolonged outage
- Health check integration for /ready endpoint
- [x] **5.3** Implement stream/consumer setup
- Create stream if not exists
- Configure retention, subjects
- Subject pattern: `tenant.<tenant_id>.aggregate.<aggregate_type>.<aggregate_id>`
- [x] **5.4** Implement `publish_events(events: Vec<Event>) -> Result<(), StreamError>`
- Publish to JetStream on tenant-namespaced subject
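- Set a per-event `Nats-Msg-Id` header for deduplication, derived from command_id (e.g. `<command_id>:<event index>`); reusing the bare command_id would make JetStream drop the second and later events of a multi-event command as duplicates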
- Batch publish support
- [x] **5.5** Implement `fetch_events(tenant_id, aggregate_id, after_version) -> Vec<Event>`
- Query events from tenant-namespaced subject
- Filter by version > after_version
- Ordered by version
- [x] **5.6** Implement `subscribe_to_events(tenant_id, aggregate_id) -> impl Stream<Event>`
- Real-time subscription
- Tenant-scoped subscription
- For projections/sagas
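The subject pattern from 5.3 can be built with a small helper. The sketch below takes plain `&str` arguments instead of the plan's `TenantId`/`AggregateType`/`AggregateId` types, and it adds two assumptions that are not in the plan: tokens containing NATS-reserved characters are rejected, and an empty tenant is mapped to a placeholder token (`_default`, a hypothetical name) because NATS subjects cannot contain empty tokens.

```rust
/// Hypothetical placeholder for the empty tenant in non-multi-tenant mode.
/// Note: it could collide with a real tenant of the same name.
const DEFAULT_TENANT_TOKEN: &str = "_default";

/// Rejects tokens that would change the shape of the subject.
fn check_token(token: &str) -> Result<&str, String> {
    let bad = token.is_empty()
        || token
            .chars()
            .any(|c| c == '.' || c == '*' || c == '>' || c.is_whitespace());
    if bad {
        Err(format!("invalid subject token: {token:?}"))
    } else {
        Ok(token)
    }
}

fn tenant_token(tenant_id: &str) -> &str {
    if tenant_id.is_empty() {
        DEFAULT_TENANT_TOKEN
    } else {
        tenant_id
    }
}

/// Builds `tenant.<tenant_id>.aggregate.<aggregate_type>.<aggregate_id>`.
pub fn build_subject(
    tenant_id: &str,
    aggregate_type: &str,
    aggregate_id: &str,
) -> Result<String, String> {
    Ok(format!(
        "tenant.{}.aggregate.{}.{}",
        check_token(tenant_token(tenant_id))?,
        check_token(aggregate_type)?,
        check_token(aggregate_id)?
    ))
}

/// Wildcard filter matching every aggregate subject of one tenant.
pub fn tenant_filter(tenant_id: &str) -> Result<String, String> {
    Ok(format!(
        "tenant.{}.aggregate.>",
        check_token(tenant_token(tenant_id))?
    ))
}
```

Rejecting `.`, `*`, and `>` matters for tenant isolation: a tenant id containing a wildcard or an extra dot could otherwise address another tenant's subjects.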
### Tests
- [x] **T5.1** Publish and fetch events with tenant
```rust
#[tokio::test]
async fn publish_and_fetch_events() {
let stream = StreamClient::new_test().await;
let events = vec![Event::new_test_with_tenant("tenant-a"), Event::new_test_with_tenant("tenant-a")];
stream.publish_events(events.clone()).await.unwrap();
let fetched = stream.fetch_events(&TenantId::new("tenant-a"), &events[0].aggregate_id, Version::initial()).await.unwrap();
assert_eq!(fetched.len(), 2);
}
```
- [x] **T5.2** Events ordered by version
```rust
#[tokio::test]
async fn events_ordered_by_version() {
let stream = StreamClient::new_test().await;
let events = create_ordered_events_with_tenant("tenant-a", 3);
stream.publish_events(events.clone()).await.unwrap();
let fetched = stream.fetch_events(&TenantId::new("tenant-a"), &events[0].aggregate_id, Version::initial()).await.unwrap();
assert!(fetched.windows(2).all(|w| w[0].version < w[1].version));
}
```
- [x] **T5.3** Fetch with version filter
```rust
#[tokio::test]
async fn fetch_with_version_filter() {
let stream = StreamClient::new_test().await;
let events = create_ordered_events_with_tenant("tenant-a", 5);
stream.publish_events(events.clone()).await.unwrap();
let fetched = stream.fetch_events(&TenantId::new("tenant-a"), &events[0].aggregate_id, Version::from(2)).await.unwrap();
// Assumes the helper numbers events 1..=5, so versions 3, 4 and 5 pass the `version > 2` filter.
assert_eq!(fetched.len(), 3);
}
```
- [x] **T5.4** Tenant isolation: cannot fetch other tenant's events
```rust
#[tokio::test]
async fn tenant_isolation_stream() {
let stream = StreamClient::new_test().await;
let events = vec![Event::new_test_with_tenant("tenant-a")];
stream.publish_events(events.clone()).await.unwrap();
let fetched = stream.fetch_events(&TenantId::new("tenant-b"), &events[0].aggregate_id, Version::initial()).await.unwrap();
assert!(fetched.is_empty());
}
```
- [x] **T5.5** Subject naming includes tenant
```rust
#[test]
fn subject_naming_includes_tenant() {
let tenant_id = TenantId::new("acme-corp");
let aggregate_type = AggregateType::from("Account");
let aggregate_id = AggregateId::new_v7();
let subject = build_subject(&tenant_id, &aggregate_type, &aggregate_id);
assert!(subject.starts_with("tenant.acme-corp.aggregate."));
}
```
- [x] **T5.6** Tautological test: StreamClient is Send + Sync
```rust
#[test]
fn stream_client_is_send_sync() {
fn assert_send_sync<T: Send + Sync>() {}
assert_send_sync::<StreamClient>();
}
```
---
## Milestone 6: Runtime Function Integration
**Goal:** Integrate `runtime-function` for `decide` and `apply` programs.
### Dependencies
- Milestone 2 (core types)
### Tasks
- [x] **6.1** Create `RuntimeExecutor` wrapper
- Wraps `runtime_function` execution
- Program loading
- [x] **6.2** Implement `execute_decide(state, command) -> Result<Vec<Event>, DecideError>`
- Load decide program
- Execute with state + command
- Parse event results
- [x] **6.3** Implement `execute_apply(state, event) -> Result<State, ApplyError>`
- Load apply program
- Execute with state + event
- Return new state
- [x] **6.4** Implement program caching
- Cache compiled AST
- Cache by program hash
- [x] **6.5** Handle gas metering / timeouts
- Prevent infinite loops
- Configurable limits
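Program caching (6.4) can be sketched as a map from a hash of the program source to a shared compiled form. `CompiledProgram` is a stand-in for whatever `runtime-function` actually produces, and the `compile` closure is a hypothetical hook; neither reflects that crate's real API. The 64-bit standard-library hash is used only to keep the sketch dependency-free; a real cache would more likely key on a cryptographic digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Stand-in for a compiled AST.
#[derive(Debug)]
pub struct CompiledProgram {
    pub source_len: usize,
}

pub struct ProgramCache {
    entries: HashMap<u64, Arc<CompiledProgram>>,
    /// Number of cache misses, exposed so tests can observe caching.
    pub compilations: usize,
}

impl ProgramCache {
    pub fn new() -> Self {
        ProgramCache { entries: HashMap::new(), compilations: 0 }
    }

    /// Hashes the program source; collisions are ignored in this sketch.
    fn key(source: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        source.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns the cached program, compiling it only on the first request.
    pub fn get_or_compile<F>(&mut self, source: &str, compile: F) -> Arc<CompiledProgram>
    where
        F: FnOnce(&str) -> CompiledProgram,
    {
        let key = Self::key(source);
        if let Some(hit) = self.entries.get(&key) {
            return Arc::clone(hit);
        }
        self.compilations += 1;
        let compiled = Arc::new(compile(source));
        self.entries.insert(key, Arc::clone(&compiled));
        compiled
    }
}
```

Handing out `Arc` clones lets several aggregate instances share one compiled program without copying it.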
### Tests
- [x] **T6.1** Decide returns events for valid command
```rust
#[test]
fn decide_returns_events() {
let executor = RuntimeExecutor::new_test();
let state = json!({"balance": 100});
let command = json!({"type": "deposit", "amount": 50});
let result = executor.execute_decide(&state, &command, DECIDE_PROGRAM).unwrap();
assert!(!result.is_empty());
}
```
- [x] **T6.2** Decide returns error for invalid command
```rust
#[test]
fn decide_rejects_invalid() {
let executor = RuntimeExecutor::new_test();
let state = json!({"balance": 10});
let command = json!({"type": "withdraw", "amount": 100});
let result = executor.execute_decide(&state, &command, DECIDE_PROGRAM);
assert!(matches!(result, Err(AggregateError::DecideError(_))));
}
```
- [x] **T6.3** Apply transitions state correctly
```rust
#[test]
fn apply_transitions_state() {
let executor = RuntimeExecutor::new_test();
let state = json!({"balance": 100});
let event = json!({"type": "deposited", "amount": 50});
let new_state = executor.execute_apply(&state, &event, APPLY_PROGRAM).unwrap();
assert_eq!(new_state["balance"], 150);
}
```
- [x] **T6.4** Determinism: same input = same output
```rust
#[test]
fn decide_is_deterministic() {
let executor = RuntimeExecutor::new_test();
let state = json!({"balance": 100});
let command = json!({"type": "deposit", "amount": 50});
let r1 = executor.execute_decide(&state, &command, DECIDE_PROGRAM).unwrap();
let r2 = executor.execute_decide(&state, &command, DECIDE_PROGRAM).unwrap();
assert_eq!(r1, r2);
}
```
- [x] **T6.5** Tautological test: RuntimeExecutor is Send
```rust
#[test]
fn runtime_executor_is_send() {
fn assert_send<T: Send>() {}
assert_send::<RuntimeExecutor>();
}
```
---
## Milestone 7: Aggregate State Machine
**Goal:** Implement the core aggregate state machine with rehydration.
### Dependencies
- Milestone 2 (core types)
- Milestone 6 (runtime function)
### Tasks
- [x] **7.1** Implement `AggregateInstance` struct
- Holds current state
- Tracks version
- References decide/apply programs
- Holds tenant_id for tenant association
- [x] **7.2** Implement `rehydrate(tenant_id, snapshot, events) -> AggregateInstance`
- Validate tenant_id matches snapshot and events
- Apply events sequentially
- Track final version
- [x] **7.3** Implement `handle_command(command) -> Result<Vec<Event>, AggregateError>`
- Validate command.tenant_id matches instance tenant_id
- Return TenantAccessDenied on mismatch
- Execute decide
- Generate event envelopes (with tenant_id)
- Update internal state
- [x] **7.4** Implement `apply_event(event)`
- Internal state update
- Version increment
- Validate event tenant_id
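Rehydration (7.2) is a left fold of events over a snapshot, with the tenant check applied to both. The sketch below uses deliberately simplified stand-ins: state is a single `i64` balance, the apply program is replaced by an addition, and the `VersionGap` check is an extra assumption that the plan does not list.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub tenant_id: String,
    pub version: u64,
    pub balance: i64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub tenant_id: String,
    pub version: u64,
    pub delta: i64,
}

#[derive(Debug, PartialEq)]
pub enum RehydrateError {
    TenantAccessDenied { tenant_id: String },
    /// Assumption: events must continue the snapshot version without gaps.
    VersionGap { expected: u64, actual: u64 },
}

#[derive(Debug, PartialEq)]
pub struct AggregateInstance {
    pub tenant_id: String,
    pub version: u64,
    pub balance: i64,
}

/// Validates tenant ownership, then applies events in order on top of the snapshot.
pub fn try_rehydrate(
    tenant_id: &str,
    snapshot: Snapshot,
    events: &[Event],
) -> Result<AggregateInstance, RehydrateError> {
    if snapshot.tenant_id != tenant_id {
        return Err(RehydrateError::TenantAccessDenied { tenant_id: tenant_id.to_string() });
    }
    let mut agg = AggregateInstance {
        tenant_id: snapshot.tenant_id,
        version: snapshot.version,
        balance: snapshot.balance,
    };
    for event in events {
        if event.tenant_id != agg.tenant_id {
            return Err(RehydrateError::TenantAccessDenied { tenant_id: tenant_id.to_string() });
        }
        let expected = agg.version + 1;
        if event.version != expected {
            return Err(RehydrateError::VersionGap { expected, actual: event.version });
        }
        agg.balance += event.delta; // stand-in for executing the apply program
        agg.version = event.version;
    }
    Ok(agg)
}
```

With a snapshot at version 5 and one event at version 6, this yields version 6, mirroring T7.2.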
### Tests
- [x] **T7.1** Rehydrate from snapshot only
```rust
#[test]
fn rehydrate_from_snapshot() {
let snap = Snapshot { tenant_id: TenantId::new("tenant-a"), version: Version::from(5), state: json!({"balance": 100}), ..Snapshot::new_test() };
let agg = AggregateInstance::rehydrate(TenantId::new("tenant-a"), snap, vec![]);
assert_eq!(agg.version(), Version::from(5));
assert_eq!(agg.state()["balance"], 100);
}
```
- [x] **T7.2** Rehydrate from snapshot + events
```rust
#[test]
fn rehydrate_from_snapshot_and_events() {
let snap = Snapshot { tenant_id: TenantId::new("tenant-a"), version: Version::from(5), state: json!({"balance": 100}), ..Snapshot::new_test() };
let events = vec![
Event { tenant_id: TenantId::new("tenant-a"), version: Version::from(6), payload: json!({"type": "deposited", "amount": 50}), ..Event::new_test() },
];
let agg = AggregateInstance::rehydrate(TenantId::new("tenant-a"), snap, events);
assert_eq!(agg.version(), Version::from(6));
assert_eq!(agg.state()["balance"], 150);
}
```
- [x] **T7.3** Rehydrate rejects mismatched tenant_id
```rust
#[test]
fn rehydrate_rejects_tenant_mismatch() {
let snap = Snapshot { tenant_id: TenantId::new("tenant-a"), version: Version::from(5), state: json!({}), ..Snapshot::new_test() };
let result = AggregateInstance::try_rehydrate(TenantId::new("tenant-b"), snap, vec![]);
assert!(matches!(result, Err(AggregateError::TenantAccessDenied { .. })));
}
```
- [x] **T7.4** Handle command produces events with tenant_id
```rust
#[test]
fn handle_command_produces_events() {
let mut agg = AggregateInstance::new_test_with_tenant("tenant-a");
let cmd = Command { tenant_id: TenantId::new("tenant-a"), payload: json!({"type": "deposit", "amount": 50}), ..Command::new_test() };
let events = agg.handle_command(cmd).unwrap();
assert!(!events.is_empty());
assert_eq!(events[0].tenant_id, TenantId::new("tenant-a"));
assert_eq!(agg.state()["balance"], 50);
}
```
- [x] **T7.5** Handle command rejects tenant mismatch
```rust
#[test]
fn handle_command_rejects_tenant_mismatch() {
let mut agg = AggregateInstance::new_test_with_tenant("tenant-a");
let cmd = Command { tenant_id: TenantId::new("tenant-b"), payload: json!({"type": "deposit", "amount": 50}), ..Command::new_test() };
let result = agg.handle_command(cmd);
assert!(matches!(result, Err(AggregateError::TenantAccessDenied { .. })));
}
```
- [x] **T7.6** Version increments after command
```rust
#[test]
fn version_increments_after_command() {
let mut agg = AggregateInstance::new_test_with_tenant("tenant-a");
let initial = agg.version();
let cmd = Command::new_test_deposit_with_tenant("tenant-a", 50);
agg.handle_command(cmd).unwrap();
assert_eq!(agg.version(), initial.increment());
}
```
- [x] **T7.7** Tautological test: AggregateInstance tracks aggregate_id and tenant_id
```rust
#[test]
fn aggregate_instance_has_id_and_tenant() {
let agg = AggregateInstance::new_test_with_tenant("tenant-a");
let _ = agg.aggregate_id();
let _ = agg.tenant_id();
assert!(true);
}
```
---
## Milestone 8: Command Handler (Full Lifecycle)
**Goal:** Implement the complete command handling lifecycle with persistence.
### Dependencies
- Milestone 4 (storage)
- Milestone 5 (stream)
- Milestone 7 (state machine)
### Tasks
- [x] **8.1** Implement `AggregateHandler` struct
- Holds StorageClient, StreamClient, RuntimeExecutor
- Per-aggregate-type configuration
- [x] **8.2** Implement `handle_command(command) -> Result<Vec<Event>, AggregateError>`
- Validate tenant_id from command
- Load snapshot from storage using (tenant_id, aggregate_id)
- Fetch events since snapshot from tenant-namespaced subject
- Rehydrate with tenant validation
- Execute decide
- Persist events to JetStream on tenant subject
- Store new snapshot with tenant_id in composite key
- Handle VersionConflict with retry
- [x] **8.3** Implement tenant validation
- Extract tenant_id from command
- Validate tenant_id is not empty (if multi-tenancy required)
- Enforce tenant_id consistency across snapshot, events, and command
- Return TenantAccessDenied on any mismatch
- [x] **8.4** Implement retry-on-conflict logic
- Configurable max retries
- Exponential backoff option
- [x] **8.5** Implement snapshot threshold
- Only store snapshot every N events
- Track events since last snapshot
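The retry-on-conflict loop (8.4) and the snapshot threshold (8.5) can be sketched synchronously with only the standard library. The error type, the 10 ms base delay, the 1 s cap, and the injected `sleep` closure are all assumptions; the real handler is async and would await a timer such as `tokio::time::sleep` instead.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum AttemptError {
    VersionConflict,
    Other(String),
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `cap`.
pub fn backoff_delay(base: Duration, cap: Duration, attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}

/// Re-runs `attempt_fn` while it reports a version conflict, up to `max_retries` retries.
/// Any other error, or a conflict after the last retry, is returned to the caller.
pub fn retry_on_conflict<T, F, S>(
    max_retries: u32,
    mut attempt_fn: F,
    mut sleep: S,
) -> Result<T, AttemptError>
where
    F: FnMut(u32) -> Result<T, AttemptError>,
    S: FnMut(Duration),
{
    let mut attempt = 0;
    loop {
        match attempt_fn(attempt) {
            Err(AttemptError::VersionConflict) if attempt < max_retries => {
                sleep(backoff_delay(
                    Duration::from_millis(10),
                    Duration::from_secs(1),
                    attempt,
                ));
                attempt += 1;
            }
            other => return other,
        }
    }
}

/// True when at least `threshold` events have accumulated since the last snapshot.
pub fn should_snapshot(last_snapshot_version: u64, new_version: u64, threshold: u64) -> bool {
    threshold > 0 && new_version.saturating_sub(last_snapshot_version) >= threshold
}
```

Each retry is expected to reload the snapshot and events before running decide again, since the conflict means another writer advanced the aggregate.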
### Tests
- [x] **T8.1** Full command lifecycle with tenant
```rust
#[tokio::test]
async fn full_command_lifecycle() {
let handler = AggregateHandler::new_test().await;
let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
let events = handler.handle_command(cmd.clone()).await.unwrap();
assert!(!events.is_empty());
let snap = handler.storage().get_snapshot(&cmd.tenant_id, &cmd.aggregate_id).await.unwrap();
assert!(snap.is_some());
}
```
- [x] **T8.2** Rehydration from persisted state with tenant
```rust
#[tokio::test]
async fn rehydration_from_persisted() {
let handler = AggregateHandler::new_test().await;
let cmd1 = Command::new_test_deposit_with_tenant("tenant-a", 100);
handler.handle_command(cmd1.clone()).await.unwrap();
let cmd2 = Command { tenant_id: cmd1.tenant_id.clone(), aggregate_id: cmd1.aggregate_id.clone(), payload: json!({"type": "deposit", "amount": 50}), ..Command::new_test() };
handler.handle_command(cmd2).await.unwrap();
let snap = handler.storage().get_snapshot(&cmd1.tenant_id, &cmd1.aggregate_id).await.unwrap().unwrap();
assert!(snap.version.as_u64() >= 2);
}
```
- [x] **T8.3** Tenant isolation in handler
```rust
#[tokio::test]
async fn tenant_isolation_handler() {
let handler = AggregateHandler::new_test().await;
let cmd_a = Command::new_test_deposit_with_tenant("tenant-a", 100);
let aggregate_id = cmd_a.aggregate_id.clone();
handler.handle_command(cmd_a).await.unwrap();
let cmd_b = Command { tenant_id: TenantId::new("tenant-b"), aggregate_id, payload: json!({"type": "deposit", "amount": 50}), ..Command::new_test() };
let result = handler.handle_command(cmd_b).await;
assert!(matches!(result, Err(AggregateError::TenantAccessDenied { .. })));
}
```
- [x] **T8.4** Retry on version conflict
```rust
#[tokio::test]
async fn retry_on_conflict() {
let handler = AggregateHandler::new_test().await;
let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
let h1 = handler.clone();
let h2 = handler.clone();
let c1 = cmd.clone();
let c2 = cmd.clone();
let (r1, r2) = tokio::join!(
async { h1.handle_command(c1).await },
async { h2.handle_command(c2).await }
);
assert!(r1.is_ok() || r2.is_ok());
}
```
- [x] **T8.5** Snapshot threshold respected
```rust
#[tokio::test]
async fn snapshot_threshold() {
let handler = AggregateHandler::new_test_with_threshold(3).await;
let id = AggregateId::new_v7();
let tenant_id = TenantId::new("tenant-a");
for _ in 0..5 {
let cmd = Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 10}), ..Command::new_test() };
handler.handle_command(cmd).await.unwrap();
}
let snap = handler.storage().get_snapshot(&tenant_id, &id).await.unwrap().unwrap();
assert!(snap.version.as_u64() % 3 == 0 || snap.version.as_u64() == 5);
}
```
- [x] **T8.6** Empty tenant_id allowed for non-multi-tenant mode
```rust
#[tokio::test]
async fn empty_tenant_allowed() {
let handler = AggregateHandler::new_test_non_tenant().await;
let cmd = Command::new_test_deposit_with_tenant("", 100);
let result = handler.handle_command(cmd).await;
assert!(result.is_ok());
}
```
- [x] **T8.7** Tautological test: Handler is Clone
```rust
#[test]
fn handler_is_clone() {
fn assert_clone<T: Clone>() {}
assert_clone::<AggregateHandler>();
}
```
---
## Milestone 9: Observability
**Goal:** Integrate `edge-logger` and metrics for production observability.
### Dependencies
- Milestone 8 (command handler)
### Tasks
- [x] **9.1** Initialize `edge-logger` client
- UDS socket connection
- Service name, environment
- [x] **9.2** Add tracing spans for command handling
- Span per command
- Include aggregate_id, command_id, version, tenant_id
- [x] **9.3** Add metrics collection
- `aggregate_commands_total` (counter, labeled by aggregate_type, tenant_id)
- `aggregate_command_duration_seconds` (histogram)
- `aggregate_version_conflicts_total` (counter)
- `aggregate_rehydration_duration_seconds` (histogram)
- `aggregate_tenant_errors_total` (counter for TenantAccessDenied)
- [x] **9.4** Add structured logging
- Command received
- Events produced
- Errors with context
- [x] **9.5** Implement `/metrics` endpoint
- Prometheus format
- For Victoria Metrics scraping
- [ ] **9.6** Include correlation and trace context in observability fields
- Extract `x-correlation-id` and `traceparent` from Gateway-propagated request metadata
- Record `correlation_id` and `trace_id` in spans/log fields for command handling and event production
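Extracting `trace_id` from a `traceparent` value (9.6) amounts to splitting the W3C Trace Context header into its four fields. The sketch below is intentionally strict and only accepts version `00`; the struct and function names are hypothetical, and a production parser would likely delegate to a tracing library instead.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceContext {
    pub trace_id: String,
    pub parent_id: String,
    pub sampled: bool,
}

/// True if `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parses `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`.
/// Returns `None` for anything malformed so callers can fall back to a fresh trace.
pub fn parse_traceparent(header: &str) -> Option<TraceContext> {
    let mut parts = header.trim().split('-');
    let version = parts.next()?;
    let trace_id = parts.next()?;
    let parent_id = parts.next()?;
    let flags = parts.next()?;

    // Version 00 has exactly four fields; other versions are not handled in this sketch.
    if version != "00" || parts.next().is_some() {
        return None;
    }
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero ids are defined as invalid by the specification.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }

    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceContext {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: flags & 0x01 == 0x01,
    })
}
```

The resulting `trace_id` can then be recorded as a span field next to `correlation_id`, `aggregate_id`, and `tenant_id`.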
### Tests
- [x] **T9.1** Metrics are recorded
```rust
#[tokio::test]
async fn metrics_recorded() {
let handler = AggregateHandler::new_test_with_metrics().await;
let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
handler.handle_command(cmd).await.unwrap();
let metrics = handler.metrics_export();
assert!(metrics.contains("aggregate_commands_total"));
}
```
- [x] **T9.2** Spans include required fields including tenant_id
```rust
#[test]
fn spans_include_fields() {
let span = tracing::info_span!("command", aggregate_id = %AggregateId::new_v7(), tenant_id = %"tenant-a");
assert!(span.metadata().is_some());
}
```
- [x] **T9.3** Tautological test: Logger initializes
```rust
#[test]
fn logger_initializes() {
let _ = edge_logger_client::Logger::builder()
.socket_path("/tmp/test.sock".into())
.service("aggregate".into())
.environment("test".into())
.build();
assert!(true);
}
```
---
## Milestone 10: Gateway Integration
**Goal:** Implement the interface for receiving commands from the Gateway.
### Dependencies
- Milestone 8 (command handler)
- Milestone 9 (observability)
### Tasks
- [x] **10.1** Define command ingestion protocol
- gRPC with protobuf definitions
- Command service definition (SubmitCommand rpc)
- x-tenant-id metadata specification
- Error status code mapping (InvalidArgument, PermissionDenied, Internal)
- Correlation/trace metadata specification (`x-correlation-id`, `traceparent`)
- [x] **10.2** Implement x-tenant-id extraction
- Extract tenant_id from x-tenant-id HTTP header
- Default to empty string if header not present (backward compatibility)
- Validate tenant_id format (alphanumeric, hyphens, underscores)
- Add tenant_id to Command envelope
- [x] **10.3** Implement tenant-aware routing
- Use tenant_id to route commands to appropriate Aggregate nodes
- Support consistent hashing on tenant_id for sharding
- Gateway routes to correct shard based on x-tenant-id
- [x] **10.4** Implement command server
- Receive commands from Gateway
- Parse and validate (including tenant_id)
- Route to AggregateHandler with tenant context
- [x] **10.5** Implement response types
- Success with events
- Validation error (including invalid tenant_id)
- TenantAccessDenied error
- System error
- [ ] **10.6** Propagate correlation and trace context into produced events
- Ensure events emitted downstream include correlation/trace context (message headers and/or envelope metadata) so Projection and Runner can log/trace the same flow
- [x] **10.7** Implement health check endpoint
- `/health` for orchestration
- Storage/stream connectivity check
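Two pieces of the tasks above lend themselves to a short sketch: the tenant_id format check from 10.2 and the tenant-based node selection from 10.3. For the latter, the sketch swaps in rendezvous (highest-random-weight) hashing as a simple stand-in for a consistent-hash ring; the function names, the node names, and the use of FNV-1a as the hash are all assumptions.

```rust
/// Validates the `x-tenant-id` value; a missing header defaults to the empty tenant.
pub fn validate_tenant_id(raw: Option<&str>) -> Result<String, String> {
    let value = raw.unwrap_or("");
    let ok = value
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if ok {
        Ok(value.to_string())
    } else {
        Err(format!("invalid tenant_id: {value:?}"))
    }
}

/// 64-bit FNV-1a, chosen here because its output is stable across processes and builds.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Rendezvous hashing: score every (tenant, node) pair and pick the highest score.
/// Removing a node only remaps the tenants that were assigned to that node.
pub fn pick_node<'a>(tenant_id: &str, nodes: &[&'a str]) -> Option<&'a str> {
    nodes.iter().copied().max_by_key(|node| {
        let mut key = Vec::with_capacity(tenant_id.len() + node.len() + 1);
        key.extend_from_slice(tenant_id.as_bytes());
        key.push(0); // separator so ("ab", "c") and ("a", "bc") hash differently
        key.extend_from_slice(node.as_bytes());
        fnv1a(&key)
    })
}
```

The Gateway and the Aggregate nodes would need to agree on the same hash function and node list for routing decisions to match.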
### Tests
- [x] **T10.1** Server accepts valid command with tenant
```rust
#[tokio::test]
async fn server_accepts_command_with_tenant() {
let server = CommandServer::new_test().await;
let cmd = Command::new_test_deposit_with_tenant("acme-corp", 100);
let response = server.handle(cmd).await;
assert!(response.is_ok());
}
```
- [x] **T10.2** x-tenant-id header extracted correctly
```rust
#[tokio::test]
async fn x_tenant_id_header_extracted() {
let server = CommandServer::new_test().await;
let response = server.handle_with_headers(
json!({"type": "deposit", "amount": 100}),
vec![("x-tenant-id", "acme-corp")]
).await;
assert!(response.is_ok());
assert_eq!(response.unwrap().tenant_id, TenantId::new("acme-corp"));
}
```
- [x] **T10.3** Missing x-tenant-id defaults to empty
```rust
#[tokio::test]
async fn missing_tenant_defaults_empty() {
let server = CommandServer::new_test().await;
let response = server.handle_with_headers(
json!({"type": "deposit", "amount": 100}),
vec![]
).await;
assert!(response.is_ok());
assert_eq!(response.unwrap().tenant_id, TenantId::default());
}
```
- [x] **T10.4** Invalid tenant_id format rejected
```rust
#[tokio::test]
async fn invalid_tenant_id_rejected() {
let server = CommandServer::new_test().await;
let response = server.handle_with_headers(
json!({"type": "deposit", "amount": 100}),
vec![("x-tenant-id", "invalid@tenant!")]
).await;
assert!(matches!(response, Err(ServerError::InvalidTenantId)));
}
```
- [x] **T10.5** Server rejects malformed command
```rust
#[tokio::test]
async fn server_rejects_malformed() {
let server = CommandServer::new_test().await;
let response = server.handle_raw(json!({"invalid": true})).await;
assert!(response.is_err());
}
```
- [x] **T10.6** Health check returns status
```rust
#[tokio::test]
async fn health_check() {
let server = CommandServer::new_test().await;
let health = server.health_check().await;
assert!(health.healthy);
}
```
- [x] **T10.7** TenantAccessDenied propagated in response
```rust
#[tokio::test]
async fn tenant_access_denied_propagated() {
let server = CommandServer::new_test().await;
let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
server.handle(cmd.clone()).await.unwrap();
let cmd_cross = Command { tenant_id: TenantId::new("tenant-b"), ..cmd };
let response = server.handle(cmd_cross).await;
assert!(matches!(response, Err(ServerError::TenantAccessDenied)));
}
```
- [x] **T10.8** Tautological test: Server binds to address
```rust
#[test]
fn server_binds() {
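    // Port 0 lets the OS pick a free port, so the test cannot collide with a running service.
    let addr: std::net::SocketAddr = "127.0.0.1:0".parse().unwrap();
    let listener = std::net::TcpListener::bind(addr);
    assert!(listener.is_ok());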
}
```
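
The header rules exercised by T10.2 to T10.4 can be sketched as a small, dependency-free function. This is only an illustration: `tenant_from_headers`, `TenantHeaderError`, the 64-character limit, and the allowed character set are assumptions made for this sketch, not the actual `CommandServer` API or validation rules.

```rust
/// Hypothetical error type for this sketch; the real `ServerError` may differ.
#[derive(Debug, PartialEq)]
pub enum TenantHeaderError {
    InvalidTenantId,
}

/// Resolve the tenant from request headers.
///
/// Assumed rules: a missing `x-tenant-id` header yields the empty tenant
/// (non-multi-tenant mode); a header that is present must be 1..=64
/// characters of ASCII alphanumerics, '-' or '_'.
pub fn tenant_from_headers(headers: &[(&str, &str)]) -> Result<String, TenantHeaderError> {
    // Header names are case-insensitive, so compare without regard to case.
    let value = headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("x-tenant-id"))
        .map(|(_, value)| value.trim());

    match value {
        None => Ok(String::new()),
        Some(v) if is_valid_tenant_id(v) => Ok(v.to_string()),
        Some(_) => Err(TenantHeaderError::InvalidTenantId),
    }
}

/// Restricting the character set also keeps the id safe to embed as a
/// single NATS subject token (no '.', '*', '>' or whitespace).
fn is_valid_tenant_id(v: &str) -> bool {
    !v.is_empty()
        && v.len() <= 64
        && v.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

With these assumed rules, `acme-corp` is accepted, an empty header list resolves to the empty tenant, and `invalid@tenant!` is rejected.
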
---
## Milestone 11: Integration Tests ✅
**Goal:** Comprehensive integration test suite.
**Status:** Complete - 19 integration tests passing covering storage, runtime, health, circuit breaker, tenant isolation, and concurrency.
### Dependencies
- All previous milestones
### Tasks
- [x] **11.1** Set up test fixtures
- Embedded NATS server
- Temp directory for storage
- Mock runtime-function programs
- Multi-tenant test helpers
- [x] **11.2** Test: Concurrent commands to same aggregate (single tenant)
```rust
#[tokio::test]
async fn concurrent_commands_same_aggregate() {
let handler = AggregateHandler::new_test().await;
let id = AggregateId::new_v7();
let tenant_id = TenantId::new("tenant-a");
let mut handles = vec![];
for _ in 0..10 {
let h = handler.clone();
let id = id.clone();
let tid = tenant_id.clone();
handles.push(tokio::spawn(async move {
let cmd = Command { tenant_id: tid, aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 10}), .. };
h.handle_command(cmd).await
}));
}
let results: Vec<_> = futures::future::join_all(handles).await;
let successes = results.iter().filter(|r| r.as_ref().map(|r| r.is_ok()).unwrap_or(false)).count();
assert_eq!(successes, 10);
}
```
- [x] **11.3** Test: Event ordering guaranteed
```rust
#[tokio::test]
async fn event_ordering_guaranteed() {
let handler = AggregateHandler::new_test().await;
let id = AggregateId::new_v7();
let tenant_id = TenantId::new("tenant-a");
    for _ in 0..10 {
let cmd = Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 10}), .. };
handler.handle_command(cmd).await.unwrap();
}
let events = handler.stream().fetch_events(&tenant_id, &id, Version::initial()).await.unwrap();
for (i, e) in events.iter().enumerate() {
assert_eq!(e.version.as_u64() as usize, i + 1);
}
}
```
- [x] **11.4** Test: Idempotency via command_id
```rust
#[tokio::test]
async fn idempotency_via_command_id() {
let handler = AggregateHandler::new_test().await;
let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
let r1 = handler.handle_command(cmd.clone()).await.unwrap();
let r2 = handler.handle_command(cmd).await.unwrap();
assert_eq!(r1.len(), r2.len());
}
```
- [x] **11.5** Test: System failure recovery
```rust
#[tokio::test]
async fn system_failure_recovery() {
let handler = AggregateHandler::new_test().await;
let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
handler.handle_command(cmd.clone()).await.unwrap();
drop(handler);
let handler2 = AggregateHandler::new_test().await;
let events = handler2.stream().fetch_events(&cmd.tenant_id, &cmd.aggregate_id, Version::initial()).await.unwrap();
assert!(!events.is_empty());
}
```
- [x] **11.6** Test: Full bank account scenario
```rust
#[tokio::test]
async fn full_bank_account_scenario() {
let handler = AggregateHandler::new_test().await;
let id = AggregateId::new_v7();
let tenant_id = TenantId::new("tenant-a");
handler.handle_command(Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "open_account", "initial_balance": 0}), .. }).await.unwrap();
handler.handle_command(Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
handler.handle_command(Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 50}), .. }).await.unwrap();
handler.handle_command(Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "withdraw", "amount": 75}), .. }).await.unwrap();
let snap = handler.storage().get_snapshot(&tenant_id, &id).await.unwrap().unwrap();
assert_eq!(snap.state["balance"], 75);
}
```
- [x] **11.7** Test: Tenant isolation end-to-end
```rust
#[tokio::test]
async fn tenant_isolation_e2e() {
let handler = AggregateHandler::new_test().await;
let id = AggregateId::new_v7();
handler.handle_command(Command { tenant_id: TenantId::new("tenant-a"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
let result = handler.handle_command(Command { tenant_id: TenantId::new("tenant-b"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 50}), .. }).await;
assert!(matches!(result, Err(AggregateError::TenantAccessDenied)));
}
```
- [x] **11.8** Test: Multiple tenants same aggregate_id
```rust
#[tokio::test]
async fn multiple_tenants_same_aggregate_id() {
let handler = AggregateHandler::new_test().await;
let id = AggregateId::new_v7();
handler.handle_command(Command { tenant_id: TenantId::new("tenant-a"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
handler.handle_command(Command { tenant_id: TenantId::new("tenant-b"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 200}), .. }).await.unwrap();
let snap_a = handler.storage().get_snapshot(&TenantId::new("tenant-a"), &id).await.unwrap().unwrap();
let snap_b = handler.storage().get_snapshot(&TenantId::new("tenant-b"), &id).await.unwrap().unwrap();
assert_eq!(snap_a.state["balance"], 100);
assert_eq!(snap_b.state["balance"], 200);
}
```
- [x] **11.9** Test: NATS subject namespacing enforced
```rust
#[tokio::test]
async fn nats_subject_namespacing() {
let handler = AggregateHandler::new_test().await;
let id = AggregateId::new_v7();
handler.handle_command(Command { tenant_id: TenantId::new("acme-corp"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
let subjects = handler.stream().list_subjects_for_tenant(&TenantId::new("acme-corp")).await;
assert!(subjects.iter().all(|s| s.starts_with("tenant.acme-corp.")));
}
```
- [x] **11.10** Test: Non-multi-tenant mode (empty tenant_id)
```rust
#[tokio::test]
async fn non_multi_tenant_mode() {
let handler = AggregateHandler::new_test_non_tenant().await;
let id = AggregateId::new_v7();
handler.handle_command(Command { tenant_id: TenantId::default(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
let snap = handler.storage().get_snapshot(&TenantId::default(), &id).await.unwrap();
assert!(snap.is_some());
}
```
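
The idempotency behaviour checked in 11.4 (a replayed command returns the same events instead of producing new ones) can be illustrated with a minimal in-memory cache. `IdempotencyCache` and `handle_once` are hypothetical names for this sketch, and plain strings stand in for `CommandId`; the real handler presumably persists this information rather than keeping it in memory.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory idempotency cache keyed by command id.
/// `E` stands in for the event type.
pub struct IdempotencyCache<E: Clone> {
    seen: HashMap<String, Vec<E>>,
}

impl<E: Clone> IdempotencyCache<E> {
    pub fn new() -> Self {
        Self { seen: HashMap::new() }
    }

    /// Run `handle` only the first time `command_id` is seen.
    /// A replay returns the events recorded by the first run.
    pub fn handle_once<F>(&mut self, command_id: &str, handle: F) -> Vec<E>
    where
        F: FnOnce() -> Vec<E>,
    {
        if let Some(events) = self.seen.get(command_id) {
            return events.clone();
        }
        let events = handle();
        self.seen.insert(command_id.to_string(), events.clone());
        events
    }
}
```

In a multi-tenant deployment the key would likely need to include the tenant as well, so that two tenants reusing a command id cannot observe each other's results.
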
---
## Milestone 12: Query Engine Integration
**Goal:** Integrate `query-engine` for filtering and querying aggregate state via UQF.
### Dependencies
- Milestone 6 (Runtime Function)
- Milestone 10 (Gateway Integration)
### Tasks
- [x] **12.1** Create `QueryClient` wrapper
- Wraps `query_engine` crate
- Tenant-aware query context
- Connection to query-engine service or embedded mode
- [x] **12.2** Implement aggregate state projection
- Project aggregate state to query-engine on event publish
- Include tenant_id in projection metadata
- Configurable projection filters
- [x] **12.3** Implement query API endpoint
- Query aggregate state by UQF filters
- Tenant-scoped queries (filter by tenant_id)
- Pagination support
- [x] **12.4** Implement subscription queries
- Real-time updates when aggregate state changes
- Tenant-scoped subscriptions
- NATS-based notification
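
As a rough illustration of tenant-scoped queries with pagination, the sketch below filters an in-memory slice. It does not use the `query-engine` crate or UQF syntax: `ProjectedState`, `Page`, and `query_tenant` are hypothetical, and a Rust closure stands in for the filter expression.

```rust
/// Hypothetical projected row; the real projection schema is not specified here.
#[derive(Debug, Clone, PartialEq)]
pub struct ProjectedState {
    pub tenant_id: String,
    pub aggregate_id: String,
    pub balance: i64,
}

/// One page of results plus the offset to request next, if any.
#[derive(Debug, PartialEq)]
pub struct Page {
    pub items: Vec<ProjectedState>,
    pub next_offset: Option<usize>,
}

pub fn query_tenant<P>(
    rows: &[ProjectedState],
    tenant_id: &str,
    predicate: P,
    offset: usize,
    limit: usize,
) -> Page
where
    P: Fn(&ProjectedState) -> bool,
{
    // Guard against a zero limit, which would never advance the offset.
    let limit = limit.max(1);

    // The tenant filter is applied first and unconditionally, so a
    // caller-supplied predicate can never widen the result to another tenant.
    let matching: Vec<&ProjectedState> = rows
        .iter()
        .filter(|row| row.tenant_id == tenant_id)
        .filter(|row| predicate(*row))
        .collect();

    let items: Vec<ProjectedState> = matching
        .iter()
        .skip(offset)
        .take(limit)
        .map(|row| (**row).clone())
        .collect();

    let end = offset + items.len();
    let next_offset = if end < matching.len() { Some(end) } else { None };
    Page { items, next_offset }
}
```

Offset pagination is used here only because it is the simplest to show; a cursor keyed on the aggregate id may behave better when rows are inserted between page requests.
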
### Tests
- [x] **T12.1** Query returns correct aggregate state
```rust
#[tokio::test]
async fn query_aggregate_state() {
let handler = AggregateHandler::new_test().await;
handler.handle_command(Command::new_test_deposit_with_tenant("tenant-a", 100)).await.unwrap();
let results = handler.query_client()
.query(&TenantId::new("tenant-a"), "balance > 50")
.await
.unwrap();
assert!(!results.is_empty());
}
```
- [x] **T12.2** Query respects tenant isolation
```rust
#[tokio::test]
async fn query_tenant_isolation() {
let handler = AggregateHandler::new_test().await;
handler.handle_command(Command::new_test_deposit_with_tenant("tenant-a", 100)).await.unwrap();
handler.handle_command(Command::new_test_deposit_with_tenant("tenant-b", 200)).await.unwrap();
let results_a = handler.query_client()
.query(&TenantId::new("tenant-a"), "balance > 0")
.await
.unwrap();
let results_b = handler.query_client()
.query(&TenantId::new("tenant-b"), "balance > 0")
.await
.unwrap();
assert_eq!(results_a.len(), 1);
assert_eq!(results_b.len(), 1);
assert_ne!(results_a[0].state["balance"], results_b[0].state["balance"]);
}
```
---
## Milestone 13: Container & Deployment
**Goal:** Package as container and prepare for deployment.
### Dependencies
- Milestone 11 (Integration Tests)
- Milestone 12 (Query Engine Integration)
### Tasks
- [x] **13.1** Create `docker/Dockerfile.rust`
- Multi-stage build
- Minimal runtime image
- Health check
- [x] **13.2** Create `docker-compose.yml` for local dev
- Aggregate container
- NATS server
- Optional: Grafana, Victoria Metrics, Loki
- [x] **13.3** Create container entrypoint
- Config loading
- Graceful shutdown on SIGTERM
- Wait for in-flight commands to complete
- Drain NATS consumers before exit
- Timeout-based forced shutdown
- [x] **13.4** Document environment variables
- [x] **13.5** Create release build optimization
- LTO, strip, single codegen unit
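
The release build optimization settings named above (LTO, strip, single codegen unit) map onto standard Cargo profile keys. The fragment below is one plausible configuration, not necessarily the one used in this repository; in a Cargo workspace, profile settings only take effect when placed in the workspace root `Cargo.toml`.

```toml
[profile.release]
lto = "fat"          # whole-program link-time optimization
codegen-units = 1    # single codegen unit: slower builds, better optimization
strip = true         # strip symbols from the final binary
```

These settings trade noticeably longer release builds for a smaller, faster binary, which is usually acceptable for container images built in CI.
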
### Tests
- [x] **T13.1** Container builds successfully
```bash
docker build -f docker/Dockerfile.rust --build-arg PACKAGE=aggregate --build-arg BIN=aggregate -t cloudlysis/aggregate:local .
docker run cloudlysis/aggregate:local --help
```
- [x] **T13.2** Container starts with valid config
```bash
docker run -e AGGREGATE_NATS_URL=nats://nats:4222 cloudlysis/aggregate:local
```
- [x] **T13.3** Tautological test: Binary exists
```rust
#[test]
fn binary_exists() {
assert!(std::env::current_exe().is_ok());
}
```
---
## Milestone 14: Docker Swarm Deployment
**Goal:** Configure Aggregate for Docker Swarm deployment with tenant-based sharding and horizontal scaling.
### Dependencies
- Milestone 13 (Container & Deployment)
### Tasks
- [x] **14.1** Create Swarm stack definition (`swarm/stacks/platform.yml`)
- Service definition with placement constraints
- Tenant range label support (`tenant_range`)
- Replicas configuration
- Resource limits (CPU, memory)
- Health check integration
- [x] **14.2** Set up NATS KV client for cluster config
- Connect to NATS JetStream KV bucket (`TENANT_PLACEMENT`)
- Watch for config changes
- Initial config load on startup
- Fallback to local config if KV unavailable
- Consistent hashing for `tenant_id` → node mapping
- Configurable number of virtual nodes per physical node
- Ring rebalancing when nodes added/removed
- [x] **14.3** Create tenant placement configuration
- JSON/YAML config: `tenant_id` → `node_id` / `tenant_range`
- Hot-reload support for routing updates
- Persisted in NATS KV for cluster-wide consistency
- [x] **14.4** Implement Swarm placement constraint generator
- Generate `--constraint node.labels.tenant_range==<range>` from config
- Support dynamic constraint updates
- [x] **14.5** Create Gateway routing configuration
- Tenant → service endpoint mapping
- Load balancer integration (traefik/nginx)
- Route updates without Gateway restart
- [x] **14.6** Implement graceful tenant migration
- Drain consumer for tenant before migration
- Data copy verification
- Routing table atomic swap
- Resume consumer on new node
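
The consistent-hashing items in 14.2 (virtual nodes, rebalancing when nodes are added or removed) can be sketched with the standard library alone. This is an illustrative ring, not the project's `HashRing`: the `Option<&str>` return type, the `node#index` virtual-node naming, and the use of `DefaultHasher` are all assumptions. `DefaultHasher` output is not guaranteed to be stable across Rust releases, so a placement that must agree across differently built nodes would need a fixed hash function instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical consistent-hash ring: each physical node owns `vnodes` points.
pub struct HashRing {
    ring: BTreeMap<u64, String>,
    vnodes: usize,
}

fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

impl HashRing {
    pub fn new(nodes: &[&str], vnodes: usize) -> Self {
        let mut ring = HashRing { ring: BTreeMap::new(), vnodes };
        for node in nodes {
            ring.add_node(node);
        }
        ring
    }

    /// Insert `vnodes` points for the node; existing points are untouched,
    /// so only keys that now fall before a new point change owner.
    pub fn add_node(&mut self, node: &str) {
        for i in 0..self.vnodes {
            self.ring.insert(hash_of(&format!("{node}#{i}")), node.to_string());
        }
    }

    /// Remove every point owned by the node; its keys fall through to
    /// the next point on the ring.
    pub fn remove_node(&mut self, node: &str) {
        self.ring.retain(|_, owner| owner.as_str() != node);
    }

    /// Owner of the first point at or after the key's hash, wrapping
    /// around to the smallest point. Returns `None` for an empty ring.
    pub fn get_node(&self, tenant_id: &str) -> Option<&str> {
        let h = hash_of(tenant_id);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, owner)| owner.as_str())
    }
}
```

The property that matters for migration planning is that adding a node only moves tenants onto the new node; no tenant moves between two pre-existing nodes.
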
### Tests
- [x] **T14.1** Stack file valid
```bash
docker stack config -c swarm/stacks/platform.yml
```
- [x] **T14.2** Hash ring distributes tenants evenly
```rust
#[test]
fn hash_ring_distribution() {
let ring = HashRing::new(vec!["node-a", "node-b", "node-c"], 100);
let tenants: Vec<_> = (0..300).map(|i| format!("tenant-{}", i)).collect();
    // Count how many tenants land on each node.
    let mut distribution: HashMap<_, usize> = HashMap::new();
    for tenant in &tenants {
        *distribution.entry(ring.get_node(tenant)).or_insert(0) += 1;
    }
let counts: Vec<_> = distribution.values().collect();
let max = *counts.iter().max().unwrap();
let min = *counts.iter().min().unwrap();
assert!(max - min <= 30, "Distribution too uneven: {:?}", distribution);
}
```
- [x] **T14.3** Tenant placement config loads
```rust
#[test]
fn tenant_placement_config() {
let config = TenantPlacementConfig::from_yaml(r#"
tenants:
acme-corp: node-a
globex: node-b
"#);
assert_eq!(config.get_node(&TenantId::new("acme-corp")), Some("node-a"));
}
```
- [x] **T14.4** Placement constraint generated correctly
```rust
#[test]
fn placement_constraint() {
let gen = ConstraintGenerator::new();
let constraints = gen.generate(&TenantRange::new("a", "m"));
assert!(constraints.contains(&"node.labels.tenant_range==a-m".to_string()));
}
```
- [x] **T14.5** Hash ring rebalances on node add
```rust
#[test]
fn ring_rebalance_on_add() {
let mut ring = HashRing::new(vec!["node-a", "node-b"], 100);
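    // Take an owned copy so the ring can be mutated afterwards.
    let before = ring.get_node("tenant-x").to_string();
    ring.add_node("node-c");
    let after = ring.get_node("tenant-x").to_string();
    // Consistent hashing: a key either stays put or moves to the new node.
    assert!(after == before || after == "node-c");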
}
```
- [x] **T14.6** Tautological test: Stack services count
```rust
#[test]
fn stack_has_services() {
let stack = include_str!("../../swarm/stacks/platform.yml");
assert!(stack.contains("aggregate"));
}
```
---
## Milestone 15: Admin Endpoints
**Goal:** Minimal admin endpoints for the Aggregate container to support external scaling and monitoring.
### Dependencies
- Milestone 14 (Docker Swarm Deployment)
### Tasks
- [x] **15.1** Implement `/health` endpoint
- Returns container health status
- Includes: NATS connection, edge-storage connection, active aggregates count
- Used by Swarm health check and load balancer
- [x] **15.2** Implement `/ready` endpoint
- Returns readiness for receiving commands
- Checks: config loaded, NATS consumer ready, storage initialized
- [x] **15.3** Implement `/metrics` endpoint (Prometheus format)
- Expose existing metrics for scraping
- Include tenant_id labels for per-tenant visibility
- Aggregate-level metrics: command count, latency, errors, version conflicts
- [x] **15.4** Implement `/admin/tenants` endpoint (read-only)
- List tenants currently hosted on this node
- Returns: tenant_id, aggregate count, last activity timestamp
- Used by external control node for discovery
- [x] **15.5** Implement graceful drain endpoint `/admin/drain`
- POST to initiate graceful shutdown of specific tenant
- Stops consumer for tenant, waits for in-flight commands
- Returns when safe to migrate
- [x] **15.6** Implement config reload endpoint `/admin/reload`
- POST to reload tenant placement config from NATS KV
- Zero-downtime routing update
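
The drain behaviour in 15.5 (stop accepting work, wait for in-flight commands, report when it is safe to migrate) can be sketched with two atomics and a deadline. `DrainGate` and `InFlight` are hypothetical names; the sketch uses blocking threads to stay dependency-free, whereas the real endpoint would presumably be async and scoped per tenant.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical gate tracking in-flight commands for one tenant.
pub struct DrainGate {
    accepting: AtomicBool,
    in_flight: AtomicUsize,
}

/// RAII guard: the command counts as in-flight until this is dropped.
pub struct InFlight(Arc<DrainGate>);

impl Drop for InFlight {
    fn drop(&mut self) {
        self.0.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}

impl DrainGate {
    pub fn new() -> Arc<Self> {
        Arc::new(Self {
            accepting: AtomicBool::new(true),
            in_flight: AtomicUsize::new(0),
        })
    }

    /// Register a command. Returns `None` once draining has started.
    pub fn try_begin(gate: &Arc<DrainGate>) -> Option<InFlight> {
        // Count first, then check the flag: with sequentially consistent
        // ordering, either this call sees the drain flag and backs out, or
        // `drain` sees the incremented counter and keeps waiting.
        gate.in_flight.fetch_add(1, Ordering::SeqCst);
        let guard = InFlight(Arc::clone(gate));
        if gate.accepting.load(Ordering::SeqCst) {
            Some(guard)
        } else {
            None // dropping `guard` here undoes the increment
        }
    }

    /// Stop accepting work and wait for in-flight commands.
    /// Returns `true` if everything finished before the timeout.
    pub fn drain(&self, timeout: Duration) -> bool {
        self.accepting.store(false, Ordering::SeqCst);
        let deadline = Instant::now() + timeout;
        while self.in_flight.load(Ordering::SeqCst) > 0 {
            if Instant::now() >= deadline {
                return false;
            }
            thread::sleep(Duration::from_millis(5));
        }
        true
    }
}
```

A caller would hold the `InFlight` guard for the duration of each command, and the drain handler would return a success status only when `drain` reports `true`; a `false` result corresponds to the timeout-based forced shutdown path.
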
### Tests
- [x] **T15.1** Health endpoint returns status
```rust
#[tokio::test]
async fn health_endpoint() {
let server = AdminServer::new_test().await;
let resp = server.get("/health").await;
assert!(resp.status().is_success());
let health: HealthStatus = resp.json().await;
assert!(health.nats_connected);
assert!(health.storage_connected);
}
```
- [x] **T15.2** Ready endpoint checks consumers
```rust
#[tokio::test]
async fn ready_endpoint() {
let server = AdminServer::new_test().await;
let resp = server.get("/ready").await;
assert!(resp.status().is_success());
}
```
- [x] **T15.3** Metrics in Prometheus format
```rust
#[tokio::test]
async fn metrics_prometheus_format() {
let server = AdminServer::new_test().await;
let resp = server.get("/metrics").await;
let body = resp.text().await;
assert!(body.contains("aggregate_commands_total"));
assert!(body.contains("tenant_id"));
}
```
- [x] **T15.4** Tenants list returns hosted tenants
```rust
#[tokio::test]
async fn tenants_list() {
let server = AdminServer::new_test().await;
let resp = server.get("/admin/tenants").await;
let tenants: Vec<TenantInfo> = resp.json().await;
assert!(tenants.iter().any(|t| t.tenant_id == TenantId::new("test-tenant")));
}
```
- [x] **T15.5** Drain waits for in-flight commands
```rust
#[tokio::test]
async fn drain_waits() {
let server = AdminServer::new_test().await;
server.start_command_processing().await;
let start = Instant::now();
let resp = server.post("/admin/drain", json!({"tenant_id": "test-tenant"})).await;
assert!(start.elapsed() < Duration::from_secs(5));
assert!(resp.status().is_success());
}
```
- [x] **T15.6** Config reload updates routing
```rust
#[tokio::test]
async fn config_reload() {
let server = AdminServer::new_test().await;
server.update_nats_kv_config(json!({"tenants": {"new-tenant": "node-a"}})).await;
let resp = server.post("/admin/reload", json!({})).await;
assert!(resp.status().is_success());
let tenants = server.get_hosted_tenants().await;
assert!(tenants.contains(&TenantId::new("new-tenant")));
}
```
- [x] **T15.7** Tautological test: AdminServer is Send
```rust
#[test]
fn admin_server_is_send() {
fn assert_send<T: Send>() {}
assert_send::<AdminServer>();
}
```
---
## Progress Tracking
| Milestone | Status | Tests Passing |
|-----------|--------|---------------|
| 1. Project Foundation | ⬜ Not Started | ⬜ |
| 2. Core Types | ⬜ Not Started | ⬜ |
| 3. Configuration | ⬜ Not Started | ⬜ |
| 4. Storage Layer | ⬜ Not Started | ⬜ |
| 5. Event Stream | ⬜ Not Started | ⬜ |
| 6. Runtime Function | ⬜ Not Started | ⬜ |
| 7. State Machine | ⬜ Not Started | ⬜ |
| 8. Command Handler | ⬜ Not Started | ⬜ |
| 9. Observability | ⬜ Not Started | ⬜ |
| 10. Gateway Integration | ⬜ Not Started | ⬜ |
| 11. Integration Tests | ✅ Complete | ✅ 19 passing |
| 12. Query Engine Integration | ✅ Complete | ✅ |
| 13. Container & Deployment | ✅ Complete | ✅ |
| 14. Docker Swarm Deployment | ✅ Complete | ✅ |
| 15. Admin Endpoints | ✅ Complete | ✅ |
> **Note:** Admin UI (Web Frontend) will be implemented in a separate repository.
---
## Quick Reference
### Run all tests
```bash
cargo test --all
```
### Run tests for specific milestone
```bash
cargo test --lib types::
cargo test --lib storage::
```
### Check test coverage
```bash
cargo tarpaulin --out Html
```
### Lint check
```bash
cargo clippy --all-targets --all-features -- -D warnings
```