
Development Plan: Aggregate Container

Overview

This plan breaks down the Aggregate container implementation into milestones ordered by dependency. Each milestone includes:

  • Tasks with clear deliverables
  • Test Requirements (unit tests + tautological tests)
  • Dependencies on previous milestones

Development Approach:

  1. Complete one milestone at a time
  2. Write tests before implementation (TDD where applicable)
  3. All tests must pass before moving to next milestone
  4. Mark tasks complete with [x] as you progress

Milestone 1: Project Foundation

Goal: Set up the Rust project with proper structure, dependencies, and basic tooling.

Tasks

  • 1.1 Initialize Cargo project with workspace structure

    cargo init --name aggregate
    
    • Create src/lib.rs and src/main.rs
    • Configure Cargo.toml with madapes registry
  • 1.2 Configure Cargo.toml with all dependencies

    [registries.madapes]
    index = "sparse+https://git.madapes.com/api/packages/madapes/cargo/"
    
    [dependencies]
    edge-storage = { version = "0.1", registry = "madapes" }
    runtime-function = { version = "0.2", registry = "madapes" }
    edge-logger = { version = "0.1", registry = "madapes" }
    query-engine = { version = "0.1", registry = "madapes" }
    async-nats = "0.39"
    tokio = { version = "1", features = ["full"] }
    serde = { version = "1", features = ["derive"] }
    serde_json = "1"
    thiserror = "2"
    anyhow = "1"
    tracing = "0.1"
    tracing-subscriber = { version = "0.3", features = ["json", "env-filter"] }
    uuid = { version = "1", features = ["v7", "serde"] }
    chrono = { version = "0.4", features = ["serde"] }
    
  • 1.3 Set up project structure

    src/
    ├── lib.rs
    ├── main.rs
    ├── types/
    │   ├── mod.rs
    │   ├── id.rs
    │   ├── command.rs
    │   ├── event.rs
    │   ├── snapshot.rs
    │   └── error.rs
    ├── config/
    │   ├── mod.rs
    │   └── settings.rs
    ├── aggregate/
    │   ├── mod.rs
    │   ├── state.rs
    │   └── handler.rs
    ├── storage/
    │   └── mod.rs
    ├── stream/
    │   └── mod.rs
    └── observability/
        └── mod.rs
    
  • 1.4 Configure clippy and rustfmt

    • Create .clippy.toml and rustfmt.toml
    • Add CI-friendly lint rules

Tests

  • T1.1 Project compiles successfully

    #[test]
    fn project_compiles() {
        assert!(true);
    }
    
  • T1.2 All dependencies resolve from madapes registry

    #[test]
    fn dependencies_resolve() {
        assert!(true);
    }
    
  • T1.3 Clippy passes with no warnings

    #[test]
    fn clippy_clean() {
        assert!(true);
    }
    

Milestone 2: Core Types

Goal: Define all core domain types with full serialization support.

Dependencies

  • Milestone 1 (project structure)

Tasks

  • 2.1 Implement TenantId type

    • String-based (e.g., "acme-corp", "tenant-123")
    • Optional with default empty string for non-multi-tenant setups
    • Display, FromStr, Serialize, Deserialize
    • Type-safe wrapper
  • 2.2 Implement AggregateId type

    • UUID v7 based
    • Display, FromStr, Serialize, Deserialize
    • Type-safe wrapper
  • 2.3 Implement AggregateType enum/string

    • Represents business entity (Account, Order, etc.)
    • Serialize as string
  • 2.4 Implement Version type

    • Monotonically increasing u64
    • Initial version (0; T2.4 asserts this)
    • Increment operation
  • 2.5 Implement Command envelope

    • tenant_id: TenantId (extracted from x-tenant-id header)
    • command_id: UUID v7 (idempotency)
    • aggregate_id: AggregateId
    • aggregate_type: AggregateType
    • payload: serde_json::Value
    • metadata: HashMap<String, Value>
  • 2.6 Implement Event envelope

    • tenant_id: TenantId
    • event_id: UUID v7
    • aggregate_id: AggregateId
    • aggregate_type: AggregateType
    • version: Version (after this event)
    • event_type: String
    • payload: serde_json::Value
    • command_id: UUID (causation)
    • timestamp: chrono::DateTime
  • 2.7 Implement Snapshot envelope

    • tenant_id: TenantId
    • aggregate_id: AggregateId
    • aggregate_type: AggregateType
    • version: Version
    • state: serde_json::Value
    • created_at: chrono::DateTime
  • 2.8 Implement AggregateState wrapper

    • Holds current state + metadata
    • Version tracking
    • Tenant association
  • 2.9 Implement comprehensive Error enum

    • TenantAccessDenied { tenant_id: TenantId }
    • ValidationError(String)
    • VersionConflict { expected: Version, actual: Version }
    • StorageError(String)
    • StreamError(String)
    • RehydrationError(String)
    • DecideError(String)
    • ApplyError(String)
    • NotFound(AggregateId)
  • 2.10 Implement AggregateManifest type

    • Aggregate type definitions with decide/apply program references
    • Load from YAML/JSON config file
    • Validate program references exist
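As a reference point for tasks 2.1 and 2.4, a std-only sketch of the type-safe wrappers; the serde derives and the UUID-v7-backed AggregateId from the plan are omitted to keep it self-contained:

```rust
// Sketch only: the real types would additionally derive Serialize/Deserialize
// and AggregateId would wrap uuid::Uuid (v7 feature).

#[derive(Debug, Clone, Default, PartialEq, Eq, Hash)]
pub struct TenantId(String);

impl TenantId {
    pub fn new(s: impl Into<String>) -> Self {
        Self(s.into())
    }
    /// Empty string marks the non-multi-tenant default (task 2.1).
    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }
}

impl std::fmt::Display for TenantId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&self.0)
    }
}

/// Monotonically increasing aggregate version (task 2.4).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(u64);

impl Version {
    pub fn initial() -> Self {
        Self(0)
    }
    /// Returns the next version without mutating `self` (see T2.4).
    pub fn increment(self) -> Self {
        Self(self.0 + 1)
    }
    pub fn as_u64(self) -> u64 {
        self.0
    }
}
```

Taking `self` by value in `increment` gives the copy semantics T2.4 checks: the original binding keeps its old value.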

Tests

  • T2.1 TenantId round-trips through serialization

    #[test]
    fn tenant_id_serialization_roundtrip() {
        let id = TenantId::new("acme-corp");
        let json = serde_json::to_string(&id).unwrap();
        let decoded: TenantId = serde_json::from_str(&json).unwrap();
        assert_eq!(id, decoded);
    }
    
  • T2.2 TenantId defaults to empty string

    #[test]
    fn tenant_id_default() {
        let id = TenantId::default();
        assert!(id.is_empty());
    }
    
  • T2.3 AggregateId round-trips through serialization

    #[test]
    fn aggregate_id_serialization_roundtrip() {
        let id = AggregateId::new_v7();
        let json = serde_json::to_string(&id).unwrap();
        let decoded: AggregateId = serde_json::from_str(&json).unwrap();
        assert_eq!(id, decoded);
    }
    
  • T2.4 Version increments correctly

    #[test]
    fn version_increment() {
        let v = Version::initial();
        assert_eq!(v.as_u64(), 0);
        let v2 = v.increment();
        assert_eq!(v2.as_u64(), 1);
        assert_eq!(v.as_u64(), 0);
    }
    
  • T2.5 Command serializes/deserializes with all fields including tenant_id

    #[test]
    fn command_serialization() {
        let cmd = Command::new_test();
        let json = serde_json::to_string(&cmd).unwrap();
        let decoded: Command = serde_json::from_str(&json).unwrap();
        assert_eq!(cmd.command_id, decoded.command_id);
        assert_eq!(cmd.aggregate_id, decoded.aggregate_id);
        assert_eq!(cmd.tenant_id, decoded.tenant_id);
    }
    
  • T2.6 Event serializes/deserializes with all fields including tenant_id

    #[test]
    fn event_serialization() {
        let event = Event::new_test();
        let json = serde_json::to_string(&event).unwrap();
        let decoded: Event = serde_json::from_str(&json).unwrap();
        assert_eq!(event.event_id, decoded.event_id);
        assert_eq!(event.version, decoded.version);
        assert_eq!(event.tenant_id, decoded.tenant_id);
    }
    
  • T2.7 Snapshot serializes/deserializes with all fields including tenant_id

    #[test]
    fn snapshot_serialization() {
        let snap = Snapshot::new_test();
        let json = serde_json::to_string(&snap).unwrap();
        let decoded: Snapshot = serde_json::from_str(&json).unwrap();
        assert_eq!(snap.aggregate_id, decoded.aggregate_id);
        assert_eq!(snap.version, decoded.version);
        assert_eq!(snap.tenant_id, decoded.tenant_id);
    }
    
  • T2.8 Error variants implement Display and std::error::Error

    #[test]
    fn error_implements_traits() {
        let err = AggregateError::TenantAccessDenied { tenant_id: TenantId::new("other") };
        let _ = format!("{}", err);
        let _: &dyn std::error::Error = &err;
        assert!(true);
    }
    
  • T2.9 Tautological test: types exist and are Send + Sync

    #[test]
    fn types_are_send_sync() {
        fn assert_send_sync<T: Send + Sync>() {}
        assert_send_sync::<TenantId>();
        assert_send_sync::<AggregateId>();
        assert_send_sync::<Command>();
        assert_send_sync::<Event>();
        assert_send_sync::<Snapshot>();
        assert_send_sync::<AggregateError>();
    }
    

Milestone 3: Configuration

Goal: Implement configuration loading and validation.

Dependencies

  • Milestone 2 (core types)

Tasks

  • 3.1 Define Settings struct

    • NATS URL
    • Storage path
    • Logger socket path
    • Snapshot threshold
    • Retry limits
    • Aggregate definitions (decide/apply program refs)
    • Multi-tenancy enabled flag
    • Default tenant_id (for non-multi-tenant mode)
  • 3.2 Implement config loading from environment

    • AGGREGATE_NATS_URL
    • AGGREGATE_STORAGE_PATH
    • AGGREGATE_LOGGER_SOCKET
    • AGGREGATE_SNAPSHOT_THRESHOLD
    • AGGREGATE_MAX_RETRIES
  • 3.3 Implement config loading from a config file

    • Support aggregate.yaml or aggregate.toml
    • Environment variables override file
  • 3.4 Implement config validation

    • Required fields present
    • Paths are valid
    • NATS URL is parseable
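A minimal sketch of env-based loading and validation (tasks 3.2 and 3.4), std only; the struct is trimmed to three of the fields from task 3.1 and the default threshold of 100 is an assumption:

```rust
// Sketch: the full Settings also carries logger socket, retry limits,
// aggregate definitions, and the multi-tenancy flag from task 3.1.
#[derive(Debug, Clone, Default)]
pub struct Settings {
    pub nats_url: String,
    pub storage_path: String,
    pub snapshot_threshold: u64,
}

impl Settings {
    pub fn from_env() -> Result<Self, String> {
        Ok(Self {
            nats_url: std::env::var("AGGREGATE_NATS_URL")
                .map_err(|_| "AGGREGATE_NATS_URL is required".to_string())?,
            storage_path: std::env::var("AGGREGATE_STORAGE_PATH").unwrap_or_default(),
            snapshot_threshold: std::env::var("AGGREGATE_SNAPSHOT_THRESHOLD")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(100), // assumed default; should be configurable
        })
    }

    pub fn validate(&self) -> Result<(), String> {
        if self.nats_url.is_empty() {
            return Err("nats_url must be set".into());
        }
        Ok(())
    }
}
```

File loading (task 3.3) would fill the struct first, with `from_env` values overriding, per the stated precedence.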

Tests

  • T3.1 Settings loads from environment variables

    #[test]
    fn settings_from_env() {
        std::env::set_var("AGGREGATE_NATS_URL", "nats://localhost:4222");
        let settings = Settings::from_env().unwrap();
        assert_eq!(settings.nats_url, "nats://localhost:4222");
    }
    
  • T3.2 Settings validates required fields

    #[test]
    fn settings_validation() {
        let settings = Settings::default();
        assert!(settings.validate().is_err());
    }
    
  • T3.3 Tautological test: Settings is Clone

    #[test]
    fn settings_is_clone() {
        let s = Settings::default();
        let _s2 = s.clone();
        assert!(true);
    }
    

Milestone 4: Storage Layer

Goal: Integrate edge-storage for snapshot persistence.

Dependencies

  • Milestone 2 (core types)
  • Milestone 3 (configuration)

Tasks

  • 4.1 Create StorageClient wrapper

    • Wraps edge_storage::AggregateStore
    • Async interface
    • Tenant-aware key composition
  • 4.2 Implement storage circuit breaker

    • Track consecutive failures
    • Open circuit after threshold (configurable)
    • Half-open state for recovery testing
    • Auto-close on successful operation
  • 4.3 Implement get_snapshot(tenant_id, aggregate_id) -> Option<Snapshot>

    • Query edge-storage with composite key (tenant_id, aggregate_id)
    • Deserialize to Snapshot type
    • Enforce tenant isolation
  • 4.4 Implement put_snapshot(snapshot) -> Result<(), VersionConflict>

    • Serialize Snapshot
    • Store with composite key (tenant_id, aggregate_id, version)
    • Handle VersionConflict from edge-storage
    • Enforce tenant isolation
  • 4.5 Implement delete_snapshot(tenant_id, aggregate_id)

    • For testing/cleanup
    • Tenant-scoped deletion
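The exact key layout used by edge-storage is not specified here, so the following is one plausible composite-key scheme for tasks 4.1, 4.3, and 4.4: tenant isolation falls out of the key prefix, and zero-padding keeps versions in scan order.

```rust
// Hypothetical key scheme; the real edge_storage::AggregateStore API may
// take structured keys instead of strings.
pub fn snapshot_key(tenant_id: &str, aggregate_id: &str) -> String {
    // Empty tenant (non-multi-tenant mode) still yields a distinct prefix.
    format!("snapshot/{tenant_id}/{aggregate_id}")
}

pub fn versioned_snapshot_key(tenant_id: &str, aggregate_id: &str, version: u64) -> String {
    // Zero-padding makes lexicographic order match numeric version order.
    format!("snapshot/{tenant_id}/{aggregate_id}/{version:020}")
}
```

Because every read and write goes through a tenant-prefixed key, T4.4 (cross-tenant reads return `None`) holds by construction.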

Tests

  • T4.1 Store and retrieve snapshot with tenant

    #[tokio::test]
    async fn store_and_retrieve_snapshot() {
        let storage = StorageClient::new_test().await;
        let snap = Snapshot::new_test_with_tenant("tenant-a");
        storage.put_snapshot(snap.clone()).await.unwrap();
        let retrieved = storage.get_snapshot(&snap.tenant_id, &snap.aggregate_id).await.unwrap();
        assert_eq!(Some(snap), retrieved);
    }
    
  • T4.2 Version conflict on duplicate version

    #[tokio::test]
    async fn version_conflict_on_duplicate() {
        let storage = StorageClient::new_test().await;
        let snap = Snapshot::new_test_with_tenant("tenant-a");
        storage.put_snapshot(snap.clone()).await.unwrap();
        let result = storage.put_snapshot(snap).await;
        assert!(matches!(result, Err(AggregateError::VersionConflict { .. })));
    }
    
  • T4.3 None returned for non-existent aggregate

    #[tokio::test]
    async fn none_for_nonexistent() {
        let storage = StorageClient::new_test().await;
        let result = storage.get_snapshot(&TenantId::new("tenant-a"), &AggregateId::new_v7()).await.unwrap();
        assert!(result.is_none());
    }
    
  • T4.4 Tenant isolation: cannot access other tenant's snapshot

    #[tokio::test]
    async fn tenant_isolation_storage() {
        let storage = StorageClient::new_test().await;
        let snap = Snapshot::new_test_with_tenant("tenant-a");
        storage.put_snapshot(snap.clone()).await.unwrap();
    
        let result = storage.get_snapshot(&TenantId::new("tenant-b"), &snap.aggregate_id).await.unwrap();
        assert!(result.is_none());
    }
    
  • T4.5 Tautological test: StorageClient is Send

    #[test]
    fn storage_client_is_send() {
        fn assert_send<T: Send>() {}
        assert_send::<StorageClient>();
    }
    

Milestone 5: Event Stream (NATS JetStream)

Goal: Integrate NATS JetStream for event persistence and consumption.

Dependencies

  • Milestone 2 (core types)
  • Milestone 3 (configuration)

Tasks

  • 5.1 Create StreamClient wrapper

    • Wraps async_nats::Client
    • JetStream context
    • Tenant-aware subject naming
  • 5.2 Implement NATS connection circuit breaker

    • Track connection failures
    • Exponential backoff on reconnect
    • Circuit open on prolonged outage
    • Health check integration for /ready endpoint
  • 5.3 Implement stream/consumer setup

    • Create stream if not exists
    • Configure retention, subjects
    • Subject pattern: tenant.<tenant_id>.aggregate.<aggregate_type>.<aggregate_id>
  • 5.4 Implement publish_events(events: Vec<Event>) -> Result<(), StreamError>

    • Publish to JetStream on tenant-namespaced subject
    • Use command_id as Nats-Msg-Id header for deduplication
    • Batch publish support
  • 5.5 Implement fetch_events(tenant_id, aggregate_id, after_version) -> Vec<Event>

    • Query events from tenant-namespaced subject
    • Filter by version > after_version
    • Ordered by version
  • 5.6 Implement subscribe_to_events(tenant_id, aggregate_id) -> impl Stream<Event>

    • Real-time subscription
    • Tenant-scoped subscription
    • For projections/sagas
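The subject pattern from task 5.3 can be sketched as a plain formatting function, matching what T5.5 asserts; the real version would take the typed TenantId/AggregateType/AggregateId wrappers:

```rust
// Sketch of tenant-namespaced subject naming (task 5.3).
pub fn build_subject(tenant_id: &str, aggregate_type: &str, aggregate_id: &str) -> String {
    // NATS subject tokens are dot-separated, so none of the components may
    // contain '.'; the tenant_id format validation in task 10.2 is assumed
    // to guarantee that for tenants.
    format!("tenant.{tenant_id}.aggregate.{aggregate_type}.{aggregate_id}")
}
```

Tenant-scoped fetch/subscribe (tasks 5.5 and 5.6) then reduces to a subject filter like `tenant.<tenant_id>.aggregate.>`, which is what makes T5.4's isolation check pass.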

Tests

  • T5.1 Publish and fetch events with tenant

    #[tokio::test]
    async fn publish_and_fetch_events() {
        let stream = StreamClient::new_test().await;
        let events = vec![Event::new_test_with_tenant("tenant-a"), Event::new_test_with_tenant("tenant-a")];
        stream.publish_events(events.clone()).await.unwrap();
        let fetched = stream.fetch_events(&TenantId::new("tenant-a"), &events[0].aggregate_id, Version::initial()).await.unwrap();
        assert_eq!(fetched.len(), 2);
    }
    
  • T5.2 Events ordered by version

    #[tokio::test]
    async fn events_ordered_by_version() {
        let stream = StreamClient::new_test().await;
        let events = create_ordered_events_with_tenant("tenant-a", 3);
        stream.publish_events(events.clone()).await.unwrap();
        let fetched = stream.fetch_events(&TenantId::new("tenant-a"), &events[0].aggregate_id, Version::initial()).await.unwrap();
        assert!(fetched.windows(2).all(|w| w[0].version < w[1].version));
    }
    
  • T5.3 Fetch with version filter

    #[tokio::test]
    async fn fetch_with_version_filter() {
        let stream = StreamClient::new_test().await;
        let events = create_ordered_events_with_tenant("tenant-a", 5);
        stream.publish_events(events.clone()).await.unwrap();
        let fetched = stream.fetch_events(&TenantId::new("tenant-a"), &events[0].aggregate_id, Version::from(2)).await.unwrap();
        assert_eq!(fetched.len(), 2);
    }
    
  • T5.4 Tenant isolation: cannot fetch other tenant's events

    #[tokio::test]
    async fn tenant_isolation_stream() {
        let stream = StreamClient::new_test().await;
        let events = vec![Event::new_test_with_tenant("tenant-a")];
        stream.publish_events(events.clone()).await.unwrap();
    
        let fetched = stream.fetch_events(&TenantId::new("tenant-b"), &events[0].aggregate_id, Version::initial()).await.unwrap();
        assert!(fetched.is_empty());
    }
    
  • T5.5 Subject naming includes tenant

    #[test]
    fn subject_naming_includes_tenant() {
        let tenant_id = TenantId::new("acme-corp");
        let aggregate_type = AggregateType::from("Account");
        let aggregate_id = AggregateId::new_v7();
    
        let subject = build_subject(&tenant_id, &aggregate_type, &aggregate_id);
        assert!(subject.starts_with("tenant.acme-corp.aggregate."));
    }
    
  • T5.6 Tautological test: StreamClient is Send + Sync

    #[test]
    fn stream_client_is_send_sync() {
        fn assert_send_sync<T: Send + Sync>() {}
        assert_send_sync::<StreamClient>();
    }
    

Milestone 6: Runtime Function Integration

Goal: Integrate runtime-function for decide and apply programs.

Dependencies

  • Milestone 2 (core types)

Tasks

  • 6.1 Create RuntimeExecutor wrapper

    • Wraps runtime_function execution
    • Program loading
  • 6.2 Implement execute_decide(state, command) -> Result<Vec<Event>, DecideError>

    • Load decide program
    • Execute with state + command
    • Parse event results
  • 6.3 Implement execute_apply(state, event) -> Result<State, ApplyError>

    • Load apply program
    • Execute with state + event
    • Return new state
  • 6.4 Implement program caching

    • Cache compiled AST
    • Cache by program hash
  • 6.5 Handle gas metering / timeouts

    • Prevent infinite loops
    • Configurable limits
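To make the decide/apply contract concrete, here is a toy, std-only version of the account programs the T6.x tests exercise. The real programs run inside runtime-function over JSON values; this sketch only shows the expected semantics: decide validates and emits events, apply deterministically folds one event into state.

```rust
// Toy decide/apply pair for the account example in T6.1-T6.3 (illustrative,
// not the runtime-function execution model).
#[derive(Debug, Clone, PartialEq)]
pub enum AccountEvent {
    Deposited { amount: i64 },
    Withdrawn { amount: i64 },
}

pub fn decide(balance: i64, command: &str, amount: i64) -> Result<Vec<AccountEvent>, String> {
    match command {
        "deposit" => Ok(vec![AccountEvent::Deposited { amount }]),
        "withdraw" if amount <= balance => Ok(vec![AccountEvent::Withdrawn { amount }]),
        "withdraw" => Err(format!("insufficient balance: {balance} < {amount}")),
        other => Err(format!("unknown command: {other}")),
    }
}

pub fn apply(balance: i64, event: &AccountEvent) -> i64 {
    match event {
        AccountEvent::Deposited { amount } => balance + amount,
        AccountEvent::Withdrawn { amount } => balance - amount,
    }
}
```

Both functions are pure, which is exactly the property T6.4 (determinism) depends on: the runtime must not expose clocks, randomness, or I/O to decide/apply programs.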

Tests

  • T6.1 Decide returns events for valid command

    #[test]
    fn decide_returns_events() {
        let executor = RuntimeExecutor::new_test();
        let state = json!({"balance": 100});
        let command = json!({"type": "deposit", "amount": 50});
        let result = executor.execute_decide(&state, &command, DECIDE_PROGRAM).unwrap();
        assert!(!result.is_empty());
    }
    
  • T6.2 Decide returns error for invalid command

    #[test]
    fn decide_rejects_invalid() {
        let executor = RuntimeExecutor::new_test();
        let state = json!({"balance": 10});
        let command = json!({"type": "withdraw", "amount": 100});
        let result = executor.execute_decide(&state, &command, DECIDE_PROGRAM);
        assert!(matches!(result, Err(AggregateError::DecideError(_))));
    }
    
  • T6.3 Apply transitions state correctly

    #[test]
    fn apply_transitions_state() {
        let executor = RuntimeExecutor::new_test();
        let state = json!({"balance": 100});
        let event = json!({"type": "deposited", "amount": 50});
        let new_state = executor.execute_apply(&state, &event, APPLY_PROGRAM).unwrap();
        assert_eq!(new_state["balance"], 150);
    }
    
  • T6.4 Determinism: same input = same output

    #[test]
    fn decide_is_deterministic() {
        let executor = RuntimeExecutor::new_test();
        let state = json!({"balance": 100});
        let command = json!({"type": "deposit", "amount": 50});
        let r1 = executor.execute_decide(&state, &command, DECIDE_PROGRAM).unwrap();
        let r2 = executor.execute_decide(&state, &command, DECIDE_PROGRAM).unwrap();
        assert_eq!(r1, r2);
    }
    
  • T6.5 Tautological test: RuntimeExecutor is Send

    #[test]
    fn runtime_executor_is_send() {
        fn assert_send<T: Send>() {}
        assert_send::<RuntimeExecutor>();
    }
    

Milestone 7: Aggregate State Machine

Goal: Implement the core aggregate state machine with rehydration.

Dependencies

  • Milestone 2 (core types)
  • Milestone 6 (runtime function)

Tasks

  • 7.1 Implement AggregateInstance struct

    • Holds current state
    • Tracks version
    • References decide/apply programs
    • Holds tenant_id for tenant association
  • 7.2 Implement rehydrate(tenant_id, snapshot, events) -> AggregateInstance

    • Validate tenant_id matches snapshot and events
    • Apply events sequentially
    • Track final version
  • 7.3 Implement handle_command(command) -> Result<Vec<Event>, AggregateError>

    • Validate command.tenant_id matches instance tenant_id
    • Return TenantAccessDenied on mismatch
    • Execute decide
    • Generate event envelopes (with tenant_id)
    • Update internal state
  • 7.4 Implement apply_event(event)

    • Internal state update
    • Version increment
    • Validate event tenant_id
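Rehydration (task 7.2) is a fold over events, with tenant and version checks at each step. A simplified std-only sketch, with state reduced to a balance and events to `(tenant, version-after-event, delta)` tuples:

```rust
// Illustrative rehydration fold; real code applies events via the apply
// program and carries serde_json::Value state.
pub struct Instance {
    pub tenant: String,
    pub version: u64,
    pub balance: i64,
}

pub fn rehydrate(
    tenant: &str,
    snapshot: (String, u64, i64), // (tenant, version, balance)
    events: &[(String, u64, i64)],
) -> Result<Instance, String> {
    let (snap_tenant, mut version, mut balance) = snapshot;
    if snap_tenant != tenant {
        return Err("TenantAccessDenied".into()); // T7.3
    }
    for (ev_tenant, ev_version, delta) in events {
        if ev_tenant != tenant {
            return Err("TenantAccessDenied".into());
        }
        if *ev_version != version + 1 {
            return Err(format!("version gap: expected {}, got {}", version + 1, ev_version));
        }
        balance += delta; // stand-in for the apply program
        version = *ev_version;
    }
    Ok(Instance { tenant: tenant.to_string(), version, balance })
}
```

The gap check is an extra safeguard beyond what the tasks list; it catches missed events between the snapshot and the fetched tail.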

Tests

  • T7.1 Rehydrate from snapshot only

    #[test]
    fn rehydrate_from_snapshot() {
        let snap = Snapshot { tenant_id: TenantId::new("tenant-a"), version: Version::from(5), state: json!({"balance": 100}), .. };
        let agg = AggregateInstance::rehydrate(TenantId::new("tenant-a"), snap, vec![]);
        assert_eq!(agg.version(), Version::from(5));
        assert_eq!(agg.state()["balance"], 100);
    }
    
  • T7.2 Rehydrate from snapshot + events

    #[test]
    fn rehydrate_from_snapshot_and_events() {
        let snap = Snapshot { tenant_id: TenantId::new("tenant-a"), version: Version::from(5), state: json!({"balance": 100}), .. };
        let events = vec![
            Event { tenant_id: TenantId::new("tenant-a"), version: Version::from(6), payload: json!({"type": "deposited", "amount": 50}), .. },
        ];
        let agg = AggregateInstance::rehydrate(TenantId::new("tenant-a"), snap, events);
        assert_eq!(agg.version(), Version::from(6));
        assert_eq!(agg.state()["balance"], 150);
    }
    
  • T7.3 Rehydrate rejects mismatched tenant_id

    #[test]
    fn rehydrate_rejects_tenant_mismatch() {
        let snap = Snapshot { tenant_id: TenantId::new("tenant-a"), version: Version::from(5), state: json!({}), .. };
        let result = AggregateInstance::try_rehydrate(TenantId::new("tenant-b"), snap, vec![]);
        assert!(matches!(result, Err(AggregateError::TenantAccessDenied { .. })));
    }
    
  • T7.4 Handle command produces events with tenant_id

    #[test]
    fn handle_command_produces_events() {
        let mut agg = AggregateInstance::new_test_with_tenant("tenant-a");
        let cmd = Command { tenant_id: TenantId::new("tenant-a"), payload: json!({"type": "deposit", "amount": 50}), .. };
        let events = agg.handle_command(cmd).unwrap();
        assert!(!events.is_empty());
        assert_eq!(events[0].tenant_id, TenantId::new("tenant-a"));
        assert_eq!(agg.state()["balance"], 50);
    }
    
  • T7.5 Handle command rejects tenant mismatch

    #[test]
    fn handle_command_rejects_tenant_mismatch() {
        let mut agg = AggregateInstance::new_test_with_tenant("tenant-a");
        let cmd = Command { tenant_id: TenantId::new("tenant-b"), payload: json!({"type": "deposit", "amount": 50}), .. };
        let result = agg.handle_command(cmd);
        assert!(matches!(result, Err(AggregateError::TenantAccessDenied { .. })));
    }
    
  • T7.6 Version increments after command

    #[test]
    fn version_increments_after_command() {
        let mut agg = AggregateInstance::new_test_with_tenant("tenant-a");
        let initial = agg.version();
        let cmd = Command::new_test_deposit_with_tenant("tenant-a", 50);
        agg.handle_command(cmd).unwrap();
        assert_eq!(agg.version(), initial.increment());
    }
    
  • T7.7 Tautological test: AggregateInstance tracks aggregate_id and tenant_id

    #[test]
    fn aggregate_instance_has_id_and_tenant() {
        let agg = AggregateInstance::new_test_with_tenant("tenant-a");
        let _ = agg.aggregate_id();
        let _ = agg.tenant_id();
        assert!(true);
    }
    

Milestone 8: Command Handler (Full Lifecycle)

Goal: Implement the complete command handling lifecycle with persistence.

Dependencies

  • Milestone 4 (storage)
  • Milestone 5 (stream)
  • Milestone 7 (state machine)

Tasks

  • 8.1 Implement AggregateHandler struct

    • Holds StorageClient, StreamClient, RuntimeExecutor
    • Per-aggregate-type configuration
  • 8.2 Implement handle_command(command) -> Result<Vec<Event>, AggregateError>

    • Validate tenant_id from command
    • Load snapshot from storage using (tenant_id, aggregate_id)
    • Fetch events since snapshot from tenant-namespaced subject
    • Rehydrate with tenant validation
    • Execute decide
    • Persist events to JetStream on tenant subject
    • Store new snapshot with tenant_id in composite key
    • Handle VersionConflict with retry
  • 8.3 Implement tenant validation

    • Extract tenant_id from command
    • Validate tenant_id is not empty (if multi-tenancy required)
    • Enforce tenant_id consistency across snapshot, events, and command
    • Return TenantAccessDenied on any mismatch
  • 8.4 Implement retry-on-conflict logic

    • Configurable max retries
    • Exponential backoff option
  • 8.5 Implement snapshot threshold

    • Only store snapshot every N events
    • Track events since last snapshot
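The retry-on-conflict logic from task 8.4 can be sketched as a generic loop around one full load-decide-persist attempt; `attempt` stands in for the real (async) handle_command body, and the 10 ms base backoff is an assumed default:

```rust
// Sketch: retry only version conflicts, with exponential backoff (task 8.4).
use std::time::Duration;

pub fn retry_on_conflict<T>(
    max_retries: u32,
    mut attempt: impl FnMut() -> Result<T, &'static str>,
) -> Result<T, &'static str> {
    let mut delay = Duration::from_millis(10); // assumed base backoff
    for try_no in 0u32.. {
        match attempt() {
            Ok(v) => return Ok(v),
            // Only version conflicts are retriable; other errors bubble up.
            Err("VersionConflict") if try_no < max_retries => {
                std::thread::sleep(delay);
                delay *= 2;
            }
            Err(e) => return Err(e),
        }
    }
    unreachable!("loop always returns")
}
```

Crucially, each retry must re-run the whole attempt, including snapshot load and event fetch, so the decide step sees the state the concurrent winner produced; this is what makes T8.4's concurrent commands eventually both succeed.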

Tests

  • T8.1 Full command lifecycle with tenant

    #[tokio::test]
    async fn full_command_lifecycle() {
        let handler = AggregateHandler::new_test().await;
        let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
        let events = handler.handle_command(cmd.clone()).await.unwrap();
        assert!(!events.is_empty());
    
        let snap = handler.storage().get_snapshot(&cmd.tenant_id, &cmd.aggregate_id).await.unwrap();
        assert!(snap.is_some());
    }
    
  • T8.2 Rehydration from persisted state with tenant

    #[tokio::test]
    async fn rehydration_from_persisted() {
        let handler = AggregateHandler::new_test().await;
        let cmd1 = Command::new_test_deposit_with_tenant("tenant-a", 100);
        handler.handle_command(cmd1.clone()).await.unwrap();
    
        let cmd2 = Command { tenant_id: cmd1.tenant_id.clone(), aggregate_id: cmd1.aggregate_id, payload: json!({"type": "deposit", "amount": 50}), .. };
        handler.handle_command(cmd2).await.unwrap();
    
        let snap = handler.storage().get_snapshot(&cmd1.tenant_id, &cmd1.aggregate_id).await.unwrap().unwrap();
        assert!(snap.version.as_u64() >= 2);
    }
    
  • T8.3 Tenant isolation in handler

    #[tokio::test]
    async fn tenant_isolation_handler() {
        let handler = AggregateHandler::new_test().await;
    
        let cmd_a = Command::new_test_deposit_with_tenant("tenant-a", 100);
        let aggregate_id = cmd_a.aggregate_id.clone();
        handler.handle_command(cmd_a).await.unwrap();
    
        let cmd_b = Command { tenant_id: TenantId::new("tenant-b"), aggregate_id, payload: json!({"type": "deposit", "amount": 50}), .. };
        let result = handler.handle_command(cmd_b).await;
    
        assert!(matches!(result, Err(AggregateError::TenantAccessDenied { .. })));
    }
    
  • T8.4 Retry on version conflict

    #[tokio::test]
    async fn retry_on_conflict() {
        let handler = AggregateHandler::new_test().await;
    
        let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
        let id = cmd.aggregate_id.clone();
    
        let h1 = handler.clone();
        let h2 = handler.clone();
    
        let c1 = cmd.clone();
        let c2 = cmd.clone();
    
        let (r1, r2) = tokio::join!(
            async { h1.handle_command(c1).await },
            async { h2.handle_command(c2).await }
        );
    
        assert!(r1.is_ok() || r2.is_ok());
    }
    
  • T8.5 Snapshot threshold respected

    #[tokio::test]
    async fn snapshot_threshold() {
        let handler = AggregateHandler::new_test_with_threshold(3).await;
        let id = AggregateId::new_v7();
        let tenant_id = TenantId::new("tenant-a");
    
        for _ in 0..5 {
            let cmd = Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 10}), .. };
            handler.handle_command(cmd).await.unwrap();
        }
    
        let snap = handler.storage().get_snapshot(&tenant_id, &id).await.unwrap().unwrap();
        assert!(snap.version.as_u64() % 3 == 0 || snap.version.as_u64() == 5);
    }
    
  • T8.6 Empty tenant_id allowed for non-multi-tenant mode

    #[tokio::test]
    async fn empty_tenant_allowed() {
        let handler = AggregateHandler::new_test_non_tenant().await;
        let cmd = Command::new_test_deposit_with_tenant("", 100);
        let result = handler.handle_command(cmd).await;
        assert!(result.is_ok());
    }
    
  • T8.7 Tautological test: Handler is Clone

    #[test]
    fn handler_is_clone() {
        fn assert_clone<T: Clone>() {}
        assert_clone::<AggregateHandler>();
    }
    

Milestone 9: Observability

Goal: Integrate edge-logger and metrics for production observability.

Dependencies

  • Milestone 8 (command handler)

Tasks

  • 9.1 Initialize edge-logger client

    • UDS socket connection
    • Service name, environment
  • 9.2 Add tracing spans for command handling

    • Span per command
    • Include aggregate_id, command_id, version, tenant_id
  • 9.3 Add metrics collection

    • aggregate_commands_total (counter, labeled by aggregate_type, tenant_id)
    • aggregate_command_duration_seconds (histogram)
    • aggregate_version_conflicts_total (counter)
    • aggregate_rehydration_duration_seconds (histogram)
    • aggregate_tenant_errors_total (counter for TenantAccessDenied)
  • 9.4 Add structured logging

    • Command received
    • Events produced
    • Errors with context
  • 9.5 Implement /metrics endpoint

    • Prometheus format
    • For Victoria Metrics scraping
  • 9.6 Include correlation and trace context in observability fields

    • Extract x-correlation-id and traceparent from Gateway-propagated request metadata
    • Record correlation_id and trace_id in spans/log fields for command handling and event production

Tests

  • T9.1 Metrics are recorded

    #[tokio::test]
    async fn metrics_recorded() {
        let handler = AggregateHandler::new_test_with_metrics().await;
        let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
        handler.handle_command(cmd).await.unwrap();
    
        let metrics = handler.metrics_export();
        assert!(metrics.contains("aggregate_commands_total"));
    }
    
  • T9.2 Spans include required fields including tenant_id

    #[test]
    fn spans_include_fields() {
        let span = tracing::info_span!("command", aggregate_id = %AggregateId::new_v7(), tenant_id = %"tenant-a");
        assert!(span.metadata().is_some());
    }
    
  • T9.3 Tautological test: Logger initializes

    #[test]
    fn logger_initializes() {
        let _ = edge_logger_client::Logger::builder()
            .socket_path("/tmp/test.sock".into())
            .service("aggregate".into())
            .environment("test".into())
            .build();
        assert!(true);
    }
    

Milestone 10: Gateway Integration

Goal: Implement the interface for receiving commands from the Gateway.

Dependencies

  • Milestone 8 (command handler)
  • Milestone 9 (observability)

Tasks

  • 10.1 Define command ingestion protocol

    • gRPC with protobuf definitions
    • Command service definition (SubmitCommand rpc)
    • x-tenant-id metadata specification
    • Error status code mapping (InvalidArgument, PermissionDenied, Internal)
    • Correlation/trace metadata specification (x-correlation-id, traceparent)
  • 10.2 Implement x-tenant-id extraction

    • Extract tenant_id from x-tenant-id HTTP header
    • Default to empty string if header not present (backward compatibility)
    • Validate tenant_id format (alphanumeric, hyphens, underscores)
    • Add tenant_id to Command envelope
  • 10.3 Implement tenant-aware routing

    • Use tenant_id to route commands to appropriate Aggregate nodes
    • Support consistent hashing on tenant_id for sharding
    • Gateway routes to correct shard based on x-tenant-id
  • 10.4 Implement command server

    • Receive commands from Gateway
    • Parse and validate (including tenant_id)
    • Route to AggregateHandler with tenant context
  • 10.5 Implement response types

    • Success with events
    • Validation error (including invalid tenant_id)
    • TenantAccessDenied error
    • System error
  • 10.6 Propagate correlation and trace context into produced events

    • Ensure events emitted downstream include correlation/trace context (message headers and/or envelope metadata) so Projection and Runner can log/trace the same flow
  • 10.7 Implement health check endpoint

    • /health for orchestration
    • Storage/stream connectivity check
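
The format rule in task 10.2 (alphanumeric, hyphens, underscores) can be sketched as a plain character check. The function name and the 64-character cap below are illustrative assumptions, not settled API:

```rust
// Sketch of the tenant_id format check from task 10.2. An empty string is
// accepted here because a missing x-tenant-id header defaults to empty in
// non-multi-tenant mode; the 64-char cap is an assumed bound, not a spec.
fn is_valid_tenant_id(s: &str) -> bool {
    s.len() <= 64
        && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

A check like this would reject `invalid@tenant!` from test T10.4 while still allowing the empty default from T10.3.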

Tests

  • T10.1 Server accepts valid command with tenant

    #[tokio::test]
    async fn server_accepts_command_with_tenant() {
        let server = CommandServer::new_test().await;
        let cmd = Command::new_test_deposit_with_tenant("acme-corp", 100);
        let response = server.handle(cmd).await;
        assert!(response.is_ok());
    }
    
  • T10.2 x-tenant-id header extracted correctly

    #[tokio::test]
    async fn x_tenant_id_header_extracted() {
        let server = CommandServer::new_test().await;
        let response = server.handle_with_headers(
            json!({"type": "deposit", "amount": 100}),
            vec![("x-tenant-id", "acme-corp")]
        ).await;
        assert!(response.is_ok());
        assert_eq!(response.unwrap().tenant_id, TenantId::new("acme-corp"));
    }
    
  • T10.3 Missing x-tenant-id defaults to empty

    #[tokio::test]
    async fn missing_tenant_defaults_empty() {
        let server = CommandServer::new_test().await;
        let response = server.handle_with_headers(
            json!({"type": "deposit", "amount": 100}),
            vec![]
        ).await;
        assert!(response.is_ok());
        assert_eq!(response.unwrap().tenant_id, TenantId::default());
    }
    
  • T10.4 Invalid tenant_id format rejected

    #[tokio::test]
    async fn invalid_tenant_id_rejected() {
        let server = CommandServer::new_test().await;
        let response = server.handle_with_headers(
            json!({"type": "deposit", "amount": 100}),
            vec![("x-tenant-id", "invalid@tenant!")]
        ).await;
        assert!(matches!(response, Err(ServerError::InvalidTenantId)));
    }
    
  • T10.5 Server rejects malformed command

    #[tokio::test]
    async fn server_rejects_malformed() {
        let server = CommandServer::new_test().await;
        let response = server.handle_raw(json!({"invalid": true})).await;
        assert!(response.is_err());
    }
    
  • T10.6 Health check returns status

    #[tokio::test]
    async fn health_check() {
        let server = CommandServer::new_test().await;
        let health = server.health_check().await;
        assert!(health.healthy);
    }
    
  • T10.7 TenantAccessDenied propagated in response

    #[tokio::test]
    async fn tenant_access_denied_propagated() {
        let server = CommandServer::new_test().await;
        let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
        server.handle(cmd.clone()).await.unwrap();
    
        let cmd_cross = Command { tenant_id: TenantId::new("tenant-b"), ..cmd };
        let response = server.handle(cmd_cross).await;
        assert!(matches!(response, Err(ServerError::TenantAccessDenied)));
    }
    
  • T10.8 Tautological test: Server binds to address

    #[test]
    fn server_binds() {
        let addr = "127.0.0.1:8080".parse().unwrap();
        let _ = std::net::TcpListener::bind(addr);
        assert!(true);
    }
    

Milestone 11: Integration Tests

Goal: Comprehensive integration test suite.

Status: Complete - 19 integration tests passing covering storage, runtime, health, circuit breaker, tenant isolation, and concurrency.

Dependencies

  • All previous milestones

Tasks

  • 11.1 Set up test fixtures

    • Embedded NATS server
    • Temp directory for storage
    • Mock runtime-function programs
    • Multi-tenant test helpers
  • 11.2 Test: Concurrent commands to same aggregate (single tenant)

    #[tokio::test]
    async fn concurrent_commands_same_aggregate() {
        let handler = AggregateHandler::new_test().await;
        let id = AggregateId::new_v7();
        let tenant_id = TenantId::new("tenant-a");
    
        let mut handles = vec![];
        for _ in 0..10 {
            let h = handler.clone();
            let id = id.clone();
            let tid = tenant_id.clone();
            handles.push(tokio::spawn(async move {
                let cmd = Command { tenant_id: tid, aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 10}), .. };
                h.handle_command(cmd).await
            }));
        }
    
        let results: Vec<_> = futures::future::join_all(handles).await;
        let successes = results.iter().filter(|r| r.as_ref().map(|r| r.is_ok()).unwrap_or(false)).count();
        assert_eq!(successes, 10);
    }
    
  • 11.3 Test: Event ordering guaranteed

    #[tokio::test]
    async fn event_ordering_guaranteed() {
        let handler = AggregateHandler::new_test().await;
        let id = AggregateId::new_v7();
        let tenant_id = TenantId::new("tenant-a");
    
        for i in 0..10 {
            let cmd = Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 10}), .. };
            handler.handle_command(cmd).await.unwrap();
        }
    
        let events = handler.stream().fetch_events(&tenant_id, &id, Version::initial()).await.unwrap();
        for (i, e) in events.iter().enumerate() {
            assert_eq!(e.version.as_u64() as usize, i + 1);
        }
    }
    
  • 11.4 Test: Idempotency via command_id

    #[tokio::test]
    async fn idempotency_via_command_id() {
        let handler = AggregateHandler::new_test().await;
        let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
    
        let r1 = handler.handle_command(cmd.clone()).await.unwrap();
        let r2 = handler.handle_command(cmd).await.unwrap();
    
        assert_eq!(r1.len(), r2.len());
    }
    
  • 11.5 Test: System failure recovery

    #[tokio::test]
    async fn system_failure_recovery() {
        let handler = AggregateHandler::new_test().await;
        let cmd = Command::new_test_deposit_with_tenant("tenant-a", 100);
        handler.handle_command(cmd.clone()).await.unwrap();
    
        drop(handler);
    
        let handler2 = AggregateHandler::new_test().await;
        let events = handler2.stream().fetch_events(&cmd.tenant_id, &cmd.aggregate_id, Version::initial()).await.unwrap();
        assert!(!events.is_empty());
    }
    
  • 11.6 Test: Full bank account scenario

    #[tokio::test]
    async fn full_bank_account_scenario() {
        let handler = AggregateHandler::new_test().await;
        let id = AggregateId::new_v7();
        let tenant_id = TenantId::new("tenant-a");
    
        handler.handle_command(Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "open_account", "initial_balance": 0}), .. }).await.unwrap();
        handler.handle_command(Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
        handler.handle_command(Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 50}), .. }).await.unwrap();
        handler.handle_command(Command { tenant_id: tenant_id.clone(), aggregate_id: id.clone(), payload: json!({"type": "withdraw", "amount": 75}), .. }).await.unwrap();
    
        let snap = handler.storage().get_snapshot(&tenant_id, &id).await.unwrap().unwrap();
        assert_eq!(snap.state["balance"], 75);
    }
    
  • 11.7 Test: Tenant isolation end-to-end

    #[tokio::test]
    async fn tenant_isolation_e2e() {
        let handler = AggregateHandler::new_test().await;
        let id = AggregateId::new_v7();
    
        handler.handle_command(Command { tenant_id: TenantId::new("tenant-a"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
    
        let result = handler.handle_command(Command { tenant_id: TenantId::new("tenant-b"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 50}), .. }).await;
        assert!(matches!(result, Err(AggregateError::TenantAccessDenied)));
    }
    
  • 11.8 Test: Multiple tenants same aggregate_id

    #[tokio::test]
    async fn multiple_tenants_same_aggregate_id() {
        let handler = AggregateHandler::new_test().await;
        let id = AggregateId::new_v7();
    
        handler.handle_command(Command { tenant_id: TenantId::new("tenant-a"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
        handler.handle_command(Command { tenant_id: TenantId::new("tenant-b"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 200}), .. }).await.unwrap();
    
        let snap_a = handler.storage().get_snapshot(&TenantId::new("tenant-a"), &id).await.unwrap().unwrap();
        let snap_b = handler.storage().get_snapshot(&TenantId::new("tenant-b"), &id).await.unwrap().unwrap();
    
        assert_eq!(snap_a.state["balance"], 100);
        assert_eq!(snap_b.state["balance"], 200);
    }
    
  • 11.9 Test: NATS subject namespacing enforced

    #[tokio::test]
    async fn nats_subject_namespacing() {
        let handler = AggregateHandler::new_test().await;
        let id = AggregateId::new_v7();
    
        handler.handle_command(Command { tenant_id: TenantId::new("acme-corp"), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
    
        let subjects = handler.stream().list_subjects_for_tenant(&TenantId::new("acme-corp")).await;
        assert!(subjects.iter().all(|s| s.starts_with("tenant.acme-corp.")));
    }
    
  • 11.10 Test: Non-multi-tenant mode (empty tenant_id)

    #[tokio::test]
    async fn non_multi_tenant_mode() {
        let handler = AggregateHandler::new_test_non_tenant().await;
        let id = AggregateId::new_v7();
    
        handler.handle_command(Command { tenant_id: TenantId::default(), aggregate_id: id.clone(), payload: json!({"type": "deposit", "amount": 100}), .. }).await.unwrap();
    
        let snap = handler.storage().get_snapshot(&TenantId::default(), &id).await.unwrap();
        assert!(snap.is_some());
    }
    

Milestone 12: Query Engine Integration

Goal: Integrate query-engine for filtering and querying aggregate state via UQF.

Dependencies

  • Milestone 8 (runtime-function integration)
  • Milestone 10 (Gateway Integration)

Tasks

  • 12.1 Create QueryClient wrapper

    • Wraps query_engine crate
    • Tenant-aware query context
    • Connection to query-engine service or embedded mode
  • 12.2 Implement aggregate state projection

    • Project aggregate state to query-engine on event publish
    • Include tenant_id in projection metadata
    • Configurable projection filters
  • 12.3 Implement query API endpoint

    • Query aggregate state by UQF filters
    • Tenant-scoped queries (filter by tenant_id)
    • Pagination support
  • 12.4 Implement subscription queries

    • Real-time updates when aggregate state changes
    • Tenant-scoped subscriptions
    • NATS-based notification
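
As a rough sketch of the tenant scoping in task 12.3, the query layer could AND a tenant predicate onto whatever filter the caller supplies before handing it to query-engine. The helper name and the comparison syntax are illustrative assumptions — substitute the real UQF grammar:

```rust
// Hypothetical helper: constrain a caller-supplied filter to one tenant.
// The `==`/`&&` syntax is an assumption, not confirmed UQF operators.
fn tenant_scoped_filter(tenant_id: &str, user_filter: &str) -> String {
    format!("tenant_id == \"{tenant_id}\" && ({user_filter})")
}
```

Parenthesizing the user filter keeps operator precedence intact regardless of what the caller wrote.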

Tests

  • T12.1 Query returns correct aggregate state

    #[tokio::test]
    async fn query_aggregate_state() {
        let handler = AggregateHandler::new_test().await;
        handler.handle_command(Command::new_test_deposit_with_tenant("tenant-a", 100)).await.unwrap();
    
        let results = handler.query_client()
            .query(&TenantId::new("tenant-a"), "balance > 50")
            .await
            .unwrap();
        assert!(!results.is_empty());
    }
    
  • T12.2 Query respects tenant isolation

    #[tokio::test]
    async fn query_tenant_isolation() {
        let handler = AggregateHandler::new_test().await;
        handler.handle_command(Command::new_test_deposit_with_tenant("tenant-a", 100)).await.unwrap();
        handler.handle_command(Command::new_test_deposit_with_tenant("tenant-b", 200)).await.unwrap();
    
        let results_a = handler.query_client()
            .query(&TenantId::new("tenant-a"), "balance > 0")
            .await
            .unwrap();
        let results_b = handler.query_client()
            .query(&TenantId::new("tenant-b"), "balance > 0")
            .await
            .unwrap();
    
        assert_eq!(results_a.len(), 1);
        assert_eq!(results_b.len(), 1);
        assert_ne!(results_a[0].state["balance"], results_b[0].state["balance"]);
    }
    

Milestone 13: Container & Deployment

Goal: Package as container and prepare for deployment.

Dependencies

  • Milestone 11 (Integration)
  • Milestone 12 (Query Engine Integration)

Tasks

  • 13.1 Create docker/Dockerfile.rust

    • Multi-stage build
    • Minimal runtime image
    • Health check
  • 13.2 Create docker-compose.yml for local dev

    • Aggregate container
    • NATS server
    • Optional: Grafana, Victoria Metrics, Loki
  • 13.3 Create container entrypoint

    • Config loading
    • Graceful shutdown on SIGTERM
    • Wait for in-flight commands to complete
    • Drain NATS consumers before exit
    • Timeout-based forced shutdown
  • 13.4 Document environment variables

  • 13.5 Create release build optimization

    • LTO, strip, single codegen unit

Tests

  • T13.1 Container builds successfully

    docker build -f docker/Dockerfile.rust --build-arg PACKAGE=aggregate --build-arg BIN=aggregate -t cloudlysis/aggregate:local .
    docker run cloudlysis/aggregate:local --help
    
  • T13.2 Container starts with valid config

    docker run -e AGGREGATE_NATS_URL=nats://nats:4222 cloudlysis/aggregate:local
    
  • T13.3 Tautological test: Binary exists

    #[test]
    fn binary_exists() {
        assert!(std::env::current_exe().is_ok());
    }
    

Milestone 14: Docker Swarm Deployment

Goal: Configure Aggregate for Docker Swarm deployment with tenant-based sharding and horizontal scaling.

Dependencies

  • Milestone 13 (Container & Deployment)

Tasks

  • 14.1 Create Swarm stack definition (swarm/stacks/platform.yml)

    • Service definition with placement constraints
    • Tenant range label support (tenant_range)
    • Replicas configuration
    • Resource limits (CPU, memory)
    • Health check integration
  • 14.2 Set up NATS KV client for cluster config

    • Connect to NATS JetStream KV bucket (TENANT_PLACEMENT)
    • Watch for config changes
    • Initial config load on startup
    • Fallback to local config if KV unavailable
    • Consistent hashing for tenant_id → node mapping
    • Configurable number of virtual nodes per physical node
    • Ring rebalancing when nodes added/removed
  • 14.3 Create tenant placement configuration

    • JSON/YAML config: tenant_id → node_id / tenant_range
    • Hot-reload support for routing updates
    • Persisted in NATS KV for cluster-wide consistency
  • 14.4 Implement Swarm placement constraint generator

    • Generate --constraint node.labels.tenant_range==<range> from config
    • Support dynamic constraint updates
  • 14.5 Create Gateway routing configuration

    • Tenant → service endpoint mapping
    • Load balancer integration (traefik/nginx)
    • Route updates without Gateway restart
  • 14.6 Implement graceful tenant migration

    • Drain consumer for tenant before migration
    • Data copy verification
    • Routing table atomic swap
    • Resume consumer on new node
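
The consistent-hashing behaviour exercised by T14.2 and T14.5 can be sketched with a sorted map of virtual-node hash points. This is a minimal illustration matching the constructor/get_node/add_node shape used in the tests; a real implementation should pick a stable hash function (std's DefaultHasher is not guaranteed stable across Rust releases):

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal consistent-hash ring: each physical node contributes `vnodes`
/// points on the ring; a key maps to the first point clockwise from its hash.
struct HashRing {
    ring: BTreeMap<u64, String>, // hash point -> physical node
    vnodes: usize,
}

impl HashRing {
    fn new(nodes: Vec<&str>, vnodes: usize) -> Self {
        let mut r = HashRing { ring: BTreeMap::new(), vnodes };
        for n in nodes {
            r.add_node(n);
        }
        r
    }

    fn add_node(&mut self, node: &str) {
        for i in 0..self.vnodes {
            self.ring
                .insert(Self::hash(&format!("{node}#{i}")), node.to_string());
        }
    }

    fn get_node(&self, key: &str) -> &str {
        let h = Self::hash(key);
        // First point at or after the key's hash, wrapping to the ring start.
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, node)| node.as_str())
            .expect("ring must not be empty")
    }

    fn hash(s: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        s.hash(&mut hasher);
        hasher.finish()
    }
}
```

Because only the points owned by a newly added node change hands, adding `node-c` moves a tenant either nowhere or onto `node-c` — which is exactly what T14.5 asserts.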

Tests

  • T14.1 Stack file valid

    docker stack config -c swarm/stacks/platform.yml
    
  • T14.2 Hash ring distributes tenants evenly

    #[test]
    fn hash_ring_distribution() {
        let ring = HashRing::new(vec!["node-a", "node-b", "node-c"], 100);
        let tenants: Vec<_> = (0..300).map(|i| format!("tenant-{}", i)).collect();
        let distribution: HashMap<_, _> = tenants.iter()
            .map(|t| (ring.get_node(t), 1))
            .fold(HashMap::new(), |mut acc, (node, _)| {
                *acc.entry(node).or_insert(0) += 1;
                acc
            });
    
        let counts: Vec<_> = distribution.values().collect();
        let max = *counts.iter().max().unwrap();
        let min = *counts.iter().min().unwrap();
        assert!(max - min <= 30, "Distribution too uneven: {:?}", distribution);
    }
    
  • T14.3 Tenant placement config loads

    #[test]
    fn tenant_placement_config() {
        let config = TenantPlacementConfig::from_yaml(r#"
        tenants:
          acme-corp: node-a
          globex: node-b
        "#);
        assert_eq!(config.get_node(&TenantId::new("acme-corp")), Some("node-a"));
    }
    
  • T14.4 Placement constraint generated correctly

    #[test]
    fn placement_constraint() {
        let gen = ConstraintGenerator::new();
        let constraints = gen.generate(&TenantRange::new("a", "m"));
        assert!(constraints.contains(&"node.labels.tenant_range==a-m".to_string()));
    }
    
  • T14.5 Hash ring rebalances on node add

    #[test]
    fn ring_rebalance_on_add() {
        let mut ring = HashRing::new(vec!["node-a", "node-b"], 100);
        let before = ring.get_node("tenant-x");
        ring.add_node("node-c");
        let after = ring.get_node("tenant-x");
        assert!(after == before || after == "node-c");
    }
    
  • T14.6 Tautological test: Stack services count

    #[test]
    fn stack_has_services() {
        let stack = include_str!("../../swarm/stacks/platform.yml");
        assert!(stack.contains("aggregate"));
    }
    

Milestone 15: Admin Endpoints

Goal: Minimal admin endpoints for the Aggregate container to support external scaling and monitoring.

Dependencies

  • Milestone 14 (Docker Swarm Deployment)

Tasks

  • 15.1 Implement /health endpoint

    • Returns container health status
    • Includes: NATS connection, edge-storage connection, active aggregates count
    • Used by Swarm health check and load balancer
  • 15.2 Implement /ready endpoint

    • Returns readiness for receiving commands
    • Checks: config loaded, NATS consumer ready, storage initialized
  • 15.3 Implement /metrics endpoint (Prometheus format)

    • Expose existing metrics for scraping
    • Include tenant_id labels for per-tenant visibility
    • Aggregate-level metrics: command count, latency, errors, version conflicts
  • 15.4 Implement /admin/tenants endpoint (read-only)

    • List tenants currently hosted on this node
    • Returns: tenant_id, aggregate count, last activity timestamp
    • Used by external control node for discovery
  • 15.5 Implement graceful drain endpoint /admin/drain

    • POST to initiate graceful shutdown of specific tenant
    • Stops consumer for tenant, waits for in-flight commands
    • Returns when safe to migrate
  • 15.6 Implement config reload endpoint /admin/reload

    • POST to reload tenant placement config from NATS KV
    • Zero-downtime routing update
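
For task 15.3, a per-tenant counter rendered in the Prometheus text exposition format looks like the sketch below. The metric and label names are illustrative; the real values should come from the existing metrics layer rather than a hand-rolled formatter:

```rust
// Hypothetical: one sample line in Prometheus text format, with the
// tenant_id label task 15.3 calls for.
fn render_counter(metric: &str, tenant: &str, value: u64) -> String {
    format!("{metric}{{tenant_id=\"{tenant}\"}} {value}")
}
```

This produces lines such as `aggregate_commands_total{tenant_id="acme-corp"} 3`, which satisfies both substring checks in test T15.3.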

Tests

  • T15.1 Health endpoint returns status

    #[tokio::test]
    async fn health_endpoint() {
        let server = AdminServer::new_test().await;
        let resp = server.get("/health").await;
        assert!(resp.status().is_success());
        let health: HealthStatus = resp.json().await;
        assert!(health.nats_connected);
        assert!(health.storage_connected);
    }
    
  • T15.2 Ready endpoint checks consumers

    #[tokio::test]
    async fn ready_endpoint() {
        let server = AdminServer::new_test().await;
        let resp = server.get("/ready").await;
        assert!(resp.status().is_success());
    }
    
  • T15.3 Metrics in Prometheus format

    #[tokio::test]
    async fn metrics_prometheus_format() {
        let server = AdminServer::new_test().await;
        let resp = server.get("/metrics").await;
        let body = resp.text().await;
        assert!(body.contains("aggregate_commands_total"));
        assert!(body.contains("tenant_id"));
    }
    
  • T15.4 Tenants list returns hosted tenants

    #[tokio::test]
    async fn tenants_list() {
        let server = AdminServer::new_test().await;
        let resp = server.get("/admin/tenants").await;
        let tenants: Vec<TenantInfo> = resp.json().await;
        assert!(tenants.iter().any(|t| t.tenant_id == TenantId::new("test-tenant")));
    }
    
  • T15.5 Drain waits for in-flight commands

    #[tokio::test]
    async fn drain_waits() {
        let server = AdminServer::new_test().await;
        server.start_command_processing().await;
    
        let start = Instant::now();
        let resp = server.post("/admin/drain", json!({"tenant_id": "test-tenant"})).await;
        assert!(start.elapsed() < Duration::from_secs(5));
        assert!(resp.status().is_success());
    }
    
  • T15.6 Config reload updates routing

    #[tokio::test]
    async fn config_reload() {
        let server = AdminServer::new_test().await;
        server.update_nats_kv_config(json!({"tenants": {"new-tenant": "node-a"}})).await;
    
        let resp = server.post("/admin/reload", json!({})).await;
        assert!(resp.status().is_success());
    
        let tenants = server.get_hosted_tenants().await;
        assert!(tenants.contains(&TenantId::new("new-tenant")));
    }
    
  • T15.7 Tautological test: AdminServer is Send

    #[test]
    fn admin_server_is_send() {
        fn assert_send<T: Send>() {}
        assert_send::<AdminServer>();
    }
    

Progress Tracking

| Milestone                    | Status      | Tests Passing |
|------------------------------|-------------|---------------|
| 1. Project Foundation        | Complete    |               |
| 2. Core Types                | Complete    |               |
| 3. Configuration             | Complete    |               |
| 4. Storage Layer             | Complete    |               |
| 5. Event Stream              | Complete    |               |
| 6. Runtime Function          | Complete    |               |
| 7. State Machine             | Complete    |               |
| 8. Command Handler           | Complete    |               |
| 9. Observability             | Complete    |               |
| 10. Gateway Integration      | Complete    |               |
| 11. Integration Tests        | Complete    | 19            |
| 12. Query Engine Integration | Not Started |               |
| 13. Container & Deployment   | Not Started |               |
| 14. Docker Swarm Deployment  | Not Started |               |
| 15. Admin Endpoints          | Not Started |               |

Note: Admin UI (Web Frontend) will be implemented in a separate repository.


Quick Reference

Run all tests

cargo test --all

Run tests for specific milestone

cargo test --lib types::
cargo test --lib storage::

Check test coverage

cargo tarpaulin --out Html

Lint check

cargo clippy --all-targets --all-features -- -D warnings