cloudlysis/aggregate/prd.md
Vlad Durnea 1298d9a3df — "Monorepo consolidation: workspace, shared types, transport plans, docker/swarm assets" (2026-03-30 11:40:42 +03:00)

🧱 Component: Aggregate

Definition:
The Aggregate is a standalone Rust-based container that serves as the primary consistency boundary and decision-making unit of the system. It is a stateful entity that encapsulates business logic, enforces invariants, and ensures that every change to the system is valid according to defined rules. It receives Commands from users through a Gateway, stores Events on NATS JetStream, and keeps versioned snapshots in the edge-storage AggregateStore for efficient rehydration.

Multi-Tenancy:
The Aggregate supports optional multi-tenancy via tenant_id. When enabled:

  • Routing: The Gateway routes commands to Aggregate nodes based on the x-tenant-id header
  • Sharding: Aggregate instances are sharded across nodes by tenant_id, ensuring tenant data isolation
  • Storage: Snapshots and events are namespaced by tenant_id to prevent cross-tenant access
  • Subject Naming: NATS subjects include tenant_id (e.g., tenant.<tenant_id>.aggregate.<aggregate_type>.<aggregate_id>)
  • Backward Compatibility: Aggregates without multi-tenancy use a default/empty tenant_id
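The subject-naming and backward-compatibility rules above can be sketched as a small helper. This is illustrative, not the real routing code; `DEFAULT_TENANT` and the function name are assumptions, while the subject shape comes from the convention stated above.

```rust
// Tenant-aware NATS subject naming, following the
// tenant.<tenant_id>.aggregate.<aggregate_type>.<aggregate_id> convention.
// DEFAULT_TENANT is a hypothetical placeholder for single-tenant deployments.

const DEFAULT_TENANT: &str = "default";

fn aggregate_subject(tenant_id: Option<&str>, aggregate_type: &str, aggregate_id: &str) -> String {
    // Aggregates without multi-tenancy fall back to the default tenant.
    let tenant = tenant_id.unwrap_or(DEFAULT_TENANT);
    format!("tenant.{tenant}.aggregate.{aggregate_type}.{aggregate_id}")
}

fn main() {
    println!("{}", aggregate_subject(Some("acme"), "Order", "o-42"));
}
```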

Dependencies:

  • Core crates pulled from the custom Cargo registry:

    [registries.madapes]
    index = "sparse+https://git.madapes.com/api/packages/madapes/cargo/"
    
    Crate             Purpose
    edge-storage      libmdbx-backed AggregateStore for versioned snapshots
    runtime-function  Deterministic DAG execution for decide/apply programs
    edge-logger       High-performance logging (UDS + Protobuf, Loki sink)
    query-engine      UQF query support for filtering/querying aggregate state
    async-nats        NATS JetStream client for event streaming
  • Source code available at ../../madapes/

  • Note: This is a standalone container — it does not use framework-bus or framework-aggregate (those serve a different system)

Observability:

  • Production stack: Grafana + Victoria Metrics + Loki
  • edge-logger provides structured logging via Unix Domain Sockets with lock-free batching
  • Metrics exposed via metrics-exporter-prometheus for Victoria Metrics scraping
  • Traces/logs flow to Loki with cardinality protection and multi-tenant isolation
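To show what tenant-labeled metrics look like on the wire, here is a hand-rolled Prometheus exposition line. The real container uses metrics-exporter-prometheus; this sketch only illustrates the label shape, and the metric name is a hypothetical example.

```rust
// Render a Prometheus text-format sample with a tenant_id label, as the
// /metrics endpoint would expose for Victoria Metrics scraping.
// This is a format illustration only, not the exporter's actual code path.

fn metric_line(name: &str, tenant_id: &str, value: u64) -> String {
    format!("{name}{{tenant_id=\"{tenant_id}\"}} {value}")
}

fn main() {
    println!("{}", metric_line("aggregate_commands_total", "acme", 42));
}
```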

1. Core Responsibilities

  • Command Validation: Receives intent (Commands) from the Gateway and uses runtime-function DAG programs to determine if the intent is valid based on the current state.
  • State Rehydration: Reconstructs its internal state by loading the latest snapshot from edge-storage AggregateStore (get_latest_snapshot) and replaying any subsequent events from NATS JetStream.
  • Event Production: Transforms valid commands into one or more Events that represent a "fact" that has occurred.
  • Atomic Persistence: Publishes new events to NATS JetStream and stores an updated snapshot in edge-storage AggregateStore (put_snapshot_sync).
  • Concurrency Control: Protects against "lost updates" using version-based optimistic locking. edge-storage AggregateStore returns VersionConflict for duplicate versions.

2. The Lifecycle of a Command

  1. Reception: The Gateway routes a Command from a user to the Aggregate container based on the aggregate_id and x-tenant-id header. The tenant_id is extracted and included in the Command envelope for tenant-aware processing.
  2. Loading (Rehydration):
    • The Aggregate fetches the latest Snapshot from edge-storage AggregateStore using the composite key (tenant_id, aggregate_id).
    • It reads any Events from NATS JetStream (tenant-namespaced subject) that occurred after the snapshot version.
    • It applies these events sequentially to the snapshot state using the deterministic apply runtime-function program to reach the "Current State."
  3. Execution:
    • The Aggregate passes the Current State and the Command to the decide runtime-function program.
    • If invalid: Returns an Error (Command Rejected).
    • If valid: Returns a list of New Events.
  4. Persistence (The Commit):
    • The Aggregate publishes New Events to NATS JetStream on tenant-namespaced subjects, with command_id mapped to idempotency_key.
    • It stores an updated snapshot in edge-storage AggregateStore using (tenant_id, aggregate_id, new_version) as the composite key.
    • Constraint: AggregateStore enforces strict monotonicity — if new_version already exists, it returns VersionConflict, and the Aggregate must reload and retry.
  5. Publication:
    • Events published to NATS JetStream are immediately available for downstream consumption by Sagas and Projections (filtered by tenant if needed).
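The lifecycle above (rehydrate → decide → commit with optimistic locking) can be sketched end-to-end with an in-memory store. All types here are toy stand-ins, not the real edge-storage or runtime-function APIs: state is a bare account balance, and `handle` compresses steps 2–4 into one function.

```rust
// Minimal in-memory sketch of the command lifecycle. Names mirror the text
// (get_latest_snapshot, put_snapshot_sync, VersionConflict, decide, apply)
// but the implementations are illustrative only.

use std::collections::HashMap;

type State = i64; // toy aggregate state: an account balance
#[derive(Clone)]
struct Event { delta: i64 }
struct Command { delta: i64 }

#[derive(Default)]
struct Store {
    // (tenant_id, aggregate_id) → (version, snapshot)
    snapshots: HashMap<(String, String), (u64, State)>,
}

enum CommitError { VersionConflict }

impl Store {
    fn get_latest_snapshot(&self, key: &(String, String)) -> (u64, State) {
        self.snapshots.get(key).cloned().unwrap_or((0, 0))
    }
    fn put_snapshot_sync(&mut self, key: (String, String), version: u64, state: State)
        -> Result<(), CommitError>
    {
        // Strict monotonicity: a version that already exists is a conflict.
        let (current, _) = self.get_latest_snapshot(&key);
        if version <= current { return Err(CommitError::VersionConflict); }
        self.snapshots.insert(key, (version, state));
        Ok(())
    }
}

// decide: (state, command) → events[] — rejects overdrafts in this toy model.
fn decide(state: &State, cmd: &Command) -> Result<Vec<Event>, String> {
    if state + cmd.delta < 0 { return Err("Insufficient Funds".into()); }
    Ok(vec![Event { delta: cmd.delta }])
}

// apply: (state, event) → new_state
fn apply(state: State, ev: &Event) -> State { state + ev.delta }

fn handle(store: &mut Store, tenant: &str, agg: &str, cmd: Command) -> Result<u64, String> {
    let key = (tenant.to_string(), agg.to_string());
    let (version, snapshot) = store.get_latest_snapshot(&key); // 2. rehydration
    let events = decide(&snapshot, &cmd)?;                     // 3. execution
    let new_state = events.iter().fold(snapshot, apply);
    store.put_snapshot_sync(key, version + 1, new_state)       // 4. commit
        .map_err(|_| "VersionConflict".to_string())?;
    Ok(version + 1)
}

fn main() {
    let mut store = Store::default();
    let v = handle(&mut store, "acme", "acct-1", Command { delta: 100 }).unwrap();
    println!("committed version {v}");
}
```

In the real container, step 2 would also replay JetStream events newer than the snapshot, and step 5 (publication) falls out of the JetStream publish itself.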

3. Technical Constraints & Guarantees

  • Determinism: The logic within an Aggregate must be 100% deterministic. runtime-function DAG programs are sandboxed and gas-metered, with no access to the system clock, random number generators, or external APIs. All data required for a decision must be present in the Command or the Aggregate State.
  • Side-Effect Free: An Aggregate does not send emails, update databases, or call other services. It only produces events. Side effects are the responsibility of Sagas.
  • Single Writer: While multiple nodes may attempt to process commands for the same aggregate_id, only one "Commit" can succeed for a specific version, enforced by edge-storage AggregateStore (VersionConflict).
  • Tenant Isolation: An Aggregate can only access data within its tenant_id scope. Cross-tenant access is blocked at the storage and stream layers. The tenant_id is validated on every command to prevent tenant spoofing.
  • Isolation: An Aggregate cannot see the state of other Aggregates. If a business rule spans multiple Aggregates, it must be handled by a Saga.

4. Data Structure (The Envelope)

Each Aggregate maintains a metadata header:

  • tenant_id: Optional identifier for multi-tenant isolation (routed via x-tenant-id header)
  • aggregate_id: Unique UUID or URN for the instance.
  • aggregate_type: The name of the business entity (e.g., Account, Order).
  • version: A monotonically increasing integer representing the number of events processed.
  • snapshot_threshold: A configuration defining how many events should trigger a new snapshot in edge-storage.
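A plausible Rust shape for this metadata header follows. Field names track the list above; the concrete types and the `snapshot_due` helper (with its `last_snapshot_version` parameter) are assumptions for illustration.

```rust
// Hypothetical envelope metadata struct; types are illustrative.

#[derive(Debug, Clone)]
struct AggregateEnvelope {
    tenant_id: Option<String>,  // None ⇒ single-tenant / default tenant
    aggregate_id: String,       // unique UUID or URN
    aggregate_type: String,     // e.g. "Account", "Order"
    version: u64,               // monotonically increasing event count
    snapshot_threshold: u32,    // events between snapshots
}

impl AggregateEnvelope {
    // A new snapshot is due once snapshot_threshold events have been applied
    // since the last snapshot (last_snapshot_version is hypothetical).
    fn snapshot_due(&self, last_snapshot_version: u64) -> bool {
        self.version - last_snapshot_version >= self.snapshot_threshold as u64
    }
}

fn main() {
    let env = AggregateEnvelope {
        tenant_id: Some("acme".into()),
        aggregate_id: "urn:order:42".into(),
        aggregate_type: "Order".into(),
        version: 120,
        snapshot_threshold: 100,
    };
    println!("snapshot due: {}", env.snapshot_due(100));
}
```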

5. Error Handling

  • Validation Errors: Business rule violations (e.g., "Insufficient Funds") result in an immediate synchronous rejection of the command.
  • Tenant Access Errors: Cross-tenant access attempts (e.g., wrong tenant_id in command) are rejected with TenantAccessDenied.
  • Concurrency Conflicts: If edge-storage returns VersionConflict, the framework implements an automatic "Retry-on-Conflict" policy (Reload → Re-validate → Re-commit) up to a defined limit.
  • System Failures: If edge-storage or NATS JetStream is unavailable, the Aggregate remains in a read-only or "unavailable" state to prevent inconsistent branching of the event stream.
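The Retry-on-Conflict policy (Reload → Re-validate → Re-commit up to a limit) reduces to a small loop. `try_commit` here stands in for the full reload-and-commit pipeline and is an assumption; only the retry shape comes from the text.

```rust
// Sketch of Retry-on-Conflict: retry only on VersionConflict, up to a
// fixed limit; validation rejections return immediately.

const MAX_RETRIES: u32 = 3; // illustrative limit

#[derive(Debug, PartialEq)]
enum CommitError { VersionConflict, Rejected(String) }

fn handle_with_retry(mut try_commit: impl FnMut() -> Result<u64, CommitError>)
    -> Result<u64, CommitError>
{
    let mut attempts = 0;
    loop {
        match try_commit() {
            // A conflict means another writer won this version: reload and retry.
            Err(CommitError::VersionConflict) if attempts < MAX_RETRIES => {
                attempts += 1;
            }
            // Success, a validation rejection, or retries exhausted: return as-is.
            other => return other,
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = handle_with_retry(|| {
        calls += 1;
        if calls < 3 { Err(CommitError::VersionConflict) } else { Ok(7) }
    });
    println!("{result:?} after {calls} attempts");
}
```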

6. Horizontal Scaling Strategy

The Aggregate container is designed for horizontal scaling on Docker Swarm, leveraging tenant-based sharding for predictable data locality and simple operations.

Sharding Model:

  • Tenant-Aware Placement: Aggregate instances are placed on Swarm nodes based on tenant_id using Docker Swarm placement constraints
  • Consistent Hashing: A hash ring maps tenant_id values to specific nodes, ensuring all commands for a tenant route to the same node (or replica set)
  • Subject-Based Routing: NATS JetStream consumer groups are tenant-namespaced, enabling parallel processing across tenants without coordination
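The consistent-hashing bullet can be illustrated with a toy hash ring mapping tenant_id to a node. Real deployments would add virtual nodes and a hash that is stable across processes; `DefaultHasher` here is a deliberate simplification for the sketch.

```rust
// Toy consistent-hash ring: each node owns the arc of hash space up to its
// point; a tenant routes to the first node point at or after its own hash,
// wrapping around to the start of the ring.

use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

struct Ring { points: BTreeMap<u64, String> }

impl Ring {
    fn new(nodes: &[&str]) -> Self {
        Ring { points: nodes.iter().map(|n| (hash_of(n), n.to_string())).collect() }
    }
    fn node_for(&self, tenant_id: &str) -> &str {
        let h = hash_of(tenant_id);
        self.points.range(h..).next()
            .or_else(|| self.points.iter().next()) // wrap around
            .map(|(_, n)| n.as_str())
            .unwrap()
    }
}

fn main() {
    let ring = Ring::new(&["node-a", "node-b", "node-c"]);
    println!("tenant 'acme' → {}", ring.node_for("acme"));
}
```

The useful property is that adding or removing a node only remaps the tenants on the neighboring arc, which keeps tenant migrations small.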

Scaling Architecture:

┌─────────────────────────────────────────────────────────────────┐
│                      Admin UI (Control Node)                     │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Scale Manager: CRUD for tenant → node assignments      │    │
│  │  - List tenants, node assignments, load metrics         │    │
│  │  - Add/remove nodes, migrate tenants                    │    │
│  │  - Emit scaling commands to Docker Swarm API            │    │
│  └─────────────────────────────────────────────────────────┘    │
└──────────────────────────┬──────────────────────────────────────┘
                           │ Docker Swarm API / SSH
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Docker Swarm Cluster                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │  Node A      │  │  Node B      │  │  Node C      │           │
│  │  tenant: a-c │  │  tenant: d-m │  │  tenant: n-z │           │
│  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │           │
│  │  │Agg Ctr │  │  │  │Agg Ctr │  │  │  │Agg Ctr │  │           │
│  │  └───┬────┘  │  │  └───┬────┘  │  │  └───┬────┘  │           │
│  │      │       │  │      │       │  │      │       │           │
│  │  ┌───▼────┐  │  │  ┌───▼────┐  │  │  ┌───▼────┐  │           │
│  │  │libmdbx │  │  │  │libmdbx │  │  │  │libmdbx │  │           │
│  │  │(local) │  │  │  │(local) │  │  │  │(local) │  │           │
│  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
│         │                  │                  │                  │
│         └──────────────────┴──────────────────┘                  │
│                           │                                      │
│  ┌────────────────────────▼────────────────────────────────────┐ │
│  │                Shared NATS JetStream Cluster                │ │
│  │         (tenant-namespaced subjects for isolation)          │ │
│  └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Note: Each node has its own embedded edge-storage (libmdbx) containing snapshots for its assigned tenants. NATS JetStream provides shared event storage. Tenant migration requires snapshot data transfer between nodes.

Operational Model:

  • Scale Up: Admin UI calls Swarm API to add new node, updates tenant → node mapping, Gateway updates routing table
  • Scale Down: Migrate tenants to other nodes (drain), remove node from Swarm
  • Tenant Migration: Pause consumer, copy tenant data, update routing, resume on new node
  • Zero-Downtime: New tenant assignments are picked up by Gateway via config reload without restart

Placement Constraints:

  • Each Aggregate service runs with --constraint node.labels.tenant_range==<range>
  • Gateway uses tenant → node mapping to route commands to correct Swarm service endpoint
  • Multiple replicas per tenant range supported for HA (active-passive via NATS consumer groups)

Admin Endpoints (per Aggregate container):

  • /health - Container health (NATS, storage, active aggregates)
  • /ready - Readiness for receiving commands
  • /metrics - Prometheus metrics with tenant_id labels
  • /admin/tenants - List tenants hosted on this node (read-only)
  • /admin/drain - Graceful drain for tenant migration
  • /admin/reload - Hot-reload tenant placement config

External Control Node:

  • Separate service that calls Aggregate admin endpoints
  • Manages Docker Swarm API for scaling operations
  • Publishes tenant → node mapping to NATS KV
  • See Admin UI repository for full implementation

💡 Implementation Note:

The Aggregate Logic is a pair of runtime-function DAG programs:

  1. decide program: (state, command) → events[] — The business logic (validates command, produces events).
  2. apply program: (state, event) → new_state — The state transition logic (used during rehydration from snapshots + events).

These are referenced in the manifest as decide: and apply: fields under each aggregate definition.
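A manifest fragment might then look like the following; the decide:/apply: field placement is from the text above, while every other key and path is a hypothetical illustration.

```yaml
# Hypothetical manifest fragment — program paths are examples only.
aggregates:
  order:
    decide: programs/order_decide.dag   # (state, command) → events[]
    apply: programs/order_apply.dag     # (state, event) → new_state
```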