cloudlysis/projection/prd.md
Vlad Durnea 1298d9a3df
Monorepo consolidation: workspace, shared types, transport plans, docker/swam assets
2026-03-30 11:40:42 +03:00

The Projection is the "Read Side" of your CQRS (Command Query Responsibility Segregation) architecture. While Aggregates focus on writing valid data, Projections focus on reading and formatting that data for the end-user or application.

In your framework, Projections are event-driven views that transform the stream of facts from NATS JetStream into highly optimized, queryable state in edge-storage KvStore, queryable via the embedded query-engine (UQF).


🧱 Component: Projection (Read Model)

Definition:
A Projection is a standalone Rust-based container that consumes Events from NATS JetStream and incrementally updates one or more "Read Models" in edge-storage. Its sole purpose is to provide a high-performance, pre-computed view of the system state that is optimized for specific queries, bypassing the need to rehydrate Aggregate state or replay event streams at query time.

Multi-Tenancy:
The Projection supports optional multi-tenancy via tenant_id. When enabled:

  • Subject Naming: JetStream subjects include tenant_id (e.g., tenant.<tenant_id>.aggregate.<aggregate_type>.<aggregate_id>)
  • Storage Namespacing: Views and checkpoints are namespaced by tenant_id to prevent cross-tenant reads
  • Query Isolation: Queries are tenant-scoped (e.g., x-tenant-id header) and only scan tenant-prefixed keys
  • Backward Compatibility: Deployments without multi-tenancy use a default/empty tenant_id
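
As a minimal sketch of the naming rules above, the helpers below build a tenant-scoped subject, a wildcard filter, and a key prefix. The function names and the `"default"` fallback value are assumptions for illustration; the PRD only says a default/empty tenant_id is used, and an empty token would not form a valid NATS subject.

```rust
/// Tenant used when multi-tenancy is disabled (assumed value, not from the PRD).
const DEFAULT_TENANT: &str = "default";

/// Resolve an optional tenant to the identifier used in subjects and keys.
fn effective_tenant(tenant_id: Option<&str>) -> &str {
    match tenant_id {
        Some(t) if !t.is_empty() => t,
        _ => DEFAULT_TENANT,
    }
}

/// Subject for one aggregate instance:
/// tenant.<tenant_id>.aggregate.<aggregate_type>.<aggregate_id>
fn event_subject(tenant_id: Option<&str>, aggregate_type: &str, aggregate_id: &str) -> String {
    format!(
        "tenant.{}.aggregate.{}.{}",
        effective_tenant(tenant_id),
        aggregate_type,
        aggregate_id
    )
}

/// Wildcard filter for all aggregate events of one tenant.
/// In NATS, `>` matches one or more trailing tokens.
fn tenant_filter(tenant_id: Option<&str>) -> String {
    format!("tenant.{}.aggregate.>", effective_tenant(tenant_id))
}

/// Key prefix that a tenant-scoped query would scan for one view_type.
fn view_prefix(tenant_id: Option<&str>, view_type: &str) -> String {
    format!("view:{}:{}:", effective_tenant(tenant_id), view_type)
}
```

The trailing `:` in the prefix keeps a scan for `UserDashboard` from also matching a view_type such as `UserDashboardV2`.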

Dependencies:

  • Core crates pulled from the custom Cargo registry:

    [registries.madapes]
    index = "sparse+https://git.madapes.com/api/packages/madapes/cargo/"
    
    Crate             Purpose
    edge-storage      libmdbx-backed KvStore for durable view storage
    runtime-function  Deterministic DAG execution for project programs
    edge-logger       High-performance logging (UDS + Protobuf, Loki sink)
    query-engine      UQF query support for filtering/querying view state
    async-nats        NATS JetStream client for event consumption
  • Source code available at ../../madapes/

  • Note: This is a standalone container — it does not use event-bus or gRPC Consume/FetchBatch APIs

1. Core Responsibilities

  • Event Consumption: Subscribes to one or more JetStream subjects (typically Aggregate event subjects) using a durable consumer, filtering with subject wildcards.
  • State Transformation: Uses a project program (runtime-function DAG) to map an incoming event to a state change (e.g., IncrementCounter, UpdateUserEmail, AddToList).
  • Read Model Persistence: Stores the resulting "View" in edge-storage KvStore as a JSON document, keyed by view:{tenant_id}:{view_type}:{view_id} (e.g., view:tenant_a:UserDashboard:user_123).
  • Query Serving: Provides read access via query-engine UQF queries. The existing KvStore::query() integration performs prefix scans and applies UQF filters/sorts.
  • Checkpointing: Tracks its stream position (JetStream stream sequence) in edge-storage KvStore (key: checkpoint:{tenant_id}:{view_type}) to resume correctly after a restart.
  • Safe Acknowledgement: Acks JetStream messages only after the view update and checkpoint are durably committed.

2. The Lifecycle of a Projection Update

  1. Ingestion: The Projection receives a JetStream message whose payload is a FrameworkEnvelope (or equivalent event envelope). It extracts the message metadata (at minimum, the JetStream stream sequence) used for idempotency.
  2. Context Loading:
    • The Projection fetches the current "View" from edge-storage KvStore (e.g., kv.get("view:tenant_a:UserDashboard:user_123")).
  3. Transformation (runtime-function):
    • It executes the project DAG program: (current_view_state, incoming_event) → new_view_state.
    • Alternatively, it can use KvStore::query() (with query-engine UQF) to perform cross-projection lookups to build the new state.
  4. Atomic Update:
    • The Projection saves the new_view_state back to edge-storage KvStore.
    • Critical: It must save the checkpoint (JetStream stream sequence) as part of the same MDBX transaction (e.g., kv.put_sync("checkpoint:tenant_a:UserDashboard", stream_sequence)). This ensures crash-recovery correctness.
  5. Acknowledge: After the transaction commits, the Projection acks the JetStream message so it will not be redelivered.
  6. Query Availability: The updated state is immediately available for applications to query via query-engine UQF queries.
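
The steps above can be sketched end to end with an in-memory stand-in. Everything here is hypothetical: `MemKv` is not the edge-storage API, `project` is a plain function rather than a runtime-function DAG, and the view is a bare counter instead of a JSON document. The point is only the ordering: check the checkpoint, load, transform, commit view and checkpoint together, then let the caller ack.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for edge-storage KvStore (hypothetical, not the real crate API).
#[derive(Default)]
struct MemKv {
    entries: BTreeMap<String, String>,
}

impl MemKv {
    fn get(&self, key: &str) -> Option<&String> {
        self.entries.get(key)
    }

    /// Apply all writes together, standing in for one MDBX write transaction.
    fn commit(&mut self, writes: Vec<(String, String)>) {
        for (key, value) in writes {
            self.entries.insert(key, value);
        }
    }
}

/// Minimal event: the JetStream stream sequence, the target view, and a counter delta.
struct Event {
    stream_sequence: u64,
    view_id: String,
    delta: i64,
}

/// Both outcomes are safe to ack: the state is committed or was committed earlier.
#[derive(Debug, PartialEq)]
enum Outcome {
    Applied,
    Duplicate,
}

/// Stand-in for the project program: (current_view_state, incoming_event) -> new_view_state.
fn project(current: Option<i64>, event: &Event) -> i64 {
    current.unwrap_or(0) + event.delta
}

fn handle(kv: &mut MemKv, tenant_id: &str, view_type: &str, event: &Event) -> Outcome {
    let checkpoint_key = format!("checkpoint:{}:{}", tenant_id, view_type);
    let view_key = format!("view:{}:{}:{}", tenant_id, view_type, event.view_id);

    // 1. Ingestion: skip anything at or below the stored checkpoint.
    let checkpoint: u64 = kv
        .get(&checkpoint_key)
        .and_then(|s| s.parse().ok())
        .unwrap_or(0);
    if event.stream_sequence <= checkpoint {
        return Outcome::Duplicate;
    }

    // 2. Context loading: read the current view, if any.
    let current = kv.get(&view_key).and_then(|s| s.parse::<i64>().ok());

    // 3. Transformation: run the project program.
    let next = project(current, event);

    // 4. Atomic update: view and checkpoint go into the same commit.
    kv.commit(vec![
        (view_key, next.to_string()),
        (checkpoint_key, event.stream_sequence.to_string()),
    ]);

    // 5. The caller acks the JetStream message only after this returns.
    Outcome::Applied
}
```

A redelivered event returns `Outcome::Duplicate` and leaves the view untouched, which is what allows the ack to happen after the commit without risking double application.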

3. Technical Constraints & Guarantees

  • Eventual Consistency: Projections are inherently "behind" the Aggregate. The delay between an event being committed and the Projection reflecting that change is usually sub-second, but it is not bounded.
  • Idempotency: Since JetStream provides at-least-once delivery, the Projection must use its stored Checkpoint (stream sequence) to ignore events it has already processed.
  • Disposable & Rebuildable: Because JetStream is a durable log, Projections are "disposable." If a business requirement changes, you can delete a Projection's KV entries, create a new runtime-function program, and replay the entire history from JetStream (starting from sequence 1) to build a new view from scratch.
  • Read-Only: Projections never produce events or commands. They are strictly "sinks" for data.
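
The "Disposable & Rebuildable" property can be illustrated with a small in-memory sketch. The `Kv` alias, the `wipe` and `rebuild` helpers, and the `(stream_sequence, view_id, delta)` log shape are assumptions made for the example; a real rebuild would delete keys in edge-storage and replay from JetStream.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for the KvStore (hypothetical).
type Kv = BTreeMap<String, String>;

/// Delete every view key and the checkpoint for one tenant_id/view_type.
fn wipe(kv: &mut Kv, tenant_id: &str, view_type: &str) {
    let prefix = format!("view:{}:{}:", tenant_id, view_type);
    kv.retain(|key, _| !key.starts_with(&prefix));
    kv.remove(&format!("checkpoint:{}:{}", tenant_id, view_type));
}

/// Wipe the projection, then replay the log from sequence 1 with a new program.
/// `log` entries are (stream_sequence, view_id, delta) in stream order.
fn rebuild(
    kv: &mut Kv,
    tenant_id: &str,
    view_type: &str,
    log: &[(u64, &str, i64)],
    program: fn(i64, i64) -> i64,
) {
    wipe(kv, tenant_id, view_type);
    for &(stream_sequence, view_id, delta) in log {
        let key = format!("view:{}:{}:{}", tenant_id, view_type, view_id);
        let current = kv.get(&key).and_then(|v| v.parse().ok()).unwrap_or(0);
        kv.insert(key, program(current, delta).to_string());
        kv.insert(
            format!("checkpoint:{}:{}", tenant_id, view_type),
            stream_sequence.to_string(),
        );
    }
}
```

For example, `rebuild(&mut kv, "tenant_a", "Totals", &log, |current, delta| current + 2 * delta)` discards whatever the old program wrote for `tenant_a` and leaves other tenants' keys alone.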

4. Replay & Recovery Model

  • Catch-up Mode: When a new Projection is deployed (no checkpoint exists), it starts from the beginning of the JetStream stream (sequence 1) and consumes as fast as possible until it reaches the tail.
  • Live Mode: Once caught up, it continues consuming in real time using the same durable consumer, relying on JetStream acks/redelivery for reliability.
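
A minimal sketch of how the two modes could be derived from the checkpoint and the stream tail. `Mode`, `start_sequence`, and `mode` are hypothetical names, and how the start position is handed to the JetStream consumer is left out.

```rust
#[derive(Debug, PartialEq)]
enum Mode {
    CatchUp,
    Live,
}

/// First stream sequence to consume: 1 when no checkpoint exists,
/// otherwise the sequence right after the checkpoint.
fn start_sequence(checkpoint: Option<u64>) -> u64 {
    match checkpoint {
        Some(sequence) => sequence + 1,
        None => 1,
    }
}

/// The Projection is catching up while unprocessed messages remain in the
/// stream, and live once the next sequence lies past the current tail.
fn mode(next_sequence: u64, last_stream_sequence: u64) -> Mode {
    if next_sequence <= last_stream_sequence {
        Mode::CatchUp
    } else {
        Mode::Live
    }
}
```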

5. Snapshots (Relationship to Aggregates)

The Projection does not require Aggregate snapshots to function, because its source of truth for changes is the JetStream event stream. However, snapshots are still relevant in two ways:

  • Aggregate Snapshots (Write Side): Aggregates persist versioned snapshots in edge-storage AggregateStore to speed up Aggregate rehydration. These snapshots are not a read API for projections and should not be treated as a substitute for consuming events.
  • Projection State (Read Side): A Projection's stored View in edge-storage KvStore is effectively its own “snapshot” of the read model at a specific checkpoint (stream sequence).
  • Fast Recovery: On restart, the Projection loads checkpoint:{tenant_id}:{view_type}, resumes JetStream consumption from the next sequence, and continues updating existing View records in place. No replay is required unless the checkpoint is missing or the view schema/logic has changed.
  • Optional Seeding: For very large histories, a Projection may optionally seed an initial View state from a recent Aggregate snapshot or an external export, then set its checkpoint to a known JetStream stream sequence and continue consuming events forward from that point. This preserves incremental correctness while reducing rebuild time.
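
Optional seeding can be sketched as writing the initial views and the checkpoint together, then resuming from the next sequence. The `seed` helper and the refusal to seed over an existing checkpoint are assumptions for illustration, not behaviour stated in this PRD.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for the KvStore (hypothetical).
type Kv = BTreeMap<String, String>;

/// Write seeded views plus the checkpoint, and return the next sequence to consume.
/// `views` holds (view_id, json) pairs taken from a snapshot or external export.
fn seed(
    kv: &mut Kv,
    tenant_id: &str,
    view_type: &str,
    seed_sequence: u64,
    views: &[(&str, &str)],
) -> Result<u64, String> {
    let checkpoint_key = format!("checkpoint:{}:{}", tenant_id, view_type);
    // Assumed safeguard: seeding over a live projection would mix two histories.
    if kv.contains_key(&checkpoint_key) {
        return Err(format!("{} already exists; refusing to seed", checkpoint_key));
    }
    for &(view_id, json) in views {
        kv.insert(
            format!("view:{}:{}:{}", tenant_id, view_type, view_id),
            json.to_string(),
        );
    }
    // In a real store this write would share a transaction with the views above.
    kv.insert(checkpoint_key, seed_sequence.to_string());
    Ok(seed_sequence + 1)
}
```

The seeded state must reflect every event up to and including `seed_sequence`; otherwise events between the export and the checkpoint are silently lost.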

6. Hot Provisioning (Rolling Scale + Rolling Upgrades)

Projections are designed to be provisioned and updated without downtime.

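  • Hot Scale-Out: Multiple Projection replicas can run concurrently per tenant_id and view_type. JetStream consumer configuration is used so that each event is delivered to one replica at a time within a replica set. Because delivery is at-least-once, a redelivery may reach a different replica, so the idempotency rules in section 3 still apply.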
  • Hot Restart: A restarted instance resumes from the persisted checkpoint and continues consumption; recovery time is proportional to the gap between the checkpoint and the stream tail.
  • Hot Upgrade (Projection Logic): To change a project program safely:
    • Deploy a new Projection version under a new view_type (or view_type + version suffix) with its own checkpoint.
    • Backfill by consuming from sequence 1 (or from a chosen seed sequence) until caught up.
    • Switch query routing from the old view keys to the new view keys.
    • Retire old view data and checkpoint after the cutover.
  • In-Place Migration: If the schema change is backward compatible, a Projection may evolve the stored View shape incrementally while processing events, but this requires strict versioning in the View payload.
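
The cutover step of a hot upgrade can be sketched as a router that only switches view_type once the new Projection has caught up. `QueryRouter` and its methods are hypothetical, and the "checkpoint has reached the stream tail" rule is one possible readiness check rather than a requirement from this PRD.

```rust
/// Tracks which view_type queries are currently routed to.
struct QueryRouter {
    active_view_type: String,
}

impl QueryRouter {
    fn new(view_type: &str) -> Self {
        QueryRouter {
            active_view_type: view_type.to_string(),
        }
    }

    /// Key that a query for `view_id` should read right now.
    fn key_for(&self, tenant_id: &str, view_id: &str) -> String {
        format!("view:{}:{}:{}", tenant_id, self.active_view_type, view_id)
    }

    /// Switch to the candidate view_type only when its checkpoint has reached
    /// the last stream sequence. Returns whether the switch happened.
    fn try_cutover(
        &mut self,
        candidate_view_type: &str,
        candidate_checkpoint: u64,
        last_stream_sequence: u64,
    ) -> bool {
        if candidate_checkpoint < last_stream_sequence {
            return false;
        }
        self.active_view_type = candidate_view_type.to_string();
        true
    }
}
```

Old view keys and the old checkpoint stay in place until after the switch, so a failed cutover can simply keep serving the previous version.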

7. Caveats & Operational Notes

  • Ordering Guarantees: JetStream preserves ordering per stream, but if the Projection processes messages concurrently it can violate per-entity ordering. If ordering matters for a view_id, enforce per-key serialization in the Projection.
  • At-Least-Once Reality: Redeliveries can happen (network splits, ack timeouts, restarts). The Projection must be idempotent via checkpoint checks and/or per-event dedupe keyed by stream sequence.
  • Ack Discipline: Never ack before the MDBX transaction commits. Treat “view update + checkpoint update + ack” as one logical commit.
  • Poison Messages: A single malformed event or incompatible schema can stall a durable consumer. Define a policy for retries, quarantine, and alerting (including whether to skip and record the failure).
  • Schema Evolution: Projection logic must be able to handle old event versions or explicitly version the stream/subjects. Projection View schemas also need versioning if you support in-place migrations.
  • Backpressure & Lag: Catch-up replays can saturate storage and CPU. Monitor consumer lag, redeliveries, and processing latency; apply limits (max in-flight, batching) to protect the node.
  • Rebuild Semantics: Rebuilds must delete both View keys and checkpoints for the target tenant_id/view_type. Partial deletes can create “mixed era” views.
  • Cross-View Lookups: Using KvStore::query() to join across projections is convenient but can amplify read load and introduce consistency anomalies between views. Prefer event-local computation when possible.
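
One possible shape for the poison-message policy mentioned above is a bounded retry followed by quarantine. The `Action` enum, the `max_deliveries` threshold, and the `quarantine:` key layout are all assumptions for illustration; the PRD leaves the policy open.

```rust
#[derive(Debug, PartialEq)]
enum Action {
    /// Leave the message unacked so JetStream redelivers it.
    Retry,
    /// Record the failure, advance the checkpoint, and ack so the consumer can move on.
    Quarantine,
}

/// Decide what to do after a processing failure.
/// `delivery_count` is how many times this message has been delivered, including this attempt.
fn on_failure(delivery_count: u64, max_deliveries: u64) -> Action {
    if delivery_count < max_deliveries {
        Action::Retry
    } else {
        Action::Quarantine
    }
}

/// Key under which a quarantined event could be recorded for later inspection
/// (hypothetical naming, following the tenant/view_type pattern used for views).
fn quarantine_key(tenant_id: &str, view_type: &str, stream_sequence: u64) -> String {
    format!("quarantine:{}:{}:{}", tenant_id, view_type, stream_sequence)
}
```

Quarantining trades completeness for liveness: the view keeps advancing, but it no longer reflects the skipped event until someone replays or repairs it.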

8. Data Structure (The View Envelope)

  • view_id: The unique key for the record (e.g., user_id). Used in KvStore key: view:{tenant_id}:{view_type}:{view_id}.
  • view_type: The name of the projection (e.g., active_users_list).
  • last_event_sequence: The checkpoint (JetStream stream sequence) of the last event processed. Stored separately in checkpoint:{tenant_id}:{view_type}.
  • data: The actual payload (JSON) optimized for the UI or API, stored as the KvStore value.
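
The envelope fields above map naturally onto a plain struct. This is a sketch only: keeping `data` as a JSON string avoids pulling in a serialization crate, and carrying `last_event_sequence` inside the struct, alongside the separately stored checkpoint, is an assumption about how the two relate.

```rust
/// View envelope with the four fields listed above.
#[derive(Debug, Clone, PartialEq)]
struct ViewEnvelope {
    view_id: String,
    view_type: String,
    /// JetStream stream sequence of the last event applied.
    last_event_sequence: u64,
    /// JSON payload as text, stored as the KvStore value.
    data: String,
}

impl ViewEnvelope {
    /// KvStore key for this record: view:{tenant_id}:{view_type}:{view_id}
    fn key(&self, tenant_id: &str) -> String {
        format!("view:{}:{}:{}", tenant_id, self.view_type, self.view_id)
    }

    /// Key of the checkpoint shared by all records of this view_type.
    fn checkpoint_key(&self, tenant_id: &str) -> String {
        format!("checkpoint:{}:{}", tenant_id, self.view_type)
    }
}
```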

💡 Key Distinction for your PRD:

In your framework, the Projection is where the "Distributed" part of the system becomes visible to the user.

  • Aggregates are for Consistency (The Truth).
  • Projections are for Performance (The Speed).