MadBase Caching Strategy

Overview

MadBase implements a two-tier caching architecture that maintains the simplicity of the 4-pillar system while providing enterprise-grade caching capabilities.

Architecture

Tier 1: L1 Cache (In-Memory)

  • Technology: moka (Rust)
  • Location: Proxy / Worker nodes
  • Purpose: Ultra-low latency for frequently accessed data
  • Typical Use Cases:
    • Project configurations
    • JWT validation cache
    • Hot database query results
    • API response caching
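
As a rough illustration, here is what an L1 lookup for a project configuration could look like with moka's async future::Cache (a sketch only; the ProjectConfig type, key, capacity, and TTL are placeholders rather than MadBase's actual values):

use std::time::Duration;
use moka::future::Cache;

// Hypothetical value type; the real config struct lives elsewhere in MadBase.
#[derive(Clone)]
struct ProjectConfig {
    name: String,
}

async fn l1_example() {
    // Bounded, TTL-based in-process cache held by each Proxy/Worker node.
    let l1: Cache<String, ProjectConfig> = Cache::builder()
        .max_capacity(10_000)
        .time_to_live(Duration::from_secs(60))
        .build();

    l1.insert(
        "project:demo".to_string(),
        ProjectConfig { name: "demo".to_string() },
    )
    .await;

    // Returns None once the entry has expired or been evicted.
    if let Some(config) = l1.get("project:demo").await {
        println!("cached project: {}", config.name);
    }
}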

Tier 2: L2 Cache (Redis)

  • Technology: Redis 7
  • Location: State Pillar (Pillar 3)
  • Purpose: Shared state across the entire cluster
  • Typical Use Cases:
    • Distributed session storage
    • Realtime presence tracking
    • Rate limiting counters
    • Distributed locking
    • Pub/Sub messaging
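
For comparison, a hedged sketch of an L2 round trip using the redis crate's async API (tokio feature enabled). The key and connection URL are illustrative; in MadBase this sits behind the session, presence, and rate-limit managers shown below:

use redis::AsyncCommands;

async fn l2_example() -> redis::RedisResult<()> {
    // Connect to the Redis instance exposed by the State Pillar.
    let client = redis::Client::open("redis://db:6379/0")?;
    let mut conn = client.get_multiplexed_async_connection().await?;

    // Shared state: every Proxy/Worker node in the cluster sees the same entry.
    conn.set_ex::<_, _, ()>("session:example-token", "user-123", 3600).await?;
    let owner: Option<String> = conn.get("session:example-token").await?;
    println!("session owner: {:?}", owner);

    Ok(())
}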

State Pillar Integration

The State Pillar (formerly "Database Pillar") now hosts both PostgreSQL and Redis:

┌─────────────────────────────────────────┐
│           State Pillar Node             │
├─────────────────────────────────────────┤
│  ┌──────────┐        ┌─────────────┐    │
│  │PostgreSQL│        │    Redis    │    │
│  │  :5432   │        │    :6379    │    │
│  └──────────┘        └─────────────┘    │
│         │                   │           │
│         └─────────┬─────────┘           │
│                   ▼                     │
│            ┌─────────────┐              │
│            │   HAProxy   │              │
│            │ :5433/:6379 │              │
│            └─────────────┘              │
└─────────────────────────────────────────┘

Why This Approach?

  1. Resource Symmetry: Both PostgreSQL and Redis are memory-intensive and share the same VPS requirements
  2. HA Piggybacking: Pillar 3 already manages HA via Patroni and etcd. Redis benefits from the same infrastructure
  3. Centralized State: Maintains clean separation of Compute (Worker/Proxy) vs. State (DB/Redis)
  4. Zero Added Complexity: No new pillar is needed; the existing one is simply enhanced

Features

1. Shared Auth Sessions

Users can now stay logged in even if the Proxy node handling their request changes:

use auth::SessionManager;

// Create a session
let session_token = session_manager
    .create_session(user_id, email, "authenticated".to_string())
    .await?;

// Validate on any proxy node
let session = session_manager
    .validate_session(&session_token)
    .await?;

2. Realtime Presence

Track "Who is online" across multiple Worker nodes:

use realtime::PresenceManager;

// User joins a channel
presence_manager
    .join_channel(user_id, "public-chat".to_string(), None)
    .await?;

// Get online count
let count = presence_manager
    .get_channel_online_count("public-chat".to_string())
    .await?;
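
Presence entries expire after 60 seconds (see Cache Keyspaces below), so connected clients are expected to heartbeat. One possible shape for that heartbeat, assuming per-user presence keys as listed in the keyspace table; the actual PresenceManager may track channel membership differently (e.g. via sets):

use redis::AsyncCommands;

/// Called periodically by each Worker while a user stays connected.
async fn presence_heartbeat(
    conn: &mut redis::aio::MultiplexedConnection,
    channel: &str,
    user_id: &str,
) -> redis::RedisResult<()> {
    let key = format!("presence:channel:{channel}:user:{user_id}");

    // Refreshing the 60-second TTL keeps the user "online"; if heartbeats
    // stop, the key expires and the user drops out of the channel naturally.
    conn.set_ex::<_, _, ()>(&key, 1u8, 60).await?;
    Ok(())
}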

3. Distributed Locking

Prevent race conditions during background operations:

use common::DistributedLock;

let lock = DistributedLock::new(
    redis_client,
    "migration:lock".to_string(),
    30, // 30 seconds TTL
);

if lock.acquire().await? {
    // Perform critical section
    lock.release().await?;
}
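
One common way to implement such a lock on Redis is a single SET with the NX and EX options; release then verifies an owner token (often via a small Lua script) before deleting the key. The following is a sketch of the acquire step under those assumptions, not necessarily how common::DistributedLock works internally:

use redis::aio::MultiplexedConnection;

/// Try to take "lock:{name}" for ttl_secs; returns true only if no one else holds it.
async fn try_acquire(
    conn: &mut MultiplexedConnection,
    name: &str,
    owner_token: &str,
    ttl_secs: u64,
) -> redis::RedisResult<bool> {
    // SET key value NX EX ttl: succeeds only when the key does not exist yet.
    let reply: Option<String> = redis::cmd("SET")
        .arg(format!("lock:{name}"))
        .arg(owner_token)
        .arg("NX")
        .arg("EX")
        .arg(ttl_secs)
        .query_async(conn)
        .await?;

    // "OK" when acquired, nil (None) when another node holds the lock.
    Ok(reply.is_some())
}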

4. Rate Limiting

Distributed rate limiting across all instances:

use gateway::rate_limit::RateLimitMiddleware;

// Check IP-based rate limit
if !middleware.check_ip(&user_ip).await? {
    return Err("Rate limit exceeded");
}
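
The distributed counters behind such a check are typically a Redis INCR plus a window expiry. Below is a minimal fixed-window sketch following the ratelimit:ip:{addr} convention from the Cache Keyspaces table; the limit and window values are illustrative, and the INCR/EXPIRE pair here is not atomic:

use redis::AsyncCommands;

/// Returns true while the caller stays within `limit` requests per `window_secs`.
async fn check_ip_rate_limit(
    conn: &mut redis::aio::MultiplexedConnection,
    ip: &str,
    limit: u64,
    window_secs: i64,
) -> redis::RedisResult<bool> {
    let key = format!("ratelimit:ip:{ip}");

    // Atomically count this request across every Proxy instance.
    let count: u64 = conn.incr(&key, 1u64).await?;

    // Start the window on the first hit so the counter expires on its own.
    if count == 1 {
        conn.expire::<_, ()>(&key, window_secs).await?;
    }

    Ok(count <= limit)
}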

Configuration

Environment Variables

# PostgreSQL
DATABASE_URL="postgres://user:pass@db:5432/madbase"

# Redis (optional - falls back to L1-only caching if unset)
REDIS_URL="redis://db:6379/0"

# Cache TTL
CACHE_TTL_SECONDS=3600
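
Because REDIS_URL is optional, cache initialization can degrade gracefully to L1-only mode. A rough sketch of that decision; the CacheConfig struct and log message are illustrative:

use std::env;

struct CacheConfig {
    redis_url: Option<String>,
    ttl_seconds: u64,
}

fn load_cache_config() -> CacheConfig {
    // L2 is enabled only when REDIS_URL is set; otherwise nodes run with L1 alone.
    let redis_url = env::var("REDIS_URL").ok();
    let ttl_seconds = env::var("CACHE_TTL_SECONDS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(3600);

    if redis_url.is_none() {
        eprintln!("REDIS_URL not set; falling back to L1 (in-memory) caching only");
    }

    CacheConfig { redis_url, ttl_seconds }
}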

Cache Keyspaces

Pattern                               Purpose              TTL
session:{token}                       User sessions        3600s
presence:channel:{name}:user:{id}     User presence        60s
ratelimit:ip:{addr}                   IP rate limiting     60s
ratelimit:user:{id}                   User rate limiting   60s
lock:{name}                           Distributed locks    Configurable

HAProxy Configuration

The State Pillar's HAProxy routes both PostgreSQL and Redis traffic:

listen primary
    bind *:5433
    mode tcp
    server patroni1 patroni:5432 check

listen redis
    bind *:6379
    mode tcp
    server redis1 redis:6379 check

Scaling Strategy

Horizontal Scaling

  • Proxy Nodes: Add more proxies, all share the same Redis cache
  • Worker Nodes: Add more workers, presence tracking works seamlessly
  • State Nodes: Scale to 3 or 5 nodes for HA; Redis is replicated via Sentinel or Cluster mode

Vertical Scaling

  • Upgrade State Node plan for more RAM (benefits both PostgreSQL and Redis)
  • Typical: CX21 (4 GB) → CX31 (8 GB) → CX41 (16 GB)

Monitoring

Redis is monitored alongside PostgreSQL:

  • HAProxy Stats: http://db-node:7000
  • Grafana Dashboard: "State Pillar Performance"
  • Metrics:
    • Redis memory usage
    • Cache hit/miss ratios
    • Connection pool utilization
    • Rate limit enforcement

Best Practices

  1. Session Management: Use appropriate TTLs (shorter for sensitive data)
  2. Presence Tracking: Implement heartbeats to keep users "online"
  3. Rate Limiting: Use different limits for different user tiers
  4. Distributed Locks: Always set reasonable TTLs to prevent deadlocks
  5. Cache Invalidation: Use versioned keys or explicit deletion (see the sketch below)
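
To make the last point concrete, one common versioned-key pattern keeps a per-project version counter and folds it into every derived cache key, so bumping the counter invalidates all of them at once. A hedged sketch; the key names are illustrative rather than MadBase's actual layout:

use redis::AsyncCommands;

/// Build a cache key that embeds the project's current cache version.
async fn versioned_key(
    conn: &mut redis::aio::MultiplexedConnection,
    project: &str,
    resource: &str,
) -> redis::RedisResult<String> {
    // A missing counter (or read error) is treated as version 0 in this sketch.
    let version: u64 = conn.get(format!("cache_version:{project}"))
        .await
        .unwrap_or(0);
    Ok(format!("cache:{project}:v{version}:{resource}"))
}

/// Invalidate everything cached for a project by bumping its version.
/// Old entries are never read again and age out via their TTLs.
async fn invalidate_project(
    conn: &mut redis::aio::MultiplexedConnection,
    project: &str,
) -> redis::RedisResult<()> {
    conn.incr::<_, _, ()>(format!("cache_version:{project}"), 1u64).await?;
    Ok(())
}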

Migration Guide

From Single-Node to Cluster

  1. Update State Pillar image to include Redis
  2. Set REDIS_URL in all Proxy/Worker configurations
  3. Deploy SessionManager in Auth handlers
  4. Enable presence tracking in Realtime module
  5. Update rate limiting to use distributed counters

Testing

# Test Redis connection
redis-cli -h db-node ping

# Test session creation
curl -X POST http://localhost:8000/auth/v1/token \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"password"}'

# Check presence
redis-cli -h db-node SMEMBERS "presence:channel:public:users"

Performance

Expected Latency

Operation   L1 Cache (moka)   L2 Cache (Redis)   Database
Get         <1μs              1-2ms              10-50ms
Set         <1μs              1-2ms              10-50ms
Delete      <1μs              1-2ms              10-50ms

Cache Hit Ratios

  • L1 Hit: 95%+ for frequently accessed data
  • L2 Hit: 80%+ for shared state
  • Miss: Falls through to the database (sketched below)
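
Putting the tiers together, a read path checks L1, then L2, then the database, repopulating the faster tiers on the way back. A simplified sketch of that fall-through; fetch_from_db is a placeholder standing in for a real PostgreSQL query:

use moka::future::Cache;
use redis::AsyncCommands;

async fn read_through(
    l1: &Cache<String, String>,
    l2: &mut redis::aio::MultiplexedConnection,
    key: &str,
) -> redis::RedisResult<String> {
    // L1: in-process, sub-microsecond.
    if let Some(value) = l1.get(key).await {
        return Ok(value);
    }

    // L2: shared Redis, low single-digit milliseconds.
    if let Some(value) = l2.get::<_, Option<String>>(key).await? {
        l1.insert(key.to_string(), value.clone()).await;
        return Ok(value);
    }

    // Miss: fall through to the database, then repopulate both tiers.
    let value = fetch_from_db(key).await; // placeholder for a real query
    l2.set_ex::<_, _, ()>(key, &value, 3600).await?;
    l1.insert(key.to_string(), value.clone()).await;
    Ok(value)
}

// Placeholder standing in for a PostgreSQL lookup.
async fn fetch_from_db(key: &str) -> String {
    format!("value-for-{key}")
}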

Future Enhancements

  • Redis Cluster for horizontal scaling
  • Pub/Sub for real-time events
  • Bloom filters for existence checks
  • HyperLogLog for cardinality estimation
  • Geospatial indexing for location features