wip: milestone 0 fixes

2026-03-15 12:35:42 +02:00
parent 6708cf28a7
commit cffdf8af86
61266 changed files with 4511646 additions and 1938 deletions

docs/CACHING_STRATEGY.md Normal file

@@ -0,0 +1,249 @@
# MadBase Caching Strategy
## Overview
MadBase implements a **two-tier caching architecture** that maintains the simplicity of the 4-pillar system while providing enterprise-grade caching capabilities.
## Architecture
### Tier 1: L1 Cache (In-Memory)
- **Technology**: moka (Rust)
- **Location**: Proxy / Worker nodes
- **Purpose**: Ultra-low latency for frequently accessed data
- **Typical Use Cases**:
- Project configurations
- JWT validation cache
- Hot database query results
- API response caching
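A minimal sketch of this tier, assuming moka's async cache (`moka::future::Cache`, with the 0.12-style API where `get` is async) and an illustrative project-config key; the exact wiring inside MadBase may differ:
```rust
use std::time::Duration;
use moka::future::Cache;

// Hypothetical L1 cache for project configurations: bounded in size,
// with a short TTL so stale entries age out quickly.
let l1: Cache<String, String> = Cache::builder()
    .max_capacity(10_000)
    .time_to_live(Duration::from_secs(60))
    .build();

l1.insert("project:acme:config".into(), config_json).await;
if let Some(cfg) = l1.get("project:acme:config").await {
    // Served from process memory; no Redis or PostgreSQL round trip.
    return Ok(cfg);
}
```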
### Tier 2: L2 Cache (Redis)
- **Technology**: Redis 7
- **Location**: State Pillar (Pillar 3)
- **Purpose**: Shared state across the entire cluster
- **Typical Use Cases**:
- Distributed session storage
- Realtime presence tracking
- Rate limiting counters
- Distributed locking
- Pub/Sub messaging
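A matching redis-rs sketch for this tier (assuming a recent `redis` crate with its tokio-based async API; the URL comes from the Configuration section below). Later sketches on this page reuse this `conn`:
```rust
use redis::AsyncCommands;

// Connect once to the State Pillar's Redis; the multiplexed
// connection can be shared across tasks.
let client = redis::Client::open("redis://db:6379/0")?;
let mut conn = client.get_multiplexed_async_connection().await?;

// Every proxy and worker in the cluster sees the same state.
// Key and payload here are illustrative.
let _: () = conn.set_ex("query:top_posts", rows_json, 300).await?;
let hit: Option<String> = conn.get("query:top_posts").await?;
```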
## State Pillar Integration
The **State Pillar** (formerly "Database Pillar") now hosts both PostgreSQL and Redis:
```
┌─────────────────────────────────────────┐
│ State Pillar Node │
├─────────────────────────────────────────┤
│ ┌──────────┐ ┌─────────────┐ │
│ │PostgreSQL│ │ Redis │ │
│ │ :5432 │ │ :6379 │ │
│ └──────────┘ └─────────────┘ │
│ │ │ │
│ └─────────┬─────────┘ │
│ ▼ │
│ ┌─────────────┐ │
│ │ HAProxy │ │
│ │ :5433/:6379 │ │
│ └─────────────┘ │
└─────────────────────────────────────────┘
```
### Why This Approach?
1. **Resource Symmetry**: PostgreSQL and Redis are both memory-intensive and share the same VPS requirements
2. **HA Piggybacking**: Pillar 3 already manages HA via Patroni and etcd; Redis benefits from the same infrastructure
3. **Centralized State**: Preserves the clean separation between Compute (Worker/Proxy) and State (DB/Redis)
4. **Zero Added Complexity**: No new pillar is needed; the existing one is simply extended
## Features
### 1. Shared Auth Sessions
Users can now stay logged in even if the Proxy node handling their request changes:
```rust
use auth::SessionManager;

// Create a session
let session_token = session_manager
    .create_session(user_id, email, "authenticated".to_string())
    .await?;

// Validate on any proxy node
let session = session_manager
    .validate_session(&session_token)
    .await?;
```
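A plausible storage layout behind `SessionManager` is a single `session:{token}` key carrying a serialized session with the 3600 s TTL from the Cache Keyspaces table below; this is an assumption for illustration, not the verified internals:
```rust
use redis::AsyncCommands;

// Write the session where any proxy node can find it. `session_json`
// and its field layout are assumptions; the key shape and TTL follow
// the keyspace table.
let key = format!("session:{session_token}");
let _: () = conn.set_ex(&key, session_json, 3600).await?;

// Validation on another proxy node is just a read of the same key.
let stored: Option<String> = conn.get(&key).await?;
```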
### 2. Realtime Presence
Track "Who is online" across multiple Worker nodes:
```rust
use realtime::PresenceManager;

// User joins a channel
presence_manager
    .join_channel(user_id, "public-chat".to_string(), None)
    .await?;

// Get online count
let count = presence_manager
    .get_channel_online_count("public-chat".to_string())
    .await?;
```
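The likely mechanism underneath is TTL-based heartbeating on the `presence:channel:{name}:user:{id}` keyspace (60 s TTL per the table below), sketched here as an assumption rather than the verified `PresenceManager` internals:
```rust
use redis::AsyncCommands;

// Each heartbeat refreshes the per-user presence key. If heartbeats
// stop, the key expires within 60 s and the user drops out of the
// online count automatically.
let key = format!("presence:channel:public-chat:user:{user_id}");
let _: () = conn.set_ex(&key, "1", 60).await?;
```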
### 3. Distributed Locking
Prevent race conditions during background operations:
```rust
use common::DistributedLock;

let lock = DistributedLock::new(
    redis_client,
    "migration:lock".to_string(),
    30, // 30 seconds TTL
);

if lock.acquire().await? {
    // Perform critical section
    lock.release().await?;
}
```
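The standard Redis recipe such a lock can be built on is `SET key token NX EX ttl` for acquisition plus a compare-and-delete script on release. The sketch below shows that recipe with the `redis` and `uuid` crates; it is illustrative, not `DistributedLock`'s confirmed implementation:
```rust
use redis::Script;

// Acquire: SET NX EX is a single atomic step, and the random token
// proves ownership at release time.
let token = uuid::Uuid::new_v4().to_string();
let acquired: bool = redis::cmd("SET")
    .arg("lock:migration")
    .arg(&token)
    .arg("NX")
    .arg("EX")
    .arg(30)
    .query_async(&mut conn)
    .await?;

if acquired {
    // ... critical section ...

    // Release: delete only if we still hold the lock, so a lock that
    // expired and was re-acquired elsewhere is never clobbered.
    let release = Script::new(
        "if redis.call('GET', KEYS[1]) == ARGV[1] then \
             return redis.call('DEL', KEYS[1]) \
         else return 0 end",
    );
    let _: i32 = release
        .key("lock:migration")
        .arg(&token)
        .invoke_async(&mut conn)
        .await?;
}
```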
### 4. Rate Limiting
Distributed rate limiting across all instances:
```rust
use gateway::rate_limit::RateLimitMiddleware;

// Check IP-based rate limit
if !middleware.check_ip(&user_ip).await? {
    return Err("Rate limit exceeded".into());
}
```
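A plausible counter behind `check_ip` is a fixed window on the `ratelimit:ip:{addr}` keyspace below: one INCR per request, with the TTL attached on the window's first hit. The 100-per-minute limit is an assumed example:
```rust
use redis::AsyncCommands;

// Count this request against the caller's 60 s window.
let key = format!("ratelimit:ip:{user_ip}");
let hits: u64 = conn.incr(&key, 1u64).await?;
if hits == 1 {
    // First hit creates the key, so start the window clock here.
    let _: () = conn.expire(&key, 60).await?;
}
let allowed = hits <= 100; // assumed per-minute limit
```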
## Configuration
### Environment Variables
```bash
# PostgreSQL
DATABASE_URL="postgres://user:pass@db:5432/madbase"
# Redis (optional; falls back to L1-only caching when unset)
REDIS_URL="redis://db:6379/0"
# Cache TTL
CACHE_TTL_SECONDS=3600
```
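The optional-Redis fallback can be as simple as probing the variable at startup; a sketch, with `l2_client` as an illustrative name:
```rust
// If REDIS_URL is unset, skip L2 entirely and serve from the
// in-process moka cache only (no shared sessions/presence/limits).
let l2_client = match std::env::var("REDIS_URL") {
    Ok(url) => Some(redis::Client::open(url)?),
    Err(_) => None, // L1-only mode
};
```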
### Cache Keyspaces
| Pattern | Purpose | TTL |
|---------|---------|-----|
| `session:{token}` | User sessions | 3600s |
| `presence:channel:{name}:user:{id}` | User presence | 60s |
| `ratelimit:ip:{addr}` | IP rate limiting | 60s |
| `ratelimit:user:{id}` | User rate limiting | 60s |
| `lock:{name}` | Distributed locks | Configurable |
## HAProxy Configuration
The State Pillar's HAProxy routes both PostgreSQL and Redis traffic:
```haproxy
listen primary
    bind *:5433
    mode tcp
    server patroni1 patroni:5432 check

listen redis
    bind *:6379
    mode tcp
    server redis1 redis:6379 check
```
## Scaling Strategy
### Horizontal Scaling
- **Proxy Nodes**: Add more proxies; they all share the same Redis cache
- **Worker Nodes**: Add more workers; presence tracking works seamlessly across them
- **State Nodes**: Scale to 3 or 5 nodes for HA; Redis is replicated via Sentinel or Cluster
### Vertical Scaling
- Upgrade State Node plan for more RAM (benefits both PostgreSQL and Redis)
- Typical: CX31 (8GB) → CX41 (16GB) → CX51 (32GB)
## Monitoring
Redis is monitored alongside PostgreSQL:
- **HAProxy Stats**: http://db-node:7000
- **Grafana Dashboard**: "State Pillar Performance"
- **Metrics**:
- Redis memory usage
- Cache hit/miss ratios
- Connection pool utilization
- Rate limit enforcement
## Best Practices
1. **Session Management**: Use appropriate TTLs (shorter for sensitive data)
2. **Presence Tracking**: Implement heartbeats to keep users "online"
3. **Rate Limiting**: Use different limits for different user tiers
4. **Distributed Locks**: Always set reasonable TTLs to prevent deadlocks
5. **Cache Invalidation**: Use versioned keys or explicit deletion (see the sketch after this list)
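A sketch of practice 5's versioned-key approach, with illustrative key names that are not part of the documented keyspace: bumping a version counter orphans every key derived from the old version at once, and the orphans simply age out via TTL.
```rust
use redis::AsyncCommands;

// Invalidate a whole namespace by bumping its version counter.
let version: u64 = conn.incr("cache:version:project:acme", 1u64).await?;

// All reads and writes embed the current version in the key; entries
// under older versions are never read again and expire on their own.
let key = format!("project:acme:v{version}:settings");
let _: () = conn.set_ex(&key, settings_json, 3600).await?;
```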
## Migration Guide
### From Single-Node to Cluster
1. Update State Pillar image to include Redis
2. Set `REDIS_URL` in all Proxy/Worker configurations
3. Deploy SessionManager in Auth handlers
4. Enable presence tracking in Realtime module
5. Update rate limiting to use distributed counters
### Testing
```bash
# Test Redis connection
redis-cli -h db-node ping

# Test session creation
curl -X POST http://localhost:8000/auth/v1/token \
    -d '{"email":"test@example.com","password":"password"}'

# Check presence
redis-cli -h db-node SMEMBERS "presence:channel:public:users"
```
## Performance
### Expected Latency
| Operation | L1 Cache (moka) | L2 Cache (Redis) | Database |
|-----------|-----------------|------------------|----------|
| Get | <1μs | 1-2ms | 10-50ms |
| Set | <1μs | 1-2ms | 10-50ms |
| Delete | <1μs | 1-2ms | 10-50ms |
### Cache Hit Ratios
- **L1 Hit**: 95%+ for frequently accessed data
- **L2 Hit**: 80%+ for shared state
- **Miss**: Falls through to database
## Future Enhancements
- [ ] Redis Cluster for horizontal scaling
- [ ] Pub/Sub for real-time events
- [ ] Bloom filters for existence checks
- [ ] HyperLogLog for cardinality estimation
- [ ] Geospatial indexing for location features