wip: milestone 0 fixes

2026-03-15 12:35:42 +02:00
parent 6708cf28a7
commit cffdf8af86
61266 changed files with 4511646 additions and 1938 deletions

docs/CACHING_STRATEGY.md Normal file

@@ -0,0 +1,249 @@
# MadBase Caching Strategy
## Overview
MadBase implements a **two-tier caching architecture** that maintains the simplicity of the 4-pillar system while providing enterprise-grade caching capabilities.
## Architecture
### Tier 1: L1 Cache (In-Memory)
- **Technology**: moka (Rust)
- **Location**: Proxy / Worker nodes
- **Purpose**: Ultra-low latency for frequently accessed data
- **Typical Use Cases**:
- Project configurations
- JWT validation cache
- Hot database query results
- API response caching
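A minimal sketch of this tier, assuming moka's async cache (`moka::future::Cache`, with the 0.12-style API where `get` is async) and an illustrative project-config key; the exact wiring inside MadBase may differ:
```rust
use std::time::Duration;
use moka::future::Cache;

// Hypothetical L1 cache for project configurations: bounded in size,
// with a short TTL so stale entries age out quickly.
let l1: Cache<String, String> = Cache::builder()
    .max_capacity(10_000)
    .time_to_live(Duration::from_secs(60))
    .build();

l1.insert("project:acme:config".into(), config_json).await;
if let Some(cfg) = l1.get("project:acme:config").await {
    // Served from process memory; no Redis or PostgreSQL round trip.
    return Ok(cfg);
}
```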
### Tier 2: L2 Cache (Redis)
- **Technology**: Redis 7
- **Location**: State Pillar (Pillar 3)
- **Purpose**: Shared state across the entire cluster
- **Typical Use Cases**:
- Distributed session storage
- Realtime presence tracking
- Rate limiting counters
- Distributed locking
- Pub/Sub messaging
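A matching redis-rs sketch for this tier (assuming a recent `redis` crate with its tokio-based async API; the URL comes from the Configuration section below). Later sketches on this page reuse this `conn`:
```rust
use redis::AsyncCommands;

// Connect once to the State Pillar's Redis; the multiplexed
// connection can be shared across tasks.
let client = redis::Client::open("redis://db:6379/0")?;
let mut conn = client.get_multiplexed_async_connection().await?;

// Every proxy and worker in the cluster sees the same state.
// Key and payload here are illustrative.
let _: () = conn.set_ex("query:top_posts", rows_json, 300).await?;
let hit: Option<String> = conn.get("query:top_posts").await?;
```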
## State Pillar Integration
The **State Pillar** (formerly "Database Pillar") now hosts both PostgreSQL and Redis:
```
┌─────────────────────────────────────────┐
│ State Pillar Node │
├─────────────────────────────────────────┤
│ ┌──────────┐ ┌─────────────┐ │
│ │PostgreSQL│ │ Redis │ │
│ │ :5432 │ │ :6379 │ │
│ └──────────┘ └─────────────┘ │
│ │ │ │
│ └─────────┬─────────┘ │
│ ▼ │
│ ┌─────────────┐ │
│ │ HAProxy │ │
│ │ :5433/:6379 │ │
│ └─────────────┘ │
└─────────────────────────────────────────┘
```
### Why This Approach?
1. **Resource Symmetry**: PostgreSQL and Redis are both memory-intensive and share the same VPS requirements
2. **HA Piggybacking**: Pillar 3 already manages HA via Patroni and etcd; Redis benefits from the same infrastructure
3. **Centralized State**: Preserves the clean separation between Compute (Worker/Proxy) and State (DB/Redis)
4. **Zero Added Complexity**: No new pillar is needed; the existing one is simply extended
## Features
### 1. Shared Auth Sessions
Users can now stay logged in even if the Proxy node handling their request changes:
```rust
use auth::SessionManager;

// Create a session
let session_token = session_manager
    .create_session(user_id, email, "authenticated".to_string())
    .await?;

// Validate on any proxy node
let session = session_manager
    .validate_session(&session_token)
    .await?;
```
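A plausible storage layout behind `SessionManager` is a single `session:{token}` key carrying a serialized session with the 3600 s TTL from the Cache Keyspaces table below; this is an assumption for illustration, not the verified internals:
```rust
use redis::AsyncCommands;

// Write the session where any proxy node can find it. `session_json`
// and its field layout are assumptions; the key shape and TTL follow
// the keyspace table.
let key = format!("session:{session_token}");
let _: () = conn.set_ex(&key, session_json, 3600).await?;

// Validation on another proxy node is just a read of the same key.
let stored: Option<String> = conn.get(&key).await?;
```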
### 2. Realtime Presence
Track "Who is online" across multiple Worker nodes:
```rust
use realtime::PresenceManager;

// User joins a channel
presence_manager
    .join_channel(user_id, "public-chat".to_string(), None)
    .await?;

// Get online count
let count = presence_manager
    .get_channel_online_count("public-chat".to_string())
    .await?;
```
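The likely mechanism underneath is TTL-based heartbeating on the `presence:channel:{name}:user:{id}` keyspace (60 s TTL per the table below), sketched here as an assumption rather than the verified `PresenceManager` internals:
```rust
use redis::AsyncCommands;

// Each heartbeat refreshes the per-user presence key. If heartbeats
// stop, the key expires within 60 s and the user drops out of the
// online count automatically.
let key = format!("presence:channel:public-chat:user:{user_id}");
let _: () = conn.set_ex(&key, "1", 60).await?;
```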
### 3. Distributed Locking
Prevent race conditions during background operations:
```rust
use common::DistributedLock;

let lock = DistributedLock::new(
    redis_client,
    "migration:lock".to_string(),
    30, // 30 seconds TTL
);

if lock.acquire().await? {
    // Perform critical section
    lock.release().await?;
}
```
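The standard Redis recipe such a lock can be built on is `SET key token NX EX ttl` for acquisition plus a compare-and-delete script on release. The sketch below shows that recipe with the `redis` and `uuid` crates; it is illustrative, not `DistributedLock`'s confirmed implementation:
```rust
use redis::Script;

// Acquire: SET NX EX is a single atomic step, and the random token
// proves ownership at release time.
let token = uuid::Uuid::new_v4().to_string();
let acquired: bool = redis::cmd("SET")
    .arg("lock:migration")
    .arg(&token)
    .arg("NX")
    .arg("EX")
    .arg(30)
    .query_async(&mut conn)
    .await?;

if acquired {
    // ... critical section ...

    // Release: delete only if we still hold the lock, so a lock that
    // expired and was re-acquired elsewhere is never clobbered.
    let release = Script::new(
        "if redis.call('GET', KEYS[1]) == ARGV[1] then \
             return redis.call('DEL', KEYS[1]) \
         else return 0 end",
    );
    let _: i32 = release
        .key("lock:migration")
        .arg(&token)
        .invoke_async(&mut conn)
        .await?;
}
```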
### 4. Rate Limiting
Distributed rate limiting across all instances:
```rust
use gateway::rate_limit::RateLimitMiddleware;

// Check IP-based rate limit
if !middleware.check_ip(&user_ip).await? {
    return Err("Rate limit exceeded".into());
}
```
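A plausible counter behind `check_ip` is a fixed window on the `ratelimit:ip:{addr}` keyspace below: one INCR per request, with the TTL attached on the window's first hit. The 100-per-minute limit is an assumed example:
```rust
use redis::AsyncCommands;

// Count this request against the caller's 60 s window.
let key = format!("ratelimit:ip:{user_ip}");
let hits: u64 = conn.incr(&key, 1u64).await?;
if hits == 1 {
    // First hit creates the key, so start the window clock here.
    let _: () = conn.expire(&key, 60).await?;
}
let allowed = hits <= 100; // assumed per-minute limit
```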
## Configuration
### Environment Variables
```bash
# PostgreSQL
DATABASE_URL="postgres://user:pass@db:5432/madbase"
# Redis (optional; falls back to L1-only caching when unset)
REDIS_URL="redis://db:6379/0"
# Cache TTL
CACHE_TTL_SECONDS=3600
```
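The optional-Redis fallback can be as simple as probing the variable at startup; a sketch, with `l2_client` as an illustrative name:
```rust
// If REDIS_URL is unset, skip L2 entirely and serve from the
// in-process moka cache only (no shared sessions/presence/limits).
let l2_client = match std::env::var("REDIS_URL") {
    Ok(url) => Some(redis::Client::open(url)?),
    Err(_) => None, // L1-only mode
};
```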
### Cache Keyspaces
| Pattern | Purpose | TTL |
|---------|---------|-----|
| `session:{token}` | User sessions | 3600s |
| `presence:channel:{name}:user:{id}` | User presence | 60s |
| `ratelimit:ip:{addr}` | IP rate limiting | 60s |
| `ratelimit:user:{id}` | User rate limiting | 60s |
| `lock:{name}` | Distributed locks | Configurable |
## HAProxy Configuration
The State Pillar's HAProxy routes both PostgreSQL and Redis traffic:
```haproxy
listen primary
    bind *:5433
    mode tcp
    server patroni1 patroni:5432 check

listen redis
    bind *:6379
    mode tcp
    server redis1 redis:6379 check
```
## Scaling Strategy
### Horizontal Scaling
- **Proxy Nodes**: Add more proxies; they all share the same Redis cache
- **Worker Nodes**: Add more workers; presence tracking works seamlessly across them
- **State Nodes**: Scale to 3 or 5 nodes for HA; Redis is replicated via Sentinel or Cluster
### Vertical Scaling
- Upgrade State Node plan for more RAM (benefits both PostgreSQL and Redis)
- Typical: CX31 (8GB) → CX41 (16GB) → CX51 (32GB)
## Monitoring
Redis is monitored alongside PostgreSQL:
- **HAProxy Stats**: http://db-node:7000
- **Grafana Dashboard**: "State Pillar Performance"
- **Metrics**:
- Redis memory usage
- Cache hit/miss ratios
- Connection pool utilization
- Rate limit enforcement
## Best Practices
1. **Session Management**: Use appropriate TTLs (shorter for sensitive data)
2. **Presence Tracking**: Implement heartbeats to keep users "online"
3. **Rate Limiting**: Use different limits for different user tiers
4. **Distributed Locks**: Always set reasonable TTLs to prevent deadlocks
5. **Cache Invalidation**: Use versioned keys or explicit deletion (see the sketch after this list)
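A sketch of practice 5's versioned-key approach, with illustrative key names that are not part of the documented keyspace: bumping a version counter orphans every key derived from the old version at once, and the orphans simply age out via TTL.
```rust
use redis::AsyncCommands;

// Invalidate a whole namespace by bumping its version counter.
let version: u64 = conn.incr("cache:version:project:acme", 1u64).await?;

// All reads and writes embed the current version in the key; entries
// under older versions are never read again and expire on their own.
let key = format!("project:acme:v{version}:settings");
let _: () = conn.set_ex(&key, settings_json, 3600).await?;
```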
## Migration Guide
### From Single-Node to Cluster
1. Update State Pillar image to include Redis
2. Set `REDIS_URL` in all Proxy/Worker configurations
3. Deploy SessionManager in Auth handlers
4. Enable presence tracking in Realtime module
5. Update rate limiting to use distributed counters
### Testing
```bash
# Test Redis connection
redis-cli -h db-node ping

# Test session creation
curl -X POST http://localhost:8000/auth/v1/token \
    -d '{"email":"test@example.com","password":"password"}'

# Check presence
redis-cli -h db-node SMEMBERS "presence:channel:public:users"
```
## Performance
### Expected Latency
| Operation | L1 Cache (moka) | L2 Cache (Redis) | Database |
|-----------|-----------------|------------------|----------|
| Get | <1μs | 1-2ms | 10-50ms |
| Set | <1μs | 1-2ms | 10-50ms |
| Delete | <1μs | 1-2ms | 10-50ms |
### Cache Hit Ratios
- **L1 Hit**: 95%+ for frequently accessed data
- **L2 Hit**: 80%+ for shared state
- **Miss**: Falls through to database
## Future Enhancements
- [ ] Redis Cluster for horizontal scaling
- [ ] Pub/Sub for real-time events
- [ ] Bloom filters for existence checks
- [ ] HyperLogLog for cardinality estimation
- [ ] Geospatial indexing for location features