# MadBase Caching Strategy

## Overview

MadBase implements a **two-tier caching architecture** that maintains the simplicity of the 4-pillar system while providing enterprise-grade caching capabilities.

## Architecture

### Tier 1: L1 Cache (In-Memory)

- **Technology**: moka (Rust)
- **Location**: Proxy / Worker nodes
- **Purpose**: Ultra-low latency for frequently accessed data
- **Typical Use Cases**:
  - Project configurations
  - JWT validation cache
  - Hot database query results
  - API response caching

### Tier 2: L2 Cache (Redis)

- **Technology**: Redis 7
- **Location**: State Pillar (Pillar 3)
- **Purpose**: Shared state across the entire cluster
- **Typical Use Cases**:
  - Distributed session storage
  - Realtime presence tracking
  - Rate limiting counters
  - Distributed locking
  - Pub/Sub messaging

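In practice the two tiers compose as a read-through chain: look in L1 first, fall back to L2, and only on a miss in both go to PostgreSQL. The following is a minimal sketch of that flow, assuming the `moka` and `redis` crates; the `TieredCache` type and its construction are illustrative, not MadBase's actual API.

```rust
use std::time::Duration;

use moka::future::Cache;
use redis::AsyncCommands;

/// Hypothetical composition of the two cache tiers.
pub struct TieredCache {
    l1: Cache<String, String>,             // per-node, in-process (moka)
    l2: redis::aio::MultiplexedConnection, // shared, hosted on the State Pillar
}

impl TieredCache {
    pub async fn new(redis_url: &str) -> redis::RedisResult<Self> {
        let l1 = Cache::builder()
            .max_capacity(10_000)
            .time_to_live(Duration::from_secs(60))
            .build();
        let client = redis::Client::open(redis_url)?;
        let l2 = client.get_multiplexed_async_connection().await?;
        Ok(Self { l1, l2 })
    }

    /// L1 first, then L2; a miss here means the caller goes to PostgreSQL.
    pub async fn get(&mut self, key: &str) -> redis::RedisResult<Option<String>> {
        if let Some(v) = self.l1.get(key).await {
            return Ok(Some(v));
        }
        let v: Option<String> = self.l2.get(key).await?;
        if let Some(ref hit) = v {
            // Promote the L2 hit into L1 for subsequent requests on this node
            self.l1.insert(key.to_string(), hit.clone()).await;
        }
        Ok(v)
    }
}
```

An L2 hit is promoted into L1 so later requests on the same node stay sub-microsecond, while the L1 TTL keeps per-node copies from drifting for too long.
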
## State Pillar Integration

The **State Pillar** (formerly "Database Pillar") now hosts both PostgreSQL and Redis:

```
┌─────────────────────────────────────────┐
│            State Pillar Node            │
├─────────────────────────────────────────┤
│  ┌──────────┐        ┌─────────────┐    │
│  │PostgreSQL│        │    Redis    │    │
│  │  :5432   │        │    :6379    │    │
│  └──────────┘        └─────────────┘    │
│        │                    │           │
│       └──────────┬──────────┘           │
│                  ▼                      │
│           ┌─────────────┐               │
│           │   HAProxy   │               │
│           │ :5433/:6379 │               │
│           └─────────────┘               │
└─────────────────────────────────────────┘
```

### Why This Approach?

1. **Resource Symmetry**: Both PostgreSQL and Redis are memory-intensive and share the same VPS requirements
2. **HA Piggybacking**: Pillar 3 already manages HA via Patroni and etcd, so Redis benefits from the same infrastructure
3. **Centralized State**: Maintains a clean separation of Compute (Worker/Proxy) vs. State (DB/Redis)
4. **Zero Complexity**: No new pillar is needed; the existing one is simply enhanced

## Features

### 1. Shared Auth Sessions

Users can now stay logged in even if the Proxy node handling their request changes:

```rust
use auth::SessionManager;

// Create a session
let session_token = session_manager
    .create_session(user_id, email, "authenticated".to_string())
    .await?;

// Validate on any proxy node
let session = session_manager
    .validate_session(&session_token)
    .await?;
```

### 2. Realtime Presence

Track "Who is online" across multiple Worker nodes:

```rust
use realtime::PresenceManager;

// User joins a channel
presence_manager
    .join_channel(user_id, "public-chat".to_string(), None)
    .await?;

// Get online count
let count = presence_manager
    .get_channel_online_count("public-chat".to_string())
    .await?;
```

### 3. Distributed Locking

Prevent race conditions during background operations:

```rust
use common::DistributedLock;

let lock = DistributedLock::new(
    redis_client,
    "migration:lock".to_string(),
    30, // 30-second TTL
);

if lock.acquire().await? {
    // Perform the critical section
    lock.release().await?;
}
```

### 4. Rate Limiting

Distributed rate limiting across all instances:

```rust
use gateway::rate_limit::RateLimitMiddleware;

// Check IP-based rate limit
if !middleware.check_ip(&user_ip).await? {
    return Err("Rate limit exceeded");
}
```

## Configuration

### Environment Variables

```bash
# PostgreSQL
DATABASE_URL="postgres://user:pass@db:5432/madbase"

# Redis (optional - falls back to L1 only if unset)
REDIS_URL="redis://db:6379/0"

# Cache TTL
CACHE_TTL_SECONDS=3600
```

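Because `REDIS_URL` is optional, the L2 tier should only be wired up when the variable is present; otherwise a node runs with the in-memory L1 cache alone. A rough sketch of that decision, assuming the `redis` crate (the function names here are illustrative):

```rust
use std::env;

/// Returns Some(client) when REDIS_URL is set, or None for L1-only mode.
fn build_l2_client() -> Option<redis::Client> {
    match env::var("REDIS_URL") {
        Ok(url) => redis::Client::open(url).ok(),
        Err(_) => None, // no L2: fall back to the in-memory cache only
    }
}

/// Reads CACHE_TTL_SECONDS, defaulting to 3600 when unset or malformed.
fn cache_ttl_seconds() -> u64 {
    env::var("CACHE_TTL_SECONDS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(3600)
}
```
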

### Cache Keyspaces

| Pattern | Purpose | TTL |
|---------|---------|-----|
| `session:{token}` | User sessions | 3600s |
| `presence:channel:{name}:user:{id}` | User presence | 60s |
| `ratelimit:ip:{addr}` | IP rate limiting | 60s |
| `ratelimit:user:{id}` | User rate limiting | 60s |
| `lock:{name}` | Distributed locks | Configurable |

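To keep the keyspace consistent across crates, key construction is best centralized in small helpers rather than scattered `format!` calls. An illustrative (hypothetical) set of helpers matching the patterns above:

```rust
/// Build the Redis keys used by the table above.
fn session_key(token: &str) -> String {
    format!("session:{token}")
}

fn presence_key(channel: &str, user_id: &str) -> String {
    format!("presence:channel:{channel}:user:{user_id}")
}

fn ratelimit_ip_key(addr: &str) -> String {
    format!("ratelimit:ip:{addr}")
}

fn lock_key(name: &str) -> String {
    format!("lock:{name}")
}
```
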
## HAProxy Configuration

The State Pillar's HAProxy routes both PostgreSQL and Redis traffic:

```haproxy
listen primary
    bind *:5433
    mode tcp
    server patroni1 patroni:5432 check

listen redis
    bind *:6379
    mode tcp
    server redis1 redis:6379 check
```

## Scaling Strategy

### Horizontal Scaling

- **Proxy Nodes**: Add more proxies; they all share the same Redis cache
- **Worker Nodes**: Add more workers; presence tracking works seamlessly across them
- **State Nodes**: Scale to 3 or 5 nodes for HA; Redis is replicated via Sentinel/Cluster

### Vertical Scaling

- Upgrade the State Node plan for more RAM (benefits both PostgreSQL and Redis)
- Typical: CX21 (8GB) → CX31 (16GB) → CX41 (32GB)

## Monitoring

Redis is monitored alongside PostgreSQL:

- **HAProxy Stats**: http://db-node:7000
- **Grafana Dashboard**: "State Pillar Performance"
- **Metrics**:
  - Redis memory usage
  - Cache hit/miss ratios
  - Connection pool utilization
  - Rate limit enforcement

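Hit/miss ratios are easiest to derive from a labelled counter that Grafana can divide. A small sketch of such a counter, assuming the `prometheus` crate; the metric and label names are illustrative, not MadBase's actual metrics:

```rust
use prometheus::{IntCounterVec, Opts, Registry};

/// Register a counter labelled by tier (l1/l2) and result (hit/miss);
/// the hit ratio itself is computed in Grafana from these series.
fn register_cache_metrics(registry: &Registry) -> prometheus::Result<IntCounterVec> {
    let counter = IntCounterVec::new(
        Opts::new("cache_requests_total", "Cache lookups by tier and result"),
        &["tier", "result"],
    )?;
    registry.register(Box::new(counter.clone()))?;
    Ok(counter)
}

// At a lookup site:
// cache_requests.with_label_values(&["l1", "hit"]).inc();
```
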
## Best Practices

1. **Session Management**: Use appropriate TTLs (shorter for sensitive data)
2. **Presence Tracking**: Implement heartbeats to keep users "online"
3. **Rate Limiting**: Use different limits for different user tiers
4. **Distributed Locks**: Always set reasonable TTLs to prevent deadlocks
5. **Cache Invalidation**: Use versioned keys or explicit deletion (see the sketch below)

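For practice 5, one way to implement versioned keys is to keep a per-project generation counter in Redis and embed it in every cache key: bumping the counter invalidates everything at once, and the stale entries simply age out via their TTLs. A rough sketch, assuming the `redis` crate; the key layout and function names are illustrative:

```rust
use redis::AsyncCommands;

/// Build a cache key that embeds the project's current cache generation.
async fn versioned_key(
    con: &mut redis::aio::MultiplexedConnection,
    project: &str,
    key: &str,
) -> redis::RedisResult<String> {
    // Current generation for this project (defaults to 0 if never set)
    let version: Option<u64> = con.get(format!("cache_version:{project}")).await?;
    Ok(format!("{project}:v{}:{key}", version.unwrap_or(0)))
}

/// Invalidate every cached entry for a project by bumping its generation.
async fn invalidate_project(
    con: &mut redis::aio::MultiplexedConnection,
    project: &str,
) -> redis::RedisResult<u64> {
    con.incr(format!("cache_version:{project}"), 1).await
}
```
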
## Migration Guide

### From Single-Node to Cluster

1. Update the State Pillar image to include Redis
2. Set `REDIS_URL` in all Proxy/Worker configurations
3. Deploy `SessionManager` in the Auth handlers
4. Enable presence tracking in the Realtime module
5. Update rate limiting to use distributed counters

### Testing

```bash
# Test Redis connection
redis-cli -h db-node ping

# Test session creation
curl -X POST http://localhost:8000/auth/v1/token \
  -d '{"email":"test@example.com","password":"password"}'

# Check presence
redis-cli -h db-node SMEMBERS "presence:channel:public:users"
```

## Performance

### Expected Latency

| Operation | L1 Cache (moka) | L2 Cache (Redis) | Database |
|-----------|-----------------|------------------|----------|
| Get       | <1μs            | 1-2ms            | 10-50ms  |
| Set       | <1μs            | 1-2ms            | 10-50ms  |
| Delete    | <1μs            | 1-2ms            | 10-50ms  |

### Cache Hit Ratios

- **L1 Hit**: 95%+ for frequently accessed data
- **L2 Hit**: 80%+ for shared state
- **Miss**: Falls through to the database

## Future Enhancements

- [ ] Redis Cluster for horizontal scaling
- [ ] Pub/Sub for real-time events
- [ ] Bloom filters for existence checks
- [ ] HyperLogLog for cardinality estimation
- [ ] Geospatial indexing for location features