madbase/docs/NODE_TEMPLATES.md
Vlad Durnea cffdf8af86
2026-03-15 12:35:42 +02:00

# Node Templates - Quick Reference
Complete guide to MadBase node templates for Hetzner Cloud deployment.
## Template Overview
| Template | Pillar | Zone | Min Plan | Cost/Mo | Use Case | Services |
|----------|--------|------|----------|---------|----------|----------|
| **system-node** | System | Public | CX21 | €6.94 | Cluster Root | Control API + Grafana + VM + Loki |
| **proxy-api-node** | Proxy / API | Public | CX11 | €3.69 | Scalable Ingress | Gateway + Platform API |
| **worker-node** | Worker | Private | CX11 | €3.69 | Horizontal scaling | Worker + vmagent |
| **db-node** | DB / State | Private | CX21 | €6.94 | Production database HA | PostgreSQL + Patroni + etcd + HAProxy |
| **worker-db-combo** ⭐ | Mixed | — | CX31 | €14.21 | Smaller deployments | Worker + PostgreSQL + etcd + HAProxy |
| **worker-monitor-combo** ⭐ | Mixed | — | CX21 | €6.94 | Cost-optimized | Worker + VictoriaMetrics + Loki |
| **all-in-one** ⭐ | Unified | — | CX41 | €25.60 | Development/MVP | All services on one node |
⭐ = Composite template (mixes multiple service types)
---
## Pure Templates (Single Service Type)
### 1. Database Node (db-node.yaml)
**Best for**: Production deployments requiring database HA
**Server**: CX21 (4GB RAM, 2 vCPU)
**Services**:
- PostgreSQL 15 with Patroni (auto-failover)
- etcd (distributed consensus)
- HAProxy (connection pooling + read/write splitting)
**Scaling**: 3-7 nodes (odd number for quorum)
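The odd-node rule follows from how Raft-style majority quorum works in etcd: adding a fourth node raises cost without raising failure tolerance. A quick sketch (helper names are illustrative, not a MadBase API):

```python
def quorum_size(nodes: int) -> int:
    """Majority needed for etcd/Patroni consensus decisions."""
    return nodes // 2 + 1

def tolerable_failures(nodes: int) -> int:
    """Nodes that can fail while the cluster still holds quorum."""
    return (nodes - 1) // 2
```

Note that `tolerable_failures(3) == tolerable_failures(4) == 1`: a 4-node cluster survives no more failures than a 3-node one, which is why odd counts are recommended.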
**When to use**:
- Production traffic >1000 req/min
- Need database auto-failover
- Want separate database cluster
### 2. Worker Node (worker-node.yaml)
**Best for**: Horizontal scaling of API workers
**Server**: CX11 (2GB RAM, 1 vCPU)
**Services**:
- MadBase Worker (API processing)
- vmagent (metrics collection)
**Scaling**: 1-20 nodes
**Auto-scaling rules**:
- Scale up: CPU > 70%
- Scale down: CPU < 20%
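The two thresholds above amount to a simple hysteresis rule: the wide gap between 20% and 70% prevents replica-count flapping. A minimal sketch (function name and signature are hypothetical, not the actual autoscaler):

```python
def autoscale_decision(cpu_percent: float, replicas: int,
                       min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Return the desired worker replica count for one scaling tick.

    Scale up above 70% CPU, scale down below 20%; hold steady in
    between so the pool does not oscillate around a single threshold.
    """
    if cpu_percent > 70 and replicas < max_replicas:
        return replicas + 1
    if cpu_percent < 20 and replicas > min_replicas:
        return replicas - 1
    return replicas
```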
**When to use**:
- Need to scale workers independently
- Separate database cluster already exists
- Production deployments
### 3. Control Plane Node (control-plane-node.yaml)
**Best for**: Management UI and APIs
**Server**: CX11 (2GB RAM, 1 vCPU)
**Services**:
- Gateway Proxy (port 8080)
- Control Plane API (port 8001)
- Grafana (port 3030)
- Keepalived (HA with floating IP)
**Scaling**: 1-2 nodes (HA mode)
**When to use**:
- Need web UI for server management
- Want to provision servers via API
- Production deployments
### 4. Monitoring Node (monitoring-node.yaml)
**Best for**: Centralized metrics and logging
**Server**: CX11 (2GB RAM, 1 vCPU)
**Services**:
- VictoriaMetrics (metrics database)
- Loki (log aggregation)
- Alertmanager (optional)
**Scaling**: 1-2 nodes (can be HA)
**When to use**:
- Production deployments
- Want centralized monitoring
- Need log aggregation
---
## Composite Templates (Mix Multiple Service Types)
### 5. Worker + Database Combo (worker-db-combo.yaml) ⭐
**Best for**: 2-3 server deployments with database and worker on same node
**Server**: CX31 (8GB RAM, 2 vCPU)
**Services**:
- PostgreSQL 15 with Patroni
- etcd
- HAProxy
- MadBase Worker
- vmagent
**Why use this**:
- Cost savings on small deployments (one combo server instead of a separate db-node + worker-node pair at €10.63/mo)
- Simpler architecture for smaller deployments
- Easy to scale later
**Scaling**: 1-2 nodes
**Upgrade path**: When CPU > 60% or RAM > 70%, migrate to dedicated db-node + worker-node
**Deployment example**:
```yaml
Server 1 (worker-db-combo): PostgreSQL + Worker
Server 2 (control-plane): Proxy + Control + Grafana
Server 3 (monitoring): VictoriaMetrics + Loki
```
### 6. Worker + Monitoring Combo (worker-monitor-combo.yaml) ⭐
**Best for**: Cost-optimized deployments with monitoring on worker node
**Server**: CX21 (4GB RAM, 2 vCPU)
**Services**:
- MadBase Worker
- VictoriaMetrics
- Loki
- vmagent
- Promtail
**Why use this**:
- Save €3.69/mo (no dedicated monitoring node)
- Monitoring co-located with worker
- Good for 2-3 server deployments
**Scaling**: 1-3 nodes
**When to upgrade**:
- Worker CPU > 60% (monitoring competes for resources)
- Need to scale workers horizontally
**Deployment example**:
```yaml
Server 1 (worker-monitor-combo): Worker + VictoriaMetrics + Loki
Server 2 (db-node): PostgreSQL + etcd + HAProxy
Server 3 (control-plane): Proxy + Control + Grafana
```
### 7. All-in-One (all-in-one.yaml) ⭐
**Best for**: Development, testing, or MVP deployments
**Server**: CX41 (16GB RAM, 4 vCPU)
**Services**: ALL (PostgreSQL, etcd, HAProxy, Redis, MinIO, Workers, Proxy, Control, VictoriaMetrics, Loki, Grafana)
**Why use this**:
- Simplest deployment
- Single server for everything
- Great for development/testing
**When to upgrade**:
- Production traffic > 100 req/min
- CPU usage > 70% sustained
- Need HA for database
---
## Monitoring Stack: VictoriaMetrics + Loki
### How It Works
```
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│    Node 1    │  │    Node 2    │  │    Node 3    │
│ ┌──────────┐ │  │ ┌──────────┐ │  │ ┌──────────┐ │
│ │ vmagent  │ │  │ │ vmagent  │ │  │ │ vmagent  │ │
│ └────┬─────┘ │  │ └────┬─────┘ │  │ └────┬─────┘ │
│ Scrapes:     │  │ Scrapes:     │  │ Scrapes:     │
│  - worker    │  │  - worker    │  │  - db        │
│  - system    │  │  - system    │  │  - system    │
└──────┼───────┘  └──────┼───────┘  └──────┼───────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         ▼
             ┌───────────────────────┐
             │    VictoriaMetrics    │
             │    Port: 8428         │
             │    Type: Metrics DB   │
             └───────────┬───────────┘
                         ▼
             ┌───────────────────────┐
             │       Grafana         │
             │       Port: 3030      │
             │   Queries VM + Loki   │
             └───────────────────────┘

┌──────────────┐  ┌──────────────┐
│    Node 1    │  │    Node 2    │
│ ┌──────────┐ │  │ ┌──────────┐ │
│ │ Promtail │ │  │ │ Promtail │ │
│ └────┬─────┘ │  │ └────┬─────┘ │
│ Reads:       │  │ Reads:       │
│  - logs/*    │  │  - logs/*    │
└──────┼───────┘  └──────┼───────┘
       │                 │
       └────────┬────────┘
                ▼
    ┌───────────────────────┐
    │         Loki          │
    │      Port: 3100       │
    │ Type: Log Aggregation │
    └───────────┬───────────┘
                ▼
    ┌───────────────────────┐
    │       Grafana         │
    │    LogQL Queries      │
    └───────────────────────┘
```
### Components
#### VictoriaMetrics (Metrics Database)
**Purpose**: Store and query time-series metrics
**Location**:
- Dedicated monitoring-node (recommended)
- worker-monitor-combo (cost-optimized)
- all-in-one (development)
**Data Flow**:
1. vmagent on each node scrapes metrics every 15s
2. Metrics sent to VictoriaMetrics via remote write
3. VictoriaMetrics stores metrics with 10x compression
4. Grafana queries VictoriaMetrics for dashboards
**Metrics Collected**:
- **Worker**: Request rate, error rate, latency, queue depth
- **PostgreSQL**: Connections, transactions, replication lag
- **System**: CPU, memory, disk, network
- **HAProxy**: Connection count, response time
**Storage Requirements**:
- ~1GB per million time series per day (compressed)
- Default retention: 30 days
- RAM: Minimal, scales with active queries
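As a back-of-envelope capacity check, the ~1GB per million series per day figure translates directly into a disk estimate (illustrative helper, not part of MadBase):

```python
def vm_storage_gb(active_series_millions: float,
                  retention_days: int = 30) -> float:
    """Rough VictoriaMetrics disk estimate (compressed), using the
    ~1 GB per million active time series per day rule of thumb."""
    return active_series_millions * retention_days
```

For example, half a million active series at the default 30-day retention needs roughly 15GB of disk.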
#### Loki (Log Aggregation)
**Purpose**: Store and query logs
**Location**:
- Dedicated monitoring-node (recommended)
- worker-monitor-combo (cost-optimized)
- all-in-one (development)
**Data Flow**:
1. Promtail on each node tails log files
2. Logs sent to Loki via HTTP API
3. Loki indexes logs by labels (service, level, host)
4. Grafana queries Loki using LogQL
**Logs Collected**:
- **Worker**: `/var/log/madbase/worker.log`
- **PostgreSQL**: `/var/log/postgresql/*.log`
- **System**: `/var/log/syslog`
**Storage Requirements**:
- ~10% of raw log size (with compression)
- Default retention: 30 days
- RAM: Minimal, scales with active queries
#### vmagent (Metrics Collector)
**Purpose**: Scrape metrics and send to VictoriaMetrics
**Location**: Runs on EVERY node
**Port**: 8429 (local debug endpoint)
**Configuration**: `config/vmagent.yml`
**Scrape Targets**:
- Worker: `localhost:8002/metrics`
- Patroni: `localhost:8008/metrics`
- Node Exporter: `localhost:9100/metrics`
- HAProxy: `localhost:7000/metrics`
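A minimal `config/vmagent.yml` covering the scrape targets above might look like this. This is a sketch under the assumption that vmagent is given a Prometheus-compatible scrape config; the remote-write destination (VictoriaMetrics at port 8428) is typically passed separately via the `-remoteWrite.url` flag rather than in this file:

```yaml
global:
  scrape_interval: 15s   # matches the 15s interval described above

scrape_configs:
  - job_name: worker
    static_configs:
      - targets: ["localhost:8002"]
  - job_name: patroni
    static_configs:
      - targets: ["localhost:8008"]
  - job_name: node_exporter
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: haproxy
    static_configs:
      - targets: ["localhost:7000"]
```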
**Resource Usage**:
- CPU: <5% of 1 core
- Memory: ~50MB
#### Promtail (Log Collector)
**Purpose**: Tail log files and send to Loki
**Location**: Runs on EVERY node
**Configuration**: `config/promtail.yml`
**Log Sources**:
- `/var/log/madbase/worker.log` (worker logs)
- `/var/log/postgresql/*.log` (database logs)
- `/var/log/syslog` (system logs)
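A minimal `config/promtail.yml` for the log sources above might look like the sketch below. The `<monitoring-node>` placeholder stands for wherever Loki runs (dedicated monitoring-node, combo, or all-in-one); labels are assumptions chosen to match the `service` label mentioned in the Loki data flow:

```yaml
server:
  http_listen_port: 9080

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://<monitoring-node>:3100/loki/api/v1/push

scrape_configs:
  - job_name: madbase
    static_configs:
      - targets: [localhost]
        labels:
          service: worker
          __path__: /var/log/madbase/worker.log
  - job_name: postgresql
    static_configs:
      - targets: [localhost]
        labels:
          service: postgresql
          __path__: /var/log/postgresql/*.log
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          service: system
          __path__: /var/log/syslog
```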
**Resource Usage**:
- CPU: <2% of 1 core
- Memory: ~30MB
### Grafana Integration
Grafana connects to both VictoriaMetrics and Loki:
**Example Dashboard Query**:
```yaml
Panel 1: Request Rate (Metrics)
  Query: rate(http_requests_total[5m])
Panel 2: Error Rate (Metrics)
  Query: rate(http_requests_total{status=~"5.."}[5m])
Panel 3: Recent Errors (Logs)
  Query: {level="error"} | line_format "{{.message}}"
Panel 4: Trace Request by ID (Logs)
  Query: {trace_id="abc123"} |= "timeout"
```
### Deployment Scenarios
#### Scenario 1: Dedicated Monitoring Node (Production)
```yaml
servers:
  - name: server1
    template: control-plane-node
    plan: CX11
  - name: server2
    template: db-node
    plan: CX21
  - name: server3
    template: worker-node
    plan: CX11
  - name: server4
    template: monitoring-node   # dedicated monitoring
    plan: CX11
```
**Cost**: €18.01/mo (4 servers)
**Best for**: Production with >1000 req/min
#### Scenario 2: Worker + Monitoring Combo (Cost-Optimized)
```yaml
servers:
  - name: server1
    template: control-plane-node
    plan: CX11
  - name: server2
    template: db-node
    plan: CX21
  - name: server3
    template: worker-monitor-combo   # combined worker + monitoring
    plan: CX21
```
**Cost**: €17.57/mo (3 servers)
**Best for**: Cost-optimized production with <1000 req/min
#### Scenario 3: All-in-One (Development)
```yaml
servers:
  - name: dev-server
    template: all-in-one
    plan: CX41
```
**Cost**: €25.60/mo (1 server)
**Best for**: Development, testing, MVP
---
## Deployment Examples
### Example 1: Small Production (3 servers)
```yaml
Server 1 (CX21 - €6.94):
  Template: worker-db-combo
  Services: PostgreSQL + Worker
Server 2 (CX11 - €3.69):
  Template: control-plane-node
  Services: Proxy + Control + Grafana
Server 3 (CX11 - €3.69):
  Template: worker-monitor-combo
  Services: Worker + VictoriaMetrics + Loki

Total: €14.32/mo
```
### Example 2: Medium Production (4 servers)
```yaml
Server 1 (CX21 - €6.94):
  Template: db-node
  Services: PostgreSQL + etcd + HAProxy
Server 2 (CX11 - €3.69):
  Template: worker-node
  Services: Worker + vmagent
Server 3 (CX11 - €3.69):
  Template: control-plane-node
  Services: Proxy + Control + Grafana
Server 4 (CX11 - €3.69):
  Template: monitoring-node
  Services: VictoriaMetrics + Loki

Total: €18.01/mo
```
### Example 3: Large Production (6 servers)
```yaml
Server 1-3 (CX21 - €6.94 each):
  Template: db-node
  Services: PostgreSQL cluster (3 nodes)
Server 4-5 (CX11 - €3.69 each):
  Template: worker-node
  Services: Workers (2 nodes)
Server 6 (CX11 - €3.69):
  Template: control-plane-node
  Services: Proxy + Control + Grafana + VictoriaMetrics + Loki

Total: €31.89/mo
```
---
## Template Selection Guide
**Start with these questions**:
1. **What's your budget?**
- Around €15/mo → Use composite templates
- €25/mo or more → Use pure templates
2. **What's your traffic?**
- <100 req/min → all-in-one
- <1000 req/min → worker-db-combo
- >1000 req/min → pure templates
3. **Do you need database HA?**
- Yes → db-node (3 nodes minimum)
- No → worker-db-combo
4. **Do you need centralized monitoring?**
- Yes → monitoring-node or worker-monitor-combo
- No → Skip (use worker vmagent only)
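The decision tree above can be condensed into a small helper. This is a hypothetical sketch (the function is not a MadBase API; template names come from the overview table):

```python
def choose_template(req_per_min: int, need_db_ha: bool) -> str:
    """Suggest a starting template from the selection guide's
    rules of thumb: HA forces pure templates; otherwise traffic
    decides between all-in-one, combo, and pure layouts."""
    if need_db_ha:
        return "db-node + worker-node"   # pure templates, 3+ db nodes
    if req_per_min < 100:
        return "all-in-one"
    if req_per_min < 1000:
        return "worker-db-combo"
    return "db-node + worker-node"
```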
---
## Control Plane API Integration
Templates are used by the Control Plane API to provision servers:
```http
POST /api/v1/servers
Content-Type: application/json

{
  "name": "worker-1",
  "template": "worker-node",
  "hetzner_plan": "CX11",
  "region": "fsn1",
  "features": ["worker", "monitoring"],
  "environment": "production"
}
```
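Client-side, the request body can be assembled with a small helper. This is a hypothetical sketch (the validation rules and helper name are assumptions; template names come from the overview table):

```python
import json

# Templates listed in the overview table above.
KNOWN_TEMPLATES = {
    "system-node", "proxy-api-node", "worker-node", "db-node",
    "worker-db-combo", "worker-monitor-combo", "all-in-one",
}

def build_server_request(name: str, template: str, plan: str = "CX11",
                         region: str = "fsn1", features: tuple = (),
                         environment: str = "production") -> str:
    """Build the JSON body for POST /api/v1/servers, rejecting
    template names that are not in the documented set."""
    if template not in KNOWN_TEMPLATES:
        raise ValueError(f"unknown template: {template}")
    return json.dumps({
        "name": name,
        "template": template,
        "hetzner_plan": plan,
        "region": region,
        "features": list(features),
        "environment": environment,
    })
```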
**Response**:
```json
{
  "server_id": "abc123",
  "status": "provisioning",
  "ip_address": "167.235.123.45",
  "services": [
    {"name": "worker", "port": 8002},
    {"name": "vmagent", "port": 8429}
  ]
}
```
---
## Resource Profiles
Each service can be tuned with resource profiles:
```yaml
minimal:
  cpu_limit: "0.5"
  memory_limit: "512Mi"
balanced:
  cpu_limit: "2"
  memory_limit: "2Gi"
cpu_intensive:
  cpu_limit: "4"
  memory_limit: "4Gi"
```
Default profiles are assigned in templates but can be overridden:
```http
POST /api/v1/servers

{
  "template": "worker-node",
  "overrides": {
    "worker": {
      "resource_profile": "cpu_intensive"
    }
  }
}
```
---
## Next Steps
1. **Choose template** based on budget and traffic
2. **Provision servers** via Control Plane API or Hetzner CLI
3. **Configure monitoring** (vmagent + promtail)
4. **Verify health** with Grafana dashboards
5. **Scale up/down** as needed
For more details, see:
- `STORAGE_CONFIGURATION.md` - Storage backend setup
- `QUICKSTART_HETZNER_STORAGE.md` - Hetzner Bucket Storage guide
- `4SERVER_DEPLOYMENT_GUIDE.md` - Multi-server deployment