# Node Templates - Quick Reference

Complete guide to MadBase node templates for Hetzner Cloud deployment.

## Template Overview

| Template | Pillar | Zone | Min Plan | Cost/Mo | Use Case | Services |
|----------|--------|------|----------|---------|----------|----------|
| **system-node** | System | Public | CX21 | €6.94 | Cluster Root | Control API + Grafana + VictoriaMetrics + Loki |
| **proxy-api-node** | Proxy / API | Public | CX11 | €3.69 | Scalable Ingress | Gateway + Platform API |
| **worker-node** | Worker | Private | CX11 | €3.69 | Horizontal scaling | Worker + vmagent |
| **db-node** | DB / State | Private | CX21 | €6.94 | Production database HA | PostgreSQL + Patroni + etcd + HAProxy |
| **worker-db-combo** ⭐ | Mixed | | CX31 | €14.21 | Smaller deployments | Worker + PostgreSQL + etcd + HAProxy |
| **worker-monitor-combo** ⭐ | Mixed | | CX21 | €6.94 | Cost-optimized | Worker + VictoriaMetrics + Loki |
| **all-in-one** ⭐ | Unified | | CX41 | €25.60 | Development/MVP | All services on one node |

⭐ = Composite template (mixes multiple service types)

---

## Pure Templates (Single Service Type)

### 1. Database Node (db-node.yaml)

**Best for**: Production deployments requiring database HA

**Server**: CX21 (4GB RAM, 2 vCPU)

**Services**:
- PostgreSQL 15 with Patroni (auto-failover)
- etcd (distributed consensus)
- HAProxy (connection pooling + read/write splitting)

**Scaling**: 3-7 nodes (odd number for quorum)
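
The odd-number recommendation follows from majority quorum: a cluster such as etcd stays available only while more than half of its members are reachable. A minimal sketch of that arithmetic (the function names are illustrative, not part of MadBase):

```python
def quorum_size(nodes: int) -> int:
    """Smallest majority of a cluster with `nodes` voting members."""
    if nodes < 1:
        raise ValueError("cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


for n in (3, 4, 5, 7):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates={tolerated_failures(n)}")
```

A 4-node cluster tolerates one failure, the same as a 3-node cluster, so the extra even node adds cost without adding fault tolerance.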

**When to use**:
- Production traffic >1000 req/min
- Need database auto-failover
- Want separate database cluster

### 2. Worker Node (worker-node.yaml)

**Best for**: Horizontal scaling of API workers

**Server**: CX11 (2GB RAM, 1 vCPU)

**Services**:
- MadBase Worker (API processing)
- vmagent (metrics collection)

**Scaling**: 1-20 nodes

**Auto-scaling rules**:
- Scale up: CPU > 70%
- Scale down: CPU < 20%
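
As a rough illustration of the rules above, the sketch below turns a CPU reading into a target node count. It assumes the thresholds apply to average CPU across workers and that scaling moves one node at a time; neither detail is stated in the template, and `desired_worker_count` is a hypothetical helper.

```python
MIN_NODES, MAX_NODES = 1, 20                 # scaling range of worker-node
SCALE_UP_CPU, SCALE_DOWN_CPU = 70.0, 20.0    # thresholds in percent


def desired_worker_count(current: int, avg_cpu_percent: float) -> int:
    """Return the next node count, clamped to the allowed range."""
    if avg_cpu_percent > SCALE_UP_CPU:
        target = current + 1   # assumption: add one node per decision
    elif avg_cpu_percent < SCALE_DOWN_CPU:
        target = current - 1   # assumption: remove one node per decision
    else:
        target = current
    return max(MIN_NODES, min(MAX_NODES, target))
```

The gap between the two thresholds acts as a dead band, so a cluster sitting at 50% CPU neither grows nor shrinks.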

**When to use**:
- Need to scale workers independently
- Separate database cluster already exists
- Production deployments

### 3. Control Plane Node (control-plane-node.yaml)

**Best for**: Management UI and APIs

**Server**: CX11 (2GB RAM, 1 vCPU)

**Services**:
- Gateway Proxy (port 8080)
- Control Plane API (port 8001)
- Grafana (port 3030)
- Keepalived (HA with floating IP)

**Scaling**: 1-2 nodes (HA mode)

**When to use**:
- Need web UI for server management
- Want to provision servers via API
- Production deployments

### 4. Monitoring Node (monitoring-node.yaml)

**Best for**: Centralized metrics and logging

**Server**: CX11 (2GB RAM, 1 vCPU)

**Services**:
- VictoriaMetrics (metrics database)
- Loki (log aggregation)
- Alertmanager (optional)

**Scaling**: 1-2 nodes (can be HA)

**When to use**:
- Production deployments
- Want centralized monitoring
- Need log aggregation

---

## Composite Templates (Mix Multiple Service Types)

### 5. Worker + Database Combo (worker-db-combo.yaml) ⭐

**Best for**: 2-3 server deployments with database and worker on same node

**Server**: CX31 (8GB RAM, 2 vCPU)

**Services**:
- PostgreSQL 15 with Patroni
- etcd
- HAProxy
- MadBase Worker
- vmagent

**Why use this**:
- Cost savings when the workload fits a CX21 (€6.94 vs €10.63 for a separate db-node + worker-node); at the CX31 minimum plan listed above (€14.21) the combo costs more than separate nodes
- Simpler architecture for smaller deployments
- Easy to scale later

**Scaling**: 1-2 nodes

**Upgrade path**: When CPU > 60% or RAM > 70%, migrate to dedicated db-node + worker-node
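
The upgrade rule can be expressed as a simple check. This is only a sketch: `should_split_combo` is a hypothetical helper, and how the readings are sampled (for example, averaged over several minutes) is left to the caller.

```python
CPU_LIMIT_PERCENT = 60.0
RAM_LIMIT_PERCENT = 70.0


def should_split_combo(cpu_percent: float, ram_percent: float) -> bool:
    """True when either threshold from the upgrade path is exceeded."""
    return cpu_percent > CPU_LIMIT_PERCENT or ram_percent > RAM_LIMIT_PERCENT
```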

**Deployment example**:
```yaml
Server 1 (worker-db-combo): PostgreSQL + Worker
Server 2 (control-plane): Proxy + Control + Grafana
Server 3 (monitoring): VictoriaMetrics + Loki
```

### 6. Worker + Monitoring Combo (worker-monitor-combo.yaml) ⭐

**Best for**: Cost-optimized deployments with monitoring on worker node

**Server**: CX21 (4GB RAM, 2 vCPU)

**Services**:
- MadBase Worker
- VictoriaMetrics
- Loki
- vmagent
- Promtail

**Why use this**:
- Save €3.69/mo (no dedicated monitoring node)
- Monitoring co-located with worker
- Good for 2-3 server deployments

**Scaling**: 1-3 nodes

**When to upgrade**:
- Worker CPU > 60% (monitoring competes for resources)
- Need to scale workers horizontally

**Deployment example**:
```yaml
Server 1 (worker-monitor-combo): Worker + VictoriaMetrics + Loki
Server 2 (db-node): PostgreSQL + etcd + HAProxy
Server 3 (control-plane): Proxy + Control + Grafana
```

### 7. All-in-One (all-in-one.yaml) ⭐

**Best for**: Development, testing, or MVP deployments

**Server**: CX41 (16GB RAM, 4 vCPU)

**Services**: ALL (PostgreSQL, etcd, HAProxy, Redis, MinIO, Workers, Proxy, Control, VictoriaMetrics, Loki, Grafana)

**Why use this**:
- Simplest deployment
- Single server for everything
- Great for development/testing

**When to upgrade**:
- Production traffic > 100 req/min
- CPU usage > 70% sustained
- Need HA for database

---

## Monitoring Stack: VictoriaMetrics + Loki

### How It Works

```
┌──────────────┐         ┌──────────────┐         ┌──────────────┐
│   Node 1     │         │   Node 2     │         │   Node 3     │
│              │         │              │         │              │
│ ┌──────────┐ │         │ ┌──────────┐ │         │ ┌──────────┐ │
│ │ vmagent  │─┼─────────┼─│ vmagent  │─┼─────────┼─│ vmagent  │─┼──┐
│ └──────────┘ │         │ └──────────┘ │         │ └──────────┘ │  │
│ Scans:       │         │ Scans:       │         │ Scans:       │  │
│ - worker     │         │ - worker     │         │ - db         │  │
│ - system     │         │ - system     │         │ - system     │  │
└──────────────┘         └──────────────┘         └──────────────┘  │
                                                                    │
                                                                    ▼
                                                        ┌───────────────────────┐
                                                        │   VictoriaMetrics     │
                                                        │   Port: 8428          │
                                                        │   Type: Metrics DB    │
                                                        └───────────┬───────────┘
                                                                    │
                                                                    ▼
                                                        ┌───────────────────────┐
                                                        │       Grafana         │
                                                        │   Port: 3030          │
                                                        │   Queries VM + Loki   │
                                                        └───────────────────────┘

┌──────────────┐         ┌──────────────┐
│   Node 1     │         │   Node 2     │
│              │         │              │
│ ┌──────────┐ │         │ ┌──────────┐ │
│ │ Promtail │─┼─────────┼─│ Promtail │─┼───┐
│ └──────────┘ │         │ └──────────┘ │   │
│ Reads:       │         │ Reads:       │   │
│ - logs/*     │         │ - logs/*     │   │
└──────────────┘         └──────────────┘   │
                                            │
                                            ▼
                                ┌───────────────────────┐
                                │        Loki           │
                                │   Port: 3100          │
                                │  Type: Log Aggregation│
                                └───────────┬───────────┘
                                            │
                                            ▼
                                ┌───────────────────────┐
                                │       Grafana         │
                                │   LogQL Queries       │
                                └───────────────────────┘
```

### Components

#### VictoriaMetrics (Metrics Database)

**Purpose**: Store and query time-series metrics

**Location**:
- Dedicated monitoring-node (recommended)
- worker-monitor-combo (cost-optimized)
- all-in-one (development)

**Data Flow**:
1. vmagent on each node scrapes metrics every 15s
2. Metrics sent to VictoriaMetrics via remote write
3. VictoriaMetrics stores metrics with 10x compression
4. Grafana queries VictoriaMetrics for dashboards

**Metrics Collected**:
- **Worker**: Request rate, error rate, latency, queue depth
- **PostgreSQL**: Connections, transactions, replication lag
- **System**: CPU, memory, disk, network
- **HAProxy**: Connection count, response time

**Storage Requirements**:
- ~1GB per million time series per day (compressed)
- Default retention: 30 days
- RAM: Minimal, scales with active queries
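
The storage figures above lend themselves to a back-of-the-envelope disk estimate. The sketch below simply multiplies them out; it treats the ~1GB figure as linear in the number of series, which is an assumption, and `estimate_metrics_disk_gb` is an illustrative name.

```python
GB_PER_MILLION_SERIES_PER_DAY = 1.0   # rule of thumb quoted above
DEFAULT_RETENTION_DAYS = 30


def estimate_metrics_disk_gb(
    active_series: int, retention_days: int = DEFAULT_RETENTION_DAYS
) -> float:
    """Approximate compressed disk usage for the full retention window."""
    millions = active_series / 1_000_000
    return millions * GB_PER_MILLION_SERIES_PER_DAY * retention_days


# 50,000 active series kept for the default 30 days
print(f"{estimate_metrics_disk_gb(50_000):.1f} GB")
```

Treat the result as an order-of-magnitude guide; real usage depends on scrape interval and label churn.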

#### Loki (Log Aggregation)

**Purpose**: Store and query logs

**Location**:
- Dedicated monitoring-node (recommended)
- worker-monitor-combo (cost-optimized)
- all-in-one (development)

**Data Flow**:
1. Promtail on each node tails log files
2. Logs sent to Loki via HTTP API
3. Loki indexes logs by labels (service, level, host)
4. Grafana queries Loki using LogQL
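
To make step 2 concrete, the sketch below builds the JSON body that Loki's push endpoint (`POST /loki/api/v1/push`) accepts: a list of streams, each with a label set and `[timestamp_ns, line]` pairs. Promtail does this for you; the helper name `build_loki_push` and the sample label values are illustrative only.

```python
import json
import time
from typing import Dict, List, Optional


def build_loki_push(
    labels: Dict[str, str], lines: List[str], ts_ns: Optional[int] = None
) -> str:
    """Serialize log lines as one Loki stream; timestamps are nanosecond strings."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Give each line a distinct, increasing timestamp to preserve order.
    values = [[str(ts_ns + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": labels, "values": values}]})


body = build_loki_push(
    {"service": "worker", "level": "error", "host": "worker-1"},
    ["request timed out"],
)
print(body)
```

Only the labels are indexed; the log line itself is stored as-is, which is why the label set is kept small (service, level, host).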

**Logs Collected**:
- **Worker**: `/var/log/madbase/worker.log`
- **PostgreSQL**: `/var/log/postgresql/*.log`
- **System**: `/var/log/syslog`

**Storage Requirements**:
- ~10% of raw log size (with compression)
- Default retention: 30 days
- RAM: Minimal, scales with active queries

#### vmagent (Metrics Collector)

**Purpose**: Scrape metrics and send to VictoriaMetrics

**Location**: Runs on EVERY node

**Port**: 8429 (local debug endpoint)

**Configuration**: `config/vmagent.yml`

**Scrape Targets**:
- Worker: `localhost:8002/metrics`
- Patroni: `localhost:8008/metrics`
- Node Exporter: `localhost:9100/metrics`
- HAProxy: `localhost:7000/metrics`

**Resource Usage**:
- CPU: <5% of 1 core
- Memory: ~50MB
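
vmagent reads a Prometheus-compatible scrape configuration. The sketch below renders one from the scrape targets listed above, using the 15s interval mentioned in the data flow. The job names are assumptions, and the actual contents of `config/vmagent.yml` may differ; `/metrics` is the default metrics path, so it is not spelled out.

```python
from typing import Dict

# Job names are illustrative; addresses come from the scrape target list.
SCRAPE_TARGETS = {
    "worker": "localhost:8002",
    "patroni": "localhost:8008",
    "node_exporter": "localhost:9100",
    "haproxy": "localhost:7000",
}


def render_scrape_config(targets: Dict[str, str], interval: str = "15s") -> str:
    """Render a minimal Prometheus-style scrape config as YAML text."""
    lines = ["global:", f"  scrape_interval: {interval}", "scrape_configs:"]
    for job, address in targets.items():
        lines += [
            f"  - job_name: {job}",
            "    static_configs:",
            f"      - targets: ['{address}']",
        ]
    return "\n".join(lines) + "\n"


print(render_scrape_config(SCRAPE_TARGETS))
```

On a node that runs only some of these services, the corresponding entries would simply be left out of the dictionary.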

#### Promtail (Log Collector)

**Purpose**: Tail log files and send to Loki

**Location**: Runs on EVERY node

**Configuration**: `config/promtail.yml`

**Log Sources**:
- `/var/log/madbase/worker.log` (worker logs)
- `/var/log/postgresql/*.log` (database logs)
- `/var/log/syslog` (system logs)

**Resource Usage**:
- CPU: <2% of 1 core
- Memory: ~30MB

### Grafana Integration

Grafana connects to both VictoriaMetrics and Loki:

**Example Dashboard Query**:
```yaml
Panel 1: Request Rate (Metrics)
Query: rate(http_requests_total[5m])

Panel 2: Error Rate (Metrics)
Query: rate(http_requests_total{status=~"5.."}[5m])

Panel 3: Recent Errors (Logs)
Query: {level="error"} | line_format "{{.message}}"

Panel 4: Trace Request by ID (Logs)
Query: {trace_id="abc123"} |= "timeout"
```

### Deployment Scenarios

#### Scenario 1: Dedicated Monitoring Node (Production)

```yaml
servers:
  - name: server1
    template: control-plane-node
    plan: CX11
  - name: server2
    template: db-node
    plan: CX21
  - name: server3
    template: worker-node
    plan: CX11
  - name: server4
    template: monitoring-node  # Dedicated monitoring
    plan: CX11
```

**Cost**: €18.01/mo (4 servers)
**Best for**: Production with >1000 req/min

#### Scenario 2: Worker + Monitoring Combo (Cost-Optimized)

```yaml
servers:
  - name: server1
    template: control-plane-node
    plan: CX11
  - name: server2
    template: db-node
    plan: CX21
  - name: server3
    template: worker-monitor-combo  # Combined
    plan: CX21
```

**Cost**: €17.57/mo (3 servers)
**Best for**: Cost-optimized production with <1000 req/min

#### Scenario 3: All-in-One (Development)

```yaml
servers:
  - name: dev-server
    template: all-in-one
    plan: CX41
```

**Cost**: €25.60/mo (1 server)
**Best for**: Development, testing, MVP

---

## Deployment Examples

### Example 1: Small Production (3 servers)

```yaml
Server 1 (CX21 - €6.94):
  Template: worker-db-combo
  Services: PostgreSQL + Worker

Server 2 (CX11 - €3.69):
  Template: control-plane-node
  Services: Proxy + Control + Grafana

Server 3 (CX11 - €3.69):
  Template: worker-monitor-combo
  Services: Worker + VictoriaMetrics + Loki

Total: €14.32/mo
```

### Example 2: Medium Production (4 servers)

```yaml
Server 1 (CX21 - €6.94):
  Template: db-node
  Services: PostgreSQL + etcd + HAProxy

Server 2 (CX11 - €3.69):
  Template: worker-node
  Services: Worker + vmagent

Server 3 (CX11 - €3.69):
  Template: control-plane-node
  Services: Proxy + Control + Grafana

Server 4 (CX11 - €3.69):
  Template: monitoring-node
  Services: VictoriaMetrics + Loki

Total: €18.01/mo
```

### Example 3: Large Production (6 servers)

```yaml
Server 1-3 (CX21 - €6.94 each):
  Template: db-node
  Services: PostgreSQL cluster (3 nodes)

Server 4-5 (CX11 - €3.69 each):
  Template: worker-node
  Services: Workers (2 nodes)

Server 6 (CX11 - €3.69):
  Template: control-plane-node
  Services: Proxy + Control + Grafana + VictoriaMetrics + Loki

Total: €31.89/mo
```
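
A deployment's monthly cost is the sum of its plan prices from the Template Overview table. The sketch below does that sum with `Decimal` to avoid floating-point rounding; `PLAN_PRICE_EUR` and `monthly_cost` are illustrative names, and the prices are the ones quoted in this guide rather than live Hetzner pricing.

```python
from decimal import Decimal
from typing import Iterable

# Monthly prices in EUR as listed in the Template Overview table.
PLAN_PRICE_EUR = {
    "CX11": Decimal("3.69"),
    "CX21": Decimal("6.94"),
    "CX31": Decimal("14.21"),
    "CX41": Decimal("25.60"),
}


def monthly_cost(plans: Iterable[str]) -> Decimal:
    """Sum the monthly price of every server plan in a deployment."""
    return sum((PLAN_PRICE_EUR[plan] for plan in plans), Decimal("0"))


print(monthly_cost(["CX21", "CX11", "CX11", "CX11"]))  # 18.01
print(monthly_cost(["CX21"] * 3 + ["CX11"] * 3))       # 31.89
```

An unknown plan name raises `KeyError`, which is usually preferable to silently pricing it at zero.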

---

## Template Selection Guide

**Start with these questions**:

1. **What's your budget?**
   - €15/mo → Use composite templates
   - €25/mo → Use pure templates

2. **What's your traffic?**
   - <100 req/min → all-in-one
   - <1000 req/min → worker-db-combo
   - >1000 req/min → pure templates

3. **Do you need database HA?**
   - Yes → db-node (3 nodes minimum)
   - No → worker-db-combo

4. **Do you need centralized monitoring?**
   - Yes → monitoring-node or worker-monitor-combo
   - No → Skip (use worker vmagent only)
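
The questions above can be combined into a small decision helper. The sketch below is one possible reading of the guide, not an official algorithm: it ignores budget, treats exactly 1000 req/min as the combo tier, and lets the HA requirement override the traffic tier. `suggest_templates` is a hypothetical name.

```python
from typing import List


def suggest_templates(
    req_per_min: int, need_db_ha: bool, need_monitoring: bool
) -> List[str]:
    """Return one template name per server for a starting layout."""
    if need_db_ha or req_per_min > 1000:
        # Pure templates; HA needs at least three db-nodes.
        layout = ["db-node"] * (3 if need_db_ha else 1) + ["worker-node"]
        if need_monitoring:
            layout.append("monitoring-node")
        return layout
    if req_per_min < 100:
        # all-in-one already bundles VictoriaMetrics, Loki, and Grafana.
        return ["all-in-one"]
    layout = ["worker-db-combo"]
    if need_monitoring:
        layout.append("worker-monitor-combo")
    return layout


print(suggest_templates(500, need_db_ha=False, need_monitoring=True))
```

The result is a starting point only; a control plane node still has to be added wherever the management UI and API are wanted.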

---

## Control Plane API Integration

Templates are used by the Control Plane API to provision servers:

```http
POST /api/v1/servers
Content-Type: application/json

{
  "name": "worker-1",
  "template": "worker-node",
  "hetzner_plan": "CX11",
  "region": "fsn1",
  "features": ["worker", "monitoring"],
  "environment": "production"
}
```

**Response**:
```json
{
  "server_id": "abc123",
  "status": "provisioning",
  "ip_address": "167.235.123.45",
  "services": [
    {"name": "worker", "port": 8002},
    {"name": "vmagent", "port": 8429}
  ]
}
```
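
For scripting, the same request can be assembled with Python's standard library. The sketch below only builds the request object and does not send it; the base URL is a placeholder (port 8001 is the Control Plane API port listed earlier), authentication is not covered in this guide and is omitted, and `build_provision_request` is a hypothetical helper.

```python
import json
from typing import Sequence
from urllib.request import Request


def build_provision_request(
    base_url: str,
    name: str,
    template: str,
    plan: str,
    region: str = "fsn1",
    features: Sequence[str] = (),
    environment: str = "production",
) -> Request:
    """Build (but do not send) a POST /api/v1/servers request."""
    body = {
        "name": name,
        "template": template,
        "hetzner_plan": plan,
        "region": region,
        "features": list(features),
        "environment": environment,
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/v1/servers",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host; replace with the address of your control plane node.
req = build_provision_request(
    "http://control-plane.example:8001", "worker-1", "worker-node", "CX11",
    features=["worker", "monitoring"],
)
print(req.get_method(), req.full_url)
```

Passing the object to `urllib.request.urlopen(req)` would send it; the response body can then be parsed with `json.loads` to read `server_id` and `status`.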

---

## Resource Profiles

Each service can be tuned with resource profiles:

```yaml
minimal:
  cpu_limit: "0.5"
  memory_limit: "512Mi"

balanced:
  cpu_limit: "2"
  memory_limit: "2Gi"

cpu_intensive:
  cpu_limit: "4"
  memory_limit: "4Gi"
```

Default profiles are assigned in templates but can be overridden:

```http
POST /api/v1/servers

{
  "template": "worker-node",
  "overrides": {
    "worker": {
      "resource_profile": "cpu_intensive"
    }
  }
}
```
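
How an override might resolve to concrete limits can be sketched as a dictionary lookup. The profile values below are the ones listed above; the per-service defaults for `worker-node` are invented for illustration, and `resolve_limits` is a hypothetical helper rather than the Control Plane's actual logic.

```python
from typing import Dict

RESOURCE_PROFILES = {
    "minimal": {"cpu_limit": "0.5", "memory_limit": "512Mi"},
    "balanced": {"cpu_limit": "2", "memory_limit": "2Gi"},
    "cpu_intensive": {"cpu_limit": "4", "memory_limit": "4Gi"},
}

# Assumed defaults for illustration only; real templates may differ.
WORKER_NODE_DEFAULTS = {"worker": "balanced", "vmagent": "minimal"}


def resolve_limits(
    defaults: Dict[str, str], overrides: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """Apply per-service profile overrides and expand them to limits."""
    resolved = {}
    for service, default_profile in defaults.items():
        profile = overrides.get(service, {}).get("resource_profile", default_profile)
        resolved[service] = dict(RESOURCE_PROFILES[profile])
    return resolved


limits = resolve_limits(
    WORKER_NODE_DEFAULTS, {"worker": {"resource_profile": "cpu_intensive"}}
)
print(limits["worker"])
```

Services without an override keep the template default, so a request only needs to name the services it wants to change.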

---

## Next Steps

1. **Choose template** based on budget and traffic
2. **Provision servers** via Control Plane API or Hetzner CLI
3. **Configure monitoring** (vmagent + promtail)
4. **Verify health** with Grafana dashboards
5. **Scale up/down** as needed

For more details, see:
- `STORAGE_CONFIGURATION.md` - Storage backend setup
- `QUICKSTART_HETZNER_STORAGE.md` - Hetzner Bucket Storage guide
- `4SERVER_DEPLOYMENT_GUIDE.md` - Multi-server deployment