Delve — Hosting & Infrastructure Plan

Hosting Philosophy
Phased Infrastructure
Phase 1 — Solo Dev / Alpha
Phase 2 — Closed Beta (500–2,000 players)
Phase 3 — Launch (2,000–10,000 players)
Phase 4 — Growth (10,000–50,000 players)
Phase 5 — Scale (50,000+ players)
Service-by-Service Breakdown
Cost vs. Revenue Analysis
Domain, DNS & CDN
Backups & Disaster Recovery
Local Development
CI/CD Pipeline
Monitoring Stack
Decision Log

1. Hosting Philosophy

Delve is an indie project with an ethical monetization model ($3/mo subscriptions, no whales). Infrastructure costs must stay well below revenue at every stage. This means:

No Kubernetes until it’s actually needed. K8s adds operational overhead that doesn’t pay off until you have multiple engineers and dozens of services. A single VPS running Docker Compose can handle thousands of concurrent players for Delve’s async workload.
No managed cloud databases at small scale. A self-hosted PostgreSQL on a dedicated VPS is 5-10x cheaper than AWS RDS or equivalent, and fine when you’re the only operator.
Graduate infrastructure with player count. Every upgrade should be a response to measured bottlenecks, not anticipated ones.
Prefer value VPS providers. Hetzner, OVH, and Vultr offer 3-5x the compute-per-dollar compared to AWS/GCP/Azure for baseline infrastructure.
Use managed services only where the operational cost of self-hosting exceeds the price difference. Email delivery, push notifications, and payment processing are always managed. Databases and app servers are self-hosted until scale demands otherwise.

2. Phased Infrastructure

Phase	Players	Monthly Cost	Infrastructure
1. Solo Dev / Alpha	1–50	~$10–25	Single VPS
2. Closed Beta	500–2,000	~$50–100	2 VPS + managed DB option
3. Launch	2,000–10,000	~$150–350	3-4 VPS, dedicated DB server
4. Growth	10,000–50,000	~$500–1,500	Multi-server, read replicas, Redis cluster
5. Scale	50,000+	~$2,000–5,000+	Managed K8s, multi-region

The itemized totals in the phase sections below come in lower than these ranges for Phases 2–5 (~$37, ~$94, ~$262, and ~$800–1,300 per month respectively), so treat the ranges above as upper-bound budgets rather than expected bills.

3. Phase 1 — Solo Dev / Alpha

Goal: Get the game running, playable, testable. You and a handful of testers.

Infrastructure

Single VPS (Hetzner CX32 or equivalent)
├── 4 vCPU, 8 GB RAM, 80 GB NVMe
├── Docker Compose runs everything:
│   ├── Caddy (reverse proxy + auto TLS)
│   ├── delve-api (Rust binary)
│   ├── delve-workers (Rust binary — simulation, economy, crafting, pvp)
│   ├── PostgreSQL 16
│   └── Redis 7 (Valkey)
├── SvelteKit SPA served by Caddy as static files
└── Cost: ~€8/mo ($9/mo) on Hetzner Cloud

Provider: Hetzner Cloud

Spec	Value
Plan	CX32 (shared vCPU)
CPU	4 vCPU
RAM	8 GB
Disk	80 GB NVMe
Transfer	20 TB/mo
Location	Falkenstein, DE (or Ashburn, VA for US)
Cost	€7.59/mo (~$8.50)

Why Hetzner: Best price-to-performance for European/US hosting. Their ARM options (CAX line) are even cheaper if the stack supports it (Rust cross-compiles to aarch64 trivially, PostgreSQL + Redis run fine on ARM).

What Runs Where

Everything on one box. Docker Compose with a single docker-compose.yml. PostgreSQL data on a persistent volume. Caddy handles TLS via Let’s Encrypt.

Backups

PostgreSQL: Daily pg_dump to Hetzner’s 20GB backup space (free with server) or to a Backblaze B2 bucket ($0.005/GB/mo)
Schedule: cron job at 04:00 UTC

Total Phase 1 Cost

Service	Monthly Cost
Hetzner CX32	$9
Domain (.com)	~$1 (amortized)
Backblaze B2 (backups)	$0.10
Email (Resend free tier)	$0
Total	~$10/mo

4. Phase 2 — Closed Beta (500–2,000 players)

Goal: Real players, real load. Validate the game systems, economy, and multiplayer features. Start collecting subscription revenue.

Infrastructure

VPS 1 — Application (Hetzner CX42)
├── 8 vCPU, 16 GB RAM, 160 GB NVMe
├── Docker Compose:
│   ├── Caddy
│   ├── delve-api
│   ├── delve-workers (×2 containers)
│   ├── delve-workers --scheduler (×1)
│   └── Redis 7
└── Cost: ~€16/mo ($18/mo)

VPS 2 — Database (Hetzner CX32)
├── 4 vCPU, 8 GB RAM, 80 GB NVMe
├── PostgreSQL 16 (dedicated, not sharing CPU with app)
├── Automated WAL backups to B2
└── Cost: ~€8/mo ($9/mo)

Why Split the Database

PostgreSQL performance degrades when it competes for CPU and I/O with the application. Isolating it on a dedicated VPS is the highest-impact scaling move at this stage, and it costs only $9/mo.

Push Notifications

At this phase, mobile testers need push notifications.

Service	Free Tier	Paid
Firebase Cloud Messaging (FCM)	Unlimited Android + web push	Free
APNs (via Firebase)	Unlimited iOS push	Free (Apple Developer Program $99/yr already required)

FCM is free at any scale. The only cost is the Apple Developer Program membership ($99/yr) required to publish to iOS.

Payments

Stripe for subscription billing and one-time purchases. Stripe processes payment on the web (not via in-app purchase), so no App Store / Play Store commission on subscriptions.

Service	Cost
Stripe	2.9% + $0.30 per transaction

At $3/mo subscription: Stripe takes ~$0.39, you keep ~$2.61 per subscriber.
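
The per-subscriber figure above can be reproduced with a small helper. This is a sketch of the arithmetic only, assuming the 2.9% + $0.30 rate quoted in the table and round-to-nearest-cent on the percentage part; the function names are illustrative and the rounding rule is an assumption, not Stripe's documented behavior.

```rust
/// Card pricing as quoted in the table above: 2.9% + $0.30 per transaction.
const PERCENT_FEE_BASIS_POINTS: u64 = 290; // 2.9% expressed in basis points
const FIXED_FEE_CENTS: u64 = 30;

/// Processing fee in cents for a charge of `price_cents`.
/// The percentage part is rounded to the nearest cent (assumed rounding rule).
fn stripe_fee_cents(price_cents: u64) -> u64 {
    // Integer math avoids floating-point drift: add half a cent before dividing.
    let percent_fee = (price_cents * PERCENT_FEE_BASIS_POINTS + 5_000) / 10_000;
    percent_fee + FIXED_FEE_CENTS
}

/// What is left of a charge after the processing fee, in cents.
fn net_cents(price_cents: u64) -> u64 {
    price_cents.saturating_sub(stripe_fee_cents(price_cents))
}
```

For the $3 subscription, `stripe_fee_cents(300)` gives 39 and `net_cents(300)` gives 261, matching the ~$0.39 and ~$2.61 above. The fixed $0.30 dominates at this price point, which is why the effective fee is about 13% rather than 2.9%.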

Email

Transactional email for account verification, password reset, subscription receipts.

Service	Free Tier	After Free
Resend	3,000 emails/mo	$20/mo for 50K
Postmark	100 emails/mo	$15/mo for 10K

Resend’s free tier covers beta easily. Upgrade to paid at launch.

Total Phase 2 Cost

Service	Monthly Cost
Hetzner CX42 (app)	$18
Hetzner CX32 (db)	$9
Backblaze B2	$0.50
Domain + DNS	$1
Apple Developer	$8 (amortized)
Stripe fees	Variable
Email (Resend free)	$0
Total	~$37/mo (before Stripe)

Revenue at This Phase

If 500 beta players, 15% subscribe: 75 × $2.61 net = ~$196/mo. Comfortably profitable on infrastructure.

5. Phase 3 — Launch (2,000–10,000 players)

Goal: Public launch. Stable, performant, ready for organic growth.

Infrastructure

VPS 1 — API (Hetzner CX42)
├── 8 vCPU, 16 GB RAM, 160 GB NVMe
├── Caddy (reverse proxy)
├── delve-api (×2 containers, load balanced by Caddy)
└── Cost: ~€16/mo ($18/mo)

VPS 2 — Workers (Hetzner CX42)
├── 8 vCPU, 16 GB RAM, 160 GB NVMe
├── delve-workers (×4 containers)
├── delve-workers --scheduler (×1)
└── Cost: ~€16/mo ($18/mo)

VPS 3 — Database (Hetzner CX42)
├── 8 vCPU, 16 GB RAM, 160 GB NVMe
├── PostgreSQL 16 (primary)
├── PgBouncer (connection pooling)
├── Automated WAL archiving to B2
└── Cost: ~€16/mo ($18/mo)

VPS 4 — Redis + Monitoring (Hetzner CX32)
├── 4 vCPU, 8 GB RAM, 80 GB NVMe
├── Redis 7 (dedicated, persistent)
├── Prometheus + Grafana + Loki (monitoring stack)
└── Cost: ~€8/mo ($9/mo)

CDN — Cloudflare (Free plan)
├── Static SPA assets, game data JSON
├── DDoS protection
└── Cost: $0 (free plan is sufficient)

Why 4 Servers

Server	Bottleneck it addresses
API	Handles all REST polling traffic. Isolated so poll load doesn’t compete with simulation CPU. Rust’s efficiency means a CX42 handles this easily.
Workers	Simulation is CPU-intensive. Isolated so a spike in dungeon completions doesn’t lag the API.
Database	PostgreSQL needs dedicated I/O. Shared CPU causes query latency spikes.
Redis + Monitoring	Redis needs stable memory. Monitoring (Prometheus, Grafana) is a nice-to-have that shouldn’t compete with game systems.

Object Storage

Run replay logs are stored as JSONB in the database for now. If they grow large, they can move to object storage:

Service	Cost
Backblaze B2	$0.005/GB/mo storage, $0.01/GB egress
Hetzner Object Storage	€0.0065/GB/mo

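At 10K players with ~50KB per run log and 3 runs/day on average: ~1.5 GB/day, or ~45 GB of new logs per month. On B2 that is ~$0.23 for the first month of data, and the bill grows by about the same amount each month if logs are never pruned (~$2.70/mo after a year). Negligible.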
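
The estimate can be checked with a few lines of arithmetic. This is a sketch assuming decimal units (1 KB = 1,000 bytes), the B2 storage price from the table, and no pruning of old logs; the function names are illustrative.

```rust
/// Bytes of run-log data produced per day (decimal units: 1 KB = 1,000 bytes).
fn daily_log_bytes(players: u64, runs_per_day: u64, kb_per_run: u64) -> u64 {
    players * runs_per_day * kb_per_run * 1_000
}

/// Storage bill in dollars for month `n` (1-based) when nothing is deleted,
/// so the stored volume grows by one month of logs every month.
fn storage_cost_in_month(n: u32, gb_per_month: f64, price_per_gb: f64) -> f64 {
    f64::from(n) * gb_per_month * price_per_gb
}
```

`daily_log_bytes(10_000, 3, 50)` is 1,500,000,000 bytes (1.5 GB). With 45 GB of new data per month at $0.005/GB, `storage_cost_in_month` gives about $0.23 for month 1 and $2.70 for month 12, so a retention policy matters more than the per-GB price.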

Total Phase 3 Cost

Service	Monthly Cost
Hetzner CX42 (API)	$18
Hetzner CX42 (workers)	$18
Hetzner CX42 (database)	$18
Hetzner CX32 (Redis + monitoring)	$9
Cloudflare (CDN)	$0
Backblaze B2	$2
Resend (email)	$20
Apple Developer	$8
Domain + DNS	$1
Sentry (error tracking, free tier)	$0
Total	~$94/mo

Revenue at This Phase

If 5,000 active players, 15% subscribe: 750 × $2.61 = ~$1,958/mo. Plus one-time purchases (~$0.50 ARPU across all players): +$2,500 cumulative.

Infrastructure is ~5% of subscription revenue ($94 of ~$1,958). Very healthy margin.

6. Phase 4 — Growth (10,000–50,000 players)

Goal: Handle sustained growth. Start introducing redundancy for uptime guarantees.

Infrastructure

Load Balancer — Hetzner Load Balancer
├── Routes /api/* to API pool
├── Health checks, automatic failover
└── Cost: €6/mo

API Pool (2× Hetzner CX42)
├── 8 vCPU, 16 GB RAM each
├── delve-api + Caddy per node
└── Cost: 2 × €16 = €32/mo

Worker Pool (3× Hetzner CX42)
├── delve-workers (distributed via Redis job queue)
├── Economy queue consumed serially by one instance
├── delve-workers --scheduler on one designated node
└── Cost: 3 × €16 = €48/mo

Database — Primary + Read Replica
├── Primary: Hetzner CX52 (16 vCPU, 32 GB RAM)
│   ├── PostgreSQL 16 + PgBouncer
│   └── Cost: €36/mo
├── Read Replica: Hetzner CX42 (8 vCPU, 16 GB RAM)
│   ├── Streaming replication, serves read-heavy queries
│   │   (leaderboards, marketplace search, profile lookups)
│   └── Cost: €16/mo
└── Total: €52/mo

Redis — Hetzner CX42
├── 16 GB RAM, persistent, Sentinel for failover (or Valkey cluster)
└── Cost: €16/mo

Monitoring — Hetzner CX32
├── Prometheus, Grafana, Loki, Alertmanager
├── Sentry (cloud, Team plan for higher limits)
└── Cost: €8/mo + $26/mo Sentry

Search

At this scale, marketplace search benefits from a dedicated search engine:

Service	Hosting	Cost
Meilisearch	Self-hosted on worker VPS	$0 (already have spare capacity)
Meilisearch Cloud	Managed	$30/mo (if self-hosting is too much operational burden)

Total Phase 4 Cost

Service	Monthly Cost
Hetzner Load Balancer	$7
API servers (2×)	$36
Worker servers (3×)	$54
Database primary	$40
Database replica	$18
Redis	$18
Monitoring VPS	$9
Sentry Team	$26
Cloudflare Pro	$20
Backblaze B2	$5
Resend	$20
Apple Developer	$8
Domain	$1
Total	~$262/mo

Revenue at This Phase

If 25,000 active players, 15% subscribe: 3,750 × $2.61 = ~$9,788/mo. Infrastructure is ~3% of subscription revenue.

At this point you’re making real money and could afford managed services or an ops hire if needed.

7. Phase 5 — Scale (50,000+ players)

Goal: Professional-grade infrastructure. Consider managed Kubernetes, multi-region, and a dedicated ops approach.

When to Move to Kubernetes

Move to K8s when at least two of these are true:

More than one person is deploying and operating infrastructure
You need auto-scaling (traffic is spiky, not steady)
You’re managing 15+ containers across 10+ servers and Docker Compose is unwieldy
You need zero-downtime rolling deployments across multiple server pools
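
The "at least two of these" rule is easy to encode as a checklist. A minimal sketch, assuming the four criteria above are tracked as simple facts; the struct and field names are hypothetical.

```rust
/// Observable facts about the current setup; field names are illustrative.
struct OpsSnapshot {
    operators: u32,
    needs_autoscaling: bool,
    containers: u32,
    servers: u32,
    needs_rolling_deploys_across_pools: bool,
}

/// Returns true when at least two of the four criteria listed above hold.
fn should_move_to_k8s(s: &OpsSnapshot) -> bool {
    let criteria = [
        s.operators > 1,                        // more than one person operating
        s.needs_autoscaling,                    // spiky traffic
        s.containers >= 15 && s.servers >= 10,  // Compose has become unwieldy
        s.needs_rolling_deploys_across_pools,   // zero-downtime rollouts
    ];
    criteria.iter().filter(|&&met| met).count() >= 2
}
```

A solo operator running 9 servers with steady traffic meets none of the criteria, so the function returns false and Docker Compose stays in place.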

Infrastructure

Kubernetes Cluster (Hetzner Cloud or CIVO)
├── Control plane (managed by provider)
├── Node pool — API: 3× CX42 (auto-scaling 2-5)
├── Node pool — Workers: 4× CX42 (auto-scaling 2-8)
└── Estimated: €150-350/mo for nodes

Managed PostgreSQL (Hetzner Managed DB or Neon)
├── Primary: 16 vCPU, 64 GB RAM
├── 2 read replicas
├── Automated backups, point-in-time recovery
├── Connection pooling (PgBouncer built-in)
└── Estimated: €150-300/mo

Managed Redis (Upstash or self-hosted Valkey cluster)
├── 3-node cluster for HA
├── 32 GB total memory
└── Estimated: €50-100/mo

Multi-Region Consideration:
├── If majority US players: US primary + EU edge CDN
├── If global: US primary + EU secondary with DB replication
└── Add ~50-100% to compute costs for second region

Total Phase 5 Cost (Estimated)

Service	Monthly Cost
Kubernetes nodes	$300–500
Managed PostgreSQL	$200–350
Managed Redis	$60–120
Monitoring (Grafana Cloud or self-hosted)	$50–100
Cloudflare Pro	$20
Object storage	$15
Email (Resend or Postmark)	$40
Sentry Business	$80
Push notifications	$0 (FCM free)
Total	~$800–1,300/mo

Revenue at This Phase

If 50,000 active players, 15% subscribe: 7,500 × $2.61 = ~$19,575/mo. If 100,000 active players: ~$39,150/mo.

At $800–1,300/mo, infrastructure is roughly 4-7% of subscription revenue at 50,000 players. Very healthy.

8. Service-by-Service Breakdown

PostgreSQL

Phase	Setup	Cost
1–2	Single instance on shared/dedicated VPS	$0–9
3	Dedicated VPS, PgBouncer, WAL backups	$18
4	Primary + read replica, PgBouncer	$58
5	Managed, primary + 2 replicas	$200–350

Key configuration:

shared_buffers: 25% of RAM
effective_cache_size: 75% of RAM
work_mem: 64MB (for marketplace queries, leaderboard aggregation)
max_connections: 200 (with PgBouncer in front, actual app connections pool at ~20 per API instance)
WAL level: replica (for streaming replication readiness from day 1)
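
Because two of these settings are expressed as fractions of RAM, they need recomputing whenever the database VPS is resized. The sketch below renders the values for a given RAM size; the function name is illustrative, and the output is a fragment to review, not a complete postgresql.conf.

```rust
/// Renders the settings listed above for a server with `ram_gb` of RAM.
fn postgres_settings(ram_gb: u32) -> String {
    let ram_mb = ram_gb * 1024;
    let shared_buffers_mb = ram_mb / 4;           // 25% of RAM
    let effective_cache_size_mb = ram_mb * 3 / 4; // 75% of RAM
    format!(
        "shared_buffers = {}MB\n\
         effective_cache_size = {}MB\n\
         work_mem = 64MB\n\
         max_connections = 200\n\
         wal_level = replica\n",
        shared_buffers_mb, effective_cache_size_mb
    )
}
```

For the 16 GB database server in Phase 3, this yields `shared_buffers = 4096MB` and `effective_cache_size = 12288MB`. Note that `work_mem` applies per sort or hash operation rather than per connection, so 64MB is worth revisiting if many heavy queries run concurrently.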

Redis

Phase	Setup	Cost
1–2	Shared VPS, appendonly persistence	$0
3	Dedicated VPS	$9
4	Dedicated VPS, Sentinel	$18
5	Cluster or managed	$60–120

Memory estimation at 50K players:

Sessions: ~50K × 0.5KB = 25MB
Leaderboards: ~10 boards × 50K entries × 0.1KB = 50MB
Job queue: ~10K pending × 1KB = 10MB
PVP queues: negligible
Rate limiting: ~50K counters × 0.1KB = 5MB
Total: ~90MB — Redis memory is not a concern until extreme scale
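
The same estimate can be parameterized by player count to see how it scales. This is a sketch using the per-entry sizes from the list above with decimal units (1 KB = 1,000 bytes); the type and function names are illustrative, and real usage will be somewhat higher because Redis adds per-key overhead.

```rust
/// One line of the estimate: entry count and per-entry size in bytes.
struct Footprint {
    name: &'static str,
    entries: u64,
    bytes_each: u64,
}

/// Rebuilds the estimate above for a given player count.
/// The 10K pending jobs figure is kept fixed, as in the text.
fn redis_estimate(players: u64) -> Vec<Footprint> {
    vec![
        Footprint { name: "sessions", entries: players, bytes_each: 500 },
        Footprint { name: "leaderboards", entries: 10 * players, bytes_each: 100 },
        Footprint { name: "job queue", entries: 10_000, bytes_each: 1_000 },
        Footprint { name: "rate limiting", entries: players, bytes_each: 100 },
    ]
}

/// Sum of all categories in bytes.
fn total_bytes(items: &[Footprint]) -> u64 {
    items.iter().map(|f| f.entries * f.bytes_each).sum()
}

/// Formats the estimate as one line per category plus a total, in MB.
fn report(items: &[Footprint]) -> String {
    let mut out = String::new();
    for f in items {
        out.push_str(&format!("{}: {} MB\n", f.name, f.entries * f.bytes_each / 1_000_000));
    }
    out.push_str(&format!("total: {} MB\n", total_bytes(items) / 1_000_000));
    out
}
```

At 50,000 players `total_bytes` comes to 90,000,000 bytes, the ~90MB quoted above. Even at ten times that player count the raw estimate stays under 1 GB.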

Caddy / Load Balancer

Phase	Setup
1–3	Caddy on the API VPS (reverse proxy + auto TLS)
4+	Hetzner Load Balancer ($7/mo) in front of Caddy instances

Caddy handles:

Automatic HTTPS via Let’s Encrypt
HTTP/2
Static file serving (SPA bundle)
Gzip/brotli compression

Job Queue

The custom Redis-backed job queue runs in-process on worker binaries. No separate infrastructure. Job types and their expected volumes:

Job Type	Trigger	Volume (10K players)
`resolve-run`	Run timer completes	~30K/day
`resolve-craft`	Craft timer completes	~15K/day
`resolve-gathering`	Expedition completes	~10K/day
`marketplace-buy`	Player purchases listing	~5K/day
`auction-expiry`	Every 5 min (batch)	288/day
`pvp-resolve`	Match found	~3K/day
`daily-reset`	00:00 UTC	1/day
`weekly-reset`	Monday 00:00 UTC	1/week
`mail-delivery`	1 hour after send	~2K/day
`guild-buff-expiry`	Every 1 min (batch)	1,440/day
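
Summing the table gives a sense of the average load on the queue. The sketch below derives the periodic volumes from their intervals and adds the event-driven estimates; the function names are illustrative and weekly-reset is left out as negligible.

```rust
const SECONDS_PER_DAY: u64 = 86_400;

/// Runs per day for a job that fires on a fixed interval.
fn runs_per_day(interval_secs: u64) -> u64 {
    SECONDS_PER_DAY / interval_secs
}

/// Total jobs per day at 10K players, using the volumes from the table above.
fn daily_job_total() -> u64 {
    // resolve-run, resolve-craft, resolve-gathering, marketplace-buy,
    // pvp-resolve, mail-delivery
    let event_driven: [u64; 6] = [30_000, 15_000, 10_000, 5_000, 3_000, 2_000];
    // auction-expiry (5 min), guild-buff-expiry (1 min), daily-reset
    let periodic = runs_per_day(300) + runs_per_day(60) + 1;
    event_driven.iter().sum::<u64>() + periodic
}

/// Average jobs per second over a day.
fn average_jobs_per_second(total_per_day: u64) -> f64 {
    total_per_day as f64 / SECONDS_PER_DAY as f64
}
```

The total is 66,729 jobs per day, which averages to under one job per second. Peaks will be higher than the average (for example around daily reset), but the headroom on a multi-core worker VPS is large.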

9. Cost vs. Revenue Analysis

Revenue Model (from monetization doc)

Subscription: $3/mo, ~$2.61 net after Stripe fees
Target subscription rate: 15% of active players
One-time purchases: ~$0.50 ARPU lifetime average

Break-Even Table

Active Players	Subscribers (15%)	Subscription Revenue	Infra Cost	Margin
500	75	$196/mo	$37/mo	$159
2,000	300	$783/mo	$60/mo	$723
5,000	750	$1,958/mo	$94/mo	$1,864
10,000	1,500	$3,915/mo	$150/mo	$3,765
25,000	3,750	$9,788/mo	$262/mo	$9,526
50,000	7,500	$19,575/mo	$1,000/mo	$18,575
100,000	15,000	$39,150/mo	$2,500/mo	$36,650

Infrastructure is about 19% of subscription revenue at 500 players, drops below 8% by 2,000 players, and stays at roughly 3-6% from 5,000 players upward. The async, server-resolved nature of Delve means the compute cost per player is very low compared to real-time multiplayer games.
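
Each row of the table follows from three inputs: the 15% subscription rate, $2.61 net per subscriber, and the infrastructure cost for that tier. A sketch of the calculation, with illustrative function names:

```rust
/// Net revenue per subscriber per month in cents ($2.61, from the revenue model above).
const NET_PER_SUB_CENTS: u64 = 261;

/// Subscribers at the 15% target rate.
fn subscribers(active_players: u64) -> u64 {
    active_players * 15 / 100
}

/// Monthly subscription revenue in cents, net of payment fees.
fn subscription_revenue_cents(active_players: u64) -> u64 {
    subscribers(active_players) * NET_PER_SUB_CENTS
}

/// Infrastructure cost as a percentage of net subscription revenue.
fn infra_share_percent(active_players: u64, infra_cost_dollars: u64) -> f64 {
    let revenue_dollars = subscription_revenue_cents(active_players) as f64 / 100.0;
    infra_cost_dollars as f64 / revenue_dollars * 100.0
}
```

For example, `infra_share_percent(500, 37)` is about 18.9 and `infra_share_percent(25_000, 262)` is about 2.7, so the share is highest at the smallest tier and lowest in Phase 4.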

Break-Even Point

At $10/mo infrastructure (Phase 1), you need 4 subscribers to break even on hosting. That’s ~27 active players at 15% subscription rate. The game is profitable on infrastructure almost immediately once anyone is paying.
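
The break-even figures are two ceiling divisions. A minimal sketch, assuming costs in cents and the conversion rate as a whole percentage; the function names are illustrative.

```rust
/// Smallest number of subscribers whose net revenue covers the hosting bill.
fn break_even_subscribers(infra_cost_cents: u64, net_per_sub_cents: u64) -> u64 {
    // Ceiling division: a partial subscriber still means one more is needed.
    (infra_cost_cents + net_per_sub_cents - 1) / net_per_sub_cents
}

/// Active players needed to yield that many subscribers at a given conversion rate.
fn players_needed(subscribers: u64, conversion_percent: u64) -> u64 {
    (subscribers * 100 + conversion_percent - 1) / conversion_percent
}
```

With a $10 bill and $2.61 net per subscriber, `break_even_subscribers(1_000, 261)` is 4, and `players_needed(4, 15)` is 27, matching the numbers above.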

The real costs are development time (your time), not infrastructure.

10. Domain, DNS & CDN

Domain

DNS

Cloudflare DNS (free):

delve.game → Cloudflare CDN → origin server (static SPA)
api.delve.game → origin server (REST API, proxied through Cloudflare is fine — no WebSocket concerns)

CDN

Cloudflare free plan:

Cache the SPA shell (index.html, JS, CSS, images)
Cache static game data files (item templates, skill definitions exported as JSON)
DDoS protection (useful once the game has any visibility)
Edge compression (Brotli)

Since there are no WebSockets, all traffic can be proxied through Cloudflare from day one — free DDoS protection and caching for the SPA assets. API traffic proxied through Cloudflare adds ~10-20ms latency but gains DDoS protection, which is worth it.

At Phase 4+, Cloudflare Pro ($20/mo) adds:

Better DDoS mitigation
WAF rules
Cache analytics

11. Backups & Disaster Recovery

Backup Strategy

Data	Method	Frequency	Retention	Storage
PostgreSQL	`pg_dump` (Phase 1–2), WAL archiving (Phase 3+)	Daily full + continuous WAL	30 days full, 7 days WAL	Backblaze B2
Redis	RDB snapshots	Every 6 hours	7 days	Local + B2
Run logs	Stored in PostgreSQL (JSONB)	Covered by DB backup	Same as DB	Same as DB
Application code	Git repository	Every push	Infinite	GitHub
Docker images	Container registry	Every deploy	30 versions	GitHub Container Registry (free for public, 500MB free for private)
Secrets/config	Encrypted in repo or secrets manager	Every change	Infinite	Git (encrypted) or Doppler/Infisical

Disaster Recovery

Scenario	Recovery
App server dies	Deploy new VPS from Docker image (10 min). Stateless — no data loss.
Database server dies	Provision new VPS, restore from latest B2 backup + WAL replay. RPO: minutes. RTO: 30–60 min.
Redis dies	Provision new VPS, restore from RDB snapshot. Sessions regenerate on next login. Leaderboards rebuild from DB. RPO: 6 hours. RTO: 15 min.
Complete datacenter failure	Restore all from B2 backups to a different Hetzner datacenter (or different provider entirely). RTO: 2-4 hours.
Corrupt database (bad migration, bug)	Point-in-time recovery from WAL archive. Restore to any moment before corruption.

Backup Testing

Monthly: restore the latest PostgreSQL backup to a temporary VPS and run a basic health check query. Automate this in CI. Untested backups are not backups.

12. Local Development

Docker Compose (dev)

# docker-compose.yml (development)
services:
  postgres:
    image: postgres:16-alpine
    ports: ["5432:5432"]
    environment:
      POSTGRES_DB: delve
      POSTGRES_USER: delve
      POSTGRES_PASSWORD: devpassword
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: valkey/valkey:7-alpine
    ports: ["6379:6379"]

  mailpit:
    image: axllent/mailpit
    ports: ["8025:8025", "1025:1025"]
    # Catches all outgoing email in dev — accessible at localhost:8025

volumes:
  pgdata:

The Rust API server, workers, and SvelteKit dev server run directly on the host (not in containers) for fast iteration. They connect to the containerized dependencies.

# Terminal 1: Start dependencies
docker compose up -d

# Terminal 2: API server (with cargo-watch for auto-rebuild)
cargo watch -x 'run --bin api'

# Terminal 3: Workers (with cargo-watch)
cargo watch -x 'run --bin workers'

# Terminal 4: Client (SvelteKit dev server)
pnpm --filter client dev

Rust build times: Initial full build will take 1-3 minutes. Incremental rebuilds via cargo-watch are typically 5-15 seconds. Use cargo-chef in the Docker multi-stage build to cache dependency compilation separately from application code.

Seed Data

A seed binary (cargo run --bin seed) populates the dev database with:

Test user accounts (free, patron, admin)
Characters at various levels with gear
Active marketplace listings
Guild with members
In-progress runs, crafts, and expeditions

13. CI/CD Pipeline

GitHub Actions

On Pull Request:
  ├── cargo fmt --check (formatting)
  ├── cargo clippy (linting)
  ├── cargo test (unit + integration tests against Docker Compose services)
  ├── cargo sqlx prepare --check (verify query cache is up to date)
  ├── Client: pnpm lint + pnpm check + pnpm build
  └── Build check: cargo build --release (ensure it compiles)

On Push to main:
  ├── All PR checks
  ├── Build Docker images (delve-api, delve-workers) via cargo-chef multi-stage
  ├── Build SvelteKit SPA
  ├── Push images to GitHub Container Registry
  ├── Deploy to staging (auto)
  └── Smoke test against staging

On Git Tag (v*):
  ├── All main checks
  ├── Build + push production Docker images
  ├── Deploy to production (manual approval gate)
  ├── Build iOS (Xcode Cloud or self-hosted Mac runner)
  ├── Build Android (.aab)
  ├── Upload to TestFlight / Play Console internal track
  └── Tag container images with version

Deployment (Phase 1–4)

Simple SSH-based deployment. No need for fancy orchestration:

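#!/usr/bin/env bash
# deploy.sh (run from CI or manually)
# Abort on the first failed command so a failed deploy on one server
# does not silently continue to the next.
set -euo pipefail

# Application server: pull the new images and restart changed containers.
ssh deploy@app-server "
  cd /opt/delve &&
  docker compose pull &&
  docker compose up -d --remove-orphans
"

# Worker server: same procedure.
ssh deploy@worker-server "
  cd /opt/delve &&
  docker compose pull &&
  docker compose up -d --remove-orphans
"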

Database migrations run as a separate step before app deployment:

ssh deploy@app-server "
  cd /opt/delve &&
  docker compose run --rm api sqlx migrate run
"

14. Monitoring Stack

Phase 1–2

Uptime: UptimeRobot (free, 50 monitors), pinging the API endpoint every 5 min
Errors: Sentry free tier (5K events/mo)
Logs: docker compose logs, reviewed manually

Phase 3+: Full Stack

All self-hosted on the monitoring VPS:

Prometheus
├── Scrapes API server metrics (request rate, latency, error rate)
├── Scrapes worker metrics (queue depth, job duration, failure rate)
├── Scrapes PostgreSQL (pg_exporter: connections, query time, replication lag)
├── Scrapes Redis (redis_exporter: memory, commands/sec, keyspace)
├── Scrapes Node exporter (CPU, RAM, disk, network per VPS)
└── Retention: 30 days local

Grafana
├── Dashboards:
│   ├── Game Overview: active players, runs/hour, marketplace volume
│   ├── API Performance: request rate, p50/p95/p99 latency, error rate
│   ├── Worker Health: queue depths, processing time, failure rate
│   ├── Database: query latency, connections, replication lag
│   ├── Redis: memory usage, commands/sec, hit rate
│   └── Infrastructure: CPU, RAM, disk, network per server
└── Alerts → Discord webhook (or email)

Loki
├── Aggregates structured JSON logs from all services (via tracing + tracing-subscriber)
├── Queryable from Grafana
└── Retention: 14 days

Alertmanager
├── Routes alerts to Discord channel
├── Key alerts:
│   ├── API p95 > 500ms for 5 min
│   ├── Any worker queue > 1000 pending for 10 min
│   ├── PostgreSQL replication lag > 30s
│   ├── Any server disk > 85%
│   ├── Any server CPU > 90% sustained 10 min
│   └── Error rate > 1% for 5 min
└── Silence/snooze via Grafana UI
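
Each alert above pairs a threshold with a duration, so a single bad sample does not page anyone. In Prometheus this is normally expressed with the `for` clause of an alerting rule; the sketch below only illustrates the semantics, assuming one sample per minute and a hypothetical helper name.

```rust
/// True when the most recent `window` samples all exceed `threshold`.
/// With one sample per minute, `window = 5` models "for 5 min".
fn sustained_above(samples: &[f64], threshold: f64, window: usize) -> bool {
    if window == 0 || samples.len() < window {
        return false; // not enough history to decide
    }
    samples[samples.len() - window..]
        .iter()
        .all(|&value| value > threshold)
}
```

For the API latency alert, `sustained_above(&p95_ms, 500.0, 5)` fires only if every one of the last five minutes had a p95 above 500ms; a single dip below the threshold resets the condition.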

Application Metrics (exposed via Prometheus client)

// Key metrics (via metrics + metrics-exporter-prometheus crates):

// API server:
http_request_duration_seconds    // histogram, labeled by route + method
http_requests_total              // counter, labeled by route + status
notification_poll_duration_ms    // histogram, tracks polling endpoint performance

// Workers:
simulation_runs_resolved_total   // counter
simulation_run_duration_seconds  // histogram
job_queue_depth                  // gauge, labeled by queue name
job_processing_duration_seconds  // histogram, labeled by job type
job_failures_total               // counter, labeled by job type

// Business metrics:
runs_started_total               // counter
marketplace_transactions_total   // counter
subscriptions_active             // gauge (query from DB, cached)

15. Decision Log

Key hosting decisions and the reasoning behind them. Update this as decisions change.

Decision	Chosen	Alternatives Considered	Why
Backend language	Rust	TypeScript/Node.js, Go, Python, C#	Best performance for CPU-bound simulation engine. Strong type system. Single binary deploys (~10-20MB Docker images). No GC pauses.
API framework	Axum	Actix-web, Rocket, Warp	Tokio-native, ergonomic extractors, tower middleware ecosystem. Most active Rust web framework.
Client-server communication	REST + polling	WebSockets, SSE, long polling	Game is async — players wait minutes to hours. Polling every 30-60s is adequate. Eliminates persistent connection management entirely. Server stays stateless.
Chat	Discord (external)	In-game WebSocket chat	Community already lives on Discord. Eliminates real-time messaging system, chat storage, moderation tools, presence tracking. Massive complexity reduction.
Primary hosting provider	Hetzner Cloud	AWS, DigitalOcean, Vultr, OVH	3-5x cheaper than AWS for equivalent specs. Reliable. EU + US datacenters. ARM options for future savings.
Container orchestration (Phase 1–4)	Docker Compose	Kubernetes, Nomad, bare metal	K8s is overkill for <10 containers on <10 servers. Docker Compose is simple, well-understood, and sufficient.
Database	Self-hosted PostgreSQL	Neon, Supabase, PlanetScale, AWS RDS	Self-hosted is $9–40/mo vs. $50–200/mo managed for equivalent specs. Acceptable risk for a single operator. Move to managed at Phase 5.
CDN	Cloudflare (free)	Bunny CDN, Fastly, AWS CloudFront	Free tier is generous. DDoS protection included. Upgrade to Pro at Phase 4 ($20/mo).
Object storage	Backblaze B2	AWS S3, Hetzner Object Storage, Cloudflare R2	Cheapest. S3-compatible API. Free egress via Cloudflare bandwidth alliance.
Email	Resend	Postmark, SendGrid, SES	Good free tier (3K/mo). Simple API. Scales cheaply.
Push notifications	Firebase (FCM)	OneSignal, Pusher	Free at any scale. Direct integration with Capacitor.
Payments	Stripe (web checkout)	In-app purchase (Apple/Google)	Avoids 30% platform commission. Web checkout is compliant if not selling digital goods consumed within the app (subscriptions for server-side speed are defensible).
Monitoring	Self-hosted Prometheus/Grafana	Datadog, New Relic, Grafana Cloud	Free. Full control. Datadog would cost $100+/mo for equivalent coverage.
Error tracking	Sentry	Bugsnag, self-hosted	Best free tier. Rust + JS SDKs. Essential for client + server error visibility.
Secrets management	Environment variables (Phase 1–3), Infisical (Phase 4+)	Doppler, HashiCorp Vault, AWS Secrets Manager	Env vars are fine while it’s one person deploying. Move to a proper secrets manager when there are multiple operators.
App Store payments strategy	Web checkout via Stripe	Native in-app purchase	Apple/Google take 30% (or 15% for small business). At $3/mo, that’s $0.45–0.90 per sub lost. Web checkout keeps the full margin minus Stripe’s 2.9%+$0.30. Requires careful compliance with store policies.
