Scaling Node.js Applications
Node.js runs on a single thread for JavaScript execution. Scaling requires using multiple cores, distributing load across processes and machines, and eliminating I/O bottlenecks. This guide covers patterns from single-server clustering to multi-region deployments.
Understanding the Event Loop Bottleneck
Request → Event Loop → async I/O (non-blocking)
              ↓
     CPU work (blocking!)
CPU-intensive tasks (parsing huge JSON payloads, image resizing, synchronous crypto) block the event loop, so every other request on that process waits until they finish. Solutions:
- Worker threads for CPU-bound work
- Separate microservices for heavy computation
- Horizontal scaling — more instances behind a load balancer
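As a rough illustration of the first option, the sketch below moves a CPU-bound loop onto a worker thread so the main event loop stays free. The helper name `sumInWorker` and the inline worker source are hypothetical; a real service would more likely load the worker from its own file and reuse a pool of workers.

```typescript
import { Worker } from 'node:worker_threads';

// Hypothetical CPU-bound job, kept inline so the sketch is self-contained.
// It runs inside the worker thread, not on the main event loop.
const workerSource = `
const { parentPort, workerData } = require('node:worker_threads');
let sum = 0;
for (let i = 1; i <= workerData.n; i++) sum += i;
parentPort.postMessage(sum);
`;

// Runs the job in a fresh worker and resolves with its result.
function sumInWorker(n: number): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

sumInWorker(1_000_000).then((total) => console.log(`sum = ${total}`));
```

Spawning a worker per request has its own cost, so this pattern pays off mainly for jobs that would otherwise block the loop for tens of milliseconds or more.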
Monitor event loop lag with monitorEventLoopDelay from node:perf_hooks, or with the event loop lag metrics that prom-client's collectDefaultMetrics exposes.
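A minimal sketch of lag monitoring with the built-in histogram from `node:perf_hooks` follows. The helper `readLoopLag`, the 20 ms resolution, the 10-second reporting interval, and the 100 ms warning threshold are illustrative assumptions, not recommendations from a specific source.

```typescript
import { monitorEventLoopDelay } from 'node:perf_hooks';

// Sample the loop with a 20 ms timer; the histogram records nanoseconds.
const lagHistogram = monitorEventLoopDelay({ resolution: 20 });
lagHistogram.enable();

// Returns the statistics for the current window in milliseconds,
// then resets the histogram so the next call covers a fresh window.
function readLoopLag(): { meanMs: number; p99Ms: number; maxMs: number } {
  const stats = {
    meanMs: lagHistogram.mean / 1e6,
    p99Ms: lagHistogram.percentile(99) / 1e6,
    maxMs: lagHistogram.max / 1e6,
  };
  lagHistogram.reset();
  return stats;
}

// Report every 10 s; unref() keeps this timer from holding the process open.
const lagReporter = setInterval(() => {
  const { p99Ms } = readLoopLag();
  if (p99Ms > 100) console.warn(`event loop lag p99=${p99Ms.toFixed(1)}ms`);
}, 10_000);
lagReporter.unref();
```

An idle loop typically reports values close to the sampling resolution rather than zero, so thresholds are best set relative to a baseline measured on the service itself.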
Cluster Module (Multi-Core)
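import cluster from 'node:cluster';
import os from 'node:os';
import process from 'node:process';
if (cluster.isPrimary) {
  // One worker per logical CPU core
  const numCPUs = os.cpus().length;
  console.log(`Primary ${process.pid} spawning ${numCPUs} workers`);
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    // Do not restart workers that were stopped on purpose (disconnect or kill)
    if (worker.exitedAfterDisconnect) return;
    console.log(`Worker ${worker.process.pid} died, restarting`);
    cluster.fork();
  });
} else {
  // Workers share the listening port; the primary distributes connections
  await import('./server.js');
}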
Each worker is a separate process with its own memory, so nothing held in memory is shared between workers. PM2 automates this:
pm2 start dist/server.js -i max --name api
pm2 startup && pm2 save
Load Balancing
        Nginx / ALB
       /     |     \
  Node-1  Node-2  Node-3
       \     |     /
           Redis
         PostgreSQL
Nginx upstream
upstream node_api {
  least_conn;
  server 10.0.1.10:3000;
  server 10.0.1.11:3000;
  server 10.0.1.12:3000;
}
server {
  location / {
    proxy_pass http://node_api;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header X-Real-IP $remote_addr;
  }
}
Use least_conn for long-lived connections. For uniform short requests, round-robin works well; it is nginx's default and has no directive of its own, so simply omit least_conn.
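To make the difference between the two strategies concrete, here is a simplified sketch of how each one picks a backend. The `Backend` type, the function names, and the connection counts are hypothetical, and nginx's real implementation also accounts for server weights and failures.

```typescript
// A backend and its number of currently open connections (illustrative data).
interface Backend {
  host: string;
  active: number;
}

// Round-robin: rotate through the backends regardless of their current load.
function makeRoundRobin(backends: Backend[]): () => Backend {
  let next = 0;
  return () => backends[next++ % backends.length];
}

// Least-connections: pick the backend with the fewest open connections.
function pickLeastConn(backends: Backend[]): Backend {
  return backends.reduce((best, b) => (b.active < best.active ? b : best));
}

const demoBackends: Backend[] = [
  { host: '10.0.1.10', active: 12 }, // busy with long-lived WebSocket clients
  { host: '10.0.1.11', active: 3 },
  { host: '10.0.1.12', active: 1 },
];

const roundRobin = makeRoundRobin(demoBackends);
console.log(roundRobin().host); // 10.0.1.10, even though it is the busiest
console.log(pickLeastConn(demoBackends).host); // 10.0.1.12
```

With long-lived connections, round-robin keeps sending new clients to a node that is already holding many sockets, which is the situation least_conn is meant to avoid.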
Stateless Application Design
Each instance must handle any request:
| Stateful (avoid) | Stateless (prefer) |
|---|---|
| In-memory sessions | Redis session store |
| Local file uploads | S3 / object storage |
| In-process caches only | Shared Redis cache |
| WebSocket on one node | Redis adapter for Socket.IO |
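import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';
const pub = createClient({ url: process.env.REDIS_URL });
const sub = pub.duplicate();
// node-redis v4+ clients must be connected before the adapter can use them
await Promise.all([pub.connect(), sub.connect()]);
io.adapter(createAdapter(pub, sub));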
Connection Pooling
Database connections are expensive. Limit per instance:
import { Pool } from 'pg';
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // max connections per instance
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 5_000,
});
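Total connections = instances × pool.max, where every cluster worker or pod counts as an instance because each process opens its own pool. Stay under the database's max_connections (on RDS the default is derived from instance memory, roughly 100–500 on smaller instance classes). Use PgBouncer for connection multiplexing.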
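The arithmetic can be checked with a small sketch like the one below. The function names, the database limit of 200, and the 10 connections reserved for migrations and admin tools are illustrative assumptions.

```typescript
// Connections the whole fleet can open at once.
function totalConnections(instances: number, poolMax: number): number {
  return instances * poolMax;
}

// Largest per-instance pool size that keeps the fleet under the database limit,
// leaving `reserved` connections for migrations, admin tools, and monitoring.
function maxPoolSize(dbMaxConnections: number, reserved: number, maxInstances: number): number {
  return Math.floor((dbMaxConnections - reserved) / maxInstances);
}

// 20 instances with max: 20 each can open 400 connections in total.
console.log(totalConnections(20, 20)); // 400

// With an assumed limit of 200 and 10 reserved, each of 20 instances gets 9.
console.log(maxPoolSize(200, 10, 20)); // 9
```

Size the pool against the maximum instance count the autoscaler can reach, not the current one, since connections are opened as new instances start.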
Caching Layers
Client → CDN → API Gateway cache → Redis → Database
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
async function getUser(id: string) {
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);
  const user = await db.user.findUnique({ where: { id } });
  await redis.setex(`user:${id}`, 300, JSON.stringify(user));
  return user;
}
Invalidate on write:
await db.user.update({ where: { id }, data });
await redis.del(`user:${id}`);
Rate Limiting at Scale
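In-memory rate limiters fail with multiple instances: each process keeps its own counters, so a client can get up to N times the intended limit across N instances. Use Redis: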
import { RateLimiterRedis } from 'rate-limiter-flexible';
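const limiter = new RateLimiterRedis({
  storeClient: redis,
  keyPrefix: 'rl',
  points: 100, // requests allowed...
  duration: 60, // ...per 60-second window
});
app.use(async (req, res, next) => {
  try {
    // Behind a proxy, req.ip is the client address only if Express 'trust proxy' is set
    await limiter.consume(req.ip);
    next();
  } catch (err) {
    // Store failures reject with an Error; exceeded limits reject with a RateLimiterRes
    if (err instanceof Error) return next(err);
    res.status(429).json({ error: 'Too many requests' });
  }
});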
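To see why per-process counters break down behind a load balancer, consider this simplified simulation. `InMemoryLimiter` is a hypothetical fixed-window counter written for illustration (window expiry is omitted), not part of rate-limiter-flexible.

```typescript
// Hypothetical in-process limiter: counts requests per key in local memory.
class InMemoryLimiter {
  private counts = new Map<string, number>();
  private points: number;

  constructor(points: number) {
    this.points = points;
  }

  // Returns true while the key is still within its allowance.
  consume(key: string): boolean {
    const used = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, used);
    return used <= this.points;
  }
}

// Two app instances behind a load balancer, each with its own counters.
const appInstances = [new InMemoryLimiter(100), new InMemoryLimiter(100)];

let allowed = 0;
for (let i = 0; i < 400; i++) {
  // The balancer alternates requests from one client between the instances.
  if (appInstances[i % 2].consume('203.0.113.7')) allowed++;
}

console.log(allowed); // 200: twice the intended limit of 100
```

A shared store removes the problem because every instance increments the same counter for a given key.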
Auto-Scaling
Kubernetes Horizontal Pod Autoscaler:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Scale on CPU, memory, or custom metrics (request rate, queue depth). Set minReplicas ≥ 2 for availability.
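The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below applies that formula with the minReplicas and maxReplicas bounds from the manifest; the function name is hypothetical, and the real controller additionally applies a tolerance band and stabilization windows that are omitted here.

```typescript
// Simplified HPA decision: scale proportionally to how far the observed
// metric is from its target, then clamp to the configured bounds.
function desiredReplicas(
  currentReplicas: number,
  currentUtilization: number,
  targetUtilization: number,
  minReplicas: number,
  maxReplicas: number,
): number {
  const raw = Math.ceil(currentReplicas * (currentUtilization / targetUtilization));
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

// 4 pods averaging 90% CPU against a 70% target: ceil(4 × 90 / 70) = 6.
console.log(desiredReplicas(4, 90, 70, 2, 20)); // 6

// 10 pods at 200% would need 29, but maxReplicas caps the result.
console.log(desiredReplicas(10, 200, 70, 2, 20)); // 20
```

Utilization is measured against the pod's CPU request, so the target only behaves as intended when the Deployment sets realistic resource requests.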
Health Checks and Graceful Shutdown
app.get('/health', async (req, res) => {
  try {
    await pool.query('SELECT 1');
    await redis.ping();
    res.json({ status: 'ok' });
  } catch {
    res.status(503).json({ status: 'degraded' });
  }
});
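process.on('SIGTERM', async () => {
  console.log('Shutting down gracefully');
  // Stop accepting connections and wait for in-flight requests to finish
  await new Promise((resolve) => server.close(resolve));
  // Only then release the resources those requests depended on
  await pool.end();
  await redis.quit();
  process.exit(0);
});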
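Kubernetes sends SIGTERM before removing pods, then SIGKILL once terminationGracePeriodSeconds (30 seconds by default) has elapsed, so finish in-flight requests within that window before exiting.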
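Because the process is killed if draining takes too long, it can help to put an explicit deadline on the drain. The following is a minimal sketch using only `node:http`; the helper name `closeWithDeadline` and the 25-second figure in the usage comment are assumptions chosen to sit under the default 30-second grace period.

```typescript
import * as http from 'node:http';

// Hypothetical helper: resolves once the server has stopped listening and all
// in-flight requests have completed, or rejects if that exceeds timeoutMs.
function closeWithDeadline(server: http.Server, timeoutMs: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`shutdown exceeded ${timeoutMs} ms`)),
      timeoutMs,
    );
    server.close((err) => {
      clearTimeout(timer);
      if (err) reject(err);
      else resolve();
    });
    // Idle keep-alive sockets would otherwise hold close() open (Node 18.2+).
    server.closeIdleConnections();
  });
}

// Possible usage inside a SIGTERM handler:
//   closeWithDeadline(server, 25_000)
//     .then(() => process.exit(0), () => process.exit(1));
```

Exiting with a non-zero code when the deadline is missed makes slow drains visible in pod status and logs instead of ending silently in a SIGKILL.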
Capacity Planning
Estimate required instances:
Required RPS = peak traffic × safety factor (1.5–2×)
Per-instance RPS = load test result (e.g., 500 RPS at p95 < 200ms)
Instances = ceil(Required RPS / Per-instance RPS)
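A worked example of the estimate, as a small sketch. The function name and the traffic figures (3,000 and 2,600 RPS peaks) are made up for illustration; the 500 RPS per-instance figure reuses the example above.

```typescript
// Instances needed to serve peak traffic with headroom, rounded up because
// a fractional instance cannot be deployed.
function requiredInstances(peakRps: number, safetyFactor: number, perInstanceRps: number): number {
  const requiredRps = peakRps * safetyFactor;
  return Math.ceil(requiredRps / perInstanceRps);
}

// 3,000 RPS peak × 2 = 6,000 RPS required; 6,000 / 500 = 12 instances.
console.log(requiredInstances(3000, 2, 500)); // 12

// 2,600 RPS peak × 1.5 = 3,900 RPS required; 3,900 / 500 = 7.8, rounded up to 8.
console.log(requiredInstances(2600, 1.5, 500)); // 8
```

The result is only as good as the per-instance figure, so measure it against a realistic mix of endpoints and with production-like data volumes.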
Load test with k6:
import http from 'k6/http';
import { check } from 'k6';
export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 500 },
    { duration: '2m', target: 0 },
  ],
};
export default function () {
  const res = http.get('https://api.example.com/users');
  check(res, { 'status is 200': (r) => r.status === 200 });
}
Observability at Scale
Centralize logs (structured JSON), metrics (Prometheus), and traces (OpenTelemetry):
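import { trace, SpanStatusCode } from '@opentelemetry/api';
const tracer = trace.getTracer('api');
async function getUserTraced(id) {
  // startActiveSpan makes the span current, so nested spans attach to it
  return tracer.startActiveSpan('getUser', async (span) => {
    try {
      return await fetchUser(id);
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}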
Alert on: error rate > 1%, p95 latency doubling, event loop lag > 100ms.
Scaling Checklist
- Application stateless; sessions in Redis
- Cluster mode or multiple K8s replicas
- Load balancer with health checks
- DB connection pooling with total limit calculated
- Redis for cache, rate limits, pub/sub
- CDN for static assets
- Graceful shutdown handling
- Load tested at 2× expected peak
- Auto-scaling policies configured
Scaling Node.js is less about the runtime and more about architecture: stateless services, shared stores, and measured capacity drive reliable growth.