Production Docker
Development Docker and production Docker are different disciplines. Production requires security hardening, monitoring, resource governance, and a plan for failures.
Security Best Practices
Run as Non-Root
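# Create a dedicated user (Alpine/BusyBox syntax; on Debian-based images use groupadd/useradd)
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Run later instructions and the container process as that user
USER appuser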
Verify:
docker exec my-container whoami
# appuser (not root)
Use Minimal Base Images
| Image | Size | Use |
|---|---|---|
| node:20 | ~1 GB | Avoid in production |
| node:20-slim | ~200 MB | Better |
| node:20-alpine | ~130 MB | Good default |
| distroless | ~20 MB | Maximum security, no shell |
Google distroless images contain only your app and runtime — no package manager, no shell to exploit.
Scan Images for Vulnerabilities
# Docker Scout (built into Docker Desktop)
docker scout cves my-api:1.0
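# Trivy (open source); mount the Docker socket so the container can see local images
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy image my-api:1.0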
# Fail CI if critical CVEs found
trivy image --severity CRITICAL --exit-code 1 my-api:1.0
Integrate scanning into CI/CD — block deployment of images with critical vulnerabilities.
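As a rough illustration of gating on scan results, the workflow below builds the image on every pull request and runs the same Trivy command as a CI step. It is a sketch under assumptions: the workflow file name, the job name, and the my-api image name are placeholders, and the runner is assumed to have a Docker daemon (GitHub-hosted Ubuntu runners do).

```yaml
# .github/workflows/scan.yml (hypothetical file name)
name: image-scan
on:
  pull_request:
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t my-api:${{ github.sha }} .
      - name: Scan for critical CVEs
        # A non-zero exit code from Trivy fails the step, and so the job
        run: |
          docker run --rm \
            -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy image \
            --severity CRITICAL --exit-code 1 \
            my-api:${{ github.sha }}
```

Marking this job as a required check on the main branch is what actually blocks a merge, and therefore a deployment, when critical CVEs are found.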
Secrets Management
Never bake secrets into images or commit them to Compose files:
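# ❌ Bad
environment:
  DATABASE_PASSWORD: supersecret
# ✅ Good: Docker secrets (file-based secrets work in Compose and in Swarm mode)
secrets:
  db_password:
    file: ./secrets/db_password.txt   # keep this file out of Git
services:
  db:
    secrets:
      - db_password   # mounted at /run/secrets/db_password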
For Kubernetes: use Secrets and External Secrets Operator. For cloud: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager.
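Declaring a secret only mounts a file; the application still has to read it. A minimal sketch of the consuming side, assuming the official postgres image (which accepts POSTGRES_PASSWORD_FILE) and the same hypothetical secret file path used above:

```yaml
secrets:
  db_password:
    file: ./secrets/db_password.txt   # assumed path; keep it out of Git
services:
  db:
    image: postgres:16
    environment:
      # The image reads the password from this file at startup,
      # so the value never appears in the environment or in docker inspect
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
```

Images without a _FILE convention need the application itself to read /run/secrets/db_password.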
Read-Only Filesystem
services:
  web:
    read_only: true
    tmpfs:
      - /tmp
      - /app/cache
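A read-only root filesystem stops attackers from writing malware or altering binaries inside the container. Mount tmpfs only at the paths the app must write to, such as /tmp and /app/cache above.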
Resource Limits
Prevent one container from consuming all host resources:
services:
  web:
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
Plain docker run:
docker run -m 512m --cpus 0.5 my-api:1.0
Logging
Containers log to stdout/stderr — capture with a log driver:
services:
  web:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
In production, ship logs to a centralized system:
logging:
  driver: fluentd
  options:
    fluentd-address: localhost:24224
    tag: myapp.web
Alternatives: ELK stack (Elasticsearch, Logstash, Kibana), Loki, CloudWatch, Datadog.
Health Checks and Restart Policies
services:
  web:
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
| Restart policy | Behavior |
|---|---|
| no | Never restart |
| always | Always restart |
| unless-stopped | Restart unless manually stopped |
| on-failure | Restart only on non-zero exit |
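Health checks also let Compose order startup. The sketch below, assuming a postgres database and the hypothetical my-api image, holds the web service back until the database reports healthy:

```yaml
secrets:
  db_password:
    file: ./secrets/db_password.txt   # assumed path
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    restart: unless-stopped
    healthcheck:
      # pg_isready ships with the postgres image
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
  web:
    image: my-api:1.0
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy   # wait for the check above to pass
```

Note that a restart policy reacts to the container process exiting. On a standalone Docker host an unhealthy but still running container is not restarted automatically, whereas orchestrators such as Swarm and Kubernetes do replace it.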
Production Deployment Patterns
Single Server (Small Apps)
Internet → Nginx (SSL) → Docker Compose → [web, db, redis]
Suitable for: MVPs, internal tools, low-traffic sites.
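One way this layout might look as a single Compose file is sketched below. Image tags, the registry name, the Nginx config and certificate directories, and the secret file path are all assumptions for illustration; the Nginx configuration that proxies to web:3000 is not shown.

```yaml
secrets:
  db_password:
    file: ./secrets/db_password.txt
volumes:
  db-data:
services:
  nginx:
    image: nginx:1.27-alpine
    ports:
      - "80:80"
      - "443:443"          # the only ports published to the internet
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./certs:/etc/nginx/certs:ro
    depends_on:
      - web
    restart: unless-stopped
  web:
    image: myregistry.io/my-api:1.0.0
    expose:
      - "3000"             # reachable by nginx on the Compose network only
    restart: unless-stopped
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - db-data:/var/lib/postgresql/data
    restart: unless-stopped
  redis:
    image: redis:7-alpine
    restart: unless-stopped
```

Only Nginx publishes host ports; the database and Redis stay on the internal network.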
Container Orchestration (Scale)
When one server isn’t enough, use an orchestrator:
| Tool | Best For |
|---|---|
| Kubernetes (K8s) | Industry standard, complex apps, multi-cloud |
| AWS ECS/Fargate | AWS-native, less operational overhead |
| Google Cloud Run | Serverless containers, auto-scaling |
| Docker Swarm | Simpler than K8s, Docker-native |
Kubernetes deployment sketch:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: api
          image: myregistry.io/my-api:1.0.0
          ports:
            - containerPort: 3000
          resources:
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
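A Deployment on its own does not give the three replicas a stable address. A minimal Service sketch that load-balances across them, assuming the same app: my-api label and an arbitrary choice of port 80 inside the cluster:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api
spec:
  selector:
    app: my-api        # matches the pod labels in the Deployment
  ports:
    - port: 80         # port other workloads in the cluster connect to
      targetPort: 3000 # containerPort of the api container
```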
CI/CD Integration
Build and push images in CI, deploy by updating image tag:
# GitHub Actions excerpt
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    push: true
    tags: myregistry.io/my-api:${{ github.sha }}
Deploy triggers on merge to main — never build images on production servers.
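For context, the excerpt could sit in a workflow like the sketch below, which runs on every push to main. The workflow name and the two repository secret names (REGISTRY_USERNAME, REGISTRY_PASSWORD) are assumptions; the registry host is the one used in the excerpt.

```yaml
name: build-and-push
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: myregistry.io
          username: ${{ secrets.REGISTRY_USERNAME }}   # hypothetical secret names
          password: ${{ secrets.REGISTRY_PASSWORD }}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          # Tagging with the commit SHA makes every deploy traceable to a commit
          tags: myregistry.io/my-api:${{ github.sha }}
```

The deploy step then only has to point the running service at the new tag.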
Zero-Downtime Deployments
Rolling update: replace containers one at a time behind a load balancer.
Blue-green: run two identical environments; switch traffic atomically.
Canary: route 5% of traffic to new version; increase if metrics are healthy.
# Docker Swarm rolling update
docker service update --image my-api:2.0.0 my-api
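How Swarm performs that rolling update can be declared in the stack file. The values below are illustrative assumptions, not recommendations:

```yaml
services:
  my-api:
    image: my-api:2.0.0
    deploy:
      replicas: 3
      update_config:
        parallelism: 1            # replace one task at a time
        delay: 10s                # pause between batches
        order: start-first        # start the new task before stopping the old one
        monitor: 30s              # watch each new task this long for failure
        failure_action: rollback  # revert to the previous version on failure
```

These settings take effect when the file is deployed with docker stack deploy; plain docker compose up ignores update_config.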
Monitoring
Monitor containers, not just the host:
| Metric | Tool |
|---|---|
| Container CPU/memory | cAdvisor, Prometheus |
| Application metrics | Prometheus + Grafana |
| Traces | Jaeger, OpenTelemetry |
| Uptime | Uptime Kuma, Pingdom |
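# Labels for Prometheus discovery (a naming convention, not built in:
# Prometheus needs Docker service discovery and relabel rules that read them)
services:
  web:
    labels:
      - "prometheus.scrape=true"
      - "prometheus.port=3000"
      - "prometheus.path=/metrics"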
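These labels only have an effect if Prometheus is configured to read them. A sketch of a matching prometheus.yml, assuming Prometheus can reach the Docker socket and shares a network with the containers; the job name is arbitrary:

```yaml
scrape_configs:
  - job_name: docker-containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Keep only containers labeled prometheus.scrape=true
      # (dots in label names become underscores in meta labels)
      - source_labels: [__meta_docker_container_label_prometheus_scrape]
        regex: "true"
        action: keep
      # Use the labeled metrics path when one is set
      - source_labels: [__meta_docker_container_label_prometheus_path]
        regex: (.+)
        target_label: __metrics_path__
      # Scrape <container IP>:<labeled port>
      - source_labels: [__meta_docker_network_ip, __meta_docker_container_label_prometheus_port]
        regex: (.+);(.+)
        replacement: "$1:$2"
        target_label: __address__
```

Containers without the scrape label are dropped by the first rule, so adding a service to monitoring becomes a change to its Compose labels only.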
Backup and Disaster Recovery
- Database volumes — scheduled snapshots (pg_dump, mysqldump, or volume snapshots)
- Image registry — images are reproducible from Dockerfile + Git
- Configuration — store in Git (Compose files, K8s manifests)
- Test restores — a backup you haven’t restored is worthless
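As one possible shape for the database item, the Compose service below takes a pg_dump on demand. It is a sketch under assumptions: a postgres service named db, a database named mydb, the postgres user, a secret file at the path shown, and a host directory ./backups.

```yaml
secrets:
  db_password:
    file: ./secrets/db_password.txt
services:
  db-backup:
    image: postgres:16
    profiles: ["backup"]   # not started by a plain docker compose up
    secrets:
      - db_password
    volumes:
      - ./backups:/backups
    # $$ escapes $ so the container shell, not Compose, expands it
    command:
      - sh
      - "-c"
      - >-
        PGPASSWORD="$$(cat /run/secrets/db_password)"
        pg_dump -h db -U postgres -Fc
        -f "/backups/mydb-$$(date +%F).dump" mydb
```

A host cron job could run docker compose run --rm db-backup nightly. To test a restore, load a dump into a scratch database with pg_restore and run a few sanity queries against it.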
Production Checklist
- Non-root user in Dockerfile
- Specific image tags (not latest)
- Image vulnerability scanning in CI
- Secrets from vault, not environment files in Git
- Resource limits configured
- Health checks and restart policies
- Centralized logging
- Monitoring and alerting
- HTTPS termination at load balancer
- Backup strategy tested
What Comes Next
Automate the entire pipeline with CI/CD — build, test, scan, and deploy containers on every merge to main.