Development Docker and production Docker are different disciplines. Production requires security hardening, monitoring, resource governance, and a plan for failures.

Security Best Practices

Run as Non-Root

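# Create a dedicated user and group. This is BusyBox syntax as found on Alpine;
# Debian-based images need groupadd -r / useradd -r instead.
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser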

Verify:

docker exec my-container whoami
# appuser (not root)
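
The whoami check above needs a running container. As a sketch of the same check against an image before it is ever started, the function below reads the configured user from the image metadata. The name check_nonroot is hypothetical; it assumes the docker CLI is on the PATH.

```shell
# Fail when an image would start as root.
# An empty Config.User means no USER instruction was set, so the
# container process runs as root by default.
check_nonroot() {
  local image="$1" user
  user="$(docker image inspect --format '{{.Config.User}}' "$image")" || return 2
  case "$user" in
    "" | root | 0 | root:* | 0:*)
      echo "FAIL: $image runs as root" >&2
      return 1
      ;;
  esac
  echo "OK: $image runs as $user"
}
```

Called as check_nonroot my-api:1.0, it returns non-zero for a root image, so it can gate a CI step.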
  

Use Minimal Base Images

| Image | Size | Use |
|---|---|---|
| node:20 | ~1 GB | Avoid in production |
| node:20-slim | ~200 MB | Better |
| node:20-alpine | ~130 MB | Good default |
| distroless | ~20 MB (base image; the Node.js variant is larger) | Maximum security; no shell |

Google distroless images contain only your app and runtime — no package manager, no shell to exploit.

Scan Images for Vulnerabilities

# Docker Scout (built into Docker Desktop)
docker scout cves my-api:1.0

# Trivy (open source); mount the Docker socket so it can see local images
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy image my-api:1.0

# Fail CI if critical CVEs are found
trivy image --severity CRITICAL --exit-code 1 my-api:1.0
  

Integrate scanning into CI/CD — block deployment of images with critical vulnerabilities.
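
One way to enforce that gate is to make the push conditional on the scan. The sketch below reuses the trivy command shown earlier. The function name scan_then_push is hypothetical, and it assumes trivy and docker are both installed on the CI runner.

```shell
# Push an image only if the Trivy scan passes.
# trivy exits non-zero when it finds CRITICAL CVEs (because of --exit-code 1)
# and also when the scan itself fails, so either case blocks the push.
scan_then_push() {
  local image="$1"
  if ! trivy image --severity CRITICAL --exit-code 1 "$image"; then
    echo "Blocking $image: scan failed or critical CVEs found" >&2
    return 1
  fi
  docker push "$image"
}
```

Because the function returns non-zero on a blocked push, a CI job that calls it fails at that step and never reaches deployment.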

Secrets Management

Never bake secrets into images or commit them to Compose files:

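# ❌ Bad: password committed in plain text
environment:
  DATABASE_PASSWORD: supersecret

# ✅ Good: file-based secret, mounted in the container at /run/secrets/db_password
secrets:
  db_password:
    file: ./secrets/db_password.txt   # keep this file out of Git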

services:
  db:
    secrets:
      - db_password
  

On Kubernetes, use Secrets, optionally synced from an external store by the External Secrets Operator. On cloud platforms, use AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager.
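
Docker mounts each secret as a file under /run/secrets, so the application has to read it from there. For an app that only understands environment variables, an entrypoint script can bridge the gap. This is a minimal sketch: load_secret and the SECRETS_DIR override are hypothetical names introduced here for illustration and testing.

```shell
# Export the contents of a mounted secret file as an environment variable.
# Usage: load_secret DATABASE_PASSWORD db_password
load_secret() {
  local var="$1" name="$2"
  local file="${SECRETS_DIR:-/run/secrets}/$name"
  if [ ! -r "$file" ]; then
    echo "secret '$name' is not readable at $file" >&2
    return 1
  fi
  # Command substitution strips the trailing newline most editors add.
  export "$var=$(cat "$file")"
}
```

In an entrypoint this would typically be followed by exec "$@", so the real process starts with the variable set. The value then lives only in the process environment, not in the image or the Compose file.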

Read-Only Filesystem

services:
  web:
    read_only: true
    tmpfs:
      - /tmp
      - /app/cache
  

With a read-only root filesystem, an attacker who gets into the container cannot modify application files or drop binaries outside the listed tmpfs mounts.

Resource Limits

Prevent one container from consuming all host resources:

services:
  web:
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
  

Plain docker run:

docker run -m 512m --cpus 0.5 my-api:1.0
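
It is worth confirming that the limits actually reached the container. The sketch below reads them back with docker inspect. The function name show_limits is hypothetical, and it assumes the container was started with -m and --cpus as in the docker run example above.

```shell
# Print the memory and CPU limits Docker recorded for a container.
# HostConfig.Memory is in bytes and HostConfig.NanoCpus in billionths
# of a CPU; a value of 0 in either field means no limit is set.
show_limits() {
  local container="$1" values mem nano
  values="$(docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' "$container")" || return 1
  mem="${values% *}"
  nano="${values#* }"
  echo "memory_mib=$((mem / 1048576)) cpus=$(awk -v n="$nano" 'BEGIN { printf "%.2f", n / 1000000000 }')"
}
```

For the docker run example above, this should report memory_mib=512 cpus=0.50.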
  

Logging

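Containers should log to stdout/stderr, where a log driver captures the output. The default json-file driver does not rotate logs unless told to; the settings below cap each container at roughly 30 MB (3 files of 10 MB):

services: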
  web:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
  

In production, ship logs to a centralized system, for example with the fluentd driver:

logging:
  driver: fluentd
  options:
    fluentd-address: localhost:24224
    tag: myapp.web
  

Alternatives: ELK stack (Elasticsearch, Logstash, Kibana), Loki, CloudWatch, Datadog.

Health Checks and Restart Policies

services:
  web:
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
  
| Restart policy | Behavior |
|---|---|
| no | Never restart |
| always | Always restart |
| unless-stopped | Restart unless manually stopped |
| on-failure | Restart only on non-zero exit |
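
Restart policies react to the process exiting, not to the health status: on a standalone Docker Engine, an unhealthy container keeps running until something acts on it. A deploy script can poll the health status reported by the healthcheck, as in this sketch. The function name wait_healthy and its defaults (60 second timeout, 2 second poll interval) are assumptions.

```shell
# Wait until a container's healthcheck reports "healthy".
# Usage: wait_healthy CONTAINER [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_healthy() {
  local container="$1" timeout="${2:-60}" interval="${3:-2}" waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    # Empty status covers containers that are missing or have no healthcheck.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)" || status=""
    case "$status" in
      healthy) return 0 ;;
      unhealthy)
        echo "$container reported unhealthy" >&2
        return 1
        ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$container not healthy after ${timeout}s" >&2
  return 1
}
```

Running wait_healthy web after docker compose up -d makes a deploy fail fast instead of leaving a broken container in place.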

Production Deployment Patterns

Single Server (Small Apps)

  Internet → Nginx (SSL) → Docker Compose → [web, db, redis]
  

Suitable for: MVPs, internal tools, low-traffic sites.

Container Orchestration (Scale)

When one server isn’t enough, use an orchestrator:

| Tool | Best For |
|---|---|
| Kubernetes (K8s) | Industry standard, complex apps, multi-cloud |
| AWS ECS/Fargate | AWS-native, less operational overhead |
| Google Cloud Run | Serverless containers, auto-scaling |
| Docker Swarm | Simpler than K8s, Docker-native |

Kubernetes deployment sketch:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
      - name: api
        image: myregistry.io/my-api:1.0.0
        ports:
        - containerPort: 3000
        resources:
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 10
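
Applying a manifest like this is usually paired with a rollout check. The sketch below applies the file, waits for the rollout, and rolls back if it does not complete. The function name deploy_or_rollback and the file name deployment.yaml are hypothetical. Kubernetes gates a rollout on pod readiness, so in practice a readinessProbe (not part of the manifest above) is what makes this check meaningful.

```shell
# Apply a Deployment manifest and roll back if the rollout fails.
# Usage: deploy_or_rollback MANIFEST DEPLOYMENT_NAME [TIMEOUT]
deploy_or_rollback() {
  local manifest="$1" deployment="$2" timeout="${3:-120s}"
  kubectl apply -f "$manifest" || return 1
  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    return 0
  fi
  echo "rollout of $deployment failed; rolling back" >&2
  kubectl rollout undo "deployment/$deployment"
  return 1
}
```

For the sketch above, the call would be deploy_or_rollback deployment.yaml my-api.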
  

CI/CD Integration

Build and push images in CI, then deploy by updating the image tag:

# GitHub Actions excerpt
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    push: true
    tags: myregistry.io/my-api:${{ github.sha }}
  

Trigger deploys on merge to main, and never build images on production servers.

Zero-Downtime Deployments

Rolling update: replace containers one at a time behind a load balancer.

Blue-green: run two identical environments; switch traffic atomically.

Canary: route 5% of traffic to new version; increase if metrics are healthy.

# Docker Swarm rolling update
docker service update --image my-api:2.0.0 my-api
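
The command above relies on the service's existing update settings. As a sketch of a more explicit rolling update, the function below replaces one task at a time, starts each new task before stopping the old one, and rolls back automatically on failure. The function name rolling_update and the 10 second delay are assumptions; the flags are standard docker service update options.

```shell
# Rolling update for a Swarm service: one task at a time, new task
# started before the old one stops, automatic rollback on failure.
# Usage: rolling_update SERVICE IMAGE
rolling_update() {
  local service="$1" image="$2"
  docker service update \
    --image "$image" \
    --update-parallelism 1 \
    --update-delay 10s \
    --update-order start-first \
    --update-failure-action rollback \
    "$service"
}
```

The equivalent of the example above is rolling_update my-api my-api:2.0.0.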
  

Monitoring

Monitor containers, not just the host:

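| Metric | Tool |
|---|---|
| Container CPU/memory | cAdvisor, Prometheus |
| Application metrics | Prometheus + Grafana |
| Traces | Jaeger, OpenTelemetry |
| Uptime | Uptime Kuma, Pingdom |

# Labels for Prometheus Docker service discovery. The label names are a
# convention: Prometheus only acts on them through matching relabel_configs.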
services:
  web:
    labels:
      - "prometheus.scrape=true"
      - "prometheus.port=3000"
      - "prometheus.path=/metrics"
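
Before a full Prometheus setup is in place, a one-shot check with docker stats can already flag containers close to their memory limit. This is a sketch, not a replacement for proper alerting. The function name high_memory_containers and the default threshold of 80 percent are assumptions.

```shell
# List containers whose memory usage exceeds a percentage of their limit.
# Usage: high_memory_containers [THRESHOLD_PERCENT]
high_memory_containers() {
  local threshold="${1:-80}"
  docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' |
    awk -v limit="$threshold" '{
      pct = $2
      sub(/%/, "", pct)              # "91.20%" -> "91.20"
      if (pct + 0 > limit + 0) print $1, $2
    }'
}
```

Run from cron or a CI job, the output can feed a simple notification until real alerting exists.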
  

Backup and Disaster Recovery

  • Database volumes: scheduled dumps (pg_dump, mysqldump) or volume snapshots
  • Image registry: images can be rebuilt from Dockerfile + Git, provided base image versions are pinned
  • Configuration: store in Git (Compose files, K8s manifests)
  • Test restores: a backup you haven’t restored is worthless
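
A scheduled database dump can be sketched as follows, assuming a PostgreSQL container. The function name backup_postgres and the DB_USER variable are hypothetical; pg_dump runs inside the container through docker exec.

```shell
# Dump a PostgreSQL database from a running container to a gzipped file.
# Usage: backup_postgres CONTAINER DATABASE OUTPUT_DIR
# Prints the path of the finished backup on success.
backup_postgres() {
  local container="$1" database="$2" outdir="$3"
  local file="$outdir/${database}_$(date -u +%Y%m%dT%H%M%SZ).sql"
  mkdir -p "$outdir" || return 1
  # Dump to a .partial file first so a failed dump never looks like a backup.
  if ! docker exec "$container" pg_dump -U "${DB_USER:-postgres}" "$database" > "$file.partial"; then
    rm -f "$file.partial"
    echo "pg_dump of $database failed" >&2
    return 1
  fi
  mv "$file.partial" "$file" && gzip -f "$file" && echo "$file.gz"
}
```

To test a restore, load the dump into a scratch database, for example with gunzip -c on the backup file piped into docker exec -i db psql -U postgres scratch_db, where db and scratch_db are placeholder names.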

Production Checklist

  • Non-root user in Dockerfile
  • Specific image tags (not latest)
  • Image vulnerability scanning in CI
  • Secrets from vault, not environment files in Git
  • Resource limits configured
  • Health checks and restart policies
  • Centralized logging
  • Monitoring and alerting
  • HTTPS termination at load balancer
  • Backup strategy tested

What Comes Next

Automate the entire pipeline with CI/CD — build, test, scan, and deploy containers on every merge to main.