Why Log Management Matters

Logs are the primary evidence source when services fail, security incidents occur, or performance degrades. On Linux they come from the kernel, the systemd journal, applications, and the audit subsystem, often totaling gigabytes per day on busy servers.

Without rotation, logs eventually fill the disk; without centralization, forensic evidence disappears once local retention limits purge it.

Log Sources Overview

Source | Location / Tool | Content
--- | --- | ---
systemd journal | journalctl, /var/log/journal/ | Services, kernel, boot
Traditional syslog | /var/log/syslog (Debian/Ubuntu), /var/log/messages (RHEL/Fedora) | Legacy apps, kernel
Application logs | /var/log/nginx/, /var/log/myapp/ | App-specific
Authentication | /var/log/auth.log (Debian/Ubuntu), /var/log/secure (RHEL/Fedora) | SSH, sudo, PAM
auditd | /var/log/audit/audit.log | MAC denials (SELinux, AppArmor), syscalls, compliance
Kernel ring buffer | dmesg, journalctl -k | Hardware, drivers, OOM
  # Quick survey
sudo ls -lah /var/log/
journalctl --disk-usage
sudo du -sh /var/log/* | sort -h
  

systemd Journal (journald)

journald is the primary log collector on modern systemd-based distributions:

  # All logs; -e jumps to the newest entries at the end of the pager
journalctl -e
journalctl -r                    # newest first (reverse order)

# Since boot
journalctl -b
journalctl -b -1                 # previous boot (requires persistent storage)

# By unit
journalctl -u nginx.service
journalctl -u nginx -f           # follow live
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx -p err       # priority error and above

# By priority
journalctl -p warning -b

# Kernel only
journalctl -k
journalctl -k -f

# Structured output for parsing
journalctl -u myapp -o json | jq -r '.MESSAGE'   # -r prints raw strings
journalctl -u myapp -o cat --no-pager
  
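The -p filter is easy to misread: syslog priorities run from 0 (emerg) to 7 (debug), so "err and above" means err plus every numerically lower, more severe level. The hypothetical helper below only spells out which names a given -p value selects; it does not call journalctl.

```shell
# Hypothetical helper: print the priority names that "journalctl -p NAME"
# selects. Names are listed from 0 (emerg) to 7 (debug); the filter keeps
# NAME and every name before it in this list (more severe).
priorities_at_or_above() {
  local selected="" name
  for name in emerg alert crit err warning notice info debug; do
    selected="${selected:+$selected }$name"
    if [ "$name" = "$1" ]; then
      echo "$selected"
      return 0
    fi
  done
  echo "unknown priority: $1" >&2
  return 1
}

priorities_at_or_above err      # emerg alert crit err
```

journalctl also accepts the numeric form, so journalctl -p 3 selects the same entries as journalctl -p err.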

journald Configuration

/etc/systemd/journald.conf:

  [Journal]
Storage=persistent
SystemMaxUse=1G
SystemMaxFileSize=100M
MaxRetentionSec=30day
ForwardToSyslog=yes
Compress=yes
  

Apply: sudo systemctl restart systemd-journald
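The same settings can instead live in a drop-in file under /etc/systemd/journald.conf.d/, which journald reads after the main file, so the packaged journald.conf stays untouched. The sketch below writes such a drop-in; the function name and the file name 90-retention.conf are arbitrary choices, and the target directory is a parameter so the output can be inspected in a scratch directory before running it as root.

```shell
# Sketch (hypothetical helper): write journald retention settings as a
# drop-in. The default target needs root; pass another directory to try it.
write_journald_dropin() {
  local dir="${1:-/etc/systemd/journald.conf.d}"
  mkdir -p "$dir" || return 1
  # Any *.conf file in this directory is read after journald.conf.
  cat > "$dir/90-retention.conf" <<'EOF'
[Journal]
Storage=persistent
SystemMaxUse=1G
SystemMaxFileSize=100M
MaxRetentionSec=30day
EOF
}
```

Run from a root shell with no argument, it writes /etc/systemd/journald.conf.d/90-retention.conf; restart systemd-journald as above, and systemd-analyze cat-config systemd/journald.conf shows the main file followed by any drop-ins.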

Vacuum old logs:

  sudo journalctl --vacuum-size=500M
sudo journalctl --vacuum-time=14d
  

rsyslog and Traditional Syslog

Many applications still log through the syslog protocol:

  sudo apt install rsyslog
sudo systemctl enable --now rsyslog

# /etc/rsyslog.d/50-myapp.conf
# if $programname == 'myapp' then {
#     action(type="omfile" file="/var/log/myapp/app.log")
#     stop
# }

sudo rsyslogd -N1                # validate the configuration first
sudo systemctl restart rsyslog
logger -t myapp "Test message from logger"
  

Remote shipping (central log server):

  # /etc/rsyslog.d/90-forward.conf
*.* @@logserver.example.com:514
# @@ = TCP, @ = UDP
  
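When TCP forwarding (@@) seems to deliver nothing, a useful first check is whether the collector accepts connections at all. This sketch uses bash's /dev/tcp pseudo-device (bash must be installed) and coreutils timeout; the function name is hypothetical, and it cannot test UDP (@) forwarding, which is connectionless.

```shell
# Sketch (hypothetical helper): succeed only if HOST:PORT accepts a TCP
# connection within SECS seconds. A bash child opens the connection via
# /dev/tcp; it is closed again when that child exits.
check_tcp() {
  local host="$1" port="$2" secs="${3:-3}"
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "OK: $host:$port accepts TCP connections"
  else
    echo "FAIL: no TCP connection to $host:$port within ${secs}s" >&2
    return 1
  fi
}
```

For example, check_tcp logserver.example.com 514. If the port is reachable but messages still do not arrive, rsyslog's own error messages (journalctl -u rsyslog) are the next place to look.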

logrotate — Prevent Disk Exhaustion

/etc/logrotate.d/nginx example:

  /var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        invoke-rc.d nginx rotate >/dev/null 2>&1 || true
    endscript
}
  
  # Test config without rotating
sudo logrotate -d /etc/logrotate.conf

# Force rotation
sudo logrotate -f /etc/logrotate.d/nginx

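# Status: last rotation time recorded for each log
# (Debian/Ubuntu path; RHEL/Fedora use /var/lib/logrotate/logrotate.status)
cat /var/lib/logrotate/status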
  
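To see how rotate 14, compress and delaycompress in the nginx stanza above interact, here is a toy model of one rotation cycle. It is not how logrotate is implemented: the function name is hypothetical, it assumes a keep count of at least 2, and it ignores dateext, create modes and postrotate scripts.

```shell
# Toy model (not logrotate itself) of one rotation of LOG under
# "rotate KEEP", "compress" and "delaycompress". Assumes KEEP >= 2.
rotate_once() {
  local log="$1" keep="$2" i
  # The oldest archive falls off the end.
  rm -f "$log.$keep.gz"
  # Shift compressed archives up by one: .N.gz becomes .(N+1).gz
  i=$((keep - 1))
  while [ "$i" -ge 2 ]; do
    if [ -e "$log.$i.gz" ]; then mv "$log.$i.gz" "$log.$((i + 1)).gz"; fi
    i=$((i - 1))
  done
  # delaycompress: the previous cycle's .1 is compressed only now, as .2.gz
  if [ -e "$log.1" ]; then
    gzip -c "$log.1" > "$log.2.gz" && rm -f "$log.1"
  fi
  # The live log becomes .1 (uncompressed) and an empty file takes its place.
  if [ -e "$log" ]; then mv "$log" "$log.1"; fi
  : > "$log"
}
```

With rotate 14, repeated daily runs leave access.log.1 plus access.log.2.gz through access.log.14.gz. delaycompress keeps .1 uncompressed for one cycle because a daemon may keep writing to it until postrotate tells it to reopen its logs.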

Distribution defaults cover /var/log/syslog, /var/log/auth.log, and the logs of most packages, each of which ships its own file in /etc/logrotate.d/. Verify that custom application logs have a stanza of their own.
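A quick way to spot an uncovered directory is a text search of the logrotate config directory. This hypothetical helper is deliberately rough: it only looks for the directory path as text, so a glob such as /var/log/*/*.log, an include line, or a rule in /etc/logrotate.conf itself will not be recognized.

```shell
# Rough check (hypothetical helper): does any file in a logrotate config
# directory mention a path under LOGDIR? Text search only; globs, include
# lines and /etc/logrotate.conf are not evaluated.
has_logrotate_stanza() {
  local logdir="${1%/}" confdir="${2:-/etc/logrotate.d}"
  grep -rqF -- "$logdir/" "$confdir"
}
```

For example: has_logrotate_stanza /var/log/myapp || echo "no logrotate stanza mentions /var/log/myapp/"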

Searching and Analysis

  # journalctl grep
journalctl -u ssh --grep "Failed password"    # unit is sshd on RHEL/Fedora
journalctl --grep "error" --since today -p err

# Traditional logs
grep -i error /var/log/nginx/error.log
zgrep "error" /var/log/nginx/error.log.*.gz
tail -f /var/log/syslog

# Time-bounded search
journalctl --since "2026-06-13 08:00" --until "2026-06-13 09:00" -u myapp
  
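The "Failed password" search is often the start of a brute-force check. This hypothetical helper reads plain sshd messages on stdin (for example from journalctl -o cat) and counts them per source address, most frequent first; it assumes OpenSSH's usual "Failed password for USER from ADDR port N" wording.

```shell
# Sketch (hypothetical helper): count sshd "Failed password" lines per
# source address from stdin, most frequent first.
failed_ssh_by_ip() {
  awk '/Failed password/ {
         # Scan from the end so a username of "from" cannot confuse it.
         for (i = NF - 1; i >= 1; i--)
           if ($i == "from") { print $(i + 1); break }
       }' | sort | uniq -c | sort -nr
}
```

For example: journalctl -u ssh --grep "Failed password" --since today -o cat | failed_ssh_by_ip | head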

For high-volume analysis, ship logs to a centralized tool rather than running grep on production disks.

Centralized Log Shipping

Forward logs to Loki, ELK, Splunk, or CloudWatch:

  # Promtail (Loki agent): /etc/promtail/config.yml snippet
# Note: Grafana has deprecated Promtail in favor of Grafana Alloy
# scrape_configs:
#   - job_name: journal
#     journal:
#       max_age: 12h
#       labels:
#         job: systemd-journal

# Filebeat (Elasticsearch); install it from Elastic's APT repository
sudo apt install filebeat
# /etc/filebeat/filebeat.yml: define inputs for /var/log/*.log
sudo systemctl enable --now filebeat
  

Benefits: cross-host correlation, retention beyond local disk, tamper resistance when logs leave the host immediately.

Structured Logging Best Practices

Applications should emit structured JSON, written to stdout/stderr when running in containers or as systemd services:

  {"timestamp":"2026-06-13T08:00:00Z","level":"error","msg":"DB connection failed","host":"web01","request_id":"abc123"}
  

Structured fields enable filtering in Loki/ELK without fragile regex.
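A shell script or service wrapper can emit records of the same shape. This is a minimal sketch in POSIX sh: the function names are hypothetical, timestamps are UTC, and json_escape handles only backslashes, double quotes and newlines, so other control characters (tabs included) would still yield invalid JSON. Real applications are better served by their language's JSON logging library.

```shell
# Escape backslashes and double quotes (sed), then join lines with a
# literal \n (awk). Other control characters are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Hypothetical helper: emit one JSON log line on stdout with a UTC
# timestamp, level, msg, host and an optional request_id.
log_json() {
  local level="$1" msg="$2" rid="${3:-}" ts
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  printf '{"timestamp":"%s","level":"%s","msg":"%s","host":"%s"' \
    "$ts" "$(json_escape "$level")" "$(json_escape "$msg")" \
    "$(json_escape "${HOSTNAME:-$(uname -n)}")"
  if [ -n "$rid" ]; then
    printf ',"request_id":"%s"' "$(json_escape "$rid")"
  fi
  printf '}\n'
}

log_json error "DB connection failed" abc123
```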

auditd for Compliance

  sudo apt install auditd
sudo systemctl enable --now auditd

# Watch writes and attribute changes (-p wa); runtime only, lost at reboot
sudo auditctl -w /etc/passwd -p wa -k passwd_changes
sudo auditctl -l

# Search audit log
sudo ausearch -k passwd_changes -ts recent   # recent = last 10 minutes
sudo aureport -f                          # file reports
  
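Rules added with auditctl, as above, live only in the running kernel and disappear at reboot. On distributions whose auditd service loads rules with augenrules, persistent rules belong in files under /etc/audit/rules.d/. The sketch writes the /etc/passwd watch there, plus an /etc/shadow watch like the one in the production scenario; the function name, the file name 50-identity.rules and the shadow_changes key are illustrative choices.

```shell
# Sketch (hypothetical helper): persist audit watch rules in a rules.d
# file. The default target needs root; pass another directory to try it.
write_audit_rules() {
  local dir="${1:-/etc/audit/rules.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/50-identity.rules" <<'EOF'
-w /etc/passwd -p wa -k passwd_changes
-w /etc/shadow -p wa -k shadow_changes
EOF
}
```

After running it as root, sudo augenrules --load merges and loads the rules, and sudo auditctl -l should list both watches.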

auditd writes its own file, /var/log/audit/audit.log, outside the journal; include that file in central shipping to support PCI DSS and SOC 2 audits.

Best Practices

Practice | Reason
--- | ---
Centralize logs before local retention expires | Attackers delete local logs
Set journal and logrotate limits | Prevents disk-full outages
Use UTC timestamps | Simplifies cross-region correlation
Include request/correlation IDs | Traces requests across services
Separate audit/security logs | Allows stricter integrity and retention controls

Common Mistakes

Mistake | Consequence
--- | ---
No logrotate stanza for custom apps | Disk fills; services crash
DEBUG logging in production | I/O overhead, PII leakage, cost
Logging secrets/passwords | Compliance violation, credential exposure
Only local logs | No visibility after compromise or disk loss

Troubleshooting

Disk full from logs:

  df -h /var
sudo du -sh /var/log/* | sort -hr | head
sudo lsof +L1 | grep /var/log     # deleted logs still held open by a process
journalctl --disk-usage
sudo journalctl --vacuum-size=200M
sudo logrotate -f /etc/logrotate.conf
  
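du -sh /var/log/* reports per-directory totals, which can hide a single runaway file inside a large directory. This hypothetical helper lists individual files instead, including rotated archives in subdirectories; it assumes GNU find (standard on Linux) and should run from a root shell so it can read every subdirectory.

```shell
# Hypothetical helper: list the N largest regular files under DIR
# (default /var/log), largest first. Sizes are apparent sizes in bytes
# from GNU find's %s, not allocated disk blocks.
largest_files() {
  local dir="${1:-/var/log}" n="${2:-10}"
  find "$dir" -type f -printf '%s\t%p\n' 2>/dev/null | sort -nr | head -n "$n"
}
```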

Missing logs for service:

  systemctl status myapp
journalctl -u myapp -n 20
# Output destination (journal vs file) is set by StandardOutput=/StandardError=
systemctl show myapp -p StandardOutput -p StandardError
  

logrotate failing silently:

  journalctl -u logrotate.service -n 20   # systemd timer-based setups
grep logrotate /var/log/syslog           # cron-based setups
sudo logrotate -d /etc/logrotate.d/myapp
  
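Because logrotate normally runs unattended, a stale entry in its state file is often the first sign that rotation stopped. On current logrotate versions each line of that file has the form "/path/to/log" YYYY-M-D-H:M:S. This hypothetical helper prints the timestamp recorded for one log and fails if the log has no entry; paths containing spaces are not handled.

```shell
# Hypothetical helper: print the timestamp logrotate's state file holds
# for LOG (Debian/Ubuntu default path; pass another file as $2).
# Exits 1 if LOG has no entry. Paths containing spaces are not handled.
last_rotated() {
  local log="$1" status="${2:-/var/lib/logrotate/status}"
  awk -v want="\"$log\"" '$1 == want { print $2; found = 1 } END { exit !found }' "$status"
}
```

For example: last_rotated /var/log/myapp/app.log || echo "logrotate has no state for this log"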

Production Scenario

A payment platform subject to PCI DSS handles logging as follows:

  1. The application logs JSON to stdout; Docker's journald logging driver writes it to the journal
  2. Promtail ships journal and nginx logs to Loki within 30 seconds
  3. Retention: 90 days hot in Loki, 7 years cold in S3 (compliance)
  4. Alerts: the Loki ruler fires on a spike in the level=error rate and on the count of "Failed password" events
  5. auditd rules track /etc/shadow and sudo usage; those logs ship to a separate immutable bucket
  6. logrotate keeps 7 days of local copies as a buffer in case the shipper fails

During incident response, engineers query Loki by request_id across 50 hosts at once; grepping each host locally would take hours.

Effective log management is infrastructure — rotation prevents outages, centralization enables investigation, and structured logs turn noise into actionable signals.