What Is a Process?

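A process is a running instance of a program. Each process has a PID (process ID), a parent PID (PPID), an owning user, a state, and resource counters. The kernel schedules processes onto CPU cores and tracks their resource usage; cgroups group processes so that CPU, memory, and I/O can be accounted for and limited per group.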

  ps aux                         # all processes, BSD style
  ps -ef                         # standard (UNIX) style with PPID
  ps -eo pid,ppid,user,%cpu,%mem,cmd --sort=-%cpu | head

  pstree -p                      # tree with PIDs
  pstree -p 1                    # tree from init (PID 1); pgrep -x systemd can also match per-user managers
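
The attributes listed above can be read back for a process you start yourself. This is a minimal sketch, assuming a Linux system with procps `ps`; the background `sleep` is only a stand-in for a real workload.

```shell
# Start a throwaway child process (stand-in for real work).
sleep 30 &
child=$!

# Query its PID, parent PID, owner and state; the trailing "=" suppresses headers.
child_info=$(ps -o pid=,ppid=,user=,stat= -p "$child")
child_ppid=$(ps -o ppid= -p "$child" | tr -d ' ')
echo "pid ppid user stat: $child_info"

# Clean up the stand-in process.
kill "$child"
```

In a plain script (no subshell), the PPID printed is the PID of the shell that launched the job, which is also available as `$$`.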
  

Interactive Monitoring

  top                            # q to quit; M = sort by memory
  htop                           # enhanced (install separately)
  top -H -p "$(pgrep -d, nginx)" # threads of nginx; top -p expects comma-separated PIDs

  # Batch mode for scripts
  top -b -n 1 | head -20
  

Inside top: P sorts by CPU, M sorts by memory, k kills a process, and 1 toggles the per-CPU view.

Finding Processes

  pgrep -a nginx                 # PIDs and command lines
  pgrep -u www-data              # processes owned by user
  pidof nginx

  ps aux | grep '[n]ginx'        # bracket trick avoids self-match

  # Process listening on port
  sudo ss -tlnp | grep :443
  sudo lsof -i :8080
  sudo fuser -v 443/tcp
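
The bracket trick works because `[n]ginx` is a regular expression that matches the text `nginx`, while grep's own command line contains the literal characters `[n]ginx`, which the pattern does not match. A small sketch with made-up process lines (the sample strings are illustrative, not real ps output):

```shell
# Two sample command lines: an nginx process and the grep that searches for it.
nginx_line='nginx: master process /usr/sbin/nginx'
grep_line='grep [n]ginx'

# The pattern [n]ginx matches the string "nginx"...
nginx_hits=$(printf '%s\n' "$nginx_line" | grep -c '[n]ginx')

# ...but not the literal text "[n]ginx" in grep's own command line.
# grep -c exits non-zero when the count is 0, hence "|| true".
grep_hits=$(printf '%s\n' "$grep_line" | grep -c '[n]ginx' || true)

echo "nginx line matches: $nginx_hits, grep line matches: $grep_hits"
```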
  

Map a PID to its systemd unit:

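  systemctl status 1234           # systemctl accepts a PID and shows the owning unit
  systemctl status $(ps -p 1234 -o unit= 2>/dev/null)   # same via ps; needs procps built with systemd support
  cat /proc/1234/cgroup           # cgroup path includes service name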
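
When neither shortcut is available, the unit name can usually be read from the last component of the cgroup path. A rough sketch, assuming cgroup v2 paths of the form `0::/system.slice/nginx.service`; `unit_from_cgroup` is a hypothetical helper name, not a standard tool:

```shell
# Hypothetical helper: read /proc/PID/cgroup content on stdin and print the
# last path component that looks like a systemd service or scope.
unit_from_cgroup() {
  sed -nE 's#.*/([^/]+\.(service|scope))$#\1#p' | head -n 1
}

# Example with a sample line; on a live system use:
#   unit_from_cgroup < /proc/1234/cgroup
printf '0::/system.slice/nginx.service\n' | unit_from_cgroup
```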
  

Signals and kill

Signal    Number   Effect
SIGHUP    1        Reload config (many daemons)
SIGINT    2        Interrupt (Ctrl+C)
SIGTERM   15       Polite termination (default for kill)
SIGKILL   9        Force kill; cannot be caught or ignored

  kill 1234                      # SIGTERM
  kill -15 1234                  # same signal, by number
  kill -9 1234                   # last resort only
  kill -HUP 1234                 # reload config (e.g. the nginx master process)

  killall nginx                  # by name; dangerous on shared hosts
  pkill -f "python app.py"       # match full command line
  pkill -u deploy                # all processes of user
  

Always try SIGTERM first so applications flush buffers, close connections, and release locks.
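
One way to script that escalation is sketched below. Both function names are hypothetical helpers, the 10-second default is an arbitrary assumption, and procps `ps` is assumed for the state check.

```shell
# Hypothetical helper: true if the PID exists and is not a zombie.
is_running() {
  state=$(ps -o stat= -p "$1" 2>/dev/null) || return 1
  case $state in
    ''|Z*) return 1 ;;
  esac
  return 0
}

# Hypothetical helper: SIGTERM first, SIGKILL only after a timeout (seconds).
# Returns 0 if the process exited on SIGTERM, 1 if SIGKILL was needed.
stop_gracefully() {
  pid=$1
  timeout=${2:-10}
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone (or not ours to signal)
  waited=0
  while is_running "$pid"; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage: stop a throwaway background job, allowing it 5 seconds to exit.
sleep 60 &
stop_gracefully "$!" 5 && echo "exited cleanly" || echo "had to SIGKILL"
```

The zombie check matters when the target is a child of the calling shell: a terminated child that has not been reaped yet still answers `kill -0`, so testing the state avoids a false "still running".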

Foreground and Background

  long_running_command &
  jobs -l
  fg %1                          # foreground job 1
  bg %1                          # resume stopped job in background
  disown -h %1                   # keep job listed but exempt from SIGHUP at shell exit
                                 # (plain "disown %1" removes it from the job table)

  nohup ./batch.sh > out.log 2>&1 &
  # survives terminal close; output to out.log
  

Priority and Niceness

Niceness ranges from -20 (highest priority) to 19 (lowest); the default is 0. Only root (or a process with CAP_SYS_NICE) can set a negative niceness.

  nice -n 10 ./cpu-heavy.sh      # start with lower priority
  sudo renice -n -5 -p 1234      # boost running process
  ionice -c 3 -p 1234            # idle I/O class for batch jobs

  ps -o pid,ni,pri,cmd -p 1234
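
Because `nice -n` adds to the caller's current niceness rather than setting an absolute value, it can help to read the result back. A minimal sketch, assuming coreutils `nice` and procps `ps`; the `sleep` job is a stand-in for a real batch command:

```shell
# Current niceness of this shell; nice with no command prints it.
base=$(nice)

# Start a stand-in job 10 niceness steps lower in priority.
nice -n 10 sleep 30 &
pid=$!
sleep 1   # give nice time to apply the adjustment and exec the command

# Read back the NI column; the expected value is base + 10, capped at 19.
ni=$(ps -o ni= -p "$pid" | tr -d ' ')
expected=$((base + 10))
if [ "$expected" -gt 19 ]; then
  expected=19
fi
echo "base=$base job=$ni expected=$expected"

# Clean up the stand-in job.
kill "$pid"
```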
  

Real-time scheduling (chrt) is for specialized audio/industrial workloads — misuse can freeze the system.

/proc Filesystem

Live process details without special tools:

  cat /proc/1234/status          # state, memory, UID
  ls -l /proc/1234/fd            # open file descriptors
  tr '\0' ' ' < /proc/1234/cmdline; echo     # arguments are NUL-separated
  tr '\0' '\n' < /proc/1234/environ          # needs the same user or root
  readlink /proc/1234/exe        # binary path

  # System-wide
  cat /proc/meminfo
  cat /proc/loadavg
  cat /proc/cpuinfo
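
These files are easy to combine in a script. The sketch below assumes Linux with /proc mounted; `proc_summary` is a hypothetical helper, and the chosen fields are just a sample of what /proc/PID/status offers.

```shell
# Hypothetical helper: print a few fields from /proc/PID/status plus the
# number of open file descriptors.
proc_summary() {
  pid=$1
  awk -F':[ \t]*' '
    $1 == "Name" || $1 == "State" || $1 == "PPid" || $1 == "VmRSS" {
      print $1 "=" $2
    }' "/proc/$pid/status"
  echo "fds=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')"
}

# Summarize the current shell.
proc_summary "$$"
```

Kernel threads have no VmRSS line, so that field is simply absent for them.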
  

cgroups and systemd

On current distributions, systemd places each service in its own cgroup (v2) and uses it to apply resource limits:

  systemctl status nginx
  systemd-cgtop                  # cgroup resource top

  # Limit service memory (unit file)
  # MemoryMax=512M in [Service] section
  

For ad-hoc limits on arbitrary commands, the legacy cgexec tool (from libcgroup) is an alternative.
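
A MemoryMax setting is commonly applied through a drop-in file rather than by editing the packaged unit. A minimal sketch: the helper name `write_memory_dropin`, the file name `memory.conf`, and the scratch directory are assumptions made for illustration. For a real service the directory would be /etc/systemd/system/nginx.service.d/, followed by `systemctl daemon-reload` and a restart of the service.

```shell
# Hypothetical helper: write a drop-in that caps a service's memory.
write_memory_dropin() {
  dir=$1
  limit=$2
  mkdir -p "$dir"
  printf '[Service]\nMemoryMax=%s\n' "$limit" > "$dir/memory.conf"
}

# Dry run into a scratch directory instead of /etc/systemd/system.
scratch=$(mktemp -d)
write_memory_dropin "$scratch/nginx.service.d" 512M
cat "$scratch/nginx.service.d/memory.conf"
```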

Troubleshooting Workflow

  1. Identify resource hogs: top, ps aux --sort=-%mem | head
  2. Map PID to service: systemctl status, /proc/PID/cgroup
  3. Inspect open files: lsof -p PID (too many FDs = leak)
  4. Check logs: journalctl -u service, application logs
  5. Terminate gracefully: SIGTERM, wait, then SIGKILL if stuck

  # Zombie processes (state Z): parent must reap
  ps aux | awk '$8 ~ /Z/ { print }'
  # Fix: restart parent or kill parent carefully
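
Since the fix targets the parent, it is useful to list each zombie next to its PPID. A small sketch, assuming procps `ps`; `list_zombies` is a hypothetical helper name:

```shell
# Hypothetical helper: list zombies together with the parent that should reap them.
list_zombies() {
  ps -eo pid=,ppid=,stat=,comm= |
    awk '$3 ~ /^Z/ { printf "zombie pid=%s parent=%s cmd=%s\n", $1, $2, $4 }'
}

list_zombies
```

For each parent reported, `ps -o pid,cmd -p PARENT` shows which program needs to be fixed or restarted.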
  

Best Practices

Practice                        Reason
SIGTERM before SIGKILL          Data integrity and clean shutdown
Use systemd to manage daemons   Automatic restart, logging, cgroups
Monitor load vs CPU count       Load 8 on 4 cores = CPU saturation or I/O wait
Set LimitNOFILE in unit files   Prevents “too many open files” under load
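
The load-versus-cores comparison is a single division. A minimal sketch, where `load_per_core` is a hypothetical helper and the live values come from /proc/loadavg and `nproc`:

```shell
# Hypothetical helper: divide a load average by the number of cores.
load_per_core() {
  LC_ALL=C awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# The example from the table: load 8 on 4 cores.
load_per_core 8 4

# Live values: 1-minute load average and online CPU count.
load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A sustained result well above 1.00 means more runnable or I/O-blocked tasks than cores, which matches the saturation case in the table.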

Common Mistakes

Mistake                           Consequence
kill -9 on database               Risk of corrupted tables, incomplete transactions
killall on shared server          Kills all matching processes system-wide
Ignoring zombie processes         PID space slowly exhausted over months (rare but real)
Running batch jobs without nice   Starves interactive and web services

Performance Tips

  # I/O wait high?
  iostat -x 1 5
  pidstat -d 1                   # per-process I/O

  # Memory pressure?
  free -h && grep -i avail /proc/meminfo
  # Check OOM kills
  dmesg | grep -i "killed process"
  journalctl -k | grep -i oom
  

Production Scenario

A Java app consumes 100% CPU after deploy:

  pid=$(pgrep -of myapp.jar)          # oldest match; assumes a single JVM
  top -H -p "$pid"                    # find hot thread and note its TID
  printf '0x%x\n' 12345               # convert that TID (12345 is an example) to hex for the thread dump
  kill -3 "$pid"                      # SIGQUIT → thread dump in logs

  ps -o pid,ni,cmd -p "$pid"
  sudo renice -n 5 -p "$pid"          # temporary relief
  

The thread dump reveals an infinite loop in the new code; roll back the deploy while a hotfix ships.
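
The hex conversion is needed because top shows thread IDs in decimal, while HotSpot thread dumps commonly label each thread with a hexadecimal `nid=0x...` value. A small sketch; `tid_to_nid` is a hypothetical helper, 12345 is a made-up TID, and threads.txt is a placeholder file name:

```shell
# Hypothetical helper: format a decimal TID the way it appears in a thread dump.
tid_to_nid() {
  printf 'nid=0x%x\n' "$1"
}

tid_to_nid 12345

# Search a saved dump for that thread:
#   grep -A 20 "$(tid_to_nid 12345)" threads.txt
```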

Understanding processes turns “the server is slow” into actionable fixes — identify the PID, read the evidence, fix or kill with intent instead of reboot roulette.