Process Management
What Is a Process?
A process is a running instance of a program. Each has a PID (process ID), a parent PID (PPID), an owning user, a state, and resource counters. The kernel schedules processes on CPU cores and tracks their memory use; cgroups group processes so that limits and accounting apply to a whole service.
ps aux # all processes, BSD style
ps -ef # POSIX style with PPID
ps -eo pid,ppid,user,%cpu,%mem,cmd --sort=-%cpu | head
pstree -p # tree with PIDs
pstree -p 1 # tree from init (PID 1); pgrep -x systemd can also match per-user instances
Interactive Monitoring
top # q to quit; M = sort by memory
htop # enhanced (install separately)
top -H -p "$(pgrep -d, nginx)" # threads of nginx (-p expects a comma-separated PID list)
# Batch mode for scripts
top -b -n 1 | head -20
Inside top: P sorts by CPU, M sorts by memory, k kills a process, 1 toggles the per-CPU view.
Finding Processes
pgrep -a nginx # PIDs and command lines
pgrep -u www-data # processes owned by user
pidof nginx
ps aux | grep '[n]ginx' # bracket trick avoids self-match
# Process listening on port
sudo ss -tlnp | grep :443
sudo lsof -i :8080
sudo fuser -v 443/tcp
Map a PID to its systemd unit:
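systemctl status 1234 # a PID is accepted directly; shows the owning unit
ps -p 1234 -o unit= # unit name only (needs procps-ng built with systemd support)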
cat /proc/1234/cgroup # cgroup path includes service name
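As a rough sketch of how the cgroup path maps to a unit, the helper below pulls the deepest .service component out of cgroup lines. The function name unit_from_cgroup and the sample line are illustrative assumptions; real paths vary with cgroup version and nesting.

```shell
# unit_from_cgroup: read /proc/PID/cgroup-style lines on stdin and print
# the deepest path component ending in ".service", if any.
# (Hypothetical helper name; not part of systemd or procps.)
unit_from_cgroup() {
    awk -F/ '{
        for (i = NF; i > 0; i--)
            if ($i ~ /\.service$/) { print $i; exit }
    }'
}

# Illustrative input; on a real host use: unit_from_cgroup < /proc/1234/cgroup
printf '%s\n' '0::/system.slice/nginx.service' | unit_from_cgroup
```

Processes in a login session usually sit under a .scope rather than a .service, in which case the helper prints nothing.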
Signals and kill
| Signal | Number | Effect |
|---|---|---|
| SIGHUP | 1 | Reload config (many daemons) |
| SIGINT | 2 | Interrupt (Ctrl+C) |
| SIGTERM | 15 | Polite termination (default) |
| SIGKILL | 9 | Force kill — cannot be caught or ignored |
kill 1234 # SIGTERM
kill -15 1234
kill -9 1234 # last resort only
kill -HUP 1234 # reload nginx config
killall nginx # by name — dangerous on shared hosts
pkill -f "python app.py" # match full command line
pkill -u deploy # all processes of user
Always try SIGTERM first so applications flush buffers, close connections, and release locks.
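The SIGTERM-first rule can be sketched as a small shell function: send SIGTERM, poll for a while, and escalate to SIGKILL only if the process is still there. The name stop_gracefully and the 10-second default are assumptions for illustration, not a standard tool.

```shell
# stop_gracefully PID [TIMEOUT]: send SIGTERM, wait up to TIMEOUT seconds
# (default 10, an arbitrary choice) for the process to exit, then fall
# back to SIGKILL. Hypothetical helper name.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 0    # already gone
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    echo "PID $pid ignored SIGTERM, sending SIGKILL" >&2
    kill -KILL "$pid" 2>/dev/null
}

# Demo: a background sleep stands in for a real service process.
sleep 300 &
stop_gracefully "$!" 5
```

kill -0 sends no signal; it only checks whether the PID still exists and can be signalled, which makes it a convenient polling test.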
Foreground and Background
long_running_command &
jobs -l
fg %1 # foreground job 1
bg %1 # resume stopped job in background
disown -h %1 # keep job in the table but do not send it SIGHUP when the shell exits (plain disown removes it)
nohup ./batch.sh > out.log 2>&1 &
# survives terminal close; output to out.log
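In scripts, where there is no interactive job control, the same idea is usually expressed with $! and wait. The sketch below starts two background jobs and collects their exit statuses; the subshells are stand-ins for real commands.

```shell
# Start two background jobs, remember their PIDs via $!, and collect
# each exit status with wait. The subshells stand in for real commands.
( sleep 1; exit 0 ) &
first=$!
( sleep 1; exit 3 ) &
second=$!

# "|| var=$?" keeps the script alive even if it runs under set -e.
first_status=0;  wait "$first"  || first_status=$?
second_status=0; wait "$second" || second_status=$?
echo "first=$first_status second=$second_status"
```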
Priority and Niceness
Niceness ranges from -20 (highest priority) to 19 (lowest). By default only root (or a process with CAP_SYS_NICE) can set a negative niceness or lower the niceness of a running process.
nice -n 10 ./cpu-heavy.sh # start with lower priority
sudo renice -n -5 -p 1234 # boost running process
ionice -c 3 -p 1234 # idle I/O class for batch jobs
ps -o pid,ni,pri,cmd -p 1234
Real-time scheduling (chrt) is for specialized audio/industrial workloads — misuse can freeze the system.
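To confirm that a niceness change took effect without relying on ps, the value can be read from /proc/PID/stat, where it is field 19. This is a minimal sketch; nice_of is a hypothetical helper name and the sleep command stands in for a real batch job.

```shell
# nice_of PID: print the nice value from /proc/PID/stat (field 19).
# The comm field may contain spaces, so strip everything up to the
# last ") " before splitting. Hypothetical helper name.
nice_of() {
    stat=$(cat "/proc/$1/stat") || return 1
    set -- ${stat##*) }     # $1 is now field 3 (state), so field 19 is ${17}
    echo "${17}"
}

nice -n 10 sleep 5 &        # stand-in for a CPU-heavy batch job
child=$!
sleep 1                     # give nice time to adjust and exec the command
echo "shell niceness: $(nice_of $$), child niceness: $(nice_of "$child")"
kill "$child"
```

nice -n adds its argument to the caller's niceness, so the child ends up 10 above the shell (capped at 19).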
/proc Filesystem
Live process details without special tools:
cat /proc/1234/status # state, memory, UID
ls -l /proc/1234/fd # open file descriptors
tr '\0' ' ' < /proc/1234/cmdline # arguments are NUL-separated
tr '\0' '\n' < /proc/1234/environ # environment; readable by the owner or root
readlink /proc/1234/exe # binary path
# System-wide
cat /proc/meminfo
cat /proc/loadavg
cat /proc/cpuinfo
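The per-process files above are plain text, so a short script can summarize a process with nothing but awk and ls. The helper names proc_field and fd_count are assumptions made for this sketch.

```shell
# proc_field PID KEY: print the value of one "Key: value" line from
# /proc/PID/status, e.g. Name, State, PPid, VmRSS. Hypothetical helper.
proc_field() {
    awk -v key="$2:" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print; exit }' \
        "/proc/$1/status"
}

# fd_count PID: number of open file descriptors (needs permission to
# read /proc/PID/fd, i.e. same user or root).
fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Summarize the current shell; replace $$ with any PID of interest.
echo "name=$(proc_field $$ Name) state=$(proc_field $$ State)"
echo "ppid=$(proc_field $$ PPid) open_fds=$(fd_count $$)"
```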
cgroups and systemd
systemd manages cgroups v2 for resource limits:
systemctl status nginx
systemd-cgtop # cgroup resource top
# Limit service memory (unit file)
# MemoryMax=512M in [Service] section
Alternatively, the legacy cgexec tool (from libcgroup) applies ad-hoc limits to arbitrary commands.
Troubleshooting Workflow
- Identify resource hogs: top, ps aux --sort=-%mem | head
- Map PID to service: systemctl status, /proc/PID/cgroup
- Inspect open files: lsof -p PID (too many FDs = leak)
- Check logs: journalctl -u service, application logs
- Terminate gracefully: SIGTERM, wait, then SIGKILL if stuck
# Zombie processes (state Z) — parent must reap
ps aux | awk '$8 ~ /Z/ { print }'
# Fix: restart parent or kill parent carefully
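Because the fix targets the parent, it helps to list each zombie together with its PPID. The sketch below does that by scanning /proc directly; list_zombies is a hypothetical helper name, and only the first word of the process name is printed.

```shell
# list_zombies: print "PID PPID NAME" for every process in state Z,
# reading /proc/*/status. Processes that vanish mid-scan are skipped.
# Hypothetical helper name.
list_zombies() {
    for status in /proc/[0-9]*/status; do
        awk '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }
            /^Pid:/   { pid = $2 }
            /^PPid:/  { ppid = $2 }
            END { if (state == "Z") print pid, ppid, name }
        ' "$status" 2>/dev/null || true
    done
}

echo "PID PPID NAME"
list_zombies
```

The PPID column is the process that has not called wait() on the zombie, which is the one to restart or signal.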
Best Practices
| Practice | Reason |
|---|---|
| SIGTERM before SIGKILL | Data integrity and clean shutdown |
| Use systemd to manage daemons | Automatic restart, logging, cgroups |
| Monitor load vs CPU count | Load 8 on 4 cores = CPU saturation or I/O wait |
| Set LimitNOFILE in unit files | Prevents “too many open files” under load |
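The load-versus-cores comparison in the table is simple arithmetic and can be sketched as follows. load_per_core is a hypothetical helper name, and counting "processor" lines in /proc/cpuinfo is an assumption that holds on common x86 and ARM systems.

```shell
# load_per_core LOAD CORES: divide a load average by the core count.
# Values near or above 1.00 suggest CPU saturation or heavy I/O wait.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

load_per_core 8 4     # the table's example: prints 2.00

# Live values: 1-minute load from /proc/loadavg, cores from /proc/cpuinfo.
read -r load1 _ < /proc/loadavg
cores=$(grep -c '^processor' /proc/cpuinfo)
load_per_core "$load1" "$cores"
```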
Common Mistakes
| Mistake | Consequence |
|---|---|
| kill -9 on database | Corrupted tables, incomplete transactions |
| killall on shared server | Kills all matching processes system-wide |
| Ignoring zombie processes | PIDs exhaust over months (rare but real) |
| Running batch jobs without nice | Starves interactive and web services |
Performance Tips
# I/O wait high?
iostat -x 1 5
pidstat -d 1 # per-process I/O
# Memory pressure?
free -h && grep -i avail /proc/meminfo
# Check OOM kills
dmesg | grep -i "killed process"
journalctl -k | grep -i oom
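For a quick memory-pressure check in a script, MemAvailable can be expressed as a percentage of MemTotal. This is a minimal sketch; mem_available_pct is a hypothetical helper name, and any alerting threshold is left to the reader.

```shell
# mem_available_pct: read /proc/meminfo-format text on stdin and print
# MemAvailable as a whole-number percentage of MemTotal.
# Hypothetical helper name.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }
    '
}

mem_available_pct < /proc/meminfo
```

Reading from stdin keeps the function easy to try against a saved copy of /proc/meminfo from another host.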
Production Scenario
A Java app consumes 100% CPU after deploy:
PID=$(pgrep -f myapp.jar) # assumes exactly one matching JVM
top -H -p "$PID" # find hot thread; the PID column shows thread IDs (TIDs)
printf '0x%x\n' 12345 # convert TID to hex (12345 is a placeholder); matches nid=0x3039 in the dump
kill -3 "$PID" # SIGQUIT → thread dump in logs
ps -o pid,ni,cmd -p "$PID"
sudo renice -n 5 -p "$PID" # temporary relief
The thread dump reveals an infinite loop in the new code; roll back the deploy while a hotfix ships.
Understanding processes turns “the server is slow” into actionable fixes: identify the PID, read the evidence, and fix or kill with intent instead of playing reboot roulette.