Logs and system monitoring
In this series (15 parts)
- What is Linux and how it differs from other OSes
- Installing Linux and setting up your environment
- The Linux filesystem explained
- Users, groups, and permissions
- Essential command line tools
- Shell scripting fundamentals
- Processes and job control
- Standard I/O, pipes, and redirection
- The Linux networking stack
- Package management and software installation
- Disk management and filesystems
- Logs and system monitoring
- SSH and remote access
- Cron jobs and task scheduling
- Linux security basics for sysadmins
When something breaks on a Linux system, logs tell you what happened. Every service, every authentication attempt, every kernel event gets recorded. Knowing where to look and how to filter these logs is the difference between spending 5 minutes and 5 hours debugging a problem.
Prerequisites
You should be comfortable with text processing tools (grep, awk, sed) and I/O redirection.
Where logs live
Most logs are in /var/log/. Here are the important ones:
| Log file | What it contains |
|---|---|
| /var/log/syslog | General system messages (Ubuntu/Debian) |
| /var/log/messages | General system messages (RHEL/CentOS) |
| /var/log/auth.log | Authentication events (logins, sudo, SSH) |
| /var/log/kern.log | Kernel messages |
| /var/log/dmesg | Boot and hardware messages |
| /var/log/dpkg.log | Package installation/removal |
| /var/log/apt/history.log | apt command history |
| /var/log/nginx/access.log | Web server access logs |
| /var/log/nginx/error.log | Web server error logs |
ls -lh /var/log/ | head -20
Output:
total 66M
-rw-r----- 1 syslog adm 8.5M Jun 15 10:30 auth.log
-rw-r----- 1 syslog adm 12M Jun 15 10:30 kern.log
-rw-r----- 1 syslog adm 45M Jun 15 10:30 syslog
drwxr-xr-x 2 root root 4.0K Jun 15 10:00 nginx
drwxr-x--- 2 root adm 4.0K Jun 15 10:00 journal
journald (systemd journal)
On modern systems using systemd, journalctl is the primary way to read logs. It collects logs from all services, the kernel, and the init system in a structured binary format.
Basic usage
# Show all logs (most recent at the bottom)
journalctl
# Follow logs in real time (like tail -f)
journalctl -f
# Show only the last 50 lines
journalctl -n 50
# Show logs from the current boot
journalctl -b
# Show logs from the previous boot
journalctl -b -1
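Because the journal is stored in binary form, it can quietly eat disk space over time. These maintenance commands (available on modern systemd versions) show and trim its usage:

```shell
# How much disk space is the journal using?
journalctl --disk-usage

# Trim the journal down to roughly 500 MB (oldest entries go first)
sudo journalctl --vacuum-size=500M

# Or keep only the last two weeks of entries
sudo journalctl --vacuum-time=2weeks
```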
Filtering by service
# Logs for a specific service
journalctl -u nginx
# Logs for SSH (the unit is "ssh" on Debian/Ubuntu, "sshd" on RHEL-family systems)
journalctl -u ssh
# Logs for multiple services
journalctl -u nginx -u postgresql
Filtering by time
# Logs since a specific time
journalctl --since "2026-05-22 09:00:00"
# Logs in the last hour
journalctl --since "1 hour ago"
# Logs between two times
journalctl --since "2026-05-22 09:00" --until "2026-05-22 10:00"
# Today's logs
journalctl --since today
Filtering by priority
# Only errors and above
journalctl -p err
# Warning and above
journalctl -p warning
# Emergency only
journalctl -p emerg
Priority levels (most to least severe): emerg, alert, crit, err, warning, notice, info, debug.
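Priorities can also be given numerically, and the priority flag composes with the other filters:

```shell
# -p 3 is the same as -p err
# (0=emerg, 1=alert, 2=crit, 3=err, 4=warning, 5=notice, 6=info, 7=debug)
journalctl -p 3

# Flags combine: error-level messages from the current boot, last 20 lines
journalctl -b -p err -n 20
```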
Output formats
# JSON output (useful for parsing)
journalctl -u nginx -o json-pretty -n 1
Output:
{
"_HOSTNAME" : "devbox",
"_SYSTEMD_UNIT" : "nginx.service",
"MESSAGE" : "nginx: worker process 1234 started",
"PRIORITY" : "6",
"__REALTIME_TIMESTAMP" : "1779440400000000"
}
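One reason the JSON output matters: it is easy to post-process. A small sketch, assuming jq is installed (`sudo apt install jq`):

```shell
# Pull just the message text from the last 5 nginx entries
journalctl -u nginx -o json -n 5 | jq -r '.MESSAGE'

# Message plus priority as tab-separated values
journalctl -u nginx -o json -n 5 | jq -r '[.PRIORITY, .MESSAGE] | @tsv'
```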
syslog
The traditional logging system. Many applications still write directly to syslog. On Ubuntu, the daemon is rsyslog.
# View syslog
tail -20 /var/log/syslog
Output:
May 22 10:30:01 devbox CRON[12345]: (root) CMD (/usr/lib/apt/apt.systemd.daily)
May 22 10:30:15 devbox systemd[1]: Starting Daily apt download activities...
May 22 10:31:00 devbox kernel: [432100.123456] Out of memory: Killed process 5678 (java)
The format is: date hostname process[PID]: message
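Because the format is fixed, standard text tools can summarize it. For example, field 5 is the process name, so you can see which processes log the most (a sketch for the Debian/Ubuntu path; on RHEL the file is /var/log/messages):

```shell
# Count log lines per process (field 5 is "process[PID]:")
awk '{print $5}' /var/log/syslog | cut -d'[' -f1 | tr -d ':' \
  | sort | uniq -c | sort -rn | head -5
```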
dmesg: kernel messages
# Show kernel messages
dmesg | tail -20
# Show with human-readable timestamps
dmesg -T | tail -10
# Filter for errors
dmesg -l err,warn | tail -10
Output:
[Thu May 22 10:00:01 2026] USB disconnect, device number 3
[Thu May 22 10:00:15 2026] EXT4-fs error (device sda2): bad inode #1234567
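Two dmesg patterns worth knowing: following the ring buffer live, and checking whether the kernel's OOM killer has fired, a common cause of mysterious process deaths:

```shell
# Follow kernel messages in real time (like tail -f)
dmesg -wT

# Has the OOM killer terminated anything recently?
dmesg -T | grep -i "out of memory"
```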
logrotate
Logs grow over time. Without management, they will fill your disk. logrotate automatically compresses and removes old logs.
# Check logrotate config
cat /etc/logrotate.conf
Output:
weekly
rotate 4
create
dateext
compress
include /etc/logrotate.d
# Application-specific config
cat /etc/logrotate.d/nginx
Output:
/var/log/nginx/*.log {
daily
missingok
rotate 14
compress
delaycompress
notifempty
create 0640 www-data adm
sharedscripts
postrotate
[ -f /run/nginx.pid ] && kill -USR1 $(cat /run/nginx.pid)
endscript
}
This config: rotates daily, keeps 14 copies, compresses old logs, and sends SIGUSR1 to nginx (which tells it to reopen log files).
# Test logrotate without actually rotating
sudo logrotate -d /etc/logrotate.d/nginx
# Force a rotation
sudo logrotate -f /etc/logrotate.d/nginx
Example 1: Find failed SSH login attempts
This is one of the first things you check when investigating a security incident.
# Method 1: grep auth.log directly
grep "Failed password" /var/log/auth.log | tail -10
Output:
May 22 03:15:01 devbox sshd[9876]: Failed password for invalid user admin from 203.0.113.5 port 54321 ssh2
May 22 03:15:03 devbox sshd[9877]: Failed password for invalid user root from 203.0.113.5 port 54322 ssh2
May 22 03:15:05 devbox sshd[9878]: Failed password for invalid user test from 203.0.113.5 port 54323 ssh2
May 22 08:30:12 devbox sshd[9900]: Failed password for pratik from 10.0.0.50 port 12345 ssh2
# Count failed logins per IP
grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn
Output:
347 203.0.113.5
89 198.51.100.23
1 10.0.0.50
203.0.113.5 had 347 failed attempts. That is a brute force attack.
# Method 2: journalctl (more flexible)
journalctl -u ssh --since "24 hours ago" | grep "Failed password" | wc -l
Output:
437
# See successful logins too
grep "Accepted" /var/log/auth.log | tail -5
Output:
May 22 08:30:45 devbox sshd[9910]: Accepted publickey for pratik from 10.0.0.50 port 12346 ssh2
May 22 09:00:00 devbox sshd[9920]: Accepted publickey for deploy from 10.0.0.100 port 23456 ssh2
Good: these logins used public keys from known IPs. For more on securing SSH, see the SSH article. To automatically block brute force IPs, see Linux security basics (fail2ban section).
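To make this a repeatable check, the pipeline can be wrapped in a short script. A sketch only: the 10-attempt threshold is arbitrary, and on RHEL-family systems the log is /var/log/secure.

```shell
#!/bin/bash
# Report source IPs with more than $THRESHOLD failed SSH logins.
# Sketch -- adjust LOG and THRESHOLD for your distribution and baseline.
LOG=/var/log/auth.log
THRESHOLD=10

grep "Failed password" "$LOG" \
  | awk '{print $(NF-3)}' \
  | sort | uniq -c | sort -rn \
  | awk -v t="$THRESHOLD" '$1 > t {print $2 " (" $1 " attempts)"}'
```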
Example 2: Track a service crash with journalctl
Suppose your PostgreSQL database crashed. Here is how to investigate:
# Check current status
systemctl status postgresql
Output:
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled)
Active: failed (Result: exit-code) since Thu 2026-05-22 09:45:00 UTC
Process: 1100 ExecStart=/usr/bin/pg_ctlcluster 16 main start (code=exited, status=1/FAILURE)
# Get detailed logs around the crash time
journalctl -u postgresql --since "09:40" --until "09:50"
Output:
May 22 09:44:55 devbox postgresql[1100]: 2026-05-22 09:44:55.123 UTC [1100] LOG: starting PostgreSQL 16.3
May 22 09:44:55 devbox postgresql[1100]: 2026-05-22 09:44:55.234 UTC [1100] LOG: listening on IPv4 address "127.0.0.1", port 5432
May 22 09:44:58 devbox postgresql[1100]: 2026-05-22 09:44:58.567 UTC [1100] FATAL: could not open file "base/16384/1234": No space left on device
May 22 09:44:58 devbox postgresql[1100]: 2026-05-22 09:44:58.568 UTC [1100] LOG: database system is shut down
May 22 09:45:00 devbox systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE
The problem: “No space left on device.” Let’s confirm:
df -h /var/lib/postgresql
Output:
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 480G 480G 0 100% /
The disk is full. Find what is using the space:
sudo du -sh /var/log/* | sort -rh | head -5
Output:
450G /var/log/app-debug.log
45M /var/log/syslog
8.5M /var/log/auth.log
Someone left debug logging on and a 450GB log file filled the disk. Fix it:
# Truncate the massive log file (don't rm it while the app has it open)
sudo truncate -s 0 /var/log/app-debug.log
# Verify space is freed
df -h /
# Restart PostgreSQL
sudo systemctl start postgresql
# Verify it is running
systemctl status postgresql
Then set up proper log rotation to prevent this from happening again.
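A config for that runaway file might look like this (a sketch: the file name /etc/logrotate.d/app-debug and the 100 MB threshold are choices you would adapt):

```
# /etc/logrotate.d/app-debug -- hypothetical config for the runaway log
/var/log/app-debug.log {
    # rotate as soon as the file passes 100 MB
    size 100M
    # keep 5 rotated copies
    rotate 5
    compress
    delaycompress
    missingok
    notifempty
    # truncate in place, so the app keeps its open file handle
    copytruncate
}
```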
Useful log analysis commands
# Count events per hour
awk '{print $1, $2, substr($3,1,2)":00"}' /var/log/auth.log | sort | uniq -c | tail -24
# Find the most common error messages
grep -i "error" /var/log/syslog | awk -F': ' '{print $NF}' | sort | uniq -c | sort -rn | head -10
# Monitor multiple logs at once
tail -f /var/log/syslog /var/log/auth.log
# Find all log entries from a specific PID
journalctl _PID=1234
# Find logs from a specific binary
journalctl /usr/sbin/nginx
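These one-liners can feed a simple periodic check. A sketch you could run from cron (the 100-entry threshold is arbitrary; tune it to your system's baseline):

```shell
#!/bin/bash
# Warn when the last hour produced an unusual number of error-level entries.
# Sketch only -- the threshold is a placeholder, not a recommendation.
THRESHOLD=100
count=$(journalctl -p err --since "1 hour ago" --no-pager -q | wc -l)

if [ "$count" -gt "$THRESHOLD" ]; then
    echo "WARNING: $count error-level journal entries in the last hour"
fi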
What comes next
The next article covers SSH and remote access, where you will learn how to securely connect to remote Linux machines, set up key-based authentication, and create SSH tunnels.
For the security perspective on log analysis, see Defensive security, which covers SIEM systems and detection rules based on log patterns.