Linux from Scratch · Part 12

Logs and system monitoring

When something breaks on a Linux system, the logs usually tell you what happened. Services, authentication attempts, and kernel events are all recorded somewhere. Knowing where to look and how to filter these logs can be the difference between spending 5 minutes and 5 hours debugging a problem.

Prerequisites

You should be comfortable with text processing tools (grep, awk, sed) and I/O redirection.

Where logs live

Most logs are in /var/log/. Here are the important ones:

Log file                     What it contains
/var/log/syslog              General system messages (Ubuntu/Debian)
/var/log/messages            General system messages (RHEL/CentOS)
/var/log/auth.log            Authentication events: logins, sudo, SSH (Ubuntu/Debian; /var/log/secure on RHEL/CentOS)
/var/log/kern.log            Kernel messages
/var/log/dmesg               Boot and hardware messages
/var/log/dpkg.log            Package installation/removal
/var/log/apt/history.log     apt command history
/var/log/nginx/access.log    Web server access logs
/var/log/nginx/error.log     Web server error logs

ls -lh /var/log/ | head -20

Output:

total 66M
-rw-r-----  1 syslog adm    8.5M Jun 15 10:30 auth.log
drwxr-x---  2 root   adm    4.0K Jun 15 10:00 journal
-rw-r-----  1 syslog adm     12M Jun 15 10:30 kern.log
drwxr-xr-x  2 root   root   4.0K Jun 15 10:00 nginx
-rw-r-----  1 syslog adm     45M Jun 15 10:30 syslog
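
Because file names differ between distributions, it can help to check which of the files from the table above actually exist on a given machine. The following is a minimal sketch; `existing_logs` is a made-up helper name, not a standard tool.

```shell
# Print each given path that exists on this system.
# existing_logs is a hypothetical helper, used here for illustration only.
existing_logs() {
    for log_file in "$@"; do
        if [ -e "$log_file" ]; then
            echo "$log_file"
        fi
    done
}

existing_logs /var/log/syslog /var/log/messages /var/log/auth.log /var/log/kern.log
```

On a Debian-family system this would be expected to list syslog and auth.log but not messages. The exact result depends on the distribution and on which logging daemon is installed.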

journald (systemd journal)

On modern systems using systemd, the systemd-journald service collects logs from all services, the kernel, and the init system in a structured binary format. journalctl is the primary way to read them.

Basic usage

# Show all logs in a pager, oldest first (press G to jump to the most recent entries)
journalctl

# Follow logs in real time (like tail -f)
journalctl -f

# Show only the last 50 lines
journalctl -n 50

# Show logs from the current boot
journalctl -b

# Show logs from the previous boot (only available if the journal is stored persistently)
journalctl -b -1

Filtering by service

# Logs for a specific service
journalctl -u nginx

# Logs for SSH (the unit is named ssh on Debian/Ubuntu and sshd on RHEL/Fedora)
journalctl -u ssh

# Logs for multiple services
journalctl -u nginx -u postgresql

Filtering by time

# Logs since a specific time
journalctl --since "2026-05-22 09:00:00"

# Logs in the last hour
journalctl --since "1 hour ago"

# Logs between two times
journalctl --since "2026-05-22 09:00" --until "2026-05-22 10:00"

# Today's logs
journalctl --since today

Filtering by priority

# Only errors and above
journalctl -p err

# Warning and above
journalctl -p warning

# Emergency only
journalctl -p emerg

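Priority levels, from most to least severe: emerg (0), alert (1), crit (2), err (3), warning (4), notice (5), info (6), debug (7). Passing a single level to -p shows that level and everything more severe.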
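
The unit, time, and priority filters can be combined in one call. As a small sketch, the wrapper below bundles the three; the function name `unit_errors` and its defaults are assumptions made for this example, while `-u`, `-p`, `--since`, and `--no-pager` are standard journalctl options.

```shell
# Show recent high-priority messages for one unit.
# Usage: unit_errors UNIT [SINCE] [PRIORITY]
# unit_errors is a hypothetical helper; the defaults are arbitrary.
unit_errors() (
    unit=$1
    since=${2:-"1 hour ago"}
    priority=${3:-err}
    # --no-pager prints straight to stdout so the output can be piped.
    journalctl -u "$unit" -p "$priority" --since "$since" --no-pager
)
```

For example, `unit_errors nginx` would show nginx errors from the last hour, and `unit_errors postgresql today warning` would widen that to warnings since midnight.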

Output formats

# JSON output (useful for parsing)
journalctl -u nginx -o json-pretty -n 1

Output (abbreviated):

{
    "_HOSTNAME" : "devbox",
    "_SYSTEMD_UNIT" : "nginx.service",
    "MESSAGE" : "nginx: worker process 1234 started",
    "PRIORITY" : "6",
    "__REALTIME_TIMESTAMP" : "1716364800000000"
}
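
The `__REALTIME_TIMESTAMP` field is given in microseconds since the Unix epoch, so it has to be divided by 1,000,000 before `date` can read it. A minimal sketch, assuming GNU `date` (for the `-d @SECONDS` syntax); `journal_ts_to_date` is a made-up helper name.

```shell
# Convert a journal timestamp (microseconds since the epoch) to a UTC date.
# journal_ts_to_date is a hypothetical helper; it assumes GNU date.
journal_ts_to_date() {
    # Integer division drops the microsecond part.
    seconds=$(( $1 / 1000000 ))
    date -u -d "@$seconds" '+%Y-%m-%d %H:%M:%S UTC'
}

journal_ts_to_date 1716364800000000
```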

syslog

syslog is the traditional logging system, and many applications still write to it directly. On Ubuntu, the daemon is rsyslog.

# View syslog
tail -20 /var/log/syslog

Output:

May 22 10:30:01 devbox CRON[12345]: (root) CMD (/usr/lib/apt/apt.systemd.daily)
May 22 10:30:15 devbox systemd[1]: Starting Daily apt download activities...
May 22 10:31:00 devbox kernel: [432100.123456] Out of memory: Killed process 5678 (java)

The format is: date hostname process[PID]: message
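
To make that layout concrete, here is a sketch that splits one line into its fields. It assumes the traditional timestamp style shown above and GNU sed (for `\n` in the replacement); `parse_syslog_line` is an illustrative name, and the PID comes out empty for entries such as kernel messages that have none.

```shell
# Split a traditional syslog line into time, host, process, pid, and message.
# parse_syslog_line is a hypothetical helper; it assumes GNU sed.
parse_syslog_line() {
    printf '%s\n' "$1" | sed -n -E \
        's/^([A-Z][a-z]{2} +[0-9]+ [0-9:]{8}) ([^ ]+) ([^:[]+)(\[([0-9]+)\])?: (.*)$/time=\1\nhost=\2\nprocess=\3\npid=\5\nmessage=\6/p'
}

parse_syslog_line 'May 22 10:30:01 devbox CRON[12345]: (root) CMD (/usr/lib/apt/apt.systemd.daily)'
```

Lines that do not match the pattern produce no output, which makes it easy to spot entries written in a different format.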

dmesg: kernel messages

# Show kernel messages (may need sudo on systems where kernel.dmesg_restrict is enabled)
dmesg | tail -20

# Show with human-readable timestamps
dmesg -T | tail -10

# Filter for errors
dmesg -l err,warn | tail -10

Output:

[Fri May 22 10:00:01 2026] USB disconnect, device number 3
[Fri May 22 10:00:15 2026] EXT4-fs error (device sda2): bad inode #1234567
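
Kernel messages are also where out-of-memory kills show up, like the "Out of memory: Killed process 5678 (java)" line in the syslog sample earlier. As a sketch, the filter below pulls the PID and process name out of such lines; `oom_victims` is a made-up name and the pattern assumes that message wording.

```shell
# Read kernel log lines on stdin and print "PID NAME" for each OOM-killed process.
# oom_victims is a hypothetical helper for illustration.
oom_victims() {
    sed -n -E 's/.*Out of memory: Killed process ([0-9]+) \(([^)]+)\).*/\1 \2/p'
}

printf '%s\n' 'May 22 10:31:00 devbox kernel: [432100.123456] Out of memory: Killed process 5678 (java)' | oom_victims
```

On a live system the same filter could be fed from `dmesg | oom_victims` or `journalctl -k | oom_victims`.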

logrotate

Logs grow over time. Without management, they will fill your disk. logrotate automatically compresses and removes old logs.

# Check logrotate config
cat /etc/logrotate.conf

Output:

weekly
rotate 4
create
dateext
compress
include /etc/logrotate.d

# Application-specific config
cat /etc/logrotate.d/nginx

Output:

/var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        [ -f /run/nginx.pid ] && kill -USR1 $(cat /run/nginx.pid)
    endscript
}

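This config rotates daily, keeps 14 rotated copies, and compresses them one rotation late (delaycompress leaves the newest rotated file uncompressed). It skips missing or empty logs and sends SIGUSR1 to nginx, which tells it to reopen its log files.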

# Test logrotate without actually rotating
sudo logrotate -d /etc/logrotate.d/nginx

# Force a rotation
sudo logrotate -f /etc/logrotate.d/nginx

Example 1: Find failed SSH login attempts

This is one of the first things you check when investigating a security incident.

# Method 1: grep auth.log directly
grep "Failed password" /var/log/auth.log | tail -10

Output:

May 22 03:15:01 devbox sshd[9876]: Failed password for invalid user admin from 203.0.113.5 port 54321 ssh2
May 22 03:15:03 devbox sshd[9877]: Failed password for invalid user root from 203.0.113.5 port 54322 ssh2
May 22 03:15:05 devbox sshd[9878]: Failed password for invalid user test from 203.0.113.5 port 54323 ssh2
May 22 08:30:12 devbox sshd[9900]: Failed password for pratik from 10.0.0.50 port 12345 ssh2

# Count failed logins per IP
grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn

Output:

    347 203.0.113.5
     89 198.51.100.23
      1 10.0.0.50

203.0.113.5 had 347 failed attempts. That is a brute force attack.
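
When the list of addresses is long, it can help to show only the ones above a cutoff. The sketch below reuses the same `$(NF-3)` field as the pipeline above; the name `failed_login_ips` and the default threshold of 10 are arbitrary choices for this example.

```shell
# Print "COUNT IP" for every source address with at least THRESHOLD failures.
# Usage: failed_login_ips LOGFILE [THRESHOLD]
# failed_login_ips is a hypothetical helper; the default of 10 is arbitrary.
failed_login_ips() {
    awk -v threshold="${2:-10}" '
        /Failed password/ { count[$(NF-3)]++ }
        END {
            for (ip in count)
                if (count[ip] >= threshold)
                    print count[ip], ip
        }
    ' "$1" | sort -rn
}

# Example (needs read access to the log):
# failed_login_ips /var/log/auth.log 50
```

With the counts shown above and a threshold of 50, only 203.0.113.5 and 198.51.100.23 would be listed.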

# Method 2: journalctl (more flexible: filter by unit and time first, then count matches)
journalctl -u ssh --since "24 hours ago" | grep -c "Failed password"

Output:

437

# See successful logins too
grep "Accepted" /var/log/auth.log | tail -5

Output:

May 22 08:30:45 devbox sshd[9910]: Accepted publickey for pratik from 10.0.0.50 port 12346 ssh2
May 22 09:00:00 devbox sshd[9920]: Accepted publickey for deploy from 10.0.0.100 port 23456 ssh2

Good: these logins used public keys from known IPs. For more on securing SSH, see the SSH article. To automatically block brute force IPs, see Linux security basics (fail2ban section).
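
To review successful logins in bulk rather than by eye, they can be grouped by user, method, and source address. This is a sketch with a made-up function name, `accepted_logins`. It looks for the words "Accepted", "for", and "from" instead of counting fields from the end of the line, because real publickey entries often carry a key fingerprint after "ssh2".

```shell
# Summarise successful SSH logins as "COUNT USER METHOD IP", most frequent first.
# accepted_logins is a hypothetical helper for illustration.
accepted_logins() {
    awk '
        /sshd.*Accepted/ {
            user = ip = method = "?"
            for (i = 1; i < NF; i++) {
                if ($i == "Accepted") method = $(i + 1)
                else if ($i == "for")  user = $(i + 1)
                else if ($i == "from") ip = $(i + 1)
            }
            print user, method, ip
        }
    ' "$1" | sort | uniq -c | sort -rn
}

# Example (needs read access to the log):
# accepted_logins /var/log/auth.log
```

A password login, or a login from an address that also appears in the failed-attempts list, is worth a closer look.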

Example 2: Track a service crash with journalctl

Suppose your PostgreSQL database crashed. Here is how to investigate:

# Check current status
systemctl status postgresql

Output:

 postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled)
     Active: failed (Result: exit-code) since Fri 2026-05-22 09:45:00 UTC
    Process: 1100 ExecStart=/usr/bin/pg_ctlcluster 16 main start (code=exited, status=1/FAILURE)

# Get detailed logs around the crash time
journalctl -u postgresql --since "09:40" --until "09:50"

Output:

May 22 09:44:55 devbox postgresql[1100]: 2026-05-22 09:44:55.123 UTC [1100] LOG:  starting PostgreSQL 16.3
May 22 09:44:55 devbox postgresql[1100]: 2026-05-22 09:44:55.234 UTC [1100] LOG:  listening on IPv4 address "127.0.0.1", port 5432
May 22 09:44:58 devbox postgresql[1100]: 2026-05-22 09:44:58.567 UTC [1100] FATAL:  could not open file "base/16384/1234": No space left on device
May 22 09:44:58 devbox postgresql[1100]: 2026-05-22 09:44:58.568 UTC [1100] LOG:  database system is shut down
May 22 09:45:00 devbox systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE

The problem: “No space left on device.” Let’s confirm:

df -h /var/lib/postgresql

Output:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       480G  480G     0 100% /

The disk is full. Find what is using the space:

sudo du -sh /var/log/* | sort -rh | head -5

Output:

450G	/var/log/app-debug.log
12M	/var/log/syslog
8.5M	/var/log/auth.log

Someone left debug logging on and a 450GB log file filled the disk. Fix it:

# Truncate the massive log file (rm would not free the space while the app still has the file open)
sudo truncate -s 0 /var/log/app-debug.log

# Verify space is freed
df -h /

# Restart PostgreSQL
sudo systemctl start postgresql

# Verify it is running
systemctl status postgresql
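
The `df` check can also be scripted, for example to warn before a filesystem fills up. A minimal sketch: the helper names `disk_usage_pct` and `check_disk` and the 90% default are assumptions for this example, and the parsing assumes a mount point without spaces in its name.

```shell
# Print the usage percentage (without the % sign) of the filesystem holding a path.
# Both helpers are hypothetical; df -P forces one line per filesystem.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches THRESHOLD percent (default 90).
# Usage: check_disk PATH [THRESHOLD]
check_disk() {
    usage=$(disk_usage_pct "$1")
    if [ "$usage" -ge "${2:-90}" ]; then
        echo "WARNING: $1 is at ${usage}% capacity"
        return 1
    fi
    echo "OK: $1 is at ${usage}% capacity"
}

check_disk / 90 || true
```

The non-zero return status on a warning makes the function usable from a scheduled job or a monitoring hook.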

Then set up proper log rotation to prevent this from happening again.
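
As a sketch of what that rotation could look like for the runaway file in this example, the script below writes a candidate logrotate stanza to a local file for review. The thresholds (100M, five copies) are arbitrary assumptions. `copytruncate` is chosen on the assumption that the application keeps the file open and offers no signal for reopening it.

```shell
# Write a candidate logrotate stanza to the current directory for review.
# The size and rotate values are illustrative, not recommendations.
cat > app-debug.logrotate <<'EOF'
/var/log/app-debug.log {
    size 100M
    rotate 5
    compress
    missingok
    notifempty
    copytruncate
}
EOF

# After review, install it and do a dry run (commented out: needs root):
# sudo install -m 0644 app-debug.logrotate /etc/logrotate.d/app-debug
# sudo logrotate -d /etc/logrotate.d/app-debug
```

The `size` directive is only evaluated when logrotate runs, which on most systems is once a day. A very chatty application can still write a lot between runs, so turning debug logging off remains the real fix.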

Useful log analysis commands

# Count events per hour (the log is already in time order, so uniq needs no sort)
awk '{print $1, $2, substr($3,1,2)":00"}' /var/log/auth.log | uniq -c | tail -24

# Find the most common error messages
grep -i "error" /var/log/syslog | awk -F': ' '{print $NF}' | sort | uniq -c | sort -rn | head -10

# Monitor multiple logs at once
tail -f /var/log/syslog /var/log/auth.log

# Find all log entries from a specific PID
journalctl _PID=1234

# Find logs from a specific binary
journalctl /usr/sbin/nginx

What comes next

The next article covers SSH and remote access, where you will learn how to securely connect to remote Linux machines, set up key-based authentication, and create SSH tunnels.

For the security perspective on log analysis, see Defensive security, which covers SIEM systems and detection rules based on log patterns.
