Linux from Scratch · Part 5

Essential command line tools

In this series (15 parts)
  1. What is Linux and how it differs from other OSes
  2. Installing Linux and setting up your environment
  3. The Linux filesystem explained
  4. Users, groups, and permissions
  5. Essential command line tools
  6. Shell scripting fundamentals
  7. Processes and job control
  8. Standard I/O, pipes, and redirection
  9. The Linux networking stack
  10. Package management and software installation
  11. Disk management and filesystems
  12. Logs and system monitoring
  13. SSH and remote access
  14. Cron jobs and task scheduling
  15. Linux security basics for sysadmins

The command line is where Linux users spend most of their time. A handful of commands, combined properly, can replace entire GUI applications. This article covers the essential tools in three categories: file operations, text processing, and process management.

Prerequisites

You should understand the Linux filesystem layout and file permissions before starting.

File operations

Copying, moving, and deleting

# Copy a file
cp source.txt destination.txt

# Copy a directory recursively
cp -r src/ backup/

# Move (rename) a file
mv old-name.txt new-name.txt

# Move a file to another directory
mv file.txt /tmp/

# Delete a file
rm file.txt

# Delete a directory and everything in it
rm -rf old-project/

rm -rf is permanent: there is no recycle bin on the command line. Double-check the path before you press Enter.
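If you want a safety net, rm's -i flag asks for confirmation before each deletion. A quick illustration on a throwaway file (the path is just an example; piping from yes simulates typing "n" at the prompt):

```shell
# -i prompts before deleting; answering "n" leaves the file alone
touch /tmp/demo-file.txt
yes n | rm -i /tmp/demo-file.txt
ls /tmp/demo-file.txt    # the file survived
```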

Finding files with find

find searches the filesystem based on file properties: name, type, size, modification time, permissions.

# Find all .py files under the current directory
find . -name "*.py"

# Find all directories named "test"
find /home -type d -name "test"

# Find files larger than 100MB
find / -type f -size +100M 2>/dev/null

# Find files modified in the last 24 hours
find /etc -type f -mtime -1

# Find files with specific permissions
find / -perm -4000 -type f 2>/dev/null    # SUID files

# Find and delete all .tmp files
find /tmp -name "*.tmp" -type f -delete

# Find and execute a command on each result
find . -name "*.log" -exec gzip {} \;
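A common alternative to -exec is piping results to xargs. The -print0/-0 pair keeps filenames containing spaces intact; the directory below is a made-up example:

```shell
# Build a throwaway directory with an awkward filename, then gzip every .log in it
mkdir -p /tmp/demo-find
touch "/tmp/demo-find/a b.log"
find /tmp/demo-find -name "*.log" -print0 | xargs -0 gzip -f
ls /tmp/demo-find    # a b.log.gz
```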

Finding files with locate

locate uses a pre-built database, so it is much faster than find. The trade-off is freshness: files created or moved since the last database update will not be found, so the database needs to be refreshed regularly.

# Update the database
sudo updatedb

# Find files
locate nginx.conf

Output:

/etc/nginx/nginx.conf
/usr/share/doc/nginx/nginx.conf.example

Text processing

Text processing is where Linux really shines. You can chain these tools together with pipes to build powerful data pipelines.
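As a taste of what this section builds toward, here is a tiny pipeline that filters a log and counts repeated lines. The sample data is made up and written to a temporary file:

```shell
# Write a four-line sample log, then count how often each ERROR line repeats
printf 'INFO start\nERROR disk full\nINFO ok\nERROR disk full\n' > /tmp/demo.log
grep ERROR /tmp/demo.log | sort | uniq -c
```

This prints "2 ERROR disk full": grep keeps only the error lines, sort groups the duplicates, and uniq -c counts them. Each tool is covered below.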

grep: search text

# Search for a pattern in a file
grep "error" /var/log/syslog

# Case-insensitive search
grep -i "warning" app.log

# Show line numbers
grep -n "TODO" *.py

# Recursive search in directories
grep -r "database_url" /etc/

# Invert match (show lines that DON'T match)
grep -v "DEBUG" app.log

# Count matches
grep -c "404" access.log

# Show context (2 lines before and after)
grep -B2 -A2 "FATAL" error.log

# Extended regex
grep -E "error|warning|critical" syslog
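One detail worth knowing: with no file argument, grep reads standard input, which is what lets it sit in the middle of a pipeline. A small stdin example with inline sample text:

```shell
# Count lines matching either word; grep reads the piped text from stdin
printf 'info ok\nerror disk\nwarning low\n' | grep -Ec 'error|warning'    # prints 2
```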

sed: stream editor

sed transforms text line by line. Most commonly used for search and replace.

# Replace first occurrence per line
sed 's/old/new/' file.txt

# Replace ALL occurrences per line
sed 's/old/new/g' file.txt

# Replace in-place (modifies the file)
sed -i 's/old/new/g' file.txt

# Delete lines matching a pattern
sed '/^#/d' config.txt        # Remove comment lines

# Print only specific lines
sed -n '5,10p' file.txt       # Print lines 5-10

# Insert text before a line
sed '/\[server\]/i # Server configuration' config.ini
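Because sed -i rewrites the file with no undo, GNU sed lets you attach a backup suffix directly to -i. A demonstration on a throwaway file:

```shell
# -i.bak edits in place but first copies the original to a .bak file
printf 'old text\n' > /tmp/demo-sed.txt
sed -i.bak 's/old/new/' /tmp/demo-sed.txt
cat /tmp/demo-sed.txt        # new text
cat /tmp/demo-sed.txt.bak    # old text
```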

awk: pattern scanning and processing

awk processes text field by field. Each line is split into fields by whitespace (by default).

# Print the second column
awk '{print $2}' file.txt

# Print specific columns with custom separator
awk -F: '{print $1, $3}' /etc/passwd

# Filter rows where column 3 > 100
awk '$3 > 100 {print $0}' data.txt

# Sum a column
awk '{sum += $5} END {print sum}' sales.txt

# Count unique values in a column
awk '{count[$1]++} END {for (k in count) print k, count[k]}' access.log
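The count-unique idiom from the last example is easier to see on inline data (the names are made up):

```shell
# Tally occurrences of the first field; the END block prints the totals
# (awk's for-in loop does not guarantee output order)
printf 'alice\nbob\nalice\n' | awk '{count[$1]++} END {for (k in count) print k, count[k]}'
```

This prints "alice 2" and "bob 1".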

cut, sort, uniq: quick data transforms

# Extract the first field (colon-separated)
cut -d: -f1 /etc/passwd

# Sort alphabetically
sort names.txt

# Sort numerically, reverse
sort -rn numbers.txt

# Remove duplicate lines (input must be sorted)
sort data.txt | uniq

# Count duplicates
sort data.txt | uniq -c | sort -rn
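Why must the input be sorted? uniq only collapses adjacent duplicates, which a two-line experiment makes obvious:

```shell
printf 'a\nb\na\n' | uniq          # prints a, b, a -- the two a's are not adjacent
printf 'a\nb\na\n' | sort | uniq   # prints a, b
```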

Process management

Every running program is a process. Processes are covered in depth in part 7, Processes and job control; here are the essential tools for day-to-day work.

ps: list processes

# Show your processes
ps

# Show all processes with full details
ps aux

# Show process tree
ps auxf

# Find a specific process (the grep command itself often shows up in the results too)
ps aux | grep nginx

Output of ps aux | grep nginx:

root      1230  0.0  0.0  12345  2345 ?  Ss  10:00  0:00 nginx: master process
www-data  1234  0.0  0.1  12345  5678 ?  S   10:00  0:00 nginx: worker process
www-data  1235  0.0  0.1  12345  5678 ?  S   10:00  0:00 nginx: worker process

top and htop: real-time monitoring

# Basic real-time process monitor
top

Output (top section):

top - 10:30:00 up 5 days, 3:00,  2 users,  load average: 0.50, 0.45, 0.40
Tasks: 200 total,   1 running, 199 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.0 us,  2.0 sy,  0.0 ni, 92.0 id,  1.0 wa,  0.0 hi,  0.0 si
MiB Mem:  16000.0 total,   8000.0 free,   5000.0 used,   3000.0 buff/cache
MiB Swap:  8000.0 total,   8000.0 free,      0.0 used.  10000.0 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1234 www-data  20   0  123456  56789  12345 S   3.0   0.3   1:23.45 nginx
 5678 postgres  20   0  234567  89012  23456 S   2.0   0.5   5:07.89 postgres

Key fields: PID (process ID), %CPU, %MEM, COMMAND. Press q to quit.
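top can also run non-interactively. Batch mode (-b) with an iteration count (-n) prints one snapshot and exits, which is handy in scripts and cron jobs:

```shell
# One snapshot of the summary area; -b = batch mode, -n 1 = one iteration
top -bn1 | head -5
```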

htop is an improved version (on Debian/Ubuntu, install it with sudo apt install htop). It has colors, mouse support, and easier process management.

kill: stop processes

# Send SIGTERM (polite shutdown request)
kill 1234

# Send SIGKILL (force kill, cannot be caught)
kill -9 1234

# Kill by name
pkill nginx

# Kill all processes matching a pattern
pkill -f "python server.py"

nice and renice: process priority

# Start a process with lower priority (nice value 10)
nice -n 10 ./heavy-computation.sh

# Change priority of a running process
renice 15 -p 1234

Nice values range from -20 (highest priority) to 19 (lowest). Only root can set negative nice values.
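You can check a process's current nice value with ps; the NI column below is what nice and renice change ($$ expands to the current shell's PID, used here as a convenient target):

```shell
# Show PID, nice value, and command name for this shell
ps -o pid,ni,comm -p $$
```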

Example 1: Filter a log file with grep and awk

Suppose you have a web server access log and you want to find the IP addresses that made the most requests returning a 404 status.

First, look at the log format:

head -3 /var/log/nginx/access.log

Output:

192.168.1.100 - - [15/Jun/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 612
10.0.0.50 - - [15/Jun/2024:10:00:02 +0000] "GET /missing.html HTTP/1.1" 404 169
192.168.1.100 - - [15/Jun/2024:10:00:03 +0000] "GET /api/users HTTP/1.1" 200 1234

The IP is field 1, the status code is field 9. Let’s filter for 404s and count by IP:

# Step 1: Filter for 404 status codes
grep '" 404 ' /var/log/nginx/access.log | head -3

Output:

10.0.0.50 - - [15/Jun/2024:10:00:02 +0000] "GET /missing.html HTTP/1.1" 404 169
10.0.0.50 - - [15/Jun/2024:10:00:15 +0000] "GET /old-page HTTP/1.1" 404 169
203.0.113.5 - - [15/Jun/2024:10:01:30 +0000] "GET /wp-login.php HTTP/1.1" 404 169

# Step 2: Extract just the IP addresses
grep '" 404 ' /var/log/nginx/access.log | awk '{print $1}' | head -5

Output:

10.0.0.50
10.0.0.50
203.0.113.5
203.0.113.5
203.0.113.5

# Step 3: Count and sort
grep '" 404 ' /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10

Output:

    147 203.0.113.5
     89 198.51.100.23
     45 10.0.0.50
     12 192.168.1.100
      3 172.16.0.5

203.0.113.5 triggered 147 404 errors. That /wp-login.php request suggests someone is scanning for WordPress vulnerabilities. You might want to block this IP in your firewall.
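Before blocking anything, it is worth checking which paths that IP requested. The same field-extraction trick works; in this log format the request path is field 7. The log lines below are fabricated for illustration:

```shell
# Write a few sample log lines, then list the paths requested by one IP
cat > /tmp/demo-access.log <<'EOF'
203.0.113.5 - - [15/Jun/2024:10:01:30 +0000] "GET /wp-login.php HTTP/1.1" 404 169
10.0.0.50 - - [15/Jun/2024:10:00:15 +0000] "GET /old-page HTTP/1.1" 404 169
203.0.113.5 - - [15/Jun/2024:10:01:31 +0000] "GET /xmlrpc.php HTTP/1.1" 404 169
EOF
grep '^203\.0\.113\.5 ' /tmp/demo-access.log | awk '{print $7}' | sort | uniq -c
```

A request pattern full of WordPress admin paths on a server that does not run WordPress is a strong sign of automated scanning.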

Example 2: Find all files modified in the last 24 hours

Suppose you deployed code yesterday and something broke. You need to know exactly which files changed.

# Find all files modified in the last 24 hours under /etc
find /etc -type f -mtime -1 -ls 2>/dev/null

Output:

  12345  4 -rw-r--r--  1 root root   234 Jun 15 14:30 /etc/nginx/sites-enabled/default
  12346  4 -rw-r--r--  1 root root   567 Jun 15 14:30 /etc/nginx/nginx.conf
  12347  4 -rw-r--r--  1 root root    89 Jun 15 15:00 /etc/resolv.conf

Now let’s make this more useful. Loop over the results and print the details for each file:

# Show details for each recently modified config file
find /etc -type f -mtime -1 2>/dev/null | while read -r file; do
    echo "=== $file ==="
    ls -l "$file"
    echo "---"
done

Output:

=== /etc/nginx/sites-enabled/default ===
-rw-r--r-- 1 root root 234 Jun 15 14:30 /etc/nginx/sites-enabled/default
---
=== /etc/nginx/nginx.conf ===
-rw-r--r-- 1 root root 567 Jun 15 14:30 /etc/nginx/nginx.conf
---
=== /etc/resolv.conf ===
-rw-r--r-- 1 root root 89 Jun 15 15:00 /etc/resolv.conf
---

Combine with diff if you keep backups:

# Compare current config to backup
diff /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak

Output:

5c5
< worker_connections 2048;
---
> worker_connections 1024;

Someone doubled the worker connections. That might be the cause of the issue.

For a more comprehensive approach to finding what changed on a system, check the system logs and monitoring article.

Quick reference

Task                Command
----                -------
Copy file           cp src dst
Move/rename         mv old new
Delete file         rm file
Delete directory    rm -rf dir/
Find by name        find . -name "*.py"
Find by time        find . -mtime -1
Search text         grep "pattern" file
Replace text        sed 's/old/new/g' file
Column extract      awk '{print $2}' file
Sort                sort file
Count uniques       sort file | uniq -c
List processes      ps aux
Kill process        kill PID

What comes next

You now know the core commands. The next article, Shell scripting fundamentals, teaches you how to combine these commands into reusable scripts with variables, loops, and conditionals.

If you want to understand how to chain commands together with pipes and redirection, see Standard I/O, pipes, and redirection.
