sed and awk have been on every Unix since the late 70s. They can be more awkward to write than Python, but for a one-off "replace this text across these files" you reach for sed; for "average this column" you reach for awk. Eight or nine common patterns cover the large majority of daily needs.
Prerequisites
- A shell on any Linux/macOS box.
- A test file. Use a copy: sed -i rewrites the original.
Step 1: sed — simple substitution
sed 's/old/new/' file.txt # first match per line
sed 's/old/new/g' file.txt # global on each line
sed 's/old/new/3' file.txt # only the 3rd match per line
sed 's|/old/path|/new/path|g' file.txt # use | as delimiter when text contains /
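To see how the flags differ, here is a minimal sketch run against one made-up input line. The variable names (line, first, all, third, path) are only for illustration.

```shell
# Made-up sample line with four occurrences of "old".
line='old old old old'

first=$(printf '%s\n' "$line" | sed 's/old/new/')    # 1st match only
all=$(printf '%s\n' "$line" | sed 's/old/new/g')     # every match
third=$(printf '%s\n' "$line" | sed 's/old/new/3')   # 3rd match only
path=$(printf '%s\n' '/old/path/bin' | sed 's|/old/path|/new/path|g')

printf '%s\n' "$first" "$all" "$third" "$path"
# new old old old
# new new new new
# old old new old
# /new/path/bin
```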
In-place edit (with backup):
sed -i.bak 's/old/new/g' file.txt
ls file.txt* # file.txt and file.txt.bak
macOS gotcha: BSD sed -i requires an extension argument, even an empty one:
sed -i '' 's/old/new/g' file.txt # macOS / BSD
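If you want one form that runs on both GNU and BSD sed, attaching the backup suffix directly to -i (no space) should work on either. The sketch below tries it in a throwaway directory; the file contents are made up and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so nothing real is touched.
tmpdir=$(mktemp -d)
printf 'old value\nkeep me\n' > "$tmpdir/file.txt"

# Suffix attached to -i: accepted by GNU sed and BSD/macOS sed alike.
sed -i.bak 's/old/new/g' "$tmpdir/file.txt"

edited=$(cat "$tmpdir/file.txt")       # rewritten file
backup=$(cat "$tmpdir/file.txt.bak")   # untouched original

printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -rf "$tmpdir"
```

Delete the .bak file once you have checked the result, or keep it as your undo.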
Step 2: sed — address ranges
sed -n '5,10p' file.txt # print lines 5-10 (-n suppresses default print)
sed '5,10d' file.txt # delete lines 5-10
sed '/^#/d' file.txt # delete lines starting with #
sed '/^$/d' file.txt # delete blank lines
sed '/START/,/END/d' file.txt # delete from START line to END line
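The addresses are easier to trust once you have watched them act on a small file. This sketch builds a made-up seven-line file and applies three of the commands above; the variable names are illustrative only.

```shell
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
# comment
alpha

START
secret
END
omega
EOF

first_two=$(sed -n '1,2p' "$tmpfile")         # numeric range: lines 1-2
no_noise=$(sed '/^#/d; /^$/d' "$tmpfile")     # two commands: drop comments and blanks
no_block=$(sed '/START/,/END/d' "$tmpfile")   # regex range, both endpoints included

printf '%s\n---\n%s\n---\n%s\n' "$first_two" "$no_noise" "$no_block"
rm -f "$tmpfile"
```

Note that the START and END lines themselves are deleted along with everything between them.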
Step 3: sed — capture groups
# Reverse "key: value" to "value: key"
sed -E 's/^([^:]+): (.+)$/\2: \1/' file.txt
# Wrap version numbers in quotes
sed -E 's/version = ([0-9.]+)/version = "\1"/' config.toml
GNU sed accepts -r or -E for extended regex; BSD/macOS sed uses -E. Always use -E for portability.
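A quick way to check a capture-group expression is to pipe one sample line through it. The inputs below are made up, and the date rewrite is an extra illustration of the same technique rather than something from the steps above.

```shell
swapped=$(printf 'name: alice\n' | sed -E 's/^([^:]+): (.+)$/\2: \1/')
quoted=$(printf 'version = 1.2.3\n' | sed -E 's/version = ([0-9.]+)/version = "\1"/')

# Three groups: reorder YYYY-MM-DD into DD/MM/YYYY (| as delimiter avoids escaping /).
date_dmy=$(printf '2024-01-31\n' | sed -E 's|([0-9]{4})-([0-9]{2})-([0-9]{2})|\3/\2/\1|')

printf '%s\n' "$swapped" "$quoted" "$date_dmy"
# alice: name
# version = "1.2.3"
# 31/01/2024
```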
Step 4: awk — field-oriented one-liners
awk splits each line on whitespace into $1, $2, $3, etc. $0 is the whole line, NF is the number of fields, NR is the line number.
# Print just the first column
awk '{ print $1 }' file.txt
# Print column 3 if column 1 == "ERROR"
awk '$1 == "ERROR" { print $3 }' app.log
# Sum a column
awk '{ sum += $2 } END { print sum }' nums.txt
# Average
awk '{ sum += $2; n++ } END { if (n) printf "%.2f\n", sum/n }' nums.txt # if (n) avoids dividing by zero on empty input
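To make NR, NF, and the field variables concrete, this sketch runs the one-liners on a made-up three-line file. The file contents and variable names are assumptions for illustration.

```shell
tmpfile=$(mktemp)
printf 'a 10\nb 20\nc 30\n' > "$tmpfile"

fields=$(awk '{ print NR, NF, $1 }' "$tmpfile")   # line number, field count, 1st field
sum=$(awk '{ sum += $2 } END { print sum }' "$tmpfile")
avg=$(awk '{ sum += $2; n++ } END { if (n) printf "%.2f\n", sum/n }' "$tmpfile")

printf '%s\nsum=%s avg=%s\n' "$fields" "$sum" "$avg"
# 1 2 a
# 2 2 b
# 3 2 c
# sum=60 avg=20.00
rm -f "$tmpfile"
```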
Step 5: awk — CSV / custom delimiter
awk -F, '{ print $2 }' data.csv # 2nd CSV column (breaks on quoted fields that contain commas)
awk -F: '{ print $1 }' /etc/passwd # all usernames
awk -F'[,;]' '{ print $1, $3 }' mixed.txt # split on either , or ; (regex character class)
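The delimiter options can be sketched on inline sample data. The inputs are made up; the last line additionally sets OFS, the output field separator, which is standard awk but not covered above.

```shell
second=$(printf 'id,name,city\n1,alice,Oslo\n' | awk -F, '{ print $2 }')
mixed=$(printf 'a,b;c\n' | awk -F'[,;]' '{ print $1, $3 }')

# Assigning $1 = $1 makes awk rebuild the line using OFS between fields.
rejoined=$(printf 'a,b;c\n' | awk -F'[,;]' -v OFS='|' '{ $1 = $1; print }')

printf '%s\n' "$second" "$mixed" "$rejoined"
# name
# alice
# a c
# a|b|c
```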
Step 6: awk — group + count (uniq -c on steroids)
# Count HTTP status codes in an nginx access log
awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' /var/log/nginx/access.log
# Top 10 client IPs
awk '{ ip[$1]++ } END { for (i in ip) print ip[i], i }' access.log | sort -nr | head -10
# Total bytes served
awk '{ sum += $10 } END { printf "%.2f MB\n", sum/1024/1024 }' access.log
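The field numbers above assume the common/combined access log layout, where $1 is the client IP, $9 the status code, and $10 the response size. A minimal sketch on three made-up log lines:

```shell
tmplog=$(mktemp)
cat > "$tmplog" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET /a HTTP/1.1" 200 1024
10.0.0.2 - - [01/Jan/2024:00:00:01 +0000] "GET /b HTTP/1.1" 404 512
10.0.0.1 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 2048
EOF

# for (k in array) visits keys in no particular order, so sort for stable output.
status_counts=$(awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$tmplog" | sort)
top_ip=$(awk '{ ip[$1]++ } END { for (i in ip) print ip[i], i }' "$tmplog" | sort -nr | head -n 1)
total_bytes=$(awk '{ sum += $10 } END { print sum }' "$tmplog")

printf '%s\ntop: %s\nbytes: %s\n' "$status_counts" "$top_ip" "$total_bytes"
rm -f "$tmplog"
```

If your server uses a custom log format, check which field holds the status and size before trusting the counts.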
Step 7: Combine them — pipelines
# All Apache 5xx errors, deduplicated by URL, top 20
awk '$9 ~ /^5/ { print $7 }' /var/log/apache2/access.log \
| sort | uniq -c | sort -nr | head -20
# Unit names or descriptions from "Failed to start" messages still in the journal
journalctl -p err --no-pager \
| sed -nE 's/.*Failed to start (.+)\.(service|)/\1/p' \
| sort -u
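The 5xx pipeline can be tried without a real server log. This sketch uses four made-up lines in the same access log layout and adds a final awk stage, which is not in the pipeline above, only to strip the padding that uniq -c puts before each count.

```shell
tmplog=$(mktemp)
cat > "$tmplog" <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET /api HTTP/1.1" 500 10
10.0.0.2 - - [01/Jan/2024:00:00:01 +0000] "GET /api HTTP/1.1" 502 10
10.0.0.3 - - [01/Jan/2024:00:00:02 +0000] "GET /home HTTP/1.1" 200 10
10.0.0.4 - - [01/Jan/2024:00:00:03 +0000] "GET /login HTTP/1.1" 503 10
EOF

# Filter 5xx, emit URL, count duplicates, rank by count, normalize spacing.
top_urls=$(awk '$9 ~ /^5/ { print $7 }' "$tmplog" \
  | sort | uniq -c | sort -nr | head -n 20 \
  | awk '{ print $1, $2 }')

printf '%s\n' "$top_urls"
# 2 /api
# 1 /login
rm -f "$tmplog"
```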
Step 8: When to reach for Python instead
Once your script has more than 3 pipes, more than 2 awk conditions, or needs to track state across lines beyond a simple counter or sum, switch to Python. The two minutes the rewrite costs, instead of fighting sed/awk quoting, are paid back ten times over in maintainability.
Verify
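sed --version | head -1 # GNU sed only; BSD/macOS sed has no --version flag
awk --version 2>&1 | head -1 # mawk/gawk
printf 'a 1\nb 2\n' | awk '{ sum += $2 } END { print sum }' # expect: 3 (printf, because bash's echo does not expand \n)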
Conclusion
sed 's/old/new/g', awk -F, '{print $2}', and awk '{count[$1]++} END {for(k in count) print count[k], k}' are the three patterns to remember. Everything else you can look up when you need it.
Next steps
- Search files with grep effectively.
- Glue scripts together with Bash scripting fundamentals.
- Find and chmod files with Linux file permissions.