awk 101
Previously I mentioned the Unix tool awk, but I didn’t really understand it. Today I heard something that really made me understand what this tool is all about:
- awk treats each row in a text file as a “record”
- Each word in a row is a “field”
- The default field separator is whitespace, but it can easily be changed to commas or whatever.
The basic awk command looks like this: awk '{ }' file.txt
Inside the {} is where we enter the commands.
For each “record” (remember: a row), awk makes 4 special variables available that can be useful:
# $0 represents the entire current record
awk '{ print $0 }' text.txt
# NR represents the current record's number (aka line number)
awk '{ print NR }' text.txt
# NF represents the number of fields in the current record
awk '{ print NF }' text.txt
# FS represents the field separator.
# Here I change it to the letter 'i'
awk 'BEGIN { FS = "i" } ; { print $1 }' text.txt
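These variables can also be combined in a single command. Here is a minimal sketch, assuming a made-up two-line sample piped in with printf instead of text.txt. It uses -F, which is the command-line shorthand for setting FS in a BEGIN block.

```shell
# Hypothetical sample input, made up for this illustration
sample='red,green,blue
cyan,magenta'

# -F, is shorthand for BEGIN { FS = "," }
out=$(printf '%s\n' "$sample" | awk -F, '{ print NR ": " NF " fields: " $0 }')

echo "$out"
# 1: 3 fields: red,green,blue
# 2: 2 fields: cyan,magenta
```

Placing strings and variables next to each other in a print statement concatenates them, which is why no commas appear between the pieces here.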
Additionally, each field in a record can be referenced by its index position, starting with 1.
# print the 2nd "field" in each "record"
# aka the 2nd word in each line
awk '{ print $2 }' text.txt
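The index after $ does not have to be a literal number. It can be any expression, so NF can be used to reach the last field without knowing how many fields a line has. A small sketch, assuming a made-up sample line:

```shell
# Hypothetical sample line, made up for this illustration
line='the quick brown fox'

# $1 is the first field, $NF the last, $(NF - 1) the second to last
first=$(printf '%s\n' "$line" | awk '{ print $1 }')
last=$(printf '%s\n' "$line" | awk '{ print $NF }')
second_last=$(printf '%s\n' "$line" | awk '{ print $(NF - 1) }')

echo "$first $last $second_last"   # prints: the fox brown
```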
Let’s pretty-print a csv file using awk.
Input:
id,first_name,last_name,email,gender,ip_address
1,Waiter,Takos,wtakos0@github.com,Male,75.26.211.80
2,Susana,Pincked,spincked1@dropbox.com,Female,17.83.220.43
3,Gaile,Persse,gpersse2@fc2.com,Male,111.61.80.169
awk appears to have all the power of a scripting language, and its syntax is also flexible. Check out how we can use a “function”-like BEGIN block, which runs once before any input is read, to change the FS (field separator) and OFS (output field separator), and also use conditionals to print only our header and the lines that contain a specific word (“dropbox” in our example):
awk 'BEGIN {FS=","; OFS=" =====> "}
{
  if (NR == 1) {
    print toupper($2),tolower($4);
  }
  if ($0 ~ /dropbox/) {
    print $2,$4;
  }
}' example.csv
# FIRST_NAME =====> email
# Susana =====> spincked1@dropbox.com
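The if statements can also be replaced by awk's pattern-action form, where each action block is prefixed with a condition and runs only for records that match it. The sketch below should produce the same output as the command above. It recreates the sample rows in a temporary file (the mktemp path is only an assumption for the sake of a self-contained example).

```shell
# Recreate the sample input from above in a temporary file
csv=$(mktemp)
cat > "$csv" <<'EOF'
id,first_name,last_name,email,gender,ip_address
1,Waiter,Takos,wtakos0@github.com,Male,75.26.211.80
2,Susana,Pincked,spincked1@dropbox.com,Female,17.83.220.43
3,Gaile,Persse,gpersse2@fc2.com,Male,111.61.80.169
EOF

# Each "pattern { action }" pair runs only for records matching the pattern
out=$(awk 'BEGIN { FS = ","; OFS = " =====> " }
NR == 1   { print toupper($2), tolower($4) }
/dropbox/ { print $2, $4 }' "$csv")

echo "$out"
rm -f "$csv"
# FIRST_NAME =====> email
# Susana =====> spincked1@dropbox.com
```

A bare regular expression such as /dropbox/ is matched against the whole record, so it is equivalent to writing $0 ~ /dropbox/.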