Previously I mentioned the unix tool awk, but I didn’t really understand it. Today I heard something that really made me understand what this tool is all about:

  • awk treats each row in a text file as a “record”
  • Each word in a row is a “field”
  • The default field separator is whitespace, but it can easily be changed to commas or whatever.

The basic awk command looks like this: awk '{ }' file.txt

Inside the {} is where we enter the commands.

Each “record” (remember - row) in awk contains 4 special variables that can be useful:

# $0 represents the entire current record
awk '{ print $0 }' text.txt

# NR represents the current record's number (aka line number)
awk '{ print NR }' text.txt

# NF represents the number of fields in the current record
awk '{ print NF }' text.txt

# FS represents the field separator.  
# Here I change it to the letter 'i'
awk 'BEGIN { FS = "i" } ; { print $1 }' text.txt

Additionally, each field in a record can be referenced by its index position, starting with 1.

# print the 2rd "field" in each "record"
# aka the 2rd word in each line
awk '{ print $2 }' text.txt

Let’s pretty-print a csv file using awk.

Input:

id,first_name,last_name,email,gender,ip_address
1,Waiter,Takos,wtakos0@github.com,Male,75.26.211.80
2,Susana,Pincked,spincked1@dropbox.com,Female,17.83.220.43
3,Gaile,Persse,gpersse2@fc2.com,Male,111.61.80.169

awk appears to have all the power of a scripting language, and its syntax is also flexible. Check out how we can create a “function”-like syntax that changes the FS (field separator) and OFS (output field separator), and also use a conditional to only print our header and lines that contain a specific word (“dropbox” in our example)

awk 'BEGIN {FS=","; OFS=" =====> "}
{
  if (NR == 1) {
    print toupper($2),tolower($4);
  }

  if ($0 ~ /dropbox/) {
    print $2,$4;
  }
}' example.csv

# FIRST_NAME =====> email
# Susana =====> spincked1@dropbox.com