In Brief
awk is a powerful command-line tool included in virtually all modern GNU/Linux distributions. It is often left out of beginner-level Linux tutorials, where more fundamental commands like cd, pwd, and ls are given priority, and rightfully so: you need a good grasp of the basic command-line tools to harness the full power of awk.
awk is, in essence, a pattern scanning and processing language, with the emphasis on "language", because awk offers features found in most mainstream programming languages.
awk loves tabulated data. To begin with, you'll need a piece of text arranged as columns delimited by some common character. You see this pattern often in shells; for instance, the ls -l command prints file information in columns separated by spaces:
ls -l
total 24
lrwxrwxrwx 1 root root 7 Sep 7 07:45 bin -> usr/bin
drwxr-xr-x 1 root root 0 Apr 15 2020 boot
drwxr-xr-x 9 root root 480 Sep 8 13:34 dev
drwxr-xr-x 1 root root 1902 Sep 7 20:29 etc
However, for the sake of demonstration, let us use the following test.txt to illustrate awk's capabilities.
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
awk is like any other UNIX command, i.e., it takes options and arguments. Its power, however, lies in a script argument that can either be written inline or kept in an external file (by convention with a .awk extension) and loaded with the -f option. An awk script has the following basic structure:
BEGIN {commands}
/pattern1/ {commands}
/pattern2/ {commands}
...
/patternN/ {commands}
END {commands}
In essence, awk loops through each row of the input file and executes commands based on conditions. The patterns are typically regular expressions that select which rows a set of commands applies to; a command block with no pattern runs for every row.
Commands following the BEGIN keyword are executed once before the loop, and those following the END keyword once after it.
Note: the numbering of the patterns has no correlation whatsoever with the row number. Every row is tested against every pattern, in the order the patterns are written.
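To make the structure above concrete, here is a minimal sketch that runs a BEGIN block, two regex patterns, and an END block against the sample data. The temporary directory, the two patterns, and the output labels are illustrative choices, not part of the original examples.

```shell
# Recreate the sample data in a scratch directory (location is arbitrary).
workdir=$(mktemp -d)
cat > "$workdir/test.txt" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

result=$(awk '
BEGIN      { FS = ";"; print "start" }   # runs once, before the first row
/;female;/ { print "F: " $2 }            # rows containing ;female;
/;2014$/   { print "2014: " $2 }         # rows ending in ;2014
END        { print "rows read: " NR }    # runs once, after the last row
' "$workdir/test.txt")

echo "$result"
rm -rf "$workdir"
```

Each row is checked against both patterns in turn, so the output follows the order of the rows in the file, not the order of the patterns. The final line reports 6 because NR also counts the header row.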
Alright! Let's process some text.
1. Print only the name and salary of all employees
awk 'BEGIN {FS = ";"}
{print $2 " " $4}' test.txt
emp_name emp_salary
john 3000
sarah 2500
lily 5000
jack 3000
mark 2500
The string argument, enclosed in single quotes, is the awk script mentioned before. Here is a step-by-step rundown of how it works:
- Set the delimiter to ";" using the predefined FS (Field Separator) variable before processing any rows.
- For every row, print the 2nd and 4th columns separated by a space.
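The same script can also live in an external file, or the separator can be set from the command line instead of in a BEGIN block. The sketch below shows both variants, assuming the sample data from earlier; the file name names.awk and the scratch directory are hypothetical, while -f and -F are standard awk options.

```shell
# Recreate the sample data in a scratch directory (location is arbitrary).
workdir=$(mktemp -d)
cat > "$workdir/test.txt" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

# Variant 1: keep the script in its own file and load it with -f.
cat > "$workdir/names.awk" <<'EOF'
BEGIN { FS = ";" }
{ print $2 " " $4 }
EOF
from_file=$(awk -f "$workdir/names.awk" "$workdir/test.txt")

# Variant 2: set the field separator with -F; a comma in print
# joins the fields with the output separator, a space by default.
inline=$(awk -F';' '{ print $2, $4 }' "$workdir/test.txt")

echo "$from_file"
rm -rf "$workdir"
```

Both variants should produce the same listing as the inline script above. An external file tends to pay off once a script grows beyond a line or two.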
2. Print all the male names.
awk ' BEGIN { FS = ";" }
{if($3 == "male") print $2;}' test.txt
john
jack
mark
This script instructs awk to:
- Set the delimiter to ";" before processing any rows.
- For every row, check whether the third column ($3) is "male"; if so, print the second column ($2), i.e., the name.
Note: a command without a pattern applies to every row.
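The if statement can also be moved into the pattern position, since awk accepts a comparison as a pattern. The sketch below, assuming the same sample data, contrasts that form with a bare /male/ regex, which is a tempting but incorrect shortcut here; the variable names are illustrative.

```shell
# Recreate the sample data in a scratch directory (location is arbitrary).
workdir=$(mktemp -d)
cat > "$workdir/test.txt" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

# Comparison used as the pattern: only rows whose third field is exactly "male".
exact=$(awk 'BEGIN { FS = ";" } $3 == "male" { print $2 }' "$workdir/test.txt")

# Regex over the whole row: "female" contains "male", so these rows match too.
loose=$(awk 'BEGIN { FS = ";" } /male/ { print $2 }' "$workdir/test.txt")

echo "exact: $(echo $exact)"
echo "loose: $(echo $loose)"
rm -rf "$workdir"
```

The exact comparison yields john, jack, and mark, while the loose regex also picks up sarah and lily. Comparing a specific field is usually the safer choice for tabulated data.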
3. Derive the average salary.
awk ' BEGIN { FS = ";"; sum = 0; }
{if (NR > 1) sum += $4;}
END{
total = NR - 1;
avg = sum / total;
print avg;
}' test.txt
3200
FS is one of many predefined variables. In this example, however, we also define a couple of custom variables. Here is the rundown for this command:
- Before processing any rows, set the delimiter to ";" and a custom variable sum to 0.
- For every row, if the row number is greater than one (i.e., the row is not the header), add its salary ($4) to sum. Note: NR is a predefined variable that holds the current row number.
- At the end, after all rows have been processed, set a variable total to NR - 1 (the number of data rows, since NR holds the total row count inside the END block), calculate the average as sum / total, and print it.
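The steps above can also be written with the row-number check as a pattern and an explicit counter for data rows. This is a sketch of one possible variation, not a replacement for the command above; the count variable and the fallback message are assumptions added for illustration, and the guard avoids dividing by zero when the file holds only a header.

```shell
# Recreate the sample data in a scratch directory (location is arbitrary).
workdir=$(mktemp -d)
cat > "$workdir/test.txt" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

avg=$(awk '
BEGIN  { FS = ";" }
NR > 1 { sum += $4; count++ }        # skip the header, tally salaries
END    {
    if (count > 0)
        print sum / count            # 16000 / 5 for the sample data
    else
        print "no data rows"         # hypothetical fallback message
}
' "$workdir/test.txt")

echo "$avg"
rm -rf "$workdir"
```

Uninitialised awk variables start out as zero in numeric context, so sum and count need no explicit setup in BEGIN.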
As you can see, these three examples showcase the power and versatility of awk. It supports variables, if-else blocks, and arithmetic, as seen above, all of which you would expect from a full-fledged programming language.
But this is just the tip of the iceberg: awk also supports arrays, for loops, processing multiple files with a single script, and much more. awk is therefore more than just a convenient command-line tool; it is a powerful language in and of itself, and hopefully this has inspired you to learn more about it.
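As a small taste of arrays and for loops, the sketch below computes the average salary per value of the emp_sex column, assuming the same sample data. The array names are illustrative. The output is piped through sort because awk does not guarantee the order in which a for-in loop visits array keys.

```shell
# Recreate the sample data in a scratch directory (location is arbitrary).
workdir=$(mktemp -d)
cat > "$workdir/test.txt" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

by_sex=$(awk '
BEGIN  { FS = ";" }
NR > 1 { total[$3] += $4; count[$3]++ }   # arrays keyed by the third field
END    {
    for (sex in total)                    # visit every key seen in the data
        printf "%s %.2f\n", sex, total[sex] / count[sex]
}
' "$workdir/test.txt" | sort)

echo "$by_sex"
rm -rf "$workdir"
```

awk arrays are associative, so the field value itself serves as the key and no group needs to be declared in advance.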
Further reading
If you're interested, you can check out the GNU Awk User's Guide.