What is AWK?
AWK is a command-line programming language (some might call it a tool) oriented primarily toward processing text and files. A few simple yet elegant lines of AWK can replace many lines in a more general-purpose language like Java or Node.js without losing their intent.
In essence, AWK code is so simple that you can throw it away once your program has finished its work.
% awk 'BEGIN { print "Hello World" }'
Hello World
But there is much more to it than that. Given the constant need to process data files, once you have started with AWK you will stop building complete programs to process CSV or log files and reach for a faster, more straightforward solution made of a couple of instructions:
% awk '{ print $0 }' example.txt
This is an AWK example
% awk '{ print $4, $1, $5, $3, $2 }' example.txt
AWK This example an is
% awk '{ print $1, "could be your", $4, $5 }' example.txt
This could be your AWK example
Calculations become almost ridiculously simple:
% awk '{ print $0 }' example_numbers.txt
1 2 3 testing
% awk '{ print $1 + $2 + $3, $4 }' example_numbers.txt
6 testing
% awk '{ print $2 * $3, $4 }' example_numbers.txt
6 testing
% awk '{ print $2 / $3, $4 }' example_numbers.txt
0.666667 testing
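If the six significant digits in 0.666667 are more than you need, printf gives you control over the output format. This is only a minimal sketch, assuming the same one-line example_numbers.txt used above (it is recreated here so the snippet runs on its own):

```shell
# Recreate the sample file from the examples above
printf '1 2 3 testing\n' > example_numbers.txt

# printf takes a C-style format string; %.2f rounds to two decimals
awk '{ printf "%.2f %s\n", $2 / $3, $4 }' example_numbers.txt
# prints: 0.67 testing
```

Unlike print, printf does not add a newline for you, which is why the format string ends with \n.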
But the real potential of AWK lies beyond simple operations. With control statements, loops, and functions, this command-line tool behaves like a full programming language, working hand in hand with file-processing operations to make our lives even simpler.
For loop example:
% cat loop.awk
#!/bin/awk -f
BEGIN {
    for (i = 1; i <= 3; i++)
        print i
}
% awk -f loop.awk
1
2
3
Why is it relevant today?
In a generation of powerful and versatile programming languages, we engineers sometimes overcomplicate problems, most commonly because we don't know the other options. Think about how many times you have developed a small Python, Node.js, or Go script to read a huge CSV file, or even built a small utility in the JVM language of your choice, and ended up writing multiple lines of boilerplate code without even realizing it.
Python script to read a file line by line and print the result:
import sys

def main():
    filepath = sys.argv[1]
    with open(filepath) as f:
        for index, line in enumerate(f):
            print("Line {}: {}".format(index, line.strip()))

if __name__ == '__main__':
    main()
The same, but with AWK (NR is the current line number, starting at 1, so NR - 1 matches Python's zero-based index):
awk '{ print "Line " (NR - 1) ": " $0 }' example.txt
You could create more examples to show the difference between writing scripts in AWK and in any other language, but AWK is also pretty performant in comparison with other languages.
Figure: AWK and its variations' performance measurements 1
As you can see, this old-school language (AWK was initially created in 1977) can outshine some of these more robust and modern languages in some tasks, and learning it might give you a new tool you didn't even know you wanted.
First steps in AWK
Let's start by mentioning that AWK ships with virtually every Linux distribution and with macOS (how cool is that?); for Windows, you have to install it (but I am pretty sure it cannot be that hard, right?).
How do you know which version of AWK you currently have installed? (GNU AWK expects awk --version rather than the -version form shown here.)
% awk -version
awk version 20200816
And now let's start with the basics. The structure of an AWK command is pretty simple, but there are some tricks to it, especially if you want to use it for actual text processing. The basic command can be described as condition { action }, where the condition is optional (as we saw in a previous example, awk '{ print $0 }' example.txt) and the action is the operation you need to execute.
A condition can be a regular expression or any other expression that is tested against each line, but there are also two special conditions, BEGIN and END, and they can have actions too. Think of BEGIN as the entry point, where you can enable, disable, or configure variables before any input is read. For example, to change the delimiter from the default whitespace to a semicolon (;), add BEGIN { FS = ";" } at the beginning of the script.
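To make the delimiter change concrete, here is a small sketch. The file people.txt and its contents are made up for illustration; the -F option is the standard command-line shortcut for setting FS:

```shell
# Hypothetical sample data: three semicolon-separated fields per line
printf 'alice;30;Lisbon\nbob;25;Oslo\n' > people.txt

# Set the field separator in BEGIN, before any input is read
awk 'BEGIN { FS = ";" } { print $1, $3 }' people.txt

# Equivalent: pass the separator on the command line with -F
awk -F';' '{ print $1, $3 }' people.txt
```

Both commands print the first and third fields of each line (alice Lisbon, then bob Oslo). Note that AWK string literals use double quotes, which is why the whole program is wrapped in single quotes for the shell.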
AWK provides a number of built-in variables; eight of the most commonly used are:
- FILENAME - Name of the current input file
- FS - Input field separator
- FNR - Number of the current record, relative to the current input file
- NF - Number of fields in the current record
- NR - Total number of records read so far
- OFS - Output field separator
- ORS - Output record separator
- RS - Input record separator
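A quick way to get a feel for these variables is to print a few of them for every line. The sketch below assumes a throwaway file named vars_demo.txt, created only for this example:

```shell
# Hypothetical two-line input with a different number of fields per line
printf 'a b c\nd e\n' > vars_demo.txt

# FILENAME, record number, field count, and the last field ($NF)
awk '{ print FILENAME, NR, NF, $NF }' vars_demo.txt
# prints:
# vars_demo.txt 1 3 c
# vars_demo.txt 2 2 e

# OFS is what print puts between comma-separated arguments
awk 'BEGIN { OFS = "-" } { print $1, $2 }' vars_demo.txt
# prints:
# a-b
# d-e
```

With a single input file NR and FNR are always equal; they only diverge when you pass several files, because FNR restarts at 1 for each one.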
END, on the other hand, always runs last and can be used to execute any finishing commands after the main body has processed all the input, for example, printing the final values of variables. Here the main body adds up the first three fields of every line, and END prints the total:
{
    for (i = 1; i <= 3; i++)
        s += $i
}
END { print s }
Something else worth mentioning is that AWK ships with built-in functions, such as sqrt in the example below, and also supports custom functions for when you need more complex operations and the script starts to become hard to manage 2
awk '{ print "The square root of", $1, "is", sqrt($1) }'
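The sqrt call above is a built-in. A user-defined function looks like the sketch below, where square is a hypothetical helper written just for this example and the input is piped in with printf:

```shell
printf '3\n4\n' | awk '
# square is a hypothetical helper defined for this example
function square(x) {
    return x * x
}
{ print $1, "squared is", square($1) }
'
# prints:
# 3 squared is 9
# 4 squared is 16
```

Function definitions sit at the top level of the program, next to the condition { action } rules, and can be called from any action.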
AWK also provides arrays (with built-in operations to manage them) and other features that we won't be discussing in this post because it might take a couple of hundred lines. Still, you can find a good description of them here, so please take a look if you are curious to learn more.
Example of array operations in AWK:
Array addition
{
    # Store the first three fields of each line, keyed by position
    for (i = 1; i <= 3; i++)
        array[i] = $i
}
END {
    for (position in array)
        print position ": " array[position]
}
Array deletion
{
    # Store the first three fields of each line, keyed by position
    for (i = 1; i <= 3; i++)
        array[i] = $i
}
END {
    for (position in array)
        delete array[position]
}
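AWK arrays are associative, so the keys can be strings as well as numbers. A common use is counting occurrences. The following is a sketch under the assumption of a made-up fruits.txt file with one word per line:

```shell
# Hypothetical input: one word per line, with repeats
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > fruits.txt

awk '
{ count[$1]++ }                      # create or increment the entry for this word
END {
    delete count["cherry"]           # remove a single element
    if (!("cherry" in count))        # membership test, does not create the key
        print "cherry removed"
    for (word in count)              # iteration order is unspecified
        print word, count[word]
}
' fruits.txt | LC_ALL=C sort
# prints:
# apple 3
# banana 2
# cherry removed
```

The output is piped through sort because AWK does not guarantee any particular order when looping over an array with for (key in array).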
And in case you are thinking about how powerful this is and, like me, want to take it further and create small AWK-powered "apps" for monotonous tasks, you may wonder how to verify that what you are coding is valid. You can run unit tests for shell scripts, and therefore for AWK scripts, using shunit2.
Data processing with AWK
As mentioned a couple of times in this post, AWK's main objective is to process data, which could mean data in files, lines piped from a command's output, or any other form of input, but let's start simple.
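When no file name is given, AWK reads from standard input, so it can sit at the end of a pipeline. A minimal sketch, using printf to stand in for whatever command produces the lines (the log-style text is invented for the example):

```shell
# printf plays the role of a command whose output we want to process
printf 'INFO start\nERROR disk full\nINFO done\nERROR timeout\n' |
    awk '$1 == "ERROR" { errors++ } END { print errors + 0, "errors" }'
# prints: 2 errors
```

The errors + 0 forces a numeric result, so the script prints 0 rather than an empty string when no line matches.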
Opening a file and reading the data
% cat example.txt
This is an AWK example
% awk '{ print $0 }' example.txt
This is an AWK example
From the previous example, we can notice a few things, such as how AWK splits each line into numbered fields; the fields are created using the delimiter, which by default is whitespace (see the BEGIN example earlier in this post for how to define a new delimiter).
Using $0 prints the whole line, while using the field numbers ($1, $2, and so on) gives you control over the data.
% cat example.txt
This is an AWK example
% awk '{ print $4, $1, $5, $3, $2 }' example.txt
AWK This example an is
You can also straightforwardly combine fields with your own strings:
% cat example.txt
This is an AWK example
% awk '{ print $1, "could be your", $4, $5 }' example.txt
This could be your AWK example
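It helps to know the difference between separating print arguments with commas and simply placing them next to each other. A short sketch, reusing the example sentence as piped input:

```shell
echo 'This is an AWK example' |
    awk '{ print $1 $4; print $1, $4; print $1 "-" $4 }'
# prints:
# ThisAWK
# This AWK
# This-AWK
```

Adjacent values are concatenated with nothing in between, while a comma inserts the output field separator (OFS, a single space by default).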
Searching a value
AWK can search for information within the provided input; one way is to use a regular expression.
% cat example.txt
This is an AWK example
% awk '/This/ { print $0 }' example.txt
This is an AWK example
% awk '/test/ { print $0 }' example.txt
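A bare /regex/ is matched against the whole line. To match a single field, the ~ operator (and its negation !~) can be used. The sketch below assumes a made-up two-line file, search_demo.txt:

```shell
# Hypothetical input: "This" appears first on line 1 and mid-line on line 2
printf 'This is an AWK example\nAnother line with This inside\n' > search_demo.txt

# Only lines whose first field is exactly "This"
awk '$1 ~ /^This$/ { print NR ": " $0 }' search_demo.txt
# prints: 1: This is an AWK example

# Only lines whose first field does not match
awk '$1 !~ /^This$/ { print NR ": " $0 }' search_demo.txt
# prints: 2: Another line with This inside
```

A plain /This/ would have matched both lines, since the word appears somewhere in each of them.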
Another searching mechanism is to use control statements like if inside the action, for example:
% cat example.txt
This is an AWK example
% awk '{ if ($1 == "This") print $0 }' example.txt
This is an AWK example
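The same kind of filter can also be written as a condition in front of the action, which is often the more idiomatic form. A sketch, assuming a hypothetical pattern_demo.txt with one matching and one non-matching line:

```shell
printf 'This is an AWK example\nThat is another line\n' > pattern_demo.txt

# Comparison used as the condition; the action runs only when it is true
awk '$1 == "This" { print $0 }' pattern_demo.txt

# Conditions can be combined, and a missing action defaults to printing the line
awk '$1 == "This" && NF > 3' pattern_demo.txt
```

Both commands print only the first line, This is an AWK example.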
AWK, GAWK, NAWK or MAWK
Finally, as usual with any programming language, variants tend to appear over time, and AWK is no exception. The ones I consider the most important are the following:
- GAWK - GNU AWK, the GNU project's open-source implementation, which is actively maintained.
- NAWK - "New AWK", the updated version of the language released by AWK's original authors 3
- MAWK - A fast AWK implementation whose codebase is built around a bytecode interpreter.
Of course, there are multiple other variants out there, and you won't have any trouble finding them.
As you can see, AWK is an excellent, flexible, and robust command-line tool. It takes a while to ramp up, but once you get the basics, it is pretty simple to use and to exploit its potential.
In the next post, I will go deeper into different and more complex scenarios and examples; let me know if you have any questions or comments or want more specific related content.
1. https://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language/
2. https://www.gnu.org/software/gawk/manual/html_node/Function-Calls.html
3. Robbins, Arnold (March 2014). "The GNU Project and Me: 27 Years with GNU AWK" (PDF). skeeve.com. Retrieved October 4, 2014.