Awk is basically a proto-Perl. As Perl was one of the most influential languages of all time (JavaScript, Ruby, and PHP are all Perl's direct descendants), Awk is indirectly quite important historically.
There hasn't been any good reason to use Awk for decades now. As I keep saying over and over, if you write anything nontrivial, just use a real programming language like Ruby, Python, or Perl. But it's still interesting for historical reasons, so let's check what coding was like back in the 1980s.
Hello, World!
Awk scripts are a series of pattern { command } pairs, where the pattern is most often a regular expression. Each command runs once for every input line that matches its pattern.
Here's one way to say Hello, World! in Awk:
#!/usr/bin/awk -f
/./ { print "Hello, " $1 "!" }
$ seq 1 5 | ./hello.awk
Hello, 1!
Hello, 2!
Hello, 3!
Hello, 4!
Hello, 5!
$ ./hello.awk
World
Hello, World!
Bob Ross
Hello, Bob!
So any non-empty line will result in a hello (the pattern /./ matches any character, so a line of only spaces matches too, just with an empty $1). String concatenation is done by just putting a few strings next to each other: "Hello, " $1 "!" is what would be "Hello, " + $1 + "!" or "Hello, " . $1 . "!" or such in a more reasonable language.
Each line is $0, and it's also automatically split into words, so $1 means the first word of the line currently being processed, $2 means the second word, etc. Perl, Ruby, and some other languages use these special variables for a regular expression's first, second, etc. capture group, and I think that's where they came from.
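Here's a small sketch of the field splitting described above. The show_fields helper name is made up for this example, and NF (the number of fields on the current line) is a standard Awk variable that this post doesn't otherwise use.

```shell
# show_fields is a hypothetical helper name, not part of Awk.
show_fields() {
  # $0 is the whole line, $1 and $2 are the first two words,
  # NF is the number of words on the line.
  awk '{ print $0; print $1; print $2; print NF }'
}

printf 'Bob Ross paints\n' | show_fields
```

This should print the whole line, then Bob, then Ross, then 3.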
Sum numbers from STDIN
There are some other patterns like BEGIN and END, which run before and after processing the lines. Here's a simple Awk program for adding all numbers, one per line:
#!/usr/bin/awk -f
BEGIN { x = 0 }
/[0-9]+/ { x += $1 }
END { print x }
Which works like this:
$ seq 10 20 | ./sum.awk
165
Awk has pre-Perl regular expressions, so things like \d don't work, and you need to write [0-9] instead. That's another reason why it's better to use something more modern.
Awk's BEGIN { } and END { } blocks are still present in Perl, Ruby, and some other languages.
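One thing worth knowing about the sum script: the pattern /[0-9]+/ matches any line with a digit anywhere in it, so a line like "3 apples" would still add 3. Here's a stricter sketch that only counts lines made entirely of digits, using ^ and $ anchors. The sum_numbers helper name is made up for this example.

```shell
# sum_numbers is a hypothetical helper name, not part of Awk.
sum_numbers() {
  # ^ and $ anchor the match, so only lines consisting purely of digits count.
  # BEGIN sets x to 0 so that empty input still prints 0.
  awk 'BEGIN { x = 0 } /^[0-9]+$/ { x += $1 } END { print x }'
}

printf '10\nfoo\n20\n3 apples\n' | sum_numbers
```

Only the 10 and 20 lines match here, so this should print 30.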
FizzBuzz with regexp
One way to do FizzBuzz is to reuse our regular expressions from episode 7. At first you might think the command block would just do { print "FizzBuzz" } or such, but then all the other blocks would match too (a number divisible by 15 is also divisible by 3 and 5). An easy way around that is to modify the $0 variable (the current line) and print it at the end.
#!/usr/bin/awk -f
/^(([0369]*[147]([258][0369]*[147]|[0369])*([258][0369]*[258]|[147])|[0369]*[258])(([147][0369]*[147]|[258])([258][0369]*[147]|[0369])*([258][0369]*[258]|[147])|([147][0369]*[258]|[0369]?))*(([147][0369]*[147]|[258])([258][0369]*[147]|[0369])*[258][0369]*|[147][0369]*)|([0369]*[147]([258][0369]*[147]|[0369])*[258][0369]*|[0369]*))0$/ { $0="FizzBuzz" }
/^(([0369]*[147]([258][0369]*[147]|[0369])*([258][0369]*[258]|[147])|[0369]*[258])(([147][0369]*[147]|[258])([258][0369]*[147]|[0369])*([258][0369]*[258]|[147])|([147][0369]*[258]|[0369]?))*(([147][0369]*[147]|[258])([258][0369]*[147]|[0369])*([258][0369]*[147]|[0369]?)|([147][0369]*[147]|[258]))|([0369]*[147]([258][0369]*[147]|[0369])*([258][0369]*[147]|[0369]?)|[0369]*[147]))5$/ { $0="FizzBuzz" }
/^.*[05]$/ { $0="Buzz" }
/^(([0369]*[147]([258][0369]*[147]|[0369])*([258][0369]*[258]|[147])|[0369]*[258])(([147][0369]*[147]|[258])([258][0369]*[147]|[0369])*([258][0369]*[258]|[147])|([147][0369]*[258]|[0369]?))*(([147][0369]*[147]|[258])([258][0369]*[147]|[0369])*[258][0369]*|[147][0369]*)|([0369]*[147]([258][0369]*[147]|[0369])*[258][0369]*|[0369]*))$/ { $0="Fizz" }
/./ { print $1 }
To use it:
$ seq 1 20 | ./fizzbuzz.awk
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
A less ridiculous version would be this:
#!/usr/bin/awk -f
$0 % 15 == 0 { print "FizzBuzz"; next }
$0 % 5 == 0 { print "Buzz"; next }
$0 % 3 == 0 { print "Fizz"; next }
{ print }
Any expression can be used as a pattern. next skips all the remaining pattern checks for the current line.
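To see expression patterns and next on something smaller than FizzBuzz, here's a minimal sketch. The classify helper name and the threshold of 3 are made up for this example.

```shell
# classify is a hypothetical helper name, not part of Awk.
classify() {
  # The first block uses a plain comparison as its pattern.
  # next stops processing the line, so the second block only sees the rest.
  awk '$1 > 3 { print $1, "is big"; next } { print $1, "is small" }'
}

seq 1 5 | classify
```

Lines 1 to 3 should come out as "is small" and lines 4 and 5 as "is big". Without the next, lines 4 and 5 would be printed twice, once by each block.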
File output
Awk makes it really easy to print to files. This script sorts the input into odd.txt and even.txt:
#!/usr/bin/awk -f
/[13579]$/ { print >"odd.txt" }
/[02468]$/ { print >"even.txt" }
Like in shell, > means overwrite the file, and >> means append. But while it might look like it will keep reopening and overwriting the file so you only see the last line, each file will be opened just once:
$ seq 20 30 | ./file_output.awk
$ cat odd.txt
21
23
25
27
29
$ cat even.txt
20
22
24
26
28
30
And print without arguments is the same as print $0.
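Here's a rough sketch of the "overwrite, but open only once" behavior. The write_lines helper name is made up, and passing the file name in with -v (a standard Awk option for setting a variable) is just one way to avoid hardcoding the path.

```shell
# write_lines is a hypothetical helper name, not part of Awk.
write_lines() {
  # The shell argument $1 is the target path, handed to Awk as the variable "out".
  awk -v out="$1" '{ print > out }'
}

tmp=$(mktemp)
echo "old content" > "$tmp"
seq 1 3 | write_lines "$tmp"
cat "$tmp"   # the old content is gone, but all three new lines are kept
rm -f "$tmp"
```

The file gets truncated when Awk first opens it, and every later print in the same run goes to the already open file.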
Pipe output
Even nicer, we can do similar redirection with pipes:
#!/usr/bin/awk -f
/[13579]$/ { print | "tac" }
This matches all the lines with odd numbers and sends them to the tac program, which prints them in reverse order.
$ seq 10 30 | ./reverse_odds.awk
29
27
25
23
21
19
17
15
13
11
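tac comes from GNU coreutils and might be missing on some systems, so here's a sketch of the same idea using sort -rn instead (that swap is my assumption, not something the original script does). The reverse_odds function name is made up to mirror the script above.

```shell
# reverse_odds is a hypothetical helper name, not part of Awk.
reverse_odds() {
  # Every matching line is written into a single "sort -rn" process.
  # Awk closes the pipe when it exits, and only then does sort print.
  awk '/[13579]$/ { print | "sort -rn" }'
}

seq 1 10 | reverse_odds
```

This should print 9, 7, 5, 3, 1, one per line. Note that sort -rn orders by numeric value rather than reversing the input order, so it only matches tac when the input is already ascending.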
Fibonacci
Awk has normal function definitions. There's no distinction between number and string variables. If we put a command block without a pattern, it will match every line.
#!/usr/bin/awk -f
function fib(n) {
  if (n <= 2) {
    return 1;
  } else {
    return fib(n - 1) + fib(n - 2);
  }
}
{ print fib($1) }
Which does:
$ seq 1 20 | ./fib.awk
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
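As for there being no distinction between number and string variables, here's a tiny sketch of what that means in practice. The number_or_string function name is made up for this example.

```shell
# number_or_string is a hypothetical helper name, not part of Awk.
number_or_string() {
  awk 'BEGIN {
    x = "3"          # assigned as a string
    print x + 4      # arithmetic treats it as a number
    print x "4"      # juxtaposition treats it as a string
    print x * "2"    # both operands are converted to numbers
  }'
}

number_or_string
```

This should print 7, then 34, then 6. The operator decides how the value is used, not the variable.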
Rolling Dice
Awk has some trouble with command line arguments - it normally treats them as files to open. This code only works because we don't actually have any per-line patterns.
#!/usr/bin/awk -f
BEGIN {
  for(i=0; i<ARGV[2]; i++) {
    print int(rand() * ARGV[1]);
  }
}
We can use it to roll five 100-sided dice (numbered from 0 to 99, since int(rand() * N) never reaches N):
$ ./dice.awk 100 5
84
39
78
79
91
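Here's a sketch of an alternative that sidesteps ARGV by passing values with -v, assuming you want dice numbered from 1 to N. The roll_dice helper and the sides and count variable names are made up. It also calls srand(), because without it some Awk implementations (gawk, for example) produce the same sequence on every run.

```shell
# roll_dice is a hypothetical helper name: roll_dice SIDES COUNT
roll_dice() {
  awk -v sides="$1" -v count="$2" 'BEGIN {
    srand()                            # seed from the current time
    n = count + 0                      # force a numeric comparison below
    for (i = 0; i < n; i++) {
      print int(rand() * sides) + 1    # shift 0..sides-1 up to 1..sides
    }
  }'
}

roll_dice 6 5
```

Since srand() seeds from the clock with one-second resolution, two runs within the same second can still give identical rolls.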
Tally
Awk has associative arrays (nowadays usually called hashes or dictionaries).
#!/usr/bin/awk -f
{ tally[$0]++ }
END {
  for(n in tally) {
    print n, tally[n]
  }
}
Awk has no way to print a whole array at once (and all its arrays are associative). If you try to print tally it will give you an error, so you have to loop over the keys yourself. It's another feature of modern programming languages that has roots in times of Awk, but is now done in much better ways.
$ ./dice.awk 6 100 | ./tally.awk
2 17
3 18
4 17
5 22
0 13
1 13
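As the output above shows, for (n in tally) doesn't visit keys in any particular order. If you want stable output, one simple option is to pipe the result through sort. A minimal sketch, with a made-up tally_sorted helper name:

```shell
# tally_sorted is a hypothetical helper name, not part of Awk.
tally_sorted() {
  # Count each distinct line, print "value count", then sort outside of Awk
  # because the for-in iteration order is unspecified.
  awk '{ tally[$0]++ } END { for (n in tally) print n, tally[n] }' | sort
}

printf 'b\na\nb\nc\nb\na\n' | tally_sorted
```

This should print "a 2", "b 3", and "c 1", in that order.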
Should you use Awk?
No.
Special purpose languages have their place, but what Awk is doing - processing text files - is no longer "special purpose". Pretty much every modern language excels at processing text files and matching regular expressions, and handles everything Awk does a lot better.
Awk made a lot of sense back when it originated, as C was godawful at text processing, and Unix shell was godawful at writing any kind of structured programs, so Awk was addressing an obvious need. In modern times where every programmer is familiar with a language like Ruby, Python, Perl, or pretty much anything else that can process text, there's no place for Awk.
The language also definitely shows its age. Its regular expression engine is bad. It doesn't have a console.log equivalent. It can't handle common text formats like CSV or JSON. It doesn't have sufficient Unicode capabilities. And so on. It does quite decently on conciseness, but only if you write exactly the kind of programs it likes - common requirements like parsing command line arguments will not work too well.
Awk is mainly of historical relevance, but it's not completely dead yet. If you work with a lot of Unix shell scripts, you will occasionally find short Awk programs in them. I don't approve of this at all (seriously, just use a real programming language like Ruby, Python, or Perl), but it might be useful to learn the basics of Awk so you can read such shell code.
Code
All code examples for the series will be in this repository.