
Mark Gardner

Originally published at phoenixtrap.com

Better Perl: Using map and grep

As a Perl developer, you’re probably aware of the language’s strengths in text processing and of how many computing tasks can be broken down into text-processing steps. You might not realize, though, that Perl is also a world-class list-processing language and that many problems can be expressed in terms of lists and their transformations.

Chief among Perl’s tools for list processing are the functions map and grep. I can’t count how many times in my twenty-five years as a developer I’ve run into code that could have been simplified if only the author had been familiar with these two functions. Once you understand map and grep, you’ll start seeing lists everywhere, along with opportunities to make your code more succinct and more expressive at the same time.

What are lists?

Before we get into functions that manipulate lists, we need to understand what they are. A list is an ordered group of elements, and those elements can be any kind of data you can represent in the language: numbers, strings, objects, regular expressions, references, etc., as long as they’re stored as scalars. You might think of a list as the thing that an array stores, and in fact Perl is fine with using an array where a list can go.

my @foo = (1, 2, 3);

Here we’re assigning the list of numbers from 1 to 3 to the array @foo. The difference between the array and the list is that the list is a fixed collection, while arrays and their elements can be modified by various operations. perlfaq4 has a great discussion on the differences between the two.
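
To make the distinction concrete, here’s a minimal sketch of an array being modified and then used where a list can go. The names @bar, $count, and $last are mine, added only for illustration.

```perl
use strict;
use warnings;

my @foo = (1, 2, 3);        # the list (1, 2, 3) is copied into the array

push @foo, 4;               # an array can grow ...
$foo[0] = 10;               # ... and its elements can be reassigned

my @bar   = (@foo, 5);      # an array flattens into the surrounding list
my $count = @foo;           # in scalar context, an array yields its size
my $last  = (7, 8, 9)[-1];  # a list can be sliced, but there is nothing to push onto
```

The list on the right-hand side of each assignment is gone once the statement finishes; only the arrays and scalars stick around to be changed later.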

Lists are everywhere, man!

Ever wanted to sort some data? You were using a list.

join a bunch of things together into a string? List again.

split a string into pieces? You got a list back (in list context, anyway; in scalar context, you got the size of the list).

Heck, even the humble print function and its cousin say take a list (and an optional filehandle) as arguments; it’s why you can treat Perl as an upscale AWK and feed it scalars to output with a field separator.
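
As a rough sketch of that AWK-like behavior, the special variables $, and $\ set the separators print puts between and after its list items. I’m printing to an in-memory filehandle here (my own choice, so the result can be inspected), and the sample words are made up.

```perl
use strict;
use warnings;

# Print to an in-memory filehandle so the result can be inspected.
open my $out, '>', \my $buffer or die "Cannot open in-memory handle: $!";

{
    local $, = "\t";    # output field separator, like AWK's OFS
    local $\ = "\n";    # output record separator, like AWK's ORS
    print {$out} 'alpha', 'beta', 'gamma';
}

close $out;
# $buffer now holds "alpha\tbeta\tgamma\n"
```

Drop the {$out} and the same list goes to standard output instead.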

You’re using lists all the time and may not even know it.

map: The list transformer

The map function is deceptively simple: it takes two inputs, an expression or block of code and a list to run it on. For every item in the list, it aliases $_ to that item, evaluates the expression or block, and adds whatever comes back (zero, one, or many items) to the list it returns. You can call it like this:

my @foo = map bar($_), @list;

Or like this:

my @foo = map { bar($_) } @list;

We’re going to ignore the first way, though, because Conway (Perl Best Practices, 2005) tells us that when you specify the first argument as an expression, it’s harder to tell it apart from the remaining arguments, especially if that expression uses a built-in function where the parentheses are optional. So always use a code block!

You should always turn to map (and not, say, a for or foreach loop) when generating a new list from an old list. For example:

my @lowercased = map { lc } @mixed_case;
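
For comparison, here’s a sketch of the same transformation written as a loop, followed by two map blocks that return zero and two items per element. The sample data and the names @looped, @evens, and @doubled are my own inventions.

```perl
use strict;
use warnings;

my @mixed_case = qw(Perl MAP Grep);

# The loop version needs a temporary array and an explicit push.
my @looped;
for my $word (@mixed_case) {
    push @looped, lc $word;
}

# map expresses the same transformation in one statement.
my @lowercased = map { lc } @mixed_case;

# The block may return zero, one, or several items per element.
my @evens   = map { $_ % 2 ? () : $_ } 1 .. 6;   # odd numbers contribute nothing
my @doubled = map { ($_, $_) } qw(a b);          # each element contributes two items
```

Returning an empty list from the block is how map drops an element, though if dropping is all you’re doing, grep says it more clearly.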

When paired with a lookup table, map is also an efficient way to test whether a string is a member of a list, especially if that list is static and you test against it more than once:

use Const::Fast;

const my %IS_EXIT_WORD => map { ($_ => 1) }
  qw(q quit bye exit stop done last finish aurevoir);

...

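# exists only checks for the key; fetching a missing key from a
# read-only (restricted) hash may be rejected with an exception
die if exists $IS_EXIT_WORD{$command};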

Here we’re using map’s ability to return multiple items per source element to generate a constant hash, and then testing membership in that hash.
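
Here’s a self-contained sketch of that idea using a plain hash instead of Const::Fast, so it runs with core Perl only. The variable names and the 'help' test word are assumptions of mine.

```perl
use strict;
use warnings;

my @exit_words = qw(q quit bye exit stop done last finish aurevoir);

# Build the lookup table once: every word becomes a key with a true value.
my %is_exit_word = map { ($_ => 1) } @exit_words;

# Each later test is a single hash lookup ...
my $found_quit = exists $is_exit_word{quit};
my $found_help = exists $is_exit_word{help};

# ... whereas grep walks the whole list on every test.
my $found_by_scan = grep { $_ eq 'quit' } @exit_words;
```

For a one-off test the grep scan is perfectly fine; the hash pays for itself when the same list is checked over and over.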

grep: The list filter

You may recognize the word “grep” from the Unix command of the same name. It’s a tool for finding lines of text inside of other text using a regular expression describing the desired result.

Perl, of course, is really good at regular expressions, but its grep function goes beyond and enables you to match using any expression or code block. Think of it as a partner to map; where map uses a code block to transform a list, grep uses one to filter it down. In fact, other languages typically call this function filter.

You can, of course, use regular expressions with grep, especially because a regexp match in Perl defaults to matching on the $_ variable and grep happens to provide that to its code block argument. So:

my @months_with_a = grep { /[Aa]/ } qw(
  January February March
  April May June
  July August September
  October November December
);

But grep really comes into its own when used for its general filtering capabilities; for instance, making sure that you don’t accidentally try to compare an undefined value:

say $_ > 5
  ? "$_ is bigger"
  : "$_ is equal or smaller"
  for grep { defined } @numbers;
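
If you’d like to run that yourself, here’s a sketch with made-up sample data. It collects the messages with map before printing them, which is my own rearrangement of the snippet above rather than anything it requires.

```perl
use strict;
use warnings;
use feature 'say';

my @numbers = (3, undef, 8, 5, undef, 12);   # sample data with gaps

# Without the filter, comparing undef to 5 would trigger an
# "uninitialized value" warning under "use warnings".
my @messages = map  { $_ > 5 ? "$_ is bigger" : "$_ is equal or smaller" }
               grep { defined } @numbers;

say for @messages;
```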

Or when executing a complicated function that returns true or false depending on its arguments:

my @results = grep { really_large_database_query($_) }
              @foo;

You might even consider chaining map and grep together. Here’s an example for getting the JPEG images out of a file list and then lowercasing the results:

my @jpeg_files = map { lc }
                 grep { /\.jpe?g$/i } @files;
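
A chain like this reads right to left: grep filters first, then map transforms what’s left. Here’s a sketch with an invented file list, alongside the same work split into two steps (@matched and @lowered are names I made up).

```perl
use strict;
use warnings;

my @files = qw(Photo.JPG notes.txt Scan.jpeg logo.png IMG_0001.Jpg);

# Chained: the list flows from @files through grep, then through map.
my @jpeg_files = map  { lc }
                 grep { /\.jpe?g$/i } @files;

# The same work as two separate statements, for comparison.
my @matched = grep { /\.jpe?g$/i } @files;   # keep only the JPEG names
my @lowered = map  { lc } @matched;          # then lowercase the survivors
```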

“Side effects may include…” (updated)

When introducing map above I noted that it aliases $_ to every element in the list, and grep does the same. I used that term deliberately because modifications to $_ will modify the original element itself, and that is usually an error. Programmers call that a “side effect,” and side effects can lead to unexpected behavior or at least difficult-to-maintain code. Consider:

my @needs_docs = grep { s/\.pm$/.pod/ && !-e }
                 @pm_files;

The intent may have been to find files ending in .pm that don’t have a corresponding .pod file, but the actual behavior is replacing the .pm suffix with .pod, then checking whether that filename exists. If it doesn’t, it’s passed through to @needs_docs; regardless, @pm_files has had its contents modified.
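
You can watch the aliasing happen without touching the filesystem. This sketch drops the -e test and uses invented filenames, so it shows only the side effect on the source array.

```perl
use strict;
use warnings;

my @pm_files = qw(Foo.pm Bar.pm README);

# s/// edits $_, and $_ is an alias for the array element itself.
my @renamed = grep { s/\.pm$/.pod/ } @pm_files;

# @renamed is (Foo.pod, Bar.pod), but @pm_files has also been rewritten
# in place and is now (Foo.pod, Bar.pod, README).
```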

If you really do need to modify a copy of each element, assign a variable within your code block like this:

my @needs_docs = grep {
                   my $file = $_;
                   $file =~ s/\.pm$/.pod/;
                   !-e $file
                 } @pm_files;

But at that point you should probably refactor your multi-line block as a separate function:

my @needs_docs = grep { file_without_docs($_) }
                 @pm_files;

sub file_without_docs {
    my $file = shift;
    $file =~ s/\.pm$/.pod/;
    return !-e $file;
}

Because this example uses the substitution operator s///, on Perl 5.14 or later you could instead add the /r modifier, which returns the modified string and leaves the original untouched:

use v5.14;

my @needs_docs = grep { !-e s/\.pm$/.pod/r }
                 @pm_files;

And if you do need side effects, just use a for or foreach loop; future code maintainers (i.e., you in six months) will thank you.
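
Here’s a sketch of what that might look like when changing the array in place really is the goal. The filenames are invented.

```perl
use strict;
use warnings;

my @files = qw(Foo.pm Bar.pm README);

# The loop variable aliases each element, so this edits @files itself,
# and a loop makes it obvious that modification is the whole point.
for my $file (@files) {
    $file =~ s/\.pm$/.pod/;
}

# @files is now (Foo.pod, Bar.pod, README)
```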

Taking you higher

map and grep are examples of higher-order functions, since they take a function (in the form of a code block) as an argument. So congratulations, you just significantly leveled up your knowledge of Perl and computer science. If you’re interested in more such programming techniques, I recommend Mark Jason Dominus’ Higher-Order Perl (2005), available for free online.
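
To show there’s nothing magic about taking a block as an argument, here’s a sketch of a hand-rolled filter. my_grep is a hypothetical name of mine, not a built-in or a library function, and the (&@) prototype is just one way to let callers pass a bare block.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the items for which the block returns true.
# The (&@) prototype lets callers pass a bare block, as with grep.
sub my_grep(&@) {
    my ($code, @items) = @_;
    my @kept;
    for (@items) {                       # $_ is set for each item in turn
        push @kept, $_ if $code->();     # the block reads $_ just like grep's
    }
    return @kept;
}

my @long_words = my_grep { length($_) > 4 } qw(map grep sort reverse splice);
```

Because my_grep copies its arguments into @items before looping, a block that modifies $_ changes only the copies, unlike the built-in.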

Top comments (4)

Matthew O. Persico

Glad to see that you did NOT throw a Schwartzian transform example in there. That usually makes people’s eyes glaze over. Maybe in a part 2 article?

Mark Gardner

Maybe, though the focus of such an article would be on sorting. Would also cover Joseph N. Hall's Orcish maneuver.

Matthew O. Persico

Gee, I have to look that up - I forgot what that was. Meanwhile I own an original copy of the book he first described it in. 🤦‍♂️

Yuki Kimoto

I often use map and grep instead of for loop when I want to write one line easily