DEV Community

malik

Originally published at malikbrowne.com

A Beginner's Guide: Glob Patterns

This post was originally posted on malikbrowne.com.

Recently, one of my coworkers was having trouble because Jest wasn't running tests on a new folder he had created.

After some investigation, it turned out that the glob in the Jest configuration didn't cover the new folder, so none of its tests were running! (Scary!)

Understanding how globs work was essential to understanding how to fix this problem, and there isn't a ton of documentation on it other than the Linux manual. Let's change that!

In this post, we'll go over the history of globs, how to use wildcard characters, and define the three main characters of wildcard matching.

What the heck are globs?

Globs, also known as glob patterns, are wildcard patterns that a program expands into the list of pathnames matching the pattern.

In early versions of Unix, the command interpreter relied on a separate program, /etc/glob, to expand wildcard characters in unquoted arguments to a command.

This functionality was later provided as a library function, glob(), which is now used by tons of programs, including the shell. Several different tools and languages have adopted globs, each putting its own little spin on them. It's quite the extensive list:

  • Node.js
  • Go
  • Java
  • Haskell
  • Python
  • Ruby
  • PHP

Now that we know a little bit about the history of globs, let's get into the part that makes it useful: wildcard matching.

Wildcard Matching

A string can be considered a wildcard pattern if it contains one of the following characters: *, ?, or [.

Asterisks (*)

The most common wildcard that you'll see is the asterisk. It shows up in several forms, but its main job is to match any number of characters, including none.

The three main uses of asterisks that I've seen are:

  • * - On Linux, matches everything except slashes. On Windows, it avoids matching backslashes as well as slashes.
  • ** - Recursively matches zero or more directories that fall under the current directory.
  • *(pattern_list) - Matches zero or more occurrences of the patterns in pattern_list. This is an extended glob form (Bash calls it extglob), so not every tool supports it.

These forms can also be combined! For example, to recursively find all Markdown files (names ending in .md), the pattern would be **/*.md.

Note: *.md on its own would only match files in the current directory, which is why we put **/ at the beginning.
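To make the difference concrete, here is a minimal sketch in Python. It assumes pathlib's Path.glob, which supports both * and **; the file names are invented purely for illustration and are created in a throwaway temporary folder.

```python
import tempfile
from pathlib import Path


def find_markdown(root: Path) -> tuple[list[str], list[str]]:
    """Return (top-level matches, recursive matches) as paths relative to root."""
    top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*.md"))
    recursive = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.md"))
    return top_level, recursive


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical project layout, created only for this demo.
    for name in ["README.md", "docs/intro.md", "docs/api/usage.md", "docs/logo.png"]:
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    top_level, recursive = find_markdown(root)

print(top_level)  # only the Markdown file directly inside the root folder
print(recursive)  # Markdown files at every depth, including the root folder
```

Other tools (Jest, shells, Node.js libraries) may differ in small ways, so treat this as an illustration of the idea rather than a statement about any one of them.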

Question Marks (?)

The question mark wildcard is commonly used to match any single character.

For example, let's say we're given a list of files:

  • Cat.png
  • Bat.png
  • Rat.png
  • car.png
  • list.png
  • mom.jpg
  • cat.jpg

If you wanted to find every file whose name is any single character followed by at, you could use a pattern like ?at.* (the trailing .* covers the extension, because a glob has to match the whole filename), which would return the following results:

  • Cat.png
  • Bat.png
  • Rat.png
  • cat.jpg

Note: Because ? matches any single character, this pattern doesn't care about the case of the first letter, which is why both Cat.png and cat.jpg show up. I've found this useful in scripts when trying to find files that I've marked with certain dates.
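Here is a small sketch of the same lookup in Python, assuming the standard library's fnmatch.fnmatchcase as the matcher (other glob implementations may behave slightly differently). The helper name match_files is made up for this example. Note that the pattern needs to cover the whole filename, so the sketch uses ?at.* and also shows that a bare ?at matches nothing in this list.

```python
from fnmatch import fnmatchcase

FILES = ["Cat.png", "Bat.png", "Rat.png", "car.png", "list.png", "mom.jpg", "cat.jpg"]


def match_files(pattern: str, names: list[str]) -> list[str]:
    """Return the names that the glob pattern matches in full (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


question_mark_hits = match_files("?at.*", FILES)
bare_hits = match_files("?at", FILES)

print(question_mark_hits)  # ['Cat.png', 'Bat.png', 'Rat.png', 'cat.jpg']
print(bare_hits)           # [] because "?at" has nothing to cover the extension
```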

Character classes and Ranges ([)

The square brackets ([ and ]) denote a pattern that matches any single character listed between the brackets. These are called character classes.

An important thing to know is that the string inside the brackets is not allowed to be empty. Because of that, a ] placed first inside the brackets is treated as a literal character, which explains weird-looking patterns like this: [][!]

This matches a single character that is one of the three characters [, ], or !.

For example, let's continue to use the same list we used in the previous example:

  • Cat.png
  • Bat.png
  • Rat.png
  • car.png
  • list.png
  • mom.jpg
  • cat.jpg

If you wanted to match only the capitalized files in this list, you could use the pattern [CBR]at.* (again, the .* covers the extension).

This would return the result:

  • Cat.png
  • Bat.png
  • Rat.png
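As a quick check, the character class can be tried out in Python. This is only a sketch that assumes fnmatch.fnmatchcase from the standard library; match_files is a hypothetical helper, and the pattern carries a trailing .* so that it covers the file extension.

```python
from fnmatch import fnmatchcase

FILES = ["Cat.png", "Bat.png", "Rat.png", "car.png", "list.png", "mom.jpg", "cat.jpg"]


def match_files(pattern: str, names: list[str]) -> list[str]:
    """Return the names that the glob pattern matches in full (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# [CBR] accepts exactly one character: C, B, or R.
class_hits = match_files("[CBR]at.*", FILES)
print(class_hits)  # ['Cat.png', 'Bat.png', 'Rat.png']
```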

Ranges

A cool feature of character classes is ranges, which are written as two characters separated by a dash (-).

For example, the pattern [A-E] would match any single character from A through E. Ranges can be combined inside one class to make powerful patterns.

A common pattern that you may have seen before is the one that matches a single alphanumeric character: [A-Za-z0-9]

This combines three ranges:

  • [A-Z] All uppercase letters from A to Z
  • [a-z] All lowercase letters from a to z
  • [0-9] All numbers from 0 to 9

Ranges work the same way inside regular expressions, so this notation carries over to data validation in tons of different areas!
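Ranges can be tried out the same way. The sketch below assumes Python's fnmatch.fnmatchcase; the file list is the one from the earlier examples, and match_files is a made-up helper name.

```python
from fnmatch import fnmatchcase

FILES = ["Cat.png", "Bat.png", "Rat.png", "car.png", "list.png", "mom.jpg", "cat.jpg"]


def match_files(pattern: str, names: list[str]) -> list[str]:
    """Return the names that the glob pattern matches in full (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# First character must fall in the range A through E; * covers the rest.
upper_a_to_e = match_files("[A-E]*", FILES)
# First character must be a lowercase letter.
lowercase_start = match_files("[a-z]*", FILES)

# A class with several ranges still matches exactly one character.
is_alnum_7 = fnmatchcase("7", "[A-Za-z0-9]")
is_alnum_underscore = fnmatchcase("_", "[A-Za-z0-9]")
is_alnum_two_chars = fnmatchcase("ab", "[A-Za-z0-9]")

print(upper_a_to_e)     # ['Cat.png', 'Bat.png']
print(lowercase_start)  # ['car.png', 'list.png', 'mom.jpg', 'cat.jpg']
print(is_alnum_7, is_alnum_underscore, is_alnum_two_chars)  # True False False
```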

Complementation

A feature worth mentioning is that a couple of special characters change how a pattern works. The two that I see most often are exclamation marks (!) and backslashes (\).

An exclamation mark placed first inside a character class negates that class. In the character class example I shared above, we used the class [CBR].

If we wanted the opposite set of results, we could negate the class by placing the exclamation point right after the opening bracket: [!CBR]. In the list above, [!CBR]at.* would match only cat.jpg.

Backslashes remove the special meaning of the single characters ?, *, and [, so that they can be matched literally in patterns.
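Here is a hedged sketch of both ideas in Python, assuming fnmatch.fnmatchcase. One caveat that matters here: Python's fnmatch does not treat a backslash as an escape character, so the sketch uses the bracket form [?] to match a literal question mark instead. The names what?.txt and whatX.txt are invented for the demo.

```python
from fnmatch import fnmatchcase

FILES = ["Cat.png", "Bat.png", "Rat.png", "car.png", "list.png", "mom.jpg", "cat.jpg"]

# [!CBR] matches any single character that is NOT C, B, or R.
negated_hits = [name for name in FILES if fnmatchcase(name, "[!CBR]at.*")]
print(negated_hits)  # ['cat.jpg']

# A bare ? is a wildcard, so it matches any character in that position.
wildcard_match = fnmatchcase("whatX.txt", "what?.txt")
# Wrapping ? in brackets makes it literal (Python's stand-in for a backslash escape).
literal_match = fnmatchcase("what?.txt", "what[?].txt")
literal_miss = fnmatchcase("whatX.txt", "what[?].txt")
print(wildcard_match, literal_match, literal_miss)  # True True False
```

In a shell, the backslash form described above would be the usual way to get the same literal match.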

Why are globs useful?

I've found globs extremely useful for doing a lot of scripting and automation tasks in recent months. Being able to specify certain files recursively in a directory tree is invaluable - especially when working in CI environments where you don't have control over the names of root directories.

Something important that I want to note is that while wildcard patterns are similar to regex patterns, they are not the same, for two main reasons:

  1. Globs are meant to match filenames rather than text
  2. Not all conventions are the same between them (example: in regex, * means zero or more repetitions of the preceding element, while in a glob it matches any run of characters on its own)
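The second point is easy to see side by side. This sketch assumes Python's re module for the regex side and fnmatch.fnmatchcase for the glob side, and feeds the same two-character pattern, a*, to both.

```python
import re
from fnmatch import fnmatchcase

PATTERN = "a*"

# As a glob: a literal "a" followed by any run of characters.
glob_abc = fnmatchcase("abc", PATTERN)

# As a regex: zero or more repetitions of the letter "a", and nothing else.
regex_abc = re.fullmatch(PATTERN, "abc") is not None
regex_aaa = re.fullmatch(PATTERN, "aaa") is not None

print(glob_abc)   # True
print(regex_abc)  # False
print(regex_aaa)  # True
```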

Conclusion

Hopefully, this overview of globs provides some transparency when looking over different configuration files in the future. I know this is something that I struggled with understanding when trying to read webpack/typescript/jest configurations, so if this is helpful to you, let me know in the comments or on Twitter!

Useful Links/Resources

http://www.globtester.com/
https://en.wikipedia.org/wiki/Glob_(programming)
https://commandbox.ortusbooks.com/usage/parameters/globbing-patterns
http://teaching.idallen.com/cst8207/15w/notes/190_glob_patterns.html
http://man7.org/linux/man-pages/man7/glob.7.html

Top comments (6)

Clavin June

Isn't it regex? Or it is different?

malik • Edited

Quoting something from the post:

Something important that I want to note is that while wildcard patterns are similar to regex patterns, they are not explicitly the same for two main reasons:

  1. Globs are meant to match filenames rather than text
  2. Not all conventions are the same between them (example: * means zero or more copies of the same thing in regex)

They're not the same; they have some nuances, but they are similar!

Clavin June

Thanks Malik! I think i missed that paragraph before!

Jannik Wempe

Haha, i just thought something similar right before the abstract malik cited :-D

Clavin June

Yup, perhaps we need more focus when reading haha

Kalle Fagerberg

Very neat, but how does the *(pattern_list) asterisk use case that you mentioned in the post actually work?