Introduction
This post is not intended to teach you how to use regular expressions, but rather to convince you that they are worth learning :)
You can find the definition of a regular expression on many websites and in many books. Here is one I particularly like:
“In short a regular expression is a formal method of specifying a pattern of text. In detail it's a composition of symbols and characters with special functions which, when grouped together and with literal characters, make a sequence, an expression. This expression is interpreted as a rule which indicates success if a given input matches exactly all its conditions.”
(Translated from the book in Portuguese: Expressões Regulares by Aurélio Marinho Jargas)
Regex wasn’t pleasant to me until I started to understand it. And more than understanding it, it became really fun when I was trying to solve challenges that had to be solved using only regex. It feels like one of those puzzle games where the more you play, the faster and easier you solve them, and where you feel good when you finally find a solution.
When to use regular expressions
There is no rule about when to use regex, and it’s probably not the kind of task you will do every day. But understanding the concept and getting to know how to use it can be helpful, since regex can be applied in several use cases.
Here are a few:
- validations using pattern matching (passwords, phone numbers, credit card numbers, etc.)
- searching and replacing substrings
- parsing user input
- parsing logs
Usually you can apply regular expressions whenever you need to search for text that follows a pattern and you know its possible variations.
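To make two of the use cases above a bit more concrete, here is a minimal sketch of parsing a log line and of searching and replacing substrings. The log format, the sample line, and the names `LOG_PATTERN`, `parsed`, and `cleaned` are all made up for illustration; real logs will need their own pattern.

```python
import re

# Hypothetical log line, invented for this illustration.
log_line = "2024-01-15 10:32:07 ERROR payment-service Timeout after 30s"

# Named groups capture the date, the time, the level, and the rest of the line.
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)

match = LOG_PATTERN.match(log_line)
# groupdict() returns the named groups as a dictionary.
parsed = match.groupdict() if match else {}

# Searching and replacing: collapse every run of whitespace into one space.
cleaned = re.sub(r"\s+", " ", "too   many    spaces")

print(parsed)
print(cleaned)
```

Named groups are a design choice here, not a requirement: they just make the parsed result easier to read than numbered groups.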
Regex example
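Here is a simple Python example using a regular expression, just to show what it looks like:
# First, import the re module:
import re
phone_number = "123-456-7890"
# Then search for the pattern in the string you want to check.
# The r prefix makes this a raw string, so backslashes like \d reach the regex engine untouched.
match = re.search(r"^(\d{3}-){2}\d{4}$", phone_number)
# search() returns a match object on success, or None when the pattern doesn't match.
print(match is not None)  # prints True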
In the example above I'm just checking whether the given phone number follows the pattern of three digits followed by a dash, then three more digits followed by another dash, and four final digits. It's as simple as that.
In this case, the pattern matches the given string.
I'm not going to dive deep into the meaning of each character and metacharacter in the expression because there is a huge list of them. But you can check the Regular expression syntax cheatsheet used in JavaScript.
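If the pattern still looks cryptic, one way to read it is to spread it over several lines with `re.VERBOSE`, which lets you add whitespace and comments inside the expression. This is only a sketch of the same phone number pattern; the names `PHONE_PATTERN` and `looks_like_phone_number` are mine, chosen for illustration.

```python
import re

# Same pattern as before, written in verbose mode so each part has a comment.
PHONE_PATTERN = re.compile(
    r"""
    ^             # start of the string
    (\d{3}-){2}   # three digits and a dash, repeated twice
    \d{4}         # four final digits
    $             # end of the string
    """,
    re.VERBOSE,
)


def looks_like_phone_number(text):
    """Return True when text has the NNN-NNN-NNNN shape."""
    return PHONE_PATTERN.search(text) is not None


print(looks_like_phone_number("123-456-7890"))
print(looks_like_phone_number("123-4567-890"))
```

Keep in mind this only checks the shape of the text. It says nothing about whether the number actually exists.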
Other examples
The full list is too long, but here are a few examples of the kinds of text and formats you can validate using regular expressions:
- Phone number
- ID format
- Credit card number
- Date and time
- Username
- Password
- URLs
- Text contained at the beginning or at the end of a line
- Text between symbols, e.g., tags: <>some text</>
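A few of the items above can be sketched in Python like this. These patterns are deliberately simple and are my own assumptions about the formats (a `YYYY-MM-DD` date, an HTML-like tag with a name, a `.py` ending); the function names are hypothetical too.

```python
import re

# Date shaped like YYYY-MM-DD, with month 01-12 and day 01-31.
# It does not know how many days each month has, so 2024-02-30 still passes.
DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

# Text between an opening tag and its matching closing tag.
# \1 refers back to whatever tag name the first group captured.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>")

# Text at the end of a line: here, a file name ending in ".py".
ENDS_WITH_PY = re.compile(r"\.py$")


def is_date_shaped(text):
    return DATE_PATTERN.search(text) is not None


def text_inside_tag(text):
    match = TAG_PATTERN.search(text)
    return match.group(2) if match else None


def ends_with_py(text):
    return ENDS_WITH_PY.search(text) is not None


print(is_date_shaped("2024-02-29"))
print(text_inside_tag("<b>some text</b>"))
print(ends_with_py("script.py"))
```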
A brief discussion about validation beyond regex: the e-mail address example
As I said, you can use regex in many different scenarios, including writing patterns for format validation when it's needed. A few days ago I came across a code challenge to write a regular expression pattern that matches valid email addresses. That was a tricky one because it's hard to translate the rules into a simple pattern. On top of that, it's a kind of validation that involves layers other than just writing a regex.
So I found a comment on StackOverflow discussing this kind of validation. The length of the first pattern created there was bizarre. The explanation presented in that comment is interesting (it even has a diagram representing the regex), but just to sum up why validating email using regex might be so complicated: an email address is usually composed of a username followed by a "@", a domain name, and finally an extension. Like this: username@domain.extension.
When creating a regular expression to validate each part of the email address syntax, you'll come across conventions that people already rely on to decide whether each part is valid or not. These conventions involve many details, so many that if you follow them strictly, they can make your regex look huge. For the challenge I came across, a simpler regex was enough. But in real life the syntax validation is just the first validation obstacle.
And that's because format/syntax validation is not enough to validate email addresses, since this kind of validation shouldn't happen only on the client side. That's why it's more complicated than just using a regex. An invalid email can still pass the format validation imposed by your regex, and if that's the only validation your program depends on, you should review it.
Of course you can apply this argument to many other types of information, such as phone numbers, ID document numbers, etc. But emails can be verified through server-side validation, and so can phone numbers, so don't rely only on regexes to validate them. It's important to understand that it's neither necessary nor recommended to validate an email very rigidly. After all, we will only know if it's truly valid when we send an email to the address concerned.
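Here is a minimal sketch of what a deliberately loose email check could look like, assuming all you want is the username@domain.extension shape. The pattern and the names `LOOSE_EMAIL` and `has_email_shape` are my own illustration, not the pattern from the StackOverflow discussion, and certainly not a complete validator.

```python
import re

# Loose shape check: something, an "@", something, a dot, something.
# None of the parts may contain whitespace or another "@".
LOOSE_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def has_email_shape(address):
    """Return True when address looks like username@domain.extension."""
    return LOOSE_EMAIL.search(address) is not None


print(has_email_shape("user@example.com"))
print(has_email_shape("user@example"))
# This one has the right shape, but no message could ever be delivered to it.
print(has_email_shape("nobody@does-not-exist.invalid"))
```

The last call is the whole point: an address can pass the format check and still not be a real inbox, so the regex is only the first step before an actual verification, such as sending a confirmation email.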
regex101
“regex101 is a regular expression tester with syntax highlighting, explanation, cheat sheet for PHP/PCRE, Python, GO, JavaScript, Java, C#/.NET.”
This tool has been very useful when I'm trying to solve regex challenges. It's basically a regex playground with a friendly UI where you can write the regex in the flavor of your chosen programming language and pass possible inputs to test it.
The interface gives you a brief explanation of what each part of the regex does, as well as the match information for each of them. There is also a quick reference section to guide you when you need to look up what to use for some behavior you want in your regular expression.
This is what it looks like: [screenshot of the regex101 interface]
There are other alternatives similar to regex101; this is just the one I use. You can find other options such as RegExr, Expresso, RegexBuddy, Regex Hero, etc.
Conclusion
Regular expressions are not necessarily a requirement to solve a problem. However, knowing how to use them can be helpful, and sometimes even quicker, when it comes to finding a solution for specific problems such as those related to formatting or validation. They can also be fun once you start using them more often and become more fluent at writing and reading them.
As I said: writing a regex pattern and proudly getting it to work through your own effort is as fun as finishing any puzzle game :)