My experience with Regex, and how it's a TIME-SAVER
Regular expressions (Regex) look intimidating due to their complex set of characters and symbols.
However, once you get the hang of the basics, Regex becomes much simpler.
Take my story, for instance. While developing LiveAPI, I recently had to take a codebase and extract the files that contained API definitions.
Since there were many frameworks to support, I needed a solution that could match certain patterns in the code and pick out the files with API definitions.
For instance, when working with a Flask codebase, I needed to locate files with API route definitions like this:
@app.route('/api/resource', methods=['GET'])
def get_resource():
    # Implementation here
    pass
or
@app.post('/api/resource')
def create_resource():
    # Implementation here
    pass
I didn't have a solid idea of how regular expressions worked when I first ran into this problem.
So, I gradually learned the necessary techniques, and before long I could write regular expressions with ease.
This enabled me to design a solution for extracting the files that had these API definitions and also saved me considerable time compared to manual searching.
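To make the goal concrete, here is a minimal sketch of the kind of check involved. It is only an illustration and not necessarily the pattern used in LiveAPI: the name `ROUTE_DECORATOR`, the helper `has_route_definition`, and the list of HTTP method names are all assumptions made for this example.

```python
import re

# Hypothetical pattern: a decorator such as @app.route(...) or @app.post(...)
# at the start of a line (leading whitespace allowed).
ROUTE_DECORATOR = re.compile(
    r"^\s*@\w+\.(route|get|post|put|patch|delete)\(",
    re.MULTILINE,
)


def has_route_definition(source: str) -> bool:
    """Return True if the source text contains a Flask-style route decorator."""
    return ROUTE_DECORATOR.search(source) is not None


flask_file = """
@app.route('/api/resource', methods=['GET'])
def get_resource():
    pass
"""

helper_file = """
def add(a, b):
    return a + b
"""

print(has_route_definition(flask_file))   # True
print(has_route_definition(helper_file))  # False
```

Running a check like this over every file in a codebase would leave only the files that define routes. A real solution would probably need one such pattern per framework.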
Let's see how we can start with Regex, and slowly move towards how I solved the problem in detail.
Take your first step towards learning Regex: The Essentials
Before going into the techniques, we need a solid understanding of what regular expressions are and the principles behind them, so that we can approach them logically.
Regex is short for Regular Expression. It helps to match, find, or manage text.
A regular expression is a string of characters that expresses a search pattern. It is most often used to find or replace words in text.
Additionally, we can test whether a text complies with the rules we set.
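As a rough illustration of these three uses (finding, replacing, and validating), here is a small sketch using Python's built-in `re` module. The sample sentence and the helper `is_five_digits` are made up for this example, and the validation pattern uses a range and a repeat count that are explained further down.

```python
import re

text = "My cat sat next to another cat."

# Find: locate the first occurrence of the pattern.
first = re.search(r"cat", text)
print(first.start())                # 3

# Find all: collect every occurrence.
print(re.findall(r"cat", text))     # ['cat', 'cat']

# Replace: swap every match for another word.
print(re.sub(r"cat", "dog", text))  # My dog sat next to another dog.


# Validate: check that a whole string follows a rule (here: exactly five digits).
def is_five_digits(value: str) -> bool:
    return re.fullmatch(r"[0-9]{5}", value) is not None


print(is_five_digits("12345"))      # True
print(is_five_digits("1234a"))      # False
```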
Now let's go through each concept one by one.
Basic Matchers and Characters
- Direct Matching
  - For this one, just type the characters you want to match, and you are done.
  - Example: To match "cat" in a string, just use `cat`.
- The full stop `.`
  - The period `.` matches any single character, including special characters and spaces (by default, most regex engines exclude the newline character).
    - Example: `c.t` will match "cat", "cut", "c t", and even "c$t".
  - Exception: The `.` is a special character in regular expressions, so to match an actual period, you must escape it using a backslash (`\.`).
    - Example: `c\.t` will match only "c.t" and not "cat" or "cut".
- Character Sets `[]`
  - If a position in a word can hold any one of several characters, we write those characters in square brackets.
  - Example:
    - I want something that can match "cat", "cet", "cit", "cot", and "cut".
    - The common letters here are c and t.
    - The letters in between are different: a, e, i, o, u.
    - So the Regex required will be `c[aeiou]t`.
- Negated Character Sets `[^]`
  - If you want to exclude some characters at a particular position, write them inside `[^]`.
  - Example:
    - I do not want the words "cat", "cet", "cit", "cot", and "cut" to match.
    - So the Regex required will be `c[^aeiou]t`, which still matches strings such as "cbt" or "c9t".
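The basic matchers above can be tried out with a short sketch like the one below. The helper `matches` and the word list are assumptions made for this example; `re.fullmatch` is used so that a word counts only when the whole word fits the pattern.

```python
import re


def matches(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


words = ["cat", "cut", "c t", "c$t", "c.t", "cbt", "ct"]

# Direct matching: only the exact text.
print(matches(r"cat", words))         # ['cat']

# Period: any single character between c and t.
print(matches(r"c.t", words))         # ['cat', 'cut', 'c t', 'c$t', 'c.t', 'cbt']

# Escaped period: a literal "." only.
print(matches(r"c\.t", words))        # ['c.t']

# Character set: one of the listed vowels.
print(matches(r"c[aeiou]t", words))   # ['cat', 'cut']

# Negated set: any single character except those vowels.
print(matches(r"c[^aeiou]t", words))  # ['c t', 'c$t', 'c.t', 'cbt']
```

Note that "ct" never matches, because each of these patterns requires exactly one character between c and t.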
Ranges and Repetition
- Letter Ranges `[A-Z]`
  - If you want to find letters in a certain range, write the starting letter and the ending letter separated by a dash, like `[a-z]` or `[g-r]`.
  - Example: `[a-z]` matches any lowercase letter from a to z. `[g-r]` matches any lowercase letter from g to r, so h would match, but s would not.
- Number Range `[0-9]`
  - If you want to find digits in a certain range, write the starting digit and the ending digit separated by a dash, like `[0-9]`.
  - Example: `[0-7]` matches any single digit from 0 to 7, so 5 would match but 9 would not.
- Asterisk `*`
  - We put an asterisk `*` after a character to indicate that the character may either not appear at all or appear many times.
  - Example: `go*gle` matches "ggle", "gogle", "google", "gooogle", etc.
  - So the character `o` here can appear 0 or more times.
- Plus Sign `+`
  - The `+` sign is used to indicate that a character can occur one or more times.
  - Example: `go+gle` matches "gogle", "google", "gooogle", but not "ggle".
  - So, here the character `o` appears 1 or more times.
- Question Mark `?`
  - To indicate that a character is optional, we use the question mark `?`.
  - Example: `colou?r` matches both "color" and "colour".
  - Here the character u is optional.
- Curly Braces `{}`
  - Use curly braces `{n}` to specify the exact number of times a character should occur.
  - Example: As a whole-string match, `a{3}` matches "aaa" but not "aa" or "aaaa".
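The ranges and repetition symbols above can be checked with a sketch along these lines. The helper `full` is an assumption made for this example; it wraps `re.fullmatch` so that each test asks whether the whole string fits the pattern.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Ranges: a single character within the given bounds.
print(full(r"[g-r]", "h"), full(r"[g-r]", "s"))  # True False
print(full(r"[0-7]", "5"), full(r"[0-7]", "9"))  # True False

# Repetition: * allows zero or more, + requires at least one.
spellings = ["ggle", "gogle", "google", "gooogle"]
print([w for w in spellings if full(r"go*gle", w)])
# ['ggle', 'gogle', 'google', 'gooogle']
print([w for w in spellings if full(r"go+gle", w)])
# ['gogle', 'google', 'gooogle']

# Optional character: ? allows zero or one.
print([w for w in ["color", "colour", "colouur"] if full(r"colou?r", w)])
# ['color', 'colour']

# Exact count: {3} means exactly three.
print([w for w in ["aa", "aaa", "aaaa"] if full(r"a{3}", w)])
# ['aaa']

# When searching inside a longer string, a{3} still finds "aaa" within "aaaa".
print(re.search(r"a{3}", "aaaa").group())  # aaa
```

The last line shows why the exact-count rule depends on how the pattern is applied: a search looks for the pattern anywhere in the text, while a full match requires the whole text to fit.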
Grouping and Alternation
Read the rest of the article on journal