What is a regular expression?
A regular expression (regex) is a sequence of characters that defines a search pattern in text. It is built into popular languages such as JavaScript, Go, Python, Java, and C#, all of which support regex. Text editors such as Atom, Sublime Text, and VS Code use it to find and replace matches in your code.
Example in the VS Code editor: press Alt+R in the search box to turn on regex mode.
Applications
- Grabbing HTML tags
- Trimming white spaces
- Removing duplicate text
- Finding or verifying card numbers
- Form Validation
- Matching IP addresses
- Matching a specific word in a large block of text.
Literal character
A literal character matches exactly that character. For example, the pattern e matches every 'e' in "bees and cats".
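A minimal sketch of the literal match described above, assuming the text being searched is the string 'bees and cats' (the variable names are only for illustration):

```javascript
// The literal pattern /e/ matches the character "e" and nothing else.
// The g flag asks match() for every occurrence instead of only the first.
const literalText = 'bees and cats';
const literalMatches = literalText.match(/e/g);

console.log(literalMatches); // ["e", "e"]
```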
Meta character
A meta character matches a range of characters rather than one fixed character. For example, let's write an easy regex to find the specific number 643 in a series of numbers. It will match only 643, not the rest of the numbers. I am using Regex101.
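The same search can be tried in code. This is a small sketch in which the series of numbers is a made-up value, not the one from the Regex101 screenshot:

```javascript
// Hypothetical series of digits that contains 643 once.
const series = '9076436589';
const found = series.match(/643/);

console.log(found[0]);    // "643"
console.log(found.index); // 3 (position where the match starts)
```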
Two ways of writing regex
1) const regex = /[a-z]/gi;
2) const regex = new RegExp('[a-z]', 'gi');
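As a rough comparison of the two forms, the sketch below builds the same pattern both ways and checks that they agree. The `searchWord` variable is a hypothetical value, added only to show when the constructor form is useful:

```javascript
// Literal form: best when the pattern is known in advance.
const literalForm = /[a-z]/gi;
// Constructor form: the pattern and the flags are passed as two separate strings.
const constructorForm = new RegExp('[a-z]', 'gi');

// The constructor is handy when part of the pattern comes from a variable.
// Backslashes must be doubled inside a string, so \b is written as \\b.
const searchWord = 'cat'; // hypothetical value for illustration
const dynamicForm = new RegExp(`\\b${searchWord}\\b`, 'i');

console.log(literalForm.source === constructorForm.source); // true
console.log(literalForm.flags === constructorForm.flags);   // true
console.log(dynamicForm.test('My Cat sleeps all day'));     // true
```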
Different types of meta characters include:
1) Single character
let regex;
// Shorthand for single characters
regex = /\d/; // Matches any digit character [0-9]
regex = /\w/; // Matches any word character [a-zA-Z0-9_]
regex = /\s/; // Matches any whitespace character
regex = /./; // Matches any character except line terminators
regex = /\W/; // Matches any non-word character [^a-zA-Z0-9_]
regex = /\S/; // Matches any non-whitespace character
regex = /\D/; // Matches any non-digit character [^0-9]
regex = /\b/; // Asserts a position at a word boundary
regex = /\B/; // Asserts a position that is not a word boundary
// Single characters
regex = /[a-z]/; // Matches one lowercase letter from a to z (char codes 97-122)
regex = /[A-Z]/; // Matches one uppercase letter from A to Z (char codes 65-90)
regex = /[0-9]/; // Matches one digit from 0 to 9 (char codes 48-57)
regex = /[a-zA-Z]/; // Matches one lowercase or uppercase letter
regex = /\./; // Matches the literal character . (char code 46)
regex = /\(/; // Matches the literal character (
regex = /\)/; // Matches the literal character )
regex = /\-/; // Matches the literal character - (char code 45)
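To see a few of these single-character patterns at work, here is a short sketch run against a made-up sample string (the string and variable names are assumptions for illustration):

```javascript
// Hypothetical sample text containing letters, digits, spaces, and punctuation.
const sample = 'Room 42, floor B-7.';

const digitMatches = sample.match(/\d/g);     // every digit
const spaceMatches = sample.match(/\s/g);     // every whitespace character
const upperMatches = sample.match(/[A-Z]/g);  // every uppercase letter
const dotMatches = sample.match(/\./g);       // every literal dot
const nonWordMatches = sample.match(/\W/g);   // spaces and punctuation

console.log(digitMatches);          // ["4", "2", "7"]
console.log(spaceMatches.length);   // 3
console.log(upperMatches);          // ["R", "B"]
console.log(dotMatches);            // ["."]
console.log(nonWordMatches.length); // 6
```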
2) Quantifiers
They specify how many times the preceding character or group must appear.
* : 0 or more
+ : 1 or more
? : 0 or 1
{n,m} : at least n and at most m
{n} : exactly n
/^[a-z]{5,8}$/; // Matches a string of 5 to 8 lowercase letters
/.+/; // Matches one or more of any character
const phoneRegex = /^\d{3}-\d{3}-\d{4}$/; // Matches 907-643-6589
const areaCodeRegex = /^\(?\d{3}\)?$/g; // Matches (897) or 897
const domainRegex = /\.net|\.com|\.org/g; // Matches .com or .net or .org (the dot must be escaped to be literal)
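A quick way to check how quantifiers behave is to test the patterns against strings that should pass and strings that should fail. The sketch below does that; the input strings and variable names are made up for illustration:

```javascript
// {3} and {4} require an exact number of digits.
const phonePattern = /^\d{3}-\d{3}-\d{4}$/;
const phoneOk = phonePattern.test('907-643-6589');   // true
const phoneShort = phonePattern.test('907-643-658'); // false, last group has 3 digits

// ? makes each bracket optional (0 or 1 times).
const optionalParens = /^\(?\d{3}\)?$/;
const withParens = optionalParens.test('(897)'); // true
const noParens = optionalParens.test('897');     // true
const tooLong = optionalParens.test('8975');     // false

// {5,8} accepts between 5 and 8 lowercase letters.
const lettersPattern = /^[a-z]{5,8}$/;
const fiveLetters = lettersPattern.test('regex');     // true
const fourLetters = lettersPattern.test('code');      // false
const tenLetters = lettersPattern.test('javascript'); // false

console.log(phoneOk, phoneShort, withParens, noParens, tooLong);
console.log(fiveLetters, fourLetters, tenLetters);
```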
3) Position
^ : asserts position at the start
$ : asserts position at the end
\b : word boundary
const regex = /\b\w{4}\b/; // Matches a whole word of exactly four word characters.
Use \b on both sides when you want whole four-character words only. Without the boundaries, \w{4} matches any run of four word characters, including the first four characters of a longer word.
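The difference the word boundary makes is easier to see side by side. This is a small sketch using a made-up sentence:

```javascript
// Hypothetical sentence with a mix of word lengths.
const sentence = 'This code finds words with four letters';

// With \b on both sides, only whole four-character words match.
const wholeWords = sentence.match(/\b\w{4}\b/g);
// Without \b, any run of four word characters matches, even inside longer words.
const anyFour = sentence.match(/\w{4}/g);

console.log(wholeWords); // ["This", "code", "with", "four"]
console.log(anyFour);    // ["This", "code", "find", "word", "with", "four", "lett"]
```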
Character Classes
These are sets of characters written inside square brackets [...]. A character class matches any one character from the set.
let regex;
regex = /[-.]/; // Matches a literal . or - character
regex = /[abc]/; // Matches the character a, b, or c
regex = /^\(?\d{3}\)?[-.]\d{3}[-.]\d{4}$/; // Matches (789)-876-4378, 899-876-4378 and 219.876.4378
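As a rough check of the phone pattern above, the sketch below runs it against the three formats from the comment plus one made-up format that uses spaces, which the [-.] class does not allow:

```javascript
const phoneFormats = /^\(?\d{3}\)?[-.]\d{3}[-.]\d{4}$/;

// The last entry is a hypothetical input added to show a failing case.
const candidates = ['(789)-876-4378', '899-876-4378', '219.876.4378', '219 876 4378'];
const formatResults = candidates.map((candidate) => phoneFormats.test(candidate));

console.log(formatResults); // [true, true, true, false]
```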
Capturing groups
A capturing group separates part of a regular expression so it can be referred to on its own. It is enclosed in parentheses (...).
The regex pattern below captures different groups of the numbers.
Capturing groups are useful when you want to find and replace some characters. For example, you can capture a phone number or a card number and replace it so that only the first 3-4 digits are shown. Take a look at the example below.
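Here is one way the masking idea could look in code. It is a sketch under assumptions: the phone number and card number are made-up test values, and the masking format is my own choice rather than a standard:

```javascript
// Keep the first group and hide the rest of a phone number.
const phone = '234-453-7825';
const maskedPhone = phone.replace(/^(\d{3})[-.](\d{3})[-.](\d{4})$/, '$1-***-****');

// Keep the first four digits of a card number and star out the remainder.
// The callback receives the full match followed by each captured group.
const card = '4111111111111111'; // hypothetical test number
const maskedCard = card.replace(
  /^(\d{4})(\d+)$/,
  (match, firstFour, rest) => firstFour + '*'.repeat(rest.length)
);

console.log(maskedPhone); // "234-***-****"
console.log(maskedCard);  // "4111************"
```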
//How to create a regex pattern for email address
const regex = /^(\w+)@(\w+)\.([a-z]{2,8})([\.a-z]{2,8})?$/
// It matches janetracy@jsninja.co.uk or janetracy@hey.com
Back reference
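A back reference lets you reuse the text matched by an earlier capturing group inside the same pattern. \1 refers to the first group, \2 to the second, and so on.
const regex = /^\b(\w+)\s\1\b$/;
// Matches a string made of the same word twice, such as "hello hello".
Back references can also be used to convert Markdown text to HTML.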
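The sketch below shows both uses. The Markdown rule is a simplified assumption (bold text wrapped in ** or __ only), not a full Markdown converter, and the sample strings are made up:

```javascript
// Find repeated words anywhere in a text: \1 must equal whatever (\w+) captured.
const repeatedWord = /\b(\w+)\s\1\b/g;
const repeats = 'This is is a test test'.match(repeatedWord);

// Convert bold Markdown to HTML. \1 makes sure the closing marker
// is the same as the opening one (** with ** or __ with __).
const boldPattern = /(\*\*|__)(.+?)\1/g;
const markdown = 'Regex is **powerful** and __fun__ to learn.';
const html = markdown.replace(boldPattern, '<strong>$2</strong>');

console.log(repeats); // ["is is", "test test"]
console.log(html);    // "Regex is <strong>powerful</strong> and <strong>fun</strong> to learn."
```

Inside the pattern the group is written as \1, while inside the replacement string it is written as $1, $2, and so on.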
Methods used with regular expressions
1) Test method
This is a method that you call on a regular expression, passing a string as the argument. It returns a boolean: true if a match was found and false if not.
const regex = /^\d{4}$/;
regex.test('4567'); // output is true
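One detail worth knowing about test: a regex with the g flag remembers where its last match ended (in its lastIndex property), so calling test twice on the same string can give different answers. A small sketch of that behaviour, with illustrative variable names:

```javascript
// With the g flag, the regex keeps state between calls.
const globalRegex = /^\d{4}$/g;
const firstCall = globalRegex.test('4567');  // true, lastIndex is now 4
const secondCall = globalRegex.test('4567'); // false, the search resumed at index 4

// Without the g flag, every call starts from the beginning.
const plainRegex = /^\d{4}$/;
const plainFirst = plainRegex.test('4567');  // true
const plainSecond = plainRegex.test('4567'); // true

console.log(firstCall, secondCall);   // true false
console.log(plainFirst, plainSecond); // true true
```

For simple validation, leaving out the g flag avoids this surprise.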
2) match method
It is called on a string with a regular expression and returns an array that contains the results of that search or null if no match is found.
const s = 'Hello everyone, how are you?';
const regex = /how/;
s.match(regex);
// output ["how"] (an array that also carries index and input properties)
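What match returns depends on the g flag. The sketch below compares the two cases and the no-match case; the sentence is a made-up variation of the one above:

```javascript
const greeting = 'Hello everyone, how are you? And how is the family?';

// Without g: only the first match, with extra details such as its index.
const firstOnly = greeting.match(/how/);
// With g: every match, as a plain array of strings.
const allMatches = greeting.match(/how/g);
// No match at all: the result is null, not an empty array.
const noMatch = greeting.match(/why/);

console.log(firstOnly[0], firstOnly.index); // how 16
console.log(allMatches);                    // ["how", "how"]
console.log(noMatch);                       // null
```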
3) exec method
It executes a search for a match in a specified string. Returns a result array or null. Both full match and captured groups are returned.
const s = '234-453-7825';
const regex = /^(\d{3})[-.](\d{3})[.-](\d{4})$/;
regex.exec(s);
//output ["234-453-7825", "234", "453", "7825"]
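Because the result is an array, the captured groups can be pulled out with array destructuring. This is a small sketch; the names areaCode, prefix, and lineNumber are labels I chose for illustration:

```javascript
const phoneParts = /^(\d{3})[-.](\d{3})[.-](\d{4})$/;
const execResult = phoneParts.exec('234-453-7825');

// Index 0 is the full match; the captured groups follow in order.
const [fullMatch, areaCode, prefix, lineNumber] = execResult;

console.log(fullMatch);                    // "234-453-7825"
console.log(areaCode, prefix, lineNumber); // 234 453 7825

// When nothing matches, exec returns null, so check before destructuring.
const failed = phoneParts.exec('234 453 7825');
console.log(failed); // null
```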
4) replace method
It takes two arguments: the regex, and the string or callback function that supplies the replacement. This method is really powerful and can be used to build different projects, such as games.
const str = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.';
const regex = /\b\w{4,6}\b/g;
const results = str.replace(regex, replace);
function replace(match) {
  return 'replacement';
}
// output
// replacement replacement replacement sit replacement, consectetur adipiscing replacement, sed do eiusmod replacement incididunt ut replacement et replacement replacement replacement.
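The callback becomes more useful when the replacement depends on what was matched. As a small sketch with a made-up input, this capitalises the first letter of every word:

```javascript
// \b[a-z] matches a lowercase letter at the start of each word.
const title = 'regular expressions are fun';
const capitalised = title.replace(/\b[a-z]/g, (letter) => letter.toUpperCase());

console.log(capitalised); // "Regular Expressions Are Fun"
```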
5) split method
The regular expression marks where the text should be split. You call the method on a string and pass the regular expression as the argument.
const s = 'Regex is very useful, especially when verifying card numbers, forms and phone numbers';
const regex = /,\s+/;
s.split(regex);
// output ["Regex is very useful", "especially when verifying card numbers", "forms and phone numbers"]
// Splits the text at each comma followed by whitespace
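A regex separator is handy when the delimiters are not consistent. In this sketch the input is a made-up list that mixes commas, semicolons, and uneven spacing:

```javascript
// Hypothetical messy input with mixed separators and extra whitespace.
const messy = '  apples,  pears ;bananas   ';

// trim() removes the outer whitespace; the regex treats a comma or semicolon,
// plus any whitespace around it, as one separator.
const fruits = messy.trim().split(/\s*[,;]\s*/);

console.log(fruits); // ["apples", "pears", "bananas"]
```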
Let's make a small fun project
We want to make a textarea where you can write any text. When you click the submit button, the text is split into individual span tags. When you hover over a span, its background color changes and its text changes to "Yesss!".
Let's do this!!!!!
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Regex expression</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<h1>Regex expression exercises</h1>
<div class="text-container">
<textarea name="textarea" id="textarea" class = "textarea" cols="60" rows="10">
Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus.
Most people 234-9854 infected with the COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness.
The best way to prevent and slow down 456-2904 transmission is be well informed about the COVID-19 virus, the disease it causes and how it spreads. Protect yourself and others from infection by washing your hands or using an alcohol based rub frequently and not touching your face.
The COVID-19 virus spreads 860-8248 primarily through droplets of saliva or discharge from the nose when an infected person coughs or sneezes, so it's important that you also practice respiratory etiquette (for example, by coughing into a flexed elbow). </textarea>
<div class="result-text">
</div>
<button type="submit">Submit</button>
</div>
<script src="regex.js"></script>
</body>
</html>
Let's write the JavaScript part
const button = document.querySelector('button');
const textarea = document.querySelector('textarea');
const resultText = document.querySelector('.result-text');
function regexPattern() {
  // Split on runs of non-word characters. The capturing group keeps the separators in the result.
  const regex = /(\W+)/g;
  const str = textarea.value;
  const results = str.split(regex);
  console.log(results);
  results.forEach((result) => {
    // Skip the empty strings that split() can return at the start or end.
    if (result) {
      const span = document.createElement('span');
      // textContent keeps any < or > typed in the textarea from being parsed as HTML.
      span.textContent = result;
      resultText.appendChild(span);
      span.addEventListener('mouseover', () => {
        // Pick a random value from 0 to 255 for each colour channel.
        const randomColour = Math.floor(Math.random() * 256);
        const randomColour1 = Math.floor(Math.random() * 256);
        const randomColour2 = Math.floor(Math.random() * 256);
        span.style.backgroundColor = `rgb(${randomColour}, ${randomColour1}, ${randomColour2})`;
        span.textContent = 'Yesss!';
      });
    }
  });
}
button.addEventListener('click', () => {
  resultText.innerHTML += `<p class="text-info">This is what I matched</p>`;
  regexPattern();
});
Results
Source code in my GitHub
Watch the result video
Website resources for learning regex in JS
- 💻 Regular expression info
- 💻 Regex.com
- 💻 Regexone
- 💻 Regex101
YouTube videos
- 🎥 Regular Expressions (Regex) Mini Bootcamp by Colt Steele
- 🎥 Learn Regular Expressions In 20 Minutes by Web Dev Simplified
- 🎥 Regular Expressions (RegEx) Tutorial by NetNinja
- 🎥 Regular Expressions (Regex) in JavaScript by freeCodeCamp
Books
- Mastering Regular Expressions by Jeffrey E. F. Friedl
- Regular Expressions Cookbook by Jan Goyvaerts
- Introducing Regular Expressions by Michael Fitzgerald
Conclusion
As a code newbie, I was terrified when I first saw what regex looks like, but this week I decided to learn it and write about it. To be honest, I will use this post as a future reference, and I hope you will too.
Now you know how powerful regex is and where it can be applied, especially in form validation or card number validation. I hope this helps any beginner understand how powerful regex can be and how to use it.
Top comments (6)
Great article!
I'm not sure about this part:
* : 0 or more
+ : 1 or more
? : 0 or more
In my opinion, last line should be:
? : 1 or NONE
Because AFAIK the question mark says that something is optional, and you are using this sign as an optional marker in the next few examples.
(?(\d{3})?)-.-. - in this example it means it can match
903-455-2346
(657)-878-9065
234.345.5676
(657)-878-9065
That means the (..) can appear 0 or 1 times.
If you look at most documentation, it will say ? is a greedy quantifier:
? Quantifier: Matches between zero and one times, as many times as possible, giving back as needed (greedy)
As You wrote in above comment:
"That means the (..) can appear 0 or 1"
So this is 0 or 1 (NONE or ONE) not "zero or MORE" (as in article).
In article, there is info, that "?" could be zero or more. For me, ie "5" is one of "more" ;)
So I think, in article it should be:
* : 0 or more
+ : 1 or more
? : 0 or 1
Because in current version, there is no difference between "*" and "?"
Yes, Jakub. I updated the article a few days ago.
Thank you for pointing out the error.
Nice work @tracycss , thank you!
Thanks. Really tried my best to summarize my notes and post the most important information.