A regular expression, or regex, is a character sequence that defines patterns for matching text. These patterns are integral in tasks like searching, "find and replace" operations, and input validation.
Widely used in applications from search engines to text editors and text-processing tools like sed and AWK, regex patterns make it easy to specify complex search criteria with minimal code. Many programming languages provide libraries, often called "regex engines," that allow for regex pattern matching and replacement.
In JavaScript, regular expressions are implemented as objects and work with methods such as exec() and test() from the RegExp class, and with string methods like match(), replace(), search(), and split(). These allow developers to search within strings, validate input, or split text based on specific patterns. Such tools are essential for handling text data dynamically and efficiently in programming.
JavaScript’s regex syntax enables developers to create precise pattern-matching rules that can be reused across projects. This guide covers the basics of regex patterns in JavaScript, from syntax to application, providing an overview of each key method and a reference for more in-depth learning.
The original source of this article can be found in Axxellanceblog
History of Regex
Now let me slip in a little history about regular expressions (what is the use of knowledge if you don't know its origin, eh?). Regular expressions were first described in 1951 by Stephen Cole Kleene in the context of theoretical computer science and automata theory. They found practical use in the 1960s in text editors and compilers, through implementations like Ken Thompson's QED editor and, later, Unix's grep.
Through the 1970s and 1980s, regexes became foundational in Unix tools (e.g., sed, AWK, vi) and were standardized in POSIX. The 1980s saw Perl's advanced regex features, leading to innovations like PCRE (Perl Compatible Regular Expressions) in 1997, now widely used across languages and platforms. Hardware-accelerated regex engines emerged in the 2010s.
JavaScript incorporated regular expressions as first-class objects early in its history, supporting pattern matching via RegExp methods like exec() and test(), along with string methods like match() and replace(). This integration made regex central to web development for tasks like input validation and string manipulation.
Today, JavaScript regex is a core tool for both client- and server-side applications, with expanded features introduced in ES6, such as Unicode support and sticky flags, enhancing its flexibility and power in modern programming.
Now that we know the history, let's move on to something more practical: the two ways of writing a regular expression.
Types of Regex Expressions
Regular expressions can be crafted in two distinct ways:
- Literal Expression: This method uses a regex literal, where the pattern is enclosed in slashes:
const re = /ab+c/;
This technique compiles the expression when the script loads, which helps performance when the pattern is static.
- Dynamic Constructor: Alternatively, the RegExp constructor can be used:
const re = new RegExp("ab+c");
This method compiles the expression at runtime, making it ideal for scenarios where the pattern is variable or derived from external input, such as user-generated content.
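To make the difference between the two forms concrete, here is a minimal sketch. The helper escapeRegExp and the sample userInput value are hypothetical names introduced only for this illustration; they are not built into JavaScript. The sketch assumes that text coming from a user should be matched literally, so special characters are escaped before the pattern is built.

```javascript
// Literal form: the pattern is fixed in the source code.
const literalRe = /ab+c/;

// Hypothetical helper (not a built-in): backslash-escapes every character
// that has a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Constructor form: the pattern is assembled at runtime.
const userInput = "1+1"; // pretend this came from a form field
const dynamicRe = new RegExp(escapeRegExp(userInput));

console.log(literalRe.test("xxabbbcxx"));            // true
console.log(dynamicRe.source);                       // 1\+1
console.log(dynamicRe.test("1+1=2"));                // true
// Without escaping, "+" acts as a quantifier and the literal text is missed.
console.log(new RegExp(userInput).test("1+1=2"));    // false
```

Note that a pattern passed to the constructor is an ordinary string, so any backslash you write by hand has to be doubled (for example "\\d+" for the pattern \d+).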
Now that we know the two ways to create a regular expression, let's look at its patterns and special characters, and at the JavaScript methods that use them.
Regex Patterns
A regular expression pattern consists of basic characters, like /abc/, or a mix of basic and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/. In the last example, the parentheses act as a capturing group: the text matched by that part of the pattern is remembered for later use (see "Groups and backreferences" in the table of special characters below).
Basic Patterns | Special Patterns |
---|---|
Basic patterns are formed using characters that you want to match exactly. For instance, the pattern /abc/ matches character sequences in a string only when the exact sequence "abc" occurs, with the characters together and in that order. This match would succeed in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft," where it identifies the substring "abc." However, there is no match in the string "Grab crab," since it contains "ab c" but not the exact substring "abc." | When searching for a match that goes beyond a direct comparison, such as finding one or more instances of "b" or locating whitespace, you can include special characters in the pattern. For example, to match a single "a" followed by zero or more "b"s and then "c," you would use the pattern /ab*c/. Here, the asterisk (*) after "b" means "zero or more occurrences of the preceding element." In the string "cbbabbbbcdebc," this pattern matches the substring "abbbbc." |
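The patterns discussed above can be tried directly. The following is a small sketch; the variable names and the sample sentence passed to the chapter pattern are chosen only for illustration.

```javascript
// Basic pattern: matches only the exact, case-sensitive sequence "abc".
const basic = /abc/;
console.log(basic.test("Hi, do you know your abc's?")); // true
console.log(basic.test("Grab crab"));                   // false ("ab c" has a space)

// Special character: * means "zero or more of the preceding element".
const special = /ab*c/;
console.log("cbbabbbbcdebc".match(special)[0]);         // "abbbbc"

// Capturing group: the parentheses remember the chapter number.
const chapter = /Chapter (\d+)\.\d*/;
const found = chapter.exec("Open Chapter 4.3, paragraph 6");
console.log(found[0]); // "Chapter 4.3" (the whole match)
console.log(found[1]); // "4"           (the captured group)
```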
Special characters in regular expressions
If you're interested in viewing all the special characters that can be utilized in regular expressions, refer to the table below:
Characters/constructs | Corresponding article |
---|---|
[xyz], [^xyz], ., \d, \D, \w, \W, \s, \S, \t, \r, \n, \v, \f, [\b], \0, \cX, \xhh, \uhhhh, \u{hhhh}, x\|y | Character classes |
^, $, \b, \B, x(?=y), x(?!y), (?<=y)x, (?<!y)x | Assertions (boundary-type assertions included) |
(x), (?<Name>x), (?:x), \n, \k<Name> | Groups and backreferences |
x*, x+, x?, x{n}, x{n,}, x{n,m} | Quantifiers |
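As a rough illustration of the four families in the table, the sketch below uses one sample string (invented for this example) and applies a character class with a quantifier, named groups, a lookbehind assertion, a word boundary, and a backreference. It assumes a reasonably modern JavaScript engine, since named groups and lookbehind are not available in very old ones.

```javascript
const text = "Order #42 shipped on 2024-03-15 for $19 USD";

// Character class + quantifier: every run of one or more digits.
console.log(text.match(/\d+/g)); // ["42", "2024", "03", "15", "19"]

// Named groups: (?<Name>x) stores each part under a readable name.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const { year, month, day } = text.match(dateRe).groups;
console.log(year, month, day); // 2024 03 15

// Lookbehind assertion: digits only when preceded by "$".
console.log(text.match(/(?<=\$)\d+/)[0]); // "19"

// Word boundary: "ship" as a whole word does not occur ("shipped" does).
console.log(/\bship\b/.test(text)); // false

// Backreference: \1 repeats whatever group 1 matched (a doubled word).
console.log(/\b(\w+) \1\b/.test("it is is fine")); // true
```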
Using regular expressions in JavaScript
Regular expressions are used with the RegExp methods test() and exec(), as well as with the String methods match(), matchAll(), replace(), replaceAll(), search(), and split().
Method | Description |
---|---|
exec() | Executes a search for a match in a string. It returns an array of information or null on a mismatch. |
test() | Checks for a match within a string and returns either true or false. |
match() | Returns an array of all matches when the g flag is set, or the first match with its capturing groups otherwise; returns null if no match is found. |
matchAll() | Returns an iterator that provides all matches, including capturing groups. |
search() | Tests for a match within a string and returns the index of the match, or -1 if no match is found. |
replace() | Performs a search for a match in a string and replaces the matched substring with a specified replacement substring. |
replaceAll() | Conducts a search for all matches in a string and replaces each matched substring with a specified replacement substring. |
split() | Uses a regular expression or a fixed string to split a string into an array of substrings. |
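The sketch below runs each method from the table once against a sample sentence (chosen only for illustration), so the different return types can be compared side by side. It assumes an engine that supports matchAll() and replaceAll().

```javascript
const sentence = "The rain in Spain stays mainly in the plain";

// test(): true or false
console.log(/Spain/.test(sentence)); // true

// search(): index of the first match, or -1
console.log(sentence.search(/Spain/)); // 12
console.log(sentence.search(/France/)); // -1

// exec(): details about one match (full text, groups, index), or null
const first = /(\w+)ain/.exec(sentence);
console.log(first[0], first[1], first.index); // rain r 4

// match() with the g flag: every matched substring
console.log(sentence.match(/ain/g)); // ["ain", "ain", "ain", "ain"]

// matchAll(): an iterator of match arrays, capturing groups included
const letters = [...sentence.matchAll(/(\w)ain/g)].map((m) => m[1]);
console.log(letters); // ["r", "p", "m", "l"]

// replace() without g changes the first match only; replaceAll() needs g
console.log(sentence.replace(/ain/, "AIN"));
console.log(sentence.replaceAll(/ain/g, "AIN"));

// split(): the separator can be a pattern
console.log("one, two,three ;four".split(/\s*[,;]\s*/)); // ["one", "two", "three", "four"]
```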
To determine whether a pattern exists within a string, use the test() or search() methods. For more detailed information, albeit with slower execution, you can use the exec() or match() methods. When a match is found, exec() and match() return an array and update properties of both the associated regular expression object and the built-in RegExp object. If the match fails, they return null, which is coerced to false in a condition.
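Here is a short sketch of how the null return value and the updated regex properties are typically used. It assumes a pattern with the g flag, in which case exec() records its position in the lastIndex property and can be called in a loop; the variable names are illustrative.

```javascript
const wordRe = /\w+/g;
const input = "fee fi fo";

// Each exec() call resumes from wordRe.lastIndex and returns null when done.
const found = [];
let m;
while ((m = wordRe.exec(input)) !== null) {
  found.push(`${m[0]}@${m.index}`);
}
console.log(found);            // ["fee@0", "fi@4", "fo@7"]
console.log(wordRe.lastIndex); // 0 (reset after the failed final attempt)

// A failed match returns null, which is falsy in a condition.
const miss = /xyz/.exec(input);
if (!miss) {
  console.log("no match");
}
```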
Advanced searching with flags
Regular expressions have optional flags that enable features such as global searching and case-insensitive matching. Flags can be used individually or combined in any order, and they are part of the regular expression itself.
Flag | Description | Example |
---|---|---|
d | Generate indices for substring matches. | const matchWithIndices = /pattern/d.exec(str); // matchWithIndices.indices holds the start and end positions |
g | Global search. | const matchesGlobal = [...str.matchAll(/pattern/g)]; |
i | Case-insensitive search. | const matchesCaseInsensitive = str.match(/pattern/i); |
m | Multi-line mode. It changes the behavior of ^ and $ to match the start and end of each line. | const matchesMultiLine = str.match(/^pattern$/m); |
s | Dot All mode. This allows the dot . to match newline characters as well. | const matchesDotAll = str.match(/pattern./s); |
u | Unicode. Treats the pattern and input as a sequence of Unicode code points. | const matchesUnicode = str.match(/\u{1F600}/u); // Matches the grinning face emoji |
v | An upgrade to the u mode with more Unicode features. | new RegExp(pattern, 'v'); // requires a recent JavaScript engine |
y | Perform a "sticky" search that matches only at the current position (lastIndex) in the target string. | const stickyMatch = /pattern/y; const result = stickyMatch.exec(str); // Matches only at stickyMatch.lastIndex |
Keep in mind that flags are essential components of a regular expression and cannot be modified after the expression has been created.
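The flags that are hardest to picture from a one-line description are sketched below with tiny inputs. This assumes an engine recent enough to support the d flag; all variable names are illustrative. The last part shows one way to work around the fact that flags cannot be changed: build a new RegExp from the old one's source.

```javascript
// i: case-insensitive
console.log(/hello/i.test("HeLLo")); // true

// s: the dot also matches a newline
console.log(/a.b/.test("a\nb"));  // false
console.log(/a.b/s.test("a\nb")); // true

// u: \u{...} is read as a Unicode code point escape
console.log(/\u{1F600}/u.test("😀")); // true

// y: sticky, the match must begin exactly at lastIndex
const sticky = /foo/y;
sticky.lastIndex = 3;
console.log(sticky.test("barfoo")); // true  (matched at index 3)
console.log(sticky.lastIndex);      // 6
console.log(sticky.test("barfoo")); // false (nothing left at index 6)

// d: start/end indices for the match and for each group
const withIndices = /b(c)/d.exec("abcd");
console.log(withIndices.indices[0]); // [1, 3]
console.log(withIndices.indices[1]); // [2, 3]

// Flags are fixed; to "add" one, create a new expression.
const original = /x/gi;
const extended = new RegExp(original.source, original.flags + "m");
console.log(original.flags, extended.flags); // gi gim
```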
Examples
For instance, re = /\w+\s/g defines a regular expression that looks for one or more word characters followed by a whitespace character, and the g flag makes it search for this pattern throughout the entire string.
const re = /\w+\s/g; // or new RegExp("\\w+\\s", "g");
const str = "fee fi fo fum";
const myArray = str.match(re);
console.log(myArray); // ["fee ", "fi ", "fo "]
The m flag indicates that a multiline input string should be treated as multiple lines. When the m flag is applied, the ^ and $ anchors match at the beginning or end of each line within the input string, rather than just at the start or end of the entire string.
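A quick sketch of the difference, using a three-line sample string invented for this example:

```javascript
const lines = "first line\nsecond line\nthird line";

// Without m, ^ and $ anchor only to the start and end of the whole string.
console.log(lines.match(/^\w+/g));          // ["first"]
console.log(lines.match(/line$/g).length);  // 1

// With m, ^ and $ also match at every line break.
console.log(lines.match(/^\w+/gm));         // ["first", "second", "third"]
console.log(lines.match(/line$/gm).length); // 3
```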
Additionally, the i, m, and s flags can be activated or deactivated for specific sections of a regex using the modifier syntax.
Addition
As you explore regex, you'll find that they can be intimidating at first, but with practice, they become invaluable in your coding toolkit. For hands-on practice, visit regex101.com to test your patterns and see immediate results.
If you need a more in-depth explanation, do check out Wikipedia's coverage of regular expressions. Remember, mastering regex opens up a world of possibilities in text processing, so don't hesitate to experiment and enhance your skills!
Thank you for reading!