Regex:
A regular expression, often abbreviated as regex, is a sequence of characters that defines a search pattern. It is used to search and manipulate strings in most programming languages. Typical applications include validating user input, finding patterns in a document, and searching or transforming large datasets.
Regex in Python:
In Python, regex support comes from the built-in re module. Before using regex, we need to import it:
>>> import re
After importing it, we can start using it.
Steps:
1. Import the re module:
import re
2. Create a regex object with re.compile(), passing the pattern as a string.
3. Pass the string you want to search into the regex object's search() method. It returns a match object, or None if the pattern is not found.
4. Call the match object's group() method, which returns the text that matched the pattern.
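The steps above can be sketched as follows. The pattern and the sample string are made up for illustration; they are not from the original post.

```python
import re

# Step 2: compile a pattern (here: one or more digits).
pattern = re.compile(r"\d+")

# Step 3: search a string. The result is a match object,
# or None when the pattern does not occur in the string.
match = pattern.search("Order 66 shipped on day 7")

# Step 4: group() returns the text that matched.
if match is not None:
    print(match.group())  # 66
```

Note that search() stops at the first match, so the later "7" is not reported.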
Basic syntax of regex in python
1. Occurrence of a character or an expression:
- [] : matches any one of the characters written inside the square brackets. Ex. [abc] will match a, b or c.
- \d : matches any digit.
- \D : matches any non-digit character (i.e. any character except a digit).
- \w : matches any word character (i.e. a letter, digit or underscore).
- \W : matches any non-word character (i.e. anything except a letter, digit or underscore).
- \s : matches a whitespace character.
- [^...] : matches any character which is not within the square brackets.
- (...) : groups sub-expressions. Ex. (abc) will match abc.
- | : acts as "or" between sub-expressions. Ex. (a|b|c) will match either a, b or c.
- \ : acts as the escape character. It introduces the shorthand classes listed above, and a special character preceded by it is treated as a literal. Ex. 1. [a-z\d] will match any character from a to z or a digit. 2. [a\-z] will match 'a' or '-' or 'z'.
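To see these building blocks in action, here is a small sketch that runs several of them against one sample string. The string "cab-42_z" and the variable names are assumptions chosen for illustration.

```python
import re

text = "cab-42_z"

in_set = re.findall(r"[abc]", text)            # any of a, b, c
digits = re.findall(r"\d", text)               # digits only
non_word = re.findall(r"\W", text)             # not a letter, digit or underscore
not_lower = re.findall(r"[^a-z]", text)        # anything except a to z
either = re.search(r"(ab|42)", text).group()   # first of the two alternatives found
literal_dash = re.findall(r"[a\-z]", text)     # 'a', a literal '-', or 'z'

print(in_set)        # ['c', 'a', 'b']
print(digits)        # ['4', '2']
print(non_word)      # ['-']
print(not_lower)     # ['-', '4', '2', '_']
print(either)        # ab
print(literal_dash)  # ['a', '-', 'z']
```

The patterns are written as raw strings (r"...") so that Python passes the backslashes to the re module unchanged.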
2. Number of occurrences of a character or an expression:
- {} : curly braces match the preceding expression a specific number of times. Ex. [a]{3} will match aaa in the string.
- * : an expression followed by * is matched zero or more times.
- + : an expression followed by + is matched one or more times.
- ? : an expression followed by ? is matched zero or one time.
- {m,n} : matches from m to n occurrences of the preceding expression.
- {,m} : matches up to m occurrences of the preceding expression.
- {m,} : matches at least m occurrences of the preceding expression.
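The quantifiers above can be compared side by side. This is a minimal sketch; the sample string and variable names are illustrative assumptions. Each pattern looks for an 'a' followed by some number of 'b' characters.

```python
import re

text = "a ab abb abbbb"

zero_or_more = re.findall(r"ab*", text)     # 'a' then any number of 'b'
one_or_more = re.findall(r"ab+", text)      # 'a' then at least one 'b'
zero_or_one = re.findall(r"ab?", text)      # 'a' then at most one 'b'
exactly_two = re.findall(r"ab{2}", text)    # 'a' then exactly two 'b'
two_to_three = re.findall(r"ab{2,3}", text) # 'a' then two or three 'b'
up_to_one = re.findall(r"ab{,1}", text)     # 'a' then zero or one 'b'
three_plus = re.findall(r"ab{3,}", text)    # 'a' then three or more 'b'

print(zero_or_more)  # ['a', 'ab', 'abb', 'abbbb']
print(one_or_more)   # ['ab', 'abb', 'abbbb']
print(zero_or_one)   # ['a', 'ab', 'ab', 'ab']
print(exactly_two)   # ['abb', 'abb']
print(two_to_three)  # ['abb', 'abbb']
print(up_to_one)     # ['a', 'ab', 'ab', 'ab']
print(three_plus)    # ['abbbb']
```

Quantifiers are greedy by default: each one takes as many repetitions as it is allowed, which is why ab{2,3} returns 'abbb' rather than 'abb' for the last word.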
For more information, refer to the official Python documentation for the re module.
Some important regex functions:
- re.search(pattern, string) : searches for the first occurrence of the pattern in the string and returns a match object if found, otherwise returns None.
- re.findall(pattern, string) : returns a list of all non-overlapping matches in the string, as strings.
- re.sub() : replaces all occurrences of a pattern in a string with a given replacement. It is very useful for manipulating large datasets. Syntax:
re.sub(pattern, replacement, text)
import re
text = "The quick brown dog jumps over the lazy dog."
new_text = re.sub("dog", "cat", text)
print(new_text)
# output: The quick brown cat jumps over the lazy cat.
- re.split() : splits a string at each occurrence of a pattern.
import re
string = "The quick brown fox jumps over the lazy dog"
# raw string, so that \s reaches the re module unchanged
split_list = re.split(r"\s", string)
print(split_list)
# output: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
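re.search and re.findall from the list above can be sketched the same way. The phone-number-like sample text is invented for illustration.

```python
import re

text = "Call 555-1234 or 555-9876 after 5pm"
number = r"\d{3}-\d{4}"  # three digits, a dash, four digits

# re.search stops at the first match and returns a match object.
first = re.search(number, text)
print(first.group())  # 555-1234

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(number, text)
print(all_numbers)  # ['555-1234', '555-9876']

# When nothing matches, re.search returns None and re.findall an empty list.
missing = re.search(r"xyz", text)
print(missing)  # None
```

Because re.search can return None, it is safest to check the result before calling group() on it.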
Example:
- Email Validation:
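A minimal sketch of email validation with re might look like the following. The pattern, the name EMAIL_PATTERN and the helper is_valid_email are assumptions made for illustration. The pattern is deliberately simplified and does not cover every address that the email standards allow.

```python
import re

# Simplified, illustrative pattern (an assumption, not a complete validator):
#   [\w.+-]+      local part: word characters, dots, plus signs, hyphens
#   @             the separator
#   [\w-]+        first domain label
#   (\.[\w-]+)+   one or more further labels, each starting with a dot
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def is_valid_email(address):
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


print(is_valid_email("user@example.com"))   # True
print(is_valid_email("user@localhost"))     # False (no dot in the domain)
print(is_valid_email("no-at-sign.com"))     # False (no @)
```

fullmatch() is used instead of search() so that the entire input has to fit the pattern, not just a part of it.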
Output:
Conclusion:
In conclusion, regular expressions are a powerful tool for searching and manipulating strings, and they are available in most programming languages. With a solid understanding of regular expression syntax, you will be able to search and manipulate strings effectively in Python, which makes it a valuable skill to have in your programming toolkit.