Preface & Prerequisites
This series will teach you Ruby regular expressions from basics to advanced levels with plenty of examples and exercises.
You should have prior experience working with Ruby and should know concepts like blocks, string formats, string methods, Enumerable, etc.
You can get free pdf/epub versions of the book using these links:
Why is it needed?
Regular expressions are a versatile tool for text processing. You'll find them included as part of the standard library of most programming languages that are used for scripting purposes. If not, you can usually find a third-party library. Syntax and features of regular expressions vary from language to language. Ruby's offering is based upon the Onigmo regular expressions library.
The String class comes loaded with a variety of methods to deal with text. So, what's so special about regular expressions and why would you need them? For learning and understanding purposes, one can view regular expressions as a mini programming language in itself, specialized for text processing. Parts of a regular expression can be saved for future use, analogous to variables and functions. There are ways to perform AND, OR, NOT conditionals, as well as operations similar to range and string repetition operators, and so on.
Here are some common use cases:
- Sanitizing a string to ensure that it satisfies a known set of rules. For example, to check if a given string matches password rules.
- Filtering or extracting portions on an abstract level like letters, numbers, punctuation and so on.
- Qualified string replacement. For example, at the start or the end of a string, only whole words, based on surrounding text, etc.
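As a rough sketch of the three use cases above: the method names and the password rule here are hypothetical, chosen only for illustration, and the regexp features used are covered in later chapters.

```ruby
# Hypothetical rule: at least 8 characters, including at least one digit.
def valid_password?(text)
  text.length >= 8 && text.match?(/[0-9]/)
end

# Extract all runs of digits from the input.
def extract_numbers(text)
  text.scan(/[0-9]+/)
end

# Replace 'cat' only when it appears as a whole word.
def replace_whole_word(text)
  text.gsub(/\bcat\b/, 'dog')
end

valid_password?('hunter42x')            # => true
extract_numbers('order 66, aisle 7')    # => ["66", "7"]
replace_whole_word('cat concatenate')   # => "dog concatenate"
```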
You are likely to be familiar with a graphical search and replace tool, like the screenshot shown below from LibreOffice Writer. Match case, Whole words only, Replace and Replace All are some of the basic features supported by regular expressions.
Another real world use case is password validation. The screenshot below is from the GitHub sign up page. Performing multiple checks like string length and the type of characters allowed is another core feature of regular expressions.
Here are some articles on regular expressions to learn about their history and the type of problems they are suited for.
- The true power of regular expressions — it also includes a nice explanation of what regular means in this context
- softwareengineering: Is it a must for every programmer to learn regular expressions?
- softwareengineering: When you should NOT use Regular Expressions?
- codinghorror: Now You Have Two Problems
- wikipedia: Regular expression — this article includes discussion on regular expressions as a formal language as well as details on various implementations
Regexp introduction
In this chapter, you'll get to know how to declare and use regexps. For some examples, the equivalent normal string method is shown for comparison. Regular expression features will be covered from the next chapter onwards. The main focus will be to get you comfortable with syntax and text processing examples. Three methods will be introduced in this chapter: the match? method to check if the input contains a match, and the sub and gsub methods to substitute a portion of the input with something else.
This book will use the terms regular expressions and regexp interchangeably.
Regexp documentation
It is always a good idea to know where to find the documentation. Visit ruby-doc: Regexp for information on the Regexp class, available methods, syntax, features, examples and more. Here's a quote:
Regular expressions (regexps) are patterns which describe the contents of a string. They're used for testing whether a string contains a given pattern, or extracting the portions that match. They are created with the /pat/ and %r{pat} literals or the Regexp.new constructor.
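A minimal sketch of the three creation styles mentioned in the quote. The method name build_patterns is hypothetical, used only to group the patterns.

```ruby
# Three ways to build the same pattern.
def build_patterns
  [/cat/, %r{cat}, Regexp.new('cat')]
end

build_patterns.map(&:source)                       # => ["cat", "cat", "cat"]
build_patterns.all? { |re| 'concat'.match?(re) }   # => true

# %r{} is handy when the pattern contains '/',
# which has to be escaped inside a // literal.
'/usr/bin'.match?(%r{/bin})   # => true
'/usr/bin'.match?(/\/bin/)    # => true
```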
match? method
First up, a simple example to test whether a string is part of another string or not. Normally, you'd use the include? method and pass a string as argument. For regular expressions, use the match? method and enclose the search string within // delimiters (regexp literal).
>> sentence = 'This is a sample string'
# check if 'sentence' contains the given string argument
>> sentence.include?('is')
=> true
>> sentence.include?('z')
=> false
# check if 'sentence' matches the pattern as described by the regexp argument
>> sentence.match?(/is/)
=> true
>> sentence.match?(/z/)
=> false
The match? method accepts an optional second argument which specifies the index to start searching from.
>> sentence = 'This is a sample string'
>> sentence.match?(/is/, 2)
=> true
>> sentence.match?(/is/, 6)
=> false
Some regular expression functionality is enabled by passing modifiers, each represented by a single letter. If you have used the command line, modifiers are similar to command options. For example, grep -i will perform case insensitive matching. Modifiers will be discussed in detail in the Modifiers chapter. Here's an example for the i modifier.
>> sentence = 'This is a sample string'
>> sentence.match?(/this/)
=> false
# 'i' is a modifier to enable case insensitive matching
>> sentence.match?(/this/i)
=> true
Regexp literal reuse and interpolation
The regexp literal can be saved in a variable. This helps improve code clarity, lets you pass the regexp around as a method argument, enables reuse, and so on.
>> pet = /dog/i
>> pet
=> /dog/i
>> 'They bought a Dog'.match?(pet)
=> true
>> 'A cat crossed their path'.match?(pet)
=> false
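Passing a regexp as a method argument could look like the sketch below. The helper count_matching and the sample lines are hypothetical.

```ruby
# Hypothetical helper: count the lines that match the given pattern.
def count_matching(lines, pattern)
  lines.count { |line| line.match?(pattern) }
end

pet = /dog/i
lines = ['They bought a Dog', 'A cat crossed their path', 'hot dog stand']

count_matching(lines, pet)      # => 2
count_matching(lines, /cat/)    # => 1
```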
Similar to double quoted string literals, you can use interpolation and escape sequences in a regexp literal. See ruby-doc: Strings for syntax details on string escape sequences. Regexp literals have their own special escapes, which will be discussed in the Escape sequences section.
>> "cat\tdog".match?(/\t/)
=> true
>> "cat\tdog".match?(/\a/)
=> false
>> greeting = 'hi'
>> /#{greeting} there/
=> /hi there/
>> /#{greeting.upcase} there/
=> /HI there/
>> /#{2**4} apples/
=> /16 apples/
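Interpolation and modifiers can be combined, and the resulting regexp can be used like any other. A small sketch, where the helper greeting_pattern is a hypothetical name:

```ruby
# Hypothetical helper: build a case insensitive pattern from a plain word.
def greeting_pattern(word)
  /#{word} there/i
end

greeting_pattern('hi')                             # => /hi there/i
'Hi There, folks'.match?(greeting_pattern('hi'))   # => true
'hi, there'.match?(greeting_pattern('hi'))         # => false
```

Note that the interpolated text becomes part of the pattern, so any characters with a special meaning in regexps will be treated as such.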
sub and gsub methods
For search and replace, use the sub or gsub methods. The sub method will replace only the first occurrence of the match, whereas gsub will replace all the occurrences. The regexp pattern to match against the input string has to be passed as the first argument. The second argument specifies the string which will replace the portions matched by the pattern.
>> greeting = 'Have a nice weekend'
# replace first occurrence of 'e' with 'E'
>> greeting.sub(/e/, 'E')
=> "HavE a nice weekend"
# replace all occurrences of 'e' with 'E'
>> greeting.gsub(/e/, 'E')
=> "HavE a nicE wEEkEnd"
Use the sub! and gsub! methods for in-place substitution.
>> word = 'cater'
# this will return a string object, won't modify 'word' variable
>> word.sub(/cat/, 'wag')
=> "wager"
>> word
=> "cater"
# this will modify 'word' variable itself
>> word.sub!(/cat/, 'wag')
=> "wager"
>> word
=> "wager"
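One detail worth keeping in mind: sub! and gsub! return nil when no substitution is made, so their return value shouldn't be chained blindly. A minimal sketch, with replaced? as a hypothetical helper name:

```ruby
# Hypothetical helper: report whether an in-place substitution changed anything.
# sub! returns the modified string on success and nil otherwise.
def replaced?(text, pattern, replacement)
  !text.sub!(pattern, replacement).nil?
end

word = 'cater'.dup   # dup gives a modifiable copy even if literals are frozen

replaced?(word, /cat/, 'wag')   # => true
word                            # => "wager"
replaced?(word, /cat/, 'wag')   # => false, 'cat' is no longer present
word                            # => "wager"
```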
Regexp operators
Ruby also provides operators for regexp matching.
- =~ match operator returns the index of the first match and nil if a match is not found
- !~ match operator returns true if the string doesn't contain the given regexp and false otherwise
- === match operator returns true or false, similar to the match? method
>> sentence = 'This is a sample string'
# can also use: /is/ =~ sentence
>> sentence =~ /is/
=> 2
>> sentence =~ /z/
=> nil
# can also use: /z/ !~ sentence
>> sentence !~ /z/
=> true
>> sentence !~ /is/
=> false
Just like the match? method, both =~ and !~ can be used in a conditional statement.
>> sentence = 'This is a sample string'
>> puts 'hi' if sentence =~ /is/
hi
>> puts 'oh' if sentence !~ /z/
oh
The === operator comes in handy with Enumerable methods like grep, grep_v, all?, any?, etc.
>> sentence = 'This is a sample string'
# regexp literal has to be on LHS and input string on RHS
>> /is/ === sentence
=> true
>> /z/ === sentence
=> false
>> words = %w[cat attempt tattle]
>> words.grep(/tt/)
=> ["attempt", "tattle"]
>> words.all?(/at/)
=> true
>> words.none?(/temp/)
=> false
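The case statement also compares each when clause against the input using ===, so regexps can be used there directly. A minimal sketch, where the method classify and its labels are hypothetical:

```ruby
# Hypothetical classifier: each 'when' regexp is tested as /re/ === word
def classify(word)
  case word
  when /tt/ then 'double t'
  when /at/ then 'has at'
  else 'other'
  end
end

%w[cat attempt tattle dog].map { |w| classify(w) }
# => ["has at", "double t", "double t", "other"]
```

The first matching clause wins, which is why 'attempt' and 'tattle' are labelled 'double t' even though they also contain 'at'.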
A key difference from the match? method is that these operators will also set regexp related global variables.
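A minimal sketch of this difference, using the $~ global variable, which holds the MatchData of the last successful match. The method names are hypothetical. Since $~ is local to each method, it starts out as nil in both of them.

```ruby
# Uses =~, which records details of the match in $~
def matched_text_with_operator(text, pattern)
  text =~ pattern
  $~ ? $~[0] : nil
end

# Uses match?, which leaves $~ untouched
def matched_text_with_match_p(text, pattern)
  text.match?(pattern)
  $~ ? $~[0] : nil
end

sentence = 'This is a sample string'
matched_text_with_operator(sentence, /sample/)   # => "sample"
matched_text_with_match_p(sentence, /sample/)    # => nil
```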
Exercises
For practice problems, visit the Exercises.md file from this book's repository on GitHub.