Sundeep

Posted on Mar 25, 2019 • Edited on Jan 11, 2021 • Originally published at learnbyexample.github.io

Ruby Regexp Part 1 - Introduction

#ruby #regex #ebook

Preface & Prerequisites

This series will teach you Ruby regular expressions from basics to advanced levels with plenty of examples and exercises.

You should have prior experience working with Ruby and be familiar with concepts like blocks, string formats, string methods, Enumerable, etc.

You can get free pdf/epub versions of the book using these links:

Why is it needed?

Regular expressions are a versatile tool for text processing. You'll find them included as part of the standard library of most programming languages that are used for scripting purposes. If not, you can usually find a third-party library. The syntax and features of regular expressions vary from language to language. Ruby's offering is based upon the Onigmo regular expressions library.

The String class comes loaded with a variety of methods to deal with text. So, what's so special about regular expressions and why would you need them? For learning and understanding purposes, one can view regular expressions as a mini programming language in itself, specialized for text processing. Parts of a regular expression can be saved for future use, analogous to variables and functions. There are ways to perform AND, OR, NOT conditionals. There are also operations similar to the range and string repetition operators, and so on.

Here are some common use cases:

Sanitizing a string to ensure that it satisfies a known set of rules. For example, to check if a given string matches password rules.
Filtering or extracting portions of text at an abstract level, such as letters, numbers, punctuation and so on.
Qualified string replacement. For example, at the start or the end of a string, only whole words, based on surrounding text, etc.

You are likely to be familiar with graphical search and replace tools, like the one from LibreOffice Writer shown in the screenshot below. Match case, Whole words only, Replace and Replace All are some of the basic features supported by regular expressions.

Another real world use case is password validation. The screenshot below is from the GitHub sign up page. Performing multiple checks, like string length and the type of characters allowed, is another core feature of regular expressions.

Here are some articles on regular expressions that cover their history and the types of problems they are suited for.

The true power of regular expressions — it also includes a nice explanation of what regular means in this context
softwareengineering: Is it a must for every programmer to learn regular expressions?
softwareengineering: When you should NOT use Regular Expressions?
codinghorror: Now You Have Two Problems
wikipedia: Regular expression — this article includes discussion on regular expressions as a formal language as well as details on various implementations

Regexp introduction

In this chapter, you'll get to know how to declare and use regexps. For some examples, the equivalent normal string method is shown for comparison. Regular expression features will be covered from the next chapter onwards. The main focus here is to get you comfortable with the syntax and with text processing examples. Three methods will be introduced in this chapter: the match? method to check whether the input contains a match, and the sub and gsub methods to substitute a portion of the input with something else.

This book will use the terms regular expressions and regexp interchangeably.

Regexp documentation

It is always a good idea to know where to find the documentation. Visit ruby-doc: Regexp for information on the Regexp class, available methods, syntax, features, examples and more. Here's a quote:

Regular expressions (regexps) are patterns which describe the contents of a string. They're used for testing whether a string contains a given pattern, or extracting the portions that match. They are created with the /pat/ and %r{pat} literals or the Regexp.new constructor.
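As a quick illustration of the three construction forms named in the quote, here's a minimal sketch. The variable names and sample strings are made up for this example.

```ruby
# three ways to construct the same regexp
literal     = /is/
percent_r   = %r{is}
constructed = Regexp.new('is')

# Regexp#== is true when the pattern and modifiers are the same
puts literal == percent_r      # true
puts literal == constructed    # true

# %r{} is handy when the pattern contains forward slashes,
# since they don't need to be escaped
bin_path = %r{/usr/bin}
puts '/usr/bin/ruby'.match?(bin_path)    # true
```

The /pat/ form is used for the rest of this chapter.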

match? method

First up, a simple example to test whether a string is part of another string or not. Normally, you'd use the include? method and pass a string as the argument. For regular expressions, use the match? method and enclose the search string within // delimiters (regexp literal).

>> sentence = 'This is a sample string'

# check if 'sentence' contains the given string argument
>> sentence.include?('is')
=> true
>> sentence.include?('z')
=> false

# check if 'sentence' matches the pattern as described by the regexp argument
>> sentence.match?(/is/)
=> true
>> sentence.match?(/z/)
=> false

The match? method accepts an optional second argument which specifies the index to start searching from.

>> sentence = 'This is a sample string'

>> sentence.match?(/is/, 2)
=> true
>> sentence.match?(/is/, 6)
=> false
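If you'd like to see why the starting index changes the result, the String#index method accepts a regexp and a starting position too, and returns where the match was found. It isn't one of the methods this chapter focuses on, so treat this as a side sketch only.

```ruby
sentence = 'This is a sample string'

# position of the first match at or after the given index
p sentence.index(/is/)       # 2
p sentence.index(/is/, 3)    # 5
# no 'is' from index 6 onwards, which is why match?(/is/, 6) is false
p sentence.index(/is/, 6)    # nil
```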

Some regular expression functionality is enabled by passing modifiers, each represented by a single letter. If you have used the command line, modifiers are similar to command options; for example, grep -i performs case insensitive matching. Modifiers will be discussed in detail in the Modifiers chapter. Here's an example for the i modifier.

>> sentence = 'This is a sample string'

>> sentence.match?(/this/)
=> false
# 'i' is a modifier to enable case insensitive matching
>> sentence.match?(/this/i)
=> true
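For comparison, here's a small sketch of how the same check might look with plain string methods, and how the modifier can be passed when using the Regexp.new constructor. The variable name pat is just for illustration.

```ruby
sentence = 'This is a sample string'

# without regexps, one option is to normalize the case first
puts sentence.downcase.include?('this')    # true

# the i modifier expresses the same intent in the pattern itself
puts sentence.match?(/this/i)              # true

# Regexp.new takes the modifier as an optional second argument
pat = Regexp.new('this', Regexp::IGNORECASE)
puts pat == /this/i                        # true
puts sentence.match?(pat)                  # true
```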

Regexp literal reuse and interpolation

A regexp literal can be saved in a variable. This helps to improve code clarity, lets you pass the regexp around as a method argument, enables reuse, etc.

>> pet = /dog/i
>> pet
=> /dog/i

>> 'They bought a Dog'.match?(pet)
=> true
>> 'A cat crossed their path'.match?(pet)
=> false
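Here's a minimal sketch of passing a regexp as a method argument. The count_matching method and the sample lines are hypothetical, made up for this example.

```ruby
# hypothetical helper: counts the lines that match the given regexp
def count_matching(lines, pattern)
  lines.count { |line| line.match?(pattern) }
end

pet = /dog/i
lines = ['They bought a Dog', 'A cat crossed their path', 'hot dogs for lunch']

puts count_matching(lines, pet)      # 2
puts count_matching(lines, /cat/)    # 1
```

The same method works for any pattern, so the matching rule can change without touching the method body.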

Similar to double quoted string literals, you can use interpolation and escape sequences in a regexp literal. See ruby-doc: Strings for syntax details on string escape sequences. Regexp literals have their own special escapes, which will be discussed in the Escape sequences section.

>> "cat\tdog".match?(/\t/)
=> true
>> "cat\tdog".match?(/\a/)
=> false

>> greeting = 'hi'
>> /#{greeting} there/
=> /hi there/
>> /#{greeting.upcase} there/
=> /HI there/
>> /#{2**4} apples/
=> /16 apples/
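Interpolation is handy when part of the pattern is only known at run time. The sketch below uses a hypothetical greeting_regexp method to build a pattern from its argument. Note that the interpolated text becomes part of the pattern itself, so any regexp special characters in it keep their special meaning.

```ruby
# hypothetical helper: builds a case insensitive regexp for "<greeting> there"
def greeting_regexp(greeting)
  /#{greeting} there/i
end

p greeting_regexp('hi')                                # /hi there/i
puts 'Hi there, Ruby'.match?(greeting_regexp('hi'))    # true
puts 'Hello there'.match?(greeting_regexp('hi'))       # false
```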

sub and gsub methods

For search and replace, use the sub or gsub methods. The sub method replaces only the first occurrence of the match, whereas gsub replaces all the occurrences. The regexp pattern to match against the input string has to be passed as the first argument. The second argument specifies the string which will replace the portions matched by the pattern.

>> greeting = 'Have a nice weekend'

# replace first occurrence of 'e' with 'E'
>> greeting.sub(/e/, 'E')
=> "HavE a nice weekend"
# replace all occurrences of 'e' with 'E'
>> greeting.gsub(/e/, 'E')
=> "HavE a nicE wEEkEnd"
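A couple more variations, as a small sketch: an empty replacement string deletes the matched portions, and modifiers can be used with these methods as well.

```ruby
greeting = 'Have a nice weekend'

# an empty replacement string deletes the matched portions
puts greeting.gsub(/e/, '')     # Hav a nic wknd

# modifiers work here too: replace the first 'h' or 'H'
puts greeting.sub(/h/i, 'S')    # Save a nice weekend

# the original string is not modified
puts greeting                   # Have a nice weekend
```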

Use the sub! and gsub! methods for in-place substitution.

>> word = 'cater'

# this will return a string object, won't modify 'word' variable
>> word.sub(/cat/, 'wag')
=> "wager"
>> word
=> "cater"

# this will modify 'word' variable itself
>> word.sub!(/cat/, 'wag')
=> "wager"
>> word
=> "wager"
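One thing to watch out for: sub! and gsub! return nil when nothing was substituted, so chaining another method on the result can fail. Here's a minimal sketch (dup is used only to make sure the string is modifiable).

```ruby
word = 'cater'.dup

# no match: sub! returns nil and leaves 'word' as is
p word.sub!(/dog/, 'wag')    # nil
p word                       # "cater"

# the return value can be used to check if anything changed
if word.gsub!(/e/, 'E')
  puts "changed to #{word}"  # changed to catEr
else
  puts 'nothing to replace'
end
```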

Regexp operators

Ruby also provides operators for regexp matching.

=~ match operator returns index of the first match and nil if match is not found
!~ match operator returns true if string doesn't contain the given regexp and false otherwise
=== match operator returns true or false similar to the match? method

>> sentence = 'This is a sample string'

# can also use: /is/ =~ sentence
>> sentence =~ /is/
=> 2
>> sentence =~ /z/
=> nil

# can also use: /z/ !~ sentence
>> sentence !~ /z/
=> true
>> sentence !~ /is/
=> false

Just like the match? method, both =~ and !~ can be used in a conditional statement.

>> sentence = 'This is a sample string'

>> puts 'hi' if sentence =~ /is/
hi

>> puts 'oh' if sentence !~ /z/
oh
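If you are coming from a language where 0 is treated as false, note that a match at index 0 still works in a conditional, since only nil and false are falsy in Ruby. A short sketch, with pos as an illustrative variable name:

```ruby
sentence = 'This is a sample string'

# a match at the very start gives index 0, which is truthy in Ruby
p(sentence =~ /This/)                       # 0
puts 'starts here' if sentence =~ /This/    # starts here

# the index can be saved and used in the same conditional
if (pos = sentence =~ /sample/)
  puts "found at index #{pos}"              # found at index 10
end
```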

The === operator comes in handy with Enumerable methods like grep, grep_v, all?, any?, etc.

>> sentence = 'This is a sample string'

# regexp literal has to be on LHS and input string on RHS
>> /is/ === sentence
=> true
>> /z/ === sentence
=> false

>> words = %w[cat attempt tattle]
>> words.grep(/tt/)
=> ["attempt", "tattle"]
>> words.all?(/at/)
=> true
>> words.none?(/temp/)
=> false
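The === operator is also what case statements use to compare each when clause, so a regexp can be used there directly. Here's a minimal sketch with a hypothetical classify method, along with the grep_v and any? methods mentioned above.

```ruby
# hypothetical helper: classifies a word using regexps in 'when' clauses
def classify(word)
  case word
  when /tt/ then 'has tt'
  when /at/ then 'has at'
  else 'other'
  end
end

words = %w[cat attempt tattle dog]
words.each { |w| puts "#{w}: #{classify(w)}" }
# cat: has at
# attempt: has tt
# tattle: has tt
# dog: other

# grep_v returns the elements that do NOT match
p words.grep_v(/tt/)    # ["cat", "dog"]
p words.any?(/dog/)     # true
```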

A key difference from the match? method is that these operators will also set regexp related global variables.
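As a small sketch of this difference, the example below inspects $~, one of those variables, which holds a MatchData object for the last match (or nil if the match failed).

```ruby
sentence = 'This is a sample string'

# =~ sets $~ to a MatchData object describing the match
sentence =~ /sam/
p $~[0]    # "sam"

# a failed match resets $~ to nil
sentence =~ /z/
p $~       # nil

# match? doesn't touch $~, so it is still nil here
sentence.match?(/is/)
p $~       # nil
```

If you only need a true or false result, match? avoids this extra bookkeeping.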

Exercises

For practice problems, visit the Exercises.md file from this book's repository on GitHub.
