Most of Ruby's built-in types will be familiar to all programmers. A
majority of languages have strings, integers, floats, arrays, and so
on. However, until Ruby came along, regular expression support was
generally built into only the so-called scripting languages, such as
Perl, Python, and awk. This is a shame: regular expressions, although
cryptic, are a powerful tool for working with text.
Entire books have been written about regular expressions (for example,
Mastering Regular Expressions), so we
won't try to cover everything in just a short section. Instead, we'll
look at just a few examples of regular expressions in action. You'll
find full coverage of regular expressions starting
on page 56.
A regular expression is simply a way of specifying a pattern of
characters to be matched in a string. In Ruby, you typically create a
regular expression by writing a pattern between slash characters
(/pattern/). And, Ruby being Ruby, regular expressions are of
course objects and can be manipulated as such.
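Because a regular expression is an ordinary object, it can be stored
in a variable, asked for its class, and built at run time. The
following is a minimal sketch; the variable names are illustrative
only and do not come from the text.

```ruby
# A regexp written as a literal between slashes.
literal = /Perl|Python/

# The same pattern built from a string at run time.
built = Regexp.new("Perl|Python")

puts literal.class    # Regexp
puts literal.source   # Perl|Python
puts built.source     # Perl|Python
```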
For example, you could write a pattern that matches a string
containing the text ``Perl'' or the text ``Python'' using the
following regular expression.
/Perl|Python/
The forward slashes delimit the pattern, which consists of the two
things we're matching, separated by a pipe character (``|'').
You can use parentheses within patterns, just as you can in arithmetic
expressions, so you could also have written this pattern as
/P(erl|ython)/
You can also specify repetition within patterns. /ab+c/ matches a
string containing an ``a'' followed by one or more ``b''s, followed by
a ``c''. Change the plus to an asterisk, and /ab*c/ creates a
regular expression that matches an ``a'', zero or more ``b''s, and a
``c''.
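The difference between the plus and the asterisk is easiest to see by
running both patterns over the same strings. This is a small sketch;
the sample strings and variable names are made up for illustration.

```ruby
samples = ["ac", "abc", "abbbc", "xyz"]

# /ab+c/ needs at least one "b" between the "a" and the "c".
plus_hits = samples.select { |s| s =~ /ab+c/ }

# /ab*c/ also accepts no "b" at all, so "ac" matches too.
star_hits = samples.select { |s| s =~ /ab*c/ }

p plus_hits   # ["abc", "abbbc"]
p star_hits   # ["ac", "abc", "abbbc"]
```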
You can also match one of a group of characters within a pattern. Some
common examples are character classes such as ``\s'', which
matches a whitespace character (space, tab, newline, and so on),
``\d'', which matches any digit, and ``\w'', which matches
any character that may appear in a typical word. The single character
``.'' (a period) matches any character except a newline.
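As a quick check of these character classes, the sketch below asks
where each one first matches in a sample string. The string and
variable names are assumptions chosen for illustration.

```ruby
text = "Room 42 is open"

digit_at = text =~ /\d/   # first digit, the "4"
space_at = text =~ /\s/   # first whitespace, after "Room"
word_at  = text =~ /\w/   # first word character, the "R"

p digit_at   # 5
p space_at   # 4
p word_at    # 0

# A period stands in for a single character, but by default
# it does not match a newline.
p "a-b"  =~ /a.b/   # 0
p "a\nb" =~ /a.b/   # nil
```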
We can put all this together to produce some useful regular
expressions.
/\d\d:\d\d:\d\d/ # a time such as 12:34:56
/Perl.*Python/ # Perl, zero or more other chars, then Python
/Perl\s+Python/ # Perl, one or more spaces, then Python
/Ruby (Perl|Python)/ # Ruby, a space, and either Perl or Python
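To see these four patterns at work, the following sketch tries each
one against a sample string and records where it matches. The sample
strings and variable names are invented for this illustration.

```ruby
time_at  = "Logged at 12:34:56" =~ /\d\d:\d\d:\d\d/
any_at   = "Perl and Python"    =~ /Perl.*Python/
space_at = "Perl   Python"      =~ /Perl\s+Python/
none_at  = "PerlPython"         =~ /Perl\s+Python/
ruby_at  = "I like Ruby Python" =~ /Ruby (Perl|Python)/

p time_at    # 10
p any_at     # 0
p space_at   # 0
p none_at    # nil, because \s+ needs at least one whitespace character
p ruby_at    # 7
```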
Once you have created a pattern, it seems a shame not to use it. The
match operator ``=~'' can be used to match a string against a
regular expression. If the pattern is found in the string, =~
returns its starting position, otherwise it returns nil. This means
you can use regular expressions as the condition in if and while
statements. For example, the following code fragment writes
a message if a string contains the text 'Perl' or 'Python'.
if line =~ /Perl|Python/
  puts "Scripting language mentioned: #{line}"
end
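The value returned by the match operator can also be inspected
directly. In this sketch the sample strings and variable names are
assumptions; it shows the starting position for a match and nil for a
miss, which is why the operator works as a condition.

```ruby
line = "Learning Python and Perl"

# =~ reports where the first match starts.
position = line =~ /Perl|Python/
p position   # 9

# With no match the result is nil, which counts as false in a condition.
missing = "Ruby only" =~ /Perl|Python/
p missing    # nil

verdict = position ? "mentioned" : "not mentioned"
puts verdict # mentioned
```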
The part of a string matched by a regular expression can also be
replaced with different text using one of Ruby's substitution methods.
line.sub(/Perl/, 'Ruby') # replace first 'Perl' with 'Ruby'
line.gsub(/Python/, 'Ruby') # replace every 'Python' with 'Ruby'
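Both methods return a new string and leave the original untouched, as
this short sketch shows. The sample text and variable names are made
up for illustration.

```ruby
line = "Perl, Python, and more Perl and Python"

first_only = line.sub(/Perl/, 'Ruby')     # only the first 'Perl' changes
every      = line.gsub(/Python/, 'Ruby')  # every 'Python' changes

puts first_only   # Ruby, Python, and more Perl and Python
puts every        # Perl, Ruby, and more Perl and Ruby
puts line         # Perl, Python, and more Perl and Python
```

To change the string in place instead, Ruby also provides the
sub! and gsub! variants.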
We'll have a lot more to say about regular expressions as we go
through the book.