Ruby Programming - Regular Expressions

Ruby Programming
Previous Page	Home	Next Page

Regular Expressions

Ruby Essentials
eBook

$8.99

eBookFrenzy.com

Most of Ruby's built-in types will be familiar to all programmers. A majority of languages have strings, integers, floats, arrays, and so on. However, until Ruby came along, regular expression support was generally built into only the so-called scripting languages, such as Perl, Python, and awk. This is a shame: regular expressions, although cryptic, are a powerful tool for working with text.

Entire books have been written about regular expressions (for example, Mastering Regular Expressions ), so we won't try to cover everything in just a short section. Instead, we'll look at just a few examples of regular expressions in action. You'll find full coverage of regular expressions starting on page 56.

A regular expression is simply a way of specifying a pattern of characters to be matched in a string. In Ruby, you typically create a regular expression by writing a pattern between slash characters (/pattern/). And, Ruby being Ruby, regular expressions are of course objects and can be manipulated as such.

For example, you could write a pattern that matches a string containing the text ``Perl'' or the text ``Python'' using the following regular expression.

/Perl|Python/

The forward slashes delimit the pattern, which consists of the two things we're matching, separated by a pipe character (``|''). You can use parentheses within patterns, just as you can in arithmetic expressions, so you could also have written this pattern as

/P(erl|ython)/

You can also specify repetition within patterns. /ab+c/ matches a string containing an ``a'' followed by one or more ``b''s, followed by a ``c''. Change the plus to an asterisk, and /ab*c/ creates a regular expression that matches an ``a'', zero or more ``b''s, and a ``c''.

You can also match one of a group of characters within a pattern. Some common examples are character classes such as ``\s'', which matches a whitespace character (space, tab, newline, and so on), ``\d'', which matches any digit, and ``\w'', which matches any character that may appear in a typical word. The single character ``.'' (a period) matches any character.

We can put all this together to produce some useful regular expressions.

/\d\d:\d\d:\d\d/     # a time such as 12:34:56
/Perl.*Python/       # Perl, zero or more other chars, then Python
/Perl\s+Python/      # Perl, one or more spaces, then Python
/Ruby (Perl|Python)/ # Ruby, a space, and either Perl or Python

Once you have created a pattern, it seems a shame not to use it. The match operator ``=~'' can be used to match a string against a regular expression. If the pattern is found in the string, =~ returns its starting position, otherwise it returns nil. This means you can use regular expressions as the condition in if and while statements. For example, the following code fragment writes a message if a string contains the text 'Perl' or 'Python'.

if line =~ /Perl|Python/
  puts "Scripting language mentioned: #{line}"
end

The part of a string matched by a regular expression can also be replaced with different text using one of Ruby's substitution methods.

line.sub(/Perl/, 'Ruby')    # replace first 'Perl' with 'Ruby'
line.gsub(/Python/, 'Ruby') # replace every 'Python' with 'Ruby'

We'll have a lot more to say about regular expressions as we go through the book.

Ruby Programming
Previous Page	Home	Next Page