Ruby Programming - Extensions

Extensions

In common with Perl and Python, Ruby regular expressions offer some extensions over traditional Unix regular expressions. All the extensions are written between the characters (? and ). The parentheses that bracket these extensions are groups, but they do not generate backreferences: they do not set \1, $1, and so on.

(?# comment)

Inserts a comment into the pattern. The content is ignored during pattern matching.
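
Here is a small illustration of a comment embedded in a pattern. The pattern and test string are made up for this sketch. The comment text does not change what matches, and it does not create a capture group.

```ruby
# The (?# ...) text is ignored by the matcher, so this pattern
# behaves exactly like /(\d+)-(\d+)/.
pattern = /(\d+)(?# area code)-(\d+)(?# local number)/
m = pattern.match("555-1234")
puts m[1]              # prints 555
puts m[2]              # prints 1234
puts m.captures.size   # prints 2: the comments are not groups
```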

(?:re)

Makes re into a group without generating backreferences. This is often useful when you need to group a set of constructs but don't want the group to set the value of $1 and so on. In the example that follows, both patterns match a date with either colons or slashes between the month, day, and year. The first form stores the separator character in $2 and $4, while the second pattern doesn't store the separator in an external variable.

date = "12/25/01"
date =~ %r{(\d+)(/|:)(\d+)(/|:)(\d+)}
[$1,$2,$3,$4,$5]   # => ["12", "/", "25", "/", "01"]
date =~ %r{(\d+)(?:/|:)(\d+)(?:/|:)(\d+)}
[$1,$2,$3]         # => ["12", "25", "01"]
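
The same comparison can be written with MatchData#captures, which makes the difference in group count visible directly. This sketch also shows that a (?:...) group can still be repeated as a unit. The "abcabcx" string is only an illustration.

```ruby
date = "12/25/01"
capturing     = %r{(\d+)(/|:)(\d+)(/|:)(\d+)}.match(date)
non_capturing = %r{(\d+)(?:/|:)(\d+)(?:/|:)(\d+)}.match(date)
p capturing.captures       # prints ["12", "/", "25", "/", "01"]
p non_capturing.captures   # prints ["12", "25", "01"]

# A non-capturing group can still take a quantifier.
repeated = "abcabcx"[/(?:abc)+/]
p repeated                 # prints "abcabc"
```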

(?=re)

Matches re at this point, but does not consume it (also known charmingly as "zero-width positive lookahead"). This lets you look forward for the context of a match without affecting $&. In this example, the scan method matches words followed by a comma, but the commas are not included in the result.

str = "red, white, and blue"
str.scan(/[a-z]+(?=,)/)   # => ["red", "white"]
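
To see that the lookahead leaves $& alone, the sketch below matches a number only when " dollars" follows it. The input string is hypothetical. The matched text is just the digits, and the looked-at context is still available after the match.

```ruby
str = "price: 100 dollars"
str =~ /\d+(?= dollars)/
matched = $&              # "100": the lookahead text is not consumed
rest    = $~.post_match   # " dollars" remains after the match
p matched                 # prints "100"
p rest                    # prints " dollars"
```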

(?!re)

Matches if re does not match at this point, without consuming any input (zero-width negative lookahead). For example, /hot(?!dog)(\w+)/ matches the letters "hot" when they are not immediately followed by "dog", and returns the rest of the word (at least one more word character) in $1.
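
Here is the /hot(?!dog)(\w+)/ pattern from the paragraph above, applied to a few sample words chosen for this sketch. String#[] with a regexp and a group number returns that group, or nil when there is no match.

```ruby
words = %w[hotdog hotter hotspot]
# "hotdog" fails because "dog" follows "hot"; the others return
# whatever comes after "hot" (group 1).
endings = words.map { |w| w[/hot(?!dog)(\w+)/, 1] }
p endings   # prints [nil, "ter", "spot"]
```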

(?>re)

Nests an independent regular expression within the first regular expression. This expression is anchored at the current match position. If it consumes characters, these will no longer be available to the higher-level regular expression. This construct therefore inhibits backtracking, which can improve performance. For example, the pattern /a.*b.*a/ takes time that grows with the square of the number of "b"s (the matcher tries every way of dividing the text between the two .* terms) when matched against a string containing an "a" followed by a number of "b"s but no trailing "a". This can be avoided by using a nested regular expression, /a(?>.*b).*a/. In this form, the nested expression consumes all of the input string up to the last possible "b" character. When the check for a trailing "a" then fails, there is no need to backtrack, and the match fails promptly.

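require "benchmark"
include Benchmark

# One "a" followed by 5000 "b"s and no trailing "a", so both patterns fail.
str = "a" + ("b" * 5000)
bm(8) do |test|   # 8 is the width of the label column
  test.report("Normal:") { str =~ /a.*b.*a/ }
  test.report("Nested:") { str =~ /a(?>.*b).*a/ }
end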

produces:

              user     system      total        real
Normal:   0.420000   0.000000   0.420000 (  0.414843)
Nested:   0.000000   0.000000   0.000000 (  0.001205)
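
The "no backtracking" behavior can change whether a pattern matches at all, not just how fast it runs. In this sketch, with an illustrative string, an ordinary a* gives back one character so the final "a" can match. The atomic (?>a*) keeps everything it took, so the match fails.

```ruby
str = "aaa"
# Ordinary repetition: a* first grabs "aaa", then backtracks one
# character so the trailing "a" can match.
greedy = str =~ /a*a/
# Atomic group: (?>a*) never gives a character back, so the trailing
# "a" has nothing left to match at any starting position.
atomic = str =~ /(?>a*)a/
p greedy   # prints 0
p atomic   # prints nil
```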

(?imx)

Turns on the corresponding "i" (ignore case), "m" (in Ruby, let . match a newline), or "x" (ignore whitespace and allow comments in the pattern) option. If used inside a group, the effect is limited to that group.
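
A short sketch of (?i) used in the middle of a pattern, with made-up strings. Outside a group, the option applies from that point to the end of the pattern. Inside a group, it stops at the group's closing parenthesis.

```ruby
# (?i) makes "rocks" case-insensitive; "ruby" stays case-sensitive.
pattern = /ruby (?i)rocks/
p(pattern =~ "ruby ROCKS")   # prints 0
p(pattern =~ "RUBY rocks")   # prints nil

# Inside a group, the option ends with the group: "c" is case-sensitive.
grouped = /(?:a(?i)b)c/
p(grouped =~ "aBc")   # prints 0
p(grouped =~ "aBC")   # prints nil
```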

(?-imx)

Turns off the "i," "m," or "x" option.
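
One common use of (?-i) is to switch off an option that was set with a flag after the closing slash. In this illustrative pattern, /i makes everything case-insensitive up to (?-i), and the rest is matched exactly.

```ruby
# "ruby" may be in any case; " rocks" must be lowercase.
pattern = /ruby(?-i) rocks/i
p(pattern =~ "RUBY rocks")   # prints 0
p(pattern =~ "RUBY ROCKS")   # prints nil
```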

(?imx:re)

Turns on the "i," "m," or "x" option for re.
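
The scoped form limits each option to the enclosed sub-pattern. This sketch uses made-up strings to show all three options. Note that in Ruby, "m" is the option that lets . match a newline.

```ruby
# i applies only inside the group
p(/(?i:ruby) rocks/ =~ "RuBy rocks")   # prints 0
p(/(?i:ruby) rocks/ =~ "RuBy ROCKS")   # prints nil

# m lets "." match a newline, here only inside the group
p(/a(?m:.)b/ =~ "a\nb")   # prints 0
p(/a.b/ =~ "a\nb")        # prints nil

# x ignores unescaped whitespace inside the group
p(/(?x: \d+ - \d+ )/ =~ "12-34")   # prints 0
```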

(?-imx:re)

Turns off the "i," "m," or "x" option for re.
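
Here the /i flag makes the whole pattern case-insensitive except the (?-i:...) group. The pattern and strings are illustrative only.

```ruby
# "Ruby" must match exactly; " rocks" may be in any case.
pattern = /(?-i:Ruby) rocks/i
p(pattern =~ "Ruby ROCKS")   # prints 0
p(pattern =~ "RUBY rocks")   # prints nil
```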
