In common with Perl and Python, Ruby regular expressions offer some
extensions over traditional Unix regular expressions. All the extensions are
entered between the characters (? and ). The parentheses
that bracket these extensions are groups, but they do not generate
backreferences: they do not set the values of \1, $1,
and so on.
-
(?# comment)
- Inserts a comment into the pattern. The content is ignored during
pattern matching.
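As a small illustration of the comment extension, the sketch below annotates the two halves of a clock-time pattern. The constant name TIME_RE and the sample string are assumptions made up for this example.

```ruby
# TIME_RE is a hypothetical pattern, introduced only for illustration.
# The (?# ...) parts document the pattern and play no role in matching.
TIME_RE = /(\d\d)(?# hours):(\d\d)(?# minutes)/

m = TIME_RE.match("Lunch is at 12:30 today")
puts m[1]   # the hours digits
puts m[2]   # the minutes digits
```

Note that the comment text itself cannot contain a closing parenthesis, because that character ends the comment.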
-
(?:re)
-
Makes re into a group without generating backreferences. This
is often useful when you need to group a set of constructs but don't
want the group to set the value of $1 or any other group variable. In the
example that follows, both patterns match a date with either slashes
or colons between the month, day, and year. The first form stores
the separator characters in $2 and $4, while the second
pattern doesn't store the separators in any variable.
    date = "12/25/01"
    date =~ %r{(\d+)(/|:)(\d+)(/|:)(\d+)}
    [$1,$2,$3,$4,$5]   »   ["12", "/", "25", "/", "01"]
    date =~ %r{(\d+)(?:/|:)(\d+)(?:/|:)(\d+)}
    [$1,$2,$3]         »   ["12", "25", "01"]
-
(?=re)
-
Matches re at this point, but does not consume it (also known
charmingly as ``zero-width positive lookahead''). This lets
you look forward for the context of a match without affecting
$&. In this example, the scan method matches words
followed by a comma, but the commas are not included in the result.
    str = "red, white, and blue"
    str.scan(/[a-z]+(?=,)/)   »   ["red", "white"]
-
(?!re)
-
Matches if re does not match at this point. Does not
consume the match (zero-width negative lookahead). For example,
/hot(?!dog)(\w+)/ matches any word that contains the
letters ``hot'' not immediately followed by ``dog'', returning the rest
of the word in $1.
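The behavior just described can be tried out with a short sketch. The helper name hot_suffix and the word list are hypothetical, chosen only for this illustration.

```ruby
# hot_suffix is a hypothetical helper: it returns the part of the word
# captured after "hot", or nil when the pattern does not match.
def hot_suffix(word)
  word =~ /hot(?!dog)(\w+)/ ? $1 : nil
end

%w[hotdog hotel hothouse shot].each do |word|
  puts "#{word}: #{hot_suffix(word).inspect}"
end
```

Here ``hotdog'' is rejected by the lookahead, while ``shot'' fails for a different reason: nothing follows ``hot'' for (\w+) to capture.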
-
(?>re)
-
Nests an independent regular expression within the first regular
expression.
This expression is anchored at the current match position. If it
consumes characters, these will no longer be available to the
higher-level regular expression. This construct therefore inhibits
backtracking, which can be a performance enhancement. For example,
the pattern /a.*b.*a/ can take time that grows quadratically with
the length of the input when matched
against a string containing an ``a'' followed by a number of ``b''s,
but with no trailing ``a.'' However, this can be avoided by using a
nested regular expression /a(?>.*b).*a/. In this form, the
nested expression consumes all of the input string up to the last
possible ``b'' character. When the check for a trailing ``a'' then
fails, there is no need to backtrack, and the pattern match fails promptly.
    require "benchmark"
    include Benchmark
    str = "a" + ("b" * 5000)
    bm(8) do |test|
      test.report("Normal:") { str =~ /a.*b.*a/ }
      test.report("Nested:") { str =~ /a(?>.*b).*a/ }
    end
produces:
                  user     system      total        real
    Normal:   0.420000   0.000000   0.420000 (  0.414843)
    Nested:   0.000000   0.000000   0.000000 (  0.001205)
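Besides speed, the nested form can also change what a pattern matches, because characters consumed inside (?>re) are never given back to the rest of the pattern. A minimal sketch, using a made-up string:

```ruby
str = "aaab"   # sample input, assumed for this illustration

# Ordinary repetition: a* first takes "aaa", then gives one "a" back
# so that the trailing "ab" can match.
p(str =~ /a*ab/)        # a match starting at index 0

# Nested form: whatever a* consumes stays consumed, so no "a" is ever
# left over for the trailing "ab" and the match fails.
p(str =~ /(?>a*)ab/)    # nil
```

For this reason the nested form is not a drop-in optimization: it is safe only when the rest of the pattern never needs characters that the nested expression has already taken.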
-
(?imx)
-
Turns on the corresponding ``i,'' ``m,'' or ``x'' option. If used
inside a group, the effect is limited to that group.
-
(?-imx)
-
Turns off the ``i,'' ``m,'' or ``x'' option.
-
(?imx:re)
-
Turns on the ``i,'' ``m,'' or ``x'' option for re.
-
(?-imx:re)
-
Turns off the ``i,'' ``m,'' or ``x'' option for re.
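The four option forms above can be compared side by side. The following is a sketch only, and every pattern and string in it is made up for the illustration.

```ruby
# (?i) turns case-insensitivity on from that point in the pattern.
p(/(?i)ruby/ =~ "RUBY")            # matches at index 0

# (?i:re) turns the option on for the enclosed group only.
p(/a(?i:b)c/ =~ "aBc")             # matches: only "b" ignores case
p(/a(?i:b)c/ =~ "ABc")             # nil: "a" is still case-sensitive

# (?-i:re) turns the option off for the group, here inside an /i pattern.
p(/a(?-i:b)c/i =~ "AbC")           # matches: "b" must be lowercase
p(/a(?-i:b)c/i =~ "ABC")           # nil

# (?m:re) lets "." match a newline inside the group.
p(/a(?m:.)b/ =~ "a\nb")            # matches at index 0
p(/a.b/ =~ "a\nb")                 # nil without the option

# (?x:re) ignores whitespace in the pattern inside the group.
p(/(?x: \d+ \s* kg )/ =~ "12 kg")  # matches at index 0
```

The group forms are usually the easier ones to read, since the extent of the option is visible from the parentheses.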