Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

Ruby Programming
Previous Page Home Next Page

Extensions

In common with Perl and Python, Ruby regular expressions offer some extensions over traditional Unix regular expressions. All the extensions are entered between the characters (? and ). The parentheses that bracket these extensions are groups, but they do not generate backreferences: they do not set the values of \1 and $1 etc.

(?# comment)
Inserts a comment into the pattern. The content is ignored during pattern matching.

(?:re)
Makes re into a group without generating backreferences. This is often useful when you need to group a set of constructs but don't want the group to set the value of $1 or whatever. In the example that follows, both patterns match a date with either colons or spaces between the month, day, and year. The first form stores the separator character in $2 and $4, while the second pattern doesn't store the separator in an external variable.

date = "12/25/01"
date =~ %r{(\d+)(/|:)(\d+)(/|:)(\d+)}
[$1,$2,$3,$4,$5] ["12", "/", "25", "/", "01"]
date =~ %r{(\d+)(?:/|:)(\d+)(?:/|:)(\d+)}
[$1,$2,$3] ["12", "25", "01"]

(?=re)
Matches re at this point, but does not consume it (also known charmingly as ``zero-width positive lookahead''). This lets you look forward for the context of a match without affecting $&. In this example, the scan method matches words followed by a comma, but the commas are not included in the result.

str = "red, white, and blue"
str.scan(/[a-z]+(?=,)/) ["red", "white"]

(?!re)
Matches if re does not match at this point. Does not consume the match (zero-width negative lookahead). For example, /hot(?!dog)(\w+)/ matches any word that contains the letters ``hot'' that aren't followed by ``dog'', returning the end of the word in $1.

(?>re)
Nests an independent regular expression within the first regular expression. This expression is anchored at the current match position. If it consumes characters, these will no longer be available to the higher-level regular expression. This construct therefore inhibits backtracking, which can be a performance enhancement. For example, the pattern /a.*b.*a/ takes exponential time when matched against a string containing an ``a'' followed by a number of ``b''s, but with no trailing ``a.'' However, this can be avoided by using a nested regular expression /a(?>.*b).*a/. In this form, the nested expression consumes all the the input string up to the last possible ``b'' character. When the check for a trailing ``a'' then fails, there is no need to backtrack, and the pattern match fails promptly.

require "benchmark"
include Benchmark
str = "a" + ("b" * 5000)
bm(8) do |test|
  test.report("Normal:") { str =~ /a.*b.*a/ }
  test.report("Nested:") { str =~ /a(?>.*b).*a/ }
end
produces:
              user     system      total        real
Normal:   0.420000   0.000000   0.420000 (  0.414843)
Nested:   0.000000   0.000000   0.000000 (  0.001205)

(?imx)
Turns on the corresponding ``i,'' ``m,'' or ``x'' option. If used inside a group, the effect is limited to that group.

(?-imx)
Turns off the ``i,'' ``m,'' or ``x'' option.

(?imx:re)
Turns on the ``i,'' ``m,'' or ``x'' option for re.

(?-imx:re)
Turns off the ``i,'' ``m,'' or ``x'' option for re.

Ruby Programming
Previous Page Home Next Page

 
 
  Published under the terms of the Open Publication License Design by Interspire