Earlier we noted that the sequences \1, \2, and so on are available in the pattern, standing for the nth group matched so far. The same sequences are available in the second argument of sub and gsub.

    "fred:smith".sub(/(\w+):(\w+)/, '\2, \1')    »  "smith, fred"
    "nercpyitno".gsub(/(.)(.)/, '\2\1')          »  "encryption"
There are additional backslash sequences that work in substitution strings: \& (last match), \+ (last matched group), \` (string prior to match), \' (string after match), and \\ (a literal backslash).
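To see what some of these sequences expand to, here is a small sketch. The helper show_sequence and its sample string are our own illustration, not part of the library; the helper wraps the expansion in brackets so the boundaries are visible. Note that \' has to be written in a double-quoted string here, because inside single quotes the backslash would simply escape the quote.

```ruby
# Illustrative helper (not from the text): replace "two" with the given
# replacement sequence, wrapped in brackets so the expansion stands out.
def show_sequence(replacement)
  "one two three".sub(/two/, "[#{replacement}]")
end

puts show_sequence('\&')    # whole match:           one [two] three
puts show_sequence('\`')    # text before the match: one [one ] three
puts show_sequence("\\'")   # text after the match:  one [ three] three
puts show_sequence('\\\\')  # literal backslash:     one [\] three
```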
It gets confusing if you want to include a literal backslash in a substitution. The obvious thing is to write

    str.gsub(/\\/, '\\\\')
Clearly, this code is trying to replace each backslash in str with two. The programmer doubled up the backslashes in the replacement text, knowing that they'd be converted to ``\\'' in syntax analysis. However, when the substitution occurs, the regular expression engine performs another pass through the string, converting ``\\'' to ``\'', so the net effect is to replace each single backslash with another single backslash. You need to write gsub(/\\/, '\\\\\\\\')!

    str = 'a\b\c'                  »  "a\b\c"
    str.gsub(/\\/, '\\\\\\\\')     »  "a\\b\\c"
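The two passes can be made visible by checking how many characters each replacement string actually holds before gsub sees it. This is only a sketch; the method name replace_backslashes is our own, chosen for illustration.

```ruby
# Illustrative wrapper (name is ours): replace every backslash in str
# with the given replacement string.
def replace_backslashes(str, replacement)
  str.gsub(/\\/, replacement)
end

# Pass 1 (syntax analysis): each pair of backslashes in the source
# becomes a single backslash in the string object.
puts '\\\\'.length          # 2
puts '\\\\\\\\'.length      # 4

# Pass 2 (substitution): each remaining pair collapses to one backslash.
puts replace_backslashes('a\b\c', '\\\\')          # a\b\c   (unchanged)
puts replace_backslashes('a\b\c', '\\\\\\\\')      # a\\b\\c (doubled)
```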
However, using the fact that \& is replaced by the matched string, you could also write

    str = 'a\b\c'                  »  "a\b\c"
    str.gsub(/\\/, '\&\&')         »  "a\\b\\c"
If you use the block form of gsub, the string for substitution is analyzed only once (during the syntax pass) and the result is what you intended.

    str = 'a\b\c'                  »  "a\b\c"
    str.gsub(/\\/) { '\\\\' }      »  "a\\b\\c"
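Because the block's result is inserted as is, sequences such as \1 and \& are not expanded in the block form. The match variables $1, $2, and $& do the same job there. The following sketch reworks two of the earlier examples that way; the method names are ours, for illustration only.

```ruby
# Block form: the return value is used verbatim, so we refer to the
# match through $1, $2, and $& rather than backslash sequences.
def swap_names(s)
  s.sub(/(\w+):(\w+)/) { "#{$2}, #{$1}" }
end

def double_backslashes(s)
  s.gsub(/\\/) { $& * 2 }   # $& is the matched text, here one backslash
end

puts swap_names("fred:smith")       # smith, fred
puts double_backslashes('a\b\c')    # a\\b\\c
```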
Finally, as an example of the wonderful expressiveness of combining regular expressions with code blocks, consider the following code fragment from the CGI library module, written by Wakou Aoyama. The code takes a string containing HTML escape sequences and converts it into normal ASCII. Because it was written for a Japanese audience, it uses the ``n'' modifier on the regular expressions, which turns off wide-character processing. It also illustrates Ruby's case expression, which we discuss starting on page 81.

    def unescapeHTML(string)
      str = string.dup
      str.gsub!(/&(.*?);/n) {
        match = $1.dup
        case match
        when /\Aamp\z/ni           then '&'
        when /\Aquot\z/ni          then '"'
        when /\Agt\z/ni            then '>'
        when /\Alt\z/ni            then '<'
        when /\A#(\d+)\z/n         then Integer($1).chr
        when /\A#x([0-9a-f]+)\z/ni then $1.hex.chr
        end
      }
      str
    end

    puts unescapeHTML("1&lt;2 &amp;&amp; 4&gt;3")
    puts unescapeHTML("&quot;A&quot; = &#65; = &#x41;")

produces: