Quantifiers
A quantifier describes the way that a pattern absorbs input text:
- Greedy: Quantifiers are greedy unless otherwise altered. A greedy expression matches as much of the input as it can while still allowing the overall pattern to succeed. A typical cause of problems is to assume that your pattern will match only the first possible group of characters, when it is actually greedy and will keep going.
- Reluctant: Specified with a question mark, this quantifier matches the minimum number of characters necessary to satisfy the pattern. Also called lazy, minimal matching, non-greedy, or ungreedy.
- Possessive: Specified with a plus sign after the quantifier. It is available in Java and in some, but not all, other regular expression engines, and it is more advanced, so you probably won’t use it right away. As a regular expression is applied to a string, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states, and thus prevent backtracking. They can be used to prevent a regular expression from running away and also to make it execute more efficiently.
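The difference between the three kinds is easiest to see by applying the same pattern, changed only in its quantifier, to one input. The following is a minimal sketch; the class name QuantifierKinds and the helper firstMatch() are hypothetical names chosen for illustration, and only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierKinds {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ grabs everything, then backs off just far enough to find a '>'.
        System.out.println(firstMatch("<.+>", input));   // <a><b>
        // Reluctant: .+? takes as little as possible before the first '>'.
        System.out.println(firstMatch("<.+?>", input));  // <a>
        // Possessive: .++ grabs everything and never gives any of it back,
        // so the trailing '>' can never match.
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```

The possessive case fails outright here, which shows the trade-off: no backtracking means faster failure, but also that a possessive quantifier must not swallow text that a later part of the pattern needs.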
| Greedy | Reluctant | Possessive | Matches |
| X? | X?? | X?+ | X, one or none |
| X* | X*? | X*+ | X, zero or more |
| X+ | X+? | X++ | X, one or more |
| X{n} | X{n}? | X{n}+ | X, exactly n times |
| X{n,} | X{n,}? | X{n,}+ | X, at least n times |
| X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times |
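The counted forms in the table can be checked the same way. This sketch collects every successive match of a pattern against a run of five ‘a’ characters; the class name CountedQuantifiers and the helper allMatches() are hypothetical, introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountedQuantifiers {
    // Collects every successive match of regex in input, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "aaaaa";
        System.out.println(allMatches("a{2}", input));     // [aa, aa]
        System.out.println(allMatches("a{2,}", input));    // [aaaaa]
        System.out.println(allMatches("a{2,3}", input));   // [aaa, aa]
        System.out.println(allMatches("a{2,3}?", input));  // [aa, aa]
    }
}
```

Note that the greedy X{n,m} takes up to m occurrences when it can, while the reluctant X{n,m}? stops at n as soon as the rest of the pattern is satisfied.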
Be aware that the expression ‘X’ will often need to be surrounded by parentheses for it to work the way you intend. For example, the expression:
abc+
might seem like it would match the sequence ‘abc’ one or more times, and if you apply it to the input string ‘abcabcabc’, you will in fact get three matches. However, the expression actually says “match ‘ab’ followed by one or more occurrences of ‘c’.” To match the entire string ‘abc’ one or more times, you must say:
(abc)+
You can easily be fooled when using regular expressions; it’s a new language, on top of Java.
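A short sketch makes the difference between the two expressions concrete. The class name GroupingDemo and the helper countMatches() are hypothetical; Pattern.matches() is the standard static method that tests whether the entire input matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingDemo {
    // Counts how many separate, non-overlapping matches find() locates in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Without parentheses, + applies only to 'c': three separate "abc" matches.
        System.out.println(countMatches("abc+", "abcabcabc"));    // 3
        // With parentheses, + applies to the whole group: one match spanning the input.
        System.out.println(countMatches("(abc)+", "abcabcabc"));  // 1
        // The ungrouped form accepts repeated 'c' characters; the grouped form does not.
        System.out.println(Pattern.matches("abc+", "abccc"));     // true
        System.out.println(Pattern.matches("(abc)+", "abccc"));   // false
    }
}
```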
CharSequence
JDK 1.4 defines a new interface called CharSequence, which establishes a definition of a character sequence abstracted from the String or StringBuffer classes:
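interface CharSequence {
  char charAt(int i);                            // character at index i
  int length();                                  // number of characters
  CharSequence subSequence(int start, int end);  // from start (inclusive) to end (exclusive)
  String toString();                             // the same characters, as a String
}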
The String, StringBuffer, and CharBuffer classes have been modified to implement this new CharSequence interface. Many regular expression operations take CharSequence arguments.
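Because Pattern.matcher() takes a CharSequence, one method can serve all three of these classes. The following is a minimal sketch; the class name CharSequenceArgs and the method containsDigits() are hypothetical names used only for illustration.

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

class CharSequenceArgs {
    // Accepts any CharSequence and reports whether it contains a run of digits.
    static boolean containsDigits(CharSequence text) {
        return Pattern.compile("\\d+").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsDigits("JDK 1.4"));                     // true
        System.out.println(containsDigits(new StringBuffer("abc 42")));    // true
        System.out.println(containsDigits(CharBuffer.wrap("no digits")));  // false
    }
}
```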