4.1. Regular expressions
4.1.1. What are regular expressions?
A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions by using various operators to combine smaller expressions.
The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash.
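The quoting rule can be illustrated with a minimal sketch, assuming a POSIX-like shell with `grep` available. The sample strings and variable names are invented for this example:

```shell
# Sample input invented for illustration: only the first line contains a literal dot.
sample=$(printf '%s\n' 'a.c' 'abc' 'a-c')

# Unquoted, the dot is a metacharacter that matches any single character,
# so all three lines are counted.
unquoted=$(printf '%s\n' "$sample" | grep -c 'a.c')

# Preceded by a backslash, the dot only matches a literal dot,
# so a single line is counted.
quoted=$(printf '%s\n' "$sample" | grep -c 'a\.c')

echo "unquoted: $unquoted, quoted: $quoted"
```

The single quotes around each pattern keep the shell from interpreting the backslash before `grep` sees it.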
4.1.2. Regular expression metacharacters
A regular expression may be followed by one of several repetition operators. The table below lists these together with the other metacharacters, such as anchors and word boundaries:
Table 4-1. Regular expression operators
Operator | Effect |
---|---|
. | Matches any single character. |
? | The preceding item is optional and will be matched, at most, once. |
* | The preceding item will be matched zero or more times. |
+ | The preceding item will be matched one or more times. |
{N} | The preceding item is matched exactly N times. |
{N,} | The preceding item is matched N or more times. |
{N,M} | The preceding item is matched at least N times, but not more than M times. |
- | Represents a range within a bracket list, unless it is the first or last character of the list or the ending point of a range. |
^ | Matches the empty string at the beginning of a line; as the first character of a bracket list, negates the list so that it matches any character not listed. |
$ | Matches the empty string at the end of a line. |
\b | Matches the empty string at the edge of a word. |
\B | Matches the empty string provided it's not at the edge of a word. |
\< | Matches the empty string at the beginning of a word. |
\> | Matches the empty string at the end of a word. |
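A few of the operators from the table can be tried out as follows. This is only a sketch: the word list is made up, `grep -E` is used so that "?" and "{N}" work unescaped, and "\<" and "\>" are assumed to be available as they are in GNU grep.

```shell
# Hypothetical word list used only for this illustration.
words=$(printf '%s\n' 'color' 'colour' 'colouur' 'cat' 'concat')

# "?" makes the preceding "u" optional; "^" and "$" anchor the match to the whole line.
optional=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# "{2}" requires exactly two consecutive "u" characters.
exact=$(printf '%s\n' "$words" | grep -E '^colou{2}r$')

# "^" anchors the match at the start of the line, so "concat" is not selected.
anchored=$(printf '%s\n' "$words" | grep '^cat')

# "\<" and "\>" (assumed GNU grep behaviour) only match "cat" as a whole word.
whole_word=$(printf '%s\n' 'cat concat' 'concat' | grep '\<cat\>')

printf 'optional u:  %s\n' $optional
printf 'exactly two: %s\n' "$exact"
printf 'anchored:    %s\n' "$anchored"
printf 'whole word:  %s\n' "$whole_word"
```

The first pattern selects "color" and "colour", the second only "colouur", the third only "cat", and the last only the line "cat concat".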
Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.
Two regular expressions may be joined by the infix operator "|"; the resulting regular expression matches any string matching either subexpression.
Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression may be enclosed in parentheses to override these precedence rules.
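The precedence rules can be checked with a short sketch, assuming `grep` with the standard `-E` (extended expressions) and `-x` (match the whole line) options. The input lines are invented for the example:

```shell
# Sample lines invented for illustration.
lines=$(printf '%s\n' 'ab' 'abab' 'abb' 'cd')

# Repetition binds tighter than concatenation:
# "ab+" means "a" followed by one or more "b", not one or more "ab".
repeat_b=$(printf '%s\n' "$lines" | grep -Ex 'ab+')

# Parentheses override this, so "+" now applies to the whole group "ab".
repeat_ab=$(printf '%s\n' "$lines" | grep -Ex '(ab)+')

# Alternation has the lowest precedence:
# "ab|cd" means "ab" or "cd", not "a", then "b" or "c", then "d".
either=$(printf '%s\n' "$lines" | grep -Ex 'ab|cd')

echo "ab+   : $(echo $repeat_b)"
echo "(ab)+ : $(echo $repeat_ab)"
echo "ab|cd : $(echo $either)"
```

Here "ab+" selects "ab" and "abb", "(ab)+" selects "ab" and "abab", and "ab|cd" selects "ab" and "cd".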
4.1.3. Basic versus extended regular expressions
In basic regular expressions the metacharacters "?", "+", "{", "|", "(", and ")" lose their special meaning; instead use the backslashed versions "\?", "\+", "\{", "\|", "\(", and "\)".
Check your system documentation to see whether the commands you use with regular expressions support extended expressions.
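The difference between the two flavours can be seen in a small sketch. It assumes GNU grep, where plain `grep` uses basic expressions, `grep -E` uses extended ones, and the backslashed "\+" is accepted in basic expressions; other implementations may behave differently. The two input lines are made up for the example:

```shell
# Two sample lines invented for illustration: one of repeated "a", one with a literal plus.
text=$(printf '%s\n' 'aaa' 'a+')

# Basic expression: "+" is an ordinary character, so only the line "a+" matches.
basic_plain=$(printf '%s\n' "$text" | grep -c 'a+')

# Basic expression with backslash: "\+" is the repetition operator, so both lines match.
basic_escaped=$(printf '%s\n' "$text" | grep -c 'a\+')

# Extended expression: "+" is the repetition operator, so both lines match.
extended_plain=$(printf '%s\n' "$text" | grep -E -c 'a+')

# Extended expression with backslash: "\+" is a literal plus, so only "a+" matches.
extended_escaped=$(printf '%s\n' "$text" | grep -E -c 'a\+')

echo "basic:    a+ -> $basic_plain line(s), a\\+ -> $basic_escaped line(s)"
echo "extended: a+ -> $extended_plain line(s), a\\+ -> $extended_escaped line(s)"
```

In other words, the backslash has the opposite effect in the two flavours, which is why it matters to know which one a command uses.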