PostgreSQL contains a comprehensive set of character string operators, from simple text concatenation and string
comparison, to a strong set of regular expression matching. Character string operators are valid upon values of
types char, varchar, and PostgreSQL's own
text type.
The following sections describe the basic comparison and concatenation operators, as well as the implementation of
case-sensitive and case-insensitive
regular expression
operators.
Each of the basic character string comparison and concatenation operators supported by PostgreSQL are listed in
Table 5-1.
Note: Note that the LIKE and ILIKE keywords, which
call to the like() function, are sometimes referred to as string comparison
operators
. These
keywords are covered in the Section called Functions
".
Table 5-1. Basic Character String Operators
Operator
|
Usage
|
Description
|
=
|
'
string
' = '
comparison
'
|
A comparison returning true if
string
matches
comparison
identically
|
!=
|
'
string
' != '
comparison
'
|
A comparison returning true if
string
does not match
comparison
identically
|
<>
|
'
string
' <> '
comparison
'
|
Identical to the != operator
|
<
|
'
string
' < '
comparison
'
|
A comparison returning true if
string
should be sorted alphabetically before
comparison
|
<=
|
'
string
' <= '
comparison
'
|
A comparison returning true if
string
should be sorted alphabetically before
comparison
, or if the values are identical
|
>
|
'
string
' > '
comparison
'
|
A comparison returning true if
string
should be sorted alphabetically after
comparison
|
>=
|
'
string
' >= '
comparison
'
|
A comparison returning true if
string
should be sorted alphabetically after
comparison
, or if the values are identical
|
Each of the string comparison operators returns a Boolean result of either true or false. The alphabetical sorting
referred to by Table 5-1 compares each sequential character in a string, determining
if one character is considered 'greater than' or 'less than' the other. If the leading characters in two strings are at
all identical, each character is checked from left to right until two different characters are found for comparison. In
this sorting scheme, characters are determined to be higher than one another based on their ASCII value, as demonstrated
in the following example:
booktown=#
SELECT letter,
booktown-#
ascii(letter)
booktown-#
FROM text_sorting
booktown-#
ORDER BY letter ASC;
letter | ascii
--------+-------
0 | 48
1 | 49
2 | 50
3 | 51
A | 65
B | 66
C | 67
D | 68
a | 97
b | 98
c | 99
d | 100
(12 rows)
If you are unsure of how a character will be sorted, you can use the ascii()
function to determine the ASCII value of the character. This function is described further in the Section called Functions
."
Example 5-3 illustrates a comparative check on the books table,
and returns all titles whose first letter would be sorted before the letter
D
.
Example 5-3. Comparing strings
booktown=#
SELECT title FROM books
booktown-#
WHERE substr(title, 1, 1) < 'D';
title
-----------------------------
2001: A Space Odyssey
Bartholomew and the Oobleck
(2 rows)
For times when normal equivalence comparisons are inadequate, PostgreSQL has several operators designed
to perform pattern matching against regular expressions. A regular expression is similar to any other string to be matched
against, with the exception that some characters (such as the square braces, pipe, and backslash) have
special meaning in a comparison. If you have used UNIX programs such as
sed
,
grep
, or
perl
, you may already be familiar with this kind of syntax.
Note: For more detailed information on regular expressions in general, refer to O'Reilly's
Mastering Regular Expressions
, by Jeffrey E. F. Friedl.
When a value is compared against a regular expression, the expression itself (or
regex
) may
match both literal character sequences, as well as several variable character sequences. Both literal and variable
sequences may be specified throughout the expression. Example 5-5 illustrates an example
of such a sequence. It searches the Book Town authors table for names
beginning with either
A
or
T
.
Example 5-5. An example regular expression
booktown=#
SELECT first_name, last_name
booktown-#
FROM authors
booktown-#
WHERE first_name ~ '^A|^T';
first_name | last_name
---------------+--------------
Ariel | Denham
Tom | Christiansen
Arthur C. | Clarke
Andrew | Brookins
Theodor Seuss | Geisel
(5 rows)
The ~ symbol is the regular expression operator, within the
WHERE clause, and the regular expression sequence itself in Example 5-5 is
^A|^T
. The special characters in this sequence are the
caret (^), and the pipe (|), while the literal
characters are
A
and
T
. The special characters used in regular expressions are
explained in detail later in this section.
The most important syntactic difference between the use of the like() function and
regular expression operators is that like() uses wild-card symbols (e.g.,
%) at the beginning and end of its expression in order to match a substring. In
contrast, (with the beginning and end-line symbols found in Table 5-3) regular expression operators will implicitly look for the regular expression
sequence
anywhere in the compared character string
unless otherwise instructed.
Table 5-2 lists the regular expression operators.
These operators compare a text value (either an identifier or a constant) to a regular expression. Each operator
provides a Boolean result, depending on the nature of the operator.
Table 5-2. Regular expression comparison operators
Operator
|
Usage
|
Description
|
~
|
'
string
' ~ '
regex
'
|
A regular expression comparison, yielding true if the expression matches
|
!~
|
'
string
' !~ '
regex
'
|
A regular expression comparison, yielding true if the expression
does not
match
|
~*
|
'
string
' ~* '
regex
'
|
A case-insensitive regular expression, yielding true if the expression matches
|
!~*
|
'
string
' !~* '
regex
'
|
not equal to regular expression, case insensitive
|
The special characters available to a regular expression are listed in Table 5-3.
These are the characters which may be used in a regular expression string to represent special meaning.
Table 5-3. Regular expression symbols
Symbol(s)
|
Usage
|
Description
|
^
|
^
expression
|
Matches the beginning (
^
) of the character string
|
$
|
expression
$
|
Matches the end (
$
) of the character string
|
.
|
.
|
Matches any single character
|
[ ]
|
[
abc
]
|
Matches any single character which is between brackets (e.g.,
a
,
b
, or
c
)
|
[^]
|
[^
abc
]
|
Matches any single character not between brackets, following caret (e.g., not
a
,
b
, or
c)
|
[-]
|
[
a
-
z
]
|
Matches any character which is between the range of characters between brackets and separated by the dash (e.g., within
a
through
z
)
|
[^-]
|
[^
a
-
z
]
|
Matches any characters
not
between the range of characters between brackets and separated by the dash (e.g., not within
a
through
z
)
|
?
|
a
?
|
Matches zero or one instances of the character (or regex sequence) preceding it
|
*
|
a
*
|
Matches zero or more instances of the character (or regex sequence) preceding it
|
+
|
a
+
|
Matches one or more instances of the character (or regex sequence) preceding it
|
|
|
expr1
|
expr2
|
Matches character sequences to the left
or
right of it (e.g., either
expr1
, or
expr2
)
|
( )
|
(
expr1
)
expr2
|
Explicitly groups expressions, to clarify precedence of special character symbols
|
Note: Note that in order to use a literal version of any of the characters in Table 5-3, they must be prefixed with
two
backslashes (e.g.,
\\$ represents a literal dollar sign).
A common use of regular expressions is to search for a literal substring within a larger string. This can be
achieved either with the
~
operator, if case is important, or with the
~*
operator if the comparison should be case-insensitive. These operators are each
demonstrated in Example 5-6.
Example 5-6. A Simple Regular Expression Comparison
booktown=#
SELECT title FROM books
booktown-#
WHERE title ~ 'The';
title
----------------------
The Shining
The Cat in the Hat
The Velveteen Rabbit
The Tell-Tale Heart
(4 rows)
booktown=#
SELECT title FROM books
booktown-#
WHERE title ~* 'The';
title
-----------------------------
The Shining
The Cat in the Hat
Bartholomew and the Oobleck
Franklin in the Dark
The Velveteen Rabbit
The Tell-Tale Heart
(6 rows)
As you can see in Example 5-6, two more rows are returned when using the
~*
operator, as it matches not just "the" sequence, but
modification of case on the same sequence (including
the
,
tHe
,
ThE
, and so on).
The same regular expression sequence can be modified to use the
^
symbol, to match only the
character string
The
when it is at the beginning of the comparison string, as shown in Example 5-7. Additionally, the
.*
sequence is then appended, to indicate any
number of characters may match until the next following grouped expression. In this case, the
.*
sequence is followed by a parenthetically grouped pair of strings (
rabbit
and
heart
), which are separated by the
|
symbol, indicating that either of the
strings will be considered a match.
Example 5-7. A more involved regular expression comparison
booktown=#
SELECT title FROM books
booktown-#
WHERE title ~* '^The.*(rabbit|heart)';
title
----------------------
The Velveteen Rabbit
The Tell-Tale Heart
(2 rows)
In Example 5-7, the results should fairly clearly indicate the effect of the regular
expression comparison. Translated into English, the expression
^The.*(rabbit|heart)
states that a
match will be found only if the compared string begins with the character sequence
The
and, any
amount of any characters thereafter, contain either the character sequence
rabbit
, or
heart
. The use of the ~* operator (rather than just the
~ operator) makes the comparison case-insensitive.
Example 5-8 executes an even more complicated regular expression comparison.
Example 5-8. A Complicated Regular Expression Comparison
booktown=#
SELECT title FROM books
booktown-#
WHERE title ~* '(^t.*[ri]t)|(ing$|une$)';
title
----------------------
The Shining
Dune
The Velveteen Rabbit
The Tell-Tale Heart
(4 rows)
booktown=#
The regular expression used in Example 5-8 is a good example of how regular
expressions can be intimidating! Breaking it down an element at a time, you can see that there
are two parenthetically grouped expressions, separated by a
|
symbol. This means that if either of
these expressions are found to match the title, the comparison will be considered a match.
Breaking it down further, you can see that the expression to the left of the
|
symbol
consists of, from left to right: a caret (
^
) followed by the character
t
, a
period (
.
) followed by an asterisk (
*
), and a pair of square brackets
(
[]
) enclosing the characters
r
and
i
, followed by the
character
t
. Translated into English, this sub-expression essentially says that in order to match, the compared string must begin with
the letter
t
, and be followed by a sequence of zero or more characters until either the letter
r
, or
i
is found, which must be followed immediately by the letter
t
. If any of these conditions is not found, the comparison will
not be considered a match.
The expression to the right of the
|
symbol is a bit simpler, consisting of two character
string sequences (
ing
and
une
), each followed by the
$
character, and separated by another
|
symbol. This sub-expression, translated into English,
describes a match as a relationship in which either ends with the value
ing
, or
une
. If
either of these are found, the expression is considered a match, because of the
|
symbol.