Databases - Practical PostgreSQL - Character String Operators

Databases - Practical PostgreSQL
Previous Page	Home	Next Page

Character String Operators

PostgreSQL contains a comprehensive set of character string operators, from simple text concatenation and string comparison, to a strong set of regular expression matching. Character string operators are valid upon values of types char, varchar, and PostgreSQL's own text type.

The following sections describe the basic comparison and concatenation operators, as well as the implementation of case-sensitive and case-insensitive regular expression operators.

Basic comparison

Each of the basic character string comparison and concatenation operators supported by PostgreSQL are listed in Table 5-1.

Note: Note that the LIKE and ILIKE keywords, which call to the like() function, are sometimes referred to as string comparison operators . These keywords are covered in the Section called Functions ".

Table 5-1. Basic Character String Operators

Operator	Usage	Description
`=`	`' string ' = ' comparison '`	A comparison returning true if `string` matches `comparison` identically
`!=`	`' string ' != ' comparison '`	A comparison returning true if `string` does not match `comparison` identically
`<>`	`' string ' <> ' comparison '`	Identical to the `!=` operator
`<`	`' string ' < ' comparison '`	A comparison returning true if `string` should be sorted alphabetically before `comparison`
`<=`	`' string ' <= ' comparison '`	A comparison returning true if `string` should be sorted alphabetically before `comparison`, or if the values are identical
`>`	`' string ' > ' comparison '`	A comparison returning true if `string` should be sorted alphabetically after `comparison`
`>=`	`' string ' >= ' comparison '`	A comparison returning true if `string` should be sorted alphabetically after `comparison`, or if the values are identical

Each of the string comparison operators returns a Boolean result of either true or false. The alphabetical sorting referred to by Table 5-1 compares each sequential character in a string, determining if one character is considered 'greater than' or 'less than' the other. If the leading characters in two strings are at all identical, each character is checked from left to right until two different characters are found for comparison. In this sorting scheme, characters are determined to be higher than one another based on their ASCII value, as demonstrated in the following example:

booktown=# 
SELECT letter,

booktown-# 
       ascii(letter)

booktown-# 
       FROM text_sorting

booktown-# 
       ORDER BY letter ASC;

 letter | ascii
--------+-------
 0      |    48
 1      |    49
 2      |    50
 3      |    51
 A      |    65
 B      |    66
 C      |    67
 D      |    68
 a      |    97
 b      |    98
 c      |    99
 d      |   100
(12 rows)

If you are unsure of how a character will be sorted, you can use the ascii() function to determine the ASCII value of the character. This function is described further in the Section called Functions ." Example 5-3 illustrates a comparative check on the books table, and returns all titles whose first letter would be sorted before the letter D .

Example 5-3. Comparing strings

booktown=# 
SELECT title FROM books

booktown-# 
             WHERE substr(title, 1, 1) < 'D';

            title
-----------------------------
 2001: A Space Odyssey
 Bartholomew and the Oobleck
(2 rows)

String concatenation

The text concatenation operator (||) is an invaluable tool for formatting output results. Like all operators, it may be used anywhere a constant value is allowed in a SQL statement. Values may be repeatedly concatenated in a single statement by simply appending the || operator after each appended string constant or identifier.

As an example, it might be used in the WHERE clause in order to constrain rows by comparing against a dynamically generated character string. Example 5-4 demonstrates how to use this operator.

Example 5-4. Concatenating strings

booktown=# 
SELECT 'The Title: ' || title || ', by ' ||

booktown-# 
       first_name || ' ' || last_name AS book_info

booktown-# 
       FROM books NATURAL JOIN authors AS a (author_id)

booktown-# 
       LIMIT 3;

                       book_info
--------------------------------------------------------
 The Title: The Shining, by Stephen King
 The Title: Dune, by Frank Herbert
 The Title: 2001: A Space Odyssey, by Arthur C. Clarke
(3 rows)

Regular expression matching operators

For times when normal equivalence comparisons are inadequate, PostgreSQL has several operators designed to perform pattern matching against regular expressions. A regular expression is similar to any other string to be matched against, with the exception that some characters (such as the square braces, pipe, and backslash) have special meaning in a comparison. If you have used UNIX programs such as sed , grep , or perl , you may already be familiar with this kind of syntax.

Note: For more detailed information on regular expressions in general, refer to O'Reilly's Mastering Regular Expressions , by Jeffrey E. F. Friedl.

When a value is compared against a regular expression, the expression itself (or regex ) may match both literal character sequences, as well as several variable character sequences. Both literal and variable sequences may be specified throughout the expression. Example 5-5 illustrates an example of such a sequence. It searches the Book Town authors table for names beginning with either A or T .

Example 5-5. An example regular expression

booktown=# 
SELECT first_name, last_name

booktown-# 
       FROM authors

booktown-# 
       WHERE first_name ~ '^A|^T';

  first_name   |  last_name
---------------+--------------
 Ariel         | Denham
 Tom           | Christiansen
 Arthur C.     | Clarke
 Andrew        | Brookins
 Theodor Seuss | Geisel
(5 rows)

The ~ symbol is the regular expression operator, within the WHERE clause, and the regular expression sequence itself in Example 5-5 is ^A|^T . The special characters in this sequence are the caret (^), and the pipe (|), while the literal characters are A and T . The special characters used in regular expressions are explained in detail later in this section.

The most important syntactic difference between the use of the like() function and regular expression operators is that like() uses wild-card symbols (e.g., %) at the beginning and end of its expression in order to match a substring. In contrast, (with the beginning and end-line symbols found in Table 5-3) regular expression operators will implicitly look for the regular expression sequence anywhere in the compared character string unless otherwise instructed.

Table 5-2 lists the regular expression operators. These operators compare a text value (either an identifier or a constant) to a regular expression. Each operator provides a Boolean result, depending on the nature of the operator.

Table 5-2. Regular expression comparison operators

Operator	Usage	Description
`~`	`' string ' ~ ' regex '`	A regular expression comparison, yielding true if the expression matches
!~	`' string ' !~ ' regex '`	A regular expression comparison, yielding true if the expression does not match
~*	`' string ' ~* ' regex '`	A case-insensitive regular expression, yielding true if the expression matches
!~*	`' string ' !~* ' regex '`	not equal to regular expression, case insensitive

The special characters available to a regular expression are listed in Table 5-3. These are the characters which may be used in a regular expression string to represent special meaning.

Table 5-3. Regular expression symbols

Symbol(s)	Usage	Description
^	^ `expression`	Matches the beginning ( ^ ) of the character string
$	`expression` $	Matches the end ( $ ) of the character string
.	.	Matches any single character
[ ]	[ `abc` ]	Matches any single character which is between brackets (e.g., a , b , or c )
[^]	[^`abc` ]	Matches any single character not between brackets, following caret (e.g., not a , b , or c)
[-]	[ `a`-`z` ]	Matches any character which is between the range of characters between brackets and separated by the dash (e.g., within a through z )
[^-]	[^`a`-`z` ]	Matches any characters not between the range of characters between brackets and separated by the dash (e.g., not within a through z )
?	`a` ?	Matches zero or one instances of the character (or regex sequence) preceding it
*	`a` *	Matches zero or more instances of the character (or regex sequence) preceding it
+	`a`+	Matches one or more instances of the character (or regex sequence) preceding it
\|	`expr1`\| `expr2`	Matches character sequences to the left or right of it (e.g., either `expr1`, or `expr2`)
( )	(`expr1`) `expr2`	Explicitly groups expressions, to clarify precedence of special character symbols

Note: Note that in order to use a literal version of any of the characters in Table 5-3, they must be prefixed with two backslashes (e.g., \\$ represents a literal dollar sign).

A common use of regular expressions is to search for a literal substring within a larger string. This can be achieved either with the ~ operator, if case is important, or with the ~* operator if the comparison should be case-insensitive. These operators are each demonstrated in Example 5-6.

Example 5-6. A Simple Regular Expression Comparison

booktown=# 
SELECT title FROM books

booktown-# 
       WHERE title ~ 'The';

        title
----------------------
 The Shining
 The Cat in the Hat
 The Velveteen Rabbit
 The Tell-Tale Heart
(4 rows)

booktown=# 
SELECT title FROM books

booktown-# 
       WHERE title ~* 'The';

            title
-----------------------------
 The Shining
 The Cat in the Hat
 Bartholomew and the Oobleck
 Franklin in the Dark
 The Velveteen Rabbit
 The Tell-Tale Heart
(6 rows)

As you can see in Example 5-6, two more rows are returned when using the ~* operator, as it matches not just "the" sequence, but modification of case on the same sequence (including the , tHe , ThE , and so on).

The same regular expression sequence can be modified to use the ^ symbol, to match only the character string The when it is at the beginning of the comparison string, as shown in Example 5-7. Additionally, the .* sequence is then appended, to indicate any number of characters may match until the next following grouped expression. In this case, the .* sequence is followed by a parenthetically grouped pair of strings ( rabbit and heart ), which are separated by the | symbol, indicating that either of the strings will be considered a match.

Example 5-7. A more involved regular expression comparison

booktown=# 
SELECT title FROM books

booktown-# 
       WHERE title ~* '^The.*(rabbit|heart)';

        title
----------------------
 The Velveteen Rabbit
 The Tell-Tale Heart
(2 rows)

In Example 5-7, the results should fairly clearly indicate the effect of the regular expression comparison. Translated into English, the expression ^The.*(rabbit|heart) states that a match will be found only if the compared string begins with the character sequence The and, any amount of any characters thereafter, contain either the character sequence rabbit , or heart . The use of the ~* operator (rather than just the ~ operator) makes the comparison case-insensitive.

Example 5-8 executes an even more complicated regular expression comparison.

Example 5-8. A Complicated Regular Expression Comparison

booktown=# 
SELECT title FROM books

booktown-# 
       WHERE title ~* '(^t.*[ri]t)|(ing$|une$)';

        title
----------------------
 The Shining
 Dune
 The Velveteen Rabbit
 The Tell-Tale Heart
(4 rows)

booktown=#

The regular expression used in Example 5-8 is a good example of how regular expressions can be intimidating! Breaking it down an element at a time, you can see that there are two parenthetically grouped expressions, separated by a | symbol. This means that if either of these expressions are found to match the title, the comparison will be considered a match.

Breaking it down further, you can see that the expression to the left of the | symbol consists of, from left to right: a caret ( ^ ) followed by the character t , a period ( . ) followed by an asterisk ( * ), and a pair of square brackets ( [] ) enclosing the characters r and i , followed by the character t . Translated into English, this sub-expression essentially says that in order to match, the compared string must begin with the letter t , and be followed by a sequence of zero or more characters until either the letter r , or i is found, which must be followed immediately by the letter t . If any of these conditions is not found, the comparison will not be considered a match.

The expression to the right of the | symbol is a bit simpler, consisting of two character string sequences ( ing and une ), each followed by the $ character, and separated by another | symbol. This sub-expression, translated into English, describes a match as a relationship in which either ends with the value ing , or une . If either of these are found, the expression is considered a match, because of the | symbol.

Databases - Practical PostgreSQL
Previous Page	Home	Next Page