10.4. Connection Character Sets and Collations
Several character set and collation system variables relate to a
client's interaction with the server. Some of these have been
mentioned in earlier sections:
The server character set and collation can be determined from
the values of the character_set_server
and
collation_server
system variables.
The character set and collation of the default database can be
determined from the values of the
character_set_database
and
collation_database
system variables.
Additional character set and collation system variables are
involved in handling traffic for the connection between a client
and the server. Every client has connection-related character set
and collation system variables.
Consider what a “connection” is: It's what you make
when you connect to the server. The client sends SQL statements,
such as queries, over the connection to the server. The server
sends responses, such as result sets, over the connection back to
the client. This leads to several questions about character set
and collation handling for client connections, each of which can
be answered in terms of system variables:
-
What character set is the statement in when it leaves the
client?
The server takes the character_set_client
system variable to be the character set in which statements
are sent by the client.
-
What character set should the server translate a statement to
after receiving it?
For this, the server uses the
character_set_connection
and
collation_connection
system variables. It
converts statements sent by the client from
character_set_client
to
character_set_connection
(except for string
literals that have an introducer such as
_latin1
or _utf8
).
collation_connection
is important for
comparisons of literal strings. For comparisons of strings
with column values, collation_connection
does not matter because columns have their own collation,
which has a higher collation precedence.
-
What character set should the server translate to before
shipping result sets or error messages back to the client?
The character_set_results
system variable
indicates the character set in which the server returns query
results to the client. This includes result data such as
column values, and result metadata such as column names.
You can fine-tune the settings for these variables, or you can
depend on the defaults (in which case, you can skip the rest of
this section).
There are two statements that affect the connection character
sets:
SET NAMES 'charset_name
'
SET CHARACTER SET charset_name
SET NAMES
indicates what character set the
client will use to send SQL statements to the server. Thus,
SET NAMES 'cp1251'
tells the server
“future incoming messages from this client are in character
set cp1251
.” It also specifies the
character set that the server should use for sending results back
to the client. (For example, it indicates what character set to
use for column values if you use a SELECT
statement.)
A SET NAMES 'x
'
statement is equivalent to these three statements:
SET character_set_client = x
;
SET character_set_results = x
;
SET character_set_connection = x
;
Setting character_set_connection
to
x
also sets
collation_connection
to the default collation
for x
.
SET CHARACTER SET
is similar to SET
NAMES
but sets the connection character set and
collation to be those of the default database. A SET
CHARACTER SET x
statement is
equivalent to these three statements:
SET character_set_client = x
;
SET character_set_results = x
;
SET collation_connection = @@collation_database;
Setting collation_connection
also sets
character_set_connection
to the character set
associated with the collation.
When a client connects, it sends to the server the name of the
character set that it wants to use. The server uses the name to
set the character_set_client
,
character_set_results
, and
character_set_connection
system variables. In
effect, the server performs a SET NAMES
operation using the character set name.
With the mysql client, it is not necessary to
execute SET NAMES
every time you start up if
you want to use a character set different from the default. You
can add the --default-character-set
option
setting to your mysql statement line, or in
your option file. For example, the following option file setting
changes the three character set variables set to
koi8r
each time you invoke
mysql:
[mysql]
default-character-set=koi8r
Example: Suppose that column1
is defined as
CHAR(5) CHARACTER SET latin2
. If you do not say
SET NAMES
or SET CHARACTER
SET
, then for SELECT column1 FROM t
,
the server sends back all the values for
column1
using the character set that the client
specified when it connected. On the other hand, if you say
SET NAMES 'latin1'
or SET CHARACTER
SET latin1
before issuing the SELECT
statement, the server converts the latin2
values to latin1
just before sending results
back. Conversion may be lossy if there are characters that are not
in both character sets.
If you do not want the server to perform any conversion of result
sets, set character_set_results
to
NULL
:
SET character_set_results = NULL;
Note: Currently, UCS-2 cannot be
used as a client character set, which means that SET
NAMES 'ucs2'
does not work.
To see the values of the character set and collation system
variables that apply to your connection, use these statements:
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';