26.3.3.4. Using Character Sets and Unicode
All strings sent from the JDBC driver to the server are
converted automatically from native Java Unicode form to the
client character encoding. This includes all queries sent via
Statement.execute(), Statement.executeUpdate(), and
Statement.executeQuery(), as well as all PreparedStatement and
CallableStatement parameters, with the exception of parameters
set using setBytes(), setBinaryStream(), setAsciiStream(),
setUnicodeStream(), and setBlob().
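As a rough illustration of what such a conversion involves, the following sketch encodes the same Java string into two different character encodings using only the standard library. It does not show the driver's internal code; the class name StringEncodingSketch and the helper encode() are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of Connector/J.
class StringEncodingSketch {

    // Convert a Java string (UTF-16 internally) into the bytes of a
    // target character encoding.
    static byte[] encode(String text, Charset encoding) {
        return text.getBytes(encoding);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café": the last character is non-ASCII

        // One byte per character in ISO-8859-1.
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // prints 4

        // The accented character takes two bytes in UTF-8.
        System.out.println(encode(text, StandardCharsets.UTF_8).length); // prints 5
    }
}
```

The same text therefore reaches the server as different byte sequences depending on the client character encoding, whereas values passed through methods such as setBytes() are excluded from this conversion.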
Prior to MySQL Server 4.1, Connector/J supported a single
character encoding per connection, which could either be
detected automatically from the server configuration or be
configured by the user through the useUnicode and
characterEncoding properties.
Starting with MySQL Server 4.1, Connector/J supports a single
character encoding between client and server, and any number of
character encodings for data returned by the server to the
client in ResultSets.
The character encoding between client and server is
automatically detected upon connection. The encoding used by the
driver is specified on the server via the character_set system
variable for server versions older than 4.1.0, and via
character_set_server for server versions 4.1.0 and newer. For
more information, see
Section 10.3.1, “Server Character Set and Collation”.
To override the automatically detected encoding on the client
side, use the characterEncoding property in the URL used to
connect to the server.
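A minimal sketch of such a URL is shown below. The host name, database name, class name EncodingUrlSketch, and helper buildUrl() are assumptions made for illustration; only the characterEncoding property itself comes from the text above. The sketch only builds and prints the URL, since opening a connection would require Connector/J on the classpath and a reachable server.

```java
// Hypothetical helper class; not part of Connector/J.
class EncodingUrlSketch {

    // Build a JDBC URL that sets the characterEncoding property
    // to the given Java-style encoding name.
    static String buildUrl(String host, String database, String javaEncoding) {
        return "jdbc:mysql://" + host + "/" + database
                + "?characterEncoding=" + javaEncoding;
    }

    public static void main(String[] args) {
        // "localhost" and "test" are placeholder values.
        String url = buildUrl("localhost", "test", "UTF-8");
        System.out.println(url);
        // prints jdbc:mysql://localhost/test?characterEncoding=UTF-8

        // With the driver available, the URL would typically be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```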
When specifying character encodings on the client side,
Java-style names should be used. The following table lists
Java-style names for MySQL character sets:
Table 26.4. MySQL to Java Encoding Name Translations
MySQL Character Set Name | Java-Style Character Encoding Name
usa7 | US-ASCII
big5 | Big5
gbk | GBK
sjis | SJIS (or Cp932 or MS932 for MySQL Server < 4.1.11)
cp932 | Cp932 or MS932 (MySQL Server > 4.1.11)
gb2312 | EUC_CN
ujis | EUC_JP
euc_kr | EUC_KR
latin1 | ISO8859_1
latin1_de | ISO8859_1
german1 | ISO8859_1
danish | ISO8859_1
latin2 | ISO8859_2
czech | ISO8859_2
hungarian | ISO8859_2
croat | ISO8859_2
greek | ISO8859_7
hebrew | ISO8859_8
latin5 | ISO8859_9
latvian | ISO8859_13
latvian1 | ISO8859_13
estonia | ISO8859_13
dos | Cp437
pclatin2 | Cp852
cp866 | Cp866
koi8_ru | KOI8_R
tis620 | TIS620
win1250 | Cp1250
win1250ch | Cp1250
win1251 | Cp1251
cp1251 | Cp1251
win1251ukr | Cp1251
cp1257 | Cp1257
macroman | MacRoman
macce | MacCentralEurope
utf8 | UTF-8
ucs2 | UnicodeBig
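Application code that knows only the MySQL character set name could apply the translation with a simple lookup. The sketch below copies a handful of rows from the table into a map; the class name CharsetNameSketch and the method toJavaName() are hypothetical, and the map is deliberately incomplete.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of Connector/J.
class CharsetNameSketch {

    // A small subset of the table rows: MySQL name -> Java-style name.
    private static final Map<String, String> MYSQL_TO_JAVA = new LinkedHashMap<>();
    static {
        MYSQL_TO_JAVA.put("latin1", "ISO8859_1");
        MYSQL_TO_JAVA.put("latin2", "ISO8859_2");
        MYSQL_TO_JAVA.put("greek", "ISO8859_7");
        MYSQL_TO_JAVA.put("koi8_ru", "KOI8_R");
        MYSQL_TO_JAVA.put("cp1251", "Cp1251");
        MYSQL_TO_JAVA.put("utf8", "UTF-8");
    }

    // Return the Java-style name for a MySQL character set name,
    // or fail loudly if this sketch has no entry for it.
    static String toJavaName(String mysqlName) {
        String javaName = MYSQL_TO_JAVA.get(mysqlName);
        if (javaName == null) {
            throw new IllegalArgumentException(
                    "No mapping in this sketch for: " + mysqlName);
        }
        return javaName;
    }

    public static void main(String[] args) {
        System.out.println(toJavaName("latin1")); // prints ISO8859_1
        System.out.println(toJavaName("utf8"));   // prints UTF-8
    }
}
```

The returned name is the value that would be supplied for the characterEncoding property.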
Warning
Do not issue the SET NAMES statement with Connector/J, because
the driver will not detect that the character set has changed
and will continue to use the character set detected during the
initial connection setup.
To allow multiple character sets to be sent from the client, use
the "UTF-8" encoding, either by configuring "utf8" as the default
server character set or by configuring the JDBC driver to use
"UTF-8" through the characterEncoding property.
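The reason UTF-8 is recommended here can be checked with the standard library alone: a single-byte encoding cannot represent text that mixes scripts, while UTF-8 can. The sketch below is an illustration under stated assumptions; the class name Utf8ConnectionSketch and the helpers canRepresent() and utf8Properties() are hypothetical, and the Properties object is only one possible way to hand the characterEncoding property to the driver.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper class; not part of Connector/J.
class Utf8ConnectionSketch {

    // True if every character of the text can be encoded in the given charset.
    static boolean canRepresent(String text, Charset encoding) {
        return encoding.newEncoder().canEncode(text);
    }

    // Connection properties that request UTF-8 from the driver.
    static Properties utf8Properties() {
        Properties props = new Properties();
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        // German, Greek, and Japanese text in one string.
        String mixed = "Gr\u00fc\u00dfe, \u03b3\u03b5\u03b9\u03b1, \u65e5\u672c\u8a9e";

        System.out.println(canRepresent(mixed, StandardCharsets.ISO_8859_1)); // prints false
        System.out.println(canRepresent(mixed, StandardCharsets.UTF_8));      // prints true

        // With the driver available, these properties would typically be passed to
        // java.sql.DriverManager.getConnection(url, utf8Properties()).
        System.out.println(utf8Properties().getProperty("characterEncoding")); // prints UTF-8
    }
}
```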