5.10.3. Adding a New Character Set
This section discusses the procedure for adding a new character
set to MySQL. You must have a MySQL source distribution to use
these instructions. To choose the proper procedure, determine
whether the character set is simple or complex:
If the character set does not need to use special string collating routines for sorting and does not need multi-byte character support, it is simple.
If it needs either of those features, it is complex.
For example, latin1 and danish are simple character sets, whereas big5 and czech are complex character sets.
In the following instructions, the name of the character set is represented by MYSET.
For a simple character set, do the following:
Add MYSET to the end of the sql/share/charsets/Index file. Assign a unique number to it.
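One way to pick a number is to take one more than the largest number already listed, since that value cannot collide with an existing entry. The helper below sketches this. It is hypothetical, not part of the MySQL tree, and it assumes that each non-comment line of Index holds a character set name followed by its number. Check that assumption against your own copy of the file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: suggest an unused character set number.
 * ASSUMPTION: each non-comment Index line is "<name> <number>",
 * and '#' starts a comment that runs to the end of the line. */
int next_free_charset_number(const char *index_text)
{
    int max_seen = 0;
    const char *line = index_text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[256], name[64];
        int number;

        /* Copy one line, truncating unusually long lines. */
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        /* Drop any trailing comment before parsing. */
        char *hash = strchr(buf, '#');
        if (hash)
            *hash = '\0';

        if (sscanf(buf, "%63s %d", name, &number) == 2 && number > max_seen)
            max_seen = number;

        if (!end)
            break;
        line = end + 1;
    }
    /* One past the largest number in use is guaranteed to be unused. */
    return max_seen + 1;
}
```

You still add the MYSET line to Index by hand. The helper only suggests a number.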
Create the file sql/share/charsets/MYSET.conf. (You can use a copy of sql/share/charsets/latin1.conf as the basis for this file.)
The syntax for the file is very simple:
Comments start with a ‘#’ character and continue to the end of the line.
Words are separated by arbitrary amounts of whitespace.
When defining the character set, every word must be a number in hexadecimal format.
The ctype[] array takes up the first 257 words. The to_lower[], to_upper[], and sort_order[] arrays take up 256 words each after that, so a complete file contains 257 + 3 × 256 = 1025 words.
See Section 5.10.4, “The Character Definition Arrays”.
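As an illustration of this layout, here is a minimal C sketch of a reader for a MYSET.conf file. It skips ‘#’ comments, accepts any amount of whitespace between words, parses each word as hexadecimal, and splits the 1025 words into the four arrays. The struct and function names are hypothetical, and this is not the code MySQL uses to load the file. It also assumes that every word fits in one byte.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the four character definition arrays. */
struct charset_tables {
    unsigned char ctype[257];
    unsigned char to_lower[256];
    unsigned char to_upper[256];
    unsigned char sort_order[256];
};

#define CONF_WORDS (257 + 3 * 256)   /* 1025 words in all */

/* Parse the text of a MYSET.conf file into *t.
 * Returns 0 on success, -1 on a bad word or the wrong word count. */
int parse_charset_conf(const char *text, struct charset_tables *t)
{
    unsigned char words[CONF_WORDS];
    int count = 0;
    const char *p = text;

    while (*p != '\0') {
        if (*p == '#') {                 /* comment: skip to end of line */
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (isspace((unsigned char)*p)) {
            p++;                         /* any amount of whitespace */
        } else {
            char *end;
            unsigned long v = strtoul(p, &end, 16);   /* hexadecimal word */
            /* ASSUMPTION: every word is a single byte value. */
            if (end == p || v > 0xFF || count == CONF_WORDS)
                return -1;
            words[count++] = (unsigned char)v;
            p = end;
        }
    }
    if (count != CONF_WORDS)
        return -1;

    /* ctype first, then to_lower, to_upper, and sort_order in order. */
    memcpy(t->ctype, words, 257);
    memcpy(t->to_lower, words + 257, 256);
    memcpy(t->to_upper, words + 257 + 256, 256);
    memcpy(t->sort_order, words + 257 + 2 * 256, 256);
    return 0;
}
```

Rejecting a file with the wrong word count catches a common editing mistake, such as a dropped or duplicated row.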
Add the character set name to the CHARSETS_AVAILABLE and COMPILED_CHARSETS lists in configure.in.
Reconfigure, recompile, and test.
For a complex character set, do the following:
Create the file strings/ctype-MYSET.c in the MySQL source distribution.
Add MYSET to the end of the sql/share/charsets/Index file. Assign a unique number to it.
Look at one of the existing ctype-*.c files (such as strings/ctype-big5.c) to see what needs to be defined. Note that the arrays in your file must have names like ctype_MYSET, to_lower_MYSET, and so on. These correspond to the arrays for a simple character set. See Section 5.10.4, “The Character Definition Arrays”.
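The following hypothetical skeleton shows the naming convention, using myset as the character set name. Existing ctype-*.c files typically list every value in an array initializer. To keep the sketch short, it fills the arrays at run time with ASCII case mapping instead. It leaves the ctype_myset flag bits at zero because their meaning is covered in Section 5.10.4. Treat the values as placeholders, not a working character set.

```c
#include <string.h>

typedef unsigned char uchar;   /* stand-in for MySQL's own uchar type */

/* Names follow the <array>_<charset> convention; "myset" is hypothetical. */
uchar ctype_myset[257];        /* 257 entries, like the simple-set ctype */
uchar to_lower_myset[256];
uchar to_upper_myset[256];
uchar sort_order_myset[256];

/* Hypothetical initializer: placeholder values for illustration only. */
void init_myset_tables(void)
{
    /* Flag bits deliberately left as 0; see Section 5.10.4. */
    memset(ctype_myset, 0, sizeof ctype_myset);

    for (int c = 0; c < 256; c++) {
        to_lower_myset[c] = (uchar)((c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c);
        to_upper_myset[c] = (uchar)((c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c);
        /* Sort by uppercase form, so case differences do not affect order. */
        sort_order_myset[c] = to_upper_myset[c];
    }
}
```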
Near the top of the file, place a special comment like this:
/*
 * This comment is parsed by configure to create ctype.c,
 * so don't change it unless you know what you are doing.
 *
 * .configure. number_MYSET=MYNUMBER
 * .configure. strxfrm_multiply_MYSET=N
 * .configure. mbmaxlen_MYSET=N
 */
The configure program uses this comment to include the character set in the MySQL library automatically.
The strxfrm_multiply and mbmaxlen lines are explained in the following sections. You need to include them only if you need the string collating functions or the multi-byte character set functions, respectively.
You should then create some of the following functions:
my_strncoll_MYSET()
my_strcoll_MYSET()
my_strxfrm_MYSET()
my_like_range_MYSET()
See Section 5.10.5, “String Collating Support”.
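To show what these four functions do, here is a rough sketch for a hypothetical single-byte, case-insensitive collation. The signatures are simplified assumptions and are not the prototypes MySQL expects, so copy the real ones from an existing ctype-*.c file. The comparison functions compare byte weights. The transform function writes a key that sorts the same way under a byte-wise comparison. The LIKE range function builds the smallest and largest strings that a pattern's literal prefix can match, which is what allows an index range scan. Escape-character handling is omitted.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char uchar;

/* Hypothetical collation weight: fold a-z onto A-Z, keep other bytes. */
static int weight_myset(uchar c)
{
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* Compare two length-delimited strings: <0, 0, or >0 like strcmp. */
int my_strncoll_myset(const uchar *s1, size_t len1,
                      const uchar *s2, size_t len2)
{
    size_t n = len1 < len2 ? len1 : len2;
    for (size_t i = 0; i < n; i++) {
        int d = weight_myset(s1[i]) - weight_myset(s2[i]);
        if (d != 0)
            return d;
    }
    /* Equal common prefix: the shorter string sorts first. */
    return (len1 > len2) - (len1 < len2);
}

/* Same comparison for NUL-terminated strings. */
int my_strcoll_myset(const uchar *s1, const uchar *s2)
{
    return my_strncoll_myset(s1, strlen((const char *)s1),
                             s2, strlen((const char *)s2));
}

/* Write a sort key for src into dest and return its length. Comparing
 * keys byte by byte gives the same order as my_strncoll_myset(). Each
 * input byte yields one key byte, so the key never grows. */
size_t my_strxfrm_myset(uchar *dest, const uchar *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dest[i] = (uchar)weight_myset(src[i]);
    return len;
}

/* Fill min_str and max_str (res_len bytes each) with the smallest and
 * largest strings that a LIKE pattern can match. Only the literal prefix
 * before the first '%' or '_' narrows the range. */
void my_like_range_myset(const uchar *pattern, size_t plen, size_t res_len,
                         uchar *min_str, uchar *max_str)
{
    size_t i = 0;
    for (; i < plen && i < res_len; i++) {
        if (pattern[i] == '%' || pattern[i] == '_')
            break;
        min_str[i] = max_str[i] = pattern[i];
    }
    /* Pad with the lowest-weight byte for min and the highest for max. */
    for (; i < res_len; i++) {
        min_str[i] = 0x00;
        max_str[i] = 0xFF;
    }
}
```

For example, "Apple" and "apple" compare as equal, and the pattern "ab%" yields the range "ab" followed by 0x00 padding up to "ab" followed by 0xFF padding.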
Add the character set name to the CHARSETS_AVAILABLE and COMPILED_CHARSETS lists in configure.in.
Reconfigure, recompile, and test.
The sql/share/charsets/README file includes additional instructions.
If you want to have the character set included in the MySQL distribution, mail a patch to the MySQL internals mailing list. See Section 1.7.1, “MySQL Mailing Lists”.