|
Setting up Japanese charsets is quite difficult. This is mainly because:
-
The Windows character set extends the original legacy Japanese
standard (JIS X 0208), and the extensions are not standardized. This means
that a strictly standards-conforming implementation cannot support the full
Windows character set.
-
Mainly for historical reasons, there are several encoding methods in
Japanese, which are not fully compatible with each other. There are
two major encoding methods. One is the Shift_JIS series used in Windows
and some UNIXes. The other is the EUC-JP series used in most UNIXes
and Linux. Moreover, Samba previously also offered several unique encoding
methods, named CAP and HEX, to keep interoperability with CAP/NetAtalk and
UNIXes that can't use Japanese filenames. Some implementations of the
EUC-JP series can't support the full Windows character set.
-
There are several code conversion tables between Unicode and legacy
Japanese character sets. One is compatible with Windows, another
is based on the reference tables of the Unicode Consortium, and others
mix the two. The Unicode Consortium does not officially
define any conversion tables between Unicode and legacy character
sets, so there cannot be a single standard one.
-
The character sets and conversion tables available in iconv() depend
on the iconv library that is installed. In addition, the Japanese locale
names may differ from system to system. This means that the values of
the charset parameters depend on the implementation of iconv() you are using.
Although Windows uses fixed 2-byte UCS-2 encoding internally,
Shift_JIS series encoding is usually used in Japanese environments,
much as ASCII encoding is used in English environments.
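
The incompatibilities listed above can be seen in a short sketch. It is
illustrative only: it uses Python's bundled codecs as a stand-in for an iconv
library, so the codec names (shift_jis, cp932, euc_jp) are Python's and may not
match the names your iconv() accepts. The sample string and the two sample
characters are chosen here for demonstration and are not taken from any Samba
configuration.

```python
# Sketch only: Python's built-in codecs stand in for an iconv library.
# Codec names below are Python's, not necessarily those of your iconv().

text = "日本語"

# 1. The Shift_JIS and EUC-JP series encode the same text as different bytes.
sjis = text.encode("shift_jis")
eucjp = text.encode("euc_jp")
print("Shift_JIS:", sjis.hex(" "))
print("EUC-JP:   ", eucjp.hex(" "))

# 2. The Windows character set (CP932 in Python) extends JIS X 0208.
#    CIRCLED DIGIT ONE (U+2460) is one such extension character.
circled_one = "\u2460"
cp932_bytes = circled_one.encode("cp932")
try:
    circled_one.encode("shift_jis")  # strict JIS X 0208-based codec
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print("CP932 bytes:", cp932_bytes.hex(" "), "| strict Shift_JIS accepts it:", strict_ok)

# 3. Conversion tables disagree: the same bytes map to different
#    Unicode code points depending on which table is used.
wave = b"\x81\x60"
as_strict = wave.decode("shift_jis")  # U+301C WAVE DASH
as_windows = wave.decode("cp932")     # U+FF5E FULLWIDTH TILDE
print(f"shift_jis -> U+{ord(as_strict):04X}, cp932 -> U+{ord(as_windows):04X}")
```

The third case is why a filename written through one table may fail to
round-trip through another: the bytes are identical, but the Unicode code point
they are converted to is not.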
|
|