27.14 Charsets
Emacs groups all supported characters into disjoint charsets.
Each character code belongs to one and only one charset. For
historical reasons, Emacs typically divides an 8-bit character code
for an extended version of ASCII into two charsets: ASCII, which
covers the codes 0 through 127, plus another charset which covers the
“right-hand part” (the codes 128 and up). For instance, the
characters of Latin-1 comprise the Emacs charset ascii plus the
Emacs charset latin-iso8859-1.
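The split can be checked from Lisp. The following is a minimal sketch,
assuming the standard functions char-charset and charsetp behave as
described in the Emacs Lisp reference; evaluate each form with
M-: or in the *scratch* buffer.

```elisp
;; An ASCII character (code below 128) belongs to the charset `ascii'.
(char-charset ?a)              ; evaluates to the symbol ascii

;; Both halves of Latin-1 are known charset names:
;; `charsetp' returns non-nil when its argument names a charset.
(charsetp 'ascii)
(charsetp 'latin-iso8859-1)
```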
Emacs characters belonging to different charsets may look the same,
but they are still different characters. For example, the letter
‘o’ with acute accent in charset latin-iso8859-1, used for
Latin-1, is different from the letter ‘o’ with acute accent in
charset latin-iso8859-2, used for Latin-2.
There are two commands for obtaining information about Emacs
charsets. The command M-x list-charset-chars prompts for the name
of a charset and displays all the characters in that charset.
The command M-x describe-character-set prompts for a
charset name and displays information about that charset, including
its internal representation within Emacs.
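Both commands can also be called from Lisp, which skips the prompt.
This sketch assumes each command accepts the charset name as a symbol
argument; the two charsets used are only examples.

```elisp
;; Display every character of the right-hand part of Latin-1
;; in a separate buffer.
(list-charset-chars 'latin-iso8859-1)

;; Display a description of the charset `ascii' in a help buffer.
(describe-character-set 'ascii)
```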
To find out which charset a character in the buffer belongs to,
put point before it and type C-u C-x =.
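The same question can be answered from Lisp with char-charset applied
to the character after point. The command below is a hypothetical
helper written for illustration (my-charset-at-point is not part of
Emacs); it is a sketch of one way to do this, not a replacement for
C-u C-x =, which reports considerably more detail.

```elisp
(defun my-charset-at-point ()
  "Show the charset of the character after point in the echo area.
Hypothetical helper for illustration; not part of Emacs."
  (interactive)
  ;; `char-after' with no argument returns the character at point,
  ;; or nil when point is at the end of the buffer.
  (let ((ch (char-after)))
    (if ch
        (message "Character %c belongs to charset %s"
                 ch (char-charset ch))
      (message "No character after point"))))
```

After evaluating the definition, put point before a character and
type M-x my-charset-at-point.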