Single-Byte Character Support

27.13 Single-byte Character Set Support

The ISO 8859 Latin-n character sets define character codes in the range 0240 to 0377 octal (160 to 255 decimal) to handle the accented letters and punctuation needed by various European languages (and some non-European ones). If you disable multibyte characters, Emacs can still handle one of these character sets at a time. To specify which one to use, invoke M-x set-language-environment and specify a suitable language environment such as ‘Latin-n’.
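The same choice can be made non-interactively from an initialization file. The following is a minimal sketch, assuming Latin-1 is the character set you want; substitute the name of another language environment as needed.

```elisp
;; Select the Latin-1 language environment at startup.
;; "Latin-1" is only an example; any language environment name
;; offered by M-x set-language-environment can be used here.
(set-language-environment "Latin-1")
```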

For more information about unibyte operation, see Enabling Multibyte. Note particularly that you probably want to ensure that your initialization files are read as unibyte if they contain non-ASCII characters.

Emacs can also display those characters, provided the terminal or font in use supports them. This works automatically. Alternatively, if you are using a window system, Emacs can also display single-byte characters through fontsets, in effect by displaying the equivalent multibyte characters according to the current language environment. To request this, set the variable unibyte-display-via-language-environment to a non-nil value.
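As a sketch, the fontset-based display described above could be requested from an initialization file like this. It only has an effect on a window system.

```elisp
;; Display single-byte characters through fontsets, using the
;; equivalent multibyte characters of the current language environment.
(setq unibyte-display-via-language-environment t)
```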

If your terminal does not support display of the Latin-1 character set, Emacs can display these characters as ASCII sequences which at least give you a clear idea of what the characters are. To do this, load the library iso-ascii. Similar libraries for other Latin-n character sets could be implemented, but we don't have them yet.
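For example, one way to load the library from an initialization file, assuming you want the ASCII fallback in every session:

```elisp
;; Show Latin-1 characters as ASCII sequences on terminals
;; that cannot display them directly.
(load-library "iso-ascii")
```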

Normally non-ISO-8859 characters (decimal codes between 128 and 159 inclusive) are displayed as octal escapes. You can change this for non-standard “extended” versions of ISO-8859 character sets by using the function standard-display-8bit in the disp-table library.
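A minimal sketch of such a call follows. It assumes that your font or terminal really does have glyphs for codes 128 through 159; the range is an illustration and should be adjusted to the codes your extended character set defines.

```elisp
;; Make sure the display-table functions are available.
(require 'disp-table)

;; Display codes 128..159 literally instead of as octal escapes.
;; Assumption: the terminal or font can show these codes.
(standard-display-8bit 128 159)
```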

There are three ways to input single-byte non-ASCII characters:

* You can use an input method for the selected language environment. See Input Methods. When you use an input method in a unibyte buffer, the non-ASCII character you specify with it is converted to unibyte.

* If your keyboard can generate character codes 128 (decimal) and up, representing non-ASCII characters, you can type those character codes directly.

  On a window system, you should not need to do anything special to use these keys; they should simply work. On a text-only terminal, you should use the command M-x set-keyboard-coding-system or the variable keyboard-coding-system to specify which coding system your keyboard uses (see Specify Coding). Enabling this feature will probably require you to use ESC to type Meta characters; however, on a console terminal or in xterm, you can arrange for Meta to be converted to ESC and still be able to type 8-bit characters present directly on the keyboard or using Compose or AltGr keys. See User Input.

* For Latin-1 only, you can use the key C-x 8 as a “compose character” prefix for entry of non-ASCII Latin-1 printing characters. C-x 8 is good for insertion (in the minibuffer as well as other buffers), for searching, and in any other context where a key sequence is allowed.

  C-x 8 works by loading the iso-transl library. Once that library is loaded, the <ALT> modifier key, if the keyboard has one, serves the same purpose as C-x 8: use <ALT> together with an accent character to modify the following letter. In addition, if the keyboard has keys for the Latin-1 “dead accent characters,” they too are defined to compose with the following character, once iso-transl is loaded.

  Use C-x 8 C-h to list all the available C-x 8 translations.
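The keyboard-related settings above can also be placed in an initialization file. This is a sketch under two assumptions: that a text-only terminal sends Latin-1 codes (hence the coding system iso-latin-1), and that you want the <ALT> and dead-accent bindings available before C-x 8 is first used.

```elisp
;; On a text-only terminal, declare which coding system the
;; keyboard sends.  iso-latin-1 is an assumption; pick the one
;; that matches your terminal.
(unless (display-graphic-p)
  (set-keyboard-coding-system 'iso-latin-1))

;; Load iso-transl up front so that <ALT> and any dead accent
;; keys compose characters without typing C-x 8 first.
(require 'iso-transl)
```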