27.9 Specifying a Coding System
In cases where Emacs does not automatically choose the right coding
system, you can use these commands to specify one:
- C-x <RET> f coding <RET>
  Use coding system coding for saving or revisiting the visited
  file in the current buffer.
- C-x <RET> c coding <RET>
  Specify coding system coding for the immediately following
  command.
- C-x <RET> r coding <RET>
  Revisit the current file using the coding system coding.
- C-x <RET> k coding <RET>
  Use coding system coding for keyboard input.
- C-x <RET> t coding <RET>
  Use coding system coding for terminal output.
- C-x <RET> p input-coding <RET> output-coding <RET>
  Use coding systems input-coding and output-coding for
  subprocess input and output in the current buffer.
- C-x <RET> x coding <RET>
  Use coding system coding for transferring selections to and from
  other programs through the window system.
- C-x <RET> F coding <RET>
  Use coding system coding for encoding and decoding file
  names. This affects the use of non-ASCII characters in file
  names. It has no effect on reading and writing the contents of
  files.
- C-x <RET> X coding <RET>
  Use coding system coding for transferring one selection (the
  next one) to or from the window system.
- M-x recode-region
  Convert the region from a previous coding system to a new one.
The command C-x <RET> f (set-buffer-file-coding-system) sets the file coding system for
the current buffer—in other words, it says which coding system to
use when saving or reverting the visited file. You specify which
coding system using the minibuffer. If you specify a coding system
that cannot handle all of the characters in the buffer, Emacs warns
you about the troublesome characters when you actually save the
buffer.
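
The same setting can be made from Lisp. The following is a minimal
sketch, assuming you want a command that marks the current buffer for
saving as UTF-8 with Unix line endings. The command name
my-save-as-utf-8-unix is hypothetical; set-buffer-file-coding-system is
the function that C-x <RET> f runs.

```elisp
;; Hypothetical helper: the name `my-save-as-utf-8-unix' is illustrative only.
(defun my-save-as-utf-8-unix ()
  "Make the current buffer use UTF-8 with LF line endings when saved."
  (interactive)
  ;; Same effect as typing C-x <RET> f utf-8-unix <RET>.
  (set-buffer-file-coding-system 'utf-8-unix))
```
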
Another way to specify the coding system for a file is when you visit
the file. First use the command C-x <RET> c
(universal-coding-system-argument); this command uses the
minibuffer to read a coding system name. After you exit the minibuffer,
the specified coding system is used for the immediately following
command.
So if the immediately following command is C-x C-f, for example,
it reads the file using that coding system (and records the coding
system for when you later save the file). Or if the immediately following
command is C-x C-w, it writes the file using that coding system.
When you specify the coding system for saving in this way, instead
of with C-x <RET> f, there is no warning if the buffer
contains characters that the coding system cannot handle.
Other file commands affected by a specified coding system include
C-x C-i and C-x C-v, as well as the other-window variants
of C-x C-f. C-x <RET> c also affects commands that
start subprocesses, including M-x shell (see Shell).
If the immediately following command does not use the coding system,
then C-x <RET> c ultimately has no effect.
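
A rough Lisp counterpart of typing C-x <RET> c before C-x C-f is to
bind the variable coding-system-for-read around the call that reads the
file. The sketch below assumes the file is known to be ISO Latin-1; the
command name my-find-file-as-latin-1 is hypothetical.

```elisp
;; Hypothetical command; only `coding-system-for-read' and `find-file'
;; are standard Emacs names.
(defun my-find-file-as-latin-1 (file)
  "Visit FILE, decoding its contents as ISO Latin-1."
  (interactive "fFind file (decode as Latin-1): ")
  ;; While this binding is in effect, file-reading primitives use the
  ;; given coding system instead of detecting one automatically.
  (let ((coding-system-for-read 'iso-latin-1))
    (find-file file)))
```

If FILE is already visited in some buffer, find-file simply selects
that buffer and the binding has no effect, much as C-x <RET> c has no
effect when the following command does not use the coding system.
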
An easy way to visit a file with no conversion is with the M-x
find-file-literally command. See Visiting.
The variable default-buffer-file-coding-system specifies the
choice of coding system to use when you create a new file. It applies
when you find a new file, and when you create a buffer and then save it
in a file. Selecting a language environment typically sets this
variable to a good choice of default coding system for that language
environment.
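
If the language environment's choice does not suit you, the variable
can be set directly in your init file. This is a sketch that assumes an
Emacs version providing the variable as described here, and that UTF-8
with Unix line endings is the default you want for new files.

```elisp
;; Assumption: this Emacs still provides `default-buffer-file-coding-system'.
;; Later Emacs releases mark it obsolete in favor of the default value of
;; `buffer-file-coding-system', which would be set with
;;   (setq-default buffer-file-coding-system 'utf-8-unix)
(setq default-buffer-file-coding-system 'utf-8-unix)
```
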
If you visit a file with the wrong coding system, you can correct this
with C-x <RET> r (revert-buffer-with-coding-system).
This visits the current file again, using a coding system you specify.
The command C-x <RET> t (set-terminal-coding-system)
specifies the coding system for terminal output. If you specify a
coding system for terminal output, all characters output to the
terminal are translated into that coding system.
This feature is useful for certain character-only terminals built to
support specific languages or character sets—for example, European
terminals that support one of the ISO Latin character sets. You need to
specify the terminal coding system when using multibyte text, so that
Emacs knows which characters the terminal can actually handle.
By default, output to the terminal is not translated at all, unless
Emacs can deduce the proper coding system from your terminal type or
your locale specification (see Language Environments).
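
When Emacs cannot deduce the coding system, you can set it in your
init file. The sketch below assumes a text terminal that displays ISO
Latin-1; substitute whatever coding system your terminal really
supports.

```elisp
;; Assumption: the terminal displays ISO Latin-1, but Emacs could not
;; deduce that from the terminal type or the locale.
(unless window-system
  ;; Only relevant when Emacs runs on a text terminal.
  (set-terminal-coding-system 'iso-latin-1))
```
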
The command C-x <RET> k (set-keyboard-coding-system)
or the variable keyboard-coding-system specifies the coding
system for keyboard input. Character-code translation of keyboard
input is useful for terminals with keys that send non-ASCII
graphic characters—for example, some terminals designed for ISO
Latin-1 or subsets of it.
By default, keyboard input is translated based on your system locale
setting. If your terminal does not really support the encoding
implied by your locale (for example, if you find that typing M-i
inserts a non-ASCII character), you will need to set
keyboard-coding-system to nil to turn off encoding.
You can do this by putting
(set-keyboard-coding-system nil)
in your ~/.emacs file.
There is a similarity between using a coding system translation for
keyboard input, and using an input method: both define sequences of
keyboard input that translate into single characters. However, input
methods are designed to be convenient for interactive use by humans, and
the sequences that are translated are typically sequences of ASCII
printing characters. Coding systems typically translate sequences of
non-graphic characters.
The command C-x <RET> x (set-selection-coding-system)
specifies the coding system for sending selected text to the window
system, and for receiving the text of selections made in other
applications. This command applies to all subsequent selections, until
you override it by using the command again. The command
C-x <RET> X (set-next-selection-coding-system) specifies the
coding system for the next selection made in Emacs or read by Emacs.
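
To make a selection coding system the default for every session, the
function behind C-x <RET> x can be called from your init file. This
sketch assumes the other applications on your display exchange text as
UTF-8.

```elisp
;; Assumption: other programs send and accept selections as UTF-8.
;; Applies to all subsequent selections until changed again.
(set-selection-coding-system 'utf-8)
```
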
The command C-x <RET> p (set-buffer-process-coding-system)
specifies the coding system for input and output to a subprocess. This
command applies to the current buffer; normally, each subprocess has its
own buffer, and thus you can use this command to specify translation to
and from a particular subprocess by giving the command in the
corresponding buffer.
The default for translation of process input and output depends on the
current language environment.
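
If the default does not match what a particular program expects, the
same function can be called from a mode hook. The following sketch
assumes that shells started with M-x shell read and write UTF-8; it
does nothing unless the buffer already has a subprocess.

```elisp
(add-hook 'shell-mode-hook
          (lambda ()
            ;; Only act when the buffer already has a subprocess.
            (when (get-buffer-process (current-buffer))
              ;; First argument: coding system for decoding what the
              ;; process sends.  Second: coding system for encoding
              ;; what Emacs sends to the process.
              (set-buffer-process-coding-system 'utf-8-unix 'utf-8-unix))))
```
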
If a piece of text has already been inserted into a buffer using the
wrong coding system, you can decode it again using M-x
recode-region. This prompts you for the old coding system and the
desired coding system, and acts on the text in the region.
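
Conceptually, the repair undoes the wrong decoding and then decodes the
resulting bytes correctly. The sketch below illustrates that round trip
on a string rather than a region, assuming text that was really UTF-8
but was decoded as ISO Latin-1; it uses only the standard functions
encode-coding-string and decode-coding-string.

```elisp
(let* ((bytes    (encode-coding-string "é" 'utf-8))          ; the real bytes
       (garbled  (decode-coding-string bytes 'iso-latin-1))  ; the wrong guess
       ;; Undo the wrong decoding, then decode with the right system.
       (repaired (decode-coding-string
                  (encode-coding-string garbled 'iso-latin-1)
                  'utf-8)))
  (list garbled repaired))
;; evaluates to ("Ã©" "é")
```
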
The variable file-name-coding-system specifies a coding
system to use for encoding file names. If you set the variable to a
coding system name (as a Lisp symbol or a string), Emacs encodes file
names using that coding system for all file operations. This makes it
possible to use non-ASCII characters in file names—or, at
least, those non-ASCII characters which the specified coding
system can encode. Use C-x <RET> F
(set-file-name-coding-system) to specify this interactively.
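
Setting the variable in your init file, before any files are visited,
keeps the choice consistent for the whole session. This sketch assumes
your file system stores names as UTF-8.

```elisp
;; Assumption: file names on this system are stored as UTF-8.
(setq file-name-coding-system 'utf-8)
```
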
If file-name-coding-system is nil, Emacs uses a default
coding system determined by the selected language environment. In the
default language environment, any non-ASCII characters in file names are
not encoded specially; they appear in the file system using the internal
Emacs representation.
Warning: if you change file-name-coding-system (or the
language environment) in the middle of an Emacs session, problems can
result if you have already visited files whose names were encoded using
the earlier coding system and cannot be encoded (or are encoded
differently) under the new coding system. If you try to save one of
these buffers under the visited file name, saving may use the wrong file
name, or it may get an error. If such a problem happens, use C-x
C-w to specify a new file name for that buffer.
If a mistake occurs when encoding a file name, use the command
M-x recode-file-name to change the file name's coding
system. This prompts for an existing file name, its old coding
system, and the coding system to which you wish to convert.
The variable locale-coding-system specifies a coding system
to use when encoding and decoding system strings such as system error
messages and format-time-string formats and time stamps. That
coding system is also used for decoding non-ASCII keyboard input on X
Window systems. You should choose a coding system that is compatible
with the underlying system's text representation, which is normally
specified by one of the environment variables LC_ALL,
LC_CTYPE, and LANG. (The first one, in the order
specified above, whose value is nonempty is the one that determines
the text representation.)
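
The precedence rule for the environment variables can be sketched in
Lisp as follows. The function name my-effective-ctype-locale is
hypothetical; it only reports which locale string applies and does not
itself set locale-coding-system.

```elisp
;; Hypothetical helper illustrating the lookup order described above.
(defun my-effective-ctype-locale ()
  "Return the value of the first nonempty variable among LC_ALL, LC_CTYPE, LANG.
Return nil if none of them is set to a nonempty value."
  (let ((vars '("LC_ALL" "LC_CTYPE" "LANG"))
        value)
    (while (and vars (not value))
      (let ((v (getenv (car vars))))
        ;; A variable that is unset or set to "" is skipped.
        (when (and v (not (string= v "")))
          (setq value v)))
      (setq vars (cdr vars)))
    value))
```

Evaluating (my-effective-ctype-locale) in the session shows the locale
string with which your choice of locale-coding-system should be
compatible.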