A string is a sequence of characters. The literal value for a string is written by surrounding the value with quotes or apostrophes. There are several variations to provide some additional features.
- Basic String: "xyz" or 'xyz'. A basic string must be completed on a single line, or continued with a \ as the very last character of a line.
- Multi-Line String, Triple-Quoted String: """xyz""" or '''xyz'''. A multi-line string continues until the concluding triple-quote or triple-apostrophe.
- Unicode String: u"Unicode", u'Unicode', u"""Unicode""", etc. Unicode is the Universal Character Set; depending on the encoding, each character requires from 1 to 4 bytes of storage. ASCII is a 7-bit, single-byte character set; each of the 128 ASCII characters requires a single byte of storage. Unicode permits any character in any of the languages in common use around the world.
- Raw String: r"raw\nstring", r'raw\nstring', etc. The backslash characters (\) are not interpreted by Python, but are left as is. This is handy for Windows file names that contain backslashes. It is also handy for regular expressions that make extensive use of backslashes. Example: '\n' is a one-character string with a non-printing newline; r'\n' is a two-character string.
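
The literal forms listed above can be compared side by side. The following is a minimal sketch, assuming Python 3.3 or later (where the u prefix is accepted but optional); the variable names are illustrative only.

```python
# Basic strings: quotes and apostrophes produce the same value.
basic_quote = "xyz"
basic_apostrophe = 'xyz'

# A backslash as the very last character of a line continues the literal.
continued = "first part, \
second part"

# A triple-quoted string may span several lines; the line breaks are kept.
multi_line = """line one
line two"""

# The u prefix marks a Unicode string (optional in Python 3.3 and later).
unicode_text = u"Unicode"

# In a raw string the backslash is kept as an ordinary character.
escaped = '\n'    # one character: a newline
raw = r'\n'       # two characters: a backslash and the letter n

print(basic_quote == basic_apostrophe)
print(repr(continued))
print(repr(multi_line))
print(len(escaped), len(raw))
```

Printing with repr() shows the stored characters, which makes the difference between the continued line (no newline stored) and the triple-quoted string (newline stored) visible.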
Outside of raw strings, non-printing characters and Unicode characters that aren't found on your keyboard are created using escapes. A table of escapes is provided below.
These are Python representations for unprintable ASCII characters. They're called escapes because the \ is an escape from the usual meaning of the following character.
Escape | Meaning |
\ at end of a line | The end-of-line is ignored; the string continues on the next line |
\\ | Backslash (\) |
\' | Apostrophe (') |
\" | Quote (") |
\a | ASCII Bell (BEL), an audible signal. Some operating systems translate this to a screen flash or ignore it completely. |
\b | ASCII Backspace (BS) |
\f | ASCII Formfeed (FF) |
\n | ASCII Linefeed (LF) |
\r | ASCII Carriage Return (CR) |
\t | ASCII Horizontal Tab (TAB) |
\v | ASCII Vertical Tab (VT) |
\ooo | ASCII character with octal value ooo. One to three octal digits are accepted. |
\xhh | ASCII character with hex value hh. Exactly two hex digits are required. |
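
To check the table against the interpreter, each escape can be stored in a string and inspected. This is a small sketch, assuming Python 3; the dictionary name and its keys are made up for illustration.

```python
# Each escape in the table produces exactly one character.
escapes = {
    "backslash": '\\',
    "apostrophe": '\'',
    "quote": '\"',
    "bell": '\a',
    "backspace": '\b',
    "formfeed": '\f',
    "linefeed": '\n',
    "carriage return": '\r',
    "tab": '\t',
    "vertical tab": '\v',
    "octal 101": '\101',
    "hex 41": '\x41',
}

for name, char in escapes.items():
    # len() confirms a single character; ord() gives its numeric code.
    print(name, len(char), ord(char))

# Octal 101 and hex 41 both equal decimal 65, the letter A.
print('\101' == '\x41' == 'A')
```

The end-of-line escape is left out here because it produces no character at all; it only joins two source lines.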
Note that adjacent string literals are automatically joined to make a longer string. "ab" "cd" "ef" is the same as "abcdef".
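
One place this joining is useful is breaking a long message across source lines. The sketch below assumes Python 3 and uses illustrative variable names; the parentheses are what allow the literals to sit on separate lines.

```python
# Adjacent literals are joined into a single string.
joined = "ab" "cd" "ef"

# Inside parentheses the pieces may sit on separate lines, and the
# two quoting styles can be mixed freely.
message = (
    "This sentence is split "
    'across two literals.'
)

print(joined)     # abcdef
print(message)    # This sentence is split across two literals.
```

Only literals are joined this way; combining a literal with a variable still needs the + operator.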
For Unicode, a special \uxxxx escape is provided. It takes the four hexadecimal digits that identify the Unicode character. For example, 日本 is written in Python as u'\u65e5\u672c', using two Unicode characters provided via escapes. There are a variety of encoding schemes for storing Unicode text as bytes, for example UTF-8 and UTF-16; older single-byte schemes such as LATIN-1 cover only a small subset of the characters. The codecs module provides mechanisms for encoding and decoding Unicode strings.
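
The relationship between a Unicode string and its encoded bytes can be sketched with the codecs module. This example assumes Python 3 and uses only codecs.encode and codecs.decode; the variable names are illustrative.

```python
import codecs

# Two characters written with \uxxxx escapes.
nihon = u'\u65e5\u672c'
print(len(nihon))    # 2

# Encoding turns the string into bytes; the byte count depends on the scheme.
as_utf8 = codecs.encode(nihon, 'utf-8')
as_utf16 = codecs.encode(nihon, 'utf-16-le')
print(len(as_utf8), len(as_utf16))    # 6 4

# Decoding with the same scheme recovers the original string.
round_trip = codecs.decode(as_utf8, 'utf-8')
print(round_trip == nihon)    # True

# Latin-1 has no code for these characters, so encoding fails.
try:
    codecs.encode(nihon, 'latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)    # False
```

The utf-16-le codec is chosen here so that no byte-order mark is added to the output, which keeps the byte count at two per character.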