Python - String Literal Values

	Chapter 12. Strings

String Literal Values

A string is a sequence of characters. The literal value for a string is written by surrounding the value with quotes or apostrophes. There are several variations to provide some additional features.

Basic String: "xyz" or 'xyz'. A basic string must be completed on a single line, or continued with a \ as the very last character of a line.
Multi-Line String, Triple-Quoted String: """xyz""" or '''xyz'''. A multi-line string continues on until the concluding triple-quote or triple-apostrophe.
Unicode String: u"Unicode", u'Unicode', u"""Unicode""", etc. Unicode is the Universal Character Set; each character requires from 1 to 4 bytes of storage, depending on the encoding. ASCII is a single-byte character set; each of the 128 ASCII characters fits in a single byte of storage. Unicode permits any character in any of the languages in common use around the world.
Raw String: r"raw\nstring", r'raw\nstring', etc. The backslash characters (\) are not interpreted by Python, but are left as is. This is handy for Windows file names that contain backslashes. It is also handy for regular expressions that make extensive use of backslashes. Example: '\n' is a one-character string with a non-printing newline; r'\n' is a two-character string containing a backslash followed by the letter n.
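The four variations above can be compared side by side. The following is a minimal sketch, assuming Python 3 (where print is a function); the variable names and the Windows path are made up for illustration.

```python
# Basic strings: quotes and apostrophes produce the same value.
basic_double = "xyz"
basic_single = 'xyz'

# A backslash as the very last character of a line continues the string.
continued = "abc\
def"

# A triple-quoted string runs until the closing triple-quote;
# the line break becomes part of the value.
multi = """first line
second line"""

# Raw strings leave backslashes as is.
escaped = '\n'                      # one character: a newline
raw = r'\n'                         # two characters: backslash, then n
windows_path = r'C:\new\table.txt'  # hypothetical file name

print(basic_double == basic_single)  # True
print(continued)                     # abcdef
print(multi.count('\n'))             # 1
print(len(escaped), len(raw))        # 1 2
print(windows_path)                  # C:\new\table.txt
```

Without the r prefix, the \n and \t in the file name would be read as a newline and a tab.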

Outside of raw strings, non-printing characters and Unicode characters that aren't found on your keyboard are created using escapes. A table of escapes is provided below. These are Python representations for unprintable ASCII characters. They're called escapes because the \ is an escape from the usual meaning of the following character.

Escape	Meaning
`\` at end of a line	The end-of-line is ignored; the string continues on the next line
`\\`	Backslash (\)
`\'`	Apostrophe (')
`\"`	Quote (")
`\a`	ASCII Bell (BEL), an audible signal. Some operating systems translate this to a screen flash or ignore it completely.
`\b`	ASCII Backspace (BS)
`\f`	ASCII Formfeed (FF)
`\n`	ASCII Linefeed (LF)
`\r`	ASCII Carriage Return (CR)
`\t`	ASCII Horizontal Tab (TAB)
`\v`	ASCII Vertical Tab (VT)
`\ooo`	Character with octal value `ooo`. One to three octal digits are accepted.
`\xhh`	Character with hex value `hh`. Exactly two hex digits are required.
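A quick way to check the table is to look at the length and character code of each escape. This is a small sketch, assuming Python 3; the dictionary keys are labels chosen for this example only.

```python
# Each escape in the table yields exactly one character.
escapes = {
    'backslash': '\\',
    'apostrophe': '\'',
    'quote': '\"',
    'bell': '\a',
    'backspace': '\b',
    'formfeed': '\f',
    'newline': '\n',
    'carriage return': '\r',
    'tab': '\t',
    'vertical tab': '\v',
}

for name, char in escapes.items():
    # ord() gives the numeric code of the single character.
    print(name, len(char), ord(char))

# Octal and hex escapes name a character by its numeric value.
print('\101' == 'A')     # True: octal 101 is decimal 65
print('\x41' == 'A')     # True: hex 41 is decimal 65
print('\11' == '\t')     # True: fewer than three octal digits are allowed
```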

Note that adjacent string literals are automatically concatenated to make a longer string.

"ab" "cd" "ef" is the same as "abcdef".
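Adjacent-literal concatenation is often used to break a long string across several source lines. The sketch below assumes Python 3; the variable names and the sample pattern are illustrative only.

```python
# Literals separated only by whitespace are joined into one string.
joined = "ab" "cd" "ef"
print(joined == "abcdef")  # True

# Inside parentheses the pieces may sit on separate lines.
message = ("adjacent literals "
           "are joined "
           "into one string")
print(message)  # adjacent literals are joined into one string

# Each piece keeps its own prefix, so raw and basic literals can be mixed.
pattern = r'\d+' '\n'
print(len(pattern))  # 4: backslash, d, plus sign, newline
```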

For Unicode, a special \uxxxx escape is provided. This requires the four-digit hexadecimal Unicode character identification. 日本 is written in Python as u'\u65e5\u672c' using two Unicode characters provided via escapes. There are a variety of encoding schemes for Unicode text, for example UTF-8 and UTF-16; LATIN-1 is a single-byte encoding that covers only the first 256 Unicode characters. The codecs module provides mechanisms for encoding and decoding Unicode strings.
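The escape form and the codecs module can be tried together. This is a minimal sketch, assuming Python 3.3 or later, where the u prefix is still accepted and encoding produces a bytes object; the variable names are illustrative.

```python
import codecs

# Two Unicode characters written with \u escapes.
japan = u'\u65e5\u672c'
print(len(japan))  # 2

# Encoding turns the characters into bytes; the count depends on the scheme.
utf8_bytes = codecs.encode(japan, 'utf-8')
utf16_bytes = codecs.encode(japan, 'utf-16-le')
print(len(utf8_bytes))   # 6: three bytes per character
print(len(utf16_bytes))  # 4: two bytes per character

# Decoding with the same scheme recovers the original string.
round_trip = codecs.decode(utf8_bytes, 'utf-8')
print(round_trip == japan)  # True

# LATIN-1 has no codes for these characters, so encoding fails.
try:
    codecs.encode(japan, 'latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```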

