String Built-in Functions
The following built-in functions are relevant to string manipulation:
- chr(i) → character
  Return a string of one character with ordinal i; 0 ≤ i < 256.
- len(object) → integer
  Return the number of items of a sequence or mapping.
- ord(c) → integer
  Return the integer ordinal of a one-character string.
- repr(object) → string
  Return the canonical string representation of the object. For most object types, eval(repr(object)) == object.
- str(object) → string
  Return a nice string representation of the object. If the argument is a string, the return value is the same object.
- unichr(i) → Unicode string
  Return a Unicode string of one character with ordinal i; 0 ≤ i < 65536.
- unicode(string [, encoding [, errors]]) → Unicode string
  Create a new Unicode object from the given encoded string. encoding defaults to the current default string encoding, and errors, which defines the error handling, defaults to 'strict'.
For character code manipulation, there are three related functions: chr, ord and unichr. chr returns the one-character string that belongs to a code number from 0 to 255; for codes below 128 this is the ASCII character. unichr returns the Unicode character that belongs to a Unicode number. ord goes the other way: it transforms an ASCII character to its ASCII code number, or a Unicode character to its Unicode number.
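The round trip between characters and code numbers can be sketched as follows. This is a minimal illustration, not part of the original text; the variable names are made up for the example, and the small fallback at the top is an assumption added so the same lines also run on Python 3, where unichr is gone and chr covers the whole Unicode range.

```python
# unichr is a Python 2 built-in; fall back to chr where it is missing.
# (Assumption for illustration: on Python 3, chr accepts any Unicode number.)
try:
    unichr
except NameError:
    unichr = chr

code = ord("A")                    # character -> code number
letter = chr(code)                 # code number -> character

# Arithmetic on the code number moves through the character table.
next_letter = chr(ord("a") + 1)

# ord reverses chr for every code in the ASCII range.
round_trip_ok = all(ord(chr(i)) == i for i in range(128))

# The same pairing holds for Unicode numbers above 255.
euro = unichr(0x20AC)
euro_code = ord(euro)

print(code)
print(letter)
print(next_letter)
print(round_trip_ok)
print(euro_code)
```

Because ord and chr are inverses, simple character shifts such as chr(ord(c) + 1) need no lookup table.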
The len function returns the length of the string.
>>> len("abcdefg")
7
>>> len(r"\n")
2
>>> len("\n")
1
The str function converts any object to a string.
>>> a = str(355.0/113.0)
>>> a
'3.14159292035'
>>> len(a)
13
The repr function also converts an object to a string. However, repr usually creates a string suitable for use as Python source code. For simple numeric types, it's not terribly interesting. For more complex types, however, it reveals details of their structure. It can also be invoked using reverse quotes (`), also called the grave accent, found underneath the tilde (~) on most keyboards.
>>> a="""a very
... long string
... on multiple lines"""
>>> print repr(a)
'a very\012long string\012on multiple lines'
>>> print `a`
'a very\012long string\012on multiple lines'
This representation shows the newline characters (\012, the octal escape for a newline, which Python 2.1 and later display as \n) embedded within the triple-quoted string. If we simply print a or str(a), we would see the string interpreted instead of represented.
>>> a="""a very
... long string
... on multiple lines"""
>>> print a
a very
long string
on multiple lines
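The difference between the two conversions, and the eval(repr(object)) == object property listed earlier, can be checked with a short sketch. The variable names are illustrative only; the code avoids the reverse-quote form and the print statement so that it runs unchanged on newer Python versions as well.

```python
text = "a very\nlong string\non multiple lines"

source_form = repr(text)   # quoted, with the newlines shown as escapes
plain_form = str(text)     # the string itself, newlines intact

# repr output is valid Python source, so eval rebuilds an equal object.
rebuilt = eval(source_form)

# The same holds for a more complex object such as a list.
items = [1, 2.5, "x"]
rebuilt_items = eval(repr(items))

print(source_form)
print(plain_form)
print(rebuilt == text)
print(rebuilt_items == items)
```

The repr form fits on a single line because every newline is written as an escape sequence, while the str form spans three lines when printed.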
The unicode(string [, encoding [, errors]]) function decodes an encoded string into a Unicode object. The default encoding is the current default string encoding (normally 'ascii', as reported by sys.getdefaultencoding()), with 'strict' error handling. Choices for errors are 'strict', 'replace' and 'ignore'. Strict raises an exception for bytes that are invalid in the given encoding, replace substitutes the Unicode replacement character (\uFFFD) and ignore skips over invalid bytes. The codecs and unicodedata modules provide more functions for working with Unicode.
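The three error-handling choices can be compared in a small sketch. It uses the decode method of an encoded string, which in Python 2 performs the same conversion as unicode(string, encoding, errors); that substitution is a choice made here so the example also runs on Python 3, where the unicode built-in no longer exists. The sample bytes and variable names are illustrative, and the b"..." literal assumes Python 2.6 or later.

```python
# An encoded string: "caf" followed by the two UTF-8 bytes for e-acute.
raw = b"caf\xc3\xa9"

# Decoding with the matching codec gives a four-character Unicode string.
text = raw.decode("utf-8")

# The same bytes are not valid ASCII, so 'strict' raises an exception.
try:
    raw.decode("ascii", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for each invalid byte.
replaced = raw.decode("ascii", "replace")

# 'ignore' silently drops the invalid bytes.
ignored = raw.decode("ascii", "ignore")

print(len(text))
print(strict_failed)
print(len(replaced))
print(ignored)
```

Strict handling is usually the safer default, since 'replace' and 'ignore' both discard information about the original bytes.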