The Python Apprentice

str – an immutable sequence of Unicode code points

Strings in Python have the datatype str, and we've been using them extensively already. A string is a sequence of Unicode code points, and for the most part you can think of code points as being like characters, although they aren't strictly equivalent. The sequence of code points in a Python string is immutable, so once you've constructed a string, you can't modify its contents.
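To make the immutability concrete, here is a minimal sketch. The example string and the variable names are our own choices for illustration. It shows that a str can be indexed like any other sequence, that assigning to an element is rejected, and that "changing" a string really means building a new one.

```python
s = "parrot"

# A str behaves as a sequence: it has a length and supports indexing.
length = len(s)
first = s[0]

# Item assignment is not supported, because the sequence is immutable.
try:
    s[0] = "c"
except TypeError as exc:
    error = exc
else:
    error = None

# To get a different string, construct a new one from pieces of the old.
t = "c" + s[1:]

print(length, first)
print(type(error).__name__)
print(s, t)
```

The original string is left untouched throughout; only the new name t refers to the modified text.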

The difference between code points, letters, characters, and glyphs can be confusing. Let's try to clarify with an example: the Greek capital letter Σ (sigma), which is of course used widely in the writing of Greek text, is also used by mathematicians to signify summation of a series. These two uses of the letter sigma are represented by distinct Unicode characters called GREEK CAPITAL LETTER SIGMA and N-ARY SUMMATION respectively. Typically, where the same letter is used to convey different information, a different Unicode character is used. Another example would be GREEK CAPITAL LETTER OMEGA and OHM SIGN, the symbol for the unit of electrical resistance.

A code point is any one member of the set of numerical values which make up the code space. Each character is associated with a single code point, so GREEK CAPITAL LETTER SIGMA is assigned to U+03A3 and N-ARY SUMMATION is assigned to U+2211. As we have done here, code points are often written in U+nnnn form, where nnnn is a four-, five-, or six-digit hexadecimal number. Not all code points have yet been allocated to characters. For example, U+0378 is an unassigned code point, and there's nothing to stop you including this code point in a Python str using the \u0378 escape sequence; hence, str really is a sequence of code points and not a sequence of characters.

Although the term is not used in the context of Python, for completeness we feel we should point out that a glyph is the visual representation of a character. Different characters, such as GREEK CAPITAL LETTER SIGMA and N-ARY SUMMATION, may be rendered using the same glyph, or indeed different glyphs, depending on the font in use.
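The distinction between code points and characters can be explored from Python itself. The following is a rough sketch rather than a definitive recipe: it uses the built-in ord() function and the standard library unicodedata module, and the helper name describe is our own invention. Whether U+0378 is reported as unnamed depends on the version of the Unicode database bundled with your Python, so treat that part as an assumption.

```python
import unicodedata

sigma = "\u03a3"      # GREEK CAPITAL LETTER SIGMA
summation = "\u2211"  # N-ARY SUMMATION


def describe(ch):
    """Return the U+nnnn form and the Unicode name (or None) for a one-code-point str."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch, None)


sigma_info = describe(sigma)
summation_info = describe(summation)

# The two may look alike on screen, but they are different code points.
same = sigma == summation

# U+0378 is assumed to be unassigned, as stated in the text; it can still
# be placed in a str, where it counts as one element of the sequence.
unassigned = "\u0378"
unassigned_info = describe(unassigned)

print(sigma_info)
print(summation_info)
print(same)
print(len(unassigned), unassigned_info)
```

Passing None as the second argument to unicodedata.name() asks for a default value instead of the ValueError that would otherwise be raised for a code point with no name.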