Modern Python Cookbook

Building complicated strings from lists of characters

How can we make complicated changes to an immutable string? Can we assemble a string from individual characters?

In most cases, the recipes we've already seen give us a number of tools for creating and modifying strings. There are yet more ways in which we can tackle the string manipulation problem. In this recipe, we'll look at using a list object as a way to decompose and rebuild a string. This will dovetail with some of the recipes in Chapter 4, Built-In Data Structures Part 1: Lists and Sets.

Getting ready

Here's a string that we'd like to rearrange:

>>> title = "Recipe 5: Rewriting an Immutable String"

We'd like to do two transformations:

  • Remove the part before the colon (:), along with the colon itself
  • Replace the punctuation with _ and make all the characters lowercase

We'll make use of the string module:

>>> from string import whitespace, punctuation

This has two important constants:

  • string.whitespace lists all of the ASCII whitespace characters, including space and tab.
  • string.punctuation lists the ASCII punctuation marks.
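
As a quick check of what these constants hold, the following sketch prints them and tests a few characters for membership. The variable names colon_is_punctuation and space_is_whitespace are only illustrative:

```python
from string import whitespace, punctuation

# repr() makes the otherwise invisible whitespace characters visible.
print(repr(whitespace))
print(punctuation)

# Both constants are ordinary strings, so the in operator works on them.
colon_is_punctuation = ':' in punctuation
space_is_whitespace = ' ' in whitespace
tab_is_whitespace = '\t' in whitespace
```

Because both constants are plain strings, whitespace+punctuation is simply a longer string that we can test characters against.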

How to do it...

We can work with a string exploded into a list. We'll look at lists in more depth in Chapter 4, Built-In Data Structures Part 1: Lists and Sets:

  1. Explode the string into a list object:
    >>> title_list = list(title)
    
  2. Find the partition character. The index() method for a list has the same semantics as the index() method has for a string. It locates the position with the given value:
    >>> colon_position = title_list.index(':')
    
  3. Delete the characters that are no longer needed. The del statement can remove items from a list. Unlike strings, lists are mutable data structures:
    >>> del title_list[:colon_position+1]
    
  4. Replace whitespace and punctuation by stepping through each position. In this case, we'll use a for statement to visit every index in the list. The expression range(len(title_list)) generates all of the values between 0 and len(title_list)-1, so position takes on every valid index in the list:
    >>> for position in range(len(title_list)):
    ...     if title_list[position] in whitespace+punctuation:
    ...         title_list[position] = '_'
    
  5. Join the list of characters to create a new string. It seems a little odd to use a zero-length string, '', as a separator when concatenating strings together. However, it works perfectly:
    >>> title = ''.join(title_list)
    >>> title
    '_Rewriting_an_Immutable_String'
    

We assigned the resulting string back to the original variable. The original string object, which had been referred to by that variable, is no longer needed: it's automatically removed from memory (this is known as "garbage collection"). The new string object replaces the value of the variable.
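
The steps above can be collected into a single function. This is only a sketch: the name rewrite_title is hypothetical, and enumerate() is swapped in for range(len(...)) as an equivalent way to visit each position. The steps never perform the lowercase conversion mentioned in Getting ready, so the sketch applies lower() to the joined string as one possible way to finish the job:

```python
from string import whitespace, punctuation

def rewrite_title(text: str) -> str:
    """Drop everything through the first ':' and replace
    whitespace and punctuation with '_'."""
    chars = list(text)                  # explode the string into a list
    colon_position = chars.index(':')   # raises ValueError if there is no colon
    del chars[:colon_position + 1]      # remove the prefix and the colon
    for position, char in enumerate(chars):
        if char in whitespace + punctuation:
            chars[position] = '_'       # in-place update, legal for a list
    return ''.join(chars)               # back to an immutable string

new_title = rewrite_title("Recipe 5: Rewriting an Immutable String")
print(new_title)

# Lowercasing is a string feature, so it is applied after the join.
lowered_title = new_title.lower()
print(lowered_title)
```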

How it works...

This is a change-of-representation trick. Since a string is immutable, we can't update it. We can, however, convert it into a mutable form; in this case, a list. We can make whatever changes are required to the mutable list object. When we're done, we can change the representation from a list back to a string and replace the original value of the variable.
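
A small sketch of the difference: assigning to a position in a string raises a TypeError, while the same assignment on a list of characters succeeds. The flag string_assignment_worked is only there to record the outcome:

```python
title = "Recipe 5"

# Item assignment on a str raises TypeError because strings are immutable.
try:
    title[0] = 'r'
    string_assignment_worked = True
except TypeError:
    string_assignment_worked = False

# The same change succeeds once the characters are in a list.
title_list = list(title)
title_list[0] = 'r'

# Change the representation back and rebind the variable.
title = ''.join(title_list)
print(title)
```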

Lists provide some features that strings don't have. Conversely, strings provide a number of features lists don't have. As an example, we can't convert a list into lowercase the way we can convert a string.
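
For example, a list has no lower() method. Two possible workarounds, sketched here with illustrative names, are to lowercase each character while the data is still a list, or to join first and then use the string method:

```python
chars = list("Immutable String")

# A list has no lower() method; hasattr() checks this without raising an error.
list_has_lower = hasattr(chars, 'lower')

# Option 1: lowercase each character while still in list form.
lowered_chars = [c.lower() for c in chars]

# Option 2: join the characters first, then use the string method.
lowered_text = ''.join(chars).lower()
print(lowered_text)
```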

There's an important trade-off here:

  • Strings are immutable, which allows a compact, efficient implementation. Strings hold only Unicode characters. Because the value can't change, we can use strings as keys for mappings and items in sets.
  • Lists are mutable, and this flexibility comes with some overhead. Lists can hold any kind of item. We can't use a list as a key for a mapping or an item in a set because the list value could change.
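
The following sketch shows the difference in practice. The names page_counts and seen_titles are hypothetical. A string works as a mapping key or set item, a list raises a TypeError, and joining the list back into a string makes it usable again:

```python
page_counts = {"Rewriting an Immutable String": 3}   # a string as a mapping key
seen_titles = {"Rewriting an Immutable String"}      # a string as a set item

title_list = list("abc")

# A list is unhashable, so using it as a key raises TypeError.
try:
    page_counts[title_list] = 1
    list_key_worked = True
except TypeError:
    list_key_worked = False

# Converting back to a string gives us a value we can use as a key.
page_counts[''.join(title_list)] = 1
seen_titles.add(''.join(title_list))
```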

Strings and lists are both specialized kinds of sequences. Consequently, they have a number of common features. The basic item indexing and slicing features are shared. Similarly, a list uses the same kind of negative index values that a string does: list[-1] is the last item in a list object.
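
A brief illustration of the shared sequence features, using made-up variable names. Indexing, negative indexing, slicing, len(), and the in operator behave the same way for both types; only the type of a slice differs:

```python
text = "Recipe"
chars = list(text)

# Indexing, including negative indexes, works the same way on both.
first_from_string, first_from_list = text[0], chars[0]
last_from_string, last_from_list = text[-1], chars[-1]

# A slice of a string is a string; a slice of a list is a list.
text_slice = text[:3]
chars_slice = chars[:3]

# len() and the in operator are also common to both.
same_length = len(text) == len(chars)
both_contain_c = 'c' in text and 'c' in chars
```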

We'll return to mutable data structures in Chapter 4, Built-In Data Structures Part 1: Lists and Sets.

See also

  • We can also work with strings using the internal methods of a string. See the Rewriting an immutable string recipe for more techniques.
  • Sometimes, we need to build a string, and then convert it into bytes. See the Encoding strings – creating ASCII and UTF-8 bytes recipe for how we can do this.
  • Other times, we'll need to convert bytes into a string. See the Decoding Bytes – How to get proper characters from some bytes recipe for more information.