Writing testable scripts with the script-library switch
It's often very easy to create a Python script file. A script file is very easy to use because when we provide the file to Python, it runs immediately. In some cases, there are no function or class definitions; the script file is the sequence of Python statements.
These simple script files are very difficult to test. More importantly, they're also difficult to reuse. When we want to build larger and more sophisticated applications from a collection of script files, we're often forced to re-engineer a simple script into a function.
Getting ready
Let's say that we have a handy implementation of the haversine distance function called haversine(), and it's in a file named ch03_r11.py.
Initially, the file might look like this:
import csv
from pathlib import Path
from math import radians, sin, cos, sqrt, asin
from functools import partial
MI = 3959
NM = 3440
KM = 6373
def haversine(lat_1: float, lon_1: float,
lat_2: float, lon_2: float, *, R: float) -> float:
... and more ...
nm_haversine = partial(haversine, R=NM)
source_path = Path("waypoints.csv")
with source_path.open() as source_file:
reader = csv.DictReader(source_file)
start = next(reader)
for point in reader:
d = nm_haversine(
float(start['lat']), float(start['lon']),
float(point['lat']), float(point['lon'])
)
print(start, point, d)
start = point
We've omitted the body of the haversine() function, showing only ... and more..., since it's exactly the same code we've already shown in the Picking an order for parameters based on partial functions recipe. We've focused on the context in which the function is in a Python script, which also opens a file, wapypoints.csv, and does some processing on that file.
How can we import this module without it printing a display of distances between waypoints in our waypoints.csv file?
How to do it...
Python scripts can be simple to write. Indeed, it's often too simple to create a working script. Here's how we transform a simple script into a reusable library:
- Identify the statements that do the work of the script: we'll distinguish between definition and action. Statements such as import, def, and class are clearly definitional—they define objects but don't take a direct action to compute or produce the output. Almost all other statements take some action. The distinction is entirely one of intent.
- In our example, we have some assignment statements that are more definition than action. These actions are like def statements; they only set variables that are used later. Here are the generally definitional statements:
MI = 3959 NM = 3440 KM = 6373 def haversine(lat_1: float, lon_1: float, lat_2: float, lon_2: float, *, R: float) -> float: ... and more ... nm_haversine = partial(haversine, R=NM)
The rest of the statements clearly take an action toward producing the printed results.
So, the testability approach is as follows:
- Wrap the actions into a function:
def distances(): source_path = Path("waypoints.csv") with source_path.open() as source_file: reader = csv.DictReader(source_file) start = next(reader) for point in reader: d = nm_haversine( float(start['lat']), float(start['lon']), float(point['lat']), float(point['lon']) ) print(start, point, d) start = point
- Where possible, extract literals and turn them into parameters. This is often a simple movement of the literal to a parameter with a default value. From this:
def distances(): source_path = Path("waypoints.csv")
To this:
def distances(source_path: Path = Path("waypoints.csv")) -> None:
This makes the script reusable because the path is now a parameter instead of an assumption.
- Include the following as the only high-level action statements in the script file:
if __name__ == "__main__": distances()
We've packaged the action of the script as a function. The top-level action script is now wrapped in an if statement so that it isn't executed during import.
How it works...
The most important rule for Python is that an import of a module is essentially the same as running the module as a script. The statements in the file are executed, in order, from top to bottom.
When we import a file, we're generally interested in executing the def and class statements. We might be interested in some assignment statements.
When Python runs a script, it sets a number of built-in special variables. One of these is __name__. This variable has two different values, depending on the context in which the file is being executed:
- The top-level script, executed from the command line: In this case, the value of the built-in special name of __name__ is set to __main__.
- A file being executed because of an import statement: In this case, the value of __name__ is the name of the module being created.
The standard name of __main__ may seem a little odd at first. Why not use the filename in all cases? This special name is assigned because a Python script can be read from one of many sources. It can be a file. Python can also be read from the stdin pipeline, or it can be provided on the Python command line using the -c option.
When a file is being imported, however, the value of __name__ is set to the name of the module. It will not be __main__. In our example, the value __name__ during import processing will be ch03_r08.
There's more...
We can now build useful work around a reusable library. We might make several files that look like this:
File trip_1.py:
from ch03_r11 import distances
distances('trip_1.csv')
Or perhaps something even more complex:
File all_trips.py:
from ch03_r11 import distances
for trip in 'trip_1.csv', 'trip_2.csv':
distances(trip)
The goal is to decompose a practical solution into two collections of features:
- The definition of classes and functions
- A very small action-oriented script that uses the definitions to do useful work
To get to this goal, we'll often start with a script that conflates both sets of features. This kind of script can be viewed as a spike solution. Our spike solution should evolve towards a more refined solution as soon as we're sure that it works. A spike or piton is a piece of removable mountain-climbing gear that doesn't get us any higher on the route, but it enables us to climb safely.
See also
- In Chapter 7, Basics of Classes and Objects, we'll look at class definitions. These are another kind of widely used definitional statement, in addition to function definitions.