Testing, Debugging, and Optimizing: Chapter 18 - Python in a Nutshell

by Alex Martelli

You're not finished with a programming task when you're done writing the code; you're finished when the code runs correctly and with acceptable performance. Testing (covered in "Testing" on page 452) means verifying that code runs correctly by exercising the code under known conditions and checking that results are as expected. Debugging (covered in "Debugging" on page 461) means discovering causes of incorrect behavior and repairing them (repair is often easy, once you figure out the causes).

Optimizing (covered in "Optimization" on page 474) is often used as an umbrella term for activities meant to ensure acceptable performance. Optimizing breaks down into benchmarking (measuring performance for given tasks to check that it's within acceptable bounds), profiling (instrumenting the program to identify performance bottlenecks), and optimizing proper (removing bottlenecks to make overall program performance acceptable). Clearly, you can't remove performance bottlenecks until you've found out where they are (using profiling), which in turn requires knowing that there are performance problems (using benchmarking).

This chapter covers the three subjects in the natural order in which they occur in development: testing first and foremost, debugging next, and optimizing last. However, most programmers' enthusiasm focuses on optimization: testing and debugging are often (wrongly, in my opinion) perceived as being chores, while optimization is perceived as being fun. As a consequence, if you were to read only one section of the chapter, I would suggest that section be "Developing a Fast-Enough Python Application" on page 474, which summarizes the Pythonic approach to optimization—close to Jackson's classic "Rules of Optimization: Rule 1: Don't do it. Rule 2 (for experts only): Don't do it yet."

All of these tasks are large and important, and each could fill at least a book by itself. This chapter does not even come close to exploring every related technique and implication; it focuses on Python-specific techniques, approaches, and tools.


Testing

In this chapter, I distinguish between two rather different kinds of testing: unit testing and system testing. Testing is a rich and important field, and many more distinctions could be drawn, but I focus on the issues of most immediate importance to software developers. Many developers are reluctant to spend time on testing, seeing it as time subtracted from "real" development, but this attitude is short-sighted: defects are easier to fix the earlier you find them, so each hour spent developing tests amply pays for itself by catching defects as soon as possible, saving many hours of debugging that would otherwise be needed in later phases of the software development cycle.

Unit Testing and System Testing

Unit testing means writing and running tests to exercise a single module or an even smaller unit, such as a class or function. System testing (also known as functional or integration testing) involves running an entire program with known inputs. Some classic books on testing also draw the distinction between white-box testing, done with knowledge of a program's internals, and black-box testing, done without such knowledge. This classic viewpoint parallels, but does not exactly duplicate, the modern one of unit versus system testing.

Unit and system testing serve different goals. Unit testing proceeds apace with development; you can and should test each unit as you're developing it. One modern approach is known as test-driven development (TDD): for each feature that your program must have, you first write unit tests and only then do you proceed to write code that implements the feature. TDD may seem upside-down, but it has several advantages. For example, it ensures that you won't omit unit tests for some feature. Further, developing test-first is helpful because it urges you to focus first on the tasks a certain function, class, or method should accomplish, and to deal only afterward with how to implement that function, class, or method. A recent important innovation along the lines of TDD is behavior-driven development; see http://behavior-driven.org/ for all details about this new and promising development methodology.
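The test-first cycle can be sketched in a few lines. Everything here is hypothetical for illustration: a function slugify (not from this chapter) whose tests are written before its implementation.

```python
import unittest

# Hypothetical feature: slugify, turning a phrase into a URL-friendly slug.
# In TDD these tests are written first, and they fail until the
# feature is implemented.
class TestSlugify(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify('Hello World'), 'hello-world')

    def test_already_a_slug(self):
        self.assertEqual(slugify('hello-world'), 'hello-world')

# Only now is the feature coded, with the tests above pinning down
# exactly what it must accomplish:
def slugify(text):
    return '-'.join(text.lower().split())
```

Writing TestSlugify first forces you to decide slugify's interface (one string in, one string out) before thinking about its implementation.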

In order to test a unit, which may depend on other units not yet fully developed, you often have to write stubs, which are fake implementations of various units' interfaces that give known and correct responses in cases needed to test other units. The Mock module (http://python-mock.sourceforge.net/) can be very helpful in the implementation of many such stubs (a good tutorial, with links to other useful documents about Mock Objects, is at http://xper.org/wiki/seminar/PythonMockObjectTutorial).
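Even without a mock library, a stub can be a tiny hand-rolled class. The following sketch uses hypothetical names (StubUserDB, build_summary) not taken from this chapter: the unit under test depends on a database layer that is not yet written, so a stub supplies known, canned responses.

```python
# A hand-rolled stub: build_summary (the unit under test) depends on a
# user-database interface that does not exist yet, so StubUserDB fakes
# that interface with known and correct responses.

class StubUserDB(object):
    """Fake implementation of the (not yet written) user-database interface."""
    def count_active_users(self):
        return 3          # canned, known response

    def newest_user(self):
        return 'carol'    # canned, known response

def build_summary(db):
    # the unit under test: formats a report from whatever DB it is given
    return '%d active users, newest: %s' % (db.count_active_users(),
                                            db.newest_user())
```

A unit test can then call build_summary(StubUserDB()) and compare the result against the exact string the canned responses imply, with no database in sight.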

System testing comes afterward, since it requires the system to exist, with some subset of system functionality believed to be in working condition. System testing provides a sanity check: given that each module in the program works properly (passes unit tests), does the whole program work? If each unit is okay but the system as a whole is not, there is a problem in the integration between units, meaning the way the units cooperate. For this reason, system testing is also known as integration testing.

System testing is similar to running the system in production use, except that you fix the inputs in advance so that any problems you may find are easy to reproduce. The cost of failures in system testing is lower than in production use, since outputs from system testing are not used to make decisions, serve customers, control external systems, and so on. Rather, outputs from system testing are systematically compared with the outputs that the system should produce given the known inputs. The purpose is to find, in a cheap and reproducible way, any discrepancy between what the program should do and what the program actually does.

Failures discovered by system testing, just like system failures in production use, may reveal some defects in unit tests, as well as defects in the code. Unit testing may have been insufficient: a module's unit tests may have failed to exercise all needed functionality of that module. In this case, the unit tests clearly need to be beefed up. Do that before you change your code to fix the problem, then run the newly strengthened unit tests to confirm that they do now show the existence of the problem. Then fix the problem and run unit tests again to confirm they show no problem anymore. Finally, rerun the system tests to confirm that the problem has indeed gone away.

Most often, failures in system testing reveal communication problems within the development team: a module correctly implements a certain functionality, but another module expects different functionality. This kind of problem (an integration problem in the strict sense) is hard to pinpoint in unit testing. In good development practice, unit tests must run often, so it is crucial that they run fast. It's therefore essential that each unit can assume other units are working correctly and as expected.

Unit tests that are run in reasonably late stages of development can reveal integration problems if the system architecture is hierarchical, a common and reasonable organization. In such an architecture, lower-level modules depend on no others (except perhaps library modules, which you can assume to be correct), and thus the unit tests of such lower-level modules, if complete, suffice to assure correctness. Higher-level modules depend on lower-level ones, and therefore also depend on correct understanding about what functionality each module expects and supplies. Running complete unit tests on higher-level modules, using the true lower-level modules rather than stubs, exercises the interface between modules, as well as the higher-level modules' own code.

Unit tests for higher-level modules are thus run in two ways. You run the tests with stubs for the lower levels during the early stages of development when the lower-level modules are not yet ready or, later, when you need to check only the correctness of the higher levels. During later stages of development, you also regularly run the higher-level modules' unit tests using the true lower-level modules. In this way, you check the correctness of the whole subsystem, from the higher levels downward. Nevertheless, even in this favorable case, you still need to write and run system tests to make sure the whole system's functionality is exercised and checked, and that no interface between modules is neglected.

System testing is similar to running the program in normal ways. You need special support only to ensure that known inputs are supplied and that outputs are captured for comparison with expected outputs. This is easy for programs that perform I/O on files, but terribly hard for programs whose I/O relies on a GUI, network, or other communication with independent external entities. To simulate such external entities and make them predictable and entirely observable, you generally need platform-dependent infrastructure. Another useful piece of supporting infrastructure for system testing is a testing framework that automates the running of system tests, including logging of successes and failures. Such a framework can also help testers prepare sets of known inputs and corresponding expected outputs.

Both free and commercial programs for these purposes exist, but they are not dependent on which programming languages are used in the system under test. System testing is a close kin to what was classically known as black-box testing, or testing that is independent from the implementation of the system under test (and therefore, in particular, independent from the programming languages used for implementation). Instead, testing frameworks usually depend on the operating system platform on which they run, since the tasks they perform are platform-dependent: running programs with given inputs, capturing their outputs, and particularly simulating and capturing GUI, network, and other interprocess communication I/O. Since frameworks for system testing depend on the platform and not on programming languages, I do not cover them further in this book.

The doctest Module

The doctest module has the primary purpose of letting you create good usage examples in your code's docstrings by checking that the examples do in fact produce the results that your docstrings show for them. doctest recognizes such examples by looking within the docstring for the interactive Python prompt '>>> ', followed on the same line by a Python statement, and the statement's expected output on the next line.

As you're developing a module, keep the docstrings up to date and gradually enrich them with examples. Each time part of the module (e.g., a function) is ready, or even partially ready, make it a habit to add examples to the docstrings. Import the module into an interactive session, and interactively use the parts you just developed in order to provide examples with a mix of typical cases, limit cases, and failing cases. For this specific purpose only, use from module import * so that your examples don't prefix module. to each name the module supplies. Copy and paste the text of the interactive session into the docstring in an editor, adjust any mistakes, and you're almost done.

Your documentation is now enriched with examples, and readers will have an easier time following it, assuming you choose a good mix of examples and season it wisely with nonexample text. Make sure you have docstrings, with examples, for your module as a whole, and for each function, class, and method that the module exports. You may choose to skip functions, classes, and methods whose names start with _, since, as their names indicate, they're meant to be private implementation details; doctest by default ignores them, and so should most readers of your module.

Examples that don't match the way your code works are worse than useless. Documentation and comments are useful only if they match reality. Docstrings and comments often get out of date as code changes, and then they become misinformation, hampering rather than helping any reader of the source. Better to have no comments and docstrings at all than to have ones that lie. doctest can help you ensure that the examples in your docstrings keep matching your code's actual behavior. A failing doctest run should prompt you to review the docstring that contains the failing examples, reminding you to keep the docstring's text updated, too.

At the end of your module's source, insert the following small snippet:

if __name__ == '__main__':
    import doctest
    doctest.testmod()

This code calls function testmod of module doctest when you run your module as the main program. testmod examines all relevant docstrings (the module docstring, and docstrings of all public functions, public classes, and public methods of public classes). In each docstring, testmod finds all examples (by looking for occurrences of the interpreter prompt '>>> ', possibly preceded by whitespace) and runs each example. testmod checks that each example's results are equal to the output given in the docstring right after the example. In the case of exceptions, testmod ignores the traceback, and just checks that the expected and observed error messages are equal.

When everything goes right, testmod terminates silently. Otherwise, it outputs detailed messages about examples that failed, showing expected and actual output. Example 18-1 shows a typical example of doctest at work on a module mod.py.

Example 18-1. Using doctest

'''This module supplies a single function reverseWords that reverses
a string by words.

>>> reverseWords('four score and seven years')
'years seven and score four'
>>> reverseWords('justoneword')
'justoneword'
>>> reverseWords('')
''

You must call reverseWords with one argument, and it must be a string:

>>> reverseWords()
Traceback (most recent call last):
    ...
TypeError: reverseWords() takes exactly 1 argument (0 given)
>>> reverseWords('one', 'another')
Traceback (most recent call last):
    ...
TypeError: reverseWords() takes exactly 1 argument (2 given)
>>> reverseWords(1)
Traceback (most recent call last):
    ...
AttributeError: 'int' object has no attribute 'split'
>>> reverseWords(u'however, unicode is all right too')
u'too right all is unicode however,'

As a side effect, reverseWords eliminates any redundant spacing:

>>> reverseWords('with   redundant   spacing')
'spacing redundant with'
'''
def reverseWords(astring):
    words = astring.split()
    words.reverse()
    return ' '.join(words)

if __name__ == '__main__':
    import doctest, sys
    doctest.testmod(sys.modules[__name__])

In this module's docstring, I have snipped the tracebacks from the docstring and replaced them with an ellipsis: this is good common practice, since doctest ignores tracebacks and they add nothing to the explanatory value of each failing case. Apart from this snipping, the docstring is the copy and paste of an interactive session, with the addition of some explanatory text and empty lines for readability. Save this source as mod.py, and then run it with python mod.py. It produces no output, meaning that all examples work right. Try python mod.py -v to get an account of all tests tried and a verbose summary at the end. Finally, alter the example results in the module docstring, making them incorrect, to see the messages doctest provides for errant examples.

While doctest is not primarily meant for general-purpose unit testing, it can nevertheless be a very convenient tool for the purpose. The recommended way to do unit testing in Python is with module unittest, covered in "The unittest Module" on page 457. However, unit testing with doctest can be easier and faster to set up, since it requires little more than copy and paste from an interactive session. If you need to maintain a module that lacks unit tests, retrofitting such tests into the module with doctest is a reasonable compromise. It's certainly better to have doctest-based unit tests than not to have any unit tests at all, as might otherwise happen should you decide that setting up tests "properly" with unittest would take you too long.

If you do decide to use doctest for unit testing, don't cram extra tests into your module's docstrings. This would damage the docstrings by making them too long and hard to read. Keep in the docstrings the right amount and kind of examples, strictly for explanatory purposes, just as if unit testing was not in the picture. Instead, put the extra tests into a global variable of your module, a dictionary named __test__. The keys in __test__ are strings used as arbitrary test names, and the corresponding values are strings that doctest picks up and uses in just the same way as it uses docstrings. The values in __test__ may also be function and class objects, in which case doctest examines their docstrings for tests to run. This latter feature is a convenient way to run doctest on objects with private names, which doctest skips by default.
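Here is a minimal sketch of this arrangement; the function reverse_words and the test names are illustrative, not from the chapter's examples. The docstring keeps only one explanatory example, while the extra unit tests live in __test__.

```python
import doctest

def reverse_words(astring):
    """Reverse a string by words.

    >>> reverse_words('four score and seven years')
    'years seven and score four'
    """
    return ' '.join(reversed(astring.split()))

# Extra tests go into __test__, keeping the docstring itself short.
# doctest.testmod picks these strings up just as it does docstrings.
__test__ = {
    'redundant_spacing': """
        >>> reverse_words('with   redundant   spacing')
        'spacing redundant with'
    """,
    'empty_string': """
        >>> reverse_words('')
        ''
    """,
}

if __name__ == '__main__':
    doctest.testmod()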

In Python 2.4, the doctest module also supplies two functions that return instances of the unittest.TestSuite class based on doctests so that you can integrate such tests into testing frameworks based on unittest. The documentation for this advanced functionality is at http://docs.python.org/lib/doctest-unittest-api.html.
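One of those functions is doctest.DocTestSuite, which wraps every doctest found in a module into unittest test cases. A minimal sketch (the function double is hypothetical, for illustration):

```python
import doctest
import sys
import unittest

def double(x):
    """Double the argument.

    >>> double(21)
    42
    """
    return 2 * x

# DocTestSuite collects this module's doctests into a unittest.TestSuite,
# so the doctests can participate in a unittest-based test run.
suite = doctest.DocTestSuite(sys.modules[__name__])
```

Running the suite with unittest.TextTestRunner().run(suite) then reports doctest failures in the same way as any other unittest failure.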

The unittest Module

The unittest module is the Python version of a unit-testing framework originally developed by Kent Beck for Smalltalk. Similar, widespread versions of the framework also exist for many other programming languages (e.g., the JUnit package for Java) and are often collectively referred to as xUnit.

To use unittest, you don't put your testing code in the same source file as the tested module, but instead write a separate test module for each module you're testing. A popular convention is to name the test module the same as the module being tested, with a prefix such as 'test_', and put it in a subdirectory named test of the directory where you keep your sources. For example, the test module for mod.py can be test/test_mod.py. You need a simple and consistent naming convention to make it easy for you to write and maintain auxiliary scripts that find and run all unit tests for a package.

Separation between a module's source code and its unit-testing code lets you refactor the module more easily, including possibly recoding its functionality in C, without perturbing the unit-testing code. Knowing that test_mod.py stays intact, whatever changes you make to mod.py, enhances your confidence that passing the tests in test_mod.py indicates that mod.py still works correctly after the changes.

A unit-testing module defines one or more subclasses of unittest's TestCase class. Each subclass specifies one or more test cases by defining test-case methods, which are methods that are callable without arguments and whose names start with test. The subclass may also override method setUp, which the framework calls to prepare a new instance just before calling each test-case method, and tearDown, which the framework calls to clean things up just after calling each test-case method. Each test-case method calls methods of class TestCase whose names start with assert in order to express the conditions that the test must meet. unittest runs the test-case methods within a TestCase subclass in arbitrary order, each on a new instance of the subclass, running setUp just before each test case and tearDown just after each test case.
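A small sketch of this machinery (the class and its tests are illustrative): setUp prepares a fresh fixture before each test-case method, so no test depends on another test's leftovers.

```python
import unittest

class StackTest(unittest.TestCase):
    # Called by the framework just before each test-case method,
    # on a brand-new StackTest instance.
    def setUp(self):
        self.stack = []

    # Called just after each test-case method, even when the test fails.
    def tearDown(self):
        del self.stack

    def test_push_then_pop(self):
        self.stack.append(42)
        self.assertEqual(self.stack.pop(), 42)

    def test_starts_empty(self):
        # A fresh instance and a fresh setUp: whatever order unittest
        # chooses, test_push_then_pop's changes cannot leak in here.
        self.assertEqual(self.stack, [])
```

Because each test-case method runs on its own instance, with setUp and tearDown bracketing it, the tests stay independent of the arbitrary order in which unittest runs them.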

unittest provides other facilities, such as grouping test cases into test suites, and other more advanced functionality. You do not need such extras unless you're defining a custom unit-testing framework or, at the very least, structuring complicated testing procedures for equally complicated packages. In almost all cases, the concepts and details covered in this section are sufficient to perform effective and systematic unit testing. Example 18-2 shows how to use unittest to provide unit tests for the module mod.py of Example 18-1. For illustration purposes, this example uses unittest to perform exactly the same tests that Example 18-1 uses as examples in docstrings using doctest.

Example 18-2. Using unittest

""" This module tests function reverseWords provided by module mod.py. """
import unittest
import mod

class ModTest(unittest.TestCase):

    def testNormalCase(self):
        self.assertEqual(mod.reverseWords('four score and seven years'),
            'years seven and score four')

    def testSingleWord(self):
        self.assertEqual(mod.reverseWords('justoneword'), 'justoneword')

    def testEmpty(self):
        self.assertEqual(mod.reverseWords(''), '')

    def testRedundantSpacing(self):
        self.assertEqual(mod.reverseWords('with   redundant   spacing'),
            'spacing redundant with')

    def testUnicode(self):
        self.assertEqual(mod.reverseWords(u'unicode is all right too'),
            u'too right all is unicode')

    def testExactlyOneArgument(self):
        self.assertRaises(TypeError, mod.reverseWords)
        self.assertRaises(TypeError, mod.reverseWords, 'one', 'another')

    def testMustBeString(self):
        self.assertRaises((AttributeError,TypeError), mod.reverseWords, 1)

if __name__ == '__main__':
    unittest.main()

Running this script with python test_mod.py is a bit more verbose than using python mod.py to run doctest, as in Example 18-1. test_mod.py outputs a . (dot) for each test-case method it runs, then a separator line of dashes, and finally a summary line, such as "Ran 7 tests in 0.110s," and a final line of "OK" if every test passed.

Each test-case method makes one or more calls to methods whose names start with assert (or their synonyms whose names start with fail). Here, method testExactlyOneArgument is the only one with two such calls. In more complicated cases, multiple calls to assert methods from a single test-case method are quite common.

Even in a case as simple as this, one minor aspect shows that, for unit testing, unittest is more powerful and flexible than doctest. In method testMustBeString, we pass as the first argument to assertRaises a pair of exception classes, meaning we accept either kind of exception. test_mod.py therefore accepts as valid multiple implementations of mod.py. It accepts the implementation in Example 18-1, which tries calling method split on its argument, and therefore raises AttributeError when called with an argument that is not a string. However, it also accepts a different hypothetical implementation, one that raises TypeError instead when called with an argument of the wrong type. It would be possible to code this testing functionality with doctest, but it would be awkward and nonobvious, while unittest makes it simple and natural.

This kind of flexibility is crucial for real-life unit tests, which to some extent are executable specifications for their modules. You could, pessimistically, view the need for flexibility as indicating that the interface of the code you're testing is not well defined. However, it's best to view the interface as being defined with a useful amount of flexibility for the implementer: under circumstance X (argument of invalid type passed to function reverseWords, in this example), either of two things (raising AttributeError or TypeError) is allowed to happen.

Thus, implementations with either of the two behaviors are correct, and the implementer can choose between them on the basis of such considerations as performance and clarity. By viewing unit tests as executable specifications for their modules (the modern view, and the basis of test-first development) rather than as white-box tests strictly constrained to a specific implementation (as in some traditional taxonomies of testing), the tests become a more vital component of the software development process.

The TestCase class

With unittest, you write test cases by subclassing class TestCase and adding methods, callable without arguments, whose names start with test. Such test-case methods, in turn, call methods that your subclass inherits from TestCase, whose names start with assert (or their synonyms, whose names start with fail), to indicate conditions that must hold for the test to succeed.

Class TestCase also defines two methods that your subclass can optionally override in order to group actions to perform right before and right after each test-case method runs. This doesn't exhaust TestCase's functionality, but you won't need the rest unless you're developing testing frameworks or performing some similarly advanced task. The frequently called methods in a TestCase instance t are the following.

Unit tests dealing with large amounts of data

Unit tests must be fast, since they are run frequently during development. Therefore, it's best to unit-test each aspect of your modules' functionality on small amounts of data when possible. This makes each unit test faster and lets you conveniently embed all needed data in the test's source code. When you test a function that reads from or writes to a file object, in particular, you normally use an instance of class cStringIO.StringIO (covered in "The StringIO and cStringIO Modules" on page 229) to simulate a file object while holding the data in memory; this approach is faster than writing to disk and saves you the bother of having to remove disk files after your tests have run.
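A sketch of this pattern, spelled here with io.StringIO (the Python 3 equivalent of the book's cStringIO); the function under test, sum_numbers, is hypothetical:

```python
import io
import unittest

def sum_numbers(fileobj):
    # the unit under test: reads one number per line from any file-like object
    return sum(float(line) for line in fileobj if line.strip())

class SumNumbersTest(unittest.TestCase):
    def test_small_in_memory_file(self):
        # an in-memory "file": no disk I/O, nothing to clean up afterward
        fake_file = io.StringIO('1.5\n2.5\n\n4\n')
        self.assertEqual(sum_numbers(fake_file), 8.0)
```

Because sum_numbers only requires a file-like object, the same test data could later be swapped for a real file without touching the function itself.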

However, in some rare cases, it may be impossible to fully exercise a module's functionality without supplying and/or comparing data in quantities larger than can be reasonably embedded in a test's source code. In such cases, your unit test will have to rely on auxiliary, external datafiles to hold the data it needs to supply to the module it tests and/or the data it needs to compare to the tested module's output. Even then, you're generally better off reading the data into instances of cStringIO rather than directing the tested module to perform actual disk I/O. Even more important, I strongly suggest that you generally use stubs to test modules meant to interact with other external entities, such as a database, a GUI, or some other program over a network. It's easier for you to control all aspects of the test when using stubs rather than real external entities. Also, to reiterate, the speed at which you can run unit tests is important, and it's invariably faster to perform simulated operations in stubs, rather than real operations.


Debugging

Since Python's development cycle is so fast, the most effective way to debug is often to edit your code so that it outputs relevant information at key points. Python has many ways to let your code explore its own state in order to extract information that may be relevant for debugging. The inspect and traceback modules specifically support such exploration, which is also known as reflection or introspection.

Once you have obtained debugging-relevant information, the print statement is often the simplest way to display it. You can also log debugging information to files. Logging is particularly useful for programs that run unattended for a long time, such as server programs. Displaying debugging information is just like displaying other kinds of information, as covered in Chapters 10 and 17. Logging such information is mostly like writing to files (as covered in Chapter 10) or otherwise persisting information, as covered in Chapter 11; however, to help with the specific task of logging, Python's standard library also supplies a logging module, covered in "The logging module" on page 136. As covered in excepthook on page 168, rebinding attribute excepthook of module sys lets your program log detailed error information just before your program is terminated by a propagating exception.
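A minimal sketch of logging debug information (Python 3 spelling; the logger name 'myapp' and the in-memory stream are illustrative — a long-running server would normally attach a FileHandler instead):

```python
import io
import logging

# route debug output through the logging module; here the handler
# writes to an in-memory stream purely for demonstration
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

logger = logging.getLogger('myapp')
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug('x is now %r', 42)   # emits: DEBUG:myapp:x is now 42
```

Unlike scattered print statements, logging calls can be left in place and silenced in production simply by raising the logger's level.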

Python also offers hooks that enable interactive debugging. Module pdb supplies a simple text-mode interactive debugger. Other interactive debuggers for Python are part of integrated development environments (IDEs), such as IDLE and various commercial offerings. However, I do not cover IDEs in this book.

Before You Debug

Before you embark on possibly lengthy debugging explorations, make sure you have thoroughly checked your Python sources with the tools mentioned in Chapter 3. Such tools can catch only a subset of the bugs in your code, but they're much faster than interactive debugging, and so their use amply repays itself.

Moreover, again before starting a debugging session, make sure that all the code involved is well covered by unit tests, as seen at "Unit Testing and System Testing" on page 452. Once you have found a bug, before you fix it, add to your suite of unit tests (or, if needed, to the suite of system tests) a test or two that would have found the bug if they had been present from the start, and run the tests again to confirm that they now do reveal and isolate the bug; only once that is done should you proceed to fix the bug. By regularly following this procedure, you will soon have a much better suite of tests, learn to write better tests, and gain much sounder assurance about the overall correctness of your code.

Remember, even with all the facilities offered by Python, its standard library, and whatever IDEs you fancy, debugging is still hard. Take this fact into account even before you start designing and coding: write and run plenty of unit tests and keep your design and code simple, so as to reduce to the absolute minimum the amount of debugging you will need! The classic advice in this regard was phrased by Brian Kernighan as follows: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

The inspect Module

The inspect module supplies functions to get information from all kinds of objects, including the Python call stack (which records all function calls currently executing) and source files. The most frequently used functions of module inspect are as follows.

An example of using inspect

Suppose that somewhere in your program you execute a statement such as:

x.f()

and unexpectedly receive an AttributeError informing you that object x has no attribute named f. This means that object x is not as you expected, so you want to determine more about x as a preliminary to ascertaining why x is that way and what you should do about it. Change the statement to:

try: x.f()
except AttributeError:
    import sys, inspect
    print>>sys.stderr, 'x is type %s, (%r)' % (type(x), x)
    print>>sys.stderr, "x's methods are:",
    for n, v in inspect.getmembers(x, callable):
        print>>sys.stderr, n,

This example uses sys.stderr (covered in stdin, stdout, stderr on page 171), since it displays information related to an error, not program results. Function getmembers of module inspect obtains the names of all the methods available on x in order to display them. If you need this kind of diagnostic functionality often, package it up into a separate function, such as:

import sys, inspect
def show_obj_methods(obj, name, show=sys.stderr.write):
    show('%s is type %s (%r)\n' % (name, type(obj), obj))
    show("%s's methods are: " % name)
    for n, v in inspect.getmembers(obj, callable):
        show('%s ' % n)

And then the example becomes just:

try: x.f()
except AttributeError:
    show_obj_methods(x, 'x')

Good program structure and organization are just as necessary in code intended for diagnostic and debugging purposes as they are in code that implements your program's functionality. See also "The __debug__ built-in variable" on page 138 for a good technique to use when defining diagnostic and debugging functions.

The traceback Module

The traceback module lets you extract, format, and output information about tracebacks as normally produced by uncaught exceptions. By default, module traceback reproduces the formatting Python uses for tracebacks. However, module traceback also lets you exert fine-grained control. The module supplies many functions, but in typical use you need only one of them.
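In typical use, the function you need is print_exc, which formats the pending exception exactly as the interpreter would for an uncaught one. Here is a minimal sketch; redirecting the output to a string buffer is purely an assumption made so that the result can be inspected programmatically:

```python
import io
import traceback

buf = io.StringIO()
try:
    1 / 0
except ZeroDivisionError:
    # format the pending exception just as an uncaught one would appear,
    # but send the text to our buffer instead of sys.stderr
    traceback.print_exc(file=buf)

report = buf.getvalue()
```

In real diagnostic code you would normally omit the file= argument, letting the traceback go to standard error.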

The pdb Module

The pdb module exploits the Python interpreter's debugging and tracing hooks to implement a simple command-line-oriented interactive debugger. pdb lets you set breakpoints, single-step on sources, examine stack frames, and so on.

To run some code under pdb's control, import pdb and then call pdb.run, passing as the single argument a string of code to execute. To use pdb for post-mortem debugging (meaning debugging of code that just terminated by propagating an exception at an interactive prompt), call pdb.pm() without arguments. When pdb starts, it first reads text files named .pdbrc in your home directory and in the current directory. Such files can contain any pdb commands, but most often they use the alias command in order to define useful synonyms and abbreviations for other commands.

When pdb is in control, it prompts you with the string '(Pdb) ', and you can enter pdb commands. Command help (which you can also enter in the abbreviated form h) lists all available commands. Call help with an argument (separated by a space) to get help about any specific command. You can abbreviate most commands to the first one or two letters, but you must always enter commands in lowercase: pdb, like Python itself, is case-sensitive. Entering an empty line repeats the previous command. The most frequently used pdb commands are the following.
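Because pdb reads its commands from standard input, you can also drive it with a scripted command stream by instantiating the Pdb class directly (its runcall method comes from the bdb.Bdb base class). This sketch, which assumes nothing beyond the standard library, runs a trivial function under the debugger and lets it proceed to completion:

```python
import io
import pdb

def divide(a, b):
    return a / b

# scripted debugger session: print the arguments, then resume execution
commands = io.StringIO("args\ncontinue\n")
transcript = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=transcript)

# stops at the first line of divide, reads "args" and "continue"
# from the scripted stream, then runs to completion
result = debugger.runcall(divide, 10, 2)
```

The transcript buffer captures the '(Pdb) ' prompts and command output, which can be useful for testing debugging helpers themselves.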

Debugging in IDLE

IDLE, the Interactive DeveLopment Environment that comes with Python, offers debugging functionality similar to that of pdb, although not quite as powerful. Thanks to IDLE's GUI, the functionality is easier to access. For example, instead of having to ask for source lists and stack lists explicitly with such pdb commands as list and where, you just activate one or more of four checkboxes in the Debug Control window to see source, stack, locals, and globals always displayed in the same window at each step.

To start IDLE's interactive debugger, use Debug → Debugger in IDLE's Python Shell window. IDLE opens the Debug Control window, outputs [DEBUG ON] in the shell window, and gives you another >>> prompt in the shell window. Keep using the shell window as you normally would; any command you give at the shell window's prompt now runs under the debugger. To deactivate the debugger, use Debug → Debugger again; IDLE then toggles the debug state, closes the Debug Control window, and outputs [DEBUG OFF] in the shell window. To control the debugger when the debugger is active, use the GUI controls in the Debug Control window. You can toggle the debugger away only when it is not busy actively tracking code; otherwise, IDLE disables the Quit button in the Debug Control window.

The warnings Module

Warnings are messages about errors or anomalies that may not be serious enough to be worth disrupting the program's control flow (as would happen by raising a normal exception). The warnings module affords fine-grained control over which warnings are output and what happens to them. You can conditionally output a warning by calling function warn in module warnings. Other functions in the module let you control how warnings are formatted, set their destinations, and conditionally suppress some warnings (or transform some warnings into exceptions).


Module warnings supplies several exception classes that represent warnings. Class Warning subclasses Exception and is the base class for all warnings. You may define your own warning classes; they must subclass Warning, either directly or via one of its other existing subclasses, which are:

DeprecationWarning
    Uses deprecated features supplied only for backward compatibility

RuntimeWarning
    Uses features whose semantics are error-prone

SyntaxWarning
    Uses features whose syntax is error-prone

UserWarning
    Other user-defined warnings that don't fit any of the above cases


Python supplies no concrete warning objects. A warning is composed of a message (a text string), a category (a subclass of Warning), and two pieces of information that identify where the warning was raised from: module (name of the module that raised the warning) and lineno (line number of the source code line that raised the warning). Conceptually, you may think of these as attributes of a warning object w, and I use attribute notation later for clarity, but no specific warning object w actually exists.


At any time, module warnings keeps a list of active filters for warnings. When you import warnings for the first time in a run, the module examines sys.warnoptions to determine the initial set of filters. You can run Python with option -W to set sys.warnoptions for a given run. Do not rely on the initial set of filters being held specifically in sys.warnoptions, as this is an implementation aspect that may change in future releases of Python.

As each warning w occurs, warnings tests w against each filter until a filter matches. The first matching filter determines what happens to w. Each filter is a tuple of five items. The first item, action, is a string that defines what happens on a match. The other four items, message, category, module, and lineno, control what it means for w to match the filter, and all conditions must be satisfied for a match. Here are the meanings of these items (using attribute notation to indicate conceptual attributes of w):


message
    A regular expression object; the match condition is message.match(w.message) (the match is case-insensitive).

category
    Warning or a subclass of Warning; the match condition is issubclass(w.category, category).

module
    A regular expression object; the match condition is module.match(w.module) (the match is case-sensitive).

lineno
    An int; the match condition is lineno in (0, w.lineno): either lineno is 0, meaning w.lineno does not matter, or w.lineno must exactly equal lineno.

Upon a match, the first field of the filter, the action, determines what happens:

'always'
    w.message is output whether or not w has already occurred.

'default'
    w.message is output if, and only if, this is the first time w occurs from this specific location (i.e., this specific w.module, w.lineno pair).

'error'
    w.category(w.message) is raised as an exception.

'ignore'
    w is ignored.

'module'
    w.message is output if, and only if, this is the first time w occurs from w.module.

'once'
    w.message is output if, and only if, this is the first time w occurs from any location.


Module warnings supplies the following functions.
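For instance, the 'error' action can turn a chosen category of warnings into exceptions, which is often handy in test suites. This sketch uses simplefilter and the catch_warnings context manager, conveniences that appeared in Python versions later than those this book covers:

```python
import warnings

with warnings.catch_warnings():
    # the 'error' action raises matching warnings as exceptions;
    # catch_warnings restores the previous filters on exit
    warnings.simplefilter('error', DeprecationWarning)
    try:
        warnings.warn('old API in use', DeprecationWarning)
        caught = None
    except DeprecationWarning as exc:
        caught = str(exc)
```

After the with block, the filter list is back to what it was, so the 'error' behavior does not leak into the rest of the program.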


"First make it work. Then make it right. Then make it fast." This quotation, often with slight variations, is widely known as "the golden rule of programming." As far as I've been able to ascertain, the quotation is by Kent Beck, who credits his father with it. Being widely known makes the principle no less important, particularly because it's more honored in the breach than in the observance. A negative form, slightly exaggerated for emphasis, is in a quotation by Don Knuth (who credits Hoare with it): "Premature optimization is the root of all evil in programming."

Optimization is premature if your code is not working yet, or if you're not sure about what, exactly, your code should be doing (since then you cannot be sure if it's working). First make it work. Optimization is also premature if your code is working but you are not satisfied with the overall architecture and design. Remedy structural flaws before worrying about optimization: first make it work, then make it right. These first two steps are not optional; working, well-architected code is always a must.

In contrast, you don't always need to make it fast. Benchmarks may show that your code's performance is already acceptable after the first two steps. When performance is not acceptable, profiling often shows that all performance issues are in a small part of the code, perhaps 10 to 20 percent of the code where your program spends 80 or 90 percent of the time. Such performance-crucial regions of your code are known as its bottlenecks, or hot spots. It's a waste of effort to optimize large portions of code that account for, say, 10 percent of your program's running time. Even if you made that part run 10 times as fast (a rare feat), your program's overall runtime would only decrease by 9 percent, a speedup no user would even notice. If optimization is needed, focus your efforts where they'll matter—on bottlenecks. You can optimize bottlenecks while keeping your code 100 percent pure Python, thus not preventing future porting to other Python implementations. In some cases, you can resort to recoding some computational bottlenecks as Python extensions (as covered in Chapter 25), potentially gaining even better performance (possibly at the expense of some potential future portability).
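The 9 percent figure follows from simple arithmetic, essentially Amdahl's law; a quick sketch:

```python
def overall_speedup(fraction, factor):
    # Amdahl's law: overall speedup when 'fraction' of the runtime
    # is accelerated by 'factor' and the rest is left untouched
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# speeding up the 10% of runtime spent in some region by 10x leaves
# 0.90 + 0.01 = 0.91 of the original time: only a 9% overall reduction
new_time = (1.0 - 0.10) + 0.10 / 10
```

The same arithmetic shows why bottlenecks pay off: accelerating a region that accounts for 90 percent of runtime by the same factor cuts total time by 81 percent.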

Developing a Fast-Enough Python Application

Start by designing, coding, and testing your application in Python, using available extension modules if they save you work. This takes much less time than it would with a classic compiled language. Then benchmark the application to find out if the resulting code is fast enough. Often it is, and you're done—congratulations! Ship it!

Since much of Python itself is coded in highly optimized C, as are many of its standard and extension modules, your application may even turn out to be already faster than typical C code. However, if the application is too slow, you need to reexamine your algorithms and data structures. Check for bottlenecks due to application architecture, network traffic, database access, and operating system interactions. For typical applications, each of these factors is more likely than language choice to cause slowdowns. Tinkering with large-scale architectural aspects can often speed up an application dramatically, and Python is an excellent medium for such experimentation.

If your program is still too slow, profile it to find out where the time is going. Applications often exhibit computational bottlenecks: small areas of the source code, often between 10 and 20 percent, which account for 80 percent or more of the running time. Then optimize the bottlenecks, applying the techniques suggested in the rest of this chapter.

If normal Python-level optimizations still leave some outstanding computational bottlenecks, you can recode them as Python extension modules, as covered in Chapter 25. In the end, your application will run at roughly the same speed as if you had coded it all in C, C++, or Fortran—or faster, when large-scale experimentation has let you find a better architecture. Your overall programming productivity with this process is not much less than if you coded everything in Python. Future changes and maintenance are easy, since you use Python to express the overall structure of the program and lower-level, harder-to-maintain languages for only a few specific computational bottlenecks.

As you build applications in a given area according to this process, you accumulate a library of reusable Python extension modules. You therefore become more and more productive at developing other fast-running Python applications in the same field.

Even if external constraints eventually force you to recode the whole application in a lower-level language, you're still better off for having started in Python. Rapid prototyping has long been acknowledged as the best way to get software architecture just right. A working prototype lets you check that you have identified the right problems and taken a good path to their solution. A prototype also affords the kind of large-scale architectural experiments that can make a real difference to performance. Starting your prototype with Python allows a gradual migration to other languages by way of extension modules. The application remains fully functional and testable at each stage. This ensures against the risk of compromising a design's architectural integrity in the coding stage. The resulting software is faster and more robust than if all of the coding had been lower-level from the start, and your productivity, while not quite as good as with a pure Python application, is still better than if you had been coding at a lower level throughout.


Benchmarking

Benchmarking (also known as load testing) is similar to system testing: both activities are much like running the program for production purposes. In both cases, you need to have at least some subset of the program's intended functionality working, and you need to use known, reproducible inputs. For benchmarking, you don't need to capture and check your program's output: since you make it work and make it right before you make it fast, you are already fully confident about your program's correctness by the time you load-test it. You do need inputs that are representative of typical system operations, ideally ones that may be most challenging for your program's performance. If your program performs several kinds of operations, make sure you run some benchmarks for each different kind of operation.

Elapsed time as measured by your wristwatch is probably precise enough to benchmark most programs. Programs with hard real-time constraints are obviously another matter, but they have needs very different from those of normal programs in most respects. A 5 or 10 percent difference in performance, except for programs with very peculiar constraints, makes no practical difference to a program's real-life usability.

When you benchmark "toy" programs or snippets in order to help you choose an algorithm or data structure, you may need more precision: the timeit module of Python's standard library (mentioned in "Module timeit" on page 484) is quite suitable for such tasks. The benchmarking discussed in this section is of a different kind: it is an approximation of real-life program operation for the sole purpose of checking whether the program's performance at each task is acceptable, before embarking on profiling and other optimization activities. For such system benchmarking, a situation that approximates the program's normal operating conditions is best, and high accuracy in timing is not particularly important.

Large-Scale Optimization

The aspects of your program that are most important for performance are large-scale ones: choice of algorithms, overall architecture, choice of data structures.

The performance issues that you must often take into account are those connected with the traditional big-O notation of computer science. Informally, if you call N the input size of an algorithm, big-O notation expresses algorithm performance, for large values of N, as proportional to some function of N (in precise computer science lingo, this should be called big-Theta, but in real life, most programmers call it big-O, perhaps because a Greek uppercase theta looks like an O with a horizontal bar in the middle!).

An O(1) algorithm (also known as a "constant time" algorithm) is one that takes a time that does not grow with N. An O(N) algorithm (also known as a "linear time" algorithm) is one where, for large enough N, handling twice as much data takes about twice as much time, three times as much data three times as much time, and so on, growing proportionally to N. An O(N²) algorithm (also known as a "quadratic time" algorithm) is one where, for large enough N, handling twice as much data takes about four times as much time, three times as much data nine times as much time, and so on, growing proportionally to N squared. Identical concepts and notation are used to describe a program's consumption of memory ("space") rather than of time.

You will find more information on big-O notation, and about algorithms and their complexity, in any good book about algorithms and data structures. Unfortunately, at the time of this writing, there aren't yet any such books that use Python. However, if you are familiar with C, I recommend Mastering Algorithms with C, by Kyle Loudon (O'Reilly).

To understand the practical importance of big-O considerations in your programs, consider two different ways to accept all items from an input iterable and accumulate them into a list in reverse order:

def slow(it):
    result = []
    for item in it: result.insert(0, item)
    return result

def fast(it):
    result = []
    for item in it: result.append(item)
    result.reverse()
    return result

We could express each of these functions more concisely, but the key difference is best appreciated by presenting the functions in elementary terms. Function slow builds the result list by inserting each input item before all previously received ones. Function fast appends each input item after all previously received ones, then reverses the result list at the end. Intuitively, one might think that the final reversing represents extra work, and therefore slow should be faster than fast. But that's not the way things work out.

Each call to result.append takes roughly the same amount of time, independent of how many items are already in list result, since there is always a free slot for an extra item at the end of the list (in pedantic terms, append is amortized O(1), but I don't cover amortization in this book). The for loop in function fast executes N times to receive N items. Since each iteration of the loop takes a constant time, overall loop time is O(N). result.reverse also takes time O(N), as it is directly proportional to the total number of items. Thus, the total running time of fast is O(N). (If you don't understand why a sum of two quantities, each O(N), is also O(N), consider that the sum of any two linear functions of N is also a linear function of N—and "being O(N)" has exactly the same meaning as "consuming an amount of time that is a linear function of N.")

In contrast, each call to result.insert must make space at slot 0 for the new item to insert by moving all items that are already in list result forward one slot. This takes a time proportional to the number of items that are already in the list. The overall amount of time to receive N items is therefore proportional to 1+2+3+...+(N-1), a sum whose value is O(N²). Therefore, the total running time of slow is O(N²).

It's almost always worth replacing an O(N²) solution with an O(N) one, unless you can somehow assign rigorous small limits to the input size N. If N can grow without very strict bounds, the O(N²) solution will inevitably turn out to be disastrously slower than the O(N) one for large enough values of N, no matter what the proportionality constants in each case may be (and no matter what profiling tells you). Unless you have other O(N²) or even worse bottlenecks elsewhere that you cannot eliminate, a part of the program that is O(N²) will inevitably turn into the program's bottleneck and dominate runtime for large enough values of N. Do yourself a favor and watch out for the big O: all other performance issues, in comparison, are almost always insignificant.
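You can verify both the equivalence of the two functions and the big-O gap directly with the timeit module (the exact timings depend on your machine, and the list size below is an arbitrary choice, but the ordering is robust):

```python
import timeit

def slow(it):
    result = []
    for item in it: result.insert(0, item)
    return result

def fast(it):
    result = []
    for item in it: result.append(item)
    result.reverse()
    return result

# both produce the same reversed list
assert slow(range(6)) == fast(range(6)) == [5, 4, 3, 2, 1, 0]

# O(N**2) versus O(N): the gap widens rapidly as N grows
t_slow = timeit.timeit(lambda: slow(range(10000)), number=3)
t_fast = timeit.timeit(lambda: fast(range(10000)), number=3)
```

Doubling the list size roughly quadruples t_slow while only doubling t_fast, exactly as the big-O analysis predicts.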

Incidentally, function fast can be made even faster by expressing it in more idiomatic Python. Just replace the first two lines with the single statement:

result = list(it)

This change does not affect fast's big-O character (fast is still O(N) after the change), but does speed things up by a large constant factor. Often, in Python, the simplest, clearest, most idiomatic way to express something is also the fastest.

Choosing algorithms with good big-O characteristics is roughly the same task in Python as in any other language. You just need a few indications about the big-O performance of Python's elementary building blocks, and I provide them in the following sections.

List operations

Python lists are internally implemented as vectors (also known as dynamic arrays), not as "linked lists." This fundamental implementation choice determines just about all performance characteristics of Python lists, in big-O terms.

Chaining two lists L1 and L2, of length N1 and N2 (i.e., L1+L2), is O(N1+N2). Multiplying a list L of length N by the integer M (i.e., L*M) is O(N*M). Accessing or rebinding any list item is O(1). len() on a list is also O(1). Accessing any slice of length M is O(M). Rebinding a slice of length M with one of identical length is also O(M). Rebinding a slice of length M1 with one of different length M2 is O(M1+M2+N1), where N1 is the number of items after the slice in the target list (in other words, such length-changing slice rebindings are very cheap when they occur at the end of a list, and costly when they occur at the beginning or around the middle of a long list). If you need first-in, first-out (FIFO) operations, a list is probably not the fastest data structure for the purpose: instead, try type collections.deque, covered in "deque" on page 173.

Most list methods, as shown back in Table 4-3, are equivalent to slice rebindings and have the same big-O performance. Methods count, index, remove, and reverse, and operator in, are O(N). Method sort is generally O(N*log(N)), but is highly optimized to be O(N) in some important special cases, like when the list is already sorted, reverse-sorted, or sorted except for a few items. range(a,b,c) is O((b-a)/c). xrange(a,b,c) is O(1), but looping on xrange's result is O((b-a)/c).
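The FIFO point is easy to demonstrate: popping from the front of a list shifts every remaining item one slot, while collections.deque pops from either end in O(1). A quick sketch (timings are machine-dependent, but their ordering is robust):

```python
import timeit
from collections import deque

def fifo_with_list(n):
    q = []
    for i in range(n): q.append(i)
    while q:
        q.pop(0)        # O(N) per pop: shifts all remaining items left

def fifo_with_deque(n):
    q = deque()
    for i in range(n): q.append(i)
    while q:
        q.popleft()     # O(1) per pop

t_list = timeit.timeit(lambda: fifo_with_list(20000), number=1)
t_deque = timeit.timeit(lambda: fifo_with_deque(20000), number=1)
```

The list version is O(N²) overall, the deque version O(N), so the gap grows without bound as the queue lengthens.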

String operations

Most methods on a string of length N (be it plain or Unicode) are O(N). len(astring) is O(1). The fastest way to produce a copy of a string with transliterations and/or removal of specified characters is the string's method translate. The single most practically important big-O consideration involving strings is covered in "Building up a string from pieces" on page 484.

Dictionary operations

Python dictionaries are internally implemented with hash tables. This fundamental implementation choice determines just about all performance characteristics of Python dictionaries, in big-O terms.

Accessing, rebinding, adding, or removing a dictionary item is generally O(1), as are methods has_key, get, setdefault, and popitem, and operator in. d1.update(d2) is O(len(d2)). len(adict) is O(1). Methods keys, items, and values are O(N). Methods iterkeys, iteritems, and itervalues are O(1), but looping on the iterators that those methods return is O(N) (the methods with names that start with iter do save memory compared to their counterparts that return lists, which in turn may make them faster), and looping directly on a dict has the same big-O performance as iterkeys. Never test with if x in d.keys(). That would be O(N), while the equivalent test if x in d: is O(1) (if d.has_key(x): is also O(1), but is slower than if x in d: and has no compensating advantage).

When the keys in a dictionary are instances of classes that define __hash__ and equality comparison methods, dictionary performance is of course affected by those methods. The performance indications presented in this paragraph hold only when hashing and equality comparison are O(1).
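A quick measurement makes the keys() anti-idiom concrete. This sketch uses the modern spelling list(d.keys()) to force the O(N) list scan, since in recent Python versions keys() returns a view for which the in operator is itself O(1):

```python
import timeit

d = dict.fromkeys(range(100000))
absent = -1

# O(1) hash probe per test
t_direct = timeit.timeit(lambda: absent in d, number=200)
# O(N) per test: build a 100,000-item list, then scan it
t_keylist = timeit.timeit(lambda: absent in list(d.keys()), number=200)
```

The direct membership test wins by several orders of magnitude, and the gap grows linearly with the size of the dictionary.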

Set operations

Python sets, like dictionaries, are internally implemented with hash tables. All performance characteristics of sets are, in big-O terms, the same as those of dictionaries.

Adding or removing a set item is generally O(1), as is operator in. len(aset) is O(1). Looping on a set is O(N). When the items in a set are instances of classes that define __hash__ and equality comparison methods, set performance is of course affected by those methods. The performance indications presented in this paragraph hold only when hashing and equality comparison are O(1).

Summary of big-O times for operations on Python built-in types

Let L be any list, T any string (plain or Unicode), D any dict, and S any set, with (say) numbers as items (with O(1) hashing and comparison), and x any number:

O(1)
    len(L), len(T), len(D), len(S), L[i], T[i], D[i], del D[i], if x in D, if x in S, S.add(x), S.remove(x), additions or removals to/from the right end of L

O(N)
    Loops on L, T, D, S, general additions or removals to/from L (not at the right end), all methods on T, if x in L, if x in T, most methods on L, all shallow copies

O(N log N)
    L.sort in general (but O(N) if L is already nearly sorted or reverse-sorted)


Profiling

Most programs have hot spots (i.e., regions of source code that account for most of the time elapsed during a program run). Don't try to guess where your program's hot spots are: a programmer's intuition is notoriously unreliable in this field. Use module profile to collect profile data over one or more runs of your program, with known inputs. Then use module pstats to collate, interpret, and display that profile data. To gain accuracy, you can calibrate the Python profiler for your machine (i.e., determine what overhead profiling incurs on your machine). Module profile can then subtract this overhead from the times it measures so that the profile data you collect is closer to reality. Python 2.5 introduces a new standard library module cProfile with similar functionality to profile; cProfile is preferable, since it's faster and thus imposes less overhead. Yet another profiling module in Python's standard library is hotshot (covered at http://docs.python.org/lib/module-hotshot.html and present since Python 2.2); unfortunately, hotshot is not compatible with threads.

The profile module

The profile module supplies one function you will often use.


To calibrate profile for your machine, you need to use class Profile, which module profile supplies and internally uses in function run. An instance p of Profile supplies one method you use for calibration.

The pstats module

The pstats module supplies a single class, Stats, to analyze, consolidate, and report on the profile data contained in one or more files written by function profile.run.
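A minimal sketch of the profile-then-analyze cycle, using cProfile's Profile class (whose runcall method profiles a single function call) and feeding the resulting data straight to pstats.Stats rather than going through a file:

```python
import cProfile
import io
import pstats

def work():
    return sum(i * i for i in range(1000))

prof = cProfile.Profile()
result = prof.runcall(work)       # profile just this one call

# render the collected data as a sorted report
stream = io.StringIO()
stats = pstats.Stats(prof, stream=stream)
stats.strip_dirs().sort_stats('cumulative').print_stats(10)
report = stream.getvalue()
```

In larger programs you would more typically pass a filename to profile.run (or cProfile.run) and load that file into Stats, possibly consolidating data from several runs.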

Small-Scale Optimization

Fine-tuning of program operations is rarely important. Tuning may make a small but meaningful difference in some particularly hot spot, but hardly ever is it a decisive factor. And yet, fine-tuning, in the pursuit of mostly irrelevant micro-efficiencies, is where a programmer's instincts are likely to lead. It is in good part because of this that most optimization is premature and best avoided. The most that can be said in favor of fine-tuning is that, if one idiom is always speedier than another when the difference is measurable, it's worth getting into the habit of always using the former and not the latter.

Most often, in Python, if you do what comes naturally and choose simplicity and elegance, you end up with code that has good performance as well as clarity and maintainability. In a few cases, an approach that may not be intuitively preferable still does offer performance advantages, as discussed in the rest of this section.

The simplest possible optimization is to run your Python programs using python -O or -OO. -OO makes little direct difference to performance compared to -O, but -OO may save memory, since it removes docstrings from the bytecode, and memory availability is sometimes (indirectly) a performance bottleneck. The optimizer is not very powerful in current releases of Python, but it may still gain you performance advantages on the order of 5 percent, sometimes as large as 10 percent (potentially larger if you make use of assert statements and if __debug__: guards, as suggested in "The assert Statement" on page 138). The best aspect of -O is that it costs nothing, as long as your optimization isn't premature, of course (don't bother using -O on a program you're still developing).

Module timeit

Standard library module timeit is very handy for measuring the precise performance of specific snippets of code. You can use timeit's functionality from within your programs, but the simplest and most normal use is from the command line:

python -mtimeit -s'setup statement(s)' 'statement(s) to be timed'

For example, say you're wondering about the performance of x=x+1 versus x+=1. At some command prompt, you can easily try:

$ python -mtimeit -s'x=0' 'x=x+1'
1000000 loops, best of 3: 0.25 usec per loop
$ python -mtimeit -s'x=0' 'x+=1'
1000000 loops, best of 3: 0.258 usec per loop

and find out that performance is for all intents and purposes identical in both cases.
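The same measurements can be taken from within a program via the module's timeit function, a convenience wrapper around the underlying Timer class (added in Python versions later than the ones this chapter's benchmarks used):

```python
import timeit

# programmatic equivalents of the command-line runs above:
# each executes the statement a million times after running the setup once
t_plus = timeit.timeit(stmt='x = x + 1', setup='x = 0', number=1000000)
t_aug = timeit.timeit(stmt='x += 1', setup='x = 0', number=1000000)
```

Both calls return the total elapsed seconds for the requested number of loops; divide by number to get the per-statement cost the command-line tool reports.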

Building up a string from pieces

The single Python "anti-idiom" that's likeliest to kill your program's performance, to the point that you should never use it, is to build up a large string from pieces by looping on string concatenation statements such as big_string += piece. Python strings are immutable, so each such concatenation means that Python must free the M bytes previously allocated for big_string, and allocate and fill M+K bytes for the new version. Doing this repeatedly in a loop, you end up with roughly O(N²) performance, where N is the total number of characters. More often than not, O(N²) performance where O(N) is available is a performance disaster. On some platforms, things may be even bleaker due to memory fragmentation effects caused by freeing many memory areas of progressively larger sizes.

To achieve O(N) performance, accumulate intermediate pieces in a list rather than build up the string piece by piece. Lists, unlike strings, are mutable, so appending to a list has O(1) performance (amortized). Change each occurrence of big_string += piece into temp_list.append(piece). Then, when you're done accumulating, use the following to build your desired string result in O(N) time:

big_string = ''.join(temp_list)

Using a list comprehension, generator expression, or other direct means (such as a call to map, or use of standard library module itertools) to build temp_list may often offer further optimization over repeated calls to temp_list.append. Other O(N) ways to build up big strings, which some Python programmers find more readable, are to concatenate the pieces to an instance of array.array('c') with the array's extend method, or to write the pieces to an instance of cStringIO.StringIO.

In the special case where you want to output the resulting string, you may gain a further small slice of performance by using writelines on temp_list (never building big_string in memory). When feasible (i.e., when you have the output file object open and available in the loop), it's just as effective to perform a write call for each piece, without any accumulation.
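Side by side, the two idioms look like this. (Note that recent CPython versions sometimes optimize the += form in place when the string has no other references, so measured timings vary; the join idiom is the only one portably guaranteed to be O(N).)

```python
def build_with_concat(pieces):
    big = ''
    for piece in pieces:
        big += piece        # may reallocate and copy the whole string each time
    return big

def build_with_join(pieces):
    parts = []
    for piece in pieces:
        parts.append(piece)
    return ''.join(parts)   # one final O(N) pass

pieces = ['spam'] * 1000
assert build_with_concat(pieces) == build_with_join(pieces)
```

Both functions produce identical results; only their scaling behavior differs.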

Although not nearly as crucial as += on a big string in a loop, another case where removing string concatenation may give a slight performance improvement is when you're concatenating several values in an expression:

oneway = str(x)+' eggs and '+str(y)+' slices of '+k+' ham'
another = '%s eggs and %s slices of %s ham' % (x, y, k)

Using operator % for string formatting is often a good performance choice.

Searching and sorting

Operator in, the most natural tool for searching, is O(1) when the righthand side operand is a set or dictionary, but O(N) when the righthand side operand is a string, list, or tuple. If you need to perform many searches on a container, you're generally much better off using a set or dictionary, rather than a list or tuple, as the container. Python sets and dictionaries are highly optimized for searching and fetching items by key.

Method sort of Python lists is also a highly optimized and sophisticated tool. You can rely on sort's performance. Performance dramatically degrades, however, if you pass sort a custom callable to perform comparisons in order to sort a list based on anything but built-in comparisons. To satisfy such needs, consider using the decorate-sort-undecorate (DSU) idiom instead. In spelled-out form, this idiom has the following steps:


Build an auxiliary list A where each item is a tuple made up of the sort keys, ending with the item of the original list L or with the item's index.


Call A.sort() without arguments.


Extract the items in order from the now-sorted A.

The decorate and undecorate steps are often handily performed with list comprehensions. If you need the sort to be in-place, assign the final sorted list to L[:]. Otherwise, DSU provides a sorted copy, without disturbing the original list L.

For example, say we have in L a large list of strings, each of at least two words, and we want to sort L in-place by the second word of each string:

A = [ (s.split()[1], s) for s in L ]
A.sort()
L[:] = [ t[1] for t in A ]

This is much faster than passing to L.sort a custom function that compares two strings by their second words, as in:

def by_secondword(a, b): return cmp(a.split()[1], b.split()[1])
L.sort(by_secondword)

On a series of benchmarks with Python 2.4 on lists of 10,000 strings, I measured the DSU version as about five times faster than the non-DSU one.

A particularly fast and effective way to use DSU is to pass the named argument key= to sort, a possibility that was introduced in Python 2.4. Module operator supplies functions attrgetter and itemgetter that often come in handy as key= arguments. For the task mentioned above, where the sort key is the second word of each string, the fastest way is:

L.sort(key=lambda s: s.split()[1])

On the same benchmarks, this is another five times faster than the plain DSU version.

Occasionally, you may avoid the need for sorting by using heaps, covered in "The heapq Module" on page 177.
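For example, when you need only the few smallest (or largest) items, heapq can avoid sorting the whole list (a minimal sketch with illustrative data):

```python
import heapq

prices = [17, 3, 9, 41, 5, 28, 12]
# the three smallest items, without sorting the whole list
cheapest = heapq.nsmallest(3, prices)
# the single largest item, likewise
dearest = heapq.nlargest(1, prices)
```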

Avoiding exec and from...import *

Code in a function runs faster than code at the top level in a module because access to a function's local variables is optimized to be very fast. If a function contains an exec statement without explicit dictionaries, however, the whole function slows down. The presence of such an exec statement forces the Python compiler to avoid the modest but important optimizations it normally performs regarding access to local variables, since the exec might cause any alteration at all to the function's namespace. A from statement of the form:

from MyModule import *

wastes performance, too, since it also can alter a function's namespace unpredictably.

exec itself is also quite slow, and even more so if you apply it to a string of source code rather than to a code object. By far the best approach—for performance, for correctness, and for clarity—is to avoid exec altogether. It's most often possible to find better (faster, more robust, and clearer) solutions. If you must use exec, always use it with explicit dictionaries. If you need to exec a dynamically obtained string more than once, compile the string just once and then repeatedly exec the resulting code object.
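A sketch of that advice (the statement string here is illustrative): compile the dynamically obtained string just once, then exec the resulting code object with an explicit dictionary:

```python
stmt = 'result = base * 2'                 # dynamically obtained source (illustrative)
code = compile(stmt, '<dynamic>', 'exec')  # compile just once
namespace = {'base': 21}
# explicit dictionary: the exec cannot alter the enclosing
# function's local namespace, so normal optimizations survive
exec(code, namespace)
value = namespace['result']
```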

eval works on expressions, not on statements; therefore, while still slow, it avoids some of the worst performance impacts of exec. With eval, too, you're best advised to use explicit dictionaries. If you need several evaluations of the same dynamically obtained string, compile the string once and then repeatedly eval the resulting code object.
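The same compile-once pattern works with eval for expressions (again with illustrative names):

```python
expr = 'x*x + y'                        # dynamically obtained expression (illustrative)
code = compile(expr, '<expr>', 'eval')  # compile once
results = []
for x in range(3):
    # explicit dictionaries for globals and locals on each evaluation
    results.append(eval(code, {}, {'x': x, 'y': 10}))
```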

See "Dynamic Execution and the exec Statement" on page 328 for more details and advice about exec, eval, and compile.

Optimizing loops

Most of your program's bottlenecks will be in loops, particularly nested loops, because loop bodies tend to execute repeatedly. Python does not implicitly perform any code hoisting: if some code inside a loop could be computed just once, outside the loop, and that loop is a performance bottleneck, hoist the code out yourself. Sometimes the presence of code to hoist may not be immediately obvious:

def slower(anobject, ahugenumber):
    for i in xrange(ahugenumber): anobject.amethod(i)
def faster(anobject, ahugenumber):
    themethod = anobject.amethod
    for i in xrange(ahugenumber): themethod(i)

In this case, the code that faster hoists out of the loop is the attribute lookup anobject.amethod. slower repeats the lookup every time, while faster performs it just once. The two functions are not 100 percent equivalent: it is (barely) conceivable that executing amethod might cause such changes on anobject that the next lookup for the same named attribute fetches a different method object. This is part of why Python doesn't perform such optimizations itself. In practice, such subtle, obscure, and tricky cases happen quite seldom; you're quite safe performing such optimizations yourself, to squeeze the last drop of performance out of some crucial bottleneck.

Python is faster with local variables than with global ones. If a loop repeatedly accesses a global whose value does not change between iterations, cache the value in a local variable and have the loop access the local instead. This also applies to built-ins:

def slightly_slower(asequence, adict):
    for x in asequence: adict[x] = hex(x)
def slightly_faster(asequence, adict):
    myhex = hex
    for x in asequence: adict[x] = myhex(x)

Here, the speedup is very modest, on the order of five percent or so.

One exception: don't bother caching None. None is treated as a keyword, so access to it is already optimized and needs no further help.

List comprehensions can be faster than loops, and so can map and filter. For optimization purposes, try changing loops into list comprehensions or map and filter calls where feasible. The performance advantage of map and filter is nullified if you have to use a lambda or an extra level of function call. Only when you pass to map or filter a built-in function, or a function you'd have to call anyway even from an explicit loop, do you stand to gain.
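To illustrate (a small sketch; the list() wrapping around map keeps the code portable to Python 3, where map returns an iterator):

```python
nums = [3, -1, 4, -1, 5]

# worthwhile: a built-in function passed directly to map
absolutes = list(map(abs, nums))

# the lambda adds a Python-level call per item, nullifying map's edge;
# the list comprehension expresses the same thing directly
doubled_map = list(map(lambda x: 2 * x, nums))
doubled_comp = [2 * x for x in nums]
```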

The loops that you can replace most naturally with list comprehensions, or map and filter calls, are ones that build up a list by repeatedly calling append on the list. The following example shows this optimization in a micro-performance benchmark script:

import time, operator

def slow(asequence):
    result = []
    for x in asequence: result.append(-x)
    return result

def middling(asequence):
    return map(operator.neg, asequence)

def fast(asequence):
    return [-x for x in asequence]

biggie = xrange(500*1000)
tentimes = [None]*10
def timit(afunc):
    lobi = biggie
    start = time.clock()
    for x in tentimes: afunc(lobi)
    stend = time.clock()
    return "%-10s: %.2f" % (afunc.__name__, stend-start)

for afunc in slow, middling, fast, fast, middling, slow:
    print timit(afunc)

Running this example with Python 2.4 on my laptop shows that fast takes 3.62 seconds, middling 4.71 seconds, and slow 6.91 seconds. In other words, on this machine, slow (the loop of append method calls) is about 47 percent slower than middling (the single map call), and middling, in turn, is about 30 percent slower than fast (the list comprehension). The list comprehension is the most direct way to express the task being micro-benchmarked in this example, so, not surprisingly, it's also fastest—almost two times faster than the loop of append method calls.

Optimizing I/O

If your program does substantial amounts of I/O, it's likely that performance bottlenecks are due to I/O, not to computation. Such programs are said to be I/O-bound, rather than CPU-bound. Your operating system tries to optimize I/O performance, but you can help it in a couple of ways. One such way is to perform your I/O in chunks of a size that is optimal for performance, rather than simply being convenient for your program's operations. Another way is to use threading.

From the point of view of a program's convenience and simplicity, the ideal amount of data to read or write at a time is often small (one character or one line) or very large (an entire file at a time). That's often okay: Python and your operating system work behind the scenes to let your program use convenient logical chunks for I/O, while arranging physical I/O operations with chunk sizes more attuned to performance. Reading and writing a whole file at a time is quite likely to be okay for performance as long as the file is not very large. Specifically, file-at-a-time I/O is fine as long as the file's data fits comfortably in physical memory, leaving ample memory available for your program and operating system to perform whatever other tasks they're performing at the same time. The hard problems of I/O-bound performance tend to come with huge files.

If performance is an issue, don't use a file's readline method, which is limited in the amount of chunking and buffering it can perform. (Using writelines, on the other hand, gives no performance problem when that method is convenient for your program.) When reading a text file, loop directly on the file object to get one line at a time with best performance. If the file isn't too huge, and so can conveniently fit in memory, time two versions of your program—one looping directly on the file object, the other calling readlines to read the whole file into memory. Either may prove faster.
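A sketch of the two versions to time (io.StringIO stands in for an open text file; the function names are illustrative):

```python
from io import StringIO

def by_iteration(afile):
    # loop directly on the file object: one line at a time,
    # with buffering handled behind the scenes
    return [line.rstrip('\n') for line in afile]

def by_readlines(afile):
    # read the whole file into memory at once
    return [line.rstrip('\n') for line in afile.readlines()]

lines1 = by_iteration(StringIO('alpha\nbeta\ngamma\n'))
lines2 = by_readlines(StringIO('alpha\nbeta\ngamma\n'))
```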

For binary files, particularly large binary files whose contents you need just a part of on each run of your program, module mmap (covered in "The mmap Module" on page 360) can often give you both good performance and program simplicity.
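A minimal sketch of mmap for reading just part of a binary file (a small temporary file stands in for a real large one):

```python
import mmap, os, tempfile

# create a small binary file to map (illustrative stand-in)
fd, path = tempfile.mkstemp()
os.write(fd, b'0123456789')
os.close(fd)

with open(path, 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    chunk = mm[3:7]    # slice just the part you need; the OS pages in
                       # only what's actually touched
    mm.close()
os.remove(path)
```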

Making an I/O-bound program multithreaded may sometimes afford substantial performance gains if you can arrange your program's architecture accordingly. Start a few worker threads devoted exclusively to I/O, have the computational threads request I/O operations from the I/O threads via Queue instances, and post the request for each input operation as soon as you know you'll eventually need that data. Performance will increase only if there are other tasks your computational threads can perform while I/O threads are blocked waiting for data. Basically, you get better performance this way if you can manage to overlap computation and waiting for data by having different threads do the computing and the waiting. (See "Threads in Python" on page 341 for detailed coverage of Python threading and a suggested architecture.) On the other hand, if a substantial fraction of your I/O is on the Net, an even faster and definitely more scalable approach is to eschew threads in favor of asynchronous (event-driven) architectures, as covered in "Event-Driven Socket Programs" on page 533.
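A sketch of that architecture in miniature, with a single dedicated I/O worker (the module is named Queue, not queue, in the Python 2 covered by this book; str.upper stands in for a real blocking input operation):

```python
import threading, queue   # module named Queue in Python 2

def io_worker(requests, results):
    # dedicated I/O thread: it alone blocks waiting for data
    while True:
        item = requests.get()
        if item is None:             # sentinel: shut down
            break
        results.put(item.upper())    # stand-in for a blocking read

requests, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=io_worker, args=(requests, results))
worker.start()
# post each input request as soon as you know you'll need the data...
for word in ('spam', 'eggs'):
    requests.put(word)
requests.put(None)
# ...computational threads could do other work here, then collect results
worker.join()
collected = [results.get() for _ in range(2)]
```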

If you enjoyed this excerpt, buy a copy of Python in a Nutshell