Great article, Jeremy! I recently had to jump into the profiler module for a simulator interface I was running. *My* bottleneck was in reading back data from the simulator. I would read a character back, append it to a buffer, and check (with a regex!) to see if the buffer ended with the simulator "ready" prompt. Quick and dirty to write, but when you're getting 1-2k replies from the simulator, that's a ridiculous number of regex searches. I ended up fixing it with buffered reads (using os.read, not file.read) and replacing the regex search with a slice off the end of the buffer and a plain string comparison. End result? 30x improvement in overall runtime.
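For anyone curious, here's a rough sketch of the kind of read loop I mean. It's not my actual code: the prompt `sim> `, the function name `read_until_prompt`, and the 4096-byte chunk size are all made up for illustration, and it assumes the simulator goes quiet right after printing its prompt.

```python
import os

# Hypothetical "ready" prompt; a real simulator will have its own.
PROMPT = b"sim> "


def read_until_prompt(fd, prompt=PROMPT, chunk_size=4096):
    """Read from file descriptor fd until the data ends with prompt (or EOF)."""
    buf = bytearray()
    # Slice off the tail of the buffer and compare it directly: no regex.
    while buf[-len(prompt):] != prompt:
        # os.read returns whatever is available, up to chunk_size bytes.
        chunk = os.read(fd, chunk_size)
        if not chunk:  # empty read means EOF
            break
        buf += chunk
    return bytes(buf)
```

The check runs once per chunk instead of once per character, and each check is a fixed-size comparison rather than a regex search over the buffer.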
One thing I did run into was that my bottleneck was actually in the search function in the re module, which happens to be implemented in C. Just goes to show that your bottleneck need not be in Python code at all, I guess.
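If you want to see this for yourself, something like the sketch below should do it. Again, this is a toy reconstruction and not my original code: `slow_scan`, `profile_report`, and the `sim> ` prompt pattern are invented names. It mimics the character-at-a-time loop and prints the top entries from cProfile sorted by time spent in each function.

```python
import cProfile
import io
import pstats
import re

# Hypothetical prompt pattern, anchored to the end of the buffer.
PROMPT_RE = re.compile(rb"sim> $")


def slow_scan(data):
    """Append one byte at a time and regex-search the buffer after each one."""
    buf = b""
    hits = 0
    for i in range(len(data)):
        buf += data[i:i + 1]
        if PROMPT_RE.search(buf):
            hits += 1
    return hits


def profile_report(data, top=5):
    """Profile slow_scan(data) and return the pstats report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    slow_scan(data)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(b"x" * 2000 + b"sim> "))
```

In the report, the regex search shows up as a built-in method entry rather than as a line in a .py file, which is how the profiler lists functions implemented in C.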