oreilly.comSafari Books Online.Conferences.


O'Reilly Open Source Convention

Perl Program Repair Shop and Red Flags

by Mark-Jason Dominus

Mark-Jason Dominus will be speaking at the O'Reilly Open Source Convention in San Diego, CA, July 23-27, 2001.

In this year's Perl track, I'll be doing a new class, called "Perl Program Repair Shop and Red Flags." In the class, I look at some examples of a beginner's code, and then I go through the code, a bit at a time, explaining what is wrong with it and how to make it better. At the end, the program does the same thing, but is shorter, easier to understand, more robust, and more flexible. Hence "Program Repair Shop."

Most years, I have to slave over new tutorials. But this one almost wrote itself once I found the right examples. I'd start with someone's 71-line Perl program, and go over it line by line. Each time I came to a line that was unnecessary, I'd take it out and write a slide about it. The result? 39 slides and a 32-line program.

That's not an exaggeration. Beginning writers will write ten words where five will do. Beginning programmers are the same. I wasn't expecting to be able to reduce all the example code by 50 percent, but I did.

Most of the reduction isn't even very clever, but that's the whole point of the class: You don't have to be clever to write reasonably good code. The class concentrates on a few bad practices that are easy to recognize, easy to avoid, and which have a big payoff if you do avoid them. I call this the "low-hanging fruit approach."

Red flags

Each bad practice is accompanied by a "red flag" that sticks out of your code. Once you learn to recognize a few red flags, you can stop every time you see one and ask yourself if you are making a mistake. Often you are, and then you have an easy opportunity to improve the code.

The wonderful thing about most red flags is that you can see them. For example,

# typical data: /dev/dsk/c0t0d0s0   31199  13477  16111   46%   /

if (m|(/dev/\w+(/\w+)?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)|) {
        print "$1,$2,$3,$4,$5,$6\n";

This is a common pattern, and it's a red flag -- you can see it waving. Whenever you have lots of \s+ alternating with (\d+) or (\w+), you should stop and ask yourself "Should I be using split() here instead?" About 90 percent of the time, the answer will be "yes":

print join ",", split();

This isn't exactly the same, but the change fixes the bug. We lost the matching condition, but now it's easy to replace it with a meaningful condition, such as:

# typical data: /dev/dsk/c0t0d0s0  31199  13477  16111  46%  /
my @a = split;
if (@a == 6 && $a[0] =~ m{^/dev}) {
  print join ",", @a;

O'Reilly Open Source Convention Featured Speaker

Mark-Jason Dominus will be presenting the tutorial, Perl Program Repair Shop and Red Flags, and other sessions at the O'Reilly Open Source Convention in San Diego, CA, July 23-27, 2001. Rub elbows with Open Source leaders while relaxing on the beautiful Sheraton San Diego Hotel and Marina waterfront. For more information, visit our conference home page or see our pre-conference coverage.

Real examples

The wonderful thing about using real examples in the class is that they are so terrible. I would never have had the presumption to make up anything as bad as the code I show in the class or in this article. I would have said to myself "Oh, nobody will ever believe that" and then I would have toned it down.

Often I would want to make a particular point, and while I would balk at inventing an example that perfectly illustrated my point (because I know that things never work out so conveniently in the real world), it usually wouldn't take long to find 20 examples from Usenet that did perfectly illustrate the point. Real examples display a wretchedness that goes far beyond anything I can imagine. Perhaps this just shows it has been a long time since I did any professional maintenance programming.

Here's an unusually apt example I found recently:

$sth = $dbh->prepare("SELECT * FROM info");
my($one, $two, $three, $four, $five, $six, $seven, $eight,
        $nine, $ten, $eleven, $twelve, $thirteen, $fourteen, 
        $fifteen, $sixteen); 
$sth->bind_columns(undef, \$one, \$two, \$three, \$four,
        \$five, \$six, \$seven, \$eight, \$nine, \$ten, \$eleven,
        \$twelve, \$thirteen, \$fourteen, \$fifteen, \$sixteen); 
while ($sth->fetch) {
  $SqlStatement4 = ("INSERT INTO Customer_Information " .
"(CustomerNumber, Name, Address1, Address2, City, State, ZipCode, 
Email, DefaultPassword) " .
 "VALUES ('$one', '$two', '$three', '$four', '$six', '$seven', 
'$eight', '$nine', '$sixteen') "); 

The big red flag here is the profusion of $one, $two, $three variables. (It seems incredible that anyone could ever have written this, but someone did. Most folks go for $x1, $x2, $x3, $x4, but this guy is unusually "gifted.") The first thing to do here is to replace these nasty little insects with an array. That prevents the use of bind_columns, but that's OK, because with the array, we no longer need bind_columns:

  $sth = $dbh->prepare("SELECT * FROM info");
  while (my @data = $sth->fetchrow_array) {
    $SqlStatement4 = ("INSERT INTO Customer_Information " .
  "(CustomerNumber, Name, Address1, Address2, City, State, 
    ZipCode, Email, DefaultPassword) " .
  "VALUES ('$data[1]', '$data[2]', '$data[3]', '$data[4]', 
    '$data[6]', '$data[7]', '$data[8]', '$data[9]', '$data[16]') "); 

Pages: 1, 2

Next Pagearrow

Sponsored by: