ONLamp.com


Learning to Count on Perl at the Census Bureau

by Ed Stephenson
01/02/2001

It's no secret that Perl is now at work in some of our more stolid institutions, such as the U.S. Census Bureau, but it may surprise a few to learn just how long it's been in use there. Like many government agencies, the bureau leaves the selection of programming tools up to its techies, and more than a few in the back rooms of bureaucracy cut their teeth on Perl and a whole line of open source solutions.

Rachael LaPorte Taylor, for one, has used Perl for nearly eight years. While she's written her share of programs that search and retrieve static lists of demographic data, her two most recent Web applications gather and process time-sensitive information from hundreds, and sometimes thousands, of sources on a daily basis.

For Taylor, the Senior Internet Technology Architect who oversees system administration, content management, and application development for some of the servers at the bureau, there's no question as to preference. "I've been using open source since I got into the business, which was in '92," she says, matter-of-factly. "I'm personally comfortable using these solutions because they work so well--seamlessly--together. Some areas of the government are a little reluctant to use open source, others aren't. I've been lucky. Perl has always been my first choice, and pretty much the first choice in the office where I work."

The two critical Web applications she helped develop earlier this year--which put Perl, Apache, Linux, MySQL, and PHP through their paces--have made quite an impact within government circles. One site, rates.census.gov, was the key part of a promotion for Census 2000 that made it the most successful census in 30 years, resulting in a budget surplus. And the second one, an interagency site named www.fedstats.gov/imf, displays all the country's leading financial and economic indicators at a glance, and updates them twice daily. It's a site so powerful and concise that other subscriber nations to the International Monetary Fund (IMF) have recently requested assistance in setting up similar systems.

Why Perl?

As is the case with others in her office, Taylor's path to system administration was indirect. When she joined the Census Bureau 13 years ago, straight out of college with degrees in marketing and applied statistics, she worked as a Survey Statistician for several years until the bug bit her. "When I got Internet access in the early '90s, I found myself spending more and more of my spare time online," she recalls. "So I decided I might as well get paid for it."

Why did she learn Perl? "Basically that was the language you used if you wanted to have dynamic Web pages on the Internet using CGI," she explains, as if the answer were obvious. "And then for various system administration tasks, that was the tool that you use."

Her primary responsibility at the Census Bureau is to oversee FedStats, the Web's "one-stop shopping for Federal statistics." Through this site, users can view profiles of every state, county, and congressional and judicial district in the U.S., with statistics on agriculture, population, business, crime, energy, and the environment. FedStats doesn't actually maintain the information, but pulls specific stats from the Web servers of 70 government agencies and several independent sources.

Of course, due to the way this data is gathered, much of the information on FedStats tends to be a few years old. That's not the case with the two Web applications Taylor developed this year.

Better than Expected

The Census 2000 Initial and Final Response Rates Web site was part of a nationwide promotional campaign called "How America Knows What America Needs," a.k.a. HAKWAN, to encourage public cooperation in returning census information. Since 1970, when 78 percent of American households participated, the Census Bureau has witnessed a drastic decline in people returning questionnaires. In 1980, the response rate was 75 percent; in 1990, it was only 65 percent. Many predicted a 61 percent response this year.

The purpose of Taylor's Web site was to bolster the Census Bureau's "90 Plus 5" challenge to local governments. "We challenged government entities to meet and exceed their 1990 response rate by 5 percent," she explains. "And they used the Web site to keep track of how well they were doing and then, of course, tied that into their local promotional efforts. Anybody--newspapers, the public, government officials--could look up and see how their county was doing, or their city or town. And it galvanized their community to respond."

Once HAKWAN and the Web site were unveiled last March, the bureau received daily responses to the census questionnaire from 38,000 local governmental entities, representing 60,000 "interim census tracts" for cities and counties across the country. The Decennial processing office generated a daily file with response rates and housing unit counts for aggregated geographic areas. Taylor and her Perl program would then process that file and post the new information on rates.census.gov every evening.
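To make the nightly step concrete, here is a minimal sketch of that kind of processing. It is written in Python rather than the Perl the bureau actually used, and nearly everything specific in it is an assumption: the article does not describe the layout of the daily file, so the comma-separated "area,housing_units,response_rate" format, the function names, and the reading of "plus 5" as five percentage points over the 1990 rate are all hypothetical.

```python
import csv
import io

# Assumption: "plus 5" means five percentage points above the 1990 rate.
PLUS_POINTS = 5.0


def parse_daily_file(text):
    """Parse hypothetical 'area,housing_units,response_rate' CSV text.

    Returns {area: (housing_units, response_rate)}. Rows that are
    malformed or out of range are skipped rather than posted.
    """
    areas = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue
        area, units, rate = (field.strip() for field in row)
        try:
            units, rate = int(units), float(rate)
        except ValueError:
            continue  # non-numeric fields; this also skips a header row
        if area and units >= 0 and 0.0 <= rate <= 100.0:
            areas[area] = (units, rate)
    return areas


def challenge_status(areas, rates_1990):
    """Map each area to (current_rate, target_rate, target_met).

    rates_1990 is a hypothetical {area: 1990 response rate} lookup;
    areas without a 1990 figure are left out.
    """
    status = {}
    for area, (_units, rate) in areas.items():
        if area not in rates_1990:
            continue
        target = rates_1990[area] + PLUS_POINTS
        status[area] = (rate, target, rate >= target)
    return status
```

A nightly job along these lines would read the day's file, call parse_daily_file, and regenerate the public pages from challenge_status; how the real program did this is not described in detail.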

"We built an interface using both Perl and PHP to answer queries from the public and generate pages with the rates," adds Taylor. "While the Census was being conducted, the site averaged 55,000 hits and 44,000 file transfers per day."

The effort paid off better than expected. Using mail-back questionnaires, "Be Counted" forms, and a first-ever census form on the Internet (also written in Perl), 67 percent of American households responded to Census 2000. It was the first time in history that the response rate improved over the prior census--this despite public concern about the "intrusive" questionnaire. As a result, the "Non-Response Follow Up Universe" was reduced, requiring fewer visits by census takers, and the Census Bureau was able to hand a $305 million budget surplus back to Congress.

Tremendous Backend Effort

After months of processing interim response numbers, the bureau posted final response rates for states, communities, and the nation on the census Web site in September. The work of rates.census.gov was completed. But the heat wasn't off Taylor. As Census 2000 moved ahead, she was also busy preparing for the launch of FedStats' IMF (International Monetary Fund) Web page last August.

"This was a major effort by the Census Bureau and other federal agencies," Taylor says of the IMF page. "Federal statistical agencies and some private organizations produce so many statistics that you just don't know where to get them. This page provides you with a central point to get at least the indicators."

Indeed, a quick scroll down the FedStats/IMF page reveals a wealth of information never before available in so concise a format. The International Monetary Fund recently mandated that each subscriber to its Dissemination Standards Bulletin Board (DSBB) establish a Web page containing the current national summary of its economic and financial data. So far, the U.S. is the only country that has established an automated system, and it's easy to see why.

Behind those placid data tables is a tremendous backend effort. Using Perl's LWP and DBI modules, the FedStats/IMF program goes onto the Web twice a day and grabs source files from the Web sites of 14 different statistical agencies. The program then performs some simple edit checks and updates a MySQL database. "That's most of the work," she says. "What you see in your browser is PHP."
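The check-then-update pattern described here can be sketched in a few lines. This is an illustration only, not the FedStats code: it uses Python instead of Perl, the standard-library sqlite3 module stands in for MySQL, and the pipe-delimited "name|value|period" layout is a hypothetical placeholder, since the article does not spell out the Briefing Rooms file format. The function and table names are likewise invented for the example.

```python
import math
import sqlite3


def parse_indicator_lines(text):
    """Apply simple edit checks to hypothetical 'name|value|period' lines.

    Returns a list of (name, value, period) tuples; lines that fail a
    check are dropped instead of reaching the database.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        parts = [part.strip() for part in line.split("|")]
        # Edit check 1: exactly three non-empty fields.
        if len(parts) != 3 or not all(parts):
            continue
        name, raw_value, period = parts
        # Edit check 2: the value must be a finite number.
        try:
            value = float(raw_value)
        except ValueError:
            continue
        if not math.isfinite(value):
            continue
        rows.append((name, value, period))
    return rows


def update_indicators(conn, rows):
    """Insert or overwrite one row per indicator name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indicators ("
        "name TEXT PRIMARY KEY, value REAL, period TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO indicators (name, value, period) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
```

In a scheduled run, the text passed to parse_indicator_lines would come from an HTTP fetch of each agency's source file (the role LWP plays in the Perl program), and the front end would simply read the indicators table.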

Though the source files are already structured in a specific format, defined a few years ago by the White House development team for use on their Briefing Rooms site, they are located on servers running a variety of operating systems. Perl was a natural choice for the backend program because of its portability, and since several agencies fund the project, Taylor notes, cost was also a consideration.

There's Always an Answer

Of course, she didn't really need an excuse to use Perl for either the IMF or the rates.census.gov projects. "I've used other programming languages, but Perl, for my purposes, does the trick," Taylor remarks. "Perl is easy to learn and easy to use, and there's a large support community available. If you do have problems, or run into some sort of snag, you can usually always find an answer. In addition, there's just an abundance of modules out there that already do what you'd like to do."

You could say that Rachael LaPorte Taylor, and many others in the Census Bureau, learned to count on Perl.


  • Learn how large and small companies are putting Perl to work by reading more Perl Success Stories.
  • Visit perl.oreilly.com for a complete list of O'Reilly's Perl books and www.perl.com for the latest news and CPAN updates.



