

NBCi Accesses and Maintains Complex Databases with Perl

by Howard Wen

Wrangling the enormous amount of information stored in NBCi's databases is easy with Perl. Perl was already an important component in the company's operations when the NBCi site was formed through a merger that included the online division of NBC. "When I started out, Perl was heavily used for most database applications, CGI, and wherever else it made sense under time constraints," explains Reed Sandberg, a database engineer in NBCi's infrastructure software division.

NBCi uses Perl because Perl is an open source programming language. Sandberg says this is one of Perl's greatest strengths, particularly when it comes to getting a job done quickly: "Before I write some common utility, I always check CPAN. In many cases, I can just download someone else's work and use it as is."

Most of the applications he writes interface with a database of some kind, and sometimes with several databases, each from a different server vendor (such as Sybase and Oracle). Many of these applications synchronize information (mainly Web site user information and calendar events) between the databases. Additionally, the company's direct marketing data warehouse needs to be in constant sync with its production database, which in turn captures Web site users' input for sign-ups and site activity. Sandberg relies heavily on Perl's DBI module to accomplish these goals.
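To make the synchronization idea concrete, here is a minimal sketch of a one-way sync between two databases through a single, vendor-neutral database interface. It is not NBCi's code. It is written in Python, whose built-in sqlite3 module stands in for Perl's DBI so that it runs without a database server. The users table, its columns, and the helper names make_db and sync_users are all assumptions made for illustration, and INSERT OR REPLACE is SQLite-specific syntax.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "user_id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def sync_users(source, target):
    """Copy rows that are missing or stale in target; return how many were written."""
    rows = source.execute(
        "SELECT user_id, email, updated_at FROM users"
    ).fetchall()
    written = 0
    for user_id, email, updated_at in rows:
        existing = target.execute(
            "SELECT updated_at FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        # ISO-8601 date strings compare correctly as plain text.
        if existing is None or existing[0] < updated_at:
            target.execute(
                "INSERT OR REPLACE INTO users (user_id, email, updated_at) "
                "VALUES (?, ?, ?)",
                (user_id, email, updated_at),
            )
            written += 1
    target.commit()
    return written


# Hypothetical data: a production database and a lagging warehouse copy.
production = make_db([
    (1, "a@example.com", "2001-01-02"),
    (2, "b@example.com", "2001-01-01"),
])
warehouse = make_db([(1, "old@example.com", "2001-01-01")])
sync_users(production, warehouse)
```

Because both connections expose the same interface, the sync logic does not need to know which vendor sits behind each handle, which is the property the article credits to DBI. A real job would also need to handle deletions, conflicts, and batching.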

Part of NBCi's business involves developing highly targeted email campaigns. To pull this off, the company has to access a parallel cluster of several Linux servers, each running Oracle and all "glued" together by Perl. Essentially, Perl is used to make several database servers run in parallel to create one giant, virtual server. Technically, it's more akin to a distributed system. This involves transferring data among the servers by sending signals in a master-slave fashion. Perl was also used to create a CGI Web interface to help administer this system.

Perl also allowed Sandberg to design this entire setup to be scalable so that, as more members sign up, additional Linux servers can be added to the system. "DBI is indispensable for this kind of work," he says. "I can't imagine writing database apps anymore without having regex's at my fingertips."

Another large-scale Perl application that Sandberg maintains allows clients to define a data file through a CGI Web site and submit it. From there, other Perl applications process and load the files into a database on the back-end, according to the client's definitions.

Sandberg is interested in the possibility of using a neural network program written in Perl. It's the one module he'd most like to see developed, since it would be an asset to an application under development at NBCi that will manage the company's database assets by treating them as perishable inventory. "We'd attempt to use neural networks for predicting the signup/unsubscribe rates of our members, in addition to predicting certain aggregate member behaviors," he says.

For developers who are considering using Perl to tackle a database task as complex as those regularly undertaken by the infrastructure software division of NBCi, Sandberg is encouraging. "For extracting and translating data, there's nothing better than using DBI and having those regex's at your fingertips. One popular commercial ETL tool, Informatica, couldn't even get by without having Perl as an extension language," he says. "Except maybe where performance is absolutely critical, DBI can replace PL/SQL and other SQL procedural languages. Perl lets you be 'bad' for those times you need to throw together a temporary tool quickly and don't expect scalability. Perl also lets you be 'good' for those bulletproof apps, which need to scale and last for years."
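The "extract and translate" step Sandberg describes can be sketched with a single regular expression. The example below is illustrative only and is not taken from NBCi. It is written in Python rather than Perl, and the input format (an email address, a pipe, and a month/day/year date) and the function name translate are assumptions.

```python
import re

# Hypothetical raw format: "<email> | <month>/<day>/<year>"
LINE_RE = re.compile(
    r"^\s*(?P<email>[^\s|]+@[^\s|]+)\s*\|\s*"
    r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})\s*$"
)


def translate(line):
    """Return (lowercased email, ISO date) for a valid line, else None."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # Reject malformed input instead of loading it.
    iso_date = "{:04d}-{:02d}-{:02d}".format(
        int(match.group("year")),
        int(match.group("month")),
        int(match.group("day")),
    )
    return (match.group("email").lower(), iso_date)


record = translate(" Jane@Example.com | 3/7/2001 ")
```

The returned tuple is shaped to be passed directly as bind parameters to a database insert, which is where a DBI-style interface would take over.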

Sandberg says Perl is great for handling administrative emergencies: "During development, the target system I mentioned had a runaway process on each machine. Instead of logging in to each machine to find and kill the offending processes, I simply wrote a short Perl script to do the job quickly." He admits, however, that the script he wrote for this "could be considered bad programming because no care was taken to declare variables or to structure the code into sub-routines and such. But Perl allows this kind of reckless behavior. An equivalent program in C would have been impractical for the amount of time spent developing it."

This advantage of Perl can be a disadvantage, too. It's tempting to become complacent about one's programming because of all that Perl can do for you. "Perl may let you fall into the trap of using bad programming styles and practices, especially if you haven't learned good habits with other, perhaps stricter, languages," he says. "This is arguably why Perl is more sophisticated than some of the other programming languages. I consider Perl a 'jazz style' language--you should know the rules before attempting to break them."

But Sandberg says that the best advice he can give when using Perl for any task is to first search on CPAN, or in the many modules that come with Perl, for an application similar to the one you're planning to program. "Chances are you can simply download, install, and take the rest of the day off," he says.
