oreilly.com


Perl is Instrumental in Data Warehouse for Italy's Top Bank

by Lori Houston

UniCredito Italiano, one of the largest banking groups in Italy, recently adopted a large-scale enterprise relationship management (ERM) system. The data migration for that system was written primarily in Perl and relies heavily on the Perl DBI. UniCredito Italiano urgently needed the ERM system to market more effectively to a rapidly growing customer base, which now exceeds five million customers.

UniCredito Italiano (UCI) is a 2,600-branch group of banks with roots tracing back to the Renaissance and the old merchant bankers of Milan. Changes in Europe's monetary market have opened up unprecedented opportunities for UCI, whose network now includes banks in Poland, Bulgaria, Croatia, and Slovakia. UCI's new ERM system lets the bank segment and cross-reference its customer database in far greater detail than before, enabling better-targeted, customer-focused marketing at lower cost.

Closing the sale on such an extensive project was no small feat for the system's developer, iXL, Inc. Perl is one of the main reasons iXL's development team managed to deliver the 300-plus gigabyte system to UCI in six months, according to Bob Doucette, iXL's lead developer on the UCI project. They were able to do this by rapidly developing a pilot system built almost entirely with Perl. "The client is very pleased and has signed on for a series of additional phases over the next several years, which will result in a 3.4 terabyte data warehouse for marketing," Doucette says. "Considering the complexity of business rules and the volume of legacy data, we never could have delivered as robust a system as quickly without using Perl and the DBI."

Developing an ERM system requires collecting as much information as possible about the company's clients and customers. All this data is compiled into a single, integrated database, from which customer information is extracted for marketing promotions and campaigns. In years past, direct marketing relied largely on costly campaigns driven by mailing lists purchased from third parties.

That style of direct marketing, with promotional mailings aimed at an overly broad audience, often produced response rates below one percent, which made it both inefficient and expensive. By contrast, the ERM systems created by iXL use sophisticated data mining and predictive modeling techniques to generate focused, customer-appropriate campaigns at a fraction of the traditional cost and with a much higher success rate.

Perl Crunches the Data

iXL implemented the working prototype for UCI's ERM system on a six-CPU Sun E5500 server with more than 300 gigabytes of storage, running Sun Solaris 2.6 (SunOS 5.6), Oracle 8.0.5, and Perl 5.005_03. "We're a couple of sub-versions behind the current Perl 5.6, but we had already implemented and fully tested the code on a development system. We didn't want to change Perl versions just before moving into production, even though I was confident the Perl code would continue to work," Doucette explains.

During development, his team focused on writing reusable, generic routines with Perl DBI 1.09, a database-independent front-end API for running embedded SQL from Perl, and DBD::Oracle, the back-end driver that connects the DBI to Oracle. A big advantage of the DBI/DBD design is that, because different back-end drivers plug into different databases, much of the code can be reused across projects regardless of the database vendor selected. "The options and the portability in that design are amazing because we don't have to develop as much vendor-specific code," Doucette says. "We used the Perl DBI release from last midsummer, partly because we needed to freeze the code. It will be updated going forward.

To learn more about the Perl DBI, you can read Programming the Perl DBI by Tim Bunce and Alligator Descartes.

"I've actually been keeping close tabs on Tim Bunce (the Perl DBI's main creator) and the DBI. I really wanted to run the DBI through its paces. On the UCI project I finally got the chance to do that, and the results and performance were much better than even I anticipated. Not only was Perl an appropriate choice for this prototype system, but there is also clearly a great deal of headroom for scalability. There is a long way to go before we will even consider porting to another development language on this system."

The initial project contract called for a 200-gigabyte Oracle-based marketing database, but as the project's scope grew during development, the system expanded to more than 300 gigabytes. Doucette and one other developer wrote most of the code. "Instead of writing vendor-specific code in C or C++, we used the Perl DBI and wrote everything as generic embedded SQL in Perl, except for some time-critical key-generation routines. I wrote those in C and then interfaced them to Perl as external subroutines," he says.

The resulting system performed on par with C- or C++-based systems, which Doucette credits in part to the database design. "Some of the design choices we made allowed us to eliminate a lot of expensive lookups, so throughput was very fast. That's a non-Perl issue, but using Perl in the data migration meant less custom code," he explains. "The same project written entirely in C would have required more developers and would likely have taken at least several more months, time we and the client didn't have."

Shortened Development Time, Reusable Code

Most of iXL's large-scale commercial ERM projects involve substantial up-front client investments in hardware and software, as well as substantial time commitments. Because even Fortune 500 companies are understandably reluctant to make such large investments, a full ERM system is often a hard sell as an initial delivery. "These clients want to see a quick return on their investment: a prototype they can work with that demonstrates its usefulness very quickly," Doucette says. "But building one of these systems is complex. Apart from deriving and applying the client's business rules, the systems usually involve terabytes of legacy data, hundreds of gigabytes of disk, and so forth. It's millions of dollars of hardware and software."

To mitigate the risks on both sides, iXL devised the Early Value™ system used in the UCI project's first phase. Instead of custom-developing everything the client could possibly want, Doucette's team let UCI choose from an à la carte menu of more standardized data models, components, and services for building the system's architecture. For the prototype, UCI selected a limited set of options, which enabled iXL to guarantee prompt delivery and assure UCI that the first marketing campaign would be "out the door" in just over six months.

"We wrote fairly generic SQL, targeted towards a fairly standardized data model. This allowed us to get into the bank and quickly demonstrate some value. We showed them what we could do and that we could do it effectively, thoroughly, and quickly," Doucette says. "Because of our choice to use Perl and the DBI, there was a big difference in the quality of the system we delivered to the client in such a short time frame, and, as a direct result, in how the client perceived us." UniCredito Italiano now plans to extend the working prototype in a series of phases over the next several years into a full-function marketing data warehouse of approximately 3.4 terabytes.

iXL can reuse the code developed for the UCI project to quickly produce similar systems for other clients, even with databases from different vendors. "Going forward, for other rapid prototype deliveries, we have a well-tested library of embedded SQL using Perl and DBI that runs against a standardized data model. I'm estimating we'll probably be able to reuse 70 to 75 percent of the existing code without change," Doucette says. "That's a huge savings on our investment of development time and resources, not to mention a tremendous and repeatable savings each time we roll out a new prototype system. This allows us (and this is the key to all of this) to deliver better quality in less time and to meet, if not exceed, our clients' expectations. Greater code reuse leads to better quality and more efficient delivery to our clients. It's a big win for iXL and our clients."

Leveraged Development with Perl and Perl DBI

As a lead software engineer for iXL's Technical Resource Group, Doucette works in part on development methodology and standardization across multiple ERM projects. He estimates that Perl easily cut development time on UCI's initial system by 40 percent. "There were certain elements we originally expected to write in C, but when we saw just how well the DBI worked, we wrote almost everything in embedded SQL with Perl using DBI.

"This is also a huge success story for DBI. Tim Bunce has put together a fantastic, commercial-grade product that allows developers to write much more portable, reusable code," Doucette says. "For years I have been aware of the ability to write very portable Perl code. For example, I was able to port a Perl-based UNIX job scheduler I wrote to a number of different UNIX systems with no more than an FTP. There was no re-coding whatsoever. Using DBI, I fully expect to be able to similarly leverage our embedded SQL code investment on the next Early Value™ system that we deliver."

Data migration is one of Doucette's primary areas of responsibility. He spends most of his time moving large amounts of data from legacy mainframes onto open client-server systems and then reworking that data. This involves cleansing, scrubbing, filtering, and converting it, a process often referred to as ETL (extract, transform, and load). As a C and C++ programmer, Doucette often implements custom code that applies client business rules to convert large volumes of data into relational form. Once a database is built, iXL provides a wide range of specialized marketing services that help its clients derive customer-focused marketing information.
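Mainframe extracts frequently arrive as fixed-length records, and the cleansing and converting step can be illustrated with a minimal Perl sketch that uses the built-in `unpack` function to split such a record into fields and then validate and normalize them. The 42-byte layout, the field names, the `scrub_record` helper, and the cleanup rules below are all hypothetical assumptions for illustration; they are not the layout or business rules used on the UCI project.

```perl
use strict;
use warnings;

# Hypothetical 42-byte legacy record layout (an assumption for illustration):
#   bytes  1-8   customer id       zero-padded digits
#   bytes  9-28  surname           space-padded text
#   bytes 29-32  branch code
#   bytes 33-42  balance in cents  zero-padded digits, possibly blank
my $LAYOUT = 'A8 A20 A4 A10';    # 'A' strips trailing spaces from each field

# Split one fixed-length record into fields, then validate and normalize them.
# Returns a hash ref of cleaned values, or undef for a record that fails validation.
sub scrub_record {
    my ($line) = @_;
    chomp $line;
    my ($id, $surname, $branch, $cents) = unpack $LAYOUT, $line;

    # Reject records whose key is not exactly eight digits.
    return unless defined $id && $id =~ /^\d{8}$/;

    # Collapse internal whitespace and normalize capitalization ("DE  LUCA" -> "De Luca").
    $surname = join ' ', map { ucfirst lc } split ' ', ($surname // '');

    # Treat a blank or non-numeric balance as zero.
    $cents = 0 unless defined $cents && $cents =~ /^\d+$/;

    return {
        id      => $id + 0,
        surname => $surname,
        branch  => uc($branch // ''),
        balance => sprintf('%.2f', $cents / 100),
    };
}
```

In a real migration, `scrub_record` would be called once per line inside a `while (my $line = <$fh>)` loop, skipping any record for which it returns undef and passing the cleaned hash on to the load step. `unpack` with `A` fields is a common Perl idiom for fixed-width data because it slices the record and strips the trailing padding in a single call.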

Evangelizing Perl

Traditionally, getting large volumes of client data into a relational database has involved custom development or expensive, overrated ETL tools that seldom work as advertised. "That's the reason why we like Perl so much," Doucette says. "I use Perl to manipulate, scrub, cleanse, convert, validate, filter (whatever you like to call it) gigabytes of data. I use Perl to rework large volumes of both binary and textual data from variable- and fixed-length records, and then to get that data into whatever form is required. I also have several dozen utilities that I've written in Perl and use regularly for a wide variety of data analysis and reporting. Rarely a day goes by that I don't use Perl for something.

"There is a misconception that Perl is not useful with large data volumes. Generally that's just not true," Doucette counters. "I work with larger data volumes on a regular basis than some programmers work with in their entire careers. Often the performance gained by writing a utility in, for example, C instead of a comparable, well-written Perl utility does not justify the much longer development time.

"Where's the gain if I can process a gigabyte of complex data with a C program that runs in fifteen minutes but takes me a week to code and test? I can often write, in an afternoon, a Perl process that does the very same thing and runs in maybe twice the time, yet I deliver a result to the client much, much sooner. That's the key.

"I'm not saying there aren't time-critical routines that are better written in lower-level languages. But that doesn't mean everything has to be written in the same lower-level language. The UCI project is a great example. There I developed a series of highly optimized, intelligent key-generation routines in C. We had to generate a large number of unique keys on the fly across billions of rows of data, and the speed of key generation was absolutely critical. I took the C routines and interfaced them with Perl as external subroutines. This mixed-language design gave us the best of both worlds: rapid development of the bulk of the code in Perl, and the fastest possible performance from the C-based key routines where they had the most impact on overall system performance."

Doucette was introduced to Perl in a production support role while working on a marketing data warehouse for American Express more than eight years ago. "I was not thrilled when I was forced to learn Perl on short notice, but I quickly realized that Perl allowed me to do quite a bit with a lot less effort than more traditional programming languages. I got my work done and actually got to go home once in a while. At first I used Perl to prototype applications that I would later port to C or C++, or to replace Bourne or Korn shell scripts. In no time, though, I realized that Perl was a serious application programming language in its own right. It's been four or five years since I last rewrote a Perl application in another language."

A few years ago, on a project for a major U.S. retailer, Doucette wrote a full-featured UNIX job scheduling package entirely in Perl in just a few days. That Perl application was used to manage the production operation of an 850GB data warehouse for that client, and it has since been enhanced and ported to four or five other large iXL data warehouse systems, including the prototype system delivered to UCI.

"What can I say? I'm sold," says Doucette. "I regularly recommend that clients allow me to use Perl on these large systems. It's often a struggle because of the relatively low profile of Perl outside of web development. In all but the most time-critical elements of these database systems, the performance with Perl is outstanding and any performance difference at all is more than offset by savings in development time."

Because of his successes with Perl on iXL projects, Doucette now teaches Perl classes for other programmers within his company. He is also a regular speaker at O'Reilly's Perl conferences, and he hopes soon to launch a database of successful Perl implementations, with detailed technical specifications, to advocate wider use of Perl.

"The UniCredito Italiano project is a real success story in the sense that Perl helped us get something done more effectively and more quickly than ever before," says Doucette. "And it helped lead to an ongoing, mutually profitable relationship with a very savvy business client. This means a substantial amount of additional business for both iXL and UCI."

Integrating ERM solutions into a client's business is just one aspect of the full e-business and Internet services iXL offers its clients as one of the world's largest Internet services companies.

