Higher Education Web Dev 2004, Day 2.1

by brian d foy

Related link: http://www.highedweb.org/2004/index.html



9:15 am I'm at the HighEdWebDev 2004 Conference in Rochester, NY this week, but just as a spectator. This is a conference mainly for web people at colleges and universities. One of the presenters is talking about some of the work that Randal Schwartz and I did through Stonehenge Consulting Services.

I've only been here for about a half hour (and I missed the first day, a Sunday), but the hallway conversations have been very interesting. This conference is a mix of "business", content, and technical people, so it's not just the geek's perspective on web stuff.

9:45 am I'm sitting in the tech session, "Technical Propeller Hats Required". Jim Brandt from the University at Buffalo is talking about converting their student services web site from vanilla CGI to mod_perl. He says that in the past couple of years, usage has increased by 25-50% each semester, and their site gets slammed especially hard during the first week of each semester.

Jim reports that upgrading CPUs, disks, and memory only got them limited improvements, and they weren't sure that buying really big iron would be that much of an improvement for the cost. A lot of people have run into this problem, so it's not really new.

He decided to look for other places to improve. Since he was using vanilla CGI (one process per request, a database connection per process), he realized that the real improvement would be dumping CGI.

The problem is that he needed a quick fix and couldn't dump the existing code, which had been built up over several years. Even if he wanted to rewrite everything, he didn't have the time or resources to finish it soon enough. He also had to integrate it with the existing set-up for other things, like the student authentication system.

He checked out mod_perl. He could get an immediate benefit with Apache::Registry and Apache::DBI while keeping all of the CGI code.

They decided to get some Perl training. He calls it "Just a little bit late" training rather than "Just in Time" training. They had already done a lot of homework and tried a lot of things, so they came to the training sessions with a lot of questions about problems they had already run into.

10:00 am Jim just canvassed the room. I counted about 30 people in the room, and most of them say they are using Apache. Jim says that open source software has come a long way in usability and acceptance, but bringing in experts helps to mitigate management fears about its use.

Once they turned on mod_perl, they looked at their server logs to figure out who was using what when, and identified the top ten most used scripts. They concentrated their conversion efforts on those scripts, which Jim says was a big win with management: they didn't have to convert everything before they could get the benefit.
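Finding the most-requested scripts from an access log takes only a few lines. This is a rough sketch in Python rather than whatever Jim's team actually ran. It assumes Common Log Format lines, and the script paths in `sample` are made up.

```python
import re
from collections import Counter

# Matches the quoted request in a Common Log Format line,
# e.g. "GET /cgi-bin/x.cgi?a=1 HTTP/1.1", and captures the URL.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

def top_scripts(log_lines, n=10):
    """Count requests per path (query string ignored); return the n busiest."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts.most_common(n)

# Hypothetical log lines, for illustration only.
sample = [
    '10.0.0.1 - - [11/Oct/2004:09:15:00 -0400] "GET /cgi-bin/schedule.cgi?term=fall HTTP/1.1" 200 5120',
    '10.0.0.2 - - [11/Oct/2004:09:15:01 -0400] "GET /cgi-bin/grades.cgi HTTP/1.1" 200 2048',
    '10.0.0.3 - - [11/Oct/2004:09:15:02 -0400] "POST /cgi-bin/schedule.cgi HTTP/1.1" 200 5120',
    '10.0.0.1 - - [11/Oct/2004:09:15:03 -0400] "GET /cgi-bin/schedule.cgi?term=spring HTTP/1.1" 200 5120',
    '10.0.0.4 - - [11/Oct/2004:09:15:04 -0400] "GET /cgi-bin/grades.cgi?id=7 HTTP/1.1" 200 2048',
    '10.0.0.5 - - [11/Oct/2004:09:15:05 -0400] "GET /index.html HTTP/1.1" 200 900',
]

busiest = top_scripts(sample, n=2)
```

Stripping the query string matters here: without it, every distinct set of parameters would count as a separate script and the real hot spots would be spread across many small entries.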

10:15 am There aren't too many laptops out in this room, which seems odd to me only because I tend to be at Perl conferences where the attendees like to IRC with the person sitting next to them. I can see a 12-inch iBook-er reading Slashdot, though.

Now Jim is talking about reverse proxying in Apache. They separated the servers so the one doing the heavy lifting (database and CGI stuff) didn't also have to handle all of the other content. To his surprise, Jim found that some of their users were on really slow connections (instead of the broadband they assumed all on-campus students have), so their heavy-lifting processes basically finished their heavy lifting but then stayed tied up trickling bits to the client. Once they gave that work to the front end of the reverse proxy, they got a big speed-up. Since the back-end server no longer has to talk to the client, it has a lot more time to do its real work.
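A back-of-the-envelope model shows why the slow clients hurt so much. Every number below is an assumption picked for illustration, not a figure from the talk, and the model assumes the front end buffers the whole response so the back end only has to send it over a fast local link.

```python
def backend_busy_seconds(generate_s, response_bytes, link_bytes_per_s):
    """Time one back-end process is occupied: building the page plus sending it."""
    return generate_s + response_bytes / link_bytes_per_s

# Assumed numbers, for illustration only.
GENERATE_S = 0.2             # seconds to build the page
RESPONSE_BYTES = 50_000      # size of the finished page
SLOW_CLIENT = 5_000          # bytes/s, roughly a dial-up link
PROXY_LINK = 10_000_000      # bytes/s, front end to back end on the local network

# Back end talks to the slow client directly.
direct = backend_busy_seconds(GENERATE_S, RESPONSE_BYTES, SLOW_CLIENT)

# Back end hands the page to the front-end proxy, which trickles it out.
proxied = backend_busy_seconds(GENERATE_S, RESPONSE_BYTES, PROXY_LINK)
```

With these made-up figures the heavy process is occupied for about 10.2 seconds when it serves the slow client itself and about 0.2 seconds when it hands the page to the proxy, so the same number of back-end processes can serve far more requests.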

10:30 am Jim is talking about the hardware set-up. They started using SSL cards to take that load off of the CPU. With 25,000 people trying to hit a server in a couple hours, SSL key generation was a significant performance limiter.

They also went with a server farm. Instead of a couple of big machines, they went with a larger number of smaller machines managed by a separate load-balancer. At first the load-balancer ran slower, but only because they hadn't turned on sticky IPs: users had to keep renegotiating things because they were getting different back-end machines. When they fixed that, they got the faster results they expected.
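The difference between sticky and non-sticky balancing can be sketched in a few lines of Python. This is only an illustration of the idea: the back-end names are hypothetical, and hashing the client IP is one assumed way to get stickiness (a real load-balancer may keep a session table instead).

```python
import hashlib
from itertools import cycle

BACKENDS = ["app1", "app2", "app3"]  # hypothetical back-end machines

def sticky_backend(client_ip, backends=BACKENDS):
    """Map a client IP to the same back end every time by hashing it."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

_rotation = cycle(BACKENDS)

def round_robin_backend(client_ip):
    """Non-sticky: ignore the client and hand out back ends in turn."""
    return next(_rotation)
```

With `round_robin_backend`, two consecutive requests from the same address land on different machines, so any per-machine state (such as a negotiated SSL session) has to be set up again. With `sticky_backend`, the same address keeps landing on the same machine for as long as the list of back ends stays the same.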

10:40 am They still use CGI for some things, even though they have this big mod_perl set-up. Jim is talking about something I remember from Joel Spolsky: there are different types of software development, and each has a different economic scenario. In this case, they didn't spend a lot of time creating fancy technology for a script that 200 people on campus might use five times a year. Interns can easily create CGI scripts and take care of those users.

10:50 am Jim finished his talk and is taking questions. A lot of people seem to be locked into certain technologies, either by initial choice (big code base), management fiat ("We will use Sun"), or a design decision ("We had to do this because J2EE needed it").