The PHP Scalability Myth
Subject:   Scalability
Date:   2003-10-17 10:29:40
From:   anonymous2
The first incorrect premise of this article was that performance is related to scalability. As an architectural choice, it is inevitable that by choosing scalability, you will be sacrificing some performance. There is a tradeoff between performance and scalability in the same way that there is a tradeoff between speed and memory. It is one thing to create a nice web application that can handle 10,000 concurrent users. It's a completely different thing to build one that can handle 5,000,000 concurrent users. If you are expecting 5,000,000 concurrent active sessions, you need to write a lot of code to handle them properly, and all of that handling slows down each individual transaction.

The defining point of scalability is eliminating bottlenecks in the architecture that generally appear only when overall throughput is massive. If each session requires 32K in core, 10,000 sessions are easy. At 5,000,000, a single computer is probably going to thrash itself into a crash.
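To put numbers on the 32K-per-session example, here is a back-of-the-envelope sketch. It assumes "32K" means 32 × 1024 bytes and that every session's state must stay resident in memory at once; the function names are purely illustrative.

```python
# Back-of-the-envelope memory footprint for per-session state.
# Assumption: "32K" = 32 * 1024 bytes, and all sessions stay in core at once.

SESSION_BYTES = 32 * 1024  # in-core state per session (assumed binary K)


def session_memory_bytes(sessions: int, per_session: int = SESSION_BYTES) -> int:
    """Total bytes needed to keep every session resident in memory."""
    return sessions * per_session


def to_gib(n_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    for users in (10_000, 5_000_000):
        total = session_memory_bytes(users)
        print(f"{users:>9,} sessions -> {to_gib(total):8.1f} GiB")
```

Under these assumptions, 10,000 sessions need about 0.3 GiB, while 5,000,000 sessions need about 152.6 GiB, which is why a single machine ends up swapping long before it runs out of CPU.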

A great case study for understanding scalability is comparing the Teradata RDBMS to Oracle. Teradata is implemented in hardware and is slow as molasses, but it's virtually impossible to issue enough queries to significantly change the throughput of all of the queries. Oracle, on the other hand, spreads its databases over more and more networked machines, and it eventually reaches a point where adding a new machine won't significantly boost the overall throughput of the system. There is clearly a limit to the number of queries that Oracle can handle. The limit for Teradata is likely at least an order of magnitude above the Oracle limit.
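The "adding a machine stops helping" effect can be illustrated with Amdahl's law, used here only as a generic stand-in model, not as a description of how Oracle actually behaves. If a fraction p of the work can be spread across N machines and the rest is serialized, the speedup is 1 / ((1 - p) + p / N). The value p = 0.95 below is a hypothetical choice.

```python
def amdahl_speedup(machines: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup on `machines` nodes when only
    `parallel_fraction` of the work can be spread across them."""
    if machines < 1:
        raise ValueError("machines must be >= 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / machines)


def marginal_gain(machines: int, parallel_fraction: float) -> float:
    """Extra speedup obtained by adding one more machine."""
    return (amdahl_speedup(machines + 1, parallel_fraction)
            - amdahl_speedup(machines, parallel_fraction))


if __name__ == "__main__":
    p = 0.95  # hypothetical parallel fraction
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} machines: speedup {amdahl_speedup(n, p):6.2f}, "
              f"next machine adds {marginal_gain(n, p):.4f}")
```

With p = 0.95 the speedup can never exceed 1 / (1 - p) = 20, no matter how many machines are added; each extra machine contributes less than the one before it.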

The second incorrect premise of the article is that scalability matters. Most existing systems have been so crippled by poor algorithms (N^2 or worse) and bad code that it really doesn't matter whether the underlying architecture could scale. As well, most businesses are so effectively limited by their own business models that the technological issues really don't matter. If a company is running a system that handles 10,000 users now, then it's more than likely that usage will grow predictably and slowly enough that someone can rewrite the system to support 50,000 people. If they don't, the existing technology will essentially throttle the users and keep the number at around 10,000. This happens because when there are too many users, they get pissed off at the performance and go elsewhere.
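To see why an N^2 algorithm swamps any architectural gains, compare how the work grows when users go from 10,000 to 50,000. The all-pairs comparison below is a hypothetical example of a quadratic algorithm, chosen only for illustration.

```python
from typing import Callable


def linear_operations(n: int) -> int:
    """Work for an algorithm that touches each user once."""
    return n


def pairwise_operations(n: int) -> int:
    """Work for a naive all-pairs pass over n users: n*(n-1)/2 comparisons."""
    return n * (n - 1) // 2


def growth_factor(old_n: int, new_n: int, cost: Callable[[int], int]) -> float:
    """How many times more work `cost` does at new_n than at old_n."""
    return cost(new_n) / cost(old_n)


if __name__ == "__main__":
    for name, cost in (("linear", linear_operations),
                       ("quadratic", pairwise_operations)):
        factor = growth_factor(10_000, 50_000, cost)
        print(f"{name:>9}: 5x the users -> {factor:.1f}x the work")
```

Five times the users means five times the work for the linear case but roughly 25 times the work for the quadratic one, so a system that is fine at 10,000 users can collapse well before 50,000 regardless of how scalable its platform is.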

In truth, I've seen no evidence that the J2EE architecture could in fact handle some massive number like 5,000,000 reasonably sized queries concurrently. While I'm not entirely sure that this is necessary, I know that when Sun claims Java is scalable, it is really just using "scalable" as a buzzword for "we can run this architecture for 50,000 concurrent users, when the other guys can only get to 20,000". It is clear that when they were developing the specifications for the EJB architecture, they were explicitly trying to address scalability issues. If, at the end of the day, PHP, which didn't try to address these issues, can be considered an alternative to the EJB design, that doesn't prove that PHP is scalable. What it really implies is that Java isn't!
