PHP Scales

by Chris Shiflett

Related link: http://shiflett.org/archive/46



There has been a lot of discussion lately about scalability, brought about by Friendster's move to PHP. Once again, I am amazed at how many people don't understand what scalability means (even though I'm glad to see fewer and fewer people misspelling it). Scalability means "How well a solution to some problem will work when the size of the problem increases" (from Dictionary.com). This is interpreted in drastically different ways, and you can find my interpretation in What Is Scalability?.


Before I continue, let's look at some of the clueful comments from Joyce Park's blog entry:


Rasmus Lerdorf writes:

Scalability is gained by using a shared-nothing architecture where you can scale horizontally infinitely. A typical Java application will make use of the fact that it is running under a JVM in which you can store session and state data very easily and you can effectively write a web application very much the same way you would write a desktop application. This is very convenient, but it doesn't scale. To scale this you then have to add other mechanisms to do intra-JVM message passing which adds another level of complexity and performance issues. There are of course ways to avoid this, but the typical first Java implementation of something will fall into this trap.



PHP has no scalability issues of this nature. Each request is completely sandboxed from every other request and there is nothing in the language that leads people towards writing applications that don't scale.


Harry Fuecks writes (in response to someone citing performance benchmarks to support a scalability argument):

But performance != scalability.


Joyce Park writes (in response to someone suggesting that Friendster's Java developers must have been sub-par):

1) We had not one but TWO guys here who had written bestselling JSP books. Not that this necessarily means they're great Java devs, but I actually think our guys were as good as any team.



2) We tried rewriting the site in Java twice, using MVC and all available best practices. It actually got slower. Anyway, what does MVC have to do with speed or scalability? I thought it was a design cleanliness and maintainability thing.



3) We tried different app servers, different JVMs, different machines.



4) Anything that money could do, it did.


There has been a lot of discussion elsewhere, too. Harry Fuecks explains that The J2EE guy still doesn't get PHP and discusses Why PHP Scales. Harry understands what scalability means and takes the time to try to explain it to everyone else. If you have read The PHP Scalability Myth or think that scalability is a measure of performance (or both), please take the time to read what Harry has written.


Jeff Moore, in The PHP scalability saga continues, writes:

I think I'll end this post with heresy. The field of web development seems to have a mental model of application development forged from the dot-com boom era. We operate with the vision that our applications are going to experience exponential usage growth. Perhaps this leads to an unhealthy focus on scalability in web applications versus other requirements. Perhaps this also leads us to employ optimizations prematurely before we can even understand their impact or even have a need for them. Perhaps these premature optimizations even hurt scalability and performance and needlessly complicate our applications.



Perhaps the Java Culture is more infected with "dot-com-itis" than the php culture?


George Schlossnagle explains Why PHP Scales - A Cranky, Snarky Answer, in which he writes:

Technical details aside, I think PHP can be made to scale because so many people think that it can't. This skepticism means that people buy into the fact that it takes hard work and intelligent design to make a PHP-based system work right. 'Intelligent design' doesn't mean adhering to MVC or design patterns, writing OO code or assembler. It means looking at your system as a whole, figuring out what it needs to do, and then devising a plan for doing that as cheaply as possible. The critical bit, of course, is that you need to put that sort of work into any large architecture; PHP doesn't magically scale 'naturally', but neither will planting a Java Bean in your backyard create a magic scalable beanstalk.


His entire "answer" is very informative, even if most of it is obvious. Sometimes what people need is for someone to stand up and state the obvious, and I think now is such a time.


Of course, there are plenty of people who aren't as clueful as Harry and George. Unfortunately, it's difficult to know who to listen to. John Lim says "High Performance, High Scalability PHP is a Lie". I assume that he just wanted a nice headline, but his statement couldn't be further from the truth.


Last October, I briefly answered the question What Is Scalability?. Perhaps my use of Big O notation wasn't the best approach, since most people who truly understand my point likely already know what scalability means. A simpler explanation might be better. In fact, we need to eliminate computers from the explanation altogether, because that alone seems to confuse people.


Compare a truck and a tractor (hypothetically). To simplify our comparison, let's assume that both have the exact same towing capacity (this might be unrealistic, but such is the beauty of hypothetical situations). With no load, the truck has a maximum speed of 125 mph (about 200 kph), and the tractor has a maximum speed of 15 mph (about 25 kph). With a load equivalent to their maximum towing capacity, the truck has a maximum speed of 45 mph (about 70 kph), and the tractor has a maximum speed of 10 mph (about 15 kph). Which scales better? If you think the truck does, you're wrong. Although the truck is faster in all cases (loaded, it is even faster than the tractor with no load), it slows down the most under load, proportionately.


If you're only concerned with speed, you should choose a Ferrari Modena rather than decide between the truck and the tractor. If you're only concerned with scalability (which is highly unlikely), you should choose the tractor. If you're concerned with the best combination of speed and scalability, the truck is a good choice.
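The arithmetic behind the truck and tractor comparison can be sketched in a few lines of Python. This is only an illustration: the speeds are the hypothetical figures given above, and the function name retained_fraction is my own label for "how much of the unloaded speed survives at full load", which is one simple way to put a number on how each vehicle scales.

```python
# Top speeds in mph, taken from the hypothetical truck/tractor comparison.
vehicles = {
    "truck": {"unloaded": 125, "loaded": 45},
    "tractor": {"unloaded": 15, "loaded": 10},
}


def retained_fraction(unloaded, loaded):
    """Fraction of the unloaded top speed that survives at full load."""
    return loaded / unloaded


for name, speeds in vehicles.items():
    kept = retained_fraction(speeds["unloaded"], speeds["loaded"])
    # Raw speed is performance; the retained fraction is about scalability.
    print(f"{name}: {speeds['loaded']} mph loaded, keeps {kept:.0%} of its speed")
```

The truck keeps 36% of its top speed under load and the tractor keeps 67%, so the tractor scales better even though the truck is faster at every point.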


So how does scalability apply to the Web? First, you should ask yourself whether the Web's fundamental architecture is scalable. The answer is yes. Some people will describe HTTP's statelessness in a derogatory manner. The more enlightened people, however, understand that this is one of the key characteristics that make HTTP such a scalable protocol. What makes it scalable? With every HTTP transaction being completely independent, the amount of resources necessary grows linearly with the number of requests received. In a system that does not scale (where "does not scale" means that it scales poorly), the amount of resources necessary would increase at a higher rate than the number of requests. While HTTP has its flaws (the proper spelling of referrer being one), there's no arguing that it scales, and this is one of the things that made the Web's early explosive growth less painful than it would have otherwise been.
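To make the difference between linear and worse-than-linear growth concrete, here is a rough Python sketch. Both cost functions are assumptions invented for illustration: linear_cost models fully independent requests, and coupled_cost models a hypothetical system in which every request also pays a small coordination cost for each other request. Neither is a measurement of any real server.

```python
def linear_cost(requests, cost_per_request=1.0):
    """Independent requests: total work grows in step with the request count."""
    return requests * cost_per_request


def coupled_cost(requests, cost_per_request=1.0, coordination_cost=0.001):
    """Hypothetical coupled system: each request also pays a small
    coordination cost for every other request, so total work grows
    faster than the request count."""
    return (requests * cost_per_request
            + coordination_cost * requests * (requests - 1))


for n in (100, 1_000, 10_000):
    # Work per request stays flat in the first column and climbs in the second.
    print(n, linear_cost(n) / n, coupled_cost(n) / n)
```

In the independent case, doubling the requests doubles the work. In the coupled case, doubling the requests more than doubles the work, and that is what "scales poorly" means here.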


The present discussion is about developing Web applications that scale well, and whether particular languages, technologies, and platforms are more appropriate than others. My opinion is that some things scale more naturally than others, and Rasmus's explanation above touches on this. PHP, when compiled as an Apache module (mod_php), fits nicely into the basic Web paradigm. In fact, it might be easier to imagine PHP as a new skill that Apache can learn. HTTP requests are still handled by Apache, and unless your programming logic specifically requires interaction with another source (database, filesystem, network), your application will scale as well as Apache (with a decrease in performance based upon the complexity of your programming logic). This is why PHP naturally scales. The caveat I mention is why your PHP application may not scale.
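The shared-nothing idea can be sketched with a toy simulation. It is written in Python rather than PHP, and every name in it (StickyWorker, SharedNothingWorker, shared_store, serve) is hypothetical. It is not how Apache or mod_php is implemented; it only shows why state kept inside one worker breaks as soon as a second worker is added, while state kept outside the workers does not.

```python
# Stand-in for an external store (database, filesystem, network service).
shared_store = {}


class StickyWorker:
    """Keeps session data in its own memory, like state held inside one process."""

    def __init__(self):
        self.sessions = {}

    def handle(self, session_id):
        count = self.sessions.get(session_id, 0) + 1
        self.sessions[session_id] = count
        return count


class SharedNothingWorker:
    """Holds nothing between requests; all state lives in the external store."""

    def handle(self, session_id):
        count = shared_store.get(session_id, 0) + 1
        shared_store[session_id] = count
        return count


def serve(workers, session_id, n_requests):
    """Send requests round-robin across the workers and return the last
    page-view count reported for the session."""
    result = 0
    for i in range(n_requests):
        result = workers[i % len(workers)].handle(session_id)
    return result


# Four requests from the same visitor, spread over two workers.
print(serve([StickyWorker(), StickyWorker()], "alice", 4))                # 2
print(serve([SharedNothingWorker(), SharedNothingWorker()], "alice", 4))  # 4
```

The sticky workers each see only half of the visitor's requests, so adding a machine changes the answer. The shared-nothing workers are interchangeable, so you can add as many as you like. The cost is that the external store now has to scale too, which is the caveat mentioned above.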


A common (and somewhat trite) argument being tossed around is that scalability has nothing to do with the programming language. While it is true that language syntax is irrelevant, the environments in which languages typically operate can vary drastically, and this makes a big difference. PHP is quite different from ColdFusion or JSP. In terms of scalability, PHP has an advantage, but it loses a few features that some developers miss (which is why there are efforts to create application servers for PHP). The PHP versus JSP argument should focus on environment; otherwise, the point gets lost.


I actually disagree with George's statement, "PHP doesn't magically scale 'naturally'". Of course, I understand and agree with the spirit of what he's trying to say, which is that using PHP isn't going to make your applications magically scale well, but I do believe that PHP has a natural advantage, as I just described. Rasmus seems to agree with me, and George might also agree, despite his statement.


I think PHP scales well because Apache scales well because the Web scales well. PHP doesn't try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.



How do you define scalability?


5 Comments

James_Elliott
2004-07-02 18:51:42
Another source of confusion
Different groups of people also mean different things by scaling (or use the same word in different contexts). When I hear the expression "PHP doesn't scale", that doesn't mean (to me, without any other contextual information) anything about number of users or requests; it means that when you grow to a system with hundreds or thousands of components, the source code base becomes hard to understand, work with, evolve and maintain. In other words, my default realm of scaling is in terms of programmer productivity rather than site performance.
bill123456
2004-07-03 09:03:49
Scalability has little to do with the language
In my experience, writing massively scalable webapps is more about smart db usage, aggressive data caching (this one is key) and careful resource usage (i.e., reading/writing to files).
tonywilliams
2004-07-03 19:08:49
Another source of confusion
So another myth surfaces.


The language used has nothing to do with the problems of large scale projects and there is little a language designer can do to fix it. If Brooks, in The Mythical Man-Month, tells us anything he tells us this. He showed that the size of the problem, not the language, affected programmer productivity if we measure productivity as 'lines written'.


Therefore a language that allows us to write code in the fewest possible lines for our application will give us the best productivity. If you want to scale programmer performance well then you will do better thinking about how you construct program teams than arguing about language. I am amazed how many large development shops have still failed to adopt any sort of 'surgical team' model suggested by Brooks.


As for the problems of maintaining large code bases this has much more to do with design methods, development models, component interface design and coding standards than the language. There are huge systems written in C and assembler with easily maintained code bases as the above has all been addressed early. At the same time I have seen two thousand line Objective-C and Java code bases that are an unmaintainable mess, and much smaller Perl programs where not even the original programmer can maintain it. Once again, language has little effect.


The software industry has to admit, and sometimes does, a few home truths. Programmer productivity scales badly under all conditions. Code maintainability scales poorly unless well supported. Language choice has almost no effect on either.


Tony Williams

lampside
2004-07-07 20:27:45
j2ee and php
I have seen numerous arguments on how OO is the best way to go about any problem. If you look at the number of lines in a PHP web page and a similar Java/JSP page, you will be comparing scripting with OO. Clearly Java is not a scripting language, and developers will not be productive writing OO code to generate HTML. Java programming was meant for middleware and large projects, similar to what C++ was for. How did we get into this mess where we write entire applications in Java and spend many hours coding deployment descriptors (XML files)? If you compare programmer productivity, the PHP programmer comes out way ahead. For scalability, may I suggest you visit www.smirkingchimp.com, written using PHP Nuke. The Java folks are just about getting into portals...


I find it infinitely funny when I see a lot of java sites that use the popular phpBB forum application to host discussions (www.myeclipseide.com)


PhpNuke itself churns out a new release every 4-5 months, and it's just a few folks.


I like Java as an OO language, but if we had a small project to do under a tight budget/deadline, I would prefer PHP.

simon_massey
2004-07-26 12:22:08
j2ee and php
Javalobby.com state here that they have rewritten their site using JSTL with sql in their page tags. If you take that approach I don’t see that using java for development is so much slower than php. They are writing code which is functionally equivalent to php as quickly.


The idea of using sql in jsp tags would be shocking to many java programmers. Using something like the cache tag, trips to the database can be reduced for such an application. If your site is document centric rather than transactional workflow centric, then such an approach is lightweight, productive and satisfying. Php5 has language support for in-memory SQLite, which could be used to implement a caching strategy similar to the javalobby approach. So functionally equivalent code can still be written in both languages.


Can Php be made to work well at the other end of the spectrum? Is Java necessarily slow to write when working on a transaction-heavy application, e.g. writing an online banking front end to a well established legacy banking system? Consider the productivity tool that Apache Beehive (a.k.a. Weblogic Workshop) represents. You can write webservices and then import the wsdl file into the IDE and map them graphically to a web request, e.g. this tutorial. This gives you rapid development but geared towards more of a classic j2ee application where you can scale out on the webapp tier and call soap into a transactionally heavy EJB tier. Can this be done in php? Naturally, as php has good soap or xml-rpc support, there is nothing to stop you doing the same webapp in php using the same or any other application server backend (ejb, zope, c++, Tuxedo, whatever). Once again you can write functionally equivalent code.


My feeling is that it is developer ignorance or laziness that leads to code which is either slow to write or slow to run. Language culture (by which I mean the tutorials online and the experiences and leanings of the opinion leaders) has a lot to do with whether people write bad code. Trying out different approaches in languages and learning the best from each language culture is a luxury that opensource gives us. I am sure that someone who knows Zope, written in Python, could lend further valuable experiences and good practices to a variety of problems.