The New LAMJ Scaling

by chromatic

Thought for the day: If the preferred scaling strategy of Java web applications is shared-everything in a beefy JVM with plenty of threads in myriad pools (and it seems to be) and the preferred scaling strategy of LAMP applications is a shared-nothing architecture across plenty of boxes with memcached in front of a replicated database, what changes will be necessary to run popular apps written with shared-nothing in mind in a shared-everything environment?

Bonus question: besides web applications and language research, are dynamic languages on the JVM interesting? (The clever reader will see where this line of thought leads.)


4 Comments

Erik
2008-03-26 22:13:40
That is a great question that really makes me think.
It really is a bridging of two totally separate mindsets. My first instinct is to respond by asking why, even when running on a VM, anyone would want a share-everything design. I believe the share-everything approach creates an environment where so much data is so easily accessed that you do not realize the cost of the access. The share-everything model also forces you into purchasing large amounts of very expensive hardware just to support your minimal database and a few pages rendered off of it. In my opinion LAMP and share-nothing designs offer a far superior web presence because each and every piece can be isolated to help with debugging. The overhead is minimal and the design lends itself very well to multiple levels of partitioned clustering. I personally will always follow the share-nothing strategy because it really makes me think hard when designing an application. The share-everything approach would be like writing a Perl script without using strict or doing any lexical scoping.


For the bonus, I think that dynamic languages on the JVM are interesting, but I do not see them as being as valuable as languages on a VM like Parrot ;P And I think the design of web applications will evolve when we see mainstream adoption of a VM like Parrot, with dynamic languages on top of it running large, complex web and desktop applications. I think this will help create a more merged design process, which I believe will be the next large step in scalable information distribution design.

Tom
2008-03-27 15:30:22
I don't think they are even different.


The "shared everything" and "shared nothing" approaches don't really differ very much. If you look at a Java MVC framework, and a PHP MVC framework, what difference does memory management make? Not much.


Plus, Java apps have always supported scaling horizontally to multiple boxes, so "shared everything" doesn't even exist. The sharing is just within a single instance. So it is more like "some sharing".


Keep in mind that Java can talk to memcached as well. This is commonly done, though using some sort of Java-specific cache server has been more common than memcached, mainly because those servers were there first.


@Erik: I can't even understand your response. "Shared nothing" systems nearly always use more memory than "some sharing" approaches. With Apache and mod_php, each process ends up having to hold another copy of the code. You have to use caching systems to get code to share. mod_php doesn't really work that well for large sites because the memory usage is so high, so you need to pull PHP out of the web server and use FastCGI. The PHP processes are just as big, but there are now fewer of them.

Michael Peters
2008-03-28 12:22:09
@Tom: They differ by quite a bit. By sharing nothing you don't have to worry about which requests go to which servers. You don't need a "sticky session" aware load balancer to know that a certain web user needs to be served by a certain server. Maybe they don't look that different to the frameworks, but as far as hardware is concerned there is a big difference. With a shared nothing approach horizontal scaling is a no-brainer and you never have to rebalance active users between machines.


Re: mod_php using too much memory. It seems you were trying to combine your application server with your site server. For any kind of scalability you need to split the two. And there's no reason they can't both be Apache. There's very little difference between a mod_php + lightweight Apache combo and a FastCGI + lightweight Apache combo. There's nothing wrong with having two instances of Apache running on your machine, each doing something different. Or you can do mod_php + Varnish or any other proxy/application server combo.


Re: Memory usage. Copy-On-Write! For most of the applications I've worked on, having separate shared-nothing processes + COW memory uses less physical memory than threads + sharing. And you never have to worry about deadlocks.
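
A minimal sketch of the process model described here, assuming a Unix system where `os.fork` is available. The function name and the `config` dictionary are hypothetical, made up for this illustration. It shows only the isolation side of the argument: data loaded before the fork is visible in the child, and a write in the child never reaches the parent. It does not measure physical memory, and in CPython in particular reference counting writes to object headers, so fewer pages stay shared than in a C program.

```python
import os


def fork_isolation_demo():
    """Unix-only. Returns (parent_value, child_value) after the child
    changes its own copy of data that was loaded before the fork."""
    config = {"greeting": "hello"}  # loaded once, before forking
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child process: starts with the parent's memory image.
        # Pages are copied by the kernel only when written to.
        os.close(read_fd)
        config["greeting"] = "changed in child"
        os.write(write_fd, config["greeting"].encode())
        os.close(write_fd)
        os._exit(0)  # leave immediately, skipping parent cleanup code

    # Parent process: read what the child reported, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        child_value = reader.read().decode()
    os.waitpid(pid, 0)

    # The parent's copy is untouched by the child's write.
    return config["greeting"], child_value
```

Because each process only ever writes to its own copy, there is no lock to take around `config`, which is the basis of the "no deadlocks" remark.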


Almost everything I've been reading recently about the future of parallel computing talks about a shared-nothing approach. CPU speeds are coming to a plateau. The amount of memory you will have available will get into the terabyte range. If you have 1000 CPUs with gigs of data for each CPU, which approach would work better? It's pretty obvious.



Tom
2008-03-28 15:34:40
@Michael: Umm.... The session has to be stored somewhere, whether you are using "shared nothing" or "shared something".


But if you have a sticky-session load balancer (and what load balancer doesn't support this now?), you could make the sessions local to a particular instance. Then it would actually be "shared nothing", as the session storage is not shared (or replicated).


But if you use what you think is a "shared nothing" setup, with memcached for sessions, the cache is shared, so this really isn't "shared nothing", it is now "shared something". The cache is shared storage for all instances.
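
The three setups under discussion can be sketched as a toy model. Everything here is hypothetical and for illustration only: `AppInstance`, the routing helpers, and the plain dictionary standing in for memcached are assumptions, not any real load balancer or cache API.

```python
import zlib


class AppInstance:
    """One application server process (hypothetical)."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # dict-like: session id -> data

    def handle(self, session_id):
        # Count visits for this session in whatever store was supplied.
        data = self.sessions.setdefault(session_id, {"visits": 0})
        data["visits"] += 1
        return data["visits"]


def local_setup():
    # Each instance keeps a private session dict: nothing is shared.
    return [AppInstance("a", {}), AppInstance("b", {})]


def shared_setup():
    # One dict stands in for a memcached pool shared by all instances.
    shared = {}
    return [AppInstance("a", shared), AppInstance("b", shared)]


def round_robin(instances, session_id, requests):
    # The balancer ignores the session and just rotates through instances.
    return [instances[i % len(instances)].handle(session_id)
            for i in range(requests)]


def sticky(instances, session_id, requests):
    # The balancer pins a session to one instance via a stable hash.
    chosen = instances[zlib.crc32(session_id.encode()) % len(instances)]
    return [chosen.handle(session_id) for _ in range(requests)]


print(round_robin(local_setup(), "s1", 4))   # local sessions, no stickiness
print(sticky(local_setup(), "s1", 4))        # local sessions, sticky routing
print(round_robin(shared_setup(), "s1", 4))  # shared session store
```

With local sessions and no stickiness the visit count splits across the two instances; either sticky routing or a shared store gives one consistent count. In this toy model the difference between the last two is only where the session lives: in one instance's memory, or in storage every instance can reach.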


As I said before, there really isn't much difference between "shared nothing" and "shared something". It is just semantics.