notes from a conversation about webhosting growing

by Derek Sivers

My little webhosting company is growing faster than expected, and I'm still using the same approach to hardware as I was in 2000 when I started it. I started asking experienced sysadmins for advice, and here are some notes from my first conversation, with a guy who ran a huge webhosting center for a few years.

(Note: I'm not going to clean up these notes too much. Mostly just posting here for my own reference, and maybe it's helpful to someone else out there, as-is.)

Where you run into problems:
Not so much the storage. It's the TCP/IP stack.
Building up and tearing down connections for 400 domains.
Lots of context-switching and overhead. Even adding more RAM won't help.
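
(My aside, not from the conversation: a tiny Python sketch of where that per-connection cost lives. It just times a few thousand TCP connect/close cycles against a throwaway local listener; the port and counts are arbitrary.)

    # Time repeated TCP connection setup/teardown against a local listener.
    # The listener accepts and immediately closes, so almost all of the cost
    # measured here is per-connection kernel work, not serving content.
    import socket
    import threading
    import time

    HOST, PORT, N = "127.0.0.1", 8765, 2000

    def listener(server: socket.socket) -> None:
        while True:
            try:
                conn, _ = server.accept()
                conn.close()
            except OSError:
                break  # server socket closed; stop accepting

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((HOST, PORT))
    server.listen(128)
    threading.Thread(target=listener, args=(server,), daemon=True).start()

    start = time.perf_counter()
    for _ in range(N):
        socket.create_connection((HOST, PORT)).close()
    elapsed = time.perf_counter() - start
    print(f"{N} connect/close cycles in {elapsed:.2f}s ({N / elapsed:.0f} conns/sec)")
    server.close()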

Huge companies (Yahoo) end up moving TCP/IP out of the kernel into hardware. Redbank is one vendor of that kind of thing. But those are extremely expensive.

I would say SATA drives are the way to go.
Instead of slapping them into one box & doing RAID there, look into doing iSCSI : SCSI-over-ethernet. It's like Fibre Channel, but Fibre Channel is too expensive. iSCSI is the answer to that : it does the same thing as Fibre Channel with a little less performance.
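
(Again mine, not his: roughly what attaching a remote iSCSI disk looks like from the web server's side, assuming a Linux host with the Open-iSCSI tools installed. The portal address and target name are made up; after login, the remote disk shows up as an ordinary local block device you can partition and mount like a directly attached SATA drive.)

    # Sketch of the initiator side of iSCSI, using the Open-iSCSI "iscsiadm"
    # command-line tool. The portal IP and target IQN below are hypothetical.
    import subprocess

    PORTAL = "192.168.1.50"                        # storage box on the LAN
    TARGET = "iqn.2006-03.com.example:web1-disk"   # made-up target name

    # 1. Ask the storage box which targets it publishes.
    subprocess.run(
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL],
        check=True)

    # 2. Log in to one target; the kernel then presents it as a SCSI block device.
    subprocess.run(
        ["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"],
        check=True)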

Files dished up by the webserver are read-only. When writing, you don't need performance.
Commercially, look at VMware : it virtualizes your box, so you run many instances on one machine.
The open-source equivalent of VMware is Xen.
Boxes with 2-4 CPUs and 32 gigs of RAM : 16-20 instances of FreeBSD/Linux on each.
The advantage is the TCP/IP stack.
Also lets you virtualize your storage.

If you were to do something like iSCSI : it's not a fileserver. You can add things like new disks without rebooting FreeBSD.

Myself, I've only used the commercial version of VMware. ($6000 or something for the 2-CPU version)

Example: if you have a 2-CPU box and you upgrade to a 4-CPU box, you won't see that much improvement, since it comes down to the TCP/IP stack.
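
(Side note from me: a quick Amdahl's-law calculation shows why. If, say, 60% of the time goes to the serialized TCP/IP path, a made-up fraction just for illustration, then going from 2 to 4 CPUs barely moves the needle.)

    # Amdahl's law: only the parallelizable portion of the work scales
    # with CPU count. The 0.60 serial fraction is an assumed number.
    def speedup(serial_fraction: float, cpus: int) -> float:
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)

    serial = 0.60  # assumed share of time stuck in the serialized TCP/IP path
    for cpus in (2, 4):
        print(f"{cpus} CPUs -> {speedup(serial, cpus):.2f}x vs. one CPU")
    # Prints 1.25x for 2 CPUs and 1.43x for 4 CPUs: doubling CPUs gains little.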

Don't think centralized file serving, think centralized disk serving.

If you have 12 SATA disks in a RAID, you can chop that up into many small virtual disks : /dev/wda0, /dev/wda1, /dev/wda2 at 500 megs each.

One client says they want to use a full 2 gigs for their website, so we publish another virtual disk that's 2 gigs, map it to web-1, and carve it up however you like. Now you can easily publish more pieces of our array.
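
(Toy numbers from me to make the carving concrete; the 250-gig drives and RAID-5 layout are assumptions, not something he said.)

    # How many 500-meg virtual disks a 12-disk array could publish, and how
    # many of those slices a 2-gig client disk uses. All sizes are assumed.
    DISKS = 12
    DISK_GB = 250                       # assumed size of each SATA disk
    USABLE_GB = (DISKS - 1) * DISK_GB   # RAID-5 loses one disk's worth to parity

    SLICE_MB = 500
    slices = (USABLE_GB * 1024) // SLICE_MB
    print(f"{USABLE_GB} gigs usable -> {slices} virtual disks of {SLICE_MB} megs")

    client_gb = 2
    print(f"A {client_gb}-gig disk for web-1 uses {client_gb * 1024 // SLICE_MB} slices")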

There are performance reasons not to just make those SATA disks one big partition. I can take my SATA array and stripe it : RAID cards in the server.
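
(My sketch of why striping helps: consecutive chunks land on different disks, so one big read gets spread across all the spindles instead of hammering one. Chunk size and disk count here are arbitrary.)

    # RAID-0-style striping: which disk holds the chunk at a given offset?
    DISKS = 12
    CHUNK_KB = 64  # assumed stripe chunk size

    def disk_for_offset(offset_kb: int) -> int:
        return (offset_kb // CHUNK_KB) % DISKS

    # A 1-meg sequential read touches chunks spread over many disks:
    touched = {disk_for_offset(kb) for kb in range(0, 1024, CHUNK_KB)}
    print(f"A 1-meg read is served by {len(touched)} of {DISKS} disks")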

Do something totally external : a different device sitting in the rack with a bunch of disks.

We may end up with 12 disks sitting in a server, but the server right next to it has spare disk space that we wish we could use.

Coming up with centralized disk space. Biggest bang-per-buck for flexibility.

Dell Blade server : 8 blades : each blade had 8 VMware instances. This one blade box had 32 servers.
VMware makes it easier to publish disks to different instances.

One of the things you'd do with VMware/Xen : you make your vanilla build, a snapshot of your O.S. install. You copy that file and call it FreeBSD-gold1 or whatever, change a couple of things in the startup file, and clone it.
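
(A rough sketch of that cloning step, mine not his; the file names and config format are hypothetical stand-ins for whatever VMware or Xen actually uses.)

    # Clone a "gold" guest image and give each clone its own startup settings.
    import shutil

    GOLD_IMAGE = "FreeBSD-gold1.img"

    # Stand-in gold image so the sketch runs end to end.
    with open(GOLD_IMAGE, "w") as f:
        f.write("pretend this is a disk image\n")

    def clone_instance(name: str, ip: str) -> None:
        # 1. Copy the vanilla snapshot to make a new guest disk.
        shutil.copyfile(GOLD_IMAGE, f"{name}.img")
        # 2. Change the couple of per-instance things in a startup file.
        with open(f"{name}.conf", "w") as f:
            f.write(f"hostname={name}\nip={ip}\n")

    clone_instance("web-2", "10.0.0.12")
    clone_instance("web-3", "10.0.0.13")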

Boxes that do external storage : I haven't done anything with iSCSI myself, but look at "Left Hand Network".

SANs didn't exist when he was doing this. A SAN exists in two forms : (1) Fibre Channel, (2) iSCSI.

First place to look is to centralize your disks.


2006-03-07 11:02:59

Huge companies (Yahoo) end up moving TCP/IP out of the kernel into hardware. Redbank is one vendor of that kind of thing. But those are extremely expensive.

Out of curiosity, can you expand on this?

2006-03-10 16:05:04
Sun's Niagara
What about Sun's chip that can run many threads concurrently?