Scale(1), Scale(2), ... Scale(n)

by chromatic

Ted Neward attempted to pull apart some of the silliness in the debate over scalability with "Can Dynamic Languages Scale?". In particular, one of his most important insights is:

There's an implicit problem with using the word "scale" here, in that we can think of a language scaling in one of two very orthogonal directions:

  1. Size of project, as in lines-of-code (LOC)
  2. Capacity handling, as in "it needs to scale to 100,000 requests per second"


2008-01-28 08:57:25

To summarize, then (re-ordered for clarity):

  1. Size of project

  2. Maximum capacity handling

  3. Minimum capacity handling

  4. Developer availability

  5. Developer expertise

Unless I've misunderstood you, the last two seem awfully similar and can probably be combined.

Another excellent example of minimum scalability would be COBOL. COBOL was deliberately designed to require more work up front. This was done in the belief that by clearly specifying everything carefully in the "Identification Division", "Environment Division", "Data Division", and so on (and then duplicating a lot of that in the JCL), later maintenance would be made easier. Regardless of whether or not this is true, it does mean that one doesn't dash off a quick COBOL script.

Adam Kennedy
2008-01-28 14:24:48
Seems to be some overuse of the term "scale" here...

Each scalability dimension has multiple properties.

A. Maximum "scale up"
B. Minimum "scale down"
C. Linearity

If you take the two dimensions he calls Scale(1) and Scale(2) (complexity and volume) then you can apply those factors against each other.

Scale(1A) covers the ability of the codebase to grow, availability of developers, introspective analysis and refactoring toolsets, etc. Prime Example: Java

Scale(1B) covers the ease of creating and deploying simple applications, the ability to be embedded in other systems, intuitiveness to someone without skills in that area, etc. Prime Example: PHP (within its specific niche, at least)

Scale(1C) covers stuff like availability of tools that don't need to be used when a project is small but which can be added later (valgrind, perlcritic), the training path from trainee to expert developers, support for modular development, etc. I can't think of a great example.

Scale(2A) is the ability to deliver vast amounts of $stuff. Covers things like determinism, manageability, monitoring capabilities, and parallelism. Java and SAP are strong here.

Scale(2B) probably covers embedding (low footprint) and customization (dealing with many different functions, each of which consumes no overhead). C and CGI.

Scale(2C) is the domain of the O(x) algorithms and elegant designs. Believe it or not, Oracle is a standout example, since they sell on the basis of cost being related to demand. Amazon's cloud services are also big here.
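The linearity point above (Scale(2C)) is easy to make concrete. Here's a minimal Python sketch, not from the original comment, contrasting two ways to detect a duplicate in a list: a pairwise O(n^2) scan versus an O(n) hash-set pass. Counting operations rather than timing keeps the result deterministic; the function names are my own illustration.

```python
def has_duplicate_quadratic(items):
    """Pairwise comparison: work grows roughly as n*(n-1)/2,
    so the cost per item rises as the load grows."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons

def has_duplicate_linear(items):
    """Hash set: one membership check per item,
    so the cost per item stays flat as the load grows."""
    seen = set()
    checks = 0
    for x in items:
        checks += 1
        if x in seen:
            return True, checks
        seen.add(x)
    return False, checks

for n in (10, 100, 1000):
    data = list(range(n))  # no duplicates: worst case for both
    _, quad = has_duplicate_quadratic(data)
    _, lin = has_duplicate_linear(data)
    print(n, quad, lin)  # quadratic comparisons explode; linear checks track n
```

At n=1000 the quadratic version does half a million comparisons where the linear one does a thousand; that gap, not raw speed at small n, is what "cost related to demand" buys you.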

Chris Josephes
2008-01-29 08:18:55
Flat files don't scale

Well, depending on what you do with them, they really don't scale.

Large text files (without any indexing assistance) must be read sequentially. If you're working with lots of small files, you must take into account the overhead of opening and closing each file. Also, a high number of deletions or re-writes is going to fragment your filesystem.

And the big one that everyone forgets is directory operations. A lot of filesystems will show degraded performance if a directory has over 2000 entries. Even simple stat operations would take longer.

You can't just say that a file is scalable (or not scalable) unless you detail the size of the files, the number of files (and layout), the operations performed against the file, and the expected response time.
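The per-file open/close overhead is simple to demonstrate. Below is a small Python sketch (my illustration, not from the comment): the same 100 records stored as 100 one-record files versus one flat file. Counting opens shows why layout matters before you even get to timing or fragmentation.

```python
import os
import tempfile

def write_small_files(dirname, records):
    """One record per file: every read later pays an open/close."""
    for i, rec in enumerate(records):
        with open(os.path.join(dirname, f"rec{i:03d}.txt"), "w") as f:
            f.write(rec + "\n")

def read_small_files(dirname):
    """Reading back costs one open() per record, plus a directory listing."""
    opens, data = 0, []
    for name in sorted(os.listdir(dirname)):
        opens += 1
        with open(os.path.join(dirname, name)) as f:
            data.append(f.read().strip())
    return data, opens

def read_one_file(path):
    """A single sequential read: one open(), however many records."""
    with open(path) as f:
        return [line.strip() for line in f], 1

records = [f"record-{i}" for i in range(100)]

with tempfile.TemporaryDirectory() as d:
    write_small_files(d, records)
    small_data, small_opens = read_small_files(d)

with tempfile.NamedTemporaryFile("w+", delete=False) as f:
    f.write("\n".join(records) + "\n")
    big_path = f.name
big_data, big_opens = read_one_file(big_path)
os.unlink(big_path)

print(small_opens, big_opens)  # 100 opens versus 1 for the same data
```

Scale the record count up past a few thousand and the small-file layout also starts hitting the directory-entry degradation described above, which is exactly why "is a flat file scalable?" has no answer without the layout details.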