Building Large Systems

by Curtis Poe

Many programmers feel they have bragging rights if they've written large systems. This isn't always fair as many times a quick twenty-something line program might save the day and programmers who can crank them out shouldn't be undervalued. Be that as it may, sometimes we need to write large systems and we need to know how to do it. But what if you're just writing a small system? What's small? And as many of us know, small systems stick around and often grow. While rules which affect larger systems don't always seem as important on small systems, it's fair to say that if you want your small systems to be able to grow to large systems, it doesn't hurt to start with sane rules.


2006-10-01 00:39:12
What is the good web architecture for 10 billion data records with 1KB size each?

2006-10-01 22:31:12
why is the array element comparison example awful? what is the best way to do it?
2006-10-02 01:24:38
why is the array element comparison example awful?

When iterating over each array in nested loops, the number of iterations is the product of the number of elements in each array. If each array has only three elements, this might not be a big deal (though it could be if you were, say, fetching a slow Web page with each iteration). As the number of elements in each array grows, this can be extremely expensive.

what is the best way to do it?

That really depends upon what you're trying to accomplish. Even if it's really slow, it may not be a big deal if it's code which is only called in a 2:00 AM batch process. However, there are various strategies available to bring this algorithm closer to O(n). You might sort one list, iterate over the other and do a binary search on the first. You might sort both lists and walk them in tandem. You might make the elements of one array the keys to a hash (dictionary) and do a direct lookup for each element in the other array.

The various technigues will depend upon whether you need to know if there's more than one duplicate for element, if you need to keep your results in order, which programming language you're using, and so on. There's an interesting thread about this here.

Ralph Corderoy
2006-10-04 03:15:07
The two-line Perl needs to also close FH or die to be equivalent to the Java as the latter is detecting errors on closing the file and the Perl isn't. Some errors are only detectable on flushing buffers and closing the underlying file descriptor.