Interviewing the Audience
by pat eyler
Sometimes, an interview just doesn't go as planned. I throw out a lot of questions and answers from the interviews I do, either because the answers are better combined or they just don't fit with the rest of the interview. Occasionally, some of my favorite questions don't make the cut. In my recent interview with Robert Glass, I had one such question. Since I couldn't bear to bury it, I decided to ask it here.
Mark really likes iterators, and iterators are something of a dead horse. Python, Ruby, Perl, etc. have a clear and consistent way of modelling a "collection" of things baked into the syntax. C++ and Java don't, so it's necessary to create protocols and design patterns to add that feature into projects.
A better example would be lazy lists. If you read the original papers where lazy computation is introduced into Lisp or Scheme, the idea sounds cool in the abstract, but the implementation is hideous in the extreme. Mark talks about lazy lists in Higher Order Perl, and somehow they're understandable. Maybe the extra syntax helps. Maybe Mark is just better at explaining it.
The problem with lazy lists is that they're an added feature, not innate like lists and hashes. So it's necessary to remember where you're using them, and use them correctly. But Haskell takes a different approach: if lazy lists are good, they should be available all the time. As a result, you get all the benefits of lazy lists, without having to do all the mental bookkeeping of when lists are lazy or strict (non-lazy).
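The opt-in style of laziness described above can be sketched in Ruby rather than Haskell. (Enumerator::Lazy arrived in Ruby 2.0, well after this discussion, so this is an illustration of the idea, not something available at the time.)

```ruby
# A lazy, infinite stream of natural numbers. Nothing is computed when
# the chain is built; work happens only when a finite prefix is forced.
naturals = (1..Float::INFINITY).lazy

# map is deferred; first(5) forces exactly five elements.
first_squares = naturals.map { |n| n * n }.first(5)
# first_squares is [1, 4, 9, 16, 25]
```

Note that the `.lazy` call is exactly the mental bookkeeping the text complains about: in Ruby you must remember to ask for laziness, where in Haskell it is the default.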
On the other hand, Haskell, like most functional languages, spends a lot of time dealing with lists. Three of the most common operations on lists are converting one list into another (mapping), filtering out a portion of a list (grepping), and reducing a list to a single value. Perl5 doesn't have a dedicated syntax, operator or builtin function to reduce lists. But neither, in that sense, does Haskell. That's because there are many ways you can reduce a list: from the right or the left? with or without a default value? reduce the entire list, or stop when you have a final answer? Haskell has a handful of functions to perform reductions (or folds, as they're more commonly known), because it doesn't make sense to offer just one or two ways to do it. (Perl6 will offer a special syntax to perform list reduction, but it's not clear to me whether it's the only kind of reduction you need, or just the kind that gets used most often.)
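The fold variations listed above (left vs. right, with or without a seed value) can be made concrete with Ruby's 'inject'. A minimal sketch, using subtraction because it is non-commutative, so the left and right folds visibly differ:

```ruby
nums = [1, 2, 3, 4]

# Left fold with a seed: ((((0 - 1) - 2) - 3) - 4)
left = nums.inject(0) { |acc, x| acc - x }            # -10

# Right fold with a seed, simulated by reversing: 1 - (2 - (3 - (4 - 0)))
right = nums.reverse.inject(0) { |acc, x| x - acc }   # -2

# Left fold without a seed (the first element becomes the accumulator):
no_seed = nums.inject { |acc, x| acc - x }            # ((1 - 2) - 3) - 4 = -8
```

Three different answers from the same list and the same operator, which is exactly why a language ends up with a family of folds rather than one blessed reduction.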
There is a reference in a wikipedia article to Paul Graham making a similar point:
"...This has led some writers, such as Paul Graham, to speculate that many design patterns observed in statically-typed languages are simply evidence of 'the human compiler' repeatedly writing out metaprograms."
His full talk is here, and the point about patterns is near the end:
I agree on the general principle: Whenever a pattern emerges, implementing the pattern by hand is the band-aid solution. The long-term solution is of course to extend the language (core or libraries) so that the pattern becomes 'invisible', in the terms of the mjd link.
But I beg to differ on the Iterator pattern: The Iterator pattern in Java is one of the better aspects of the language. They have made it 'invisible' - it's already implemented, you only need to use it.
The advantage of the iterator is that you don't need to manage indexes, saving a few lines of code for each iteration. Of course, the Ruby internal iterator 'each' means that you don't need to manage the iterator either. The 'each' style is better for simple iteration.
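The internal/external contrast can be sketched in Ruby itself; in Ruby 1.9+, calling 'each' without a block returns an Enumerator, which plays the external-iterator role:

```ruby
# Internal iteration: the collection drives the loop and yields each element.
sum = 0
[1, 2, 3].each { |x| sum += x }

# External iteration: an Enumerator hands out one element per call to #next,
# so the caller controls when (and whether) to advance.
e = [1, 2, 3].each
first  = e.next   # 1
second = e.next   # 2
```

With the internal style the loop is simpler; with the external style the caller keeps control, which is what makes iterating several collections at once possible.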
But external iterators have the advantage that you can iterate over several collections simultaneously. This is something I do fairly often. Say you want to iterate over an array of X and an array of Y to create a list of Points. Currently I know of no simpler way to do that in Ruby than falling back to using indexes; with iterators I can easily march over several lists in lockstep.
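For what it's worth, one way to sketch the X/Y-to-Points example without explicit indexes is Ruby's Array#zip, which pairs elements positionally. (Point here is a hypothetical struct introduced for the example, not from the original.)

```ruby
# A minimal Point type for the example.
Point = Struct.new(:x, :y)

xs = [1, 2, 3]
ys = [10, 20, 30]

# zip pairs the arrays positionally: [[1, 10], [2, 20], [3, 30]],
# then each pair is turned into a Point.
points = xs.zip(ys).map { |x, y| Point.new(x, y) }
```

This covers the lockstep case, though it is less general than true external iterators, which also let you advance each collection at its own pace.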
I guess usage patterns vary between tasks and individuals, but for me, out of all the tasks that can be solved with one or more 'for' loops, the Ruby 'each' is applicable perhaps 40% of the time, a further 30% can be done with iterators, and the rest still requires explicit indexes. (For example, because you need to jump x indexes at a time.)
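The jump-x-indexes-at-a-time case can at least be confined to a stepped index range in Ruby; a minimal sketch:

```ruby
arr = %w[a b c d e f g]

# Visit every third element: indexes 0, 3, 6.
picked = (0...arr.length).step(3).map { |i| arr[i] }
# picked is ["a", "d", "g"]
```

The index is still explicit, but the stepping logic lives in the range rather than in hand-written loop bookkeeping.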
I don't think I have a clear idea of what the ultimate iteration pattern should be. But I suspect that 'iteration' will eventually resolve into several subtasks, much like what happened to the 'goto', which was eventually replaced by functions, exceptions and a number of special operators (like 'next', 'return', 'break').
The most complete iteration concept I've encountered was Common Lisp's. From your description, Haskell is similar in that respect. If there is ever to be a replacement for 'for', I guess it will start with one of these models.
In 1999, I did a master's degree in pattern matching using Lisp. As part of that, I made a 'trie' (a data structure for fast string lookup) for storing the pattern statistics.
In this case, a 'pattern' was any repeated sequence of length N or less. But a 'repeated sequence' is too abstract for a programming language. The language wants to know: A repeated sequence of what? I wanted my pattern-matching code to be able to handle any kinds of repeated patterns, whether they were represented as strings, characters, integers, floats or generic objects.
In a language like Java you cannot do that. The problem is that Java is strongly typed, the type hierarchy does not match what I was trying to do, there is (was) an impassable barrier between primitive and object types, and there is no way to create your own type definitions that encompass several existing types.
Taken together, this means that in Java, there is no way of creating a generic 'sequence match' algorithm. (In Lisp, anything with an ordered collection of elements is a 'sequence'. Translated to Java terms, String, int, float, Integer, Object, List etc. would have to share a common supertype 'sequence'. Which they don't.)
(It is possible to do with reflection, but if you want both clean syntax and efficient implementation, I guess it would take a year or more to create a library to work around the language limitations.)
Lisp has a mechanism to access any sequence (string, linked list, int array or otherwise) by index. But that mechanism was too slow. I wanted to be able to write complex algorithms just once, never mind the argument type, and still have it execute efficiently with amounts of data that approached the memory limit.
As it turned out, I could to that in Lisp. I created a macro like this:
(defmacro do-seq ((var seq) &body body)
  "optimized access to any sequence"
  (let ((s (gensym)))
    `(let ((,s ,seq))
       (declare (optimize (speed 3) (safety 1)))
       (typecase ,s
         (list
          (dolist (,var ,s) ,@body))
         (simple-string
          (dotimes (i (length (the simple-string ,s)))
            (let ((,var (char (the simple-string ,s) i)))
              ,@body)))
         ((simple-array fixnum (*))
          (dotimes (i (length (the (simple-array fixnum (*)) ,s)))
            (let ((,var (aref (the (simple-array fixnum (*)) ,s) i)))
              ,@body)))
         (t (error "non-sequence argument to do-seq: ~a" ,s))))))
It's hard to translate this to Java, but the general idea is that the 'seq' argument is a sequence: a list, a string or a fixnum array. The 'body' argument is the function, whatever you want to do with each element. (Pretty much like Ruby's 'each', if 'each' on a Ruby string returned character-by-character instead of line-by-line.) The 'typecase' part is a test of the sequence type. The 'dolist' and 'dotimes' parts iterate efficiently over the different kinds of sequences. The parts that say '(declare (optimize ...))' and '(the (simple-array fixnum (*)) ...)' are type declarations, so the compiler can generate more efficient code.
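For readers who don't speak Lisp, here is a rough Ruby analogue of the dispatch structure only. (It mirrors the per-sequence typecase, but not the macro's real payoff, the compile-time type specialization, which Ruby cannot express; the method name do_seq is just borrowed from the macro.)

```ruby
# Dispatch once on the sequence's runtime type, then iterate with a
# strategy suited to that type, yielding each element to the block.
def do_seq(seq, &body)
  case seq
  when String then seq.each_char(&body)  # character-by-character
  when Array  then seq.each(&body)       # element-by-element
  else raise ArgumentError, "non-sequence argument to do_seq: #{seq.inspect}"
  end
end

collected = []
do_seq("hi")   { |c| collected << c }
do_seq([1, 2]) { |x| collected << x }
# collected is ["h", "i", 1, 2]
```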
Here, I was using the fact that although a general method to access any kind of element in any kind of sequence must be slow, since it has to determine the type of sequence and the type of element on every single invocation, a series of accesses to elements of the same type only has to determine the sequence and element type once. The calling code can then be compiled to use only integer machine operations when the sequence actually (at runtime) contains integers.
The general point is that in less than 50 lines of code, I could create a way to access elements in a sequence (a list, an array, or the characters of a string) that was both transparent (calling code does not need to know the sequence type) and fast, working around the limitations of the standard language implementation.
Granted, it took me 3-4 months to figure out a Lisp way to do this. On the other hand, after 7 years working with Java I still have no idea how, in Java, to make a very-complex-algorithm-i-certainly-dont-want-to-implement-twice() that accepts int, String and List as arguments, and run efficiently on all of them. (The Java way is to make an implementation for one argument type, and supply a handful of override methods for the remaining types. These helper methods will convert their arguments to the target type and call the implementing method. This works well for small data sets, but not at all when your data sets approach the size of the available memory.)
Lisp is certainly a more powerful language. The catch, where Java beats Lisp hands down, is that
a) Java supports multiple programmers and multiple libraries. Packages and Javadoc are (IMHO) the greatest contribution Java has made to the programming world. Packages (a namespace mechanism, so that my Document class dealing with Word does not conflict with your Document class dealing with PDF) allow third-party libraries without having to worry about naming conflicts. And Javadoc (a mechanism for generating HTML documentation from source code comments) lets you get an overview of other people's code: even if they did not leave a single line of documentation, at least you get a list of method names and argument types. (Lisp has a 'module' mechanism, but it's so painful to use that when I left Lisp, at ~10,000 lines of code, I still had not gotten around to using it. The Java version is built in right from the start, at no additional cost.)
b) Java supports PDF, Internet, databases, LDAPs and things you've never even heard of. Lisp lives in a world where there is still, in 2006, no complete and robust implementation of the http protocol. (Presumably, if you're using Lisp, you're smart enough to read all the specs and implement http, https, ftp and email libraries all by yourself in a weekend or so. It's hardly worth the bother to create standard libraries for such minor stuff.)
The conclusion, I think, is that Java is optimized to work on known problems with data of known types. It is ideally suited for enterprise-type problems of the form 'convert document of type x to document of type y', but a lousy match for anything approaching general problem solving.
Lisp is open to problems of any kind. It is ideally suited to solitary geniuses, but a lousy match for anyone interested in working with new-fangled stuff like databases, PDF or the Internet. (It is possible, but you have to write every single line of interaction and interpretation code on your own. Considering that the PDF specification alone is ~1000 pages, this seems practical only if you're unemployed, wealthy or a terminally bored student.)
Back to the original question, which I interpret as a question for languages of the future: A whole lot of the elements I would want are already there, they just happen to be spread across different languages.
A language of the future would presumably be a cross-over: Suited for new ideas as well as being good for working with what we already know.
John "Z-Bo" Zabroski
When I think of expressiveness, I think about deep, penetrating detail. So I think of things like assembly and The Art of Computer Programming by Donald Knuth. The so-called "low-level" stuff. When I think of lack of expressiveness, I think of abstraction. The so-called "high-level" stuff.
A lot of famous writers on the Internet have made abstraction seem like a win-win. Abstraction is a win-lose, a trade-off we are often willing to make. Similarly, a lot of famous programmers have made opinionated software seem like a win-win. Opinionated software is a win-lose, a trade-off we hopefully make under well-defined circumstances. It's not a zero-sum game, and that means the sum can be negative or positive.
I also don't understand Adam Turoff's comments. Iterators don't simply disappear, either. There is no perfectly consistent way of modeling a "collection" in programming language design. It is a model. The notion that design patterns disappear is troubling to me, because it suggests design can be taken care of by the programming language. If the application domain maps perfectly to the programming language's built-in features, then design is not simplified; implementation is simplified.
John "Z-Bo" Zabroski
With that in mind: All ideas are transferable, and it is incumbent upon me to preserve those ideas in a form that lends itself to making them a business asset. All application code is mortal; it is going to die. All programmer time is a sunk cost. I have a strong need for a form of my code that I can call my asset, so that I can take advantage of lower opportunity costs. Those opportunity costs are based on the ideas, not the present code. Of course, in practice, software is a combination of an informal specification (design document) and a formal specification (unit tests and other forms of tests that define which parts of the implementation are not subjective).