ONJava.com -- The Independent Source for Enterprise Java



A Distributed Discussion with Elliotte Rusty Harold

by Chris Adamson

Elliotte Rusty Harold has long been a well-known voice in the Java community. His first edition of Java Network Programming came out in 1997, and seven years later, he has returned with a third edition of this classic, updated for J2SE 5.0 and the java.nio package introduced in Java 1.4. He's also known for his many books and articles on XML, as well as for the long-running Cafe au Lait web site.

ONJava interviewed Elliotte by email to get his thoughts on networking, XML, and Java in general.

Networking with Java

ONJava: Many Java APIs are implicitly networked, especially in J2EE: JMS, EJB, etc. Does a Java developer still need to know about low-level stuff like sockets?

Elliotte Rusty Harold: It depends completely on what the developer is writing. For instance, if your program's only interaction with network servers is talking to a SQL database, then you'd have to be crazy to use low-level socket code instead of JDBC or something even higher-level like Hibernate. If the high-level frameworks do what you need, great! By all means use them.

However, frameworks are never going to do everything. There will always be new protocols and network services that have to be implemented with low-level socket code before a framework can be layered on top of them. Someone has to write the frameworks, after all. But more importantly, the frameworks tend to be quite focused on the server side. People writing clients have much less sophisticated libraries to help them; and they do often need to use sockets or the URL/URLConnection/protocol handler framework built into Java.


Finally, I've noticed the frameworks don't always work like you want, especially if you're looking for something that fits your site instead of re-architecting your site around the framework. For the last couple of days I've been looking for a simple, RESTful framework that would allow me to add comments to my web site. As far as I can tell, there's absolutely nothing out there. I started by looking for a Java solution, but even after I expanded my search to include PHP- and Python-based solutions, I still couldn't find anything, so I had to roll my own. :-(

ONJava: Is it time for developers to start paying attention to IPv6?

ERH: IPv6 is one of the major new networking features in Java 1.4 and in the third edition of Java Network Programming. It's starting to see some traction outside the United States and is at the beginning of what is probably a classic exponential growth curve. That being said, though, the absolute number of IPv6 deployments is still quite small compared to IPv4, and is likely to remain small for the next couple of years. Right now I'd venture to say there are more Mac OS 9 and other IPv6-incapable systems connected to the net than there are IPv6 nodes.

Longer term, IPv6 is likely to become very important for networks and network administrators. It adds lots of useful functionality and helps out with addressing, but the really good news is that most developers working in Java can pretty much ignore it. One of the things that Java has done right since 1.0 is abstract away a lot of gory, low-level details like the exact representation of network addresses. Pretty much all of the Java networking code you write today or have written in the past will just work on an IPv6 network without any extra effort on your part, as long as you're using Java 1.4 or later.
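A minimal sketch of that abstraction, assuming a current JDK: the same InetAddress call accepts an IPv4 literal and an IPv6 literal, and the calling code never touches the address representation. The class name AddressDemo and the sample addresses (taken from the documentation ranges) are illustrative assumptions, not part of the interview.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

final class AddressDemo {
    // Describes an address without the caller needing to know the IP version.
    static String describe(String literal) throws UnknownHostException {
        // For a literal IP address, getByName only checks the format;
        // no DNS lookup is performed.
        InetAddress address = InetAddress.getByName(literal);
        String family = (address instanceof Inet6Address) ? "IPv6" : "IPv4";
        return family + " " + address.getHostAddress()
                + " (" + address.getAddress().length + " bytes)";
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(describe("192.0.2.1"));    // hypothetical IPv4 address
        System.out.println(describe("2001:db8::1"));  // hypothetical IPv6 address
    }
}
```

Code that passes the resulting InetAddress on to a Socket or DatagramPacket looks the same for either family.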

A related issue that will affect Java developers more directly is internationalized domain names and IRIs. The specs for these aren't quite finished yet, but they will be soon, and they're going to catch fire a lot faster than IPv6 because they provide obvious benefits to roughly five billion end users, who are going to start using them (and in some cases, already are using them) whether or not the specs and software are ready. Sadly, even Java 5.0 has absolutely no support for any of this. This means application developers are going to have to accept internationalized domain names and IRIs from end users and other software and convert them into old-style domain names and URIs before working with them in Java. This really should be a core part of the library. There are open RFEs for this functionality for Mustang (Java 1.6), so maybe we'll have this sometime in 2006.
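For readers on later releases, the domain-name half of this conversion can be sketched with java.net.IDN, a class that was added in Java 6 and so was not available when this interview took place. The class name IdnDemo and the host name below are illustrative assumptions; IRIs are not covered by this sketch.

```java
import java.net.IDN;

final class IdnDemo {
    // Converts an internationalized host name to its ASCII (Punycode) form.
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    // Converts an ASCII (Punycode) host name back to its Unicode form.
    static String toUnicodeHost(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }

    public static void main(String[] args) {
        // "b\u00fccher" is "bucher" with a u-umlaut; the escape keeps the
        // source file pure ASCII.
        String ascii = toAsciiHost("b\u00fccher.example");
        System.out.println(ascii); // prints "xn--bcher-kva.example"
    }
}
```

The ASCII form is what the older URL, URI, and InetAddress code paths can safely work with.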

ONJava: What's new in J2SE 5.0 to help out the network programmer?

ERH: Java 1.5 adds several new classes, and a few methods scattered throughout the API. Probably the most significant additions are the CookieHandler and Proxy classes. The Proxy class lets applications choose proxy strategies on a much more finely grained, per-connection basis rather than on the per-VM basis that was the only easy option in earlier versions.
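A rough sketch of the per-connection choice, assuming a hypothetical proxy at proxy.example.com:8080: one connection object is created to go through the proxy, another to the same URL is created to bypass it. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

final class PerConnectionProxy {
    // Builds an HTTP proxy description. createUnresolved avoids a DNS
    // lookup at construction time.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) throws IOException {
        Proxy viaProxy = httpProxy("proxy.example.com", 8080); // hypothetical proxy
        URL url = new URL("http://www.example.com/");

        // openConnection only creates the connection object; nothing is
        // sent until connect() is called or the response is read.
        URLConnection proxied = url.openConnection(viaProxy);
        URLConnection direct = url.openConnection(Proxy.NO_PROXY);

        System.out.println("proxied: " + proxied.getURL() + " via " + viaProxy.type());
        System.out.println("direct:  " + direct.getURL() + " via " + Proxy.NO_PROXY.type());
    }
}
```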

The CookieHandler class provides hooks for cookie callbacks from the HTTP protocol handler used by the URL and URLConnection classes. This makes third-party HTTP libraries less necessary. However, the CookieHandler class is abstract and you have to roll your own implementation (an example is given in Chapter 8), so the support for cookies is still not complete.
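To give a feel for what rolling your own involves, here is a deliberately naive in-memory implementation. It is not the Chapter 8 example from the book; the class name InMemoryCookieHandler and its storage scheme are assumptions made for illustration, it is written against a modern JDK (Java 8 or later), and it ignores cookie attributes such as Path, Domain, and Expires.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InMemoryCookieHandler extends CookieHandler {
    // Host name -> raw "name=value" pairs received from that host.
    private final Map<String, List<String>> cookiesByHost = new HashMap<>();

    // Called by the HTTP protocol handler before a request is sent.
    @Override
    public synchronized Map<String, List<String>> get(
            URI uri, Map<String, List<String>> requestHeaders) throws IOException {
        List<String> cookies = cookiesByHost.get(uri.getHost());
        if (cookies == null || cookies.isEmpty()) {
            return Collections.emptyMap();
        }
        return Collections.singletonMap("Cookie",
                Collections.singletonList(String.join("; ", cookies)));
    }

    // Called by the HTTP protocol handler after a response arrives.
    @Override
    public synchronized void put(
            URI uri, Map<String, List<String>> responseHeaders) throws IOException {
        for (Map.Entry<String, List<String>> header : responseHeaders.entrySet()) {
            // The status line is stored under a null key, so guard first.
            if (header.getKey() == null
                    || !header.getKey().equalsIgnoreCase("Set-Cookie")) {
                continue;
            }
            for (String value : header.getValue()) {
                // Keep only "name=value"; attributes are dropped in this sketch.
                String pair = value.split(";", 2)[0].trim();
                cookiesByHost.computeIfAbsent(uri.getHost(), h -> new ArrayList<>())
                        .add(pair);
            }
        }
    }
}
```

A handler like this is installed for the whole VM with CookieHandler.setDefault(new InMemoryCookieHandler()). Releases from Java 6 onward ship java.net.CookieManager as a ready-made implementation.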

Snags and Challenges

ONJava: It's great that you have a chapter on multicast, but all of my iTunes streams are still plain old Shoutcast-like MP3-over-HTTP. What's holding up multicast?

ERH: The routers. Way too many ISPs and network administrators have turned off or blocked multicast functionality. Consequently, you simply can't rely on multicast for any broad distribution of content across the public Internet.

This really needs to change. Clearly, people do want to watch realtime streaming video and listen to realtime streaming audio, and they're doing it now over much less efficient point-to-point protocols. Some of the peer-to-peer applications are even effectively reinventing multicast on top of unicast TCP and UDP. It really is time to turn on multicast by default. The bandwidth savings could be enormous, and it could enable a whole new group of both consumers and producers of multimedia content.

ONJava: What's the hardest part of getting started with network programming?

ERH: This question stumped me for a little while because I can't really think of any part of it that is hard. Java makes network programming a lot easier than most other languages do. The learning curve is long but not steep. You can dip in almost anywhere and start learning what you need to know. Maybe the trickiest bit is not the network itself but rather the I/O. It's not that I/O in Java is hard. It isn't. It's just quite different from I/O in older languages like C++, Pascal, and Fortran. These languages, especially when approached in the way that they're taught in school, tend to treat I/O as if its primary purpose is talking to the user through the console, something that's almost never true today anywhere except in textbook examples. Java I/O, both the classic and the new models, is much better suited to network programs than the printf() functions old C programmers often look for. This sometimes throws people moving to Java from other languages. However, if you're comfortable with Java's stream-based I/O model and understand the difference between a Reader and an InputStream, there's really nothing to stop any programmer from diving right into network programming. (And if you don't understand that, you just have to read Chapter 4, and then you will understand it. :-) )
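The Reader versus InputStream distinction can be sketched in a few lines: an InputStream delivers raw bytes, while a Reader decodes those bytes into chars using a character encoding. The class name BytesVersusChars and the choice of UTF-8 are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class BytesVersusChars {
    // Counts what an InputStream sees: raw bytes.
    static int countBytes(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    // Counts what a Reader sees: chars decoded from the bytes as UTF-8.
    static int countChars(byte[] data) throws IOException {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int count = 0;
            while (reader.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // "caf\u00e9" ends in e-acute, which takes two bytes in UTF-8.
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data)); // prints 5
        System.out.println(countChars(data)); // prints 4
    }
}
```

The same wrapping works on the InputStream returned by a Socket or URLConnection, which is where choosing the wrong encoding, or none at all, tends to corrupt text.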

ONJava: Is there a bug or a mis-feature in the Java networking classes that really bothers you?

ERH: Protocol handlers. This was a great idea in theory: dynamically extending web browsers to handle new protocols. However, protocol handlers never really worked that well in practice. Most importantly, implementing a constant stream of new protocols for new services proved not to be necessary. Instead, most new services are implemented on top of HTTP. In fairness, it certainly wasn't obvious this would be the case way back in 1995 when Java first saw the light of day.

Nonetheless, although protocol handlers never really took off, the limits to their design continue to impact Java to this day. Protocol handlers are designed to fit all possible URL schemes and instead end up serving none of them particularly well. They were designed mostly with HTTP in mind. Otherwise, the API works for FTP, but only for downloads, not uploads. Even the simple file protocol handler has some serious problems. More complex protocols like mailto are just hopeless. They can't be reasonably fit to APIs exposed by URL, URLConnection, and URLStreamHandler. It would be much better to design separate APIs for each separate protocol.

At the same time, the effort that's gone into protocol handlers has blocked the development of alternative APIs. In 2005, there's still no complete, modern implementation of either HTTP or FTP in the standard JDK. The URL class makes simple things simple, but it makes hard things impossible. Instead, you have to look to third-party libraries like the Apache Commons HttpClient or Commons Net if you want a complete, straightforward API for many protocols. A lot of this really deserves to be part of the core Java API.
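As a sketch of the "simple things simple" half of that remark, reading everything a URL points to takes only a few lines. The class name SimpleFetch is illustrative, and the code assumes the content is UTF-8 text, which a real client would have to determine from the response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class SimpleFetch {
    // Reads the entire resource behind the URL as text.
    // Assumption: the content is encoded as UTF-8.
    static String readAll(URL url) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass any URL the installed protocol handlers understand,
        // for example a file: or http: URL.
        System.out.println(readAll(new URL(args[0])));
    }
}
```

Anything beyond a plain read, such as an FTP upload, has no equivalent in this API, which is the "hard things impossible" half.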

ONJava: I noticed there's not a lot about networking with J2ME in the book. Is it not practical yet, or just so different that it needs its own book?

ERH: It's definitely practical, but I'm afraid it's so different it really needs its own book. The standard java.net package is just too heavyweight for J2ME. java.net's exception classes alone can overwhelm the smallest devices. J2ME substitutes the much smaller javax.microedition.io package for java.net. Details vary from one J2ME version and profile to the next, but for MIDP 2.0, this provides basic TCP socket, UDP datagram, and HTTP/HTTPS support in a fairly straightforward way, though not all devices may support all of these protocols. In some ways, it's simpler and easier to use than the standard java.net classes.

Java's Past, Present, and Future

ONJava: You've been here since the beginning--your Cafe au Lait site has a copyright that dates back to 1995. When you compare networking in Java 1.0 to networking in J2SE 5.0, how much has changed, and how much really needed to?

ERH: A lot's been added, but surprisingly little has changed. The major real changes have been in I/O, first with the introduction of Readers and Writers in Java 1.1, and then with non-blocking I/O in Java 1.4. However, the networking classes themselves still behave pretty much like they did way back in 1995.

Probably more should have changed than has. The Socket constructors still do a lot more than any constructor should. The URL class and even the newer URI class still aren't fully conformant to the relevant RFCs. The protocol handler mechanism is still a mess that only really fits HTTP. Content handlers never really got off the ground, and should probably be dropped. However, Sun's extreme commitment to backwards compatibility means none of this is likely to happen anytime soon.
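On the constructor point: new Socket(host, port) resolves the host and opens the connection before the constructor returns. Since Java 1.4 there has also been a no-argument constructor that creates an unconnected socket, leaving the connection as a separate, explicit step. The sketch below uses that route; the class name ExplicitConnect and the timeout handling are illustrative assumptions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ExplicitConnect {
    // Creates the socket first, configures it, and only then connects.
    static Socket open(String host, int port, int timeoutMillis) throws IOException {
        Socket socket = new Socket(); // unconnected; no network activity yet
        try {
            socket.setSoTimeout(timeoutMillis); // timeout for later reads
            // The second argument bounds how long the connection attempt may take.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket if connecting fails
            throw e;
        }
    }
}
```

A caller would use it as try (Socket s = ExplicitConnect.open("www.example.com", 80, 5000)) { ... }, where the host and port are placeholders.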

ONJava: You've written so much about XML in the last few years; so many of the interesting XML standards seem to have Java implementations that themselves are almost de facto standards (Xalan, Xerces, etc.). Do you think there's some sort of ideal pairing between XML and Java?

ERH: The pairing between XML and Java is certainly less than ideal. Java was simply the preferred language of most of the early XML implementers. At the time when XML was invented, circa 1996-1997, Java had better Unicode support than any other common language, which certainly helped it along. Perl lagged for several years in the XML space because it wasn't yet Unicode-savvy, and Larry Wall first had to Unicode-enable Perl before he (or anyone else) could think seriously about supporting XML. However, nowadays most languages including Perl and Python have good Unicode support, so Java's advantage is much less pronounced. Indeed, one of the major impedance mismatches between Java and XML is that a Java char is not in fact a Unicode character; rather, it is a UTF-16 code unit. This means that Java implementations have to do quite a bit more work to verify well-formedness than they would if Java actually used Unicode characters. Many APIs and libraries simply punt on this issue. My own XOM is a notable exception; it goes the extra mile to make sure surrogate characters in Java strings are properly matched up.
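To make the char issue concrete, here is a minimal sketch of a surrogate-pairing check. It is not XOM's code; the class and method names are hypothetical, and a real well-formedness check would also have to reject characters that XML forbids outright.

```java
final class SurrogateCheck {
    // Returns true if every high surrogate is immediately followed by a
    // low surrogate and no low surrogate appears on its own.
    static boolean hasWellFormedSurrogates(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= text.length()
                        || !Character.isLowSurrogate(text.charAt(i + 1))) {
                    return false; // high surrogate with no partner
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF: one character, two Java chars.
        String clef = "\uD834\uDD1E";
        System.out.println(clef.length());                          // prints 2
        System.out.println(clef.codePointCount(0, clef.length()));  // prints 1
        System.out.println(hasWellFormedSurrogates(clef));          // prints true
        System.out.println(hasWellFormedSurrogates("\uD834"));      // prints false
    }
}
```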

However, that's a corner case that doesn't affect a lot of people in practice. A much more common issue is simply the difference between XML's tree-based design and Java's object-oriented design. Many naive efforts to model XML documents in Java fail to account for mixed content, element order, repeated elements, recursive elements, and the like. An XML document is not an object and a schema is not a class, but many programmers insist on treating them that way. This myopia is hardly unique to Java, of course. C++ and C# programmers wear the same object-colored glasses. Lisp programmers wear different glasses; they insist on seeing XML as simply S-expressions with angle brackets instead of parentheses, a model that's only marginally closer to reality. The only proper way to approach XML is by understanding and accepting its unique structure without trying to force it to fit into some other paradigm, be that paradigm objects, S-expressions, records and fields, or something else.

ONJava: What kinds of things could and should be done with Java, but aren't?

ERH: That's a short question but a very long answer. I wrote an article about this for O'Reilly back in 2002, and it's still pretty relevant. So far, Sun has taken basically none of my suggestions. Still, I think I can sum up the answer to the question in one sentence:

Java should be open sourced under the GPL license.

There are far more things that need to be done with Java than any one company can do, even with the assistance of the other members of the Java Community Process. Freeing Java would enable many talented developers to do things with Java that can't even be imagined now. Some of the things these developers would immediately create would be absolute crap and antithetical to good programming practice (multiple inheritance and operator overloading, to name just two), but in a true free marketplace of ideas, the better enhancements will win over the worse ones.

Sun keeps crying that if they do this it will break compatibility, but I've got news for Sun: compatibility is already broken. Eclipse 3.0 can't run on Mac OS X 10.2. XOM and Xalan can't run on Mac OS 9. javac can't compile AspectJ code. Sun itself publishes three different incompatible variants of Java: J2ME, J2SE, and J2EE. And you know what? The world hasn't ended. One language, one class library, one VM cannot possibly satisfy everybody's needs. Different communities need to be free to rewrite the language to meet their own needs. And researchers need to be free to explore new options that may show the path forward.

I am convinced Sun will eventually open source Java, probably later rather than sooner, and probably only after one or two other groups have released clean-room, open source implementations of the entire J2SE. However, if Sun open sourced the JDK now, they'd be vastly strengthening the Java community, and relegating C# and .NET to an increasingly insignificant backwater of computing.

Chris Adamson is an author, editor, and developer specializing in iPhone and Mac.
