There's a new transport-layer protocol in town
by Andy Oram
SCTP can deliver multiple streams of short messages between sockets. One of its "big wins" is that it permits multiple IP addresses to be associated with each connection, so that if one address fails, the streams automatically switch to the next one.
An admirably lucid presentation on Friday explained how the protocol works and how to program it using a pending Linux implementation. Testing has shown that the protocol involves less overhead than comparable applications using TCP (at least at the user level and in CPU time; kernel performance is expected to be better too). The gain is attributed to SCTP's careful alignment of data on 4-byte word boundaries.
Only a single new call has to be added to the standard socket API: bindx, which works like bind but adds an address to, or removes one from, an existing connection. Another bit of a kludge is an extra data structure that contains the number of the stream used by each message; this structure must be queried through getsockopt. Recommended web sites are http://www.sctp.org/ and http://www.sctp.de/; the Linux implementation is the lksctp project on SourceForge.
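To make the failover idea concrete, here is a small C model of how a sender might pick a destination among a peer's addresses. It is a sketch, not lksctp code: the `peer_path` structure and `select_path` function are hypothetical, and moving to the next reachable address in order is a simplifying assumption. In real SCTP the protocol itself tracks reachability (through heartbeats and retransmission counts) and chooses the alternate path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one entry per IP address of the peer. */
struct peer_path {
    const char *addr;  /* peer address, kept as text for illustration */
    bool reachable;    /* in real SCTP, maintained by the protocol itself */
};

/* Return the index of the path to send on: the primary path if it is still
 * reachable, otherwise the next reachable path after it (wrapping around).
 * Returns -1 if no path is reachable. */
int select_path(const struct peer_path *paths, size_t npaths, size_t primary)
{
    assert(paths != NULL && primary < npaths);
    for (size_t i = 0; i < npaths; i++) {
        size_t idx = (primary + i) % npaths;
        if (paths[idx].reachable)
            return (int)idx;
    }
    return -1;
}
```

With the first address marked unreachable, `select_path(paths, 3, 0)` returns 1, so traffic moves to the second address without the application doing anything.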
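The 4-byte alignment shows up directly in SCTP's packet format: every chunk is padded with zero bytes to a multiple of 4, and the padding is not counted in the chunk's Length field. The C sketch below works out those sizes for a DATA chunk, assuming the 16-byte DATA chunk header defined in RFC 2960; the helper names are hypothetical. It illustrates the layout only and says nothing about the performance figures reported in the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed part of an SCTP DATA chunk (RFC 2960, section 3.3.1):
 * type(1) + flags(1) + length(2) + TSN(4) + stream identifier(2)
 * + stream sequence number(2) + payload protocol identifier(4). */
#define SCTP_DATA_HDR_LEN 16u

/* Round a length up to the next multiple of 4 bytes. */
size_t pad4(size_t len)
{
    return (len + 3u) & ~(size_t)3u;
}

/* Value carried in the chunk's Length field: header plus payload,
 * excluding any padding. */
uint16_t data_chunk_length(size_t payload_len)
{
    assert(payload_len <= UINT16_MAX - SCTP_DATA_HDR_LEN);
    return (uint16_t)(SCTP_DATA_HDR_LEN + payload_len);
}

/* Bytes the chunk occupies in the packet, including padding, so that
 * the next chunk starts on a 4-byte boundary. */
size_t data_chunk_wire_size(size_t payload_len)
{
    return pad4(data_chunk_length(payload_len));
}
```

For a 5-byte payload the Length field reads 21, while the chunk takes 24 bytes on the wire, so whatever follows it still begins on a word boundary.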
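The stream number matters because SCTP keeps message ordering per stream rather than for the whole connection. The sketch below models that rule with a hypothetical `msg_info` structure standing in for the per-message data structure mentioned in the talk (in later lksctp releases the corresponding structure is `struct sctp_sndrcvinfo`, with fields such as `sinfo_stream` and `sinfo_ssn`). `NUM_STREAMS`, `stream_state`, and `deliver` are illustrative names; in practice this bookkeeping happens inside the protocol stack, not in the application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_STREAMS 4  /* hypothetical number of streams in this association */

/* Hypothetical stand-in for the per-message structure described in the talk. */
struct msg_info {
    uint16_t stream;  /* stream the message belongs to */
    uint16_t ssn;     /* its sequence number within that stream */
};

/* Next sequence number expected on each stream. */
struct stream_state {
    uint16_t next_ssn[NUM_STREAMS];
};

/* Returns 1 if the message is next in order on its stream (and advances that
 * stream), 0 if it must wait for an earlier message on the same stream,
 * or -1 if the stream number is out of range. Other streams are unaffected. */
int deliver(struct stream_state *st, const struct msg_info *info)
{
    assert(st != NULL && info != NULL);
    if (info->stream >= NUM_STREAMS)
        return -1;
    if (info->ssn != st->next_ssn[info->stream])
        return 0;
    st->next_ssn[info->stream]++;
    return 1;
}
```

A message waiting for a missing predecessor on stream 1 does not hold up stream 0, which is the practical advantage over a single TCP byte stream.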
Netfilter (iptables) BOF
Speaking of firewalls, there are a lot of new features on the way from the team developing Netfilter. Some of these strike me as reasonable extensions that bring the handling of various network parameters up to the level of those already supported. Others scream "Bloat! Bloat!" Harald Welte, who presented the changes at Friday's BOF, distanced himself from many of them while defending others. The audience vociferously urged the team to devote itself to creating a robust test framework and recruiting testers. This sounds like good advice to me, but coding new features is fun while testing is merely indispensable.
The features that are certain to be released soon include:
Extending the "expectation" feature (which looks inside packets to determine the state of the connection with respect to the application running over it) to support multiple expectations. This is useful for some complex applications, such as IRC.
Stateful failover: if the firewall machine fails, the rules and connection state will already be in place and up to date on the box that takes over.
Allowing the tracking of connections to be restricted to particular interfaces, so you don't suffer the overhead of tracking an interface where you haven't installed any rules.
Greater efficiency when a rule is changed: the kernel won't have to reload the entire table containing all the rules.
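As a rough model of what supporting multiple expectations means for the connection tracker's bookkeeping, consider a master connection (such as an IRC session) that has several pending expected connections, where a new connection matching any of them is treated as related. The structures, the `MAX_EXPECT` limit, and the functions below are hypothetical; the real Netfilter helpers also parse the application payload to learn the addresses and ports, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXPECT 4  /* hypothetical cap on expectations per master connection */

/* A related connection the tracker has been told to expect. */
struct expectation {
    uint32_t src_ip, dst_ip;  /* IPv4 addresses, host byte order */
    uint16_t dst_port;
};

/* A tracked master connection that may announce several related ones. */
struct master_conn {
    struct expectation exp[MAX_EXPECT];
    size_t nexp;
};

/* Register another expectation; returns false if the table is full. */
bool expect_add(struct master_conn *m, struct expectation e)
{
    assert(m != NULL);
    if (m->nexp == MAX_EXPECT)
        return false;
    m->exp[m->nexp++] = e;
    return true;
}

/* Check a new connection against every pending expectation. On a match the
 * expectation is consumed and true is returned: the connection is related. */
bool expect_match(struct master_conn *m, uint32_t src, uint32_t dst, uint16_t dport)
{
    for (size_t i = 0; i < m->nexp; i++) {
        const struct expectation *e = &m->exp[i];
        if (e->src_ip == src && e->dst_ip == dst && e->dst_port == dport) {
            m->exp[i] = m->exp[--m->nexp];  /* swap-remove the matched entry */
            return true;
        }
    }
    return false;
}
```

With only a single expectation slot per master, the second side connection in this example would have no entry to match.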
Of the other changes, the only one that sounds both imminent and of widespread interest is a logging facility called ulog. Currently, logging involves expensive formatting and writing of each message in the kernel. Ulog, by contrast, will perform initial filtering in the kernel and send the data to a user process, which can then do whatever time-consuming or complex processing is desired.
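The division of labor behind ulog can be sketched as a producer and a consumer: the in-kernel side only filters and copies a compact binary record, while the user-space side does the costly text formatting. In this model an in-memory ring buffer stands in for the real transport (the ULOG target passes packets to user space over a netlink socket), and the record layout, the "log only drops" filter, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_SLOTS 8u  /* hypothetical queue depth (a power of two) */

struct log_rec {
    uint32_t src_ip, dst_ip;  /* IPv4 addresses, host byte order */
    uint16_t dst_port;
    uint8_t verdict;          /* hypothetical encoding: 0 = accept, 1 = drop */
};

struct ring {
    struct log_rec slot[RING_SLOTS];
    unsigned head, tail;      /* head: next slot to write; tail: next to read */
};

/* Cheap "kernel" step: filter, then copy raw fields with no formatting.
 * Returns 0 if queued, -1 if filtered out or the ring is full. */
int klog_enqueue(struct ring *r, const struct log_rec *rec)
{
    assert(r != NULL && rec != NULL);
    if (rec->verdict != 1)                /* initial filtering: log drops only */
        return -1;
    if (r->head - r->tail == RING_SLOTS)  /* full: discard rather than block */
        return -1;
    r->slot[r->head % RING_SLOTS] = *rec;
    r->head++;
    return 0;
}

/* Render an IPv4 address (host byte order) as dotted-quad text. */
static void ip_to_text(uint32_t ip, char *out, size_t len)
{
    snprintf(out, len, "%u.%u.%u.%u",
             (unsigned)(ip >> 24) & 0xffu, (unsigned)(ip >> 16) & 0xffu,
             (unsigned)(ip >> 8) & 0xffu, (unsigned)ip & 0xffu);
}

/* Expensive user-space step: dequeue one record and format it as text.
 * Returns 0 on success, -1 if nothing is queued. */
int ulog_format_next(struct ring *r, char *buf, size_t len)
{
    if (r->head == r->tail)
        return -1;
    const struct log_rec *rec = &r->slot[r->tail % RING_SLOTS];
    char src[16], dst[16];
    ip_to_text(rec->src_ip, src, sizeof src);
    ip_to_text(rec->dst_ip, dst, sizeof dst);
    snprintf(buf, len, "DROP %s -> %s:%u", src, dst, (unsigned)rec->dst_port);
    r->tail++;
    return 0;
}
```

Only `klog_enqueue` runs on the packet path; all `snprintf` work is deferred to the consumer, which is the point of moving logging out of the kernel.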
Elsewhere in the conference
I had a chance to hear a talk by Miguel de Icaza, who is always good for an outrageous statement or three. If I heard him right, he said that intellectual property issues "killed Sun and Java." He reiterated his comments from the Open Source conference that Microsoft is encumbering .NET with patents, but that the Mono project will work around them and still manage to produce a complete implementation.
Richard Gooch presented his Linux Device File System (devfs), which solves the crisis in device numbers. (Traditional UNIX is limited to 256 major device numbers, and Linux is fast approaching the limit; Linus Torvalds has called for no new major device numbers.) Gooch was sensitive to the considerable flak he has received and spent a good deal of time arguing that the system is not bloated and provides adequate security. Whatever problems devfs may have, the idea looks quite elegant to me, and it's clear that something like it is sorely needed. When devfs is configured into the kernel, devices occupy a flexible, hierarchical namespace like the /proc directory, and drivers are responsible for creating their own device entries when loaded, which saves the system administrator considerable trouble.
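The device-number crunch comes down to simple arithmetic. The C sketch below assumes the classic 16-bit device number used by older UNIX systems and Linux kernels of this period, with 8 bits for the major number and 8 for the minor; the helper names are hypothetical and are not the kernel's own MAJOR/MINOR macros. With only 8 bits for the major number, at most 2^8 = 256 major numbers can exist.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed classic layout: high 8 bits = major, low 8 bits = minor. */
typedef uint16_t old_dev_t;

/* Pack a major/minor pair; each field must fit in 8 bits. */
old_dev_t old_makedev(unsigned major_num, unsigned minor_num)
{
    assert(major_num <= 0xffu && minor_num <= 0xffu);
    return (old_dev_t)((major_num << 8) | minor_num);
}

unsigned old_major(old_dev_t dev) { return (dev >> 8) & 0xffu; }
unsigned old_minor(old_dev_t dev) { return dev & 0xffu; }

/* How many distinct major numbers an 8-bit field can hold. */
unsigned old_major_count(void) { return 1u << 8; }
```

Under devfs, a driver registers named entries in the device hierarchy when it loads, so the scarce number space no longer has to be carved up by hand for every new device type.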