Foo Camp Talk - Rate Adaptive MP3 Streaming

by Brian McConnell

Streaming MP3 has been around for years, and despite the emergence of newer formats such as AAC, it remains the dominant audio format on the web. Streaming MP3 works well enough over a LAN or broadband connection, but what about mobile networks, where bandwidth is generally more limited? This article describes a simple trick that enables the creation of rate adaptive clients, among other things, without breaking backward compatibility. This trick (I am reluctant to call it a hack because the idea is so simple) also enables audio hyperlinks within an MP3 stream.



I am calling this RAXAR, for rate adaptive experimental audio relay. I don't like the name all that much, so if someone else has a better idea, go for it.



MP3 + ID3v2 = RAXAR



Embedding metadata within MP3 streams is easy using the ID3v2 system. ID3 tags are routinely used to embed title, author and other information within a stream. So why not extend this to define a family of tags that enable MP3 streams to point to downsampled streams, relays, and even to affiliate streams or audio hyperlinks?



Adaptive Rate MP3 Clients



By embedding a few ID3 tags in a stream, it can refer listening clients to alternate streams that run at higher or lower bitrates. This will enable the creation of smarter MP3 clients that automatically upgrade their connection when bandwidth is available, and automatically step down to a lower bitrate when network performance is lacking. The client could either do this automatically, or in response to the user clicking a user interface element. Imagine a cellphone MP3 player that allows you to increase/decrease sound quality via a rocker switch (similar to volume control).



To do this, we define a convention for a set of link frames in the ID3 namespace. The tag Wxxx, where xxx is the bitrate in kbps, contains a URL that points to an alternate bitrate stream. Simple. Clients that know that W064 means "a 64kbps stream lives --> here" will use this information accordingly. Older MP3 clients will just ignore the tag.
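
As a rough illustration of the client side, here is a minimal Python sketch that builds a bitrate map from tags that have already been pulled out of the stream. It assumes some ID3 parser has produced (frame ID, value) pairs; the function name build_bitrate_map and the example.com URLs are made up for this example and are not part of any spec.

```python
import re

# Assumed convention from this article: "W" followed by exactly three digits.
BITRATE_FRAME = re.compile(r"W([0-9]{3})")

def build_bitrate_map(frames):
    """Return {bitrate_kbps: url} from an iterable of (frame_id, value) pairs.

    Frames that do not match the Wxxx convention are skipped, which mirrors
    how an older client would simply ignore tags it does not understand.
    """
    bitrates = {}
    for frame_id, value in frames:
        match = BITRATE_FRAME.fullmatch(frame_id)
        if match:
            bitrates[int(match.group(1))] = value
    return bitrates

# Hypothetical frames as they might arrive from a stream.
frames = [
    ("TIT2", "Morning Show"),
    ("W032", "http://example.com/low.mp3"),
    ("W064", "http://example.com/mid.mp3"),
    ("W128", "http://example.com/high.mp3"),
    ("WR00", "http://relay.example.com/mid.mp3"),
]
bitrate_map = build_bitrate_map(frames)
```

Because the pattern requires three digits, relay tags such as WR00 and ordinary text frames fall through untouched.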

Relay and Mirror Sites



One of the neat things we can do with RAXAR is to embed pointers to relays and mirror servers. Smart MP3 clients would parse these tags and build up an index of where peer stream servers reside, and could autonomously select stream servers closest to their network address space. Think of this as a sort of poor man's multicasting.



How would this work? A RAXAR client would connect to the root stream, which would embed tags that point to relays. The client quickly learns where relays are located, and can automatically switch to them, either on its own, or in response to a force redirect tag.



Of course, existing load balancing techniques work well for spreading the initial connect requests around, but one of the neat things we can do by embedding tags in a live stream is to enable redirection after a connection is already in progress.



For example, let's say that midway through a session, a new relay becomes available. The root stream starts broadcasting this information every few frames. The client decides that the relay server is closer than the root server, and connects to it.



Using this technique it will be possible to build multi-hop MP3 relay networks, with each relay appending additional pointers to upstream and downstream relays.



To enable this feature, we create a set of ID3 tags as follows:



WRxx --> URL of relay or mirror stream #xx



The stream server will rank relays using the numeric identifier (00 = best, 99 = dead last). This list will be dynamic, so smart clients will weight relay recommendations by numeric rank and time since the recommendation was made.
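
One way a client might combine rank and age is sketched below in Python. The scoring formula, the five minute expiry, and the names RelayHint, score and best_relay are all assumptions made for illustration; the article does not prescribe a particular weighting.

```python
import time
from collections import namedtuple

# rank: 0 = best, 99 = worst (taken from the WRxx frame ID)
# seen_at: timestamp in seconds when the tag was last received
RelayHint = namedtuple("RelayHint", "rank url seen_at")

def score(hint, now, max_age=300.0):
    """Lower is better. Returns None once a hint is older than max_age."""
    age = now - hint.seen_at
    if age > max_age:
        return None
    # Assumed weighting: a hint that is max_age old counts as 50 ranks worse.
    return hint.rank + 50.0 * (age / max_age)

def best_relay(hints, now=None):
    """Return the hint with the lowest score, or None if all have expired."""
    now = time.time() if now is None else now
    live = []
    for hint in hints:
        s = score(hint, now)
        if s is not None:
            live.append((s, hint))
    if not live:
        return None
    return min(live, key=lambda pair: pair[0])[1]
```

With this weighting, a top-ranked relay that has not been mentioned for several minutes loses out to a lower-ranked relay that was recommended a few seconds ago, which is the behavior described above.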



Again, this is a simple trick, but it enables us to do some interesting things, like enabling automatic discovery of peer stream servers just by connecting to one stream. There is an obvious security issue here, as a stream server could insert bogus URLs for relay servers, and therefore cause all sorts of mayhem. We assume the source stream is trusted, which is probably not a good assumption in general, but in a trusted system you can do some interesting things with this technique.



Stream Groups and Audio Hyperlinks



Another nifty thing we can do is to create stream groups, where one stream publishes links to other streams within an affinity group. Say you want to create your own package of Internet radio stations: you just embed pointers to the other streams within the group. A smart client that connects to one stream will automatically learn the names and current URLs of the affiliated streams.



Why do this instead of looking up the SHOUTcast directory? Well, maybe you're streaming on a mobile device with a tiny screen. This trick enables MP3 clients, especially those designed for mobile devices, to automatically discover other streams, and to enable channel surfing via a simple channel up/down interface. It also keeps the URLs for affiliate streams current, as they can be updated mid-stream.



To support stream grouping, we define a few more ID3 tags:



WAFN --> name of stream or program

WAFD --> description of affiliate stream

WAFC --> channel number of affiliate stream

WAFU --> url of stream


NOTE: these tags are sent in a group, one group per affiliate, so the source stream can define as many affiliate streams/channels as it wants. RAXAR-aware clients will capture these tags as they are sent to build up a channel map (a well-designed client will cache maps from previous sessions).
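
To make the grouping concrete, here is a small Python sketch of how a client might fold these tags into a channel map. It assumes the tags arrive as (frame ID, value) pairs and that a group is complete once all four fields have been seen; the article does not fix an ordering, so that rule and the name build_channel_map are assumptions.

```python
AFFILIATE_FIELDS = {
    "WAFN": "name",
    "WAFD": "description",
    "WAFC": "channel",
    "WAFU": "url",
}

def build_channel_map(frames):
    """Return {channel_number: {"name", "description", "channel", "url"}}."""
    channels = {}
    current = {}
    for frame_id, value in frames:
        field = AFFILIATE_FIELDS.get(frame_id)
        if field is None:
            continue  # not an affiliate tag
        if field in current:
            # A repeated field means a new group began before the previous
            # one was complete, so the partial group is discarded.
            current = {}
        current[field] = value
        if len(current) == len(AFFILIATE_FIELDS):
            try:
                channels[int(current["channel"])] = current
            except ValueError:
                pass  # channel number was not numeric; skip this group
            current = {}
    return channels

# Hypothetical tags for two affiliate streams.
frames = [
    ("WAFN", "Jazz Now"), ("WAFD", "Live jazz"), ("WAFC", "1"),
    ("WAFU", "http://example.com/jazz.mp3"),
    ("WAFN", "Talk 24"), ("WAFD", "News and talk"), ("WAFC", "2"),
    ("WAFU", "http://example.com/talk.mp3"),
]
channel_map = build_channel_map(frames)
```

A later group with the same channel number overwrites the earlier entry, which is one simple way to pick up URLs that are updated mid-stream.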



Next Steps



The great thing about RAXAR is that it is backward compatible with existing ID3 aware MP3 players. Older players that do not recognize the tags will simply ignore them. New clients will be able to use this information to automatically discover alternate streams, and to build a channel guide from information embedded within the stream itself (a nifty capability that could be used in all sorts of creative ways).



For example, a radio station could use the stream groups feature to embed links to recent podcasts within their main live stream, or could play short tones as cues that a phrase is a hyperlink to another location. This ability to make MP3 streams hyperlinkable could lead to some neat applications.



What's left to do? To send a RAXAR stream, all you need to do is add the newly defined tags to your existing MP3 or other ID3 friendly stream. Note that as of this writing, this is a very informal spec, and the names for the new tags are arbitrary. If someone else wants to add to this or has a better idea for tag nomenclature, go for it.



It should be equally straightforward to update MP3 clients to listen for RAXAR tags. Implementing a rate adaptive MP3 client is simple enough: just listen for Wxxx tags, and build a map of which bitrates map to which URLs. Automatic upgrade/downgrade behavior will be somewhat of a black art due to the unpredictable nature of Internet connections. A good rule of thumb for mobile clients will be to start with a mid-range bitrate, say 64kbps, then upgrade if the connection is faster and downgrade if it's slower (e.g. GPRS). It will also be good to provide the listener with a way to manually upgrade or downgrade the connection.
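
The rule of thumb above could be sketched in Python as follows. The one-step-at-a-time policy, the 1.5x headroom factor, and the function names are assumptions chosen to keep the example small; a real client would also need a way to measure throughput, which is not shown here.

```python
def starting_bitrate(available, preferred=64):
    """Pick the advertised bitrate closest to the preferred starting point."""
    return min(available, key=lambda rate: abs(rate - preferred))

def next_bitrate(available, current, measured_kbps, headroom=1.5):
    """Move at most one step up or down from the current bitrate.

    available: advertised bitrates in kbps (the keys of the Wxxx map)
    current: the bitrate now playing; assumed to be one of `available`
    measured_kbps: recent throughput estimate supplied by the caller
    """
    rates = sorted(available)
    i = rates.index(current)
    if measured_kbps < current:
        # The connection cannot keep up: step down, or stay at the lowest.
        return rates[max(i - 1, 0)]
    if i + 1 < len(rates) and measured_kbps >= rates[i + 1] * headroom:
        # Only step up when there is comfortable headroom (assumed 1.5x).
        return rates[i + 1]
    return current
```

Stepping one level at a time and requiring headroom before upgrading is meant to damp flapping between bitrates on a jittery connection. The manual rocker switch could call the same code with the stepping decision supplied by the listener instead of a measurement.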



Other features, such as channel groups and relay networks, will require a little more work, but not much. A simple UI for channel groups is to provide a basic channel up/down click interface so the listener can cycle through a group of streams without looking at a graphical user interface.
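
The channel up/down behavior is little more than wrapping around a sorted list of channel numbers. A minimal Python sketch, assuming the client already holds the channel numbers it has learned from the stream (the name step_channel is made up for this example):

```python
def step_channel(channel_numbers, current, direction):
    """Return the next channel number, wrapping at either end.

    direction is +1 for channel up and -1 for channel down.
    current is assumed to be one of the known channel numbers.
    """
    ordered = sorted(channel_numbers)
    i = ordered.index(current)
    return ordered[(i + direction) % len(ordered)]
```

The client would then look up the URL for the returned channel number in its channel map and reconnect.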



Audio hyperlinking is an especially interesting area. One idea that came out of the camp was to announce audio hyperlinks with a short tone. While this would be obnoxious in a music stream, it would work nicely for spoken word programming, possibly with different tones to signify different types of hyperlinks (e.g. one to a completely different document, one to a short recording that defines a term or concept and then reverts back to the original stream). Of course, it would be better to use a markup language to define more complex hyperlinked audio documents, but the goal here is to hack MP3 streams so this can be baked right into the stream itself (this should work for any ID3 friendly stream, not just MP3).



All in all, this should be straightforward to implement, because 95% of what we need is already there. Lastly, I should point out that I am not putting this out there as a "big idea". It's a pretty minor tweak to a widely used system, and if we can get some consensus on tag nomenclature, we'll be able to build some interesting services. When I started working on this, I was mainly interested in embedding information about downsampled streams within a parent stream. While I was exploring that notion, I realized that this could be extended to other areas, such as audio hyperlinking. I like the idea of being able to explore an audio stream, so who knows, this could lead to some interesting things. I thought I'd put this out there and see who bites.