Published on O'Reilly (http://oreilly.com/)

[Editor's Note: Once again, we've invited Danger, Inc., sound designer Peter Drescher to predict the future of mobile audio. We think you'll agree that his latest prophecies are both tantalizing and frighteningly plausible. This essay is based on his October 2007 Audio Engineering Society presentation "Game Audio for Broadband Phones."]

I recently got an iPhone, in part because it reminds me so much of the Star Trek data PADD, a fictional technology consisting of a thin slab of glass and plastic, which is held in one hand and tapped on with the other (see Figure 1).

Fig. 1: Pod vs. PADD

Fig. 1: The iPhone is practically a prototype of the Star Trek PADD.

The PADD (Personal Access Display Device) rendered any kind of information, in a variety of formats, via a subspace connection to the central computer. The interface was gestural, multi-touch, and self-configuring. Of course, there was no keyboard and, in fact, no typing was ever required. There's no need to enter text by hand when there are perfectly accurate voice recognition and transcription services built into the device.

While the iPhone may not be up to 24th-century standards, the technology is obviously heading in that direction, and I love it when science fiction invents reality. However, the laws of physics currently prevent even the most futuristic phones from generating loud, high-fidelity audio. Despite sophisticated techniques and materials, no cell phone speakers will ever be good enough for anything except producing annoying ringtones.

This is because sound production is all about pushing air, and the tiny bits of vibrating metal and plastic that pass for loudspeakers in mobile devices are incapable of pushing very much of it. Nor can they produce frequencies below about 200Hz, meaning no bass or kick drum in your music mix, nor engine rumble or explosive thud in your game soundtrack. Think of it this way: a speaker the size of a dime simply cannot produce a six-foot long wave.
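To put a number on that "six-foot" wave, here is a minimal sketch of the arithmetic: wavelength is just the speed of sound divided by frequency. The speed of sound used here (343 m/s, roughly room-temperature air) and the function names are assumptions for illustration only.

```python
# Assumed value: speed of sound in dry air at about 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0
FEET_PER_METER = 3.28084


def wavelength_m(frequency_hz):
    """Return the wavelength in meters of a tone at frequency_hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_SOUND_M_S / frequency_hz


def wavelength_ft(frequency_hz):
    """Same wavelength, converted to feet."""
    return wavelength_m(frequency_hz) * FEET_PER_METER


if __name__ == "__main__":
    # 200 Hz is the rough lower limit quoted above for phone speakers.
    for freq in (50, 200, 1000):
        print(f"{freq} Hz -> {wavelength_ft(freq):.1f} ft")
```

At 200Hz the wave comes out a bit under six feet long, and it only gets longer as you go down toward kick-drum territory.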

Headphones vs. Speakers

Unless, of course, you stick it right in your ear! Then the amount of air-pushing needed decreases dramatically. Suddenly, you can fit an entire symphony orchestra in your head, with a flat response, full frequency range, and volume AS LOUD AS YOU FREAKIN' WANT IT! Music playback on cell phone earbuds is already about as good as it gets, because streaming full-resolution audio out of flash memory doesn't take much power or CPU.

Headphones are great when you want to block out the world and listen to music. But the problem with headphones is that they are exclusive, insulating, and antisocial. That's fine for many situations, but people also like to share their music with their friends. Sure, you can send them a link or put the tune on your MySpace page, but it's much more fun to play it for them in person and see their reaction. One of the reasons portable multimedia devices are so popular is that they provide a neat solution to a social problem that everybody has, namely: "how do I show off how cool I am?"

To paraphrase H.L. Mencken, nobody ever went broke overestimating the vanity of the American public, and broadband phones are the perfect attention-getters. Here, look at pictures of my dog, isn't he cute? Hey, listen to these grooves I downloaded from iTunes! C'mere, check out this Weird Al Yankovic video on YouTube! [Note the O’Reilly shoutout at 1:12. —Ed.]

But there is an obvious problem with sharing media like this, which is: You Can't Hear for Squat on cell phone speakers. To really grab your attention, the audio has to be loud enough to annoy the living crap out of the guy sitting next to you. Lucky for him, trying to produce really annoying volume levels using tiny speakers is usually an exercise in futility. Some might consider this a good thing for polite society, yet the desire to share our noise remains.

Sharing Earbuds

Have you seen advertisements featuring happy couples like the one in Figure 2, sharing earbuds?

Fig. 2: Headphone sharing

Fig. 2: Back to mono(nucleosis).

That's lovely and romantic if you want to get your head next to some hot girl, but it's not something you want to do with everyone. There's a television commercial featuring a big sweaty meathead jock, at the gym, pumping iron, endorphin grin on his face, a wild look in his eye, yelling at the camera, "Hey dude! Check out this groove I just downloaded, it totally rawks!" Then he pulls the sweaty earbuds out of his head and moves to stick them into yours — and I'm thinking, "Dude, I use my ears for a living. Get those disgusting things away from me!"

Sharing earbuds does not solve the problem of how to make a cell phone act more like a boombox. Some manufacturers try to make the phones louder by increasing the speaker size. Others install two speakers for stereo playback, though given the palm-width speaker separation, you don't get true stereo, you just get double loud. Some models apply "3D" audio processing to provide a (slightly) increased sense of presence and space.

But it's all ultimately pointless, because if you really want to listen to your music, movies, or game soundtracks, you're gonna have to use...

Headphones for Cell Phones

There are about 18 different kinds of these things. There are the "instantly twisted, horribly uncomfortable, proprietary plug" headsets that can be used only with specific models. I always figured this was the manufacturer's way of telling its customers: "Do not listen to this device using headphones." (See Figure 3.)

Fig. 3: proprietary plug

Fig. 3: Proprietary plug headsets are more trouble than they're worth.

Then there are the 3.5mm (1/8-inch) "miniplug mono headsets," like those dorky Plantronics things telemarketers wear. I remember the first time I saw one of these: I looked at the plug, saw three conductors, and thought, "Uh, stereo?" No, that's mono plus mic, but it looks just like a stereo miniplug.

Fig. 4: Three-conductor plug

Fig. 4: The mono headset connector looks exactly like a stereo mini-plug, but don't try using it for music playback.

Ironically, some music phones are equipped with 2.5mm minijacks, so you have to use an adapter if you want to listen to them using regular headphones, like those in Figure 5.

Fig. 5: Standard headphone plug

Fig. 5: The standard Walkman-style, 3.5mm stereo miniphone plug is found on everything from iPod headphones to high-end studio cans, but implemented inconsistently on cell phones.

Further, the plug on the 2.5mm-to-3.5mm RadioShack adapter in Figure 6 looks exactly like a normal headphone plug — three conductors — but it doesn't work, because you need four conductors, for left, right, ground, and mic. The gray adapter in Figure 6, sold by T-Mobile, will route the audio correctly for a T-Mobile Sidekick, but it may not work on other cell phones, because there's no standard for this stuff.

Fig. 6: Headphone adapters

Fig. 6: Adapters are ultimately pointless for mobile devices.

The iPhone, for example, uses a four-conductor jack that is compatible with standard, three-conductor headphone plugs (you use the mic on the iPhone itself), but the jack housing is so recessed into the case that standard plugs don't fit in without surgery.

There are adapters to solve that, too, but it doesn't really matter, because adapters are so confusing and aggravating for customers, nobody uses them anyway. Show me a phone without a standard headphone jack on it, and I'll show you a phone nobody's using for listening to music.


But there's a basic problem with all of these units: the damn wires! They get tangled, they get unplugged, they pull on your ears, they limit your movement, and they're just a huge pain in the neck, not to mention completely anachronistic. (See Figure 7.) I mean, really, what's the point of having a futuristic wireless device if you have to plug it in to hear the frackin' thing?

Fig. 7: Cord tangle

Fig. 7: Another good argument for going wireless.

That's why Bluetooth was invented. It's a short-range radio network for mobile devices that works exactly like invisible wires. You pair one device to another, then transfer data back and forth as if the two devices were connected by cable...except there's no cable (and no tangled wires).

Bluetooth devices correspond almost exactly to their wired counterparts. There's the standard mono Bluetooth headset: you see them in people's ears everywhere (see Figure 8). Despite Apple's attempt to make them more stylish and comfortable, they still have a dorky reputation and remain the wireless equivalent of the telemarketer's headset. Still, they're fine if all you ever do is talk on the phone.

Fig. 8: Mono Bluetooth Headsets

Fig. 8: Monophonic Bluetooth headsets.

But for music, you gotta have stereo, and for this you can get Bluetooth headphones. These are more for dancing around your living room than walking around outside, and are for some reason designed to be as uncomfortable and oddly shaped as possible.

Maybe manufacturers don't want you to wear them too long, because the battery life sucks. Plus there's no mic, so they don't even accept phone audio. If you're listening to music on your music phone and the phone rings, you need to take the headphones off to answer the phone. This seems fairly pointless.

Fig. 9: Jabra 8010

Fig. 9: The Jabra 8010 is a stereo Bluetooth headset with mic.

Obviously, the answer is a Bluetooth stereo headset with mic (see Figure 9). This way you can listen to your music, play your kick-ass game, and still answer the phone when it rings. This configuration is becoming more popular, but it's still a fairly new technology and the physical designs are evolving rapidly.

Personally, I like the two-piece concept, but it makes me wonder how long it will be before miniaturization simply turns them into earbuds. In fact, since Bluetooth is all about eliminating wires, how about a headset that consists of a wireless left earbud, plus a wireless right earbud, plus a wireless "mic and control" unit (possibly worn as a pin on your left shoulder)? Now we really are talking about Star Trek technology! :)

The Voices in My Head

Increased use of stereo headsets will change the way we relate to telephone communications. Consider this: how many of you talk on your cell phones wearing both earbuds? The first time I did it, I found it vaguely disturbing, as if the person were talking to me from the inside of my head. Excuse me, I've already got a coupla voices in there, telling me what to do, and you're just confusing me....

Until recently, talking on the phone was, without exception, a monaural experience. Even now, I almost always pull one earbud out when I'm on a call. But the case of "listening to music, then the phone rings" is so common you quickly get used to the schizophrenic feeling of the voice in your head. In fact, it can even make you feel more connected to your caller, and facilitate communications in high-noise environments, like, say, every street-corner call you've ever made.

Stereo headphones create an audio barrier around your head. The world goes silent (or at least gets a lot quieter), and you navigate through the environment with your own soundtrack. But with stereo headsets, people who have your phone number can now pierce that barrier and join you inside it (and in the exact center of it). If your caller is also wearing a stereo headset, it's as if your bubbles are connected, like a yin-yang. You're inside of their head, and they're inside of yours (see Figure 10).

Fig. 10: Headphone Conversation Bubble

Fig. 10: Stereo headphone conversations put you and your caller inside a yin-yang bubble of communication.

Better-Sounding Phone Calls

But there's a problem: talking on the phone via Bluetooth stereo headset is the equivalent of listening to 8-bit, 8kHz, µ-law compressed voiceovers in full "CD-quality" 16/44 stereo sound. In practice, headsets don't waste the bandwidth, so phone calls still sound pretty crappy. (When receiving a call, the headset goes into Hands Free Profile [HFP] mode, which uses much less power and bandwidth than Advanced Audio Distribution Profile [A2DP] mode, which is used for music.) But there's no reason why the headset can't produce full-resolution voice audio, since it's already doing it for music playback.
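To see how lopsided that comparison is, here is a rough sketch of the raw numbers. It computes uncompressed PCM bit rates for the two formats mentioned above, plus the textbook continuous µ-law companding curve with µ = 255. The function names are made up for illustration, and these are raw PCM figures, not a claim about what actually crosses any particular Bluetooth link.

```python
import math

MU = 255  # mu value used by 8-bit mu-law telephony


def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def mu_law_compress(x):
    """Continuous mu-law curve for a sample x in [-1.0, 1.0].

    Quiet samples are boosted before quantizing to 8 bits, which is
    how telephony squeezes usable speech into so few bits.
    """
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


phone_bps = pcm_bitrate_bps(8_000, 8, 1)   # 8-bit, 8kHz, mono voice
cd_bps = pcm_bitrate_bps(44_100, 16, 2)    # "CD-quality" 16/44 stereo

if __name__ == "__main__":
    print(f"phone voice: {phone_bps / 1000:.1f} kbps")
    print(f"16/44 stereo: {cd_bps / 1000:.1f} kbps")
    print(f"ratio: {cd_bps / phone_bps:.2f}x")
```

By this arithmetic the music format carries about 22 times as much raw data as the voice call, which is why sending the call through the music pipeline looks like a waste.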

Which makes me wonder: how long will it be before voice data is transferred at the same rate as everything else? If I can stream high-resolution video to my cell phone, then surely, eventually, scratchy, noisy, band-limited phone calls will be a thing of the past.

There's another good reason why high-definition voice data via broadband connection is a Really Good Idea™ — conference calls. Right now, when you're on a conference call, you get multiple streams of crappy audio, all mixed together crappily by the phone network. In a mobile broadband world, you could receive multiple streams of conferenced calls and position them in the stereo field for increased intelligibility. If you wanted to get really fancy, you could use 3D audio processing to put the boss at the front of the room and your colleagues on either side.
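Positioning callers in the stereo field is simple in principle. The sketch below is one hypothetical way to do it, using ordinary constant-power panning: each caller's mono stream gets a pan position from -1.0 (hard left) to +1.0 (hard right), and the streams are summed into left and right channels. The function names and the list-of-samples representation are assumptions for illustration, and real 3D processing would take considerably more than this.

```python
import math


def pan_gains(pan):
    """Constant-power (left, right) gains for pan in [-1.0, 1.0]."""
    pan = max(-1.0, min(1.0, pan))
    angle = (pan + 1.0) * math.pi / 4.0  # 0 = hard left, pi/2 = hard right
    return math.cos(angle), math.sin(angle)


def mix_conference(streams):
    """Mix mono streams into stereo frames.

    streams: list of (samples, pan) pairs, where samples is a list of
    floats and pan is that caller's position in the stereo field.
    Returns a list of (left, right) tuples.
    """
    length = max((len(samples) for samples, _ in streams), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, pan in streams:
        gain_l, gain_r = pan_gains(pan)
        for i, sample in enumerate(samples):
            left[i] += sample * gain_l
            right[i] += sample * gain_r
    return list(zip(left, right))


if __name__ == "__main__":
    boss = ([0.5, 0.5], 0.0)         # front and center
    colleague_a = ([0.2, 0.0], -1.0) # hard left
    colleague_b = ([0.0, 0.2], 1.0)  # hard right
    for frame in mix_conference([boss, colleague_a, colleague_b]):
        print(frame)
```

Constant-power panning is used here so that a voice sounds about equally loud wherever it is placed. The output is not clipped or normalized, which a real mixer would need to handle.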

Sharing Music

Speaking of group auditory experiences, let's talk about music. You gotta use headphones to listen to your tunes, obviously, but you can't share them with anybody that way. But imagine if I could authorize your headset to pick up my phone's audio signal, then we could both listen to what my phone was playing.

Currently, the technology doesn't work that way. There's only ever a single pairing between Bluetooth devices, for obvious security reasons. But the range of these things is only about 30 feet, so sharing audio streams would be an up-close-and-personal experience anyway. All I'm really talking about is connecting an additional virtual cable to my phone, the equivalent of using a wireless Y-jack.

So now I'm wearing headphones, and you're wearing headphones, and we can both hear the music...but not each other. It's like being under water — or not! These headsets have built-in microphones, so there's no reason why you couldn't mix your voice into the shared music stream. Then I can talk to you, you can talk to me, and we can both still hear the music.
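Mixing a voice into a shared music stream might look something like the following sketch. It is purely hypothetical: the music is "ducked" (turned down) while a crude peak follower detects speech on the mic, and the result is clipped to the legal sample range. The function name, the default gain, threshold, and release values are all assumptions for illustration, not anything a real headset is known to do.

```python
def mix_voice_into_music(music, voice, duck_gain=0.3,
                         voice_threshold=0.05, release=0.999):
    """Mix a mono mic signal into a mono music signal.

    music, voice: lists of float samples in [-1.0, 1.0] at the same rate.
    duck_gain: music gain applied while speech is detected.
    voice_threshold: envelope level above which we assume speech.
    release: per-sample decay of the envelope (closer to 1 = slower).
    """
    mixed = []
    envelope = 0.0
    for i, music_sample in enumerate(music):
        voice_sample = voice[i] if i < len(voice) else 0.0
        # Peak follower: jumps up instantly, decays gradually.
        envelope = max(abs(voice_sample), envelope * release)
        gain = duck_gain if envelope > voice_threshold else 1.0
        sample = music_sample * gain + voice_sample
        mixed.append(max(-1.0, min(1.0, sample)))  # hard clip
    return mixed


if __name__ == "__main__":
    music = [0.5] * 6
    voice = [0.0, 0.4, 0.0, 0.0, 0.0, 0.0]
    print(mix_voice_into_music(music, voice, release=0.5))
```

The envelope follower keeps the music from pumping up and down on every zero crossing of the voice waveform. A production mixer would also smooth the gain changes rather than switching them abruptly.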

The network then becomes like a virtual boombox that only those in close proximity can hear. When you move away, the virtual cable is pulled and the music drops out of your headset. But since your local network is also connected to phone/data networks, you don't even need proximity for this feature.

Given a high-speed, high-resolution, phone audio network, you and your friend could conference-call into a music server or live performance and chat with each other while the music plays in the background. Since you're both on stereo headsets, you could also use 3D audio processing to position yourselves in the best seats in the house, with your friend on your right (who, of course, would hear you on the left).

Sharing Game Sound

Speaking of 3D audio, let's use that feature in a mobile Star Wars game to send those damn Imperial TIE-fighters buzzing around your head like flies, giving you more reason to swat them out of the sky. Then you can switch to multiplayer mode and contact the rest of your squadron. Now you're bantering via voice data network with Red Leader on your left and Red 5 on your right, all while blasting spaceship formations in coordinated attacks.

To be honest, I'm not sure what effect broadband phones will have on multiplayer gaming, but I'm pretty sure it'll be profound. Social networking and mobile technology go together like apple pie and ice cream, and Mobile Web 2.0 is what all the cool kids are into these days. That trend will only continue to increase, and I can easily imagine mobile multiplayer games, where everyone in the group shares a common audio experience. It could be battlefield bullets, concert footage, proximity alerts, or who knows what!

That's the wild card, the "who knew?" factor. You can track trends, look at the hardware, and make all the predictions you like, but there will always be that one new idea, that unforeseeable circumstance or confluence, that turns things around in ways you hadn't even considered before.

Always In, Always On

Nonetheless, I'm looking at wireless stereo headsets, and thinking that as they become more comfortable, more useful, more powerful, more commonplace, and more stylish, there will be fewer and fewer reasons to ever take them off. Eventually, you'll just stick them in your ears and forget about 'em.

They will become like acoustic contact lenses, or a heads-up display for your ears. They'll let you access and control a virtual audio reality that streams in from wireless networks all around you and is mixed with voice data from your phone and from everybody's phone. And although the ubiquitous audio network I'm describing does not yet exist, you can actually listen to what it might sound like today.

It's completely analogous to being in a recording studio, isolated by big headphones, auditioning multiple tracks, and talking to the control room via live mic. I remember my first time in a real studio: I put on the cans and was astounded by the sense of space, the detailed audio field, and the sound of my own voice — in my head, through the mixing board. Now imagine that feeling as a mobile experience, but instead of talking to the engineer on the other side of the glass, you're walking down Broadway, talking to someone on the other side of the world.

Back To the Future

So, how would that work in practice? Doctor, set the TARDIS to five years in the future! Now, it's the year 2012, and we are looking over the shoulder of Joe "TargetMarket" Consumer as he goes through his day.

First thing, of course, is coffee, and as Joe enjoys his morning brew, he unplugs his mobile device from the charger, puts on the headset, and checks news, weather, and sports, before getting his email. It's such a gorgeous morning that he does it all from his front porch, since he's got broadband connectivity everywhere he goes. He takes an extra moment to watch his favorite video blogger rant about President Obama's reelection.

During the commute to work, Joe checks the online catalog and notices that the new Spider-Man game is available. A few button presses later, he's web-slinging his way uptown, enjoying the way the "thwip" sound seems to shoot out and away from his mobile device. But then the game pauses, and the Darth Vader theme plays incongruously, with a screen indicating an incoming call from his boss. Joe sends it to voice mail; he'll listen to it later.

While the game is paused, he selects the "gameplay music" menu item, which takes him to a submenu of his iTunes playlists. He notices a "recommended songs" option, and clicks it out of curiosity. An iTunes screen appears, displaying various playlists intended for use as background music during levels. The first one, of course, is the official movie soundtrack album, remixed for gameplay. Then there are popular DJ mixes of songs from the movie, a user-compiled collection of Swedish death metal and industrial goth, and some music written specifically for the game by a well-known composer.

Joe, being a purist, wirelessly downloads the movie score, and starts webbing up bad guys while grooving to the Danny Elfman theme. But only for a few more minutes, because now he's at the office, and logged into the corporate network. He works at his computer, listening to some deep house grooves, and talks on the phone, switching back and forth easily.

A friend stops by to gossip, so Joe turns his music off and turns on the external mic. His friend does the same thing with his headset, because he wants to show off the outrageous YouTube video everybody's talking about. The friend pulls out his phone, taps it a few times, and plays the video. The audio is streamed to both headsets, and about halfway through, Joe can clearly hear his friend say, "Here it comes!" An office worker passing by is startled when Joe and his friend suddenly, and for no apparent reason, laugh simultaneously.

After work, Joe gets a MySpace alert on his phone, telling him about a party his friends are going to. He uses the phone's built-in GPS locator to navigate to the venue. On the way, he passes by a group of kids, sitting on a stoop, all wearing matching headsets, all nodding in unison to a pounding beat only they can hear. It's a little surreal, but a common enough occurrence these days.

When Joe gets to the party, he finds a group of people playing an MMO tournament with folks in Saskatchewan, Seoul, and Stockholm. Each player is looking at his own device, but they (mostly) share the same audio experience. Joe joins the game, and when he makes a winning move, shouts, "Yeah!" — and opposing players all over the planet moan in dismay.

Later, he chats up a cute girl by noticing the illuminated quicksilver headset she's wearing. It reminds him of the in-ear monitors stage musicians wear, and she shows him how they glow and pulsate in response to the music she's listening to. Joe tunes his own headset to her frequency (with her permission, of course), and together they dance to a song only they can hear. Before he leaves, he takes her picture, enters her email and phone number into his phone's address book, and assigns her a ringtone of the song they were dancing to.

Annoying Audio

When he finally gets home, he watches a little late-night TV (streaming the sound to his headset, of course) before removing the earbuds to go to sleep. As he plugs the phone into the charger, he realizes he hadn't taken his headset off once, all day.

Thank you for listening to my speculations about cell phone networks, hardware, and audio. I hope you found it informative (or at least entertaining). For more rants on interactive audio and mobile technology, check out the "Annoying Audio" blog.

Peter Drescher ("pdx") is a musician and composer with more than 25 years of performance experience. He has produced audio for games, the Web, and mobile devices, using his "Twittering Machine" project studio.


Copyright © 2009 O'Reilly Media, Inc.