Singing With Your Thumbs: How To Make User Interfaces Musical
What Happened

Demi on Letterman

We sold every single product we made! Suddenly Jay-Z is prominently displaying his Sidekick in a video, and boom, every rapper's gotta have one. Then Demi Moore is instant-messaging Ashton Kutcher during the Letterman show, and bam!—everybody in Hollywood's gotta have one too. Suddenly, we're under the gun to produce the sequel.

Sidekick II

From an audio point of view, there have been a few improvements, most importantly with the speaker. It's big, the size of a quarter. Even better, this sucker is LOUD! The output sample rate is now 16kHz for a clearer high end, but we're still using 11.025kHz samples and IMA 4:1 compression. The audio capacity has been doubled to 500KB, which enabled dramatically improved audio quality for the built-in ringtones.
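To get a feel for what that budget buys, here is a rough back-of-the-envelope sketch. It assumes mono audio, 16-bit source samples (so IMA 4:1 leaves 4 bits per sample), and 1KB = 1024 bytes; none of those details are stated above, and file header overhead is ignored.

```python
# Rough playback-time estimate for IMA 4:1 compressed audio.
# Assumptions (not from the article): mono audio, 16-bit source samples,
# 1KB = 1024 bytes, and no file header overhead.
SAMPLE_RATE_HZ = 11_025      # sample rate of the stored sounds
SOURCE_BITS = 16             # assumed bit depth before compression
IMA_RATIO = 4                # IMA 4:1 compression
BUDGET_BYTES = 500 * 1024    # Sidekick II audio capacity


def ima_seconds(budget_bytes, sample_rate_hz=SAMPLE_RATE_HZ,
                source_bits=SOURCE_BITS, ratio=IMA_RATIO):
    """Return how many seconds of compressed audio fit in budget_bytes."""
    bits_per_sample = source_bits / ratio              # 4 bits after 4:1
    bytes_per_second = sample_rate_hz * bits_per_sample / 8
    return budget_bytes / bytes_per_second


total_seconds = ima_seconds(BUDGET_BYTES)
print(f"{total_seconds:.1f} seconds of audio")
```

Under those assumptions, the whole budget comes to about a minute and a half of sound, shared between every ringtone and UI effect on the device.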


The audio UI theme for the Sidekick II was called Abstract Technical.

We wanted the [UI sounds] to be subtle, clean, synthesized, and, most important, not mickey-moused. I'd realized that it almost doesn't matter what sound you play in response to a button push or menu option, as long as it's consistent. Your ear will associate this sound with that action quite rapidly, almost regardless of what the sound actually is. As long as it's not jarringly wrong (like an explosion for a button press), or too "on the money" (like AOL's "You've got mail!"), practically any abstract synthesized sound will do.

The trick comes when they all have to work together without clashing. And again with the Sidekick II, I ended up being fairly diatonic with the occasional chromatic accent, almost against my better judgement. That's just what seemed to work best.
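One way to picture "fairly diatonic" is to pin every UI sound to a degree of a single major scale, so any two sounds that overlap are at least in the same key. The sketch below is only an illustration: the event names, the choice of C major, and the starting octave are all hypothetical, not the actual Sidekick II mapping.

```python
# Illustrative only: assign each UI event a pitch from one major scale.
# The event names, key, and octave are hypothetical, not the device's.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of a major scale


def midi_to_hz(note):
    """Convert a MIDI note number to Hz (equal temperament, A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def scale_note(degree, root=72):
    """Return the MIDI note for a scale degree (0 = root), wrapping octaves."""
    octave, step = divmod(degree, len(MAJOR_STEPS))
    return root + 12 * octave + MAJOR_STEPS[step]


def is_diatonic(note, root=72):
    """True if the note belongs to the major scale built on root."""
    return (note - root) % 12 in MAJOR_STEPS


UI_EVENTS = ["button_press", "menu_open", "menu_close", "message_sent", "error"]
EVENT_NOTES = {event: scale_note(i) for i, event in enumerate(UI_EVENTS)}

for event, note in EVENT_NOTES.items():
    print(f"{event:>12}: MIDI {note}, {midi_to_hz(note):.1f} Hz")
```

An "occasional chromatic accent" would be a note for which `is_diatonic` returns False, used sparingly for the sounds that need to stand out.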

Sad Clown

Sometimes, you'd like the form of the sound alone to convey the intended message. That's useful when you need to alert the customer to a specific condition, like low battery. On any cell phone, running out of power's bad, but with the Sidekick, it was double bad because all your stored data had to be redownloaded from the service. God forbid you should run out of power in an area with no service; your device became a useless brick. Better to turn it off rather than let the battery drain.

In the first version, Joe Britt, one of our founders, told me to make the low battery sound "as horrifying as possible," which I did, all tritones and nasty harmonics. But people were actually frightened by the abrupt noise of the [alert], and so didn't understand it meant "plug me in!" The second time around, I wanted to make the alert more onomatopoetic and came up with this, now universally known as [sad clown]. I have been told that new users have associated "sad clown" with "low battery" on the very first listen, which I count as a personal triumph.

What Happened Next

Well, we sold every one of those devices we could make, too! The day after the "Paris Hilton got hacked" story, we were sold out in New York and L.A. Apparently, there really is no such thing as bad publicity. Ringtone sales went through the roof, and Sidekicks started popping up everywhere, prominently featured in movies and on TV, and most especially in music videos. We even had [Snoop Dogg] doing catch phrases. And so, we began work on the next version....

Sidekick 3

The Sidekick 3 supports "CD-quality" audio because of the new MP3 player application.

The new device is a big departure from the previous one, and turned out to be a good news/bad news joke for the audio. Good news: MPEG compression is now available, yay! All that high-end "frying bacon" noise and crunchy audio you get with IMA 4:1 is gone, replaced by lo-rez MP3. Output sample rate jumped from 16kHz to 44.1kHz, more than doubling the frequency response.
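The frequency-response claim follows from the Nyquist limit: a digital system can only reproduce frequencies up to half its sample rate. A quick check, comparing the Sidekick II's 16kHz output rate against 44.1kHz:

```python
# Nyquist limit: the highest reproducible frequency is half the sample rate.
def nyquist_hz(sample_rate_hz):
    """Return the upper frequency limit for a given sample rate."""
    return sample_rate_hz / 2


old_limit = nyquist_hz(16_000)   # Sidekick II output rate
new_limit = nyquist_hz(44_100)   # Sidekick 3 output rate
ratio = new_limit / old_limit

print(f"{old_limit:.0f} Hz -> {new_limit:.0f} Hz ({ratio:.2f}x)")
```

That is the theoretical ceiling only; what actually reaches your ear still depends on the speaker.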

More RAM was available, so my audio budget doubled again, up to 1MB. With an almost 20:1 MPEG compression rate, I could make big fat audio ringtones without taking up a lot of space. As you might imagine, I was fairly excited to get to work.
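For a sense of scale, here is a hedged estimate of how much audio 1MB holds at roughly 20:1. It assumes the ratio is measured against uncompressed 16-bit, 44.1kHz PCM and that 1MB = 1024 * 1024 bytes; neither is spelled out above, so the channel count is left as a parameter.

```python
# Rough playback-time estimate for ~20:1 MPEG compression.
# Assumptions (not from the article): the ratio is relative to 16-bit,
# 44.1kHz PCM, and 1MB = 1024 * 1024 bytes.
BUDGET_BYTES = 1024 * 1024
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2         # 16-bit PCM
COMPRESSION_RATIO = 20


def compressed_seconds(budget_bytes, channels=1, ratio=COMPRESSION_RATIO):
    """Return how many seconds of compressed audio fit in budget_bytes."""
    pcm_bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels
    return budget_bytes * ratio / pcm_bytes_per_second


mono_seconds = compressed_seconds(BUDGET_BYTES, channels=1)
stereo_seconds = compressed_seconds(BUDGET_BYTES, channels=2)
print(f"mono: {mono_seconds:.0f} s, stereo: {stereo_seconds:.0f} s")
```

Either way, that works out to minutes of audio rather than seconds.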

Until the new hardware came in, and then I discovered the bad news: the speaker was the size of a penny...and there was only one! In previous versions, there'd been two speakers, one on the front, wired into the phone (small and quiet, for your ear) and another on the back, wired into the OS (big and loud, for open air). This time, to make the device smaller, there was just the one speaker to pull double duty—and it suuuucked! But as with any multimedia product, you simply deal with what you got.

Because the new hardware is all shiny and smooth, with transparent keys and a glowing trackball, the audio UI came to be called ["crystalline"]. The design director said he wanted it to sound "like drops of water in a cave" or "a shimmering crystal ball"; he wanted people to go, "Ooooooo." And every sound designer knows what that means: more reverb!

I Love the Sound of Breaking Glass

Reverb on a mobile device seems conceptually awkward: this tiny thing emitting gigantic sounds like you're in a cathedral. And of course, it can't make giant sounds. The device has no bottom end, no stereo (no separation even if it did have stereo), no clarity, and no volume. In short, it has none of the things necessary to fool the ear into thinking you're in a cathedral. But it turns out that keeping the reverb tails on many of the sounds gave them an increased sense of space.

Gathering source material was fun. I recorded all kinds of things that went [bing!]: glass bowls, agogo bells, pipe clanks, you name it. Wine goblets and drinking glasses were the most useful, and I recorded a lot of them, because you never know how the file will sound when it's compressed and played on the device.

On the other hand, knowing that low MPEG bit rates create a characteristic blurring of transients, I purposefully encoded sounds with extremely sharp attacks, to be smoothed out by the algorithm. That worked surprisingly well, saved me a lot of space, and is an excellent example of learning to love your limitations.

And yet again, even though I set out to use non-musical tones, the sounds that worked the best were in tune with each other, and the whole thing became a kind of glass harmonica after all.

Welcome to the World of Tomorrow!

Audio UIs may currently be considered an esoteric corner of the interactive music world, but that may change, and for a very good reason: money, and lots of it! Given the wild popularity of ringtones, it seems likely that customizable system sounds represent a sizable revenue stream for carriers. How much would you pay to download a package of Simpsons wallpapers, ringtones, and "annoyed grunt" button effects, or even cooler, make your iPhone not only look like a Star Trek datapad, but sound like one too?

Bootup sounds can also represent a valuable branding opportunity. Remember that the next time you land safely at your chosen destination, the captain turns off the Fasten Seatbelt sign, and a planeload of passengers turn on their cell phones. Compare and contrast the various sounds and audio technologies demonstrated. Do the phones just beep, or do they do a little song and dance? Hear anything you [recognize]? If yes, then you have been successfully indoctrinated by T-Mobile's marketing department.

And There’s More

I'd like to give special thanks to the Game Audio Conference advisory board for letting me rant about this topic. For more rants about interactive audio, mobile games, and other things that annoy the living crap outta me, check out the Annoying Audio blog, published periodically by the kind (and brave) folks at O'Reilly Digital Media. Thank you.

Peter Drescher ("pdx") is a musician and composer with more than 25 years of performance experience. He has produced audio for games, the Web, and mobile devices, using his "Twittering Machine" project studio.
