Voice on Your Web Site? Now You're Talking!

by John Paul Ashenfelter

Unless you've been living under a rock, you've no doubt heard about the coming wave of applications for the wireless web ... and the resulting exponential growth in related technologies such as WAP, WML, WMLScript, and a host of other brand-new standards. Maybe you've even taken the plunge and tried working with some of these tools -- and found that the old cross-browser problems of Navigator and Internet Explorer are nothing compared to the cross- and even intra-browser incompatibilities of wireless devices. Still, opening up your existing web applications to the legions of wireless disciples is a compelling motivation for many businesses.

What if I told you there was a way to take your existing web site and make it accessible not only to anyone with a cell phone, but to anyone with a telephone of any sort? What if such a technology were based on open standards? What if you could start developing with this technology right now?

Here's a whirlwind look at using CallXML, VoiceXML, and a company called Voxeo to help you access your web application using a plain old telephone.

Web telephony basics

There's no real secret to turning a web application into a phone-based application -- all you need is a browser that can connect to the web server and a page on the server that the browser can interpret. In principle, turning an existing PHP application (for example) into a WML-enabled site basically requires connecting the wireless device to the web server (this burden is handled by the phone service provider) and outputting existing content marked up in WML instead of HTML.
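To make the idea concrete, here is a minimal Python sketch of one record rendered either as HTML or as WML. The record, its field names, the function names, and the card id are all invented for illustration, and the WML DOCTYPE that a real gateway would expect is left out for brevity.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical record; a real site would pull this from its own application.
EMPLOYEE = {"name": "Pat Smith", "dept": "R&D", "phone": "555-0100"}

def render_html(rec):
    """Render the record as a small HTML page for a desktop browser."""
    return (
        "<html><body>"
        f"<h1>{escape(rec['name'])}</h1>"
        f"<p>{escape(rec['dept'])}, {escape(rec['phone'])}</p>"
        "</body></html>"
    )

def render_wml(rec):
    """Render the same record as a single-card WML deck for a WAP phone."""
    return (
        '<?xml version="1.0"?>'
        "<wml>"
        f'<card id="emp" title={quoteattr(rec["name"])}>'
        f"<p>{escape(rec['dept'])}, {escape(rec['phone'])}</p>"
        "</card></wml>"
    )
```

The data and the lookup logic stay the same; only the final rendering step changes with the target device.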

But what about all the folks without fancy cell phones that incorporate WAP/WML/etc.? Using a phone of any sort as a web application browser should require only two things: a means of connecting that telephone to the web site, and output that the "browser" can understand. Of course, since this is a plain old phone, the only connection method available is a standard phone number, and the only browser available is a combination of the user's ears, voice, and the phone keypad. Fortunately, technologies and standards exist to help enable web telephony (WT) applications.

The browser side of a WT application is direct interaction with the user through sound and voice as well as the phone keypad. All the technical magic in this scenario lives in the server-side markup. And to make matters even more interesting, there are at least three different server-side "language" choices for WT applications:

  • VoiceXML, an XML standard managed by the VoiceXML Forum, which was founded by Lucent, Motorola, IBM, and AT&T. It is specifically geared to building voice-response systems and includes voice-recognition technology as well as text-to-speech (TTS). It is currently before the W3C Voice Browser Working Group and will likely morph into the standard for voice browsing.
  • CallXML, an XML standard created by Voxeo that is specifically oriented towards managing phone calls and using touch-tone phones to manipulate web applications. This is simpler than VoiceXML for traditional touch-tone phone applications and is also capable of managing the calls themselves, including transfers and conferencing of other calls.
  • Microsoft WTE, a web telephony engine that ships as part of Windows 2000. This is a COM object that lets you build WT applications and script them in either stand-alone mode or through the Web using a tool such as ASP.

Each of these tools can be effectively used to create WT applications, but that's only half the battle. The hard part is connecting the phone number to the web applications. It can also be extremely expensive to build the appropriate hardware infrastructure for a WT application -- between the modems, telephony hardware, voice and speech software, and all the related technology, you will end up investing significantly in both hardware and time to build your own WT infrastructure. An alternative would be outsourcing, and that's where Voxeo comes in.

At the risk of sounding like a marketing person, I'm in love with what Voxeo can do and how well it works. Voxeo is basically an application service provider (ASP) for WT application infrastructure. In a nutshell, they provide you with a phone number that you can map to a URL. In effect, this means that, as an application developer, all you have to do is convert your server-side applications to output one of the WT markup languages while they provide the hardware infrastructure. And, more importantly, it is currently free for developers!

It looks like this is a limited-time offer, but their current commercial licensing model is based on a per-port monthly charge comparable to outsourcing a web site. To make the pot even sweeter, they provide a number of open source CallXML and VoiceXML applications that are available from their site and will soon be on SourceForge. They also provide a number of VoiceXML and CallXML tutorials, including examples of using ColdFusion, JSP, ASP, and PHP to build interactive applications. To finish off the toolset, they also provide an object-oriented development environment (Voxeo Designer) for building WT applications, though I found it much easier just to continue using a text editor.

Voxeo in action

I saw a demonstration of the Voxeo tools at the Allaire Developer's Conference, was fairly impressed by the technology, and immediately came back to the office determined to build an application. The first step is registering as a member of the Voxeo developer community. From that point, the main activities are managing URL mappings and using the logging tools to debug applications. The URL Mappings tab lets you register a phone number in one of the metro regions (currently CA, NJ, and NY, with about 20 more due online in the coming months) and point it to a URL. The only problem I've seen so far is that once you select the type of server-side markup (CallXML, VoiceXML, MS-WTE), there is no way to change the markup type for that number. Once the phone number is mapped to a URL, all you have to do is build a WT application using one of these markup languages.

The Voxeo site says you'll be able to build WT applications in about an hour, and they're not far off the mark as far as quick and dirty development goes. They can walk you through several versions of the Hello World application, but it is a lot more fun to roll your own WT application. I built a quick WT front-end using CallXML to look up user information from an existing personnel database. I found that CallXML is faster for application development unless you need VoiceXML for speech recognition. Here's the code for the first page.
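As a rough sketch of what a first page of this kind might look like, here is a Python function that emits VoiceXML rather than the CallXML used for the application described here. The lookup URL, the field name `badge`, and the four-digit length are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def badge_prompt_page(lookup_url, digits=4):
    """Return a VoiceXML page that collects a badge number and submits it."""
    page = f"""<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="lookup">
    <field name="badge" type="digits?length={int(digits)}">
      <prompt>Please enter your {int(digits)} digit badge number.</prompt>
      <filled>
        <submit next={quoteattr(lookup_url)} namelist="badge"/>
      </filled>
    </field>
  </form>
</vxml>"""
    # Fail early: raises ParseError if the generated page is not well formed.
    ET.fromstring(page)
    return page
```

Calling `badge_prompt_page("http://example.com/lookup")` returns markup that asks for the digits and then passes them to the second page as the `badge` parameter.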

The user's badge number is used as a key to look up the information. Control is then passed to a ColdFusion page (though you can use PHP, Perl, or any other server-side language that can generate XML) that does the database lookup and "formats" the results for the text-to-speech engine to render (speak) into the user's browser (ears). Here's the code for the second page.
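For readers not working in ColdFusion, the lookup step can be sketched in Python along these lines. This is a minimal illustration, not the application's actual page: the `staff` table, its columns, the sample row, and the function names are all hypothetical, and the output is VoiceXML rather than CallXML.

```python
import sqlite3
from xml.sax.saxutils import escape

def make_demo_db():
    """Build a throwaway in-memory table standing in for the personnel database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE staff (badge TEXT PRIMARY KEY, name TEXT, ext TEXT)")
    db.execute("INSERT INTO staff VALUES ('1234', 'Pat Smith', '5501')")
    return db

def lookup_page(db, badge):
    """Look up a badge number and return a VoiceXML page that speaks the result."""
    # Parameterized query: the caller's keypad input never reaches the SQL text.
    row = db.execute(
        "SELECT name, ext FROM staff WHERE badge = ?", (badge,)
    ).fetchone()
    if row is None:
        message = "Sorry, no employee was found for that badge number."
    else:
        name, ext = row
        # Space out the digits so a speech engine is more likely to read them singly.
        message = f"{name} is at extension {' '.join(ext)}."
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        f"  <form><block><prompt>{escape(message)}</prompt></block></form>\n"
        "</vxml>"
    )
```

Escaping the message matters here: a stray `&` or `<` in a database field would otherwise produce malformed XML, and the caller would simply be disconnected.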

Now on to running the application. All we really need to do is phone the application, though I found it saved a lot of calls from Virginia to California if I followed a few simple debug steps first:

  • Validate the XML. One quick way is to use Internet Explorer to parse the XML document and ensure it is well formed.
  • Check the functionality of your server-side code by hard-coding the query or other similar functionality and running it to make sure it works.
  • Make sure any dynamically generated content is well-formed XML.
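The first and third checks above can also be automated. Here is a minimal sketch using the XML parser in Python's standard library; the sample markup strings are made up, and the check covers only well-formedness, not whether the document is valid CallXML or VoiceXML.

```python
import xml.etree.ElementTree as ET

def check_well_formed(markup):
    """Return (True, "") if markup parses as XML, else (False, error text)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""

# Illustrative inputs: the second one closes <form> before closing <block>.
good = "<vxml version='2.0'><form><block>Hello</block></form></vxml>"
bad = "<vxml version='2.0'><form><block>Hello</form></vxml>"

for sample in (good, bad):
    ok, error = check_well_formed(sample)
    print("well formed" if ok else f"NOT well formed: {error}")
```

Running the same function over the output of each server-side page, with a hard-coded badge number, catches most problems before any long-distance call is made.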

Once the preliminaries are out of the way, it really is time for a phone call. I'd heartily suggest using their logging capability to help debug your WT application since virtually any error results in a hang-up. Here's the transcript of a call to the application showing all events, URLs, and everything else going on behind the scenes.

Final thoughts

In an information economy, any way you can leverage existing information through a new channel can generate efficiencies that lead to better business. While WAP, WML, and the rest of the wireless world are exciting, plain old telephones are in use by 1.5 billion people. Web telephony applications are an excellent way to leverage existing web applications into the telephone world, and Voxeo provides a solid set of tools to help.

There's little doubt in my mind that more providers like Voxeo will emerge and that tools and hardware for building your own WT gateways will get cheaper and easier to use. New platforms for delivering WT applications, such as the IBM WebSphere Voice Server SDK, are emerging and will allow developers to build their own infrastructures as well as lead to an increase in ASP options for WT applications. But open standards like VoiceXML and CallXML will allow developers to build WT applications now and migrate them to future platforms in a fairly straightforward manner. It's ironic that plain old telephones may become the ubiquitous browser of the future.

John Paul Ashenfelter is president and CTO of, a technology development and analysis group focusing on web database applications.
