The McGurk Effect, or what you hear may depend on what you see

by Matthew Gast

Related link: http://www.pbs.org/saf/1502/




I spend a fair amount of time in my car driving around the Bay Area, and the 511 system is a lifeline. When you call, you can get information on traffic conditions, road closures, and even transit times. It's a wonderful tool, with one persistent flaw for me. I often ask it about traffic conditions on Interstate 280, but the voice recognition system frequently hears "Cotati" instead of "two eighty."




I frequently use I-280 to get home. It runs from San Francisco to San Jose, which is nowhere near Cotati. Cotati is generally irrelevant to my travels--I've only been near it three or four times in the past eight years. I've found that asking about "two-eight-zero" often helps the 511 voice recognition system sort itself out, but I usually ask it for "two eighty" first, since that's what everybody around here calls the highway. Certainly, "280" sounds similar to "Cotati," but I have never had a problem making myself understood by other people, even over the telephone. Context may be part of the answer, but the sounds are different enough that I thought there might be more to the story. 511's voice recognition has always confused the two, in spite of its generally excellent performance.




Last night, a January episode of Scientific American Frontiers provided a hint. In the third segment of the program, which can be conveniently watched on-line, Alan Alda visits a researcher at IBM who is designing a "virtual passenger." Although the voice recognition is quite good, it sometimes fails due in part to a phenomenon called the McGurk effect. Discovered by Harry McGurk in 1976, the effect occurs because, for certain sounds, what you see the speaker's mouth doing is an important part of what you perceive yourself hearing. (The program video available at the PBS site includes a demonstration; other demonstrations are available here and here.) While watching the show, I rewound the program to compare listening to the pronunciation with my eyes closed against listening while watching the tape. Sure enough, I heard one sound with my eyes open, and another with my eyes closed. The researchers on the program speculated that future in-car voice recognition would probably use a camera to augment the microphone, though that option isn't available for a telephone-based system like 511.





2 Comments

BrianSawyer
2005-02-07 06:02:26
See Mind Hacks
For a great account of how the McGurk effect works, along with a whole mess of other cool stuff, be sure to check out O'Reilly's Mind Hacks by Tom Stafford and Matt Webb.

MatthewDoar
2005-02-07 09:48:01
Hold your nose!


When I use 511, I find that an English accent won't work, so I have to try to speak with an American accent (and my family laugh at me). So now I just hold my nose to make the number more nasal-sounding.


Not quite so hands-free as you might want, but it does work.


~Matt