Interactive Voice Response
This section of our technical library presents information and documentation on interactive voice response (IVR) software and automatic call answering solutions.
Business phone systems and toll-free answering systems (generally 800 numbers and their equivalents) are very popular with service and sales organizations because they let customers and prospects call your organization from anywhere in the country.
Our PACER and Wizard IVR systems add another dimension to our call center phone system solutions. An interactive voice response (IVR) system processes inbound phone calls, plays recorded messages (including information extracted from databases and the internet), and can route calls either to in-house service agents or to an outside extension.
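To make that call flow concrete, here is a minimal Python sketch of a single IVR menu. It is not the PACER or Wizard API; every name in it (`Step`, `run_ivr`, the menu digits, and the `account_lookup` callback standing in for a database query) is a hypothetical assumption chosen for illustration. Given the keys a caller pressed, it returns the actions the system would take: play a message, route the call to an in-house agent group, or transfer it to an outside extension.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    kind: str    # "play" a message, "route" to an agent group, or "transfer" outside
    detail: str  # message text, agent group name, or transfer destination


GREETING = "Welcome. Press 1 for your balance, 2 for sales, 3 for service, or 0 for our after-hours line."


def run_ivr(digits: List[str], account_lookup: Callable[[str], str]) -> List[Step]:
    """Return the actions taken for one inbound call, given the keys the caller pressed."""
    steps = [Step("play", GREETING)]
    choice = digits[0] if digits else ""
    if choice == "1":
        account = "".join(digits[1:])  # the remaining keys are treated as an account number
        steps.append(Step("play", f"Your balance is {account_lookup(account)}."))
    elif choice == "2":
        steps.append(Step("route", "sales"))
    elif choice == "3":
        steps.append(Step("route", "service"))
    elif choice == "0":
        steps.append(Step("transfer", "after-hours line"))  # hand off to an outside extension
    else:
        steps.append(Step("route", "service"))  # no or unrecognized input: send to a live agent
    return steps


balances = {"1234": "$42.00"}  # stands in for a real customer database
for step in run_ivr(list("11234"), lambda acct: balances.get(acct, "unavailable")):
    print(step.kind, "->", step.detail)
```

Keeping the database lookup behind a callback means the menu logic can be tested without a live database. A production system would also handle timeouts, repeated prompts, and invalid account numbers, which this sketch leaves out.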
Just How Practical IS Speech Recognition Software?
by Shelley Haven
The product tagline for IBM's ViaVoice speech recognition software proclaims "You talk; it types." And it's true! The real issue, though, is how accurately it types what you say. Many people try speech recognition (SR) software, find that it doesn't meet their expectations straight out of the box, and discard it as technology that's mostly hype. But if you are willing to devote the necessary time and effort and follow a few simple practices, SR can be a valuable addition to your software arsenal.
The Assistive Learning Technology Center (ALTeC) has offered speech recognition on its accessible lab computers for over six years. The software has proved successful for many students who receive services through the Student Disability Resource Center for repetitive strain injury (RSI), physical disabilities, and certain learning issues. In addition, several faculty and staff members use SR to relieve RSI or simply to boost productivity.
In our experience, user success with SR hinges on three factors: willingness to train the software and correct misrecognized words; the type of microphone and how it is positioned; and learning how to dictate (as opposed to speaking conversationally). The following discussion applies to the most popular SR programs: ScanSoft's Dragon NaturallySpeaking (PC only), IBM's ViaVoice (PC or Mac), and MacSpeech's iListen (Mac only).
How Speech Recognition Works
Understanding how SR software works can help you work with it. Humans use a number of clues to recognize speech: not only the sounds of words, but also an understanding of language and sentence structure, context, tone of voice, cadence, and even gestures and facial expressions. SR programs, however, do not understand language or grammar and can't even tell where one word stops and the next begins. Instead, they rely on just three things: the audio input from the microphone, their built-in vocabulary, and a statistical database they maintain of how you in particular combine words (that is, the likelihood that you would use a particular word after one word and before another).
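The statistical database of word combinations is what speech researchers call a language model. The Python sketch below is a deliberately simplified illustration, not how Dragon NaturallySpeaking, ViaVoice, or iListen is implemented: it counts which words follow which in a sample of the user's own text (a bigram model that looks only at the preceding word), then uses those counts to choose among candidates that sound identical, such as "to", "two", and "too". The function names, the add-one smoothing, and the sample text are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word in the user's own writing."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def bigram_prob(follows, prev, word, vocab_size):
    """P(word | prev) with add-one smoothing, so unseen pairs keep a small nonzero probability."""
    counts = follows.get(prev, Counter())
    return (counts[word] + 1) / (sum(counts.values()) + vocab_size)


def best_guess(prev, candidates, follows, vocab_size):
    """Pick the candidate that best combines its acoustic score with the word context.

    candidates maps each word to an acoustic probability: how well it matches the sound heard.
    """
    def score(word):
        # Add log-probabilities rather than multiplying raw probabilities.
        return math.log(candidates[word]) + math.log(bigram_prob(follows, prev, word, vocab_size))
    return max(candidates, key=score)


history = "i want to go . i want to know . the two of us"
follows = train_bigrams(history)
vocab_size = len(set(history.split()))
sound_alike = {"to": 1 / 3, "two": 1 / 3, "too": 1 / 3}  # identical sounds, identical acoustic scores
print(best_guess("want", sound_alike, follows, vocab_size))  # prints: to
print(best_guess("the", sound_alike, follows, vocab_size))   # prints: two
```

Because the acoustic scores are identical, context alone breaks the tie. The same mechanism explains why consistent pronunciation and whole-phrase dictation help: both give the model cleaner evidence to combine.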
Most Important Factor: User Diligence
Each of these SR programs requires the user to create an initial voice profile. Reading 10 to 15 minutes of text as it appears on the screen allows the software to match words with the audio waveforms it receives through the microphone and sound card. This gives it a good start on interpreting incoming sounds, matching them with any of the tens of thousands of words in its vocabulary, and writing its "best guess" on the screen.
Of course, some words won't be in its vocabulary, and it may misrecognize words that are mumbled, slurred together with adjoining words, or pronounced inconsistently. That's where additional training and correction come into play. Each time the program misrecognizes a word, the user needs to correct it with the program's built-in correction functions. This might involve entering and speaking a new word (such as a surname) or selecting a word from the software's second through tenth guesses at what was said. Each such midcourse correction moves the software closer to accurate recognition, and the need for correction decreases significantly after several hours of regular use.
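The correction loop can be pictured with a toy model. In the Python sketch below, the recognizer's alternatives arrive as a ranked list of (word, confidence) pairs; each time the user picks a word from that list, a hypothetical `VoiceProfile` remembers the choice and gives that word a small, arbitrary bonus (0.1 here) in future rankings. This is an assumption-laden illustration, not how any of the named programs actually adapt; real products also update their acoustic models.

```python
from collections import Counter


class VoiceProfile:
    """Hypothetical stand-in for a user's voice profile that learns from corrections."""

    def __init__(self, bonus=0.1):
        self.bonus = bonus            # arbitrary boost per past correction (an assumption)
        self.corrections = Counter()  # how often the user has chosen each word

    def rank(self, nbest):
        """Return the recognizer's guesses, best first, favoring previously corrected words.

        nbest is a list of (word, confidence) pairs as produced by the recognizer.
        """
        return sorted(nbest,
                      key=lambda pair: pair[1] + self.bonus * self.corrections[pair[0]],
                      reverse=True)

    def correct(self, chosen_word):
        """Record that the user selected chosen_word from the list of alternatives."""
        self.corrections[chosen_word] += 1


profile = VoiceProfile()
guesses = [("Hevin", 0.40), ("Haven", 0.35), ("heaven", 0.25)]
print(profile.rank(guesses)[0][0])  # prints: Hevin
profile.correct("Haven")            # the user picks the second guess
print(profile.rank(guesses)[0][0])  # prints: Haven
```

One correction is enough here to lift the surname to the top, which mirrors the article's point that each correction moves the software closer to accurate recognition.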
Reading additional voice training text (from preinstalled selections) also increases accuracy; an hour of extra reading can make a significant difference. The software can also examine documents you've written, identify words it doesn't know, and prompt you to train them into your voice profile.
Some users become frustrated by the need to correct the software (especially at first) or to perform additional training, and they give up prematurely. But diligent, ongoing correction and a willingness to perform adequate training will pay off relatively quickly.
Microphone is Critical Link Between Speaker and Computer
Since the software relies heavily on what it hears to recognize speech, a good microphone can make a considerable difference. Use a noise-canceling headset microphone to help separate the sound of your voice from background noise and to keep the mike in a consistent position relative to your mouth. Rather than relying on the computer's sound card to convert analog audio to the digital stream used by the software, many experts recommend a USB microphone adapter: essentially a tiny external sound card that converts your voice to digital form away from the electrical and acoustical noise present inside the computer. A headset with an in-line mute switch can also be handy, for reasons explained in the next section.
Position the microphone close to your mouth (about 3/4 inch away) but off to one side. This reduces distortion from the puff of air that accompanies plosive sounds like "p" and "b". To evaluate the quality of the audio the computer hears, record and play back your voice using the Sound Recorder application in Windows (under Accessories > Entertainment) or the recording function built into TextEdit on a Mac. Adjust the microphone position or audio settings as necessary to ensure a strong, clear, static-free signal.
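If you save a test recording as a WAV file, you can also check its levels numerically. The Python sketch below uses only the standard library's `wave`, `array`, and `math` modules and assumes 16-bit PCM audio. Its "too quiet" threshold of -40 dBFS RMS is an arbitrary rule of thumb chosen for illustration, not a figure from any SR vendor. A peak at or near full scale suggests clipping (input too loud or mike too close), while a very low RMS level suggests a weak signal.

```python
import array
import math
import sys
import wave


def to_dbfs(level):
    """Convert a linear level (1.0 = full scale) to decibels relative to full scale."""
    return 20 * math.log10(level) if level > 0 else float("-inf")


def check_levels(path, quiet_threshold_dbfs=-40.0):
    """Report peak and RMS levels of a 16-bit PCM WAV file, plus simple warnings."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit samples (all channels interleaved)
    if sys.byteorder == "big":
        samples.byteswap()              # WAV sample data is little-endian
    if len(samples) == 0:
        raise ValueError("the recording contains no audio")
    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale
    return {
        "peak_dbfs": to_dbfs(peak),
        "rms_dbfs": to_dbfs(rms),
        "clipped": peak >= 0.999,                       # samples at or near the maximum value
        "too_quiet": to_dbfs(rms) < quiet_threshold_dbfs,
    }
```

For example, calling `check_levels("mic_test.wav")` on a hypothetical test recording returns a dictionary you can print; re-record after each microphone or settings adjustment and compare the numbers.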
It's Not What You Say But How You Say It
As mentioned earlier, SR software relies heavily on how you pronounce words. The first SR programs of the early and mid-1990s (Dragon Dictate and Power Secretary) required the user to speak... each... word... discretely... like... this. Current programs let speakers talk naturally, but care must be taken to articulate words distinctly and not slur them together ("want to" vs. "wanna"). Consistency (pronouncing words the same way each time) is also important, because the software tries to match audio waveforms with the words they most likely represent. Modeling your speech on that of a newscaster will produce significantly better results than using ordinary conversational speech.
In addition to sound, SR software bases its "guesses" partly on how likely each guessed word is to appear before and after the words guessed next to it. Therefore, talking in phrases or complete sentences yields more accurate recognition than uttering individual words. Unfortunately, that's not how most of us talk, especially as we think of what we want to say. In addition, we, uh, insert other sounds while we're, um, coming up with ideas; change words in the mid--...in mid-sentence; and artificially extend words to-o-o-o fill in gaps while we think. To avoid this, most users find it helpful to first mentally compose what they want to say and then say it, rather than thinking and talking at the same time. A microphone with a mute switch can keep the program from listening while you think.
Also, punctuation must generally be spoken. Thus, for the computer to type ["It was a dark and stormy night," said Snoopy.] followed by a return, one would say "Quote, it was a dark and stormy night, comma, quote, said Snoopy, period, new line." This can sometimes be disruptive and interfere with one's compositional thought. In such cases, it might be best to first get the words out, then go back and punctuate as necessary.
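The spoken-punctuation convention is easy to model. The Python sketch below is a hypothetical formatter, not the command set of any particular product: it takes the recognized words as one string, replaces the command words "comma", "period", "question mark", "quote", and "new line" with the corresponding characters, attaches punctuation to the neighboring word, alternates between opening and closing quotation marks, and capitalizes the first word of each sentence.

```python
import re

# Hypothetical command words and the characters they produce.
PUNCTUATION = {"comma": ",", "period": ".", "question mark": "?"}
SENTENCE_ENDERS = {"period", "question mark"}


def format_dictation(spoken):
    """Turn recognized words, including spoken punctuation commands, into formatted text."""
    # Match the two-word commands before falling back to single words.
    tokens = re.findall(r"new line|question mark|\S+", spoken, flags=re.IGNORECASE)
    text = ""
    no_space = True     # suppress the space before the next word
    capitalize = True   # capitalize the next word (start of a sentence)
    in_quote = False
    for tok in tokens:
        key = tok.lower()
        if key == "quote":
            if in_quote:
                text += '"'                             # closing quote attaches to the previous word
            else:
                text += ("" if no_space else " ") + '"'
                no_space = True                         # opening quote attaches to the next word
            in_quote = not in_quote
        elif key == "new line":
            text += "\n"
            no_space, capitalize = True, True
        elif key in PUNCTUATION:
            text += PUNCTUATION[key]                    # punctuation attaches to the previous word
            if key in SENTENCE_ENDERS:
                capitalize = True
        else:
            word = tok[:1].upper() + tok[1:] if capitalize else tok
            text += ("" if no_space else " ") + word
            no_space, capitalize = False, False
    return text


print(format_dictation("quote it was a dark and stormy night comma quote said Snoopy period new line"))
# prints: "It was a dark and stormy night," said Snoopy.
```

Because the command words are always treated as punctuation here, this sketch cannot type a literal "comma" or "quote"; supporting that would need some kind of escape command, which it omits.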
Other Factors to Consider
Unless a disability makes hands-free use absolutely necessary, SR software is most effective when used in conjunction with some type of keyboard and mouse. Productivity may be optimized if you use SR to take over most of the typing burden and use a mouse (or mouse substitute) for screen navigation and to execute certain commands.
- While all SR programs include their own built-in word processors, they work to varying degrees with other applications. For example, Dragon NaturallySpeaking and ViaVoice both work well with Microsoft's Office products, but ViaVoice's correction function is disabled in those applications. With iListen, users can dictate into any program, and MacSpeech sells ScriptPaks to extend its functionality with other programs.
- SR software is available in various languages as well as for different dialects (e.g., UK English, Southeast Asian English). These versions may work better for some users.
- SR also works with web browsers, allowing users to find text and click text links.
- Windows XP and later versions include speech recognition functionality, but (by Microsoft's own admission) it is not as robust as standalone programs such as those discussed above.
- If your primary need is voice-operated navigation rather than voice-operated text input, consider Commodio's Q-Pointer line of products.
- Lastly, be careful not to overuse your voice or strain it unnaturally. Take regular breaks (just as you should when typing) and sip water to keep your vocal folds well lubricated.
With the proper training, the right equipment, and attention to speech practices, prospective speech recognition users will very likely be rewarded with good results. Many students, faculty, and staff already have.
For More Information
If you have questions about computer accessibility and technology accommodations, want consultation on these issues, or just wish to learn more about the intriguing assistive technology available, call Shelley Haven in the ALTeC lab at 725-6173. ALTeC's services are available to students, faculty, and staff who need assistance due to a disability. Students should contact the Student Disability Resource Center (SDRC) at 723-1066 for more information. Faculty and staff who would like to access the Center should contact Rosa Gonzalez, Stanford's ADA/504 Compliance Officer, at 723-0755 for a referral.
Wizard Simplifies Development
DSC provides IVR software, including our IVR Wizard development tool for creating interactive voice response applications.
Our IVR software increases IVR development productivity by providing a visual development environment. IVR applications can be defined in minutes using this sophisticated yet easy-to-use development tool.
DSC also offers a comprehensive IVR software library, our IVR Wizard Software Development Kit. This optional package is available for programmers and system administrators who wish to manage IVR programs from Linux, Unix, or Windows operating environments.
Data collected by your phone system's ACD (automatic call distribution) or IVR (interactive voice response) system can be passed to your existing PC, Unix, or web applications through our phone software.
The PACER predictive dialer can automatically call your customers and pass only connected calls to your agents. With our computer telephony software, your telephone and computer work together to provide cost-saving benefits.
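Passing only connected calls to agents amounts to a filter between the dialer and the agent queue. The Python sketch below is a hypothetical illustration, not PACER's actual logic or API: it assumes the telephony hardware has already classified each dialed call with an outcome label ("answered", "busy", "no_answer", or "machine"), hands answered calls to waiting agents in order, and sets everything else aside.

```python
from collections import deque


def route_dialer_results(results, free_agents):
    """Pass only answered calls to waiting agents.

    results: (phone_number, outcome) pairs in the order the calls finished dialing;
             outcome is one of the hypothetical labels "answered", "busy",
             "no_answer", or "machine".
    free_agents: names of agents currently waiting for a call.
    Returns (assignments, not_connected, overflow).
    """
    waiting = deque(free_agents)
    assignments, not_connected, overflow = [], [], []
    for number, outcome in results:
        if outcome != "answered":
            not_connected.append(number)   # agents never hear busy signals, ringing, or machines
        elif waiting:
            assignments.append((waiting.popleft(), number))
        else:
            overflow.append(number)        # a live person answered but no agent was free
    return assignments, not_connected, overflow


results = [("555-0101", "busy"), ("555-0102", "answered"), ("555-0103", "machine"),
           ("555-0104", "answered"), ("555-0105", "answered")]
print(route_dialer_results(results, ["Ana", "Ben"]))
```

Here the busy and machine calls never reach an agent, and the third answered call lands in overflow because both agents are already talking. Keeping that overflow small without leaving agents idle is why a real predictive dialer must also pace how many numbers it dials, which this sketch does not attempt.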