Interactive Voice Response
This section of our technical library presents information and documentation relating to IVR and interactive voice response software as well as automatic call answering solutions.
Business phone systems and toll free answering systems (generally 800 numbers and their equivalent) are very popular for service and sales organizations, allowing customers and prospects to call your organization from anywhere in the country.
Our PACER and Wizard IVR systems add another dimension to our call center phone system solutions. An Interactive Voice Response (IVR) system processes inbound phone calls, plays recorded messages including information extracted from databases and the internet, and can route calls to in-house service agents or transfer the caller to an outside extension.
Developing a Voice User Interface (VUI) System
by Carla King
With 1.5 billion telephone users versus only 250 million PC users worldwide, your company is probably already thinking about adding a voice interface to your existing Web site. The good news is that if you've already got a well-designed GUI back-end system, you won't need to pour much development effort into modifications for VUI. But you need to learn a few new tricks: VUI design, VoiceXML programming, and some telephony and audio-handling skills.
This article provides an overview of VUI design principles you'll need to learn, describes the architecture of VUI systems, and offers some VoiceXML (Voice eXtensible Markup Language) development and performance tips that will give you a starting point for planning your equipment and scheduling requirements.
- What You Need to Know About VUI Design
- VoiceXML Overview
- VoiceXML Systems Architecture
- What to Look for in a VoiceXML Platform
- VoiceXML Development Tips
- VoiceXML Performance Tips
What You Need to Know About VUI Design
Remember the first Web pages, before graphic designers and content editors became involved? This is the unfortunate state of the average VUI today. A good VUI requires the talents of a VUI designer who can choose the right elements for the VUI front end, such as background track music, voice type, directed dialog or hierarchical structure, and who can hire appropriate voice talent and usability testers.
Even though you're not responsible for the front-end work, your life as a VUI developer will be easier if you know something about the designer's tasks. While you won't be hiring voice talent or choosing background music, you'll want to learn the basics of VUI design by creating your own simple VUI (horoscope, weather, sports news, and so on) using one of the multitude of free VUI design portals available on the Web.
Here are some of the other concerns of VUI designers. There are a lot of articles on the Web that provide tips and describe the pitfalls of VUI design. If you become familiar with the following concepts, you'll be able to work more efficiently with your VUI designer:
- Grammar: The role of a grammar in an application is to define words and patterns of words that can be spoken and to interpret the spoken input. Support for the W3C (World Wide Web Consortium) Speech Recognition Grammar Specification (SRGS) is required for VoiceXML 2.0 browsers.
- Prompts and call flows: VUI designers use visual tools such as Visio to trace prompts and call flows through the interface. These flows determine the structure of the interface, the allowed interrupts, and other features.
- Structure: A hierarchical structure asks users to choose from an ever-more-specific list of items. A directed dialog structure leads users down a path to their desired outcome. VUI design experts say that a perfect VUI doesn't require the user to utter a sound. That is an intriguing challenge, and many articles discuss how a VUI ought to be designed.
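To make the grammar concept concrete, here is a minimal sketch in Python that writes a small grammar in the W3C XML grammar format, listing the words a caller may say at a single prompt. The function names `build_grammar` and `phrases_in`, the rule name `service`, and the phrases themselves are illustrative assumptions, not part of any vendor's toolkit. Real grammars usually add alternatives, optional words, and interpretation tags.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SRGS_NS = "http://www.w3.org/2001/06/grammar"  # W3C grammar namespace


def build_grammar(rule_id, phrases, lang="en-US"):
    """Return a grammar whose root rule accepts exactly one of `phrases`."""
    items = "\n".join(f"      <item>{escape(p)}</item>" for p in phrases)
    return (
        f'<grammar xmlns="{SRGS_NS}" version="1.0" mode="voice"\n'
        f'         xml:lang={quoteattr(lang)} root={quoteattr(rule_id)}>\n'
        f'  <rule id={quoteattr(rule_id)}>\n'
        f'    <one-of>\n{items}\n    </one-of>\n'
        f'  </rule>\n'
        f'</grammar>\n'
    )


def phrases_in(grammar_xml):
    """Parse a grammar back and list the phrases it accepts (a sanity check)."""
    root = ET.fromstring(grammar_xml)
    return [item.text for item in root.iter(f"{{{SRGS_NS}}}item")]


# Hypothetical top-level choices for a simple information line.
grammar = build_grammar("service", ["horoscope", "weather", "sports news"])
print(grammar)
```

Keeping the phrase list in ordinary data like this makes it easier to review with your VUI designer and to retune after usability testing.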
VoiceXML Overview
When augmented with VoiceXML, your voice portal -- the interface between a caller and an information source -- can channel your Web-based information from your servers to telephone users. VoiceXML is an HTML-like language for specifying voice dialogs. It brings together speech and telephony technologies such as automatic speech recognition (ASR) and text-to-speech (TTS) in a markup language so your software can take direction from users' spoken words or their telephone keypad tones, and respond to them via synthesized speech or audio files. VoiceXML provides this dialog management capability for the application using conventional Web and application servers.
VoiceXML version 1.0 is currently available. Version 2.0 has been on the verge of release for the past few months pending W3C clarification of non-technical issues.
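As a rough illustration of what "HTML-like" means here, the following Python sketch generates a one-page VoiceXML menu that accepts either a spoken word or a keypad tone. The function name `build_menu`, the prompt wording, and the target file names are assumptions made for this example. The page declares `version="2.0"` and the matching W3C namespace; the `menu`, `choice`, `prompt`, `noinput`, and `nomatch` elements also exist in version 1.0, but check what your gateway actually supports.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu(prompt, choices):
    """Build a VoiceXML menu page.

    `choices` is a list of (phrase, url) pairs. The caller can say the phrase
    or press the matching key (1 for the first choice, 2 for the second, ...).
    """
    if len(choices) > 9:
        raise ValueError("a single-keypress menu supports at most 9 choices")
    lines = [
        f'<vxml version="2.0" xmlns="{VXML_NS}">',
        "  <menu>",
        f"    <prompt>{escape(prompt)}</prompt>",
    ]
    for key, (phrase, url) in enumerate(choices, start=1):
        lines.append(
            f'    <choice dtmf="{key}" next={quoteattr(url)}>{escape(phrase)}</choice>'
        )
    lines += [
        "    <noinput>Sorry, I did not hear you. <reprompt/></noinput>",
        "    <nomatch>Sorry, I did not understand. <reprompt/></nomatch>",
        "  </menu>",
        "</vxml>",
    ]
    return "\n".join(lines)


# Hypothetical two-item menu; the .vxml targets are placeholder names.
page = build_menu(
    "Say weather or sports, or press 1 or 2.",
    [("weather", "weather.vxml"), ("sports", "sports.vxml")],
)
print(page)
```

The voice browser speaks the prompt through TTS, listens for one of the phrases or key presses, and then fetches the page named in `next`, much as an HTML browser follows a link.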
VoiceXML Systems Architecture
A basic VoiceXML system, as illustrated in the following diagram (see Figure 1), consists of a server and browser just like an HTML system.
A user call is passed through the public switched telephone network (PSTN) via the voice line to the VoiceXML gateway. Users may also be routed to any legacy IVR platforms connected to the VoiceXML gateway.
The VoiceXML gateway terminates the phone line and passes the call to the voice browser. VoiceXML gateways currently run in various degrees of compliance with VoiceXML Specification version 1.0. You can install your own VoiceXML gateway or use any of a number of gateway services.
The voice browser determines the calling number via Automatic Number Identification (ANI) and the number the caller dialed via Dialed Number Identification Service (DNIS), and performs other authentication, verification, and security functions. The user login is controlled by a VoiceXML page. Twenty to thirty companies now have voice browsers in various stages of development.
Once authenticated, the user is passed on through the Internet via HTTP to the Web server. VoiceXML pages determine which scripts and speech grammars are used.
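The Web server's side of this exchange can be sketched as a small Python WSGI application that answers the voice browser's HTTP request with a VoiceXML page. This is only a sketch under stated assumptions: it supposes the gateway forwards the caller's number as a query parameter named `ani` (real gateways expose ANI and DNIS in their own ways), and the `KNOWN_CALLERS` table, the phone number, and the prompt wording are invented for the example.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical lookup table standing in for a customer database.
KNOWN_CALLERS = {"5551230000": "Pat"}


def greeting_vxml(ani):
    """Return a VoiceXML page that greets a recognized caller or asks for a PIN."""
    name = KNOWN_CALLERS.get(ani)
    if name:
        text = f"Welcome back, {name}."
    else:
        text = "Welcome. Please enter your PIN."
    return (
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        f"    <block><prompt>{escape(text)}</prompt></block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


def app(environ, start_response):
    """WSGI entry point: read the assumed `ani` parameter and serve VoiceXML."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    ani = query.get("ani", [""])[0]
    body = greeting_vxml(ani).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "application/voicexml+xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


# To try it locally with the standard library's development server:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

Nothing in the handler is specific to telephony. It is an ordinary Web request that happens to return VoiceXML instead of HTML, which is why existing Web infrastructure can be reused.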
What to Look for in a VoiceXML Platform
When you're seeking out a VoiceXML platform, check out all the features you'd look for in an HTML platform, such as operating system preference (for example, UNIX) and performance, with the following special considerations:
- Specification compliance: Consider the vendor's commitment to the VoiceXML and the W3C specifications and standards. Next year, when the technology is more solid and the specifications more mature, many vendors will be more compliant than they are right now.
- Tools: Look for good development and debugging tools so you can emulate and debug on your desktop. However, don't expect the tools to be perfect. VoiceXML is too young for vendors to have had the chance to standardize and perfect tools. Tools are also available on the Web from VoiceXML development portals where you can map the code, and debug and log, almost like an ASP service.
- Scalability: Choose products with scalability in mind in case VUI becomes the next hot thing for your company.
- Vendor support: Make sure the platform supports a wide variety of third-party speech-engine vendors who provide ASR, TTS, and telephony (rich PSTN protocol support and VoIP). Consider language specialties as well. For example, some companies have better support for European languages and others for Asian languages.
VoiceXML Development Tips
You can develop VoiceXML applications in any environment you're comfortable with -- servlets, JavaServer Pages[tm] (JSP[tm]) pages, Perl, CGI, ColdFusion -- to output VoiceXML as simply as you output HTML. Here are some tips to consider before and during the development process:
- Use good business logic: This basic design principle is more important than ever. If you keep the presentation layer separate from the business logic in all of your systems design, your Web and application servers can use the same back-end data to output HTML, XML, WML, and VoiceXML.
- Utilize a good, clean GUI database: It can easily work for a VUI database. However, since the human ear simply can't take in as much information as the human eye, you may have to add fields to create smaller, aural-friendly chunks of data.
- Test and tune for different vendors: Because they depend on speech recognition, VoiceXML applications are touchier and more time-consuming to test than most other applications, especially in these early days of VUI. If your system will be accessed via multiple gateways, allocate the extra time you'll need to tune for those gateways. If users access your site via cell phone, the recognition rate will tend to be poorer, and you may need to research whether software upgrades or optimization are necessary. Most ASR vendors recommend adding a tuning phase in early product deployment to allow time to tune prompts, grammars, and call flows.
- Evaluate authentication abilities: Basic authentication capabilities are built into VoiceXML, such as caller ID, PIN authentication, keyword entry, and password identification. If you want more sophisticated authentication, you can add a speech recognition product (from companies such as Nuance, SpeechWorks, and Philips) to the VoiceXML gateway to require voiceprint verification of each call before passing it to the voice browser.
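The first two tips can be illustrated with a short Python sketch in which a single business-logic function feeds both an HTML renderer and a VoiceXML renderer, and a helper splits long lists into ear-sized groups. Every name here (`FORECASTS`, `get_forecast`, `render_html`, `render_vxml`, `aural_chunks`) and the sample data are assumptions made for illustration. The group size of three is an arbitrary starting point to tune with your VUI designer.

```python
from xml.sax.saxutils import escape

# Stand-in for the shared back-end database (hypothetical data).
FORECASTS = {"Boston": {"sky": "sunny", "high_f": 72}}


def get_forecast(city):
    """Business logic: knows nothing about HTML or VoiceXML."""
    return FORECASTS[city]


def render_html(city, forecast):
    """Presentation for the eye: compact, scannable."""
    return f"<p>{escape(city)}: {escape(forecast['sky'])}, high {forecast['high_f']} F</p>"


def render_vxml(city, forecast):
    """Presentation for the ear: a full sentence for TTS to read aloud."""
    sentence = (
        f"In {city}, expect {forecast['sky']} skies "
        f"with a high of {forecast['high_f']} degrees."
    )
    return (
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        f"  <form><block><prompt>{escape(sentence)}</prompt></block></form>\n"
        "</vxml>\n"
    )


def aural_chunks(items, size=3):
    """Split a long list into small groups that a listener can keep in mind."""
    return [items[i:i + size] for i in range(0, len(items), size)]


forecast = get_forecast("Boston")
print(render_html("Boston", forecast))
print(render_vxml("Boston", forecast))
print(aural_chunks(["news", "weather", "sports", "stocks", "traffic"]))
```

Because `get_forecast` never touches markup, adding a WML renderer later would mean writing one more small function rather than changing the back end.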
VoiceXML Performance Tips
Many of the performance issues for VUIs are similar to those of GUI systems, but some are exclusively voice-related. Perceived performance is critical in voice applications. Here are some issues to consider when designing your voice-enabled system:
- Server proximity to gateway: Make sure there is adequate performance on the server side to respond to requests for downloading WAV files. Also, ensure that you have a very effective network (or that the files are not more than one hop away from the gateway). Good VUI applications tend to use WAV files wherever possible -- recorded audio sounds much better than TTS. Three aspects to "proximity" are server performance, network bandwidth between the Web server and the voice gateway, and latency ("quality of service").
- Grammars: Arrange with your voice gateway provider(s) to preinstall any especially large files (grammars or WAV files) that will be repeatedly downloaded. This saves download time and eliminates awkward pauses in the conversation.
- Audio quality: Don't waste effort designing in more audio quality than the telephone can handle, for example, stereo encoding for expressing data over a regular telephone line. Stereo encoding increases download times and browser interpretation effort. Recordings should be made in 8-bit 8 kilohertz formats.
- File caching: Make sure the Web server and application allow caching to work, and that the VoiceXML cache attributes allow cached content to be used. VoiceXML allows you to cache files and specify that certain files be accepted for a certain amount of time after expiry to prevent delays (for example, so that the same WAV file is not downloaded to the gateway again and again).
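Two of these tips lend themselves to a quick check in code. The Python sketch below first verifies that a recording does not exceed telephone quality (mono, 8-bit, 8 kHz), then emits an `audio` element carrying cache hints. Treat it as a sketch with assumptions: the function names are invented for the example; Python's `wave` module reads only uncompressed PCM, so mu-law or A-law recordings would need a different check; and the `maxage` and `maxstale` attributes (values in seconds) are the VoiceXML 2.0 form of cache control, so a version 1.0 gateway may expect something different.

```python
import io
import wave
from xml.sax.saxutils import escape, quoteattr


def telephone_quality_problems(source):
    """List the ways a PCM WAV file (path or file object) exceeds telephone quality."""
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("not mono")
        if wav.getsampwidth() != 1:
            problems.append("not 8-bit")
        if wav.getframerate() != 8000:
            problems.append("not 8 kHz")
    return problems


def audio_tag(src, fallback_text, maxage=3600, maxstale=600):
    """Emit an <audio> element with cache hints; the text inside is the TTS fallback."""
    return (
        f'<audio src={quoteattr(src)} maxage="{int(maxage)}" '
        f'maxstale="{int(maxstale)}">{escape(fallback_text)}</audio>'
    )


def make_test_wav(channels, sample_width, frame_rate):
    """Build a tiny in-memory WAV so the check can be tried without real audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(frame_rate)
        wav.writeframes(bytes(channels * sample_width * 100))  # 100 placeholder frames
    buffer.seek(0)
    return buffer


print(telephone_quality_problems(make_test_wav(1, 1, 8000)))
print(telephone_quality_problems(make_test_wav(2, 2, 44100)))
print(audio_tag("welcome.wav", "Welcome to our service."))
```

Running a check like this over your prompt recordings before deployment catches oversized files early. The cache attributes only help if the Web server's own HTTP caching headers also allow the gateway to reuse the file.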
Many thanks to Jim Ferrans, principal staff engineer at Motorola Internet Software and Content Group; Rob Marchand, director of products and services at VoiceGenie Technologies Inc.; and Joe Nuxoll, director of platform technology at Voxeo Corporation.
Glossary of Terms
- ANI: Automatic Number Identification
- ASR: Automatic Speech Recognition
- DNIS: Dialed Number Identification Service
- DTMF: Dual-Tone Multi-Frequency is the system used by touch-tone telephones. DTMF assigns a specific pair of tones (one low frequency and one high frequency) to each key so that it can be easily identified by a microprocessor.
- GUI: Graphical User Interface
- IVR: Interactive Voice Response is a telephony technology in which someone uses a touch-tone telephone to interact with a database to acquire information from or enter data into the database.
- PSTN: Public Switched Telephone Network is the international telephone system based on copper wires carrying analog voice data.
- TTS: Text-To-Speech
- VoIP: Voice over Internet Protocol
- VUI: Voice User Interface
- WAV: An audio file format
- WML: Wireless Markup Language
- WT: Web Telephone
Wizard Simplifies Development
DSC provides IVR software including our IVR wizard development tool for creating interactive voice response applications.
Our IVR software lets you increase IVR development productivity by providing a visual development environment. IVR applications can be defined in minutes using this sophisticated, yet easy to use development tool.
DSC also offers a comprehensive IVR software library known as our IVR Wizard Software Development Kit. This optional package is available for programmers and systems administrators who wish to manage IVR programs from Linux IVR, Unix, or Windows IVR operating environments.
Data collected by your phone ACD (Automatic Call Distribution) or IVR (Interactive Voice Response) systems can be passed to your existing PC, Unix or Web applications through our phone software.
The PACER predictive dialer can automatically call your customers and pass only connected calls to your agents. With our computer telephony software, your telephone and computer work together to provide cost-saving benefits.