VoiceXML: An Investigation
Author: Mya Anderson
Supervisor: Prof. P Clayton

Introduction
- VoiceXML recap
- The problem area
- My approach
- Some things I did…
- Findings, opinions and views
- Conclusion and questions
VoiceXML recap
- VoiceXML = Voice Extensible Markup Language
- XML-based Internet mark-up language for developing voice interfaces
- First version released 1999
- Latest version released April 2002

- Allows access to the Web via telephones
- Works in a voice browser
- Provides a standards-based interface for:
  - Automatic Speech Recognition (ASR)
  - Text-to-Speech (TTS)
  - Dual Tone Multi-Frequency (DTMF)
  - Call handling
  - Other technologies
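Example (a sketch, not taken from the project): a minimal VoiceXML page showing how TTS output and ASR/DTMF input appear in the markup. The form id, field name and prompt wording are assumptions made for illustration. The page is held in a Python string and parsed only to confirm that it is well-formed XML; a voice browser would be needed to actually run it.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-page application; ids, names and wording are illustrative only.
HELLO_VXML = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <!-- The built-in "digits" type accepts spoken digits (ASR) or key presses (DTMF) -->
    <field name="choice" type="digits">
      <!-- Prompt text is rendered by text-to-speech (TTS) -->
      <prompt>Welcome. Please say or key in a number.</prompt>
      <!-- Runs once the caller's input has filled the field -->
      <filled>
        <prompt>You entered <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""

# Namespace used by VoiceXML 2.0 documents.
NS = {"v": "http://www.w3.org/2001/vxml"}

# Parsing checks well-formedness only; it does not validate against the VoiceXML schema.
root = ET.fromstring(HELLO_VXML)
field = root.find("v:form/v:field", NS)
print(root.attrib["version"], field.attrib["name"], field.attrib["type"])
```

Older stand-alone browsers may expect `version="1.0"` and no namespace, so the root element may need adjusting for a given browser.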
The Problem Area
- VoiceXML is a new technology
- Many promised benefits & advantages
- Relatively little information available
- Task:
  - Find out more
  - Try to establish whether it has any value
My approach
Step 1
- Lots of reading!
  - Web sites
  - Text books
  - Software manuals

Step 2
- Try to write a simple VoiceXML application
- Involved:
  - Deciding on a development environment
  - Picking a voice browser to interpret my VoiceXML pages
  - Getting to grips with syntax
  - Finding examples

Step 3
- Try to write a prototype for Rhodes Online Student Services (ROSS)
- Help determine:
  - Ease of development
  - Speed of development
  - VoiceXML capabilities
Development environments
- Four potential environments:
  - Hosted
  - Web-based
  - Simulated
  - “Real”

Hosted
- Use a Voice Service Provider
- Telephone access
- No need to worry about hardware issues
- Tools sometimes provided

Web-based
- “Integrated Development Environment”
- Many tools offered
- Provides most current support
- Best option, if available
- Best way to start
- Not available in SA

Simulated
- Stand-alone browser runs on a PC
- Use headset, or speaker and microphone
- Good for initial development
- Available browsers are a bit behind current standards
- Can’t test “real” input, i.e. a phone

“Real”
- Set up on a gateway
- Expensive
- Requires hardware and telephony expertise
Selected environment
- Stand-alone (simulated) browser
  - Cheapest
  - Browsers freely available
  - No SA-based Voice Service Providers for a web-based environment (preferable)
  - No need to set up a gateway
Some things I did…
- Started ROSS prototype
  - Input & output of student number
  - Development halted by software issues
- Experimented with mixed-initiative dialogs
  - Traditional IVR (Interactive Voice Response, e.g. cell-phones) = directed dialog
  - Mixed-initiative dialogs are “intelligent”
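Example (a toy model, not the ROSS prototype): the difference between directed and mixed-initiative dialogs can be sketched as slot filling. A directed dialog asks for one item at a time. A mixed-initiative dialog lets one utterance fill several items and then prompts only for what is still missing. The slot names and vocabulary below are hypothetical, and real recognition would be done by the voice browser's grammars rather than by word matching.

```python
# Hypothetical slots and vocabulary, chosen only to illustrate the idea.
SLOTS = ("service", "year")
VOCAB = {
    "service": {"results", "fees", "timetable"},
    "year": {"2001", "2002"},
}


def fill_slots(utterance, slots):
    """Mixed-initiative step: fill every empty slot whose vocabulary word is heard."""
    filled = dict(slots)
    for word in utterance.lower().split():
        for name, words in VOCAB.items():
            if word in words and filled.get(name) is None:
                filled[name] = word
    return filled


def next_prompt(slots):
    """Return a prompt for the first unfilled slot, or None when all are filled."""
    for name in SLOTS:
        if slots[name] is None:
            return f"Which {name}?"
    return None


empty = {"service": None, "year": None}

# One utterance fills both slots, so no follow-up question is needed.
both = fill_slots("my results for 2002 please", empty)
print(both, next_prompt(both))

# A partial answer falls back to a directed prompt for the missing slot.
partial = fill_slots("fees", empty)
print(partial, next_prompt(partial))
```

A directed dialog corresponds to calling `next_prompt` for every slot in turn, regardless of what the caller has already said.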
- Used inline & external grammars
- Wrote a JSGF grammar
  - JSGF (Java Speech Grammar Format)
  - GSL (Grammar Specification Language)
  - SRGF (Speech Recognition Grammar Format)
- Did some basic scripting
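Example (a sketch; the grammar is not the one written for the project): a small JSGF grammar of the kind that could be kept in an external file. The grammar name, rule name and word list are hypothetical. The helper function is an illustrative simplification that reads a flat list of alternatives; it is not a JSGF parser.

```python
import re

# Hypothetical external grammar, e.g. saved as services.gram.
# A page might reference it with something like
#   <grammar src="services.gram" type="application/x-jsgf"/>
# but the accepted type value depends on the browser (an assumption here).
SERVICES_JSGF = """#JSGF V1.0;
grammar services;
public <service> = results | fees | timetable;
"""


def rule_alternatives(jsgf_text, rule_name):
    """Return the alternatives of a flat rule such as <r> = a | b | c;

    Nested rules, weights, tags and optional groups are not handled.
    """
    pattern = r"<" + re.escape(rule_name) + r">\s*=\s*([^;]*);"
    match = re.search(pattern, jsgf_text)
    if match is None:
        raise KeyError(rule_name)
    return [alt.strip() for alt in match.group(1).split("|")]


print(rule_alternatives(SERVICES_JSGF, "service"))
```

An inline grammar would place the same alternatives directly inside the page's `<grammar>` element instead of in a separate file.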
Findings, Opinions and Views
- Speech recognition is still VERY sensitive to background noise and loudness of speech
- VoiceXML (as a language) is relatively easy to use
- VoiceXML integrates very well into existing web infrastructures
- Standalone browsers are behind in what they are able to support

- Very little technical assistance is available
- Standalone browsers are still quite “buggy”
- Documentation is limited
- Voice technologies (TTS, ASR) still not good enough
  - Audio files
- VoiceXML applications are being used commercially in the USA & Europe
- VoiceXML offers major advantages and cost savings for developing IVRs
  - Smaller developer learning curve
  - Able to generate pages dynamically
  - Uses much of the web infrastructure already in place for normal web pages
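Example (a sketch under stated assumptions): generating a VoiceXML page dynamically works the same way as generating an HTML page on a web server. The function name, form id, prompt wording and sample values below are hypothetical. In practice the data would come from a student database and the string would be returned as the HTTP response to the voice browser.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_balance_page(student_name, balance_rand):
    """Build a one-prompt VoiceXML page for a student (hypothetical example)."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "balance"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    # ElementTree escapes characters such as & and < when the page is serialised.
    prompt.text = f"Hello {student_name}. Your fee balance is {balance_rand} rand."
    return ET.tostring(vxml, encoding="unicode")


# Illustrative values only.
page = build_balance_page("Thandi & Co", 1250)
print(page)
```

Building the page through an XML library rather than string concatenation keeps the output well-formed even when the data contains special characters.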
- HCI issues are VERY important when developing voice applications
- A badly designed VUI (voice user interface) will simply NOT be used

Other Issues
- Possible threats
  - Poor voice technologies
  - SALT (Speech Application Language Tags)
    - New language combining voice & data technologies

- Standards still not established/comprehensive enough
  - Some issues still left to the developers’ discretion
  - Certain companies implement proprietary solutions to problems, resulting in platform and/or browser dependence
Conclusion
- VoiceXML has much potential
- Highly dependent on underlying technologies
- Standardisation still in progress
  - Reduces portability
  - Increases risk (e.g. grammars, rewriting)

- Already better than existing IVR development technologies
  - Flexible
  - Easier to learn
  - Easier to write & modify
- Many potential future benefits
- Here to stay?
  - Too early to tell
  - Highly dependent on external “ifs”
THANK YOU! Any Questions???