1
Speech recognition, understanding and conversational interfaces Alexander Rudnicky School of Computer Science http://www.cs.cmu.edu/~air
2
Outline Speech Types of speech interfaces Speech systems and their structure Designing speech interfaces Some applications –SpeechWear –Communicator
3
Speech as a signal The difference between speech and sound –“CD” quality vs. intelligible quality –high-quality audio is sampled at 44.1 / 48 kHz –desirable speech bandwidth: 0-8 kHz, 16 bits/sample –at 16 bits/sample: 256 kbps (tethered mic) –telephone: 64 kbps (and lower) –Compression: –MPEG: 64 kbps/channel and up (but not speech-optimal) –CELP: 16 kbps … 2.4 kbps (optimized for speech)
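The bit rates on this slide follow from sampling rate times sample size. A minimal sketch of that arithmetic, assuming a 16 kHz sampling rate for the tethered microphone (twice the 8 kHz bandwidth, per the Nyquist criterion) and the conventional 8 kHz, 8-bit sampling for telephone speech; neither figure is stated on the slide, and the function name is illustrative:

```python
def bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed PCM bit rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Assumption: a 0-8 kHz bandwidth needs a 16 kHz sampling rate (Nyquist).
wideband = bit_rate_kbps(16_000, 16)

# Assumption: telephone speech is sampled at 8 kHz with 8-bit samples.
telephone = bit_rate_kbps(8_000, 8)

# CD audio for comparison: 44.1 kHz, 16 bits, two channels.
cd = bit_rate_kbps(44_100, 16, channels=2)

print(wideband, telephone, cd)
```

Under these assumptions the wideband and telephone figures reproduce the 256 kbps and 64 kbps quoted on the slide.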
4
Speech for communication The difference between speech and language Speech recognition and speech understanding
5
Computers and speech Transcription –dictation, information retrieval Command and control –data entry, device control, navigation Information access –airline schedules, stock quotes Problem solving –travel planning, logistics
6
Speech system architecture SIGNAL PROCESSING DECODING UNDERSTANDING DISCOURSE ACTION
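The five stages can be read as a chain in which each stage consumes the previous stage's output. The sketch below is a toy illustration of that structure only: every stage body is a placeholder, text stands in for audio, and the city list and prompts are hypothetical.

```python
from functools import reduce
from typing import Dict, List

CITIES = {"boston", "pittsburgh"}  # hypothetical vocabulary


def signal_processing(utterance: str) -> List[str]:
    # Placeholder: a real front end turns a waveform into feature vectors.
    return utterance.lower().split()


def decoding(frames: List[str]) -> List[str]:
    # Placeholder: a real decoder searches acoustic and language models.
    return [frame.strip(".,!?") for frame in frames]


def understanding(words: List[str]) -> Dict[str, str]:
    # Placeholder: find "to <city>" and record the city as the destination.
    for previous, word in zip(words, words[1:]):
        if previous == "to" and word in CITIES:
            return {"destination": word}
    return {}


def discourse(meaning: Dict[str, str]) -> Dict[str, str]:
    # Placeholder: choose the next dialog move from what is known so far.
    if "destination" in meaning:
        return {"act": "ask_date", "destination": meaning["destination"]}
    return {"act": "ask_destination"}


def action(move: Dict[str, str]) -> str:
    # Placeholder: render the chosen move as a system prompt.
    if move["act"] == "ask_date":
        return "When would you like to fly to " + move["destination"].title() + "?"
    return "Where would you like to fly?"


STAGES = [signal_processing, decoding, understanding, discourse, action]


def run_pipeline(utterance: str) -> str:
    # Feed each stage's output into the next, in the order shown on the slide.
    return reduce(lambda data, stage: stage(data), STAGES, utterance)


print(run_pipeline("I would like to fly to Boston."))
```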
7
Varieties of speech systems
8
A generic speech system Input: speech. Components: Signal processing, Decoder, Parser, Post parser, Dialog manager, Domain agents (three shown), Language Generator, Speech synthesizer. Outputs: speech, display, effector
9
Decoding speech Signal processing Decoder Reduce dimensionality of signal noise conditioning Transcribe speech to words Acoustic models Language models Corpus-based statistical models
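A decoder typically picks the word sequence whose combined acoustic-model and language-model score is highest. The sketch below shows that combination on two competing hypotheses; the log-probability values are made up for illustration, and the language-model weight is a common tuning parameter rather than something taken from the slide.

```python
# Hypothetical log-probabilities for two transcriptions of the same audio.
ACOUSTIC_LOGPROB = {"recognize speech": -12.0, "wreck a nice beach": -11.5}
LANGUAGE_LOGPROB = {"recognize speech": -4.0, "wreck a nice beach": -9.0}


def decode(hypotheses, lm_weight=1.0):
    """Return the hypothesis with the best combined acoustic + language score."""
    def score(words):
        return ACOUSTIC_LOGPROB[words] + lm_weight * LANGUAGE_LOGPROB[words]
    return max(hypotheses, key=score)


hypotheses = list(ACOUSTIC_LOGPROB)
best = decode(hypotheses)
acoustic_only = decode(hypotheses, lm_weight=0.0)
print(best, "|", acoustic_only)
```

With these invented numbers the acoustic score alone slightly prefers the implausible transcription, and the language model tips the decision toward the plausible one.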
10
Creating models for recognition Acoustic models Language models Speech data Text data Train Transcribe*
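As one concrete instance of training a language model from text data, the sketch below estimates bigram probabilities by counting. It is a minimal maximum-likelihood estimate with no smoothing; the three-sentence corpus is invented, and acoustic-model training from transcribed speech is not covered.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(word | previous word) from raw counts (no smoothing)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Sentence boundary markers let the model learn likely first and last words.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for previous, word in zip(words, words[1:]):
            counts[previous][word] += 1
    model = {}
    for previous, following in counts.items():
        total = sum(following.values())
        model[previous] = {word: n / total for word, n in following.items()}
    return model


# Hypothetical text data for an air travel domain.
corpus = [
    "show flights to boston",
    "show flights to denver",
    "show fares to boston",
]
model = train_bigram_model(corpus)
print(model["to"])
```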
11
Understanding speech Parser Post parser Extract semantic content from utterance Introduce context and world knowledge into interpretation Grammar Context Domain Agents Grounding, knowledge engineering Ontology design, language acquisition
12
Interacting with the user Dialog manager Domain agent Domain agent Domain agent Guide interaction through task Map user inputs and system state into actions Interact with back-end(s) Interpret information using domain knowledge Task schemas Database Live data (e.g. Web) Domain expert Context Task analysis Knowledge engineering
13
Communicating with the user Language Generator Speech synthesizer Display Generator Action Generator Decide what to say to user (and how to phrase it)
14
Speech recognition and understanding Sphinx system –speaker-independent –continuous speech –large vocabulary ATIS system –air travel information retrieval –context management film clip
15
Command and control systems Small vocabularies, fixed syntax –OPEN WINDOW –MOVE OBJECT to –Applications: data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment) Large vocabulary, fixed syntax –Web browsing (?)
16
SpeechWear Vehicle inspection task –USMC mechanics, fixed inspection form –Wearable computer (COTS components) –html-based task representation film clip
17
Information access Moderate to very large vocabulary –IVR and frame-based systems Commercial systems: –Nuance: http://www.nuance.com/demo/index.html –SpeechWorks: http://www.speechworks.com/demos/demos.htm –many others
18
IVR and frame-based systems Interactive voice response (IVR) –interactions specified by a graph (typically a tree) Frame systems –ergodic graphs –states defined by multi-item forms
19
Graph-based systems Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan,... What type of loan are you interested in? Please say one of the following: Mortgage, Car, Personal,.....
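A graph-based dialog like this one can be represented as a table of states, each with a prompt and keyword-labeled edges. The sketch below encodes only the options named on the slide; the state names and the rule of staying in place on an unrecognized word are assumptions.

```python
# Each state has a prompt and edges labeled by the keyword that triggers them.
MENU = {
    "root": {
        "prompt": "Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan",
        "next": {"balance": "balance", "hours": "hours", "loan": "loan"},
    },
    "loan": {
        "prompt": "What type of loan are you interested in? "
                  "Please say one of the following: Mortgage, Car, Personal",
        "next": {"mortgage": "mortgage", "car": "car", "personal": "personal"},
    },
}


def next_state(state, heard):
    """Follow the edge labeled by the recognized keyword; stay put if none matches."""
    edges = MENU.get(state, {}).get("next", {})
    return edges.get(heard.strip().lower(), state)


def run_call(heard_words):
    """Walk the graph from the root, one recognized keyword at a time."""
    state = "root"
    for heard in heard_words:
        state = next_state(state, heard)
    return state


print(run_call(["Loan", "Mortgage"]))
```

Because every transition is fixed in advance, the caller can only move along the edges the designer drew.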
20
Frame-based systems I would like to fly to Boston –I’d like to go to Boston on Friday, … When would you like to fly? Destination_City: Boston Departure_Date: ______ Departure_Time: ______ Preferred_Airline: ______...
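The form on this slide can be sketched as a set of slots that any utterance may fill in any order, with the system prompting for the first slot still empty. The slot names and the prompt "When would you like to fly?" come from the slide; the other prompts and the regular-expression patterns are hypothetical stand-ins for a real parser.

```python
import re

SLOTS = ["Destination_City", "Departure_Date", "Departure_Time", "Preferred_Airline"]

PROMPTS = {
    "Destination_City": "Where would you like to fly?",       # hypothetical
    "Departure_Date": "When would you like to fly?",           # from the slide
    "Departure_Time": "What time would you like to leave?",    # hypothetical
    "Preferred_Airline": "Do you have a preferred airline?",   # hypothetical
}

# Hypothetical patterns; a real system would use a grammar-based parser.
PATTERNS = {
    "Destination_City": re.compile(r"\bto ([A-Z][a-z]+)"),
    "Departure_Date": re.compile(
        r"\bon (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b"
    ),
}


def update_frame(frame, utterance):
    """Fill every slot whose pattern matches, regardless of slot order."""
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            frame[slot] = match.group(1)
    return frame


def next_prompt(frame):
    """Ask about the first slot that is still empty, or return None when done."""
    for slot in SLOTS:
        if frame.get(slot) is None:
            return PROMPTS[slot]
    return None


frame = update_frame(dict.fromkeys(SLOTS), "I would like to fly to Boston")
print(frame["Destination_City"], "->", next_prompt(frame))
```

An utterance that supplies two values at once, such as the slide's second example, fills both slots in a single turn and moves the prompt on to the departure time.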
21
Frame-based systems [Figure: several forms, each with four unfilled placeholder slots] Transition on keyword or phrase
22
Some problems IVR systems work great, but only for well-structured (& “shallow”) tasks Frame systems are good for “tasks” that correspond to a single form leading to an action Neither approach does well with more complex problem-solving activities
23
Dialog Systems Problem solving activity; complex task –Order of progression through task depends on user goals (which can change) and system state (a back-end retrieval) and is not predictable. Track progress and help task along –mixed-initiative dialog Discourse phenomena –Users expect to “converse” with the system
24
Carnegie Mellon Communicator A dialog system that supports complex problem solving in a travel planning domain –create an itinerary using air schedule, hotel and car information –186 U.S. airports (>140k enplanements/yr) currently: >500 world airports Web-based data resources –Live and cached flight information –Airport, airline, etc. information
25
Value schema/handlers value transform receptors Domain Agent
26
Compound schema value transform Value_3 Value_1 Value_2 Domain Agent e.g. SQL query +
27
Schema ordering Value i Value j Value k Schema i Schema j Schema k Destination airport Date Time Flight Leg Value transform Available flights Database lookup
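One possible reading of the schema diagrams is that a compound schema collects its component values and applies its value transform once all of them are present, for example to build a database query for available flights. The sketch below follows that reading; it is an interpretation of the diagram, not the Communicator implementation, and the class, table, and column names are all hypothetical.

```python
class CompoundSchema:
    """Collects named values and fires a transform once all are present."""

    def __init__(self, required, transform):
        self.required = list(required)
        self.transform = transform
        self.values = {}

    def receive(self, name, value):
        # Ignore values this schema does not need.
        if name in self.required:
            self.values[name] = value
        # Apply the value transform only when the schema is complete.
        if all(key in self.values for key in self.required):
            return self.transform(self.values)
        return None


def flight_leg_query(values):
    # Hypothetical table and columns; parameters are kept separate from the SQL text.
    sql = "SELECT * FROM flights WHERE destination = ? AND date = ? AND time >= ?"
    return sql, (values["destination_airport"], values["date"], values["time"])


flight_leg = CompoundSchema(["destination_airport", "date", "time"], flight_leg_query)
first = flight_leg.receive("destination_airport", "BOS")
second = flight_leg.receive("date", "Friday")
query = flight_leg.receive("time", "09:00")
print(query)
```

The values may arrive in any order; the query is produced only by the call that completes the schema.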
28
Carnegie Mellon Communicator CMU Communicator –Call: 268-5144 –the information is accurate; you can use it for your own travel planning...
29
User-aware speech interfaces Predictable behavior on the system’s part Users communicate at different levels http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html
30
User-aware speech interfaces Content: task-centric utterances Possibility: What can I do? Orientation: Where are we? Navigation: moving through the task space Control: verbose/terse, listen! Customization: define this word
31
Speech interface guidelines Speech recognition is errorful System state is often opaque to the user http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html
32
Interface guidelines State transparency Input control Error recovery Error detection Error correction Log performance Application integration
33
Summary Speech and language communication Dialog structure Interface design