Presentation transcript: "Dialog Design - Speech & Natural Language"

1 Dialog Design - Speech & Natural Language This material has been developed by Georgia Tech HCI faculty, and continues to evolve. Contributors include Gregory Abowd, Jim Foley, Elizabeth Mynatt, Jeff Pierce, Colin Potts, Chris Shaw, John Stasko, and Bruce Walker. Comments directed to foley@cc.gatech.edu are encouraged. Permission is granted to use with acknowledgement for non-profit purposes. Last revision: February 2012.

2 Dialog Styles
1. Command languages
2. WIMP - Window, Icon, Menu, Pointer
3. Direct manipulation
4. Speech/natural language
5. Gesture & pen

3 Agenda
- What is speech?
- When to use speech
- Speech input
- Speech output
- Designing the speech interaction

4 A Voice Interface

5 When to Use Speech
- Hands busy - as in holding a cell phone
- Conditions preclude use of a keyboard
- Mobility required
- Eyes occupied
- Visual impairment
- Physical limitation - can't type
- Simply want to avoid the need to type!

6 Speech
What is speech?
- Vibrations of the vocal cords create sound ("ahh")
- Mouth, throat, tongue, and lips shape the sound
English speech
- 40 phonemes: 24 consonants, 16 vowels
Sounds transmit "language"

7 Waveform & Spectrogram
Speech does not equal written language

8 Parsing Sentences
"I told him to go back where he came from, but he wouldn't listen."

9 Speech Input
- Speaker recognition
- Discrete vs. continuous recognition
- Speaker-independent vs. speaker-dependent
- Speech understanding

10 Speaker Recognition
- Tells who is speaking (voice print), but not what they are saying
- Can also be used in video-conferencing to know who is speaking - for instance, to highlight their image

11 Discrete vs. Continuous Recognition
Discrete
- Say one word at a time
Continuous
- Sayallthewordsruntogether
- Computer calculates where one word ends and the next starts - much harder than discrete
- "Did you" vs. "Didja"

12 Speaker-independent vs. Speaker-dependent
Speaker-independent
- Are mostly discrete, word-oriented
- Must work with male, female & accented voices
- Typically used with phone-based systems
  - Banking, airline reservations
- Keys to success
  - Limited set of choices at each step: "Would you like to make domestic or international reservations?" "Speak your frequent flyer number"
  - Frequent feedback and error-correction opportunities: "Did you say 434568432?" (see the sketch below)
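The pattern on this slide - a small closed set of valid answers plus an explicit confirmation step - can be made concrete with a short sketch. This is purely illustrative: recognition is simulated with typed console input, and the class name and prompts are invented for the example.

import java.util.List;
import java.util.Scanner;

// Minimal sketch of one step in a speaker-independent phone dialog:
// offer a small fixed set of choices, then confirm what was "heard".
public class ReservationStep {
    static final List<String> CHOICES = List.of("domestic", "international");

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (true) {
            System.out.println("Would you like to make domestic or international reservations?");
            String heard = in.nextLine().trim().toLowerCase();
            if (!CHOICES.contains(heard)) {
                // frequent error-correction opportunity: re-prompt with the valid choices
                System.out.println("Sorry, I didn't understand. Please say 'domestic' or 'international'.");
                continue;
            }
            System.out.println("Did you say " + heard + "? (yes/no)");
            if (in.nextLine().trim().equalsIgnoreCase("yes")) {
                System.out.println("Okay, " + heard + " reservations.");
                break;   // confirmed, move on to the next step
            }
        }
    }
}

The dialog only advances once the caller confirms, which is what keeps a misrecognition from silently propagating into later steps.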

13 Questions?
- What are your experiences with phone-based systems?
- In what other ways might discrete word recognition systems be used?

14 Speaker-independent vs. Speaker-dependent
Speaker-dependent systems require initial training
- User reads text (several pages) known to the system
- Continues to get better after initial training
  - Partly by learning from mistakes/corrections
  - Partly by training the user :)
Vocabulary
- Some have 50,000+ words
  - Plus specialized vocabularies: medicine, law, …
Commercial systems include
- ViaVoice, Naturally Speaking, …

15 Recognition Systems
A typical system has 5 components (see the sketch below):
- Speech capture device - analog -> digital converter
- Digital signal processor - finds word boundaries, scales, filters, cuts out silences and noise
- Preprocessed signal storage - processed speech buffered for the recognition algorithm
- Reference patterns - stored templates or generative speech models for pattern-matching comparisons
- Pattern matching algorithm - goodness of fit from templates/model to the user's speech
  - Makes heavy use of probabilities and large finite state machines
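As a rough illustration of how the five components fit together, here is a skeleton in Java. Every type here is a hypothetical placeholder (there is no real audio capture or DSP code), and the goodness-of-fit function is a stand-in for a real measure such as a DTW distance or an HMM likelihood.

// Skeleton of the five-stage recognition pipeline described above.
interface SpeechCapture   { short[] capture(); }               // 1. analog -> digital samples
interface SignalProcessor { double[][] frames(short[] pcm); }  // 2. trim silence, scale, filter, frame
interface PatternStore    { Iterable<Template> templates(); }  // 4. stored reference patterns
record Template(String word, double[][] frames) {}

class Recognizer {
    private final SpeechCapture mic;
    private final SignalProcessor dsp;
    private final PatternStore store;

    Recognizer(SpeechCapture mic, SignalProcessor dsp, PatternStore store) {
        this.mic = mic; this.dsp = dsp; this.store = store;
    }

    // 5. Pattern matching: pick the template whose frames best fit the utterance.
    String recognize() {
        double[][] utterance = dsp.frames(mic.capture());  // 3. preprocessed signal, buffered
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Template t : store.templates()) {
            double score = goodnessOfFit(utterance, t.frames());
            if (score > bestScore) { bestScore = score; best = t.word(); }
        }
        return best;
    }

    // Stand-in for a real scoring function (DTW, HMM likelihood, ...).
    private double goodnessOfFit(double[][] a, double[][] b) {
        return -Math.abs(a.length - b.length);
    }
}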

16 Errors
Systems make four types of errors (see the scoring sketch below):
- Substitution - one word recognized as another
- Rejection - word detected, but not recognized
- Insertion - an extra word added
- Deletion - a word not detected
MUST HAVE means for user recovery from system errors!
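Substitutions, insertions and deletions are exactly what a standard edit-distance alignment between the reference transcript and the recognizer's output counts, which is how word error rate is normally scored; rejections show up as deletions in this view. A small sketch (the example sentences are invented):

// Hedged sketch: scoring a recognizer's output against a reference transcript.
public class WordErrorRate {
    // Word-level edit distance: substitutions, insertions and deletions each cost 1.
    public static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;   // all deletions
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;   // all insertions
        for (int i = 1; i <= ref.length; i++)
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        return d[ref.length][hyp.length];
    }

    public static void main(String[] args) {
        String[] ref = "did you want coffee".split(" ");   // what the user said
        String[] hyp = "didja want a coffee".split(" ");   // what the system heard
        double wer = (double) editDistance(ref, hyp) / ref.length;
        System.out.printf("word error rate: %.2f%n", wer);
    }
}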

17 Demonstrated Effectiveness of Command Input via Speech
- Using MacDraw to re-create drawings
- Baseline - use the mouse to menu-select commands and give positions on the drawing
- Speed improvement using keyboard accelerators: 14.5%
- Speed improvement using voice: 21.2%
Pausch, R. and Leatherby, J. H. "A Study Comparing Mouse-Only Input vs. Mouse-Plus-Voice Input for a Graphical Editor," Proceedings of the AVIOS '90 Voice I/O Systems Applications Conference, September 1990, pages 227-231, and http://www-2.cs.cmu.edu/~stage3/publications/91/conferences/avios/paper.html

18 Speech Understanding
- a.k.a. Natural Language (NL) Understanding
- Giving meaning to sequences of words
- Input might be spoken or typed
- Holy grail of Artificial Intelligence

19 Eliza - a "Faked" Understanding
Person: Men are all alike.
ELIZA: In what way?
Person: They're always bugging us about something or other.
ELIZA: Can you think of a specific example?
Person: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here?
Person: He says I'm depressed much of the time.
ELIZA: I am sorry to hear that you are depressed.
Person: It's true. I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
Person: I need some help, that much seems certain.
ELIZA: What would it mean to you if you got some help?
Person: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family.
http://www-ai.ijs.si/eliza/eliza.html
Weizenbaum, J., "ELIZA -- A computer program for the study of natural language communication between man and machine", Communications of the ACM 9(1):36-45, 1966.
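ELIZA's "understanding" is keyword spotting plus canned reflections, with no semantics at all. A tiny sketch in the same spirit (these rules are illustrative, not Weizenbaum's actual script):

import java.util.Map;
import java.util.Scanner;

// Minimal ELIZA-style "faked" understanding: match a keyword, emit a canned reply.
public class TinyEliza {
    static final Map<String, String> RULES = Map.of(
        "mother",    "Tell me more about your family.",
        "depressed", "I am sorry to hear that you are depressed.",
        "always",    "Can you think of a specific example?",
        "i need",    "What would it mean to you if you got that?"
    );

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.println("ELIZA: How do you do. Please tell me your problem.");
        while (in.hasNextLine()) {
            String line = in.nextLine().toLowerCase();
            String reply = "Please go on.";   // default when no keyword matches
            for (var rule : RULES.entrySet())
                if (line.contains(rule.getKey())) { reply = rule.getValue(); break; }
            System.out.println("ELIZA: " + reply);
        }
    }
}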

20 NL Terms
- Syntactic - grammar or structure
- Prosodic - inflection, stress, pitch, timing
- Pragmatic - situated context of utterance, location, time
- Semantic - meaning of words

21 SR/NLU Advantages
- Easy to learn and remember
- Powerful
- Fast, efficient (not always)
- Little screen real estate

22 SR/NLU Disadvantages
- Doesn't yet work well enough
- Assumes knowledge of the problem domain
  - Not prompted, like menus
  - "Time flies like an arrow"
- Requires typing skill (if keyboard)
- Expensive to implement

23 Remember
- A natural language interface need not be speech
- A speech interface need not use natural language (might be more command language-like)
- Wizard of Oz evaluations are particularly useful in this area

24 Speech Output
Male or female voice?
- Technical issues (frequency response of the phone)
- User preference (depends on the application)
Rate of speech
- Technically up to 550 wpm!
- Depends on the listener (blind users: 150-300 wpm)
Synthesized or pre-recorded?
- Synthesized: better coverage, flexibility
- Recorded: better quality, acceptance

25 Speech Output
Synthesis
- Quality depends on the software ($$)
- Influence of vocabulary and phrase choices
Recorded segments
- Store tones, then put them together
- The transitions are difficult (e.g., numbers)
Numbers
- Record three versions (rise, flat, fall)
- Logic to determine which version to play (sketched below)
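One plausible way to implement the "three versions per digit" idea is sketched below. The slide does not spell out the selection rule, so the rise/flat/fall placement and the file-naming scheme here are assumptions made for illustration only.

// Sketch: pick a recorded intonation variant for each digit of a spoken number.
public class DigitPlayback {
    enum Contour { RISE, FLAT, FALL }

    // Assumption: rising contour keeps the listener expecting more digits,
    // a flat one signals "almost done", and a falling one closes the phrase.
    static Contour contourFor(int position, int length) {
        if (position == length - 1) return Contour.FALL;
        if (position == length - 2) return Contour.FLAT;
        return Contour.RISE;
    }

    static void speakNumber(String digits) {
        for (int i = 0; i < digits.length(); i++) {
            Contour c = contourFor(i, digits.length());
            // e.g. hand "4_rise.wav" / "4_flat.wav" / "4_fall.wav" to an audio player
            System.out.printf("play %c_%s.wav%n", digits.charAt(i), c.name().toLowerCase());
        }
    }

    public static void main(String[] args) {
        speakNumber("434568432");
    }
}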

26 Speech Synthesis
Intonation - Command + Delete
- Is now the time for all good men to come to the aid of their country?
- Now is the time for all good men to come to the aid of their country.
- Now is the time for all good men to come to the aid of their country!
- Country! Country? Country. Country? Country! Country.
- How much wood could a woodchuck chuck if a woodchuck could chuck wood?
- A woodchuck could chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.

27 Designing the Interaction
- Constrain the vocabulary
  - Limit valid commands
  - Structure questions wisely (Yes/No)
  - Manage the interaction
  - Examples from the airline systems?
- Slow speech rate, but concise phrases
- Design for failsafe error recovery (see the sketch below)
- Process preview & progress indicator
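Failsafe error recovery usually means bounding the number of re-prompts and then falling back to something that always works, such as a human operator. A hedged sketch of that pattern; the three-attempt limit and the prompts are illustrative choices, not from the slides.

import java.util.Scanner;

// A yes/no question with a bounded number of re-prompts and a failsafe fallback.
public class YesNoQuestion {
    static Boolean ask(String prompt, Scanner in, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            System.out.println(prompt + " (yes/no)");
            String heard = in.nextLine().trim().toLowerCase();
            if (heard.equals("yes")) return true;
            if (heard.equals("no"))  return false;
            System.out.println("Sorry, I didn't catch that.");   // concise re-prompt
        }
        return null;   // caller escalates, e.g. to a human operator
    }

    public static void main(String[] args) {
        Boolean answer = ask("Do you want to hear your balance?", new Scanner(System.in), 3);
        System.out.println(answer == null ? "Transferring you to an agent." : "You said " + answer);
    }
}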

28 DOET and Voice Systems
- Minimize the gulf of interpretation and the gulf of execution
- Affordances
- Natural mappings
- Make state visible
- Conceptual model
- Provide feedback
- Constraints
- When in doubt, standardize

29 Speech Tools/Toolkits
- Java Speech SDK
  - FreeTTS 1.1.1 - http://freetts.sourceforge.net/docs/index.php (hello-world sketch below)
  - "For 3/4 or 75% of his time, Dr. Walker practices for $90 a visit on Dr. Dr., next to King Philip X of St. Lameer St. in Nashua NH."
- IBM JavaBeans for speech
- Visual/Real Basic speech SDK
- OS capabilities (speech recognition and synthesis built in to the OS) (TextEdit)
- VoiceXML
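For FreeTTS specifically, a minimal "hello world" looks roughly like the sketch below. It is based on the FreeTTS 1.x API; check the documentation linked above for the voices bundled with a given release ("kevin16" is assumed here to be the 16 kHz general-purpose voice shipped with the kit).

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

// Minimal FreeTTS usage sketch: look up a bundled voice and speak a sentence.
public class SpeakDemo {
    public static void main(String[] args) {
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        if (voice == null) {
            System.err.println("Voice not found; is freetts.jar on the classpath?");
            return;
        }
        voice.allocate();   // load the synthesis resources
        voice.speak("Now is the time for all good men to come to the aid of their country.");
        voice.deallocate();
    }
}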

30 Flow-Chart the UI First!
SUEDE - flow-chart for a speech interface - Landay et al., UC Berkeley (now U. Washington)

31 The End

