Outline
– Grammar-based speech recognition
– Statistical language model-based recognition
– Speech synthesis
– Dialog management
– Natural language processing
© 2013 by Larson Technical Services
Statistical Language Model-based Recognition Technologies
– Call routing
– Speaker identification
– Dictation
– Speaker emotion
– Voice pitch
– Age
– Gender
– Intoxication
– Stress
– Medical conditions (e.g., sleep apnea)
Also used for
– Optical Character Recognition (OCR)
– Machine vision
– Big data analysis
Example Verbal Phrases with Annotations
– “I have a problem with my bill” → accounting
– “Where is my order?” → shipping
– “My gadget arrived broken” → customer service
– “I need to return my gadget” → shipping
– “My statement is wrong” → accounting
– “I want a refund” → accounting
Annotate thousands of verbal phrases
Statistical Language Model-based Speech Recognition
– Recognition path: audio input → feature extraction → phoneme identification → classifier → category
– Training path: verbal phrases annotated with categories → statistical routines → Statistical Language Model (SLM), which the classifier consults
– Does not use grammars
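The training path on this slide can be sketched in a few lines. The sketch below assumes the input is already text (feature extraction and phoneme identification are left out) and uses a naive Bayes classifier with add-one smoothing as a stand-in for the "statistical routines"; the slides do not say which statistical method is used, and the names `train` and `classify` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# The six annotated phrases from the example slide. A real system would
# use thousands of annotated phrases.
ANNOTATED = [
    ("I have a problem with my bill", "accounting"),
    ("Where is my order?", "shipping"),
    ("My gadget arrived broken", "customer service"),
    ("I need to return my gadget", "shipping"),
    ("My statement is wrong", "accounting"),
    ("I want a refund", "accounting"),
]


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per category (assumed stand-in for the statistical routines)."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for phrase, category in examples:
        words = tokenize(phrase)
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(phrase, model):
    """Return the category with the highest naive Bayes log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        counts = word_counts[category]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(phrase):
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


model = train(ANNOTATED)
print(classify("My bill is wrong", model))
```

With only six training phrases the model is a toy, but it shows why no grammar is needed: the category is chosen from word statistics, so a phrase that never appeared in training can still be routed.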
Grammars vs. Statistical Language Models
Context-Free Grammars (CFGs)
– Hand-crafted rules
– Very high accuracy
– Easy to assemble
– Finite phrases
– Used for Interactive Voice Response (IVR) and command and control
Statistical Language Models (SLMs)
– Data-driven
– High accuracy
– Complex to assemble
– Natural language
– Used for dictation
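To make the "hand-crafted rules" and "finite phrases" points concrete, here is a minimal sketch of the grammar side only. It assumes a deliberately simplified slot grammar (a sequence of alternatives) rather than a full context-free grammar, and the command vocabulary is invented for illustration.

```python
from itertools import product

# Hypothetical command-and-control grammar: each slot lists its alternatives.
GRAMMAR = [
    ["turn", "switch"],
    ["on", "off"],
    ["the"],
    ["light", "fan"],
]


def expand(grammar):
    """Enumerate every phrase the slot grammar accepts."""
    return {" ".join(words) for words in product(*grammar)}


PHRASES = expand(GRAMMAR)


def recognize(utterance):
    """Accept the utterance only if it is one of the enumerated phrases."""
    normalized = " ".join(utterance.lower().split())
    return normalized in PHRASES


print(len(PHRASES))                           # 8 phrases in total
print(recognize("Turn on the light"))         # True
print(recognize("Please turn the light on"))  # False: not covered by the rules
```

The rules are quick to write and leave little room for misrecognition, but anything outside the enumerated phrases is rejected. A statistical model trades that precision for coverage of natural phrasing.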
Call Routing
– System: “How may I help you?”
– Caller: “Where is my order?”
– Classifier routes the call to one of: Accounting, Customer Support, Sales, …
Speaker Identification Technologies
General techniques for identifying people
– Something you know
– Something you have
– Something about you (your speech features)
Three basic functions for speaker identification
– Speaker registration
– Speaker authentication
– Speaker identification
Speaker Registration
– Joe says “Good morning” → Joe’s speech features
– Wanda says “Good morning” → Wanda’s speech features
– Fred says “Good morning” → Fred’s speech features
– Each speaker’s features are stored in the speech profiles
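Registration can be sketched as storing one feature vector per speaker. The sketch assumes the speech features are already available as plain lists of numbers and that a profile is the average of several enrollment samples; neither detail comes from the slides, and `register` is a hypothetical name.

```python
def register(profiles, name, samples):
    """Store the element-wise mean of the enrollment samples as the profile.

    `samples` is a list of equal-length feature vectors (assumed format).
    """
    count = len(samples)
    profiles[name] = [sum(column) / count for column in zip(*samples)]


profiles = {}
# Two made-up feature vectors from Wanda saying "Good morning" twice.
register(profiles, "Wanda", [[1.0, 2.0], [3.0, 4.0]])
print(profiles["Wanda"])  # [2.0, 3.0]
```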
Speaker Authentication
– Speaker claiming to be Wanda says “Good morning” → speech features
– Compare those features with Wanda’s speech features stored in the speech profiles
– Used to supplement or replace passwords
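The "compare" step is a one-to-one check against the claimed identity. A minimal sketch, assuming features are numeric vectors, cosine similarity as the comparison, and an arbitrary threshold of 0.95; production systems use more elaborate scoring, and all names and numbers here are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate(profiles, claimed_name, features, threshold=0.95):
    """Accept only if the features are close enough to the claimed speaker's profile."""
    profile = profiles.get(claimed_name)
    if profile is None:
        return False  # nobody registered under that name
    return cosine_similarity(profile, features) >= threshold


# Made-up stored profile and two made-up incoming feature vectors.
profiles = {"Wanda": [1.0, 0.0, 0.5]}
print(authenticate(profiles, "Wanda", [0.9, 0.1, 0.5]))  # True
print(authenticate(profiles, "Wanda", [0.0, 1.0, 0.0]))  # False
```

The threshold sets the trade-off between rejecting the genuine speaker (for example, one with a cold) and accepting an impostor.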
Speaker Identification
– Unknown speaker says “Good morning” → speech features
– Compare those features with every stored speech profile (Wanda’s, Joe’s, Fred’s)
– Select the best-matching profile
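Unlike authentication, identification is a one-to-many search with no claimed identity. A minimal sketch, assuming numeric feature vectors and Euclidean distance as the matching criterion (the slides do not specify one); the profile values are made up.

```python
import math


def identify(profiles, features):
    """Return the name whose stored profile is closest to the features."""
    return min(profiles, key=lambda name: math.dist(profiles[name], features))


# Made-up stored profiles for the three registered speakers.
profiles = {
    "Wanda": [1.0, 0.0, 0.5],
    "Joe": [0.0, 1.0, 0.2],
    "Fred": [0.5, 0.5, 0.9],
}
print(identify(profiles, [0.1, 0.9, 0.3]))  # Joe
```

This sketch always returns some registered speaker. A practical system would also need a distance cutoff so that an unregistered voice is rejected instead of matched to the nearest profile.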
Speaker Identification Technologies
Advantages
– Are unobtrusive
– Are location independent
– Require no special equipment
– Replace passwords
Disadvantages
– Sometimes fail
   – Siblings with similar voice profiles
   – Teenage male voice “break”
   – Colds, sore throats, sore lips, etc.
   – Variety of microphones
   – Tape recordings
Statistical Language Model-based Recognition Technologies
Widely available
– Call routing
– Speaker authentication
– Dictation
Actively being researched
– Speaker emotion
– Voice pitch
– Age
– Gender
– Intoxication
– Stress
– Medical conditions (e.g., sleep apnea)