© 2013 by Larson Technical Services
Outline
    Grammar-based speech recognition
    Statistical language model-based recognition
    Speech Synthesis
    Dialog Management
    Natural Language Processing
Speech Recognition (ASR, STT)
    Grammar-Based
        Developer specifies words to be recognized
    Statistical Language Models
        Developer records and tags phrases
Recognition Technology | Source | Target | Typical Technique
    Automatic speech recognition (ASR) | Spoken language | Text | Hidden Markov Model, Neural Net, Table lookup
    Touchtone recognition | Caller presses buttons on phone | Digits | Tone recognition
    Speaker Identification | Names of registered callers | Table lookup
    Voice Activity Detection | Caller speaks or does not speak | “On” or “Off”
    Attention word
    Classification | Categories | Statistical analysis
    Language Identification | National language names
Touchtone Recognition
    Caller responds to voice menus by pressing touchtone buttons on the telephone keypad
    Advantages
        Highly accurate
    Disadvantages
        Lost in space: callers lose track of where they are in the menu hierarchy
        Time-consuming menus where user must convert choice to a digit
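Why touchtone recognition is highly accurate can be seen in a small sketch: each key is a pair of fixed tones, so detection reduces to finding the strongest row tone and the strongest column tone. The code below uses the standard DTMF keypad frequencies and the Goertzel recurrence; the function names, the 8000 Hz sample rate, and the 50 ms tone length are assumptions made for illustration, and `make_tone` exists only to produce test input.

```python
import math

# Standard DTMF keypad frequencies in Hz (one row tone plus one column tone per key).
ROW_FREQS = [697, 770, 852, 941]
COL_FREQS = [1209, 1336, 1477]
KEYPAD = ["123", "456", "789", "*0#"]


def goertzel_power(samples, freq, sample_rate):
    """Return the signal power near `freq` using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone, then look up the key."""
    row = max(range(len(ROW_FREQS)),
              key=lambda i: goertzel_power(samples, ROW_FREQS[i], sample_rate))
    col = max(range(len(COL_FREQS)),
              key=lambda i: goertzel_power(samples, COL_FREQS[i], sample_rate))
    return KEYPAD[row][col]


def make_tone(key, sample_rate=8000, duration=0.05):
    """Synthesize the two-tone signal for a key (hypothetical test helper)."""
    row = next(r for r, keys in enumerate(KEYPAD) if key in keys)
    col = KEYPAD[row].index(key)
    n = int(sample_rate * duration)
    return [
        math.sin(2.0 * math.pi * ROW_FREQS[row] * t / sample_rate)
        + math.sin(2.0 * math.pi * COL_FREQS[col] * t / sample_rate)
        for t in range(n)
    ]


if __name__ == "__main__":
    print(detect_key(make_tone("5")))
```

A production detector would also check minimum tone duration and reject speech that happens to contain these frequencies; the sketch omits both.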
Speech Recognition
    Advantages
        User does not convert choices to a digit
    Disadvantages
        Occasional failure to recognize what user said
        Time-consuming dialogs
            Users may interrupt prompts by “barge-in”
Speech Recognition Engines
                                      Low-end               High-end              Other
    Speaking mode                     Isolated (discrete)   Continuous            Keywords
    Enrollment                        Speaker dependent     Speaker independent   Adaptive
    Vocabulary size                   Small                 Large                 Switch vocabularies
    Speaking style                    Read                  Spontaneous
    Number of simultaneous callers    Single-threaded       Multi-threaded
How Speech Recognition Works
    Processing stages, listed from output (top) to input (bottom):
    Words and Phrases
    Word Identification
    Phoneme Identification
    Feature Extraction (digital signal processing of the audio signal)
    Audio Input
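The feature extraction stage can be illustrated with a deliberately simple sketch: the audio is cut into short overlapping frames and one number (log energy) is computed per frame. Real recognizers typically compute richer spectral features per frame; the frame length, hop size, and function names here are assumptions chosen for an 8000 Hz signal, not values taken from any particular engine.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split the signal into overlapping frames of `frame_len` samples."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def log_energy(frame):
    """Log of the summed squared amplitude; the small constant avoids log(0)."""
    return math.log(sum(x * x for x in frame) + 1e-10)


def extract_features(samples, frame_len=200, hop=80):
    """Return one log-energy value per frame (25 ms frames, 10 ms hop at 8 kHz)."""
    return [log_energy(frame) for frame in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    silence = [0.0] * 400
    tone = [math.sin(2.0 * math.pi * 440 * t / 8000) for t in range(400)]
    for value in extract_features(silence + tone):
        print(round(value, 2))
```

The sequence of per-frame values, rather than the raw samples, is what the later phoneme identification stage consumes.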
How Speech Recognition Works
    Words and Phrases
    Word Identification
    Phoneme Identification
        Acoustic Model
            Transforms features to phonemes
            Sounds in a language
            Different for each language
            May be speaker dependent (speaker must train model)
            May be speaker independent (pretrained)
            Usually supplied by ASR vendor
    Feature Extraction
    Audio Input
How Speech Recognition Works
    Words and Phrases
    Word Identification
        Language Model
            Words in a language and their pronunciation
            Transforms phonemes to words
    Phoneme Identification
    Feature Extraction
    Audio Input
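The idea of "words in a language and their pronunciation" can be sketched as a lookup from phoneme sequences to words. The three-entry lexicon below uses ARPAbet-style symbols and is purely illustrative, and the greedy longest-match strategy is an assumption made to keep the example short; real recognizers score many competing hypotheses probabilistically instead.

```python
# Hypothetical toy lexicon: phoneme sequence -> word (ARPAbet-style symbols).
LEXICON = {
    ("W", "AH", "N"): "one",
    ("T", "UW"): "two",
    ("TH", "R", "IY"): "three",
}


def phonemes_to_words(phonemes):
    """Segment a phoneme list into words by greedy longest match.

    Returns the list of words, or None if some phonemes match no lexicon entry.
    """
    max_len = max(len(pron) for pron in LEXICON)
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate pronunciation first.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            word = LEXICON.get(tuple(phonemes[i:i + n]))
            if word is not None:
                words.append(word)
                i += n
                break
        else:
            return None  # No pronunciation starts at position i.
    return words


if __name__ == "__main__":
    print(phonemes_to_words(["W", "AH", "N", "T", "UW", "TH", "R", "IY"]))
```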
Grammar-based Speech Recognition
    Words and Phrases
    Word Identification
        Language Model, built by the Grammar Compiler from the Grammar (a context-free grammar, CFG)
        Lexicon
    Phoneme Identification
    Feature Extraction
    Audio Input
Where are grammars used?
    Interactive Voice Response (IVR) systems
    Automated telephone agents
    Each step may use a different grammar
    Grammar defines only the words which the user may speak during a step
    Application developers specify grammars for each step
    The same grammar may be reused in multiple applications
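The point that each step has its own grammar can be sketched as a table from dialog step to the set of phrases allowed in that step. The step names and the `recognize` function are hypothetical; a real platform would hand the active grammar to the speech recognizer rather than compare strings.

```python
# Hypothetical dialog steps, each with the phrases its grammar allows.
STEP_GRAMMARS = {
    "ask_color": {"red", "green", "blue"},
    "ask_size": {"small", "medium", "large"},
}


def recognize(step, utterance):
    """Return the normalized utterance if the step's grammar allows it, else None."""
    phrase = " ".join(utterance.lower().split())
    return phrase if phrase in STEP_GRAMMARS[step] else None


if __name__ == "__main__":
    print(recognize("ask_color", "Red"))
    print(recognize("ask_size", "red"))
```

The same word is accepted in one step and rejected in another, which is how a grammar narrows the recognizer's choices at each point in the conversation.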
Example Grammar
    <grammar type="application/srgs+xml" root="single_digit" mode="voice">
      <rule id="single_digit">
        <one-of>
          <item> one </item>
          <item> two </item>
          <item> three </item>
          <item> four </item>
          <item> five </item>
          <item> six </item>
          <item> seven </item>
          <item> eight </item>
          <item> nine </item>
        </one-of>
      </rule>
    </grammar>
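To see what a recognizer is given, the single-digit grammar can be read with Python's standard XML parser to list the words it permits. This is a minimal sketch that assumes the un-namespaced form used on the slide and a root rule made of one flat list of alternatives; the function name `root_alternatives` is made up for the example.

```python
import xml.etree.ElementTree as ET

GRAMMAR_XML = """
<grammar type="application/srgs+xml" root="single_digit" mode="voice">
  <rule id="single_digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
"""


def root_alternatives(xml_text):
    """Return the words listed under <one-of> in the grammar's root rule."""
    grammar = ET.fromstring(xml_text.strip())
    root_id = grammar.get("root")
    # Find the rule whose id matches the grammar's root attribute.
    rule = next(r for r in grammar.findall("rule") if r.get("id") == root_id)
    return [item.text.strip() for item in rule.findall("one-of/item")]


if __name__ == "__main__":
    print(root_alternatives(GRAMMAR_XML))
```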
Example Grammar
    <grammar type="application/srgs+xml" root="twenties" mode="voice">
      <rule id="twenties">
        <one-of>
          <item> twenty </item>
          <item> twenty <ruleref uri="#single_digit"/> </item>
        </one-of>
      </rule>
      <rule id="single_digit">
        <one-of>
          <item> one </item>
          <item> two </item>
          <item> three </item>
          <item> four </item>
          <item> five </item>
          <item> six </item>
          <item> seven </item>
          <item> eight </item>
          <item> nine </item>
        </one-of>
      </rule>
    </grammar>
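How a rule reference widens a grammar can be checked by expanding the "twenties" grammar into every phrase it accepts. The sketch below handles only the elements used on these slides (rule, one-of, item, and local ruleref), treats everything else as a sequence, and has no guard against recursive rules; the `expand` and `phrases` names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product

GRAMMAR_XML = """
<grammar type="application/srgs+xml" root="twenties" mode="voice">
  <rule id="twenties">
    <one-of>
      <item> twenty </item>
      <item> twenty <ruleref uri="#single_digit"/> </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item> <item> two </item> <item> three </item>
      <item> four </item> <item> five </item> <item> six </item>
      <item> seven </item> <item> eight </item> <item> nine </item>
    </one-of>
  </rule>
</grammar>
"""


def expand(node, rules):
    """Return every phrase the node can match (no protection against recursion)."""
    if node.tag == "ruleref":
        # Local reference such as "#single_digit": expand the named rule.
        return expand(rules[node.get("uri").lstrip("#")], rules)
    if node.tag == "one-of":
        # Alternatives: the union of what each item matches.
        return [phrase for item in node for phrase in expand(item, rules)]
    # <rule> or <item>: a sequence of literal text and child elements.
    parts = []
    if node.text and node.text.strip():
        parts.append([node.text.strip()])
    for child in node:
        parts.append(expand(child, rules))
        if child.tail and child.tail.strip():
            parts.append([child.tail.strip()])
    return [" ".join(combo) for combo in product(*parts)]


def phrases(xml_text):
    """Expand the grammar's root rule into the full list of accepted phrases."""
    grammar = ET.fromstring(xml_text.strip())
    rules = {rule.get("id"): rule for rule in grammar.findall("rule")}
    return expand(rules[grammar.get("root")], rules)


if __name__ == "__main__":
    for phrase in phrases(GRAMMAR_XML):
        print(phrase)
```

The expansion yields ten phrases: "twenty" on its own plus "twenty one" through "twenty nine".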
Grammar with 3 Rules
    <grammar type="application/srgs+xml" root="request" mode="voice">
      <rule id="request">
        <ruleref uri="#color"/>
        <ruleref uri="#size"/>
      </rule>
      <rule id="size">
        <one-of>
          <item> small </item>
          <item> medium </item>
          <item> large </item>
        </one-of>
      </rule>
      <rule id="color">
        <one-of>
          <item> red </item>
          <item> green </item>
          <item> blue </item>
        </one-of>
      </rule>
    </grammar>
Grammar Exercise
    Extend the grammar to include the combination of “color,” “size,” and “product,” where product may be “T-shirt” or “vest.”
XML and ABNF Grammar Formats
    XML format
        <rule id="single_digit">
          <one-of>
            <item> one </item>
            <item> two </item>
            <item> three </item>
            <item> four </item>
            <item> five </item>
            <item> six </item>
            <item> seven </item>
            <item> eight </item>
            <item> nine </item>
          </one-of>
        </rule>
        Verbose
        Validated by XML tools
    ABNF format
        $single_digit = one | two | three | four | five | six | seven | eight | nine;
        Terse
        Familiar to compiler experts
        Not validated by XML tools
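The correspondence between the two formats can be made concrete with a small converter from an XML rule to its ABNF line. This is a sketch that assumes the rule contains a single flat list of alternatives, as on the slide; it does not handle rule references, repeats, or weights, and `rule_to_abnf` is a made-up name.

```python
import xml.etree.ElementTree as ET

RULE_XML = """
<rule id="single_digit">
  <one-of>
    <item> one </item>
    <item> two </item>
    <item> three </item>
  </one-of>
</rule>
"""


def rule_to_abnf(rule_xml):
    """Convert a rule holding one flat <one-of> list into an ABNF rule definition."""
    rule = ET.fromstring(rule_xml.strip())
    alternatives = [item.text.strip() for item in rule.findall("one-of/item")]
    return "$" + rule.get("id") + " = " + " | ".join(alternatives) + ";"


if __name__ == "__main__":
    print(rule_to_abnf(RULE_XML))
```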
Summary: Grammar-Based Speech Recognition
    Various speech recognition technologies are used for a large variety of applications.
    Speech grammars are used to constrain the words that a user may speak during a single step of an automated conversation.
    Trained application developers create a grammar for each step of an automated conversation.
Answer: Grammar Exercise
    <grammar type="application/srgs+xml" root="request" mode="voice">
      <rule id="request">
        <ruleref uri="#color"/>
        <ruleref uri="#size"/>
        <ruleref uri="#product"/>
      </rule>
      <rule id="size">
        <one-of>
          <item> small </item>
          <item> medium </item>
          <item> large </item>
        </one-of>
      </rule>
      <rule id="color">
        <one-of>
          <item> red </item>
          <item> green </item>
          <item> blue </item>
        </one-of>
      </rule>
      <rule id="product">
        <one-of>
          <item> T-shirt </item>
          <item> vest </item>
        </one-of>
      </rule>
    </grammar>
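One way to check the answer is to count what it accepts. Because the request rule is a fixed sequence of color, size, and product, the accepted phrases are the ordered combinations of the three word lists. The sketch below mirrors the rules as plain Python lists instead of parsing the XML; the list and function names are illustrative.

```python
from itertools import product

# Word lists copied from the color, size, and product rules.
COLORS = ["red", "green", "blue"]
SIZES = ["small", "medium", "large"]
PRODUCTS = ["T-shirt", "vest"]


def request_phrases():
    """Every phrase the request rule accepts: color, then size, then product."""
    return [" ".join(combo) for combo in product(COLORS, SIZES, PRODUCTS)]


if __name__ == "__main__":
    phrases = request_phrases()
    print(len(phrases))
    print("red small T-shirt" in phrases)
    print("small red T-shirt" in phrases)
```

The count is 3 x 3 x 2 = 18 phrases. Word order matters: "red small T-shirt" is in the grammar, while "small red T-shirt" is not, because the request rule lists color before size.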