© 2013 by Larson Technical Services


1 Outline
Grammar-based speech recognition
Statistical language model-based recognition
Speech synthesis
Dialog management
Natural language processing

2 Speech Recognition (ASR, STT)
Grammar-based: the developer specifies the words to be recognized
Statistical language models: the developer records and tags phrases

3 Recognition Technology
Recognition technology | Source | Target | Typical technique
Automatic speech recognition (ASR) | Spoken language | Text | Hidden Markov model, neural net, table lookup
Touchtone recognition | Caller presses buttons on phone | Digits | Tone recognition
Speaker identification | | Names of registered callers | Table lookup
Voice activity detection | Caller speaks or does not speak | “On” or “Off” |
Attention word | | |
Classification | | Categories | Statistical analysis
Language identification | | National language names |

4 Touchtone Recognition
Caller responds to voice menus by pressing touchtone buttons on the telephone keypad
Advantages:
  Highly accurate
Disadvantages:
  Lost in space
  Time-consuming menus where the user must convert a choice to a digit

5 Speech Recognition
Advantages:
  User does not convert choices to a digit
Disadvantages:
  Occasional failure to recognize what the user said
  Time-consuming dialogs
  Users may interrupt prompts by “barge-in”

6 Speech Recognition Engines
Characteristic | Low-end | High-end | Other
Speaking mode | Isolated (discrete) | Continuous | Keywords
Enrollment | Speaker dependent | Speaker independent | Adaptive
Vocabulary size | Small | Large | Switch vocabularies
Speaking style | Read | Spontaneous |
Number of simultaneous callers | Single-threaded | Multi-threaded |

7 How Speech Recognition Works
Audio input -> Feature extraction (digital signal processing) -> Phoneme identification -> Word identification -> Words and phrases

8 How Speech Recognition Works
Audio input -> Feature extraction -> Phoneme identification -> Word identification -> Words and phrases
Acoustic model (used for phoneme identification):
  Transforms features to phonemes
  Sounds in a language
  Different for each language
  May be speaker dependent (speaker must train the model)
  May be speaker independent (pretrained)
  Usually supplied by the ASR vendor

9 How Speech Recognition Works
Audio input -> Feature extraction -> Phoneme identification -> Word identification -> Words and phrases
Language model (used for word identification):
  Words in a language and their pronunciation
  Transforms phonemes to words
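
The "phonemes to words" step can be pictured as a lookup against a table of pronunciations. The following is only a toy sketch: the three-word lexicon, the ARPAbet-style phoneme symbols, and the function name phonemes_to_words are all assumptions made for illustration, and the greedy longest-match strategy stands in for the probabilistic search that real recognizers perform over many competing hypotheses.

```python
# Toy lexicon: each word maps to one pronunciation (hypothetical data,
# ARPAbet-style symbols chosen only for illustration).
LEXICON = {
    "one": ("W", "AH", "N"),
    "two": ("T", "UW"),
    "three": ("TH", "R", "IY"),
}


def phonemes_to_words(phonemes):
    """Greedily segment a phoneme sequence into lexicon words.

    Returns a list of words, or None if some stretch of phonemes
    matches no pronunciation in the lexicon.
    """
    by_pronunciation = {pron: word for word, pron in LEXICON.items()}
    longest = max(len(pron) for pron in by_pronunciation)
    words, i = [], 0
    while i < len(phonemes):
        # Try the longest candidate chunk first, then shorter ones.
        for n in range(min(longest, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + n])
            if chunk in by_pronunciation:
                words.append(by_pronunciation[chunk])
                i += n
                break
        else:
            return None  # no word starts at position i
    return words


print(phonemes_to_words(["T", "UW", "W", "AH", "N"]))
```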

10 Grammar-based Speech Recognition
Audio input -> Feature extraction -> Phoneme identification -> Word identification -> Words and phrases
Context-free grammar (CFG): Grammar -> Grammar compiler -> Language model
Language model and lexicon: used for word identification

11 Where are grammars used?
Interactive Voice Response (IVR) systems
Automated telephone agents
Each step may use a different grammar
A grammar defines only the words that the user may speak during a step
Application developers specify a grammar for each step
The same grammar may be reused in multiple applications
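
As a rough illustration of "a different grammar for each step", the sketch below keeps one set of allowed phrases per dialog step and accepts an utterance only if the active step's grammar contains it. The step names, the phrase sets, and the accept function are hypothetical; a real platform would hand compiled grammars to the recognizer rather than compare text.

```python
# Hypothetical dialog steps, each with its own set of allowed phrases.
STEP_GRAMMARS = {
    "ask_color": {"red", "green", "blue"},
    "ask_size": {"small", "medium", "large"},
}


def accept(step, utterance):
    """Return the normalized phrase if the step's grammar allows it, else None."""
    phrase = " ".join(utterance.lower().split())
    return phrase if phrase in STEP_GRAMMARS[step] else None


print(accept("ask_color", " Red "))   # allowed in this step
print(accept("ask_color", "large"))   # valid word, but not in this step's grammar
```

The second call shows the constraint at work: "large" is a perfectly good answer to the size question, yet it is rejected while the color grammar is active.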

12 Example Grammar
<grammar type="application/srgs+xml" root="single_digit" mode="voice">
  <rule id="single_digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
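
To see which words this grammar allows, one option is to load it with Python's standard xml.etree.ElementTree module and read the alternatives of the root rule. This is a minimal sketch, assuming the grammar declares no XML namespace (as on the slide) and that the root rule is a flat <one-of> list; the function name root_alternatives is made up for the example.

```python
import xml.etree.ElementTree as ET

# The single-digit grammar from the slide, with no namespace declared.
GRAMMAR = """
<grammar type="application/srgs+xml" root="single_digit" mode="voice">
  <rule id="single_digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
"""


def root_alternatives(grammar_xml):
    """List the words offered by the <one-of> in the grammar's root rule."""
    grammar = ET.fromstring(grammar_xml)
    root_id = grammar.get("root")
    for rule in grammar.findall("rule"):
        if rule.get("id") == root_id:
            return [item.text.strip() for item in rule.findall("one-of/item")]
    raise ValueError(f"root rule {root_id!r} not found")


words = root_alternatives(GRAMMAR)
print(words)
print("five" in words, "ten" in words)
```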

13 Example Grammar
<grammar type="application/srgs+xml" root="twenties" mode="voice">
  <rule id="twenties">
    <one-of>
      <item> twenty </item>
      <item> twenty <ruleref uri="#single_digit"/> </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
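
The effect of <ruleref> can be checked by expanding the grammar into every phrase it accepts. The sketch below is an illustrative expander, not an SRGS implementation: it assumes no XML namespace, handles only <rule>, <item>, <one-of>, and local "#id" references, and ignores repeats, weights, and tags. The names expand and phrases are invented for the example.

```python
import xml.etree.ElementTree as ET
from itertools import product

GRAMMAR = """
<grammar type="application/srgs+xml" root="twenties" mode="voice">
  <rule id="twenties">
    <one-of>
      <item> twenty </item>
      <item> twenty <ruleref uri="#single_digit"/> </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item> <item> two </item> <item> three </item>
      <item> four </item> <item> five </item> <item> six </item>
      <item> seven </item> <item> eight </item> <item> nine </item>
    </one-of>
  </rule>
</grammar>
"""


def expand(node, rules):
    """Return every phrase (as a list of words) that `node` can match."""
    if node.tag == "ruleref":
        # Only local references of the form "#rule_id" are supported here.
        return expand(rules[node.get("uri").lstrip("#")], rules)
    if node.tag == "one-of":
        # Alternatives: the union of what each child item can match.
        return [phrase for item in node for phrase in expand(item, rules)]
    # <rule> or <item>: a sequence of literal words and child expansions.
    parts = [[(node.text or "").split()]]
    for child in node:
        parts.append(expand(child, rules))
        parts.append([(child.tail or "").split()])
    return [sum(combo, []) for combo in product(*parts)]


def phrases(grammar_xml):
    """List all phrases accepted by the grammar's root rule."""
    grammar = ET.fromstring(grammar_xml)
    rules = {rule.get("id"): rule for rule in grammar.findall("rule")}
    root_rule = rules[grammar.get("root")]
    return [" ".join(words) for words in expand(root_rule, rules)]


for phrase in phrases(GRAMMAR):
    print(phrase)
```

Enumerating phrases like this only works for small, non-recursive grammars; it is meant as a way to inspect a grammar, not as how a recognizer uses one.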

14 Grammar with 3 Rules
<grammar type="application/srgs+xml" root="request" mode="voice">
  <rule id="request">
    <ruleref uri="#color"/>
    <ruleref uri="#size"/>
  </rule>
  <rule id="size">
    <one-of>
      <item> small </item>
      <item> medium </item>
      <item> large </item>
    </one-of>
  </rule>
  <rule id="color">
    <one-of>
      <item> red </item>
      <item> green </item>
      <item> blue </item>
    </one-of>
  </rule>
</grammar>
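
Because the "request" rule lists its two rule references in sequence, the accepted phrases are every color followed by every size, in that order. A small sketch of that combination, with the word lists copied from the grammar and the function name request_phrases chosen only for this example:

```python
from itertools import product

# Alternatives copied from the "color" and "size" rules above.
COLORS = ["red", "green", "blue"]
SIZES = ["small", "medium", "large"]


def request_phrases():
    """Every phrase matched by a color reference followed by a size reference."""
    return [f"{color} {size}" for color, size in product(COLORS, SIZES)]


accepted = request_phrases()
print(len(accepted))
print("red small" in accepted)   # color first, then size
print("small red" in accepted)   # reversed order is not in the grammar
```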

15 Grammar Exercise
Extend the grammar to include the combination of “color,” “size,” and “product,” where product may be “T-shirt” or “vest.”

16 XML and ABNF Grammar Formats
XML format:
<rule id="single_digit">
  <one-of>
    <item> one </item>
    <item> two </item>
    <item> three </item>
    <item> four </item>
    <item> five </item>
    <item> six </item>
    <item> seven </item>
    <item> eight </item>
    <item> nine </item>
  </one-of>
</rule>
  Verbose
  Validated by XML tools
ABNF format:
$single_digit = one | two | three | four | five | six | seven | eight | nine;
  Terse
  Familiar to compiler experts
  Not validated by XML tools
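
The two formats carry the same information, which a short conversion makes concrete. This sketch turns a flat <one-of> rule into a one-line ABNF-style definition; it assumes no XML namespace and a rule that contains nothing but a single list of alternatives, and the function name rule_to_abnf is hypothetical. The trailing semicolon follows the ABNF form of the W3C grammar specification.

```python
import xml.etree.ElementTree as ET

RULE = """
<rule id="single_digit">
  <one-of>
    <item> one </item> <item> two </item> <item> three </item>
    <item> four </item> <item> five </item> <item> six </item>
    <item> seven </item> <item> eight </item> <item> nine </item>
  </one-of>
</rule>
"""


def rule_to_abnf(rule_xml):
    """Render a flat <one-of> rule as '$id = a | b | c;'."""
    rule = ET.fromstring(rule_xml)
    alternatives = [item.text.strip() for item in rule.findall("one-of/item")]
    return f"${rule.get('id')} = {' | '.join(alternatives)};"


print(rule_to_abnf(RULE))
```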

17 Summary: Grammar-Based Speech Recognition
Various speech recognition technologies are used for a large variety of applications.
Speech grammars are used to constrain the words that a user may speak during a single step of an automated conversation.
Trained application developers create a grammar for each step of an automated conversation.

18 Answer: Grammar Exercise
<grammar type="application/srgs+xml" root="request" mode="voice">
  <rule id="request">
    <ruleref uri="#color"/>
    <ruleref uri="#size"/>
    <ruleref uri="#product"/>
  </rule>
  <rule id="size">
    <one-of>
      <item> small </item>
      <item> medium </item>
      <item> large </item>
    </one-of>
  </rule>
  <rule id="color">
    <one-of>
      <item> red </item>
      <item> green </item>
      <item> blue </item>
    </one-of>
  </rule>
  <rule id="product">
    <one-of>
      <item> T-shirt </item>
      <item> vest </item>
    </one-of>
  </rule>
</grammar>

