Published by Jordan Harper. Modified over 9 years ago.
1
Speech Recognition Yonglei Tao
2
Voice-Activated GPS
3
Voice User Interface (VUI)
A VUI allows human interaction with computers through a voice/speech platform
Basic components
  Speech recognition
  Meaning extraction
  Response generation
  Speech output
Benefits
  Loosens some physical constraints
  Provides tools for universal design (disability and situational impairments)
  Intuitive and efficient
4
System Architecture
5
Components
Endpointing: speech to endpointed utterance
Feature extraction: endpointed utterance to feature vectors
Recognition: feature vectors to word string(s)
Natural language understanding: word string(s) to meaning(s)
Dialog management: meaning to actions
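The endpointing step can be illustrated with a minimal sketch. The slide does not say how endpoints are found; the version below assumes a simple per-frame energy threshold, and the class name `EnergyEndpointer`, the frame size, and the threshold value are all hypothetical choices made for illustration. Production endpointers are typically more elaborate.

```java
import java.util.Arrays;

/** Energy-based endpointing sketch: finds where speech starts and ends in a sample buffer. */
final class EnergyEndpointer {
    /**
     * Returns {startSample, endSample} (end exclusive) spanning the first through the last
     * frame whose mean energy exceeds the threshold, or null if no frame does.
     */
    static int[] findEndpoints(double[] samples, int frameSize, double threshold) {
        int firstFrame = -1;
        int lastFrame = -1;
        int frameCount = samples.length / frameSize; // a trailing partial frame is ignored
        for (int f = 0; f < frameCount; f++) {
            double energy = 0.0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++) {
                energy += samples[i] * samples[i];
            }
            energy /= frameSize; // mean squared amplitude of this frame
            if (energy > threshold) {
                if (firstFrame < 0) {
                    firstFrame = f;
                }
                lastFrame = f;
            }
        }
        if (firstFrame < 0) {
            return null; // nothing but silence
        }
        return new int[] { firstFrame * frameSize, (lastFrame + 1) * frameSize };
    }

    public static void main(String[] args) {
        // Silence, a burst of "speech" in samples 10..29, then silence again.
        double[] samples = new double[40];
        for (int i = 10; i < 30; i++) {
            samples[i] = (i % 2 == 0) ? 0.5 : -0.5;
        }
        System.out.println(Arrays.toString(findEndpoints(samples, 5, 0.01))); // prints [10, 30]
    }
}
```

The returned sample range is the "endpointed utterance" that would be handed to feature extraction.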
6
Typical Recognition Components
7
Examples
Book, boot
Write, right
Flew, flu, flue
Eight books
Ate books
I scream
Ice cream
8
Components
Acoustic models
  Internal representation of each basic sound
Dictionary
  A list of words and pronunciations
Grammar
  Defines all possible strings of words the recognizer can handle
  Allows a meaning to be associated with those strings
  Either rule-based or statistical
9
Recognition
Recognition search
  A recognizer searches the recognition model to find the best-matching word string
Confidence measures
  A quantitative measure of how confident the recognizer is in the best-matching string
  VUI developers can use those measures in several ways
N-best processing
  A recognizer returns several results with a confidence measure for each
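One way a VUI developer might use confidence measures together with an N-best list is sketched below. This is not tied to any particular engine: the `Hypothesis` type, the three-way accept/confirm/reject policy, and the threshold values are assumptions made for illustration, and confidence is assumed to lie between 0 and 1.

```java
import java.util.Arrays;
import java.util.List;

/** N-best processing sketch: pick the best hypothesis, then act on its confidence. */
final class NBestDemo {
    /** One recognition result: a word string plus the recognizer's confidence in it. */
    static final class Hypothesis {
        final String text;
        final double confidence; // assumed to lie in [0, 1]

        Hypothesis(String text, double confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    enum Decision { ACCEPT, CONFIRM, REJECT }

    /** Returns the highest-confidence hypothesis, or null for an empty list. */
    static Hypothesis best(List<Hypothesis> nBest) {
        Hypothesis top = null;
        for (Hypothesis h : nBest) {
            if (top == null || h.confidence > top.confidence) {
                top = h;
            }
        }
        return top;
    }

    /** Accept outright, ask the user to confirm, or reject, depending on confidence. */
    static Decision decide(Hypothesis h, double acceptAt, double rejectBelow) {
        if (h == null || h.confidence < rejectBelow) {
            return Decision.REJECT;
        }
        if (h.confidence >= acceptAt) {
            return Decision.ACCEPT;
        }
        return Decision.CONFIRM;
    }

    public static void main(String[] args) {
        List<Hypothesis> nBest = Arrays.asList(
                new Hypothesis("I scream", 0.42),
                new Hypothesis("ice cream", 0.55));
        Hypothesis top = best(nBest);
        System.out.println(top.text + " -> " + decide(top, 0.80, 0.30)); // prints ice cream -> CONFIRM
    }
}
```

A middle "confirm" band lets the dialog ask "Did you say ice cream?" instead of either acting on a shaky result or discarding it.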
10
Speech Recognition Engines
Microsoft Visual Studio & CMU Sphinx
  Grammar
Android
  Language model: free form for dictation, or web search for short phrases
Google Web Speech API for Web Applications
11
BNF (Backus-Naur Form)
Notation for context-free grammars
Often used to describe the syntax of programming languages
Also used to specify the words and patterns of words to be listened for by a speech recognizer
EBNF (Extended Backus-Naur Form)
ABNF (Augmented Backus-Naur Form)
Basis for speech grammar specifications
  ABNF for .NET
  Regular grammar for Java
12
Basics
::=                  meaning "is defined as"
|                    meaning "or"
< >                  enclose a category name
Terminal             basic component
<x> ::= a b c        a sequence
<x> ::= a | b | c    alternatives
<x> ::= a | a <x>    one or more
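The "one or more" pattern is recursive: a category is defined as a single `a`, or as `a` followed by the same category again. A minimal sketch of how that definition maps onto a recursive check is shown below; the category name `<x>` and the class name `OneOrMore` are hypothetical and chosen only for illustration.

```java
/** Recursive matcher for the rule <x> ::= a | a <x>, i.e. one or more "a" tokens. */
final class OneOrMore {
    /** Returns true if tokens[from..] consists of one or more "a" tokens and nothing else. */
    static boolean matches(String[] tokens, int from) {
        if (from >= tokens.length || !tokens[from].equals("a")) {
            return false; // every alternative starts with the terminal "a"
        }
        if (from == tokens.length - 1) {
            return true; // first alternative: <x> ::= a
        }
        return matches(tokens, from + 1); // second alternative: <x> ::= a <x>
    }

    public static void main(String[] args) {
        System.out.println(matches(new String[] { "a", "a", "a" }, 0)); // prints true
        System.out.println(matches(new String[] { "a", "b" }, 0));      // prints false
        System.out.println(matches(new String[] {}, 0));                // prints false
    }
}
```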
13
An Example
Grammar for a speech recognition calculator
Reference: Grammar creation in C#
  https://msdn.microsoft.com/en-us/library/hh538495%28v=office.14%29.aspx
14
Speech to Text in C#
using System;
using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Threading;

static ManualResetEvent _completed = null;

static void Main(string[] args)
{
    _completed = new ManualResetEvent(false);
    SpeechRecognitionEngine _recognizer = new SpeechRecognitionEngine();
    // Load one single-phrase grammar per command
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("test")) { Name = "testGrammar" });
    _recognizer.LoadGrammar(new Grammar(new GrammarBuilder("exit")) { Name = "exitGrammar" });
    _recognizer.SpeechRecognized += _recognizer_SpeechRecognized; // add an event handler
    _recognizer.SetInputToDefaultAudioDevice();
    _recognizer.RecognizeAsync(RecognizeMode.Multiple); // keep recognizing after each result
    …
    _completed.WaitOne(); // wait until speech recognition is completed
    _recognizer.Dispose(); // dispose the speech recognition engine
}
15
Speech to Text in C# (Cont.)
// Called when a phrase from a loaded grammar is recognized
static void _recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result.Text == "test")
    {
        Console.WriteLine("The test was successful!");
    }
    else if (e.Result.Text == "exit")
    {
        _completed.Set(); // unblock Main so it can dispose the engine
    }
}

// Called when the input could not be matched to a loaded grammar
static void _recognizer_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    if (e.Result.Alternates.Count == 0)
    {
        Console.WriteLine("Speech rejected. No candidate phrases found.");
        return;
    }
    Console.WriteLine("Speech rejected. Did you mean:");
    foreach (RecognizedPhrase r in e.Result.Alternates)
    {
        Console.WriteLine(" " + r.Text);
    }
}
16
Text to Speech in C#
SpeechSynthesizer _synthesizer = new SpeechSynthesizer();
_synthesizer.Speak("Now the computer is speaking to you.");
...
_synthesizer.Dispose(); // dispose the SpeechSynthesizer
17
References
SpeechRecognitionEngine Class
  https://msdn.microsoft.com/en-us/library/system.speech.recognition.speechrecognitionengine%28v=vs.110%29.aspx?cs-save-lang=1&cs-lang=vb#code-snippet-1
Speech recognition, speech to text, text to speech, and speech synthesis in C#
  http://www.codeproject.com/Articles/483347/Speech-recognition-speech-to-text-text-to-speech-a
18
Visual Studio Speech Recognizer
21
Speech Recognition with Visual Studio
Examples
  http://www.phon.ucl.ac.uk/courses/spsci/compmeth/speech/recognition.html
  http://blogs.msdn.com/b/devschool/archive/2012/02/06/speech-recognition-using-visual-studio-determining-the-bna.aspx
Grammar Class
  http://msdn.microsoft.com/en-us/library/system.speech.recognition.grammar.aspx
GrammarBuilder Class
  http://msdn.microsoft.com/en-us/library/system.speech.recognition.grammarbuilder.aspx
22
Speech Recognition for Java
Sphinx 4
  A speech recognition engine written entirely in Java
  Created by CMU, Sun, Mitsubishi, HP, …
  Open source
  Compliant with JSpeech Grammar Format
  Platform- and vendor-independent
Programmer’s guide
  http://cmusphinx.sourceforge.net/sphinx4/
An example
  https://www.assembla.com/code/sonido/subversion/nodes/4/sphinx4/src/apps/edu/cmu/sphinx/demo/helloworld
23
A Sample Grammar in Java
#JSGF V1.0;
public <…> = … ;
<…> = please | could you;
<…> = start | open | stop | close | kill | shut down ;
<…> = word | excel | out look | note pad ;
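To make the shape of such a grammar concrete, here is a minimal plain-Java sketch that accepts the same word lists. It does not use Sphinx 4 or parse JSGF. The rule names `POLITE`, `ACTION`, and `PROGRAM` are hypothetical, and so is the overall pattern of an optional polite phrase followed by an action and a program name; the deck's own rule names and public rule are not recoverable from the text.

```java
import java.util.Arrays;
import java.util.List;

/** Hand-written matcher for commands of the assumed form: [polite] action program. */
final class CommandGrammar {
    // Hypothetical rule names; the alternatives are the word lists from the sample grammar.
    static final List<String> POLITE = Arrays.asList("please", "could you");
    static final List<String> ACTION = Arrays.asList("start", "open", "stop", "close", "kill", "shut down");
    static final List<String> PROGRAM = Arrays.asList("word", "excel", "out look", "note pad");

    /** Strips one leading alternative; returns {matched, remainder}, or null if none matches. */
    static String[] consume(String text, List<String> alternatives) {
        for (String alt : alternatives) {
            if (text.equals(alt)) {
                return new String[] { alt, "" };
            }
            if (text.startsWith(alt + " ")) {
                return new String[] { alt, text.substring(alt.length() + 1) };
            }
        }
        return null;
    }

    /** Returns {action, program} when the utterance fits the pattern, otherwise null. */
    static String[] parse(String utterance) {
        String rest = utterance.trim().toLowerCase().replaceAll("\\s+", " ");
        String[] polite = consume(rest, POLITE);
        if (polite != null) {
            rest = polite[1]; // the polite phrase is optional, so it is simply skipped
        }
        String[] action = consume(rest, ACTION);
        if (action == null) {
            return null;
        }
        String[] program = consume(action[1], PROGRAM);
        if (program == null || !program[1].isEmpty()) {
            return null; // unknown program name or trailing words
        }
        return new String[] { action[0], program[0] };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("Please start word")));            // prints [start, word]
        System.out.println(Arrays.toString(parse("Could you shut down note pad"))); // prints [shut down, note pad]
        System.out.println(Arrays.toString(parse("start firefox")));                // prints null
    }
}
```

Returning the action and program as separate values is what lets a grammar associate a meaning with a word string, rather than just accepting or rejecting it.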
24
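Android Speech Recognition
import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.view.Menu;
import android.view.View;
import android.widget.Button;
import android.widget.TextView;
import android.widget.Toast;
import java.util.ArrayList;
import java.util.Locale;

public class MainActivity extends Activity {
    private static final int VOICE_RECOGNITION = 1; // request code for the recognizer intent
    Button speakButton;
    TextView spokenWords;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        speakButton = (Button) findViewById(R.id.button1);
        spokenWords = (TextView) findViewById(R.id.textView1);
    }

    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.main, menu);
        return true;
    }
    // class continues on the next slide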
25
    // Launches the system speech recognizer
    public void btnSpeak(View view) {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        // Specify free form input
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Please start speaking");
        intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 1);
        // EXTRA_LANGUAGE expects an IETF language tag string such as "en"
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.ENGLISH.toLanguageTag());
        startActivityForResult(intent, VOICE_RECOGNITION);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (requestCode == VOICE_RECOGNITION && resultCode == RESULT_OK) {
            ArrayList<String> results;
            results = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            // TODO Do something with the recognized voice strings
            Toast.makeText(this, results.get(0), Toast.LENGTH_SHORT).show();
            spokenWords.setText(results.get(0));
        }
        super.onActivityResult(requestCode, resultCode, data);
    }
}
26
Android and Web Speech Recognition
Android Voice Recognition Tutorial
  http://www.javacodegeeks.com/2012/08/android-voice-recognition-tutorial.html
  http://code4reference.com/2012/07/tutorial-android-voice-recognition/
Google Web Speech Recognition Examples
  http://stiltsoft.com/blog/2013/05/google-chrome-how-to-use-the-web-speech-api/
  http://stackoverflow.com/questions/17635354/developing-a-simple-voice-driven-web-app-using-web-speech-api
  http://apprentice.craic.com/tutorials/37
27
Challenges for VUI Design
People have very little patience for a “machine that does not understand”
VUIs need to respond to input reliably, or they will be rejected by their users
Designing a usable VUI requires interdisciplinary talents in computer science, linguistics, and human factors
The closer the VUI matches the user's mental model of the task, the easier it will be to use with little or no training, resulting in both higher efficiency and higher user satisfaction
28
Natural Language Understanding
Ambiguity
  Refers to phrases that look distinct in print but sound similar when spoken, for example:
    “Wreck a nice beach”
    “Recognize speech”
  As the vocabulary and grammar get larger, the potential for ambiguity increases
  Short words and phrases are harder to recognize than longer ones
29
Language Understanding (Cont.)
Deviation
  The user deviates from what the developer expects
  For example, an issue with the question “Is that correct?”
    The developer expects a simple response like “Yes”, “No”, or “Correct”
    Southern speakers may respond with “Yes, ma’am” or “No, ma’am”
30
Discussion
What would you expect if the user asks to start Microsoft Word?
  Please start word
  Could you start word
  Start word
  Please open word
  Could you open word
  Open word
31
Language Understanding (Cont.)
Keyword Extraction
  Important for applications built with a speech recognizer that returns a string containing the actual words spoken by the user, leaving the application to interpret their semantic meaning
  One might say “Computer, find me some information about the flooding in Detroit recently”
  Keywords like “find”, “flooding”, and “Detroit” are crucial for an accurate response from the VUI
  Others are filler words
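A minimal sketch of keyword extraction by filler-word filtering follows. The filler list is a hypothetical one, hand-picked so that the example utterance above reduces to its three keywords; a real application would need a much broader list or a different technique altogether.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Keyword extraction sketch: drop filler words and keep everything else. */
final class KeywordExtractor {
    // Hypothetical filler list, chosen for this one example utterance.
    static final Set<String> FILLERS = new HashSet<>(Arrays.asList(
            "computer", "me", "some", "information", "about", "the", "in", "recently",
            "a", "an", "of", "please"));

    /** Lowercases the utterance, splits on non-letters, and removes filler words. */
    static List<String> extract(String utterance) {
        List<String> keywords = new ArrayList<>();
        for (String token : utterance.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty() && !FILLERS.contains(token)) {
                keywords.add(token);
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        String said = "Computer, find me some information about the flooding in Detroit recently";
        System.out.println(extract(said)); // prints [find, flooding, detroit]
    }
}
```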
32
Dialog Management
Multimodality
  Interaction can occur through different media
  Need to consider when, and in which parts of the application, multimodal interaction is allowed
Grammar
  There is a close relationship between what a prompt says and what the caller ends up saying to the system, especially in the words used
Configuration files
  You may choose the confidence level below which the recognizer will reject the input rather than return an answer
  You may also choose parameters for the endpointer, that is, how long it should listen before timing out
33
Dialog Management (Cont.)
Error handling
  Allow the user to recover after errors and get the dialog back on track
  Recognition does not always succeed. When it fails, there are a number of messages the recognizer may return to the application.
Voice recognition accuracy
  In-grammar data
  Out-of-grammar data
34
Error Handling
In-grammar data
  Correct Accept: the recognizer returned the correct answer
  False Accept: the recognizer returned the wrong answer
  False Reject: the recognizer could not find a match and gave up
Out-of-grammar data
  Correct Reject: the recognizer correctly rejected the input
  False Accept: the recognizer returned a value that is wrong because the input is not in the grammar
How should each category be handled?
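The categories above can be written down as a small classification function, which is handy when scoring a recognizer against labeled test utterances. This is only a sketch: the class and method names are hypothetical, and a rejection is assumed to be represented by a null result.

```java
/** Classifies one recognition attempt into the accept/reject categories. */
final class RecognitionOutcome {
    enum Category { CORRECT_ACCEPT, FALSE_ACCEPT, FALSE_REJECT, CORRECT_REJECT }

    /**
     * @param spoken    what the user actually said
     * @param inGrammar whether the grammar covers what was said
     * @param returned  the recognizer's answer, or null if it rejected the input
     */
    static Category classify(String spoken, boolean inGrammar, String returned) {
        if (returned == null) {
            // Rejecting is right only when the input was outside the grammar.
            return inGrammar ? Category.FALSE_REJECT : Category.CORRECT_REJECT;
        }
        if (inGrammar && returned.equals(spoken)) {
            return Category.CORRECT_ACCEPT;
        }
        // Wrong answer for in-grammar input, or any answer at all for out-of-grammar input.
        return Category.FALSE_ACCEPT;
    }

    public static void main(String[] args) {
        System.out.println(classify("open word", true, "open word"));   // prints CORRECT_ACCEPT
        System.out.println(classify("open word", true, "open excel"));  // prints FALSE_ACCEPT
        System.out.println(classify("open word", true, null));          // prints FALSE_REJECT
        System.out.println(classify("open firefox", false, null));      // prints CORRECT_REJECT
        System.out.println(classify("open firefox", false, "open word")); // prints FALSE_ACCEPT
    }
}
```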
35
Error Handling in Android