EEC-693/793 Applied Computer Vision with Depth Cameras


EEC-693/793 Applied Computer Vision with Depth Cameras
Lecture 12
Wenbing Zhao
wenbing@ieee.org

Outline
- Speech recognition
- How speech recognition works
- Exploring the Microsoft Speech API (SAPI)
- Creating your own grammar and choices for the speech recognition engine
- Draw What I Want: building a speech-enabled application

How Speech Recognition Works
- The Kinect microphone array captures the audio stream and converts the analog audio into digital sound signals.
- The digital audio is sent to the speech recognition engine for recognition.
- The acoustic model of the speech recognition engine analyzes the audio and converts the sound into a sequence of basic speech elements called phonemes.
- The language model then analyzes the content of the speech and matches words by combining the phonemes against a built-in dictionary. This matching is context sensitive.

How Speech Recognition Works

Types of Speech Recognition
- Command mode: you say one command at a time for the speech recognition engine to recognize.
- Sentence mode (dictation mode): you say a whole sentence to perform an operation, e.g., "mirror the shape".

Microsoft Speech API Kinect SDK comes with the Microsoft Kinect speech recognition language pack

SpeechRecognitionEngine Class
- The InstalledRecognizers method of the SpeechRecognitionEngine class returns the list of recognizers installed on the system; we can select one based on the recognizer ID.
- The SpeechRecognitionEngine class accepts an audio stream from the Kinect sensor and processes it.
- The SpeechRecognitionEngine class raises a sequence of events as audio arrives:
  - SpeechDetected is raised when the audio appears to be speech.
  - SpeechHypothesized then fires multiple times as words are tentatively recognized.
  - SpeechRecognized is raised when the input matches one of the loaded and enabled grammars.
  - SpeechRecognitionRejected is raised if speech is detected but does not match a grammar properly or has a very low confidence level.

Steps for building speech-enabled apps
1. Enable the Kinect audio source
2. Start capturing the audio data stream
3. Identify the speech recognizer
4. Define the grammar for the speech recognizer
5. Start the speech recognizer
6. Attach the speech audio source to the recognizer
7. Register the event handlers for speech recognition
8. Handle the different events raised by the speech recognition engine

Identify the speech recognizer

    private static RecognizerInfo GetKinectRecognizer()
    {
        foreach (RecognizerInfo recognizer in SpeechRecognitionEngine.InstalledRecognizers())
        {
            // Kinect recognizers carry an extra "Kinect" = "True" entry
            string value;
            recognizer.AdditionalInfo.TryGetValue("Kinect", out value);
            if ("True".Equals(value, StringComparison.OrdinalIgnoreCase) &&
                "en-US".Equals(recognizer.Culture.Name, StringComparison.OrdinalIgnoreCase))
            {
                return recognizer;
            }
        }
        // No matching recognizer is installed
        return null;
    }

Usage:

    RecognizerInfo recognizerInfo = GetKinectRecognizer();

Define grammar for the speech recognizer
- Use Choices and GrammarBuilder
- Multiple sets of choices can be appended to a GrammarBuilder
- Create a Grammar from the GrammarBuilder
- Load the grammar into the speech recognizer

    var grammarBuilder = new GrammarBuilder { Culture = recognizerInfo.Culture };

    // First set of choices: the color
    var colorObjects = new Choices();
    colorObjects.Add("red");
    colorObjects.Add("green");
    colorObjects.Add("blue");
    colorObjects.Add("yellow");
    colorObjects.Add("gray");
    grammarBuilder.Append(colorObjects);

    // Second set of choices: the shape
    grammarBuilder.Append(new Choices("circle", "square", "triangle", "rectangle"));

    // Create the Grammar from the GrammarBuilder
    var grammar = new Grammar(grammarBuilder);

Define grammar for the speech recognizer
- A grammar can also be built from an XML file

    SrgsDocument grammarDoc = new SrgsDocument("mygrammar.xml");
    Grammar grammar = new Grammar(grammarDoc);

Example grammar file (the root attribute must name a rule defined in the same file):

    <?xml version="1.0" encoding="UTF-8" ?>
    <grammar version="1.0" xml:lang="en-US"
             xmlns="http://www.w3.org/2001/06/grammar"
             tag-format="semantics/1.0" root="color">
      <rule id="color" scope="public">
        <one-of>
          <item>red</item>
          <item>green</item>
          <item>blue</item>
        </one-of>
      </rule>
    </grammar>
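Because the root attribute of an SRGS grammar has to name a rule defined in the same document, a quick check before loading the file can catch a mismatch early. The following is a minimal sketch that uses only System.Xml.Linq; the class and method names (SrgsCheck, RootRuleIsDefined) are hypothetical and not part of Microsoft.Speech, and the check is a sanity test, not full SRGS validation.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of Microsoft.Speech): checks that the
// grammar's root attribute refers to a <rule> defined in the same document.
public static class SrgsCheck
{
    // XML namespace used by SRGS grammar documents.
    private static readonly XNamespace Srgs = "http://www.w3.org/2001/06/grammar";

    public static bool RootRuleIsDefined(string grammarXml)
    {
        XElement grammar = XDocument.Parse(grammarXml).Root;
        XAttribute root = grammar.Attribute("root");
        if (root == null)
        {
            return false; // no root rule declared at all
        }

        // Look for a top-level <rule id="..."> whose id equals the root name.
        return grammar.Elements(Srgs + "rule")
                      .Any(rule => (string)rule.Attribute("id") == root.Value);
    }
}
```

A grammar whose root is "Main" but whose only rule has the id "color" would fail this check, while the same grammar with root="color" would pass.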

Define grammar for the speech recognizer
- Multiple grammars can be loaded into the recognizer

Define grammar for the speech recognizer

    private void BuildGrammarforRecognizer(RecognizerInfo recognizerInfo)
    {
        var grammarBuilder = new GrammarBuilder { Culture = recognizerInfo.Culture };

        // The phrase starts with "draw"
        grammarBuilder.Append(new Choices("draw"));

        // ... followed by a color
        var colorObjects = new Choices();
        colorObjects.Add("red");
        colorObjects.Add("green");
        colorObjects.Add("blue");
        colorObjects.Add("yellow");
        colorObjects.Add("gray");
        grammarBuilder.Append(colorObjects);

        // ... followed by a shape
        grammarBuilder.Append(new Choices("circle", "square", "triangle", "rectangle"));

        // Create the Grammar from the GrammarBuilder
        var grammar = new Grammar(grammarBuilder);

        // Create a second grammar for closing the application
        // (same culture as the recognizer, so that it can be loaded)
        var newGrammarBuilder = new GrammarBuilder { Culture = recognizerInfo.Culture };
        newGrammarBuilder.Append("close the application");
        var grammarClose = new Grammar(newGrammarBuilder);

        // Start the speech recognizer and load both grammars
        speechEngine = new SpeechRecognitionEngine(recognizerInfo.Id);
        speechEngine.LoadGrammar(grammar);
        speechEngine.LoadGrammar(grammarClose);

        // Attach the speech audio source to the recognizer
        int samplesPerSecond = 16000;
        int bitsPerSample = 16;
        int channels = 1;
        int averageBytesPerSecond = 32000;
        int blockAlign = 2;
        speechEngine.SetInputToAudioStream(
            audioStream,
            new SpeechAudioFormatInfo(EncodingFormat.Pcm, samplesPerSecond, bitsPerSample,
                channels, averageBytesPerSecond, blockAlign, null));

        // Register the event handlers for speech recognition
        speechEngine.SpeechRecognized += speechRecognized;
        speechEngine.SpeechHypothesized += speechHypothesized;
        speechEngine.SpeechRecognitionRejected += speechRecognitionRejected;

        // Keep recognizing until explicitly stopped
        speechEngine.RecognizeAsync(RecognizeMode.Multiple);
    }

RecognizeAsync() with no arguments performs a single asynchronous recognition operation. RecognizeAsync(RecognizeMode.Multiple), used here, keeps recognizing until RecognizeAsyncStop() or RecognizeAsyncCancel() is called. In both cases the recognizer works against its loaded and enabled speech recognition grammars.
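The format values passed to SpeechAudioFormatInfo are not independent: for uncompressed PCM, the block alignment and the average byte rate follow from the sample rate, the sample size, and the channel count. A minimal sketch of that arithmetic in plain C#, with no Kinect or speech dependencies; the class and method names (PcmFormat, BlockAlign, AverageBytesPerSecond) are illustrative and not part of any SDK.

```csharp
using System;

// Hypothetical helper (not part of the Kinect SDK or Microsoft.Speech):
// derives the dependent PCM format fields from the independent ones.
public static class PcmFormat
{
    // Bytes per sample frame: one sample for every channel.
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes of audio data produced per second of sound.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

For the 16 kHz, 16-bit, mono stream used in this lecture, BlockAlign(16, 1) gives 2 and AverageBytesPerSecond(16000, 16, 1) gives 32000, which matches the constants in the code.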

Handle the different events raised by the speech recognition engine

    private void speechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
    {
        // Intentionally left empty
    }

    private void speechHypothesized(object sender, SpeechHypothesizedEventArgs e)
    {
        // Show the tentative result while the user is still speaking
        wordsTenative.Text = e.Result.Text;
    }

    private void speechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        wordsRecognized.Text = e.Result.Text;
        confidenceTxt.Text = e.Result.Confidence.ToString();

        // Only act on results the recognizer is reasonably sure about
        float confidenceThreshold = 0.6f;
        if (e.Result.Confidence > confidenceThreshold)
        {
            CommandsParser(e);
        }
    }

    private void CommandsParser(SpeechRecognizedEventArgs e)
    {
        Color objectColor;
        Shape drawObject;
        System.Collections.ObjectModel.ReadOnlyCollection<RecognizedWordUnit> words = e.Result.Words;

        // Both grammars produce three-word phrases
        if (words.Count < 3)
        {
            return;
        }

        if (words[0].Text == "draw")
        {
            // Second word: the color
            string colorObject = words[1].Text;
            switch (colorObject)
            {
                case "red": objectColor = Colors.Red; break;
                case "green": objectColor = Colors.Green; break;
                case "blue": objectColor = Colors.Blue; break;
                case "yellow": objectColor = Colors.Yellow; break;
                case "gray": objectColor = Colors.Gray; break;
                default: return;
            }

            // Third word: the shape
            var shapeString = words[2].Text;
            switch (shapeString)
            {
                case "circle":
                    drawObject = new Ellipse();
                    drawObject.Width = 100;
                    drawObject.Height = 100;
                    break;
                case "square":
                    drawObject = new Rectangle();
                    drawObject.Width = 100;
                    drawObject.Height = 100;
                    break;
                case "rectangle":
                    drawObject = new Rectangle();
                    drawObject.Width = 100;
                    drawObject.Height = 60;
                    break;
                case "triangle":
                    var polygon = new Polygon();
                    polygon.Points.Add(new Point(0, 30));
                    polygon.Points.Add(new Point(-60, -30));
                    polygon.Points.Add(new Point(60, -30));
                    drawObject = polygon;
                    break;
                default:
                    return;
            }

            // Replace whatever was drawn before with the new shape
            canvas1.Children.Clear();
            drawObject.SetValue(Canvas.LeftProperty, 80.0);
            drawObject.SetValue(Canvas.TopProperty, 80.0);
            drawObject.Fill = new SolidColorBrush(objectColor);
            canvas1.Children.Add(drawObject);
        }

        if (words[0].Text == "close" && words[1].Text == "the" && words[2].Text == "application")
        {
            this.Close();
        }
    }
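The parsing logic in CommandsParser is tied to WPF types, which makes it awkward to exercise without a running window. One possible refactoring, offered as a sketch and not as the lecture's code, separates interpreting the phrase from drawing it: a pure function takes the recognized words and returns the color and shape names, or null if the phrase is not a valid draw command. The names DrawCommand and DrawCommandParser are hypothetical; the vocabulary is the one used in the grammar for this app.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type: the color and shape named in a "draw" command.
public sealed class DrawCommand
{
    public string Color { get; private set; }
    public string Shape { get; private set; }

    public DrawCommand(string color, string shape)
    {
        Color = color;
        Shape = shape;
    }
}

// Hypothetical helper, independent of WPF and of the speech engine.
public static class DrawCommandParser
{
    private static readonly HashSet<string> Colors =
        new HashSet<string> { "red", "green", "blue", "yellow", "gray" };

    private static readonly HashSet<string> Shapes =
        new HashSet<string> { "circle", "square", "triangle", "rectangle" };

    // Returns the parsed command, or null if the words are not exactly
    // "draw <color> <shape>" with a known color and a known shape.
    public static DrawCommand Parse(IList<string> words)
    {
        if (words == null || words.Count != 3 || words[0] != "draw")
        {
            return null;
        }
        if (!Colors.Contains(words[1]) || !Shapes.Contains(words[2]))
        {
            return null;
        }
        return new DrawCommand(words[1], words[2]);
    }
}
```

Inside the event handler, the word list could be obtained with e.Result.Words.Select(w => w.Text).ToList() (which needs using System.Linq), leaving only the mapping from names to WPF colors and shapes in the UI code.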

Build KinectAudio App
1. Create a new C# WPF project named DrawShapeFromSpeech
2. Add a reference to Microsoft.Kinect
3. Add a reference to Microsoft.Speech (not System.Speech!)
4. Design the GUI
5. Add the code

Add Microsoft.Speech assembly

GUI Design Canvas

Adding Code
- Import namespaces:

    using Microsoft.Kinect;
    using Microsoft.Speech.Recognition;
    using Microsoft.Speech.AudioFormat;
    using System.IO;

- Add member variables:

    KinectSensor sensor;
    Stream audioStream;
    SpeechRecognitionEngine speechEngine;

- Register the WindowLoaded event handler programmatically:

    public MainWindow()
    {
        InitializeComponent();
        Loaded += new RoutedEventHandler(WindowLoaded);
    }

Adding Code: WindowLoaded

    private void WindowLoaded(object sender, RoutedEventArgs e)
    {
        // Start the first connected sensor and its audio source
        this.sensor = KinectSensor.KinectSensors[0];
        this.sensor.Start();
        audioStream = this.sensor.AudioSource.Start();

        RecognizerInfo recognizerInfo = GetKinectRecognizer();
        if (recognizerInfo == null)
        {
            MessageBox.Show("Could not find Kinect speech recognizer");
            return;
        }

        BuildGrammarforRecognizer(recognizerInfo); // provided earlier
        statusBar.Text = "Speech Recognizer is ready";
    }

Adding Code: code provided earlier
- Add the event handler for speechHypothesized
- Add the event handler for speechRecognized
  - CommandsParser() is invoked, which draws the spoken shape
  - You can close the app by saying "close the application"
- Add the event handler for speechRecognitionRejected (left empty)

Challenge Tasks
For advanced students, improve the app in the following ways:
- Enable both the color image and skeleton data streams
- Display color image frames (but not the skeleton)
- Modify the grammar so that you can add a particular shape at a particular joint location, e.g., "draw a red circle at the right hand"
- Enable drawing with the right (or left) hand, using the color and shape specified in the voice command