Wake-Up-Word Speech Recognition:

Presentation transcript:

Wake-Up-Word Speech Recognition: A Missing Link to Natural Language Understanding. Dr. Veton Këpuska, ECE Department, vkepuska@fit.edu

What is Wake-Up-Word Recognition? Wake-Up-Word (WUW) Speech/Voice Recognition (SR) is the automatic speech recognition task of identifying a single word or phrase in continuous, free speech – Correct Recognition – e.g., <HAL> from Arthur C. Clarke’s “2001: A Space Odyssey”, <Computer> aboard Capt. Picard’s starship “Enterprise” in Star Trek, or <Operator> in Capt. Këpuska’s WUW-SR system – and, more importantly, automatically recognizing any other noise/sound/word/phrase as NOT being that WUW – Correct Rejection.

WUW-SR WUW-SR requires continuous monitoring of speech. A WUW can be used to get attention, provide or change context, and resynchronize communication. It mimics human-to-human interaction and communication in a way that is currently not possible, and it provides a significantly more efficient solution (in memory and CPU) than any Natural Language Understanding system. It is a mode of communication that would enable more natural interaction between man and machine.

Natural Language Understanding (NLU) Task The Massachusetts Institute of Technology (MIT) Spoken Language Systems Laboratory’s mission statement states: “Our goal is both simple and ambitious – create technology that makes it possible for everyone in the world to interact with computers via natural spoken language. Conversational interfaces will enable us to converse with machines in much the same way that we communicate with one another and will play a fundamental role in facilitating our move toward an information-based society”. To achieve this goal, the SR and NLU communities implicitly place the solution to the WUW problem in the context of solving the overall natural language understanding problem: once a system that can understand the whole language is developed, the WUW problem will be solved.

Natural Language Understanding Task - Problem There are two major problems with the approach that requires solving the WUW problem within the general framework of a speech and natural language understanding system: it is an expensive solution (CPU, memory, etc.), and it does not exist yet because it is very difficult to achieve. Even if it is possible to develop NLU systems close to human capabilities, WUW is still needed (see slide 3).

WUW-SR Acoustic-Linguistic Context The current implementation of WUW recognition mirrors how a speaker intuitively uses a proper name to get attention: it does not respond to other contexts where the same word (e.g., “OPERATOR”) is used for other purposes. What are other WUW contexts?

Wizard of Oz Experiment (NSF 05-551 Proposal) Study possible uses of WUW in human-to-human communication. Collaboration with: Dr. Deborah Carstens, Human Machine Interface Specialist (FIT, Management Information Systems); Dr. Ron Wallace, Bio-Behavioral Anthropology and English Language (UCF); and the Department of Psychology’s Behavior Analysis Laboratory.

History of Wake-Up-Word Speech Recognition Wildfire of Waltham, Massachusetts introduced a rudimentary Wake-Up-Word (WUW) recognition capability through its Personal Assistant application in the mid-1990s. At that time the solution was neither recognized nor developed as a WUW-SR problem. The application was restricted to a specific word, “Wildfire”. This custom solution did not perform sufficiently well, and Wildfire no longer exists.

History of Wake-Up-Word Speech Recognition (cont.) Këpuska generalized and introduced a novel way of performing WUW recognition while at ThinkEngine Networks, Marlborough, MA (2001-2003). The recognition performance of the patented solution allows practical application of WUW for any suitable word (e.g., Verizon’s “IOBI” project). The demonstration uses a fixed-point DSP implementation simulated on the Windows platform. A new generation of the WUW-SR system, using a floating-point C++ implementation, is almost ready for prime time; simulations of the floating-point system indicate significant improvement over the fixed-point implementation.

Wake-Up-Word Speech Recognition Technology ~26,000 lines of fixed-point C code and model data. Uses the Dynamic Time Warping (DTW) algorithm for pattern matching. Features are based on Mel-scale cepstral coefficients (MFCC) plus deltas and second-order deltas (a sketch of the delta computation follows below). Uses a single speaker-independent model. Achieves high channel density on the DSP.
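As an illustration of the feature pipeline, the following is a minimal sketch of how delta and second-order delta (acceleration) coefficients can be computed from a sequence of MFCC frames using the standard regression formula; the window width K and the frame layout are illustrative assumptions, not details of the fixed-point implementation.

```cpp
#include <vector>
#include <algorithm>

// One MFCC frame = vector of cepstral coefficients (e.g., 13 values).
using Frame = std::vector<double>;

// Standard delta regression over a +/-K frame window:
//   d[t] = sum_{k=1..K} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..K} k^2)
// Applying it twice yields the second-order (acceleration) deltas.
std::vector<Frame> deltas(const std::vector<Frame>& c, int K = 2) {
    const int T = static_cast<int>(c.size());
    const int D = T ? static_cast<int>(c[0].size()) : 0;
    double norm = 0.0;
    for (int k = 1; k <= K; ++k) norm += 2.0 * k * k;

    std::vector<Frame> d(T, Frame(D, 0.0));
    for (int t = 0; t < T; ++t) {
        for (int k = 1; k <= K; ++k) {
            // Clamp frame indices at the utterance boundaries.
            const Frame& p = c[std::min(t + k, T - 1)];
            const Frame& m = c[std::max(t - k, 0)];
            for (int i = 0; i < D; ++i)
                d[t][i] += k * (p[i] - m[i]) / norm;
        }
    }
    return d;
}

// Usage: the full feature vector per frame is the concatenation of
// mfcc[t], deltas(mfcc)[t], and deltas(deltas(mfcc))[t].
```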

WUW-SR System: Initial Development ThinkEngine Networks, Marlborough, MA. 84 simultaneous channels of WUW recognition on each fixed-point TI TMS320C205 DSP at 200 MHz. Memory space: 64 KB program, 64 KB data, 2 MB external data. A total of 672 channels with a farm of 8 DSPs. Recognition rate >95% with ~0% false acceptance.

Solution: 3 Patented Inventions. Fundamental contributions to pattern recognition: Patent Application 13323-009001 - 10/152,095, “Dynamic Time Warping (DTW) Matching” (extended DTW matching), and Patent Application 13323-010001 - 10/152,447, “Rescoring using Distribution Distortion Measurements of Dynamic Time Warping Match”. Feature-based Voice Activity Detector (VAD): Patent Application 13323-011001 - 10/144,248, “Voice Activity Detection Based on Cepstral Features” (an illustrative VAD sketch follows below).
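To give a feel for what a cepstral-feature-based VAD does (a minimal generic sketch, not the patented algorithm), the code below marks a frame as speech when its log-energy term, here the 0th cepstral coefficient c0, rises above an adaptively tracked noise floor; the margin, smoothing constant, and hangover length are assumed values.

```cpp
#include <vector>
#include <cstddef>

// Minimal frame-level VAD driven by the 0th cepstral coefficient (log energy).
// Returns one speech/non-speech flag per analysis frame in `c0`.
std::vector<bool> simpleVad(const std::vector<double>& c0,
                            double margin   = 3.0,   // offset above the noise floor
                            double alpha    = 0.98,  // noise-floor smoothing factor
                            int    hangover = 5)     // frames held after speech ends
{
    std::vector<bool> speech(c0.size(), false);
    if (c0.empty()) return speech;

    double noiseFloor = c0[0];  // assume the recording starts in silence
    int hang = 0;
    for (std::size_t t = 0; t < c0.size(); ++t) {
        bool active = c0[t] > noiseFloor + margin;
        if (active) {
            hang = hangover;                          // refresh hangover on speech
        } else {
            noiseFloor = alpha * noiseFloor + (1.0 - alpha) * c0[t];
            if (hang > 0) { --hang; active = true; }  // bridge short pauses
        }
        speech[t] = active;
    }
    return speech;
}
```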

WUW Fixed-Point System Performance [Figure: distribution plot of confidence scores for the WUW “Operator”, showing INV and OOV score distributions with their cumulative curves, the equal error rate, and the operating threshold; x-axis: confidence score (0-100%).]

WUW-SR Development Status Implemented a C++ ETSI-MFCC front end for extraction of Mel-filtered cepstral coefficients, a standard processing technique used as a baseline. The C++ framework and implementation emphasize modularity to facilitate research. Implemented Dynamic Time Warping (DTW) as the back end of the recognition system (a DTW sketch follows below). Integrated Perl scripts automate the model-building and accuracy-testing procedures, including automatic graph generation.
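For readers unfamiliar with DTW, the following minimal sketch aligns a sequence of test feature frames against a reference template using the classic dynamic-programming recursion; the Euclidean local distance and unconstrained step pattern are simplifying assumptions and do not reflect the extended DTW used in the actual back end.

```cpp
#include <vector>
#include <cmath>
#include <limits>
#include <algorithm>

using Frame = std::vector<double>;

// Euclidean distance between two feature frames of equal dimension.
static double frameDist(const Frame& a, const Frame& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Classic DTW: returns the length-normalized cost of the best warping
// path aligning `test` against the reference template `ref`.
double dtwScore(const std::vector<Frame>& ref, const std::vector<Frame>& test) {
    const std::size_t N = ref.size(), M = test.size();
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> D(N + 1, std::vector<double>(M + 1, INF));
    D[0][0] = 0.0;

    for (std::size_t i = 1; i <= N; ++i) {
        for (std::size_t j = 1; j <= M; ++j) {
            double d = frameDist(ref[i - 1], test[j - 1]);
            // Allow diagonal (match), vertical, and horizontal steps.
            D[i][j] = d + std::min({D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]});
        }
    }
    return D[N][M] / static_cast<double>(N + M);  // normalize by path length
}
```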

Current Architecture of WUW-SR System

Performance of WUW-SR Floating Point System

WUW-SR System Performance How is it possible to achieve this performance, considering that the system uses a single speaker-independent model for the WUW, no additional modeling of other acoustic events (noise/tone/sound/word/phrase), and clever use of two-pass scoring?

Usual Recognition Scoring: First Score [Figure: performance of the standard “first” recognition score; the lowest score of an OOV sample is marked.]

The “Second” Score is Not Independent of the “First” Score [Figure: distribution of the second score as a function of the first score; the lowest score of an OOV sample is marked.]

How to Obtain the “Second” Score? All modern speech recognition systems use multiple scoring techniques: re-scoring the N-best hypotheses to improve correct recognition, either with a more elaborate recognition algorithm (e.g., Baum-Welch forward-backward HMM scoring vs. Viterbi scoring) or with different features such as MFCC (Mel-scale filtered cepstral coefficients), RASTA-PLP (Relative Spectral Transform - Perceptual Linear Prediction), or other proprietary front ends; and re-scoring with additional models of non-WUWs (“garbage models”) to improve correct rejection. A sketch of a generic two-score decision rule follows below.
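As a generic illustration of how two scores can be combined into an accept/reject decision (not a description of the proprietary rescoring method), the sketch below linearly combines a first-pass and a second-pass score and compares the result against an operating threshold; the weights and threshold are assumed values that would normally be tuned on development data.

```cpp
// Generic two-pass accept/reject decision for a WUW candidate.
// `firstScore` and `secondScore` come from two different scoring passes
// (e.g., a primary DTW match and a re-scoring pass with other features),
// both assumed normalized to [0, 1] with higher meaning "more WUW-like".
struct WuwDecision {
    double confidence;  // combined confidence in [0, 1]
    bool   accepted;    // true if the candidate passes the operating threshold
};

WuwDecision decideWuw(double firstScore, double secondScore,
                      double w1 = 0.6, double w2 = 0.4,   // assumed weights
                      double operatingThreshold = 0.8)    // assumed threshold
{
    double confidence = w1 * firstScore + w2 * secondScore;
    return {confidence, confidence >= operatingThreshold};
}
```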

The WUW-SR system uses a proprietary solution that does not require additional “garbage models” to increase robustness and the correct rejection rate: it is model independent, and even matching-algorithm independent (DTW, HMM, graphical modeling, or any other paradigm).

What Next? WUW-SR is a useful technology for numerous applications: “voice-activated” car navigation systems (current solutions use mixed interfaces in which the driver must press a button while speaking to the system); dictation systems, which require launching the application and “informing” the system when dictation is “on” and when it is “off”; PDAs, removing the stylus as a necessary interface tool; keyboard-less laptop computers; and “smart rooms”.

Smart Room Application

Microphone Arrays Applied Perception Laboratory CE313

Noise Removal First place at the UML-ADI competition, June 2005. Developed Wiener-filter noise removal and implemented it on an Analog Devices “SHARC” DSP (a sketch of the Wiener gain rule follows below):
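As background on the Wiener-filtering approach (a generic sketch under simplifying assumptions, not the competition implementation), the per-bin gain below attenuates each short-time spectral component according to an estimated SNR; the spectral-subtraction SNR estimate and the gain floor are assumptions.

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>

// Per-bin Wiener gain for one short-time spectral frame.
// `noisyPower` and `noisePower` are power spectra of equal length
// (e.g., |FFT|^2 of the current frame and a running noise estimate).
// Gain(k) = SNR(k) / (1 + SNR(k)), floored to limit musical noise.
std::vector<double> wienerGain(const std::vector<double>& noisyPower,
                               const std::vector<double>& noisePower,
                               double gainFloor = 0.1)  // assumed floor
{
    std::vector<double> gain(noisyPower.size(), gainFloor);
    for (std::size_t k = 0; k < noisyPower.size(); ++k) {
        // Crude SNR estimate via spectral subtraction.
        double snr = std::max(noisyPower[k] - noisePower[k], 0.0) /
                     std::max(noisePower[k], 1e-12);
        gain[k] = std::max(snr / (1.0 + snr), gainFloor);
    }
    return gain;  // multiply the noisy spectrum by this gain, then inverse-FFT
}
```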

Speech Processing and Recognition System Architecture 48 kHz to 8 kHz down-sampling with a 70-tap FIR filter (see the decimation sketch below). Wiener-filter-based noise removal, with switch-controlled activation of the de-noising algorithm. Automatic gain control, also with switch-controlled activation. LEDs indicate the processing state of the system. Wake-Up-Word speech recognition software: ~26,000 lines of speech recognition engine code and model data in C, plus ~5,000 lines of embedded C code.
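The sketch below shows the filter-then-decimate structure implied by the 48 kHz to 8 kHz stage: a low-pass FIR applied to the input signal, keeping every 6th output sample. The coefficient values are deliberately left as a placeholder, since a real 70-tap design would come from a filter-design tool with a cutoff below 4 kHz.

```cpp
#include <vector>
#include <cstddef>

// Low-pass FIR filtering followed by decimation by `factor`.
// For 48 kHz -> 8 kHz, factor = 6 and `taps` holds the 70 designed
// low-pass coefficients (cutoff below 4 kHz to avoid aliasing).
std::vector<double> firDecimate(const std::vector<double>& x,
                                const std::vector<double>& taps,
                                std::size_t factor)
{
    std::vector<double> y;
    y.reserve(x.size() / factor + 1);
    // Compute the FIR output only at the samples that are kept.
    for (std::size_t n = 0; n < x.size(); n += factor) {
        double acc = 0.0;
        for (std::size_t k = 0; k < taps.size() && k <= n; ++k)
            acc += taps[k] * x[n - k];
        y.push_back(acc);
    }
    return y;
}

// Usage (illustrative): firDecimate(input48k, lowpass70Taps, 6);
```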

Experimental Results Windows PC Noisy test file: After de-noise:

Experimental Results Windows PC Footloose: Not Footloose:

Results: why didn’t this work? Hair dryer: Still there?!?!:

Experimental Results Windows PC Hair dryer: Gone:

Experimental Results on DSP Brown Noise Example:

Experimental Results on DSP Drill Test

Experimental Results on DSP Closer Drill Noise

Experimental Results on DSP Brown Noise + Drill

Research: Tools Development MATLAB (NSF EMD-MLR), perl, gnuplot

What is missing? More highly motivated students are needed. No news there! Business opportunities and ventures need to be considered. Help, advice, … welcome.