Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments 張智星 (Jyh-Shing Roger Jang)



2 Reference
Jia-lin Shen, Jeih-weih Hung, and Lin-shan Lee, "Robust entropy-based endpoint detection for speech recognition in noisy environments," Proc. International Conference on Spoken Language Processing (ICSLP), Sydney, 1998.

3 Summary
- An entropy-based algorithm for accurate and robust endpoint detection in speech recognition under noisy environments
- Outperforms energy-based algorithms in both detection accuracy and recognition performance
- Error reduction: 16%

4 Motivation
- Energy-based endpoint detection becomes unreliable when dealing with non-stationary noise and sound artifacts such as lip smacks, heavy breathing, and mouth clicks
- Spectral entropy is effective in distinguishing speech segments from non-speech parts

5 Spectral Entropy
For each frame, the spectrum is first normalized into a probability mass function (PMF):
p_i = s(f_i) / Σ_{k=1}^{N} s(f_k),  i = 1, …, N
where s(f_i) is the spectral energy at frequency component f_i and N is the number of frequency components. The spectral entropy of the frame is then
H = -Σ_{i=1}^{N} p_i · log p_i
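The per-frame normalization and entropy computation can be sketched as follows (a minimal NumPy sketch; the FFT length and the `eps` floor are our assumptions, since the slide gives no concrete values):

```python
import numpy as np

def spectral_entropy(frame, n_fft=256, eps=1e-12):
    """Spectral entropy of one speech frame.

    The magnitude spectrum is normalized into a probability
    mass function (PMF), and the entropy of that PMF is returned.
    """
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft))
    pmf = spectrum / (np.sum(spectrum) + eps)  # normalization step
    pmf = pmf[pmf > 0]                         # avoid log(0)
    return -np.sum(pmf * np.log(pmf))

# A peaky (speech-like) spectrum yields low entropy; a flat
# (white-noise-like) spectrum yields high entropy.
tone = np.sin(2 * np.pi * 0.1 * np.arange(256))
noise = np.random.default_rng(0).standard_normal(256)
print(spectral_entropy(tone) < spectral_entropy(noise))  # True
```

This contrast between flat and peaky spectra is what makes the entropy usable as a speech/non-speech discriminant.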

6 Properties of Entropy
[Figure: entropy curves for N=2 and N=3, generated by entropyPlot.m]
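The behavior that entropyPlot.m presumably plots can be checked numerically for N=2: the entropy -p·log p - (1-p)·log(1-p) is maximal at the uniform distribution p = 0.5 (where it equals log 2) and approaches zero as the distribution becomes concentrated. A small sketch:

```python
import numpy as np

# Entropy of a two-outcome (N=2) distribution as a function of p.
p = np.linspace(0.001, 0.999, 999)
H = -p * np.log(p) - (1 - p) * np.log(1 - p)

# The maximum occurs at the uniform distribution p = 0.5,
# where H = log(2) ~= 0.693; H -> 0 at the extremes.
print(p[np.argmax(H)])  # close to 0.5
print(H.max())          # close to log(2)
```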

7 Entropy Weighting
- A set of weighting factors can be applied to the spectral PMF before computing the entropy
- These weighting factors are statistically estimated from a large collection of speech signals
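The slide does not show the weighting formula. One plausible form, given purely as an illustration (the per-bin weights `w` here are hypothetical placeholders, not the statistically estimated factors from the paper), scales each term of the entropy sum by a frequency-dependent factor:

```python
import numpy as np

def weighted_spectral_entropy(frame, w, n_fft=256, eps=1e-12):
    """Spectral entropy with per-bin weighting factors w.

    w holds one weight per rfft bin; conceptually the weights
    would emphasize bands where speech energy dominates.
    """
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft))
    pmf = spectrum / (np.sum(spectrum) + eps)
    mask = pmf > 0
    return -np.sum(w[mask] * pmf[mask] * np.log(pmf[mask]))
```

With all weights equal to one, this reduces to the plain spectral entropy of the previous slide.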

8 Endpoint Detection
- The spectral entropy is summed over a window of 20 frames and the result is smoothed by a median filter
- Thresholds are applied to the smoothed profile to detect the beginning and ending boundaries of the embedded speech segments
- A short period of background noise is first taken as the reference for the initial boundary detection
- Short speech segments (<100 ms) are rejected
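The steps above can be sketched as follows (a minimal sketch operating on a per-frame entropy track; the deviation threshold `delta`, the median-filter order, and the number of noise reference frames are our assumptions, since the slide gives no concrete values):

```python
import numpy as np

def detect_endpoints(entropy, frame_ms=10, sum_len=20, med_len=5,
                     noise_frames=10, delta=30.0, min_ms=100):
    """Detect speech segments on a per-frame spectral-entropy track.

    entropy : 1-D array with one spectral-entropy value per frame.
    Returns a list of (start_frame, end_frame) pairs.
    """
    # 1) Sum the entropy over a sliding window of sum_len frames.
    summed = np.convolve(entropy, np.ones(sum_len), mode="same")

    # 2) Smooth the summed track with a median filter.
    pad = med_len // 2
    padded = np.pad(summed, pad, mode="edge")
    smoothed = np.array([np.median(padded[i:i + med_len])
                         for i in range(len(summed))])

    # 3) Take a reference level from the initial background-noise
    #    frames (scaled to the window-sum magnitude) and mark frames
    #    that deviate from it by more than delta.
    reference = entropy[:noise_frames].mean() * sum_len
    active = np.abs(smoothed - reference) > delta

    # 4) Keep only contiguous active runs lasting at least min_ms.
    min_frames = min_ms // frame_ms
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    if start is not None and len(active) - start >= min_frames:
        segments.append((start, len(active)))
    return segments

# Synthetic track: 300 ms of noise, 500 ms of speech, 200 ms of noise
# (10-ms frames); speech frames have lower spectral entropy.
track = np.concatenate([np.full(30, 5.0), np.full(50, 2.0), np.full(20, 5.0)])
print(detect_endpoints(track))  # one segment near frames 30-80
```

Edge effects from the sliding sum create brief spurious deviations at both ends of the track, but the 100-ms minimum-duration rule discards them.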

9 Experiment Settings
- Speech database: isolated digits in Mandarin Chinese produced by 100 speakers (10 speakers for testing, the rest for training)
- Speech features: 12th-order MFCC and 12th-order delta MFCC
- Models: continuous-density HMM, 6 states per digit, 3 mixtures per state

10 Experiment Settings
- Noise: NOISEX-92 noise-in-speech database (white noise, pink noise, Volvo noise (car noise), F-16 noise, and machine-gun noise)
- Sound artifacts: breath noise, cough noise, and mouth-click noise

11 Example

12 Experimental Results

13 Experimental Results

14 Something Not Clear…
- What are the sample rate and bit resolution?
- What are the frame size and overlap?
- What is the order of the median filter?
- How is the "short period of background noise" used?
- What are the spectral-entropy thresholds for determining boundaries?
- What are the values of the two weighting parameters?