HIWIRE MEETING CRETE, SEPTEMBER 23-24, 2004 JOSÉ C. SEGURA LUNA GSTC UGR.

Slide 2: Schedule
- VAD for noise suppression & frame-dropping
  - Long-Term Spectral Divergence
  - Subband OS-based detector
- Non-linear feature normalization
  - Histogram equalization
  - OS-based equalization
  - Segmental implementation

Slide 3: VAD (1)
- VAD: motivation
  - To get an estimate of the background noise for
    - Wiener filter design
    - Spectral subtraction
  - To discard non-speech frames
[Block diagram; block labels: WIENER FILTER / SS, VAD, FRAME DROPPING, NOISE ESTIMATION, RECOGNIZER, NOISY SPEECH]
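
The two uses of the VAD listed on this slide can be illustrated with a minimal sketch. It assumes magnitude spectra as input, a recursive average for the noise estimate, and plain magnitude spectral subtraction with a floor. The function names, the smoothing factor `alpha`, and the `floor` value are hypothetical and are not taken from the slides.

```python
from typing import List, Sequence


def update_noise(noise: List[float], frame: Sequence[float],
                 alpha: float = 0.95) -> List[float]:
    """Recursive average of the noise magnitude spectrum (assumed update rule)."""
    return [alpha * n + (1.0 - alpha) * x for n, x in zip(noise, frame)]


def spectral_subtraction(frame: Sequence[float], noise: Sequence[float],
                         floor: float = 0.01) -> List[float]:
    """Subtract the noise estimate bin by bin, keeping a small spectral floor."""
    return [max(x - n, floor * x) for x, n in zip(frame, noise)]


def front_end(frames: Sequence[Sequence[float]], vad_flags: Sequence[bool],
              alpha: float = 0.95, floor: float = 0.01) -> List[List[float]]:
    """Use VAD decisions for noise estimation and for frame dropping."""
    noise = list(frames[0])  # assumption: the first frame contains only noise
    kept = []
    for frame, is_speech in zip(frames, vad_flags):
        if not is_speech:
            noise = update_noise(noise, frame, alpha)  # noise estimation
            continue                                   # frame dropping
        kept.append(spectral_subtraction(frame, noise, floor))
    return kept
```

Only frames flagged as speech reach the recognizer, and the noise estimate is updated only on frames flagged as non-speech.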

Slide 4: VAD (2)
- Our approach
  - Use of rather long time spans (~100 ms) instead of instantaneous measures
    - Increased discrimination
  - Use of a statistical model in the log-FBE domain
    - Smoother estimates
  - Use of a feedback decision coupled with noise suppression
    - VAD works on less noisy speech
  - Use of order statistics
    - More robust estimation

Slide 5: Long-Term Spectral Divergence (1)
- J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, Efficient voice activity detection algorithms using long-term speech information, Speech Communication 42 (2004) 271–287
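
The following slides lost their figures in this transcript, so here is a minimal sketch of the measure. It assumes the definitions usually quoted from the cited paper: the long-term spectral envelope (LTSE) is the per-bin maximum of the magnitude spectrum over 2N+1 neighbouring frames, and the LTSD is the average ratio of the squared envelope to the squared noise spectrum, expressed in dB. The names `ltse` and `ltsd` and the default `order` are illustrative choices, not values from the slides.

```python
import math
from typing import List, Sequence


def ltse(spectra: Sequence[Sequence[float]], l: int, order: int) -> List[float]:
    """Per-bin maximum of the magnitude spectrum over frames l-order .. l+order."""
    lo = max(0, l - order)                 # truncate the window at the edges
    hi = min(len(spectra), l + order + 1)
    n_bins = len(spectra[l])
    return [max(spectra[j][k] for j in range(lo, hi)) for k in range(n_bins)]


def ltsd(spectra: Sequence[Sequence[float]], noise: Sequence[float],
         l: int, order: int = 6) -> float:
    """Long-term spectral divergence (dB) between the envelope and the noise."""
    env = ltse(spectra, l, order)
    ratio = sum((e * e) / (n * n) for e, n in zip(env, noise)) / len(env)
    return 10.0 * math.log10(ratio)
```

A frame would be labelled as speech when `ltsd(...)` exceeds a threshold; how that threshold is chosen is not specified here.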

Slide 6: Long-Term Spectral Divergence (2)

Slide 7: Long-Term Spectral Divergence (3)

Slide 8: Long-Term Spectral Divergence (4)

Slide 9: Long-Term Spectral Divergence (5)

Slide 10: Long-Term Spectral Divergence (7)
- Recognition experiments with AURORA 2 and 3

Slide 11: Long-Term Spectral Divergence (6)

Slide 12: Subband OSF VAD (1)
- J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre, and A.J. Rubio, An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)
- The decision is based on an averaged QSNR, defined as an inter-quantile difference
- Feedback structure
  - The VAD operates on the noise-reduced signal
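
As a rough illustration of an inter-quantile decision statistic, the sketch below computes, for each subband, the difference between an upper and a lower quantile of the log-energies in a window of frames, then averages over subbands. The quantile levels, the window length, and the function names are assumptions. The exact quantiles and noise reference used in the cited paper are not reproduced here.

```python
import math
from typing import List, Sequence


def quantile(values: Sequence[float], p: float) -> float:
    """Quantile of the sorted values, with linear interpolation between neighbours."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1.0 - frac) + s[hi] * frac


def averaged_qsnr(log_energies: Sequence[Sequence[float]], l: int,
                  order: int = 8, p_low: float = 0.1,
                  p_high: float = 0.9) -> float:
    """Average over subbands of the inter-quantile difference around frame l."""
    lo = max(0, l - order)
    hi = min(len(log_energies), l + order + 1)
    n_bands = len(log_energies[l])
    diffs: List[float] = []
    for k in range(n_bands):
        window = [log_energies[j][k] for j in range(lo, hi)]
        diffs.append(quantile(window, p_high) - quantile(window, p_low))
    return sum(diffs) / n_bands
```

In a feedback structure such as the one on the slide, `log_energies` would be computed from the noise-reduced signal rather than from the raw input.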

Slide 13: Subband OSF VAD (2)

Slide 14: Subband OSF VAD (3)

Slide 15: Subband OSF VAD (4)

Slide 16: Subband OSF VAD (5)

Slide 17: Accurate VAD
- Open topics
  - New alternatives to improve the performance
    - New decision criteria based on OS filters
    - Already used for edge detection in images
  - Computational efficiency
    - Development of computationally efficient algorithms

Slide 18: Feature normalization
- Objective
  - Transform features to remove undesired variability
- Linear techniques
  - CMS
    - Cepstral mean subtraction
    - Removes the effect of linear channel distortion
  - CMVN
    - Cepstral mean and variance normalization
    - Extension of CMS to deal with the variance reduction caused by additive noise
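
The two linear techniques can be sketched as follows, assuming per-utterance statistics computed independently for each cepstral coefficient. The function names and the small `eps` guard against zero variance are illustrative.

```python
import math
from typing import List, Sequence


def cms(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cepstral mean subtraction: remove the per-coefficient mean."""
    n, dim = len(features), len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in features]


def cmvn(features: Sequence[Sequence[float]],
         eps: float = 1e-12) -> List[List[float]]:
    """Cepstral mean and variance normalization: zero mean, unit variance."""
    centered = cms(features)
    n, dim = len(centered), len(centered[0])
    stds = [math.sqrt(sum(f[d] ** 2 for f in centered) / n) for d in range(dim)]
    return [[f[d] / max(stds[d], eps) for d in range(dim)] for f in centered]
```

For example, `cms([[1.0, 2.0], [3.0, 6.0]])` centers each column on zero, and `cmvn` additionally scales each column to unit variance.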

Slide 19: Feature normalization
- Non-linear feature distortion
  - Environment effects are non-linear for MFCC features
  - They can hardly be removed with linear techniques
    - Because not only the location (mean) and scale (variance) of the feature distributions are affected, but also the shape (affecting higher-order moments of the distribution)
- Non-linear extensions
  - CDF-matching approaches (HEQ and related)
    - Have proved more effective than linear ones
    - Normalize more than the first two moments of the probability distributions

Slide 20: CDF-matching based equalization
- The main idea
  - Transform the features to match a given PDF
  - In the one-dimensional case, CDF matching gives the solution
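
The one-dimensional solution can be written out explicitly. The notation below is assumed, since the slide's own formula did not survive in this transcript: x is the observed feature, y the equalized feature, C_X the CDF of the observed feature, and C_Y the CDF of the reference distribution. For a monotonically increasing transformation T, probability mass is preserved, which gives:

```latex
% Notation assumed: C_X = CDF of the observed feature x,
%                   C_Y = CDF of the reference distribution.
C_X(x) = C_Y\bigl(T(x)\bigr)
\quad\Longrightarrow\quad
y = T(x) = C_Y^{-1}\bigl(C_X(x)\bigr)
```

In practice C_X is unknown and has to be estimated from the data, for example with a histogram or with order statistics.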

Slide 21: Equalization and robust classifiers

Slide 22: Invariance
- CMS is invariant to additive bias
- CMVN is invariant to linear transformations
- Equalization to a reference distribution is invariant to any invertible transformation (including non-linear ones)
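
These invariances can be checked numerically with a small sketch. It assumes a single feature sequence, a standard Gaussian reference, a rank-based estimate of the CDF, and a monotonically increasing distortion, which is the case in which the ranks are preserved. The function names are illustrative.

```python
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence


def cms_1d(x: Sequence[float]) -> List[float]:
    """Mean subtraction for a single coefficient."""
    m = mean(x)
    return [v - m for v in x]


def cmvn_1d(x: Sequence[float]) -> List[float]:
    """Mean and variance normalization for a single coefficient."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


def equalize_1d(x: Sequence[float]) -> List[float]:
    """Map each value to the Gaussian quantile of its rank-based CDF estimate."""
    ref = NormalDist()  # standard Gaussian reference (assumption)
    order = sorted(range(len(x)), key=lambda i: x[i])
    out = [0.0] * len(x)
    for rank, i in enumerate(order, start=1):
        out[i] = ref.inv_cdf((rank - 0.5) / len(x))
    return out
```

With `x = [0.3, 1.2, 0.7, 2.5]`, `equalize_1d(x)` and `equalize_1d([math.exp(v) for v in x])` return the same values because the ranks do not change, whereas `cmvn_1d` gives different results for the two inputs.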

Slide 23: HEQ for robust speech recognition (1)
- A. de la Torre, A.M. Peinado, J.C. Segura, J.L. Pérez, C. Benítez and A.J. Rubio, Histogram equalization of speech representation for robust speech recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)
- Transformation of each component of the MFCC vector to a Gaussian reference
- Cumulative distributions are estimated using histograms
- Performance compared with CMS, CMVN and model-based feature compensation (VTS)
- Combination with VTS
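
A minimal sketch of histogram-based equalization to a Gaussian reference, applied to one MFCC component over an utterance. The number of bins, the evaluation of the CDF at bin centres without interpolation, and the function name are assumptions rather than details of the cited paper.

```python
from statistics import NormalDist
from typing import List, Sequence


def heq_gaussian(values: Sequence[float], n_bins: int = 20) -> List[float]:
    """Equalize one feature component to a standard Gaussian via its histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0      # avoid a zero width for constant input

    def bin_of(v: float) -> int:
        return min(int((v - lo) / width), n_bins - 1)

    counts = [0] * n_bins
    for v in values:
        counts[bin_of(v)] += 1

    # Estimated CDF at each bin centre: all lower bins plus half of the bin itself.
    cdf, running = [], 0
    for c in counts:
        cdf.append((running + 0.5 * c) / len(values))
        running += c

    ref = NormalDist()                     # Gaussian reference distribution
    return [ref.inv_cdf(cdf[bin_of(v)]) for v in values]
```

The full feature vector would be equalized by calling this function once per component. Values that fall in the same bin map to the same output, so the bin count trades resolution against estimation noise.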

Slide 24: HEQ for robust speech recognition (2)

Slide 25: HEQ for robust speech recognition (3)

Slide 26: HEQ for robust speech recognition (4)

Slide 27: HEQ for robust speech recognition (5)

Slide 28: Segmental HEQ (1)
- J.C. Segura, C. Benítez, A. de la Torre, A.J. Rubio and J. Ramírez, Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition, IEEE Signal Processing Letters, 11(5), May 2004
- A segmental implementation of HEQ for non-stationary noise
- A temporal buffer is used for the histogram estimation instead of the full sentence
- The algorithmic delay is T frames

Slide 29: Segmental HEQ (2)

Slide 30: OSEQ: An efficient implementation (1)
- A very computationally efficient algorithm based on Order Statistics
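
One way to realize an order-statistics equalizer in a segmental setting is sketched below. It assumes a sliding window of 2T+1 frames centred on the current frame, which implies a look-ahead of T frames, a rank-based CDF estimate (r - 0.5) / window length, and a standard Gaussian reference. The function name and the handling of utterance edges are illustrative choices.

```python
from statistics import NormalDist
from typing import List, Sequence


def oseq(values: Sequence[float], half_window: int = 2) -> List[float]:
    """Equalize each frame from its rank inside a window of 2T+1 frames."""
    ref = NormalDist()  # standard Gaussian reference (assumption)
    out = []
    for t, v in enumerate(values):
        lo = max(0, t - half_window)                 # truncate at the edges
        hi = min(len(values), t + half_window + 1)
        window = values[lo:hi]
        rank = 1 + sum(1 for w in window if w < v)   # rank of the current value
        out.append(ref.inv_cdf((rank - 0.5) / len(window)))
    return out
```

No histogram is built: each output needs only the rank of the current value within the window, which costs a single pass over the window.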

Slide 31: OSEQ: An efficient implementation (2)

Slide 32: Feature normalization
- Open topics
  - Reference distribution
    - Clean speech / Gaussian / others?
  - Dynamic features normalization (Δ and ΔΔ)
    - After, before or simultaneously [Obuchi, Stern, EUSP’03]
  - Progressive normalization
    - Not all MFCCs are equally affected, and they do not have equal discriminative power [de Wet, …, ICASSP’03]
    - Lower order moments normalization [Hsu, Lee, ICASSP’04]
  - Parametric techniques
    - Current approaches are non-parametric [Haverinen, Kiss, EUSP’03]
  - New applications
    - Speaker independence and adaptation
    - Multi-stream normalization

Slide 33: Combination of techniques
- Development of a combined robust front-end
  - An accurate VAD
    - For noise parameter estimation
  - A noise reduction technique
    - Spectral subtraction or Wiener filter
    - Statistical feature compensation
  - A frame-dropping algorithm
    - To discard non-speech frames
  - And a feature normalization block
    - For residual non-linear distortion compensation

Slide 34: VAD (1)
- Development of a combined robust front-end
[Block diagram; block labels: WIENER FILTER / SS, VAD, FRAME DROPPING, NOISE ESTIMATION, FEATURE EQUALIZATION, NOISY SPEECH, RECOGNIZER]
