Advances in WP1 Turin Meeting – 9-10 March 2006 www.loquendo.com.

WP1: Environment & Sensor Robustness
T1.2 Noise Independence
Voice Activity Detection:
– Model-based approach using NN
  "Non-linear estimation of voice activity to improve automatic recognition of noisy speech", Roberto Gemello, Franco Mana and Renato De Mori, Eurospeech 2005, Lisboa, September 2005
Noise Reduction:
– Spectral Subtraction (standard, Wiener and SNR-dependent) and Spectral Attenuation (Ephraim-Malah SA, standard and SNR-dependent)
  "Automatic Speech Recognition With a Modified Ephraim-Malah Rule", Roberto Gemello, Franco Mana and Renato De Mori, IEEE Signal Processing Letters, Vol. 13, No. 1, January 2006
– Evaluation of HEQ for feature normalization
– New techniques for non-stationary noises
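The slides name SNR-dependent Wiener spectral subtraction but do not give the rule. As a rough illustration only, the sketch below computes a per-bin Wiener gain G = ξ/(1+ξ), where the a priori SNR ξ is estimated by power spectral subtraction, and makes it SNR-dependent through a Berouti-style over-subtraction factor (more subtraction when the frame SNR is low). The over-subtraction rule, its -5 to 20 dB range, the gain floor and all function names are assumptions, not the Loquendo front end.

```python
import math

def frame_snr_db(noisy_power, noise_power):
    """A posteriori frame SNR in dB: total noisy power over total noise power."""
    num = max(sum(noisy_power), 1e-12)
    den = max(sum(noise_power), 1e-12)
    return 10.0 * math.log10(num / den)

def over_subtraction_factor(snr_db, alpha0=4.0):
    """Hypothetical Berouti-style rule: subtract more noise when the frame SNR is low.

    Gives 4.75 at -5 dB and 1.0 at 20 dB; SNRs outside that range are clipped.
    """
    snr_db = min(max(snr_db, -5.0), 20.0)
    return alpha0 - 0.15 * snr_db

def wiener_snr_dependent_gains(noisy_power, noise_power, floor=0.01):
    """Per-bin gains for one frame, given |Y(k)|^2 and a noise PSD estimate."""
    alpha = over_subtraction_factor(frame_snr_db(noisy_power, noise_power))
    gains = []
    for y2, n2 in zip(noisy_power, noise_power):
        n2 = max(n2, 1e-12)
        # A priori SNR estimated by (over-)subtracting the noise estimate.
        xi = max(y2 - alpha * n2, 0.0) / n2
        # Wiener rule G = xi / (1 + xi), floored to limit musical noise.
        gains.append(max(xi / (1.0 + xi), floor))
    return gains

def enhance_frame(noisy_power, noise_power):
    """Apply the (amplitude) gains to a power spectrum, hence the square."""
    gains = wiener_snr_dependent_gains(noisy_power, noise_power)
    return [g * g * y2 for g, y2 in zip(gains, noisy_power)]

print(wiener_snr_dependent_gains([10.0, 1.0, 5.0], [1.0, 1.0, 1.0]))
```

Bins whose power barely exceeds the noise estimate are driven to the floor, while high-SNR bins pass almost unchanged; an SNR-dependent rule only changes how aggressively `alpha` subtracts.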

WP1: Environment & Sensor Robustness
T1.2 Noise Independence
Voice Activity Detection:
– A model-based approach using NNs (neural networks) to discriminate two classes (noise and voice) will be explored;
– NN input could be standard features (cepstral coefficients, energy) after noise reduction, possibly complemented by other features (pitch/voicing) produced by other partners (IRST);
– The training set will be multi-style, covering several noise conditions and languages.
Noise Reduction:
– Several noise reduction techniques will be tested on the test sets selected as benchmarks for the project: Spectral Subtraction (standard, Wiener and SNR-dependent) and Spectral Attenuation (Ephraim-Malah SA, standard and SNR-dependent);
– New techniques for non-stationary noises.
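To make the two-class (noise vs. voice) formulation concrete, here is a minimal sketch of a frame classifier: a single logistic unit, the smallest possible neural network, trained by per-frame stochastic gradient descent on cross-entropy. The two input features per frame and the toy training values are hypothetical stand-ins for the cepstral/energy (and possibly pitch/voicing) features mentioned above; a real VAD would use a multi-layer network trained on multi-style data.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def speech_probability(x, w, b):
    """P(voice | frame features x) for a single logistic unit."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_vad(frames, labels, epochs=500, lr=0.1, seed=0):
    """Train on frames labelled 0 (noise) or 1 (voice) with per-frame SGD."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.01, 0.01) for _ in frames[0]]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(frames, labels):
            err = speech_probability(x, w, b) - y  # d(cross-entropy)/d(logit)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def vad(frames, w, b, threshold=0.5):
    """Frame-level decisions: True = voice, False = noise."""
    return [speech_probability(x, w, b) >= threshold for x in frames]

# Toy data: hypothetical [normalized log-energy, voicing] features per frame.
train_x = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],   # noise frames
           [1.00, 0.90], [0.90, 1.10], [1.10, 1.00]]   # voice frames
train_y = [0, 0, 0, 1, 1, 1]
w, b = train_vad(train_x, train_y)
print(vad([[1.05, 0.95], [0.12, 0.18]], w, b))
```

Replacing the single unit with a hidden layer changes only `speech_probability` and the gradient step; the frame-level decision logic stays the same.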

WP1: Speech Databases for Noise Reduction
Aurora 2 – Connected digits
TIDIGITS data downsampled to 8 kHz, filtered with a G.712 characteristic, with noise artificially added at several SNRs (20 dB, 15 dB, 10 dB, 5 dB, 0 dB, -5 dB). There are three test sets:
– A: same noises as in training: subway, babble, car, exhibition hall;
– B: 4 different noises: restaurant, street, airport, train station;
– C: same noises as A, but filtered with a different microphone characteristic.
Aurora 3 – Connected digits recorded in a car environment
Signals collected with a hands-free microphone (ch1) and a close-talking microphone (ch0). In HIWIRE we use the Italian and Spanish recordings. There are two test sets:
– WM (well-matched): ch0 and ch1 recordings used in both the training and testing lists;
– HM (high mismatch): ch0 for training and ch1 for testing.
Aurora 4 – Continuous speech, 5k vocabulary
WSJ0 5K with 6 kinds of added noise: car, babble, restaurant, street, airport, train station. It uses the standard bigram language model. Training modes: CLEAN and MULTI-CONDITION.
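Adding noise "at several SNRs" amounts to scaling the noise so that 10·log10(P_clean / P_noise) hits the target before summing. A minimal sketch, assuming SNR is measured on whole-signal average power; the actual Aurora 2 preparation also involved G.712 filtering and its own level measurement, which this does not reproduce, and the function names are hypothetical.

```python
import math
import random

def mean_power(signal):
    """Average power of a sampled signal."""
    return sum(s * s for s in signal) / len(signal)

def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + g*noise, with g chosen so that the SNR equals snr_db."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Target noise power is p_clean / 10^(snr/10); g scales amplitude, hence sqrt.
    g = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [c + g * n for c, n in zip(clean, noise)]

# Usage: a 1 kHz tone sampled at 8 kHz, corrupted at the Aurora 2 SNR levels.
fs = 8000
clean = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs)]
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
noisy = {snr: add_noise_at_snr(clean, noise, snr) for snr in (20, 15, 10, 5, 0, -5)}
```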

Baseline evaluations of Loquendo ASR on Aurora2 speech databases

Baseline Performance Evaluations
Performance in terms of Word Accuracy in % (Error Reduction in %)

CLEAN Models            Test A       Test B       Test C       A-B-C
RPLP Wiener SNR dep.    84.0 (34.4)  84.4 (30.7)  83.3 (32.4)  84.0 (32.5)
+ EphMal SNR dep.       85.3 (39.7)  84.2 (29.5)  84.8 (34.5)  84.8 (35.9)

MULTI Models            Test A       Test B       Test C       A-B-C        Avg.
RPLP Wiener SNR dep.    93.9 (6.1)   92.1 (11.2)  90.5 (3.1)   92.5 (7.4)   88.2 (25.8)
+ EphMal SNR dep.       94.0 (7.7)   92.0 (10.1)  91.1 (9.2)   92.6 (8.6)   88.7 (28.9)

LASR Models             Test A       Test B       Test C       A-B-C
RPLP Wiener SNR dep.    88.1 (37.7)  88.3 (29.9)  86.2 (38.4)  87.8 (35.1)
+ EphMal SNR dep.       89.0 (42.4)  88.6 (31.7)  87.0 (41.9)  88.4 (38.3)
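The A-B-C column is not defined on the slide, but every entry is reproduced to one decimal by a weighted mean that counts Test C half as much as Tests A and B (weights 0.4, 0.4, 0.2). Likewise, the Avg. column of the MULTI block matches, within rounding, the mean of the CLEAN and MULTI A-B-C figures for the same front end. The check below makes that arithmetic explicit; the weights are inferred from the numbers, not stated in the source, and the helper names are hypothetical.

```python
def aurora2_abc(acc_a, acc_b, acc_c):
    """Overall figure with Test C weighted half as much as Tests A and B (inferred)."""
    return 0.4 * acc_a + 0.4 * acc_b + 0.2 * acc_c

def clean_multi_average(abc_clean, abc_multi):
    """Mean of the CLEAN-trained and MULTI-trained A-B-C figures (inferred)."""
    return (abc_clean + abc_multi) / 2.0

# (Test A, Test B, Test C, reported A-B-C): word accuracy in %, from the table.
ROWS = {
    "CLEAN, RPLP Wiener SNR dep.": (84.0, 84.4, 83.3, 84.0),
    "CLEAN, + EphMal SNR dep.":    (85.3, 84.2, 84.8, 84.8),
    "MULTI, RPLP Wiener SNR dep.": (93.9, 92.1, 90.5, 92.5),
    "MULTI, + EphMal SNR dep.":    (94.0, 92.0, 91.1, 92.6),
    "LASR, RPLP Wiener SNR dep.":  (88.1, 88.3, 86.2, 87.8),
    "LASR, + EphMal SNR dep.":     (89.0, 88.6, 87.0, 88.4),
}

for name, (a, b, c, reported) in ROWS.items():
    print(f"{name:30s} computed {aurora2_abc(a, b, c):.2f}  reported {reported}")
```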

Baseline evaluations of Loquendo ASR on Aurora3 speech databases

Baseline Performance Evaluations
Performance in terms of Word Accuracy in % (Error Reduction in %)

Aurora3 Models          Ita WM       Ita HM       Spa WM       Spa HM
RPLP Wiener SNR dep.    98.3 (5.5)   77.5 (59.4)  97.6 (11.1)  89.9 (60.2)
+ EphMal SNR dep.       98.4 (11.1)  82.2 (66.7)  97.7 (14.8)  88.5 (54.7)

LASR Models             Ita WM       Ita HM       Spa WM       Spa HM
RPLP Wiener SNR dep.    (41.7)  -  84.9 (26.6)
+ EphMal SNR dep.       (43.8)  -  86.2 (33.0)
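The slides report "Error Reduction" without naming the reference system. Assuming the usual definition, the relative reduction of the word error rate WER = 100 - WA with respect to a baseline, both the quantity and its inverse (the baseline accuracy that a reported pair implies) are one-liners. The definition and the function names are assumptions for illustration.

```python
def error_reduction(acc_baseline, acc_system):
    """Relative WER reduction in %, with WER = 100 - word accuracy (assumed definition)."""
    wer_base = 100.0 - acc_baseline
    wer_sys = 100.0 - acc_system
    if wer_base <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return 100.0 * (wer_base - wer_sys) / wer_base

def implied_baseline_accuracy(acc_system, reduction_pct):
    """Invert error_reduction: baseline word accuracy implied by (accuracy, reduction)."""
    wer_sys = 100.0 - acc_system
    return 100.0 - wer_sys / (1.0 - reduction_pct / 100.0)

print(error_reduction(80.0, 90.0))                       # WER halves from 20% to 10%: 50.0
print(round(implied_baseline_accuracy(77.5, 59.4), 1))   # Ita HM, RPLP Wiener entry: 44.6
```

Negative entries in the tables, such as (-1.9), correspond to a system whose accuracy is below that of the reference.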

Baseline evaluations of Loquendo ASR on Aurora4 speech databases (to be done)

Baseline Performance Evaluations (Sennheiser microphone)
This test was run with the Loquendo ASR using the CLEAN and MULTI_CONDITION models trained on the Aurora4 training list, and evaluated on the clean test list and the 6 noisy test lists.

CLEAN Models          Clean   Car          Babble       Restaurant   Street       Airport      Train Station  Noise Avg.
RPLP Wiener SNR dep.  (0.0)   67.0 (27.8)  36.6 (17.5)  30.7 (1.8)   43.1 (13.8)  31.9 (3.7)   48.8 (24.4)    43.0 (14.0)

MULTI Models          Clean   Car          Babble       Restaurant   Street       Airport      Train Station  Noise Avg.
RPLP Wiener SNR dep.  (-1.9)  75.9 (2.8)   60.3 (1.0)   56.8 (-3.3)  60.4 (5.5)   60.5 (-1.0)  62.9 (12.3)    62.8 (2.9)
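The "Noise Avg." column is consistent with a plain mean over the six noisy conditions, with the clean test excluded: for the CLEAN models, (67.0 + 36.6 + 30.7 + 43.1 + 31.9 + 48.8) / 6 = 43.02, reported as 43.0, and the same holds for the other Aurora 4 rows. A small helper makes this explicit; its names are hypothetical.

```python
NOISES = ("Car", "Babble", "Restaurant", "Street", "Airport", "Train Station")

def noise_average(acc_by_condition):
    """Mean word accuracy over the six noisy Aurora 4 test lists (any 'Clean' entry is ignored)."""
    return sum(acc_by_condition[n] for n in NOISES) / len(NOISES)

# Word accuracies in %, from the Sennheiser-microphone table.
clean_models = {"Car": 67.0, "Babble": 36.6, "Restaurant": 30.7,
                "Street": 43.1, "Airport": 31.9, "Train Station": 48.8}
multi_models = {"Car": 75.9, "Babble": 60.3, "Restaurant": 56.8,
                "Street": 60.4, "Airport": 60.5, "Train Station": 62.9}
print(round(noise_average(clean_models), 1), round(noise_average(multi_models), 1))
```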

Baseline Performance Evaluations (second microphone)
This test was run with the Loquendo ASR using the CLEAN and MULTI_CONDITION models trained on the Aurora4 training list, and evaluated on the clean test list and the 6 noisy test lists.

MULTI Models          Clean   Car          Babble       Restaurant   Street       Airport      Train Station  Noise Avg.
RPLP Wiener SNR dep.  (-7.9)  59.6 (1.5)   47.7 (4.0)   44.7 (-1.1)  45.8 (4.4)   47.8 (-1.5)  48.0 (7.0)     48.9 (2.5)

CLEAN Models          Clean   Car          Babble       Restaurant   Street       Airport      Train Station  Noise Avg.
RPLP Wiener SNR dep.  (1.7)   50.2 (22.5)  25.7 (11.3)  23.6 (2.8)   29.2 (8.2)   22.6 (4.1)   35.1 (15.9)    31.1 (10.4)

Baseline Performance Evaluations (Loquendo Models)
This test was run with the Loquendo ASR using the Loquendo models trained on MACROPHONE, and evaluated on the clean test list and the 6 noisy test lists, for both microphones.

Sennheiser microphone  Clean   Car          Babble       Restaurant   Street       Airport      Train Station  Noise Avg.
RPLP Wiener SNR dep.   (0.0)   74.5 (20.1)  53.7 (17.5)  46.4 (7.7)   53.5 (21.2)  46.9 (5.7)   59.9 (31.3)    55.8 (17.1)

Second microphone      Clean   Car          Babble       Restaurant   Street       Airport      Train Station  Noise Avg.
RPLP Wiener SNR dep.   (0.7)   65.0 (20.1)  47.2 (17.9)  40.4 (6.3)   46.2 (19.0)  42.5 (7.0)   53.1 (26.5)    49.1 (16.0)

HEQ Evaluation (1) (Loquendo & UGR)
The HEQ algorithm amplifies the coefficient (here, the energy) in background-noise segments.

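HEQ Evaluation (2) (Loquendo & UGR)
The HEQ algorithm performs a context-dependent normalization: the transformation applied to each frame depends on the distribution of the coefficient over the rest of the utterance. This could be a drawback for an open-vocabulary recognizer that uses phoneme-based acoustic models.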
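A minimal sketch of histogram equalization illustrates both observations. It shows the general technique under stated assumptions, not UGR's implementation: the empirical CDF of one coefficient over an utterance is estimated from ranks and mapped onto the quantiles of a standard Gaussian reference. Because the mapping uses ranks, frames of a long, nearly constant noise segment are spread over most of the output range (the amplification noted in HEQ Evaluation (1)), and the output for a frame depends on every other frame in the utterance (the context dependence noted in HEQ Evaluation (2)). The energy values and helper names are hypothetical.

```python
from statistics import NormalDist

def heq(values):
    """Rank-based histogram equalization of one coefficient track onto N(0, 1)."""
    n = len(values)
    ref = NormalDist(0.0, 1.0)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # Mid-rank empirical CDF keeps the argument strictly inside (0, 1).
        out[i] = ref.inv_cdf((rank + 0.5) / n)
    return out

def share_of_range(track, idx):
    """Fraction of the track's total range covered by the frames in idx."""
    sub = [track[i] for i in idx]
    return (max(sub) - min(sub)) / (max(track) - min(track))

# Hypothetical energy track: 8 near-constant noise frames, then 2 speech frames.
energy = [1.00, 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 5.0, 6.0]
eq = heq(energy)
noise_idx = range(8)
print(share_of_range(energy, noise_idx), share_of_range(eq, noise_idx))
```

Before equalization the noise frames occupy about 1% of the energy range; afterwards they occupy roughly 70% of the output range.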

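HEQ Integration (3) (Loquendo & UGR)
Processing chain: Loquendo FE (denoising, power-spectrum level) → UGR HEQ (feature normalization, frame level, 39 coefficients) → Loquendo ASR (phoneme-based models)

AURORA3 ITA - HM   SA      WA      WI     WD      WS
Loquendo           46.6%   77.5%   4.8%   7.2%    10.4%
+HEQ                       69.6%   4.3%   12.6%   13.5%
+HEQ                       77.7%   4.0%   7.3%    11.0%
(SA: sentence accuracy; WA: word accuracy; WI, WD, WS: word insertion, deletion and substitution rates)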
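The table's columns are consistent with the standard word-accuracy definition WA = 100 - WS - WD - WI, all in % of reference words: 100 - 10.4 - 7.2 - 4.8 = 77.6 for the Loquendo row (77.5 reported, presumably a rounding effect), and exactly 69.6 and 77.7 for the two +HEQ rows. A sketch of the computation; the reference-word counts in the usage line are hypothetical.

```python
def word_accuracy(n_ref_words, subs, dels, ins):
    """Word accuracy in %: (N - S - D - I) / N * 100. Unlike %Correct, insertions are penalized."""
    if n_ref_words <= 0:
        raise ValueError("need at least one reference word")
    return 100.0 * (n_ref_words - subs - dels - ins) / n_ref_words

def word_accuracy_from_rates(ws, wd, wi):
    """Same quantity when S, D and I are already given as % of reference words."""
    return 100.0 - ws - wd - wi

# (name, WS, WD, WI) from the Aurora3 Ita HM table.
for name, ws, wd, wi in [("Loquendo", 10.4, 7.2, 4.8),
                         ("+HEQ", 13.5, 12.6, 4.3),
                         ("+HEQ", 11.0, 7.3, 4.0)]:
    print(name, round(word_accuracy_from_rates(ws, wd, wi), 1))

print(word_accuracy(1000, 104, 72, 48))  # hypothetical counts for 1000 reference words
```

Because insertions count against WA, it can become negative when a recognizer inserts many words in noise segments.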

WP1: Workplan
– Selection of suitable benchmark databases (m6)
– Completion of LASR baseline experiments with Spectral Subtraction (Wiener SNR-dependent) (m12)
– Discriminative VAD (training + AURORA3 testing) (m16)
– Experimentation with the Spectral Attenuation rule (Ephraim-Malah SNR-dependent) (m21)
– Integration of denoising and normalization techniques (m33)
– Noise estimation and reduction for non-stationary noises (m33)