An Energy Search Approach to Variable Frame Rate Front-End Processing for Robust ASR Julien Epps and Eric H. C. Choi National ICT Australia Presenter:

Slides:



Advertisements
Similar presentations
Higher Order Cepstral Moment Normalization (HOCMN) for Robust Speech Recognition Speaker: Chang-wen Hsu Advisor: Lin-shan Lee 2007/02/08.
Advertisements

Advances in WP1 Trento Meeting January
1 A Spectral-Temporal Method for Pitch Tracking Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu* Department of Electrical and Computer Engineering Old.
Advanced Speech Enhancement in Noisy Environments
Initial results of burst signal injections into a GEO burst search pipeline Indentify clusters of TF pixels Frame data from IFO Calculate time- frequency.
Distribution-Based Feature Normalization for Robust Speech Recognition Leveraging Context and Dynamics Cues Yu-Chen Kao and Berlin Chen Presenter : 張庭豪.
PERFORMANCE ANALYSIS OF AURORA LARGE VOCABULARY BASELINE SYSTEM Naveen Parihar, and Joseph Picone Center for Advanced Vehicular Systems Mississippi State.
AN INVESTIGATION OF DEEP NEURAL NETWORKS FOR NOISE ROBUST SPEECH RECOGNITION Michael L. Seltzer, Dong Yu Yongqiang Wang ICASSP 2013 Presenter : 張庭豪.
An Alternative Approach of Finding Competing Hypotheses for Better Minimum Classification Error Training Mr. Yik-Cheung Tam Dr. Brian Mak.
Survey of INTERSPEECH 2013 Reporter: Yi-Ting Wang 2013/09/10.
Advances in WP1 Turin Meeting – 9-10 March
Advances in WP1 Nancy Meeting – 6-7 July
HIWIRE MEETING Nancy, July 6-7, 2006 José C. Segura, Ángel de la Torre.
HIWIRE Progress Report Chania, May 2007 Presenter: Prof. Alex Potamianos Technical University of Crete Presenter: Prof. Alex Potamianos Technical University.
MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, ASRU. IEEE Workshop on Author.
Communications & Multimedia Signal Processing Formant Track Restoration in Train Noisy Speech Qin Yan Communication & Multimedia Signal Processing Group.
HIWIRE Progress Report Trento, January 2007 Presenter: Prof. Alex Potamianos Technical University of Crete Presenter: Prof. Alex Potamianos Technical University.
Communications & Multimedia Signal Processing Refinement in FTLP-HNM system for Speech Enhancement Qin Yan Communication & Multimedia Signal Processing.
Advances in WP1 and WP2 Paris Meeting – 11 febr
SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification
Communications & Multimedia Signal Processing Analysis of Effects of Train/Car noise in Formant Track Estimation Qin Yan Department of Electronic and Computer.
Receiver Sensitivity Sensitivity describes the weakest signal power level that the receiver is able to detect and decode Sensitivity is dependent on the.
Normalization of the Speech Modulation Spectra for Robust Speech Recognition Xiong Xiao, Eng Siong Chng, and Haizhou Li Wen-Yi Chu Department of Computer.
HMM-BASED PSEUDO-CLEAN SPEECH SYNTHESIS FOR SPLICE ALGORITHM Jun Du, Yu Hu, Li-Rong Dai, Ren-Hua Wang Wen-Yi Chu Department of Computer Science & Information.
A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST
Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling V. Karjigi , P. Rao Dept. of Electrical Engineering,
INTRODUCTION  Sibilant speech is aperiodic.  the fricatives /s/, / ʃ /, /z/ and / Ʒ / and the affricatives /t ʃ / and /d Ʒ /  we present a sibilant.
Reporter: Shih-Hsiang( 士翔 ). Introduction Speech signal carries information from many sources –Not all information is relevant or important for speech.
REVISED CONTEXTUAL LRT FOR VOICE ACTIVITY DETECTION Javier Ram’ırez, Jos’e C. Segura and J.M. G’orriz Dept. of Signal Theory Networking and Communications.
Ekapol Chuangsuwanich and James Glass MIT Computer Science and Artificial Intelligence Laboratory,Cambridge, Massachusetts 02139,USA 2012/07/2 汪逸婷.
Noise Compensation for Speech Recognition with Arbitrary Additive Noise Ji Ming School of Computer Science Queen’s University Belfast, Belfast BT7 1NN,
November 1, 2005IEEE MMSP 2005, Shanghai, China1 Adaptive Multi-Frame-Rate Scheme for Distributed Speech Recognition Based on a Half Frame-Rate Front-End.
LOG-ENERGY DYNAMIC RANGE NORMALIZATON FOR ROBUST SPEECH RECOGNITION Weizhong Zhu and Douglas O’Shaughnessy INRS-EMT, University of Quebec Montreal, Quebec,
Informing Multisource Decoding for Robust Speech Recognition Ning Ma and Phil Green Speech and Hearing Research Group The University of Sheffield 22/04/2005.
IMPROVING RECOGNITION PERFORMANCE IN NOISY ENVIRONMENTS Joseph Picone 1 Inst. for Signal and Info. Processing Dept. Electrical and Computer Eng. Mississippi.
A Novel Method for Burst Error Recovery of Images First Author: S. Talebi Second Author: F. Marvasti Affiliations: King’s College London
NOISE DETECTION AND CLASSIFICATION IN SPEECH SIGNALS WITH BOOSTING Nobuyuki Miyake, Tetsuya Takiguchi and Yasuo Ariki Department of Computer and System.
Yi-zhang Cai, Jeih-weih Hung 2012/08/17 報告者:汪逸婷 1.
1 Robust Endpoint Detection and Energy Normalization for Real-Time Speech and Speaker Recognition Qi Li, Senior Member, IEEE, Jinsong Zheng, Augustine.
USE OF IMPROVED FEATURE VECTORS IN SPECTRAL SUBTRACTION METHOD Emrah Besci, Semih Ergin, M.Bilginer Gülmezoğlu, Atalay Barkana Osmangazi University, Electrical.
Quantile Based Histogram Equalization for Noise Robust Speech Recognition von Diplom-Physiker Florian Erich Hilger aus Bonn - Bad Godesberg Berichter:
Speaker Identification by Combining MFCC and Phase Information Longbiao Wang (Nagaoka University of Technologyh, Japan) Seiichi Nakagawa (Toyohashi University.
Voice Activity Detection based on OptimallyWeighted Combination of Multiple Features Yusuke Kida and Tatsuya Kawahara School of Informatics, Kyoto University,
Performance Comparison of Speaker and Emotion Recognition
ICASSP 2006 Robustness Techniques Survey ShihHsiang 2006.
Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation Authors: Naveen Parihar and Joseph Picone Inst. for Signal and Info.
RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise Amin Fazel Sharif.
Bayesian Speech Synthesis Framework Integrating Training and Synthesis Processes Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda Nagoya Institute.
A. R. Jayan, P. C. Pandey, EE Dept., IIT Bombay 1 Abstract Perception of speech under adverse listening conditions may be improved by processing it to.
PHASE-BASED DUAL-MICROPHONE SPEECH ENHANCEMENT USING A PRIOR SPEECH MODEL Guangji Shi, M.A.Sc. Ph.D. Candidate University of Toronto Research Supervisor:
IP Routing table compaction and sampling schemes to enhance TCAM cache performance Author: Ruirui Guo a, Jose G. Delgado-Frias Publisher: Journal of Systems.
SNR-Invariant PLDA Modeling for Robust Speaker Verification Na Li and Man-Wai Mak Department of Electronic and Information Engineering The Hong Kong Polytechnic.
January 2001RESPITE workshop - Martigny Multiband With Contaminated Training Data Results on AURORA 2 TCTS Faculté Polytechnique de Mons Belgium.
Survey of Robust Speech Techniques in ICASSP 2009 Shih-Hsiang Lin ( 林士翔 ) 1Survey of Robustness Techniques in ICASSP 2009.
Overcoming the Sensing-Throughput Tradeoff in Cognitive Radio Networks ICC 2010.
Voice Activity Detection Based on Sequential Gaussian Mixture Model Zhan Shen, Jianguo Wei, Wenhuan Lu, Jianwu Dang Tianjin Key Laboratory of Cognitive.
1 LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING ON AURORA 2.0 Chia-Ping Chen, Jeff Bilmes and Katrin Kirchhoff SSLI Lab Department of Electrical Engineering.
Reza Yazdani Albert Segura José-María Arnau Antonio González
Spectral and Temporal Modulation Features for Phonetic Recognition Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu Department of Electrical.
Speech Enhancement with Binaural Cues Derived from a Priori Codebook
Spread Spectrum Audio Steganography using Sub-band Phase Shifting
Two-Stage Mel-Warped Wiener Filter SNR-Dependent Waveform Processing
Missing feature theory
Survey of Robust Techniques
Potential of Non-Uniform Constellations with Peak Power Constraint
Survey of Robust Techniques
Speech / Non-speech Detection
Presented by Chen-Wei Liu
Presenter: Shih-Hsiang(士翔)
Combination of Feature and Channel Compensation (1/2)
Presentation transcript:

An Energy Search Approach to Variable Frame Rate Front-End Processing for Robust ASR Julien Epps and Eric H. C. Choi National ICT Australia Presenter: Chen Hung-Bin

2 Outline Introduction Energy Search-Based Variable Frame Rate Experiments Conclusions and future work

3 Introduction This paper describes a new variable frame rate analysis technique –based upon searching a predefined lookahead interval for the next frame position –lookahead that maximizes the firstorder difference of the log energy (ΔE ) between the consecutive frames These considerations have motivated various variable frame rate (VFR) ASR front-ends –all of which report improvements over their fixed-frame counterparts.

4 Energy Search-Based Variable Frame Rate Analysis The criterion employed in this paper is to determine the optimum relative position of the next frame –by maximizing the difference in log energy between the current frame and possible next frame

5 Energy Calculation The energy of the frame –the current frame –the next frame is dependent upon the candidate frame advance k –where k is defined relative to the beginning of the current (mth) frame

6 Experimen The proposed front-end was evaluated on the Aurora II database, test set A, –This test set contains noisy connected digits created by adding subway, babble, car, and exhibition noise at different SNRs to the original clean utterances –Model sets can alternatively be trained on clean or multi- condition data –of Aurora II results, the average accuracies throughout this section are the mean accuracies over the 0 dB to 20 dB conditions

7 Sample-by-Sample Energy Variation An example of the sample-by-sample energy variation in the speech signal is shown in Figure –In this example, the delta energy is –the peak in delta energy in Figure occurs at about 24.5ms, so the optimum next frame would be centred at 24.5 ms

8 Results for Different Noise Types The energy search VFR (ES-VFR) with Kmin and Kmax equivalent to 8.75 and ms respectively was compared with the ETSI standard front-end by noise type for both clean and multi-condition model sets

9 Results for Different Signal to Noise Ratios For the purposes of SNR comparisons, ES-VFR was compared against two baselines: the ETSI standard front end and the standard front end with cumulative distribution mapping (CDM)

10 Conclusions A noise-robust variable frame rate ASR front-end based upon an energy search that maximizes the first-order difference of the log energy between consecutive frames has been presented Future research will focus on investigating means for improving the multi-condition performance of the proposed scheme