Survey of INTERSPEECH 2013
Reporter: Yi-Ting Wang
2013/09/10


Outline
- Exemplar-based Individuality-Preserving Voice Conversion for Articulation Disorders in Noisy Environments
- Robust Speech Enhancement Techniques for ASR in Non-stationary Noise and Dynamic Environments
- NMF-based Temporal Feature Integration for Acoustic Event Classification

Exemplar-based Individuality-Preserving Voice Conversion for Articulation Disorders in Noisy Environments
Ryo AIHARA, Ryoichi TAKASHIMA, Tetsuya TAKIGUCHI, Yasuo ARIKI
Graduate School of System Informatics, Kobe University, Japan

Introduction
This paper presents a noise-robust voice conversion (VC) method for a person with an articulation disorder resulting from athetoid cerebral palsy. Exemplar-based spectral conversion using NMF is applied to a voice with an articulation disorder recorded in real noisy environments. NMF is a well-known approach to source separation and speech enhancement. Goal: poorly articulated noisy speech -> clean, clearly articulated speech.
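As a rough illustration of the exemplar-based idea (a minimal sketch, not the authors' exact implementation): the source magnitude spectrogram is decomposed over a source exemplar dictionary with NMF, and the estimated activations are then applied to a parallel target dictionary. The dictionaries `W_src`/`W_tgt` here are hypothetical placeholders.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Estimate nonnegative activations H with V ~= W @ H,
    using KL-divergence multiplicative updates (W held fixed)."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return H

def exemplar_vc(V_src, W_src, W_tgt, n_iter=200):
    """Decompose source spectra over the source exemplar dictionary,
    then rebuild converted spectra with the parallel target dictionary."""
    H = nmf_activations(V_src, W_src, n_iter)
    return W_tgt @ H
```

Because the two dictionaries are built from aligned frame pairs, reusing the source activations on the target dictionary transfers the spectral content while keeping the target speaker's exemplars.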

Voice conversion based on NMF

Constructing the individuality-preserving dictionary
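One common way to build such parallel dictionaries (a hypothetical sketch; the slide does not give the authors' procedure) is to DTW-align parallel source/target utterances and stack the aligned frame pairs as dictionary columns:

```python
import numpy as np

def dtw_path(X, Y):
    """Plain DTW over frame sequences X (n x d) and Y (m x d);
    returns the list of aligned (i, j) index pairs."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m          # backtrack from the end
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def build_parallel_dictionaries(X, Y):
    """Stack DTW-aligned frame pairs as columns of parallel
    source/target exemplar dictionaries."""
    path = dtw_path(X, Y)
    W_src = np.stack([X[i] for i, _ in path], axis=1)
    W_tgt = np.stack([Y[j] for _, j in path], axis=1)
    return W_src, W_tgt
```

Column k of the source dictionary then corresponds to column k of the target dictionary, which is what lets activations be shared between them.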

Experimental Results
Experiments used the ATR Japanese speech database.

Conclusions
We proposed a noise-robust spectral conversion method based on NMF for a voice with an articulation disorder. The method improves the listening intelligibility of words uttered by a person with an articulation disorder in noisy environments.

Robust Speech Enhancement Techniques for ASR in Non-stationary Noise and Dynamic Environments Gang Liu, Dimitrios Dimitriadis, Enrico Bocchieri Center for Robust Speech Systems, University of Texas at Dallas

Introduction
In current ASR systems, the presence of competing speakers greatly degrades recognition performance. Furthermore, speakers are most often not standing still while speaking. We combine Time Difference of Arrival (TDOA) estimation, multi-channel Wiener filtering, NMF, multi-condition training, and robust feature extraction.

Proposed cascaded system
The problem of source localization/separation is often addressed by TDOA estimation.
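TDOA between a microphone pair is commonly estimated with GCC-PHAT; the slide does not specify the exact estimator used, so this is a generic sketch:

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the time difference of arrival of x relative to y
    using generalized cross-correlation with phase transform (PHAT)."""
    n = len(x) + len(y)                     # zero-pad for linear correlation
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                  # PHAT: keep phase information only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # delay in seconds
```

The PHAT weighting whitens the cross-spectrum, which sharpens the correlation peak and makes the estimate more robust in reverberation; the delay, together with the microphone geometry, gives the source direction.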

Experiment and results

NMF provides the largest boost, due to the suppression of the non-stationary interfering signals.
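A minimal sketch of how NMF can suppress non-stationary interference (assuming pre-trained speech and noise dictionaries, which the slide does not specify): decompose the noisy spectrogram over both dictionaries and keep the speech part through a soft Wiener-like mask.

```python
import numpy as np

def nmf_denoise(V, W_speech, W_noise, n_iter=200, eps=1e-9):
    """Explain the noisy magnitude spectrogram V with fixed speech
    and noise dictionaries, then keep the speech contribution via
    a soft Wiener-like mask."""
    W = np.hstack((W_speech, W_noise))
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):                 # KL multiplicative updates, W fixed
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    Ks = W_speech.shape[1]
    V_s = W_speech @ H[:Ks]                 # speech part of the fit
    V_n = W_noise @ H[Ks:]                  # noise part of the fit
    return V * V_s / (V_s + V_n + eps)      # masked (enhanced) magnitudes
```

Because the noise dictionary can represent time-varying interferers frame by frame, this handles non-stationary noise better than a stationary noise-floor estimate.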

Conclusion
We propose a cascaded system for speech recognition that deals with non-stationary noise in reverberant environments. The proposed system offers average relative improvements of 50% and 45% in the two scenarios mentioned above.

NMF-based Temporal Feature Integration for Acoustic Event Classification
Jimmy Ludena-Choez, Ascension Gallardo-Antolin
Dept. of Signal Theory and Communications, Universidad Carlos III de Madrid, Avda. de la Universidad 30, 28911 Leganes (Madrid), Spain

Introduction
This paper proposes a new front-end for Acoustic Event Classification (AEC) tasks based on the combination of a temporal feature integration technique called Filter Bank Coefficients (FC) and Non-negative Matrix Factorization (NMF). FC captures the dynamic structure of the short-time features. We present an unsupervised NMF-based method for designing a filter bank better suited to AEC.
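A toy sketch of learning a filter bank as an NMF basis (Euclidean multiplicative updates here; the paper's exact formulation may differ): factorize a matrix of nonnegative spectra, normalize the basis vectors, and use them as filter shapes.

```python
import numpy as np

def learn_nmf_filterbank(S, n_filters, n_iter=200, eps=1e-9):
    """Factorize nonnegative spectra S (freq x frames) as S ~= W @ H;
    the columns of W act as data-driven filter-bank shapes."""
    rng = np.random.default_rng(0)
    W = rng.random((S.shape[0], n_filters))
    H = rng.random((n_filters, S.shape[1]))
    for _ in range(n_iter):                 # Euclidean multiplicative updates
        H *= (W.T @ S) / (W.T @ W @ H + eps)
        W *= (S @ H.T) / (W @ H @ H.T + eps)
    return W / (W.sum(axis=0, keepdims=True) + eps)  # unit-area filters

def filterbank_features(S, W, eps=1e-9):
    """Log filter-bank energies: project each frame onto the filters."""
    return np.log(W.T @ S + eps)
```

Unlike a fixed mel filter bank, the learned filters concentrate on the (modulation) frequency regions that actually carry energy in the training data, which is the sense in which the filter bank is "more suitable for AEC".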

Audio feature extraction

Experiments and results

Conclusions
We have presented a new front-end for AEC based on the combination of FC features and NMF. NMF is used for unsupervised learning of a filter bank that captures the most relevant temporal behavior of the short-time features. Low modulation frequencies are more important than high ones for distinguishing between acoustic events. Experiments show that features obtained with this method yield significant improvements in the classification performance of an SVM-based AEC system compared with the baseline FC parameters.