Modulation Spectrum Factorization for Robust Speech Recognition
Wen-Yi Chu (1), Jeih-weih Hung (2) and Berlin Chen (1)
Presenter: 張庭豪

Outline
– Introduction
– Nonnegative Matrix Factorization (NMF)
– Updating the Modulation Spectrum via NMF
– Experimental Setup
– Experimental Results and Discussions
– Conclusion and Future Work

Introduction
– NMF is a recently developed method that finds a linear, non-subtractive (purely additive) combination scheme to extract important ingredients of the data that can correspond better with …
– Most of the useful linguistic information is encapsulated in the modulation frequency components between 1 Hz and 16 Hz, with the dominant component centered around 4 Hz.
– We attempt to refine the features in the magnitude part of the modulation spectra (which is always real and nonnegative) via NMF.
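To make the modulation frequency axis concrete, here is a minimal NumPy sketch that computes the magnitude modulation spectrum of a single feature trajectory and measures how much of its energy falls in the 1 Hz to 16 Hz range. It assumes a frame rate of 100 frames/s (a 10 ms frame shift), which the slides do not state, and uses a synthetic trajectory; the function name `modulation_spectrum` is hypothetical.

```python
import numpy as np

def modulation_spectrum(x, L, frame_rate=100.0):
    """Return (freqs_hz, |X[k]|) for k = 0..L of a 2L-point DFT of x[n].

    frame_rate (frames per second) is an assumption, not from the slides.
    """
    X = np.fft.rfft(x, n=2 * L)                          # keeps bins 0..L only
    freqs = np.fft.rfftfreq(2 * L, d=1.0 / frame_rate)   # modulation freq. (Hz)
    return freqs, np.abs(X)

# Synthetic 1 s trajectory (100 frames): a strong 4 Hz modulation plus a
# weak 30 Hz component that lies outside the 1-16 Hz band.
frame_rate = 100.0
n = np.arange(100)
x = np.cos(2 * np.pi * 4 * n / frame_rate) + 0.2 * np.cos(2 * np.pi * 30 * n / frame_rate)

freqs, mag = modulation_spectrum(x, L=100, frame_rate=frame_rate)
peak_hz = freqs[np.argmax(mag)]                          # dominant modulation freq.
band = (freqs >= 1.0) & (freqs <= 16.0)                  # the 1-16 Hz range
band_energy_ratio = float(np.sum(mag[band] ** 2) / np.sum(mag ** 2))
print(peak_hz, round(band_energy_ratio, 3))
```

With 2L = 200 points and 100 frames/s, the bins are spaced 0.5 Hz apart, so the 4 Hz component lands exactly on bin 8.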

Nonnegative Matrix Factorization (1/3)
– Nonnegative matrix factorization (NMF) is a subspace method that approximates data with an additive, linear combination of nonnegative components (or basis vectors).
– Given a nonnegative data matrix $V \in \mathbb{R}_{\ge 0}^{L \times M}$, NMF computes two further nonnegative matrices, $W \in \mathbb{R}_{\ge 0}^{L \times r}$ (the basis; tall and thin) and $H \in \mathbb{R}_{\ge 0}^{r \times M}$ (the encoding; short and wide), such that $V \approx WH$.
– $r \ll L$ and $r \ll M$ to ensure efficient encoding.

Nonnegative Matrix Factorization (2/3)
– To find an approximate factorization $V \approx WH$, a cost function measuring the approximation error between $V$ and $WH$ is defined.
– Starting from an initial (random) guess of $W$ and $H$, multiplicative updating rules for $W$ and for $H$ are applied iteratively until the cost reaches a local minimum.
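The slide does not spell out the cost function, so the following is a minimal NumPy sketch under one common assumption: the squared Euclidean cost $\|V - WH\|_F^2$ with the Lee-Seung multiplicative updates. A KL-divergence cost has analogous update rules; which variant the authors used is not confirmed here. The function name `nmf` and its parameters are hypothetical.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-10, seed=0):
    """Factorize nonnegative V (L x M) as W (L x r) @ H (r x M).

    Assumes the squared Euclidean cost ||V - WH||_F^2 and Lee-Seung
    multiplicative updates; eps only guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    L, M = V.shape
    W = rng.random((L, r)) + eps         # random nonnegative initial guess
    H = rng.random((r, M)) + eps
    costs = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        costs.append(float(np.sum((V - W @ H) ** 2)))
    return W, H, costs

rng = np.random.default_rng(1)
V = rng.random((101, 40))                # e.g. (L+1) bins x M utterances
W, H, costs = nmf(V, r=5)
print(W.shape, H.shape, costs[0] > costs[-1])
```

Because each update multiplies by a nonnegative ratio, nonnegativity never has to be enforced explicitly, and for this cost each update does not increase the error.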

Nonnegative Matrix Factorization (3/3)
Procedures: [figure]

Updating the Modulation Spectrum via NMF (1/3)
First, the time sequence x[n] of each utterance in the training set is converted to its spectrum X[k] via a 2L-point DFT. Because of conjugate symmetry, only the first L+1 points of X[k] are retained, and their magnitudes (which are always nonnegative) form one column of the data matrix V. Accordingly, if the training set consists of M utterances, V has M columns and is of size (L+1) × M. Given the data matrix V and a chosen number r, NMF yields the two nonnegative matrices W and H.
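The construction of V described above can be sketched in NumPy as follows. The sketch assumes that each input sequence is the trajectory of one feature dimension (for example MFCC c1) over the frames of one utterance, and that 2L is at least as long as the longest sequence, so shorter sequences are zero-padded. The function name is hypothetical.

```python
import numpy as np

def magnitude_modulation_spectra(sequences, L):
    """Stack |X[k]|, k = 0..L, of each sequence as the columns of V.

    Returns V with shape (L+1, M), where M = len(sequences).
    """
    cols = []
    for x in sequences:
        if len(x) > 2 * L:
            raise ValueError("2L must be at least the sequence length")
        X = np.fft.rfft(x, n=2 * L)   # 2L-point DFT; rfft keeps bins 0..L
        cols.append(np.abs(X))        # magnitudes are always nonnegative
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
seqs = [rng.standard_normal(n) for n in (60, 80, 100)]   # 3 toy utterances
V = magnitude_modulation_spectra(seqs, L=64)
print(V.shape)
```

Using `rfft` makes the conjugate-symmetry argument explicit: for a real sequence it returns exactly the L+1 non-redundant bins of the 2L-point DFT.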

Updating the Modulation Spectrum via NMF (2/3)
For an utterance to be processed, its magnitude spectrum v is obtained in the same way. W is kept fixed as obtained in the previous step, and the encoding vector h is obtained via the updating rule. The new vector ṽ = Wh is a linear combination of the basis vectors in W, which were learned from clean utterances. Therefore we expect ṽ, representing the new magnitude spectrum, to highlight the information important for speech recognition and to alleviate the effect of noise present in the original v. A 2L-point inverse DFT is then performed on the new modulation spectrum, which consists of the updated magnitude and the original phase (with the conjugate-symmetric second half appended), to obtain the new time sequence.
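A minimal sketch of this update step, again assuming the squared Euclidean cost so that h follows the Lee-Seung rule for H with W held fixed. The function names are hypothetical. `np.fft.irfft` rebuilds the conjugate-symmetric second half implicitly, which corresponds to appending it before a full inverse DFT.

```python
import numpy as np

def encode_fixed_basis(v, W, n_iter=200, eps=1e-10):
    """Find h >= 0 with v ~ W h while W stays fixed (h-update only)."""
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ (W @ h) + eps)
    return h

def nmf_update_sequence(x, W, L, n_iter=200):
    """Replace the DFT magnitude of x by W h, keep its phase, invert."""
    X = np.fft.rfft(x, n=2 * L)           # bins 0..L of the 2L-point DFT
    mag, phase = np.abs(X), np.angle(X)
    h = encode_fixed_basis(mag, W, n_iter)
    new_mag = W @ h                        # updated magnitude spectrum
    # irfft assumes conjugate symmetry, i.e. the appended second half.
    y = np.fft.irfft(new_mag * np.exp(1j * phase), n=2 * L)
    return y[:len(x)]                      # drop the zero-padded tail

# Sanity check: with an identity "basis", W h can reproduce |X| exactly,
# so the sequence should come back essentially unchanged.
rng = np.random.default_rng(0)
x = rng.standard_normal(80)
L = 64
y = nmf_update_sequence(x, np.eye(L + 1), L)
print(np.max(np.abs(y - x)))
```

In practice W would be the (L+1) × r basis learned from clean training data with r much smaller than L, so W h can only express spectra built from those clean-speech components.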

Updating the Modulation Spectrum via NMF (3/3)
The basis spectral vectors of MFCC c1 (r = 10):
– They are localized and sparse, which agrees with the fact that NMF often learns a parts-based representation of data.
– They can distill or emphasize the lower modulation frequency components of the speech features, which contain more speech information.
[Figure: (a) original MFCC c1; (b) MVN-processed MFCC c1]
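One way to check numerically which modulation frequencies the learned basis vectors emphasize is to report, for each column of W, its peak frequency and the share of its energy below a cutoff. This is a hypothetical analysis sketch, not a procedure from the slides; it assumes W has L+1 rows matching bins 0..L of a 2L-point DFT and a frame rate of 100 frames/s.

```python
import numpy as np

def basis_peak_frequencies(W, L, frame_rate=100.0):
    """Modulation frequency (Hz) at which each column of W peaks."""
    freqs = np.fft.rfftfreq(2 * L, d=1.0 / frame_rate)   # L+1 bin centres
    return freqs[np.argmax(W, axis=0)]

def low_band_energy_ratio(W, L, cutoff_hz=16.0, frame_rate=100.0):
    """Fraction of each basis vector's energy at or below cutoff_hz."""
    freqs = np.fft.rfftfreq(2 * L, d=1.0 / frame_rate)
    energy = W ** 2
    return energy[freqs <= cutoff_hz].sum(axis=0) / energy.sum(axis=0)

# Toy basis with L = 50 (1 Hz bin spacing): one vector peaked at 4 Hz,
# one at 30 Hz.
L = 50
W = np.zeros((L + 1, 2))
W[4, 0] = 1.0
W[30, 1] = 1.0
print(basis_peak_frequencies(W, L), low_band_energy_ratio(W, L))
```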

Experimental Setup
– Feature type: 39-dimensional MFCC
– The number of basis vectors, r, is varied from 5 to 20
– The DFT size:

Experimental Results (1/4)

Experimental Results (2/4)
– NMF: r = 5
– NMF + CMVN: r = 15

Experimental Results (3/4)
The power spectral density (PSD) curves of the feature streams at different signal-to-noise ratios (SNRs):
[Figure: original c1 | NMF-processed c1]
– Noise causes a significant mismatch in the PSD of MFCC.
– NMF reduces the PSD mismatch.
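The slides show the PSD mismatch visually but do not define a numeric measure. As a hypothetical illustration, the sketch below estimates each feature stream's PSD with a plain periodogram and summarizes the clean-versus-noisy mismatch as the mean absolute log-PSD difference in dB. Both the estimator and the measure are assumptions chosen for simplicity.

```python
import numpy as np

def psd(x):
    """Periodogram estimate of the PSD of a feature stream x[n]."""
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)
    return (np.abs(X) ** 2) / len(x)

def psd_mismatch_db(clean, noisy, floor=1e-12):
    """Mean absolute difference (dB) between two PSD curves.

    Both streams must have the same length so that their bins align;
    floor avoids taking the log of zero.
    """
    if len(clean) != len(noisy):
        raise ValueError("streams must have the same length")
    p_c = psd(clean) + floor
    p_n = psd(noisy) + floor
    return float(np.mean(np.abs(10 * np.log10(p_n / p_c))))

rng = np.random.default_rng(0)
clean = rng.standard_normal(300)
noisy = clean + 0.5 * rng.standard_normal(300)
print(psd_mismatch_db(clean, noisy))
```

A smaller value after processing would correspond to the reduced mismatch seen in the PSD curves.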

Experimental Results (4/4)
The power spectral density (PSD) curves of the feature streams at different signal-to-noise ratios (SNRs):
[Figure: MVN-processed c1 | c1 processed by MVN and NMF]
– MVN reduces the PSD mismatch more in the low-frequency region.
– NMF further alleviates the high-frequency PSD mismatch in MVN-processed features.
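MVN is used throughout these results but is not defined in the transcript. A common form normalizes each feature dimension of an utterance to zero mean and unit variance. The sketch assumes that per-utterance form and a (frames × dimensions) layout. In the "MVN and NMF" condition, the normalized c1 column would presumably be the sequence x[n] whose modulation spectrum NMF then updates; that ordering is an assumption based on the figure caption.

```python
import numpy as np

def mvn(features, eps=1e-8):
    """Per-utterance mean and variance normalization.

    features: (T, D) array with one row per frame. Each dimension is
    shifted to zero mean and scaled to unit variance over the utterance;
    eps guards against division by zero for constant dimensions.
    """
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)

rng = np.random.default_rng(0)
feats = 3.0 + 2.0 * rng.standard_normal((200, 39))   # toy 39-dim features
out = mvn(feats)
c1 = out[:, 1]   # e.g. the c1 stream, assuming column 0 holds c0
print(out.shape)
```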

Conclusion and Future Work
We have presented a novel use of NMF for deriving noise-robust speech features.
– The basis spectra learned via NMF correspond well with the intuitive notion of the important modulation frequency components.
– NMF improves recognition accuracy for both plain and MVN-processed MFCC.
As to future work, we envisage the following two directions:
(1) further processing the encoding matrix H in the NMF mapping process to give better recognition accuracy;
(2) examining whether variants or extensions of NMF, such as probabilistic latent semantic analysis (PLSA), and other compressive sensing methods can further enhance the modulation spectrum.