

DCT-based Processing of Dynamic Features for Robust Speech Recognition Wen-Chi LIN, Hao-Teng FAN, Jeih-Weih HUNG Wen-Yi Chu Department of Computer Science & Information Engineering National Taiwan Normal University

Outline: Introduction; Discrete Cosine Transform; The New DCT-related Temporal Filtering Techniques; The Recognition Experimental Results and Discussions; Conclusion and Future Works

Introduction In this paper, we explore the properties of cepstral time coefficients (CTC) in speech recognition and then propose several methods to refine the CTC construction process. We find that CTC are a filtered version of mel-frequency cepstral coefficients (MFCC), and that the filters used come from the discrete cosine transform (DCT) matrix. We modify these DCT-based filters by windowing, removing the DC gain, and varying the filter length.

Discrete Cosine Transform For a real-valued N-point sequence, its DFT and DCT are obtained by the following two equations, respectively:
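The two equations on this slide were not captured in the transcript. As a rough illustration only, the sketch below computes a DFT and a DCT for a short real-valued sequence. It assumes the common unnormalized DCT-II form, which may differ from the slide's definition by a scaling factor; the names `dft`, `dct2` and the sample sequence `x` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def dct2(x):
    """Unnormalized DCT-II (assumed form): C[k] = sum_n x[n] * cos(pi*k*(2n+1)/(2N))."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
        for k in range(N)
    ]


# Hypothetical 4-point real sequence used only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
X = dft(x)   # complex-valued, conjugate-symmetric for real input
C = dct2(x)  # real-valued
```

For real input the DFT output is complex and conjugate-symmetric, whereas the DCT output is real, which is one reason DCT rows can be applied directly as real-valued temporal filters.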

The New DCT-related Temporal Filtering Techniques (1/5) For an MFCC feature sequence, the second and third CTC features are represented as:

The New DCT-related Temporal Filtering Techniques (2/5) The first four rows of an N × N DCT matrix (N = 15 here) are shown in Fig. 2. The second and third CTC features in eq. (5) can be viewed as filtered versions of the MFCC features, in which the two temporal filters defined in eqs. (8) and (9) are used, respectively. Figs. 3 and 4 show the frequency responses of the two DCT filters.
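The idea of treating DCT rows as temporal filters can be sketched as follows. This is a minimal illustration, assuming the unnormalized DCT-II basis and taking rows 1 and 2 (zero-based) as the filters for the second and third CTC features; the slide's eqs. (8) and (9) are not in the transcript, so scaling and time reversal may differ. The helper names are hypothetical.

```python
import cmath
import math


def dct_row(k, N):
    """Row k of an unnormalized N x N DCT-II matrix (assumed form)."""
    return [math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N)]


def magnitude_response(h, num_points=256):
    """|H(e^{jw})| sampled at num_points frequencies w in [0, pi)."""
    return [
        abs(sum(h[n] * cmath.exp(-1j * math.pi * i / num_points * n)
                for n in range(len(h))))
        for i in range(num_points)
    ]


N = 15
h1 = dct_row(1, N)  # assumed filter for the second CTC feature
h2 = dct_row(2, N)  # assumed filter for the third CTC feature

H1 = magnitude_response(h1)
H2 = magnitude_response(h2)

# Index of the main-lobe peak; a larger index means a higher modulation frequency.
peak1 = H1.index(max(H1))
peak2 = H2.index(max(H2))
```

Under these assumptions both filters have zero gain at DC (`H1[0]` and `H2[0]` vanish up to rounding), and the passband of `h2` sits at a higher modulation frequency than that of `h1`.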

The New DCT-related Temporal Filtering Techniques (3/5) The two filters used in deriving CTC possibly have two problems: 1) Their relatively high side-lobes emphasize undesired non-speech components. 2) Their inappropriate passband location and width possibly make them filter out some speech components. We try several well-known window functions, including Hamming, Hanning, Blackman and rectangular windows. Note that the rectangular window used here is: By multiplying a window function with each of the two original DCT-based filters, we create two new filters as
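The windowing step can be sketched as below. This is an illustration under stated assumptions rather than the authors' exact construction: it uses the unnormalized DCT-II rows 1 and 2 as the original filters, the textbook Hamming window, and a hypothetical leakage measure `high_band_ratio` (the largest response in the upper half of the band relative to the overall peak) as a crude stand-in for side-lobe level.

```python
import cmath
import math


def dct_row(k, N):
    """Row k of an unnormalized N x N DCT-II matrix (assumed form)."""
    return [math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N)]


def hamming(N):
    """Textbook Hamming window of length N."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def magnitude_response(h, num_points=256):
    """|H(e^{jw})| sampled at num_points frequencies w in [0, pi)."""
    return [
        abs(sum(h[n] * cmath.exp(-1j * math.pi * i / num_points * n)
                for n in range(len(h))))
        for i in range(num_points)
    ]


def high_band_ratio(h):
    """Largest |H| for w in [pi/2, pi) divided by the overall peak of |H|."""
    H = magnitude_response(h)
    return max(H[len(H) // 2:]) / max(H)


N = 15
w = hamming(N)
h1, h2 = dct_row(1, N), dct_row(2, N)

# Element-wise product of the window and each original DCT-based filter.
g1 = [wn * hn for wn, hn in zip(w, h1)]
g2 = [wn * hn for wn, hn in zip(w, h2)]

leak_before = high_band_ratio(h1)
leak_after = high_band_ratio(g1)
dc_g1 = sum(g1)  # DC gain of the windowed antisymmetric filter
dc_g2 = sum(g2)  # DC gain of the windowed symmetric filter
```

In this sketch the windowed filter leaks less into the upper band than the original. The windowed antisymmetric filter `g1` keeps a zero DC gain, while the windowed symmetric filter `g2` acquires a nonzero DC gain, so windowing alone can let the DC component of a feature stream through.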

The New DCT-related Temporal Filtering Techniques (4/5) We find that one of the new windowed filters becomes a low-pass filter, and thus it retains the DC component of a feature stream, which often carries the channel mismatch and can degrade recognition performance. We therefore convolve it with a simple high-pass filter as follows:
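Removing the DC gain by convolution can be sketched as follows. The slide's high-pass filter is not in the transcript, so the first-difference filter `[1, -1]` used here is purely an assumption, as are the unnormalized DCT-II row and the Hamming window; any high-pass filter whose coefficients sum to zero would have the same effect on the DC gain.

```python
import math


def dct_row(k, N):
    """Row k of an unnormalized N x N DCT-II matrix (assumed form)."""
    return [math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N)]


def hamming(N):
    """Textbook Hamming window of length N."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def convolve(a, b):
    """Full linear convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


N = 15
# Windowed symmetric DCT filter, which has a nonzero DC gain.
g2 = [wn * hn for wn, hn in zip(hamming(N), dct_row(2, N))]

highpass = [1.0, -1.0]  # assumed high-pass filter, not taken from the slide
g2_hp = convolve(g2, highpass)

dc_before = sum(g2)     # nonzero: the DC component passes through
dc_after = sum(g2_hp)   # zero up to rounding: the DC component is blocked
```

The DC gain of a cascade is the product of the individual DC gains, so convolving with any filter whose coefficients sum to zero forces the combined DC gain to zero. The price is a filter one tap longer (`N + 1` here) and a reshaped passband.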

The New DCT-related Temporal Filtering Techniques (5/5) To further tune the main-lobe (passband) width, we propose to vary the filter length N of the filters in eqs. (11), (12) and (13).
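The effect of the filter length on the passband can be sketched as below, assuming a Hamming-windowed unnormalized DCT-II row 1 as the filter (the filters in eqs. (11) to (13) are not in the transcript). The helper `peak_frequency` is hypothetical and reports the main-lobe peak in cycles per frame.

```python
import cmath
import math


def windowed_dct_filter(k, N):
    """Hamming-windowed row k of an unnormalized N x N DCT-II matrix (assumed form)."""
    return [
        (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
        * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
        for n in range(N)
    ]


def peak_frequency(h, num_points=512):
    """Frequency of the largest |H(e^{jw})|, in cycles per frame within [0, 0.5)."""
    best_index, best_mag = 0, -1.0
    for i in range(num_points):
        w = math.pi * i / num_points
        mag = abs(sum(h[n] * cmath.exp(-1j * w * n) for n in range(len(h))))
        if mag > best_mag:
            best_index, best_mag = i, mag
    return best_index / (2 * num_points)


lengths = (9, 11, 13, 15, 17)
peaks = [peak_frequency(windowed_dct_filter(1, N)) for N in lengths]

for N, p in zip(lengths, peaks):
    print(f"N={N:2d}  passband peak = {p:.4f} cycles/frame")
```

In this sketch a longer filter moves the passband peak toward lower modulation frequencies and narrows the main lobe, so N acts as a single tuning knob for where the filter concentrates its gain.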

The Recognition Experimental Results and Discussions (1/3) Experimental results of MFCC and CTC. Using CTC is more helpful in handling the non-stationary noise cases (Set B), possibly because the DCT-based filters attenuate the higher modulation frequency components caused by non-stationary noise.

The Recognition Experimental Results and Discussions (2/3) Experimental results of the proposed new DCT-based filtering approach

The Recognition Experimental Results and Discussions (3/3) Experimental results for the windowed filters with different filter lengths (N = 9, 11, 13, 15, and 17)

Conclusion and Future Works We find that the original DCT-based filters have some problems, including high side-lobes and inappropriate passband locations in the frequency response. We then present several directions to solve or alleviate these problems, including “windowing the filter coefficients”, “removing DC gain” and “varying the filter length”. In the future, we will work along the following directions: 1) Creating adaptive DCT-based filters 2) Combining the various DCT-based filter outputs linearly or nonlinearly