Mel-spectrum to Mel-cepstrum Computation A Speech Recognition presentation October 1 2003 Ji Gu J.Gu@umail.LeidenUniv.nl.



So far we have seen that the FFT processing step converts each frame of N samples from the time domain into the frequency domain, and that the Mel-spectrum computation then yields one Mel-spectrum coefficient per Mel filter.

To compute the Mel-cepstrum, we convert the log Mel-spectrum back to the time domain using the Discrete Cosine Transform (DCT). The DCT is sufficient here because the Mel-spectrum coefficients and their logarithms are real numbers. The result is called the Mel-Frequency Cepstrum Coefficients (MFCC).

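Therefore, a DCT is applied to the natural logarithm of the Mel-spectrum to obtain the Mel-cepstrum c[n]:

$$c[n] = \sum_{i=0}^{L-1} \ln\left(\tilde{S}[i]\right) \cos\left(\frac{\pi n}{2L}(2i+1)\right), \qquad n = 0, 1, \ldots, C-1$$

where \tilde{S}[i] is the i-th Mel-spectrum coefficient, L is the number of Mel filters, and C is the number of cepstral coefficients.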

In the SPHINX III signal processing front end, the cosine part of c[n] is computed first:

    int32 fe_compute_melcosine(melfb_t *MEL_FB)
    {
        float period, freq;
        int32 i, j;

        /* Period of the cosine basis: twice the number of Mel filters. */
        period = (float)2 * MEL_FB->num_filters;

        /* Allocate a num_cepstra x num_filters table of floats. */
        if ((MEL_FB->mel_cosine = (float **) fe_create_2d(MEL_FB->num_cepstra,
                MEL_FB->num_filters, sizeof(float))) == NULL) {
            fprintf(stderr, "memory alloc failed in fe_compute_melcosine()\n...exiting\n");
            exit(0);
        }

        for (i = 0; i < MEL_FB->num_cepstra; i++) {
            /* Angular step for cepstral index i: 2*pi*i / period. */
            freq = 2 * (float)M_PI * (float)i / period;
            for (j = 0; j < MEL_FB->num_filters; j++)
                MEL_FB->mel_cosine[i][j] = (float)cos((double)(freq * (j + 0.5)));
        }
        return(0);
    }

Second, a cosine transform is applied to the logarithm of the Mel-spectrum:

    void fe_mel_cep(fe_t *FE, double *mfspec, double *mfcep)
    {
        int32 i, j;
        /* static int first_run=1; */ /* unreferenced variable */
        int32 period;
        float beta;

        period = FE->MEL_FB->num_filters;

        /* Take the natural log in place; floor non-positive values. */
        for (i = 0; i < FE->MEL_FB->num_filters; ++i) {
            if (mfspec[i] > 0)
                mfspec[i] = log(mfspec[i]);
            else
                mfspec[i] = -1.0e+5;
        }

        for (i = 0; i < FE->NUM_CEPSTRA; ++i) {
            mfcep[i] = 0;
            for (j = 0; j < FE->MEL_FB->num_filters; j++) {
                /* The first filter gets half weight. */
                if (j == 0)
                    beta = 0.5;
                else
                    beta = 1.0;
                mfcep[i] += beta * mfspec[j] * FE->MEL_FB->mel_cosine[i][j];
            }
            mfcep[i] /= (float)period;
        }
        return;
    }

By applying the procedure described above, a set of Mel-frequency cepstrum coefficients (MFCC) is computed for each speech frame. This set of coefficients is called an acoustic vector. It represents the phonetically important characteristics of speech and is used for further analysis and processing in speech recognition.