Mel-spectrum computation (new_fe_sp.c). Presentation by Yu Zhang, Oct 1st, 2003, Seminar Speech Recognition.


Mel-frequency Warping

The human ear perceives the pitch of tones on an approximately linear scale for frequencies below 1 kHz and on an approximately logarithmic scale for frequencies above 1 kHz. The mel-frequency scale follows this behaviour: its spacing is linear below 1000 Hz and logarithmic above 1000 Hz. Because voice signals have most of their energy in the low frequencies, a filter bank spaced on the mel scale is a natural choice for analysing them.

Use the following approximate formula to compute the mels for a given frequency f in Hz:

$$\mathrm{mel}(f) = 2595 \, \log_{10}\left(1 + \frac{f}{700}\right)$$

line 165 of new_fe_sp.c

    float32 fe_mel(float32 x)
    {
        /* Hz -> mel */
        return (2595.0 * (float32) log10(1.0 + x / 700.0));
    }

    float32 fe_melinv(float32 x)
    {
        /* mel -> Hz (inverse of fe_mel) */
        return (700.0 * ((float32) pow(10.0, x / 2595.0) - 1.0));
    }

For each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the ‘mel’ scale. The reference point is a 1 kHz tone, 40 dB above the perceptual hearing threshold, whose pitch is defined as 1000 mels. The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz.

Figure 1: Power spectrum without mel-frequency warping
Figure 2: Mel-frequency warping of the power spectrum

Taken as a whole, the image with mel-frequency warping (Figure 2) contains less information than the one without it (Figure 1). Looking in detail, however, the mel-warped image keeps the low frequencies and removes some of the remaining information. To summarize, mel-frequency warping lets us keep only the useful part of the information.

Mel spectrum

The Mel spectrum is computed by multiplying the power spectrum by each of the triangular Mel weighting filters and integrating the result:

$$\tilde{S}[l] = \sum_{k=0}^{N/2} S[k] \, M_l[k], \qquad l = 0, 1, \ldots, L-1$$

where
S[k] is the power spectrum,
N is the length of the Discrete Fourier Transform,
L is the total number of triangular Mel weighting filters,
M_l[k] is the l-th triangular Mel weighting filter.
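The multiply-and-sum described above can be sketched in a dense form, where every filter stores one weight per spectrum bin. This is only an illustration under that assumption. The function name mel_spectrum_dense and the row-major weights layout are hypothetical and do not come from new_fe_sp.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Dense multiply-and-sum:
 *   out[l] = sum over k of spec[k] * weights[l * nbins + k]
 * weights is a hypothetical row-major (num_filters x nbins) matrix,
 * one row per triangular filter.
 */
void mel_spectrum_dense(const double *spec, size_t nbins,
                        const double *weights, size_t num_filters,
                        double *out)
{
    assert(spec != NULL && weights != NULL && out != NULL);
    for (size_t l = 0; l < num_filters; ++l) {
        double acc = 0.0;
        for (size_t k = 0; k < nbins; ++k)
            acc += spec[k] * weights[l * nbins + k];
        out[l] = acc;   /* one mel spectrum value per filter */
    }
}
```

Most weights in each row are zero, because a triangular filter only covers a narrow band. A practical implementation can therefore store just the non-zero run of each filter together with its starting bin.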

Building the Triangular Mel Weighting filters

line 62 in new_fe_sp.c

    int32 fe_build_melfilters(melfb_t *MEL_FB)
    {
        // estimate filter coefficients
        MEL_FB->filter_coeffs = (float32 **) fe_create_2d(MEL_FB->num_filters, MEL_FB->fft_size, sizeof(float32));
        MEL_FB->left_apex = (float32 *) calloc(MEL_FB->num_filters, sizeof(float32));
        MEL_FB->width = (int32 *) calloc(MEL_FB->num_filters, sizeof(int32));
        filt_edge = (float32 *) calloc(MEL_FB->num_filters + 2, sizeof(float32));
        …
        melmax = fe_mel(MEL_FB->upper_filt_freq);
        melmin = fe_mel(MEL_FB->lower_filt_freq);
        // num_filters + 2 filter edges, equally spaced on the mel scale
        for (i = 0; i <= MEL_FB->num_filters + 1; ++i) {
            filt_edge[i] = fe_melinv(i * dmelbw + melmin);
        }
        …
        for (whichfilt = 0; whichfilt < MEL_FB->num_filters; ++whichfilt) {
            // Building the triangular mel weighting filters
            …
        }
        …
    }

Building the Mel spectrum, l = 0, 1, …, L-1

line 156 in new_fe_sp.c

    void fe_mel_spec(fe_t *FE, float64 *spec, float64 *mfspec)
    {
        int32 whichfilt, start, i;
        float32 dfreq;

        /* width of one FFT bin in Hz */
        dfreq = FE->SAMPLING_RATE / (float32) FE->FFT_SIZE;

        for (whichfilt = 0; whichfilt < FE->MEL_FB->num_filters; whichfilt++) {
            /* first FFT bin above the left edge of this filter */
            start = (int32) (FE->MEL_FB->left_apex[whichfilt] / dfreq) + 1;
            mfspec[whichfilt] = 0;
            for (i = 0; i < FE->MEL_FB->width[whichfilt]; i++)
                mfspec[whichfilt] += FE->MEL_FB->filter_coeffs[whichfilt][i] * spec[start + i];
        }
    }

    /*
     * FE holds the triangular mel weighting filters (FE->MEL_FB)
     * spec is the power spectrum
     * mfspec is the mel spectrum
     * FE->MEL_FB->filter_coeffs are the coefficients of the mel weighting filters
     */
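The start index in the loop above maps the left edge of a filter, given in Hz, to an FFT bin. A small worked sketch of that arithmetic follows. The function name first_bin_above is hypothetical, and the values used in the note below (16000 Hz sampling rate, 512-point FFT) are assumed purely for illustration.

```c
#include <assert.h>

/*
 * Index of the first FFT bin strictly above left_apex_hz, using the same
 * arithmetic as the start computation: truncate(left_apex / dfreq) + 1.
 */
int first_bin_above(double left_apex_hz, double sampling_rate_hz, int fft_size)
{
    assert(left_apex_hz >= 0.0 && sampling_rate_hz > 0.0 && fft_size > 0);
    double dfreq = sampling_rate_hz / (double) fft_size;   /* Hz per FFT bin */
    return (int) (left_apex_hz / dfreq) + 1;
}
```

With the assumed values, each bin is 31.25 Hz wide, so a left edge at 200 Hz falls between bins 6 and 7 and the filter starts at bin 7. Only width[whichfilt] bins from that index onward contribute to the sum, which is why the stored coefficient rows can be short.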
