Speaker Recognition by Habib ur Rehman Abdul Basit CENTER FOR ADVANCED STUDIES IN ENGINEERING Digital Signal Processing (Term Project)

1 Speaker Recognition by Habib ur Rehman Abdul Basit CENTER FOR ADVANCED STUDIES IN ENGINEERING Digital Signal Processing (Term Project)

2 Introduction: What is Speaker Recognition? Speaker recognition is a process that automatically recognizes who is speaking on the basis of individual information included in the speech waves. (Diagram labels: Speech Signal, Speaker Recognition, Words, "Who are you?")

3 Speaker Recognition System Goals: The goal of this project is to build a simple yet complete and representative speaker recognition system. The system should be able to identify speakers based on the different voice characteristics of each of the known speakers. This identification should be accomplished regardless of the sentence spoken (text-independent).

4 Basic Structure of a Speaker Recognition System: Speaker Identification / Speaker Verification

5 Principle of a Speaker Recognition System (Introduction): All speaker recognition systems have two distinct phases: the enrollment (or training) phase and the testing phase. In the training phase, each registered speaker provides samples of their speech so that the system can build a reference model for that speaker. In the testing phase, the input speech is matched against the stored reference model(s) and a recognition decision is made.
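
The two phases can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the class name `SpeakerRegistry`, the use of a mean feature vector as the "reference model", and the distance threshold are all hypothetical stand-ins and do not come from the slides. `identify` picks the best match among all enrolled speakers (identification), while `verify` accepts or rejects one claimed identity (verification).

```python
import math


class SpeakerRegistry:
    """Toy enrollment/testing pipeline (hypothetical, not the project's code)."""

    def __init__(self):
        self.models = {}  # speaker name -> reference model

    def enroll(self, name, feature_vectors):
        # Training phase: build a reference model from the speaker's samples.
        # Assumption: the model is just the mean of the feature vectors.
        n = len(feature_vectors)
        self.models[name] = [sum(col) / n for col in zip(*feature_vectors)]

    def _distance(self, name, feature_vectors):
        # Average Euclidean distance between the test vectors and one model.
        model = self.models[name]
        return sum(math.dist(v, model) for v in feature_vectors) / len(feature_vectors)

    def identify(self, feature_vectors):
        # Testing phase (identification): closest stored model wins.
        return min(self.models, key=lambda name: self._distance(name, feature_vectors))

    def verify(self, claimed_name, feature_vectors, threshold):
        # Testing phase (verification): accept only if close enough to the claim.
        return self._distance(claimed_name, feature_vectors) <= threshold


registry = SpeakerRegistry()
registry.enroll("alice", [[1.0, 2.0], [1.2, 1.8]])
registry.enroll("bob", [[4.0, 0.5], [3.8, 0.7]])

test_vectors = [[1.1, 2.1], [0.9, 1.9]]
best_match = registry.identify(test_vectors)
alice_accepted = registry.verify("alice", test_vectors, threshold=0.5)
bob_accepted = registry.verify("bob", test_vectors, threshold=0.5)
```

A real system would replace the mean vector with a richer model, such as the VQ codebook introduced later in the deck.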

6 Basic Structure of a Speaker Recognition System: Feature Extraction / Feature Matching

7 MFCC Processor (block diagram): Frame Blocking -> Windowing -> Fourier Transform (FFT) -> Mel-Frequency Wrapping (mel spectrum) -> Cepstrum via cosine transform (mel cepstrum). Frame blocking: the continuous signal is blocked into frames of N samples. The 1st frame consists of the first N samples; the 2nd frame begins M samples after the 1st and overlaps it by N-M samples, and so on. Typically N=256 (convenient for a radix-2 FFT) and M=100. Windowing: each frame is windowed to minimize the signal discontinuities at its beginning and end. Tapering the signal to zero at the beginning and end of each frame minimizes spectral distortion: y[n]=x[n]w[n]. Typically a Hamming window is used.
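
A minimal sketch of the first two blocks, frame blocking and windowing, using the slide's typical values N=256 and M=100. The function names are hypothetical, the test signal is synthetic, and the window uses the standard Hamming definition w[k] = 0.54 - 0.46*cos(2*pi*k/(N-1)), which the transcript itself does not reproduce.

```python
import math


def frame_signal(x, n=256, m=100):
    """Split x into frames of n samples, each starting m samples after the last.

    Consecutive frames overlap by n - m samples. Trailing samples that do not
    fill a complete frame are dropped (an assumption; padding is also common).
    """
    return [x[start:start + n] for start in range(0, len(x) - n + 1, m)]


def hamming(n):
    """Standard Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def window_frames(frames):
    """Apply y[k] = x[k] * w[k] to every frame."""
    w = hamming(len(frames[0]))
    return [[sample * wk for sample, wk in zip(frame, w)] for frame in frames]


# Synthetic 1000-sample sine wave standing in for a speech signal.
signal = [math.sin(2 * math.pi * 0.01 * t) for t in range(1000)]
frames = frame_signal(signal)
windowed = window_frames(frames)
```

With 1000 samples, frames start at samples 0, 100, ..., 700, and each frame shares 156 samples with its predecessor.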

8 Speech Production (A Convolution Process): Speech can be modeled as the convolution of a glottal excitation source g[n] with a vocal tract impulse response v[n]: y[n] = g[n] * v[n]

9 Cepstrum (A Transformation): It is believed that vocal tract characteristics are important to speech and speaker recognition, so we would like to separate out the vocal tract's filtering response. The cepstrum does this: convolution in time is multiplication in frequency, Y(ω) = g(ω)v(ω), and taking the logarithm converts that product into a sum, Ỹ(ω) = log[g(ω)] + log[v(ω)].
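
The additive property can be checked numerically. The sketch below computes the real cepstrum (inverse DFT of the log magnitude spectrum) with a naive DFT and confirms that the cepstrum of a convolution equals the sum of the individual cepstra. Assumptions: the sequences are toy values rather than real glottal or vocal tract data, circular convolution is used so that the finite DFT identity holds exactly, and only the magnitude is kept (phase is ignored).

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for short toy signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x):
    """Inverse DFT of log|X(k)|; requires a spectrum with no zeros."""
    n = len(x)
    log_mag = [math.log(abs(c)) for c in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


# Toy stand-ins for the excitation g[n] and vocal tract response v[n].
g = [0.5 ** k for k in range(16)]
v = [1.0, 0.3] + [0.0] * 14
y = circular_convolve(g, v)

c_y = real_cepstrum(y)
c_sum = [cg + cv for cg, cv in zip(real_cepstrum(g), real_cepstrum(v))]
max_error = max(abs(a - b) for a, b in zip(c_y, c_sum))
```

Here `max_error` stays at floating-point rounding level, which is the separation the slide describes: the two components add in the cepstral domain instead of being convolved.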

10 Mel Cepstrum: Mimicking the behaviour of the human ear

11 Mel Filter Bank: The filters are spaced linearly below 1 kHz and on a logarithmic scale above 1 kHz. Each triangular filter emphasizes its center frequency and spans to the next center frequency. For each tone with an actual frequency f in Hz, a subjective pitch is measured on the mel scale: mel(f) = 2595*log10(1 + f/700) (Fant's expression)
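
The mel formula on this slide translates directly into code. The sketch below implements it together with its algebraic inverse, then places filter centers at equal mel intervals and evaluates one triangular filter. The function names, the filter count of 20, and the 0 to 4000 Hz range are illustrative assumptions, and equal spacing on the mel formula only approximates the "linear below 1 kHz, logarithmic above" layout.

```python
import math


def hz_to_mel(f_hz):
    """mel(f) = 2595 * log10(1 + f / 700), as given on the slide."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Algebraic inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_centers_hz(num_filters, f_low_hz, f_high_hz):
    """Center frequencies equally spaced on the mel scale between the limits."""
    lo, hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_filters)]


def triangular_weight(f_hz, left_hz, center_hz, right_hz):
    """Weight is 1 at the center and falls linearly to 0 at the neighbouring centers."""
    if left_hz < f_hz <= center_hz:
        return (f_hz - left_hz) / (center_hz - left_hz)
    if center_hz < f_hz < right_hz:
        return (right_hz - f_hz) / (right_hz - center_hz)
    return 0.0


centers = filter_centers_hz(20, 0.0, 4000.0)
peak = triangular_weight(centers[1], centers[0], centers[1], centers[2])
edge = triangular_weight(centers[0], centers[0], centers[1], centers[2])
```

The constants are chosen so that 1000 Hz maps to roughly 1000 mel, and the spacing between neighbouring centers in Hz widens as frequency increases.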

12 Part 2: Speaker Verification

13 Speaker Verification (Feature Matching): Speaker recognition is a classification problem in which the objects of interest are patterns, here the acoustic vectors extracted from the input speech. Since the classification is applied to extracted features, the process can also be referred to as feature matching. Various feature matching techniques exist, including DTW (Dynamic Time Warping), HMM (Hidden Markov Modeling), and VQ (Vector Quantization). Vector quantization is a process of mapping vectors from a large vector space to a small number of regions in that space. Each region is called a cluster and is represented by its center, called a 'codeword'. The collection of all the codewords is called a codebook.
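
A minimal sketch of the quantization step: each vector is mapped to the index of its nearest codeword, and the average distance to the nearest codeword measures how well a codebook fits a set of vectors. The function names, the squared Euclidean distance, and the two-codeword codebook are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return the index of the codeword (cluster center) nearest to vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    total = sum(squared_distance(v, codebook[quantize(v, codebook)])
                for v in vectors)
    return total / len(vectors)


# Hypothetical 2-D codebook with two codewords.
codebook = [[0.0, 0.0], [4.0, 4.0]]
vectors = [[0.5, 0.2], [3.5, 4.2]]

indices = [quantize(v, codebook) for v in vectors]
distortion = average_distortion(vectors, codebook)
```

In a VQ-based recognizer, one common choice is to keep one codebook per enrolled speaker and pick the speaker whose codebook gives the lowest average distortion on the test vectors.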

14 Vector Quantization: The codebook

15 Vector Quantization (The LBG algorithm)
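
The slide names the LBG (Linde-Buzo-Gray) algorithm without listing its steps, so the following is a sketch of the usual binary-splitting formulation rather than the authors' implementation. It starts from the centroid of all training vectors, doubles the codebook by splitting each codeword into (1 + eps) and (1 - eps) copies, and refines the codewords by nearest-neighbour clustering until the distortion stops improving. The parameter values and the toy data are assumptions.

```python
def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg(vectors, size, eps=0.01, tol=1e-6, max_iter=100):
    """Build a codebook by binary splitting.

    Assumes size is a power of two. The multiplicative split cannot separate
    a codeword that is exactly the zero vector.
    """
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c * (1 + sign * eps) for c in cw]
                    for cw in codebook for sign in (1, -1)]
        previous = float("inf")
        for _ in range(max_iter):
            # Assign each vector to its nearest codeword.
            clusters = [[] for _ in codebook]
            total = 0.0
            for v in vectors:
                dists = [sq_dist(v, cw) for cw in codebook]
                nearest = dists.index(min(dists))
                clusters[nearest].append(v)
                total += dists[nearest]
            # Move each codeword to the centroid of its cluster (keep it if empty).
            codebook = [centroid(c) if c else cw
                        for c, cw in zip(clusters, codebook)]
            distortion = total / len(vectors)
            if previous - distortion <= tol * max(distortion, 1e-12):
                break
            previous = distortion
    return codebook


# Toy training vectors forming two obvious clusters.
data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
        [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
trained = sorted(lbg(data, 2))
```

On this toy data the two codewords settle on the cluster centers near (1, 1) and (5, 5).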

