Published by Ashlyn Thornton, modified over 8 years ago
1
Speaker Recognition, by Habib ur Rehman and Abdul Basit. CENTER FOR ADVANCED STUDIES IN ENGINEERING. Digital Signal Processing (Term Project)
2
Speaker Recognition: Introduction. What is speaker recognition? Speaker recognition is a process that automatically recognizes who is speaking on the basis of individual information included in the speech waves. [Figure labels: speech signal, words, speaker recognition, “Who are you?”]
3
Speaker Recognition: Speaker Recognition System Goals. The goal of this project is to build a simple, yet complete and representative ‘speaker recognition system’. The system should be able to identify speakers based on the distinct voice characteristics of each known speaker. This identification should be accomplished regardless of the sentence spoken (text-independent).
4
Speaker Recognition: Basic Structure of a Speaker Recognition System [Figure: speaker identification / speaker verification]
5
Speaker Recognition: Principle of a Speaker Recognition System (Introduction). Every speaker recognition system operates in two distinct phases: an enrollment (training) phase and a testing phase. In the training phase, each registered speaker provides samples of their speech so that the system can build a reference model for that speaker. In the testing phase, the input speech is matched against the stored reference model(s) and a recognition decision is made.
6
Speaker Recognition: Basic Structure of a Speaker Recognition System [Figure: feature extraction / feature matching]
7
Speaker Recognition: MFCC Processor Block Diagram

[Figure: continuous speech -> Frame Blocking -> Windowing -> Fourier Transform (FFT) -> Mel-frequency Wrapping -> mel spectrum -> Cepstrum (Cosine Transform) -> mel cepstrum]

Frame blocking: the continuous speech signal is blocked into frames of N samples. The first frame consists of the first N samples; the second frame begins M samples after the first and overlaps it by N - M samples, and so on. Typically N = 256 (a power of two, suitable for a radix-2 FFT) and M = 100.

Windowing: each frame is windowed to minimize the signal discontinuities at its beginning and end. Tapering the signal to zero at both ends of the frame minimizes spectral distortion: $y[n] = x[n]\,w[n]$. Typically a Hamming window is used, which has the form $w[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$, $0 \le n \le N-1$.
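The first steps of the MFCC processor can be sketched in pure Python as below, assuming the typical values N = 256 and M = 100 quoted above. The sketch covers frame blocking, Hamming windowing and the power spectrum of each windowed frame. The function names are illustrative, and a direct DFT stands in for the FFT to keep the example dependency-free; a practical implementation would use an FFT library.

```python
import math

def frame_blocking(signal, N=256, M=100):
    """Split signal into frames of N samples; frame k starts at sample k*M,
    so consecutive frames overlap by N - M samples. Trailing samples that do
    not fill a whole frame are dropped (a choice of this sketch)."""
    return [signal[start:start + N] for start in range(0, len(signal) - N + 1, M)]

def hamming(N):
    """Hamming window w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frame(frame):
    """y[n] = x[n] * w[n]: taper the frame towards zero at both ends."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]

def power_spectrum(frame):
    """|X[k]|^2 for k = 0..N/2 using a direct DFT (stand-in for an FFT)."""
    N = len(frame)
    spectrum = []
    for k in range(N // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / N) for n, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * n / N) for n, x in enumerate(frame))
        spectrum.append(re * re + im * im)
    return spectrum

# Example: block a synthetic 1000-sample signal, then window and transform each frame.
signal = [math.sin(2 * math.pi * 0.05 * n) for n in range(1000)]
frames = frame_blocking(signal)
spectra = [power_spectrum(window_frame(f)) for f in frames]
```

For a 1000-sample signal, `frame_blocking` returns 8 frames starting at samples 0, 100, ..., 700, and each spectrum has N/2 + 1 = 129 bins.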
8
Speaker Recognition: Speech Production, a Convolution Process. Speech can be modeled as the convolution of a glottal excitation source $g[n]$ and a vocal tract impulse response $v[n]$: $y[n] = g[n] * v[n]$.
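The convolution model can be illustrated with a toy example. The pulse train `g` and the decaying response `v` below are hypothetical stand-ins for a glottal excitation and a vocal tract impulse response, chosen only to make the structure of y[n] easy to see.

```python
def convolve(g, v):
    """Linear convolution y[n] = sum_k g[k] * v[n - k]; len(y) = len(g) + len(v) - 1."""
    y = [0.0] * (len(g) + len(v) - 1)
    for k, gk in enumerate(g):
        for m, vm in enumerate(v):
            y[k + m] += gk * vm
    return y

# Hypothetical excitation: one glottal pulse every 5 samples.
g = [1.0 if n % 5 == 0 else 0.0 for n in range(15)]
# Hypothetical vocal tract impulse response: a short decaying sequence.
v = [0.9 ** n for n in range(4)]
y = convolve(g, v)
```

Each excitation pulse launches a copy of v, so y repeats the shape of v every 5 samples.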
9
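Speaker Recognition: Cepstrum (A Transformation). It is believed that vocal tract characteristics are important to both speech and speaker recognition, so we would like to separate out this filtered response. Convolution in time corresponds to multiplication in frequency, $Y(\omega) = G(\omega)\,V(\omega)$, and taking the logarithm of the magnitude converts this product into a sum: $\log|Y(\omega)| = \log|G(\omega)| + \log|V(\omega)|$. The cepstrum, the inverse Fourier transform of this log spectrum, therefore represents the excitation and the vocal tract response as additive components that can be separated.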
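The additive property can be checked numerically with the sketch below, assuming short finite sequences and an N-point DFT with N at least len(g) + len(v) - 1, so that the DFT of the linear convolution equals the product of the DFTs. The sequences are hypothetical and chosen so that no DFT sample is zero, since the logarithm must be defined.

```python
import cmath
import math

def dft(x, N):
    """N-point DFT of x, zero-padded to length N."""
    x = list(x) + [0.0] * (N - len(x))
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def real_cepstrum(x, N):
    """c[n] = IDFT(log|X[k]|). Assumes X[k] != 0 for all k so the log is defined."""
    log_mag = [math.log(abs(Xk)) for Xk in dft(x, N)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

# Hypothetical short excitation g and vocal tract response v.
g = [1.0, 0.5]
v = [1.0, -0.3, 0.2]
# Linear convolution y = g * v.
y = [sum(g[k] * v[n - k] for k in range(len(g)) if 0 <= n - k < len(v))
     for n in range(len(g) + len(v) - 1)]

N = 8  # >= len(g) + len(v) - 1, so DFT(y) = DFT(g) * DFT(v)
c_g, c_v, c_y = real_cepstrum(g, N), real_cepstrum(v, N), real_cepstrum(y, N)
```

Because the inverse DFT is linear, `c_y` equals `c_g + c_v` element by element, up to rounding error.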
10
Speaker Recognition: Mel Cepstrum, Mimicking the Behaviour of the Human Ear
11
Speaker Recognition: Mel Filter Bank. The filters are triangular: each one emphasizes its center frequency and spans out to the center frequencies of its neighbours. The filter bank has linear spacing below 1 kHz and logarithmic spacing above 1 kHz. Thus for each tone with an actual frequency f in Hz, a subjective pitch is measured on the mel scale: $\mathrm{mel}(f) = 2595 \log_{10}(1 + f/700)$ (Fant's expression).
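The mel-frequency wrapping and cosine-transform steps can be sketched as follows, assuming the power spectrum of one frame is already available (n_fft // 2 + 1 bins). The filter centres are equally spaced on the mel scale defined above, which places them roughly linearly below 1 kHz and logarithmically above. The number of filters, the sample rate, the log floor and the DCT-II variant are assumptions of this example, not values from the slides.

```python
import math

def hz_to_mel(f):
    """mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale.
    Filter i rises from centre i-1 to a peak of 1 at centre i and falls to
    zero at centre i+1. Returns num_filters rows of n_fft//2 + 1 weights."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1) for i in range(num_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for i in range(1, num_filters + 1):
        left, centre, right = hz_points[i - 1], hz_points[i], hz_points[i + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))    # rising edge
            elif centre < f <= right:
                row.append((right - f) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_from_power_spectrum(power, bank, num_coeffs):
    """Filter-bank energies -> log -> DCT-II (the 'cosine transform' step)."""
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    log_e = [math.log(max(e, 1e-12)) for e in energies]  # floor avoids log(0)
    K = len(log_e)
    return [sum(log_e[k] * math.cos(math.pi * n * (k + 0.5) / K) for k in range(K))
            for n in range(num_coeffs)]
```

For example, `mel_filterbank(20, 256, 8000)` gives 20 filters over 129 frequency bins, and `mfcc_from_power_spectrum(power, bank, 13)` returns 13 mel-cepstral coefficients for one frame.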
12
Speaker Recognition Part 2: Speaker Verification
13
Speaker Recognition: Speaker Verification, Feature Matching. Feature matching is a pattern-classification problem: the objects of interest (patterns) are the acoustic vectors extracted from the input speech, and they are classified into classes (here, speakers). Since the classification is applied to extracted features, the process can also be referred to as feature matching. Various feature-matching techniques exist, including dynamic time warping (DTW), hidden Markov models (HMM) and vector quantization (VQ). Vector quantization is a process of mapping vectors from a large vector space to a small number of regions in that space. Each region is called a cluster and is represented by its center, called a ‘codeword’. The collection of all the codewords is called a codebook.
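VQ-based matching can be sketched as follows, assuming each speaker's codebook is a list of codeword tuples and distortion is measured with the Euclidean distance (`math.dist`, Python 3.8+). The dictionary layout, the speaker labels and the verification threshold are hypothetical: identification picks the speaker whose codebook gives the smallest average distortion, and verification accepts a claimed identity if that distortion is within a threshold.

```python
import math

def nearest_codeword(vector, codebook):
    """Index of the codeword closest to vector (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

def average_distortion(vectors, codebook):
    """Mean distance from each acoustic vector to its nearest codeword."""
    return sum(min(math.dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Speaker label whose codebook yields the smallest average distortion."""
    return min(codebooks, key=lambda speaker: average_distortion(vectors, codebooks[speaker]))

def verify(vectors, codebook, threshold):
    """Accept the claimed speaker if the average distortion is within threshold."""
    return average_distortion(vectors, codebook) <= threshold

# Hypothetical two-speaker example with 2-D "acoustic vectors".
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 5.0)],
}
test_vectors = [(0.1, 0.2), (0.9, 1.1)]
```

Here `identify(test_vectors, codebooks)` selects `"speaker_a"`, whose codewords lie closest to the test vectors.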
14
Speaker Recognition: Vector Quantization [Figure: the codebook]
15
Speaker Recognition Vector Quantisation (The LBG algorithm)
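Codebook training with the LBG (Linde-Buzo-Gray) splitting procedure named in the slide title can be sketched as below. The sketch assumes the codebook size is a power of two and uses squared Euclidean distortion; the split factor `epsilon` and stopping tolerance `tol` are illustrative values, and letting an empty cluster keep its previous codeword is one of several possible choices.

```python
import math

def lbg(vectors, size, epsilon=0.01, tol=1e-6):
    """Train a VQ codebook of `size` codewords (assumed a power of two) by
    LBG splitting. epsilon and tol are illustrative defaults."""
    dim = len(vectors[0])
    # Start with one codeword: the centroid of all training vectors.
    codebook = [[sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]]
    while len(codebook) < size:
        # Double the codebook: split each codeword y into y*(1 + eps) and y*(1 - eps).
        codebook = [[c * (1 + s) for c in y] for y in codebook for s in (epsilon, -epsilon)]
        prev = math.inf
        while True:
            # Nearest-neighbour search: assign every vector to its closest codeword.
            clusters = [[] for _ in codebook]
            distortion = 0.0
            for v in vectors:
                dists = [math.dist(v, y) ** 2 for y in codebook]
                i = dists.index(min(dists))
                clusters[i].append(v)
                distortion += dists[i]
            distortion /= len(vectors)
            # Centroid update; an empty cluster keeps its old codeword.
            for i, members in enumerate(clusters):
                if members:
                    codebook[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
            # Stop when the relative improvement in distortion is small.
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook

# Hypothetical 2-D training vectors forming two well-separated groups.
training = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
codebook = lbg(training, 2)
```

In an enrollment phase, each speaker's training vectors would be passed through a routine like `lbg` to produce that speaker's reference codebook.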