ARTIFICIAL NEURAL NETWORKS

SPEAKER VERIFICATION USING ARTIFICIAL NEURAL NETWORKS

What is speech?
Speech refers to the processes associated with the production and perception of sounds used in spoken language. Every speaker has a characteristic way of speaking. This enables us to differentiate one speaker from another.

Characteristics associated with speech
Frequency domain, loudness, timbre (sound quality), pitch

Problem statement
To create a user recognition system by extracting certain important features of the speaker's voice.

Proposed solution
Speaker recognition, which divides into speaker identification and speaker verification.
Speaker identification: the process of determining which registered speaker provides a given utterance.
Speaker verification: the process of accepting or rejecting the identity claim of the person.

Speaker recognition
Feature extraction: extracts a small amount of data from the voice signal that characterizes a given speaker.
Feature verification: the process of identifying the unknown speaker by comparing the features extracted from his/her voice signal with those of known speakers.

Feature extraction
There are several methods for feature extraction, such as Fourier series analysis, Fourier transform analysis, mel-frequency cepstrum coefficients, and DSP training techniques.

Mel frequency spectrum
Our choice: the mel frequency spectrum. It was judged the best of these methods for feature extraction and yields a large number of coefficients. The speech signal is parametrically represented using mel-frequency cepstrum coefficients (MFCC).

Steps involved in feature extraction
In order: frame blocking, windowing, FFT spectrum, mel frequency wrapping, cepstrum

BLOCK DIAGRAM FOR FEATURE EXTRACTION

Frame blocking
In this step the continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N). The first frame consists of the first N samples. The second frame begins M samples after the first frame and overlaps it by N - M samples, and so on. This process continues until all the speech is accounted for within one or more frames.
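The frame blocking step can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenter's implementation: the function name `frame_signal`, the parameter names `frame_len` (N) and `hop` (M), and the choice to zero-pad the last frame are all assumptions.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames.

    frame_len is N (samples per frame); hop is M (offset between frame starts).
    Zero-padding the final frame is an assumption made for this sketch.
    """
    if not 0 < hop < frame_len:
        raise ValueError("expected 0 < M < N")
    signal = np.asarray(signal, dtype=float)
    # Number of frames needed so that every sample falls in at least one frame.
    n_frames = 1 + max(0, int(np.ceil((len(signal) - frame_len) / hop)))
    padded_len = (n_frames - 1) * hop + frame_len
    padded = np.pad(signal, (0, padded_len - len(signal)))
    # Row i holds samples i*hop ... i*hop + frame_len - 1.
    starts = hop * np.arange(n_frames)[:, None]
    return padded[starts + np.arange(frame_len)[None, :]]

# Ten samples, N = 4, M = 2: consecutive frames overlap by N - M = 2 samples.
frames = frame_signal(np.arange(10), frame_len=4, hop=2)
```

Each row of `frames` is one frame, so the later steps can operate row by row.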

Windowing
The concept here is to minimize spectral distortion by using the window to taper the signal toward zero at the beginning and end of each frame. If we define the window as w(n), 0 ≤ n ≤ N - 1, where N is the number of samples in each frame, then the result of windowing is the signal
$$y_l(n) = x_l(n)\, w(n), \quad 0 \le n \le N - 1.$$
A Hamming window is used, which has the form
$$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N - 1}\right), \quad 0 \le n \le N - 1.$$
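As a rough sketch of the windowing step, the code below multiplies every frame by a Hamming window. The helper names `hamming_window` and `apply_window` are hypothetical, and the window coefficients (0.54 and 0.46) are the standard Hamming definition, which matches NumPy's built-in `np.hamming`.

```python
import numpy as np

def hamming_window(N):
    """Standard Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), for N > 1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def apply_window(frames):
    """Multiply each row (frame) of a 2-D array by the window: y_l(n) = x_l(n) * w(n)."""
    frames = np.asarray(frames, dtype=float)
    return frames * hamming_window(frames.shape[1])

# A frame of ones shows the taper directly: small at both ends, larger in the middle.
windowed = apply_window(np.ones((1, 8)))
```

Note that the Hamming window reduces the frame edges to 0.08 of their value rather than exactly zero.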

Fast Fourier transform
The next processing step is the Fast Fourier Transform (FFT), which converts each frame of N samples from the time domain into the frequency domain. The FFT is a fast algorithm for computing the Discrete Fourier Transform (DFT), which is defined on the set of N samples {x_n} as follows:
$$X_k = \sum_{n=0}^{N-1} x_n\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N - 1.$$
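To make the relation between the DFT and the FFT concrete, the sketch below evaluates the DFT sum directly and compares it with NumPy's FFT. The names `dft_naive` and `power_spectrum` are hypothetical, and taking the squared magnitude as the per-frame spectrum is an assumption about what is passed on to the mel filter bank.

```python
import numpy as np

def dft_naive(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n * exp(-j*2*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.exp(-2j * np.pi * k * n / N) @ x

def power_spectrum(frames):
    """Squared-magnitude spectrum of each frame (row), non-negative frequencies only."""
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# The FFT gives the same values as the direct sum, only faster.
x = np.random.default_rng(0).normal(size=16)
same = np.allclose(dft_naive(x), np.fft.fft(x))
```

For real-valued frames of N samples, `np.fft.rfft` returns N // 2 + 1 frequency bins, since the remaining bins are mirror images.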

Mel frequency wrapping
The human ear is more sensitive to the low-frequency components of sound, so we stretch the low-frequency components of the FFT spectrum and shrink the high-frequency components. This is accomplished by using filters that are spaced linearly for frequencies below 1000 Hz and logarithmically for higher frequencies.

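Mel frequency scale
The filter bank has a triangular band-pass frequency response. The spacing and bandwidth of the filters are determined on the mel-frequency scale. A commonly used formula for the mel value of a frequency f in Hz is
$$\mathrm{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right).$$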
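A small sketch of the conversion between Hz and mel follows. The slide's own formula is not preserved in the transcript, so the widely used form mel(f) = 2595 log10(1 + f/700) is assumed here; the function names `hz_to_mel` and `mel_to_hz` are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Assumed mel formula: mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Equal steps in Hz shrink on the mel scale as frequency rises.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)
```

With this formula 1000 Hz maps to roughly 1000 mel, and the same 100 Hz step covers far fewer mels at 4000 Hz than at 100 Hz, which is the stretching and shrinking described above.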

Mel filter bank processing
Apply the bank of filters, spaced according to the mel scale, to the spectrum. Each filter output is the sum of its filtered spectral components.
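One possible way to build and apply such a filter bank is sketched below. It assumes triangular filters whose edges are equally spaced on the mel scale (using the common 2595 log10(1 + f/700) formula) between 0 Hz and the Nyquist frequency; the names `mel_filter_bank` and `mel_spectrum` and the example values (20 filters, 512-point FFT, 16 kHz sampling) are illustrative assumptions, not values from the slides.

```python
import numpy as np

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Return an (n_filters, n_fft // 2 + 1) matrix of triangular filters."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # n_filters + 2 edge frequencies, equally spaced in mel from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # Triangle: rises from left to center, falls from center to right, zero elsewhere.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrum(power_spec, bank):
    """Each filter output is the weighted sum of the spectral components it passes."""
    return power_spec @ bank.T

bank = mel_filter_bank(n_filters=20, n_fft=512, sample_rate=16000)
```

Applying `mel_spectrum` to a frames-by-bins power spectrum yields one row of 20 filter outputs per frame.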

Cepstrum
The mel spectrum obtained above is converted back to the time domain. This gives the mel-frequency cepstrum coefficients of the input sound wave.
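The slides do not say how the conversion back to the time domain is carried out. A common choice, assumed in the sketch below, is to take the logarithm of the mel spectrum and apply a discrete cosine transform, dropping the zeroth coefficient. The name `mel_cepstrum`, the default of 12 coefficients, and the small offset inside the logarithm are all assumptions.

```python
import numpy as np

def mel_cepstrum(mel_spec, n_coeffs=12):
    """Cepstral coefficients c_n = sum_k log(S_k) * cos(n * (k - 1/2) * pi / K),
    for n = 1..n_coeffs and k = 1..K (K = number of mel filters)."""
    mel_spec = np.atleast_2d(np.asarray(mel_spec, dtype=float))
    K = mel_spec.shape[1]
    # Small offset avoids log(0) for empty filter outputs (an assumption of this sketch).
    log_spec = np.log(mel_spec + 1e-10)
    k = np.arange(1, K + 1)
    n = np.arange(1, n_coeffs + 1)[:, None]
    basis = np.cos(n * (k - 0.5) * np.pi / K)  # shape (n_coeffs, K)
    return log_spec @ basis.T

# One row of coefficients per frame; these rows are the feature vectors.
coeffs = mel_cepstrum(np.ones((2, 20)))
```

A perfectly flat mel spectrum carries no spectral shape, so all of its coefficients for n ≥ 1 come out as zero.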

Pattern recognition using ANN
Feature verification: the extracted coefficients are fed as input to the neural network. The goal of pattern recognition is to classify objects of interest into one of a number of categories or classes. Feature matching: supervised pattern recognition, using the concept of vector quantization (VQ).

Vector quantization
Create a training set of feature vectors. VQ is a process of mapping vectors from a large vector space to a finite number of regions in that space. Each region is called a cluster and can be represented by its center, called a codeword.

Vector quantization (contd.)
First step: training phase. Second step: finding the VQ distortion. In the recognition phase, an input utterance of an unknown voice is "vector-quantized" using each trained codebook and the total VQ distortion is computed. The speaker corresponding to the VQ codebook with the smallest total distortion is identified as the speaker of the input utterance.
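The two phases can be sketched as follows. The slides do not name a codebook training algorithm, so plain k-means is swapped in here as an assumption; the function names (`train_codebook`, `vq_distortion`, `identify_speaker`), the Euclidean distance measure, and the dictionary of per-speaker codebooks are likewise hypothetical.

```python
import numpy as np

def train_codebook(features, n_codewords, n_iter=20, seed=0):
    """Training phase: cluster one speaker's feature vectors with k-means
    (an assumed algorithm) and return the cluster centers as the codebook."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    # Start from randomly chosen training vectors.
    codebook = features[rng.choice(len(features), n_codewords, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for j in range(n_codewords):
            members = features[nearest == j]
            if len(members):  # leave a codeword unchanged if its cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def vq_distortion(features, codebook):
    """Total distortion: sum of distances from each vector to its nearest codeword."""
    features = np.asarray(features, dtype=float)
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).sum()

def identify_speaker(features, codebooks):
    """Recognition phase: return the speaker whose codebook gives the smallest distortion."""
    return min(codebooks, key=lambda name: vq_distortion(features, codebooks[name]))
```

In use, `codebooks` would map each registered speaker's name to the codebook trained on that speaker's MFCC vectors, and `identify_speaker` would be called with the MFCC vectors of the unknown utterance.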

Basic structure of speaker recognition

Problems attached
Expressions and volumes, misspoken or misread prompted phrases, condition of the user, background noises, change in voice due to a cold

Future direction
The problems can be overcome by training the network under different conditions. The system can be developed into a speech-to-text converter. Password recognition.

Conclusion
Speaker recognition is a challenging problem, and there is still a lot of work to be done in this area. This seminar demonstrated how a speaker recognition system can be designed with an artificial neural network, using the mel-frequency cepstrum coefficient matrix of the voice as input to the ANN.

By Deep H. Thaker (0601106140), Dept. of Instrumentation and Engineering, College of Engineering and Technology, Bhubaneswar
