1
SPEAKER RECOGNITION
A PRESENTATION BY SHAMALEE DESHPANDE
2
INTRODUCTION
Speaker Recognition
* Automatically recognizing who is speaking
* Uses speaker-specific information contained in the speech waveform
3
INTRODUCTION
Two Approaches
* Text-Dependent Recognition
* Text-Independent Recognition
4
INTRODUCTION
Two Approaches
Text-Dependent Recognition
* Uses keywords or sentences with the same text for both the stored templates and the utterance to be recognized
Text-Independent Recognition
5
INTRODUCTION
Two Approaches
Text-Dependent Recognition
Text-Independent Recognition
* Does not rely on a specific text being spoken
6
INTRODUCTION
Classes of Sound: Voiced, Unvoiced, Plosive
Production of Pitch Frequency and Formants
Glottal Waveform
7
BLOCK DIAGRAM OF A SPEAKER RECOGNITION SYSTEM
8
DESIRABLE ATTRIBUTES OF A SPEAKER RECOGNITION SYSTEM
A feature should:
* Occur naturally and frequently in speech
* Be easily measurable
* Not change over time or be affected by the speaker's health
* Not be affected by background noise
* Not be subject to mimicry
9
SOURCES OF VARIABILITY IN SPEECH
* Phonetic identity: two samples may correspond to different phonetic segments, e.g. a vowel and a fricative
* Pitch: pitch and other features such as breathiness and amplitude can be varied
* Speaker: differences due to source physiology and emotions
* Microphone
* Environment
10
POSSIBLE ACOUSTIC PARAMETERS
* Formant frequencies
* LPC
* Pitch
* Nasal coarticulation
* Gain
11
COMMON SPEAKER RECOGNITION TECHNIQUES
* Discrete Fourier Transform
* Linear Predictive Coding
* Cepstral Analysis
* Dynamic Time Warping
* Hidden Markov Models
12
DISCRETE / FAST FOURIER TRANSFORM
* Changes time-domain signals into frequency-domain representations
* Enables reduced computational complexity for the processor
* Read N speech samples from the input
* Append N-L zeros to the input data
* Calculation of the DFT
* Windowing
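The steps on this slide can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the presenter's implementation: the helper name `frame_spectrum`, the Hamming window, and the choice to window before zero-padding are all assumptions. Here a frame of L samples is padded with N-L zeros up to the DFT length N.

```python
import numpy as np

def frame_spectrum(frame, n_fft):
    """Magnitude spectrum of one speech frame (hypothetical helper).

    frame : 1-D array of L speech samples
    n_fft : DFT length N, with N >= L
    """
    frame = np.asarray(frame, dtype=float)
    L = frame.size
    if n_fft < L:
        raise ValueError("n_fft must be at least the frame length")
    # Windowing: taper the frame edges (the Hamming window is an assumed choice).
    windowed = frame * np.hamming(L)
    # Append N - L zeros so the frame matches the DFT length.
    padded = np.concatenate([windowed, np.zeros(n_fft - L)])
    # FFT of a real signal; rfft keeps the N/2 + 1 non-negative frequencies.
    return np.abs(np.fft.rfft(padded))

# Usage: a 200 Hz tone sampled at 8 kHz, 240-sample frame, 512-point FFT.
fs = 8000
t = np.arange(240) / fs
spectrum = frame_spectrum(np.sin(2 * np.pi * 200 * t), 512)
peak_hz = np.argmax(spectrum) * fs / 512  # frequency of the strongest bin
```

The strongest bin lands within one bin width (fs / N) of the tone frequency; zero-padding refines the frequency grid but does not add resolution beyond what the L-sample frame provides.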
13
LINEAR PREDICTIVE CODING
LPC model of the speech-producing organs of the body:
* Buzzer (glottal excitation): characterized by intensity and pitch
* Tube (vocal tract): characterized by formants
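In the buzzer-and-tube picture, LPC describes the tube as an all-pole filter whose coefficients predict each sample from the previous ones. A minimal sketch of one common way to estimate those coefficients follows (the autocorrelation method with the Levinson-Durbin recursion). The slide does not say which estimation method is intended, so the method and the helper name `lpc` are assumptions.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients via autocorrelation + Levinson-Durbin (hypothetical helper).

    Returns (a, err) with a[0] == 1. The predictor is
    s[n] ~ -(a[1]*s[n-1] + ... + a[order]*s[n-order]); err is the residual energy.
    """
    x = np.asarray(frame, dtype=float)
    # Autocorrelation for lags 0..order.
    r = np.array([np.dot(x[:x.size - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update a[1..i] from the previous-order coefficients.
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a, err

# Usage: a decaying exponential 0.9**n is predicted almost exactly by
# s[n] = 0.9 * s[n-1], so the first-order coefficient comes out near -0.9.
signal = 0.9 ** np.arange(200)
coeffs, residual = lpc(signal, order=1)
```

The roots of the resulting polynomial give the resonances of the modelled tube, which is how formant estimates are usually derived from LPC coefficients.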
14
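CEPSTRAL ANALYSIS
* A disadvantage of DFT/FFT analysis is that formant frequencies may shift the pitch peak or overlap it
* In cepstral analysis, the formant (vocal tract) contribution is separated from the pitch (excitation) contribution and can be removed from the spectrum
* Defined as the Fourier transform of the log of the power spectrum
Speech modelled as excitation p(n) convolved with vocal tract response v(n), then windowed by w(n):
$s(n) = p(n) \ast v(n)$
$x(n) = w(n) \cdot s(n)$
After the Fourier transform:
$S'(\omega) = p'(\omega) \cdot v'(\omega)$
After taking the log:
$\log S'(\omega) = \log p'(\omega) + \log v'(\omega)$
In the quefrency domain:
$C(q) = \log S'(q) = \log p'(q) + \log v'(q)$
where q is the quefrency and C(q) is the complex cepstrum.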
15
CEPSTRAL ANALYSIS
Speech -> Window -> DFT -> Log -> IDFT -> Cepstrum
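The Window, DFT, Log, IDFT chain can be sketched in a few lines of NumPy. This is a simplified illustration, not the presenter's code: it computes the real (power) cepstrum rather than the complex cepstrum, and the helper name `real_cepstrum`, the Hamming window, and the small constant added before the log are assumptions. The usage example builds a synthetic voiced sound from an impulse train and a one-pole filter, which is also an assumption made only for demonstration.

```python
import numpy as np

def real_cepstrum(frame, n_fft=None):
    """Real cepstrum of one frame: Window -> DFT -> Log -> IDFT (hypothetical helper)."""
    frame = np.asarray(frame, dtype=float)
    n_fft = n_fft or frame.size
    windowed = frame * np.hamming(frame.size)           # Window
    spectrum = np.fft.rfft(windowed, n_fft)             # DFT
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)   # Log of the power spectrum
    return np.fft.irfft(log_power, n_fft)               # IDFT

# Usage: synthetic voiced speech s(n) = p(n) * v(n) with an 80-sample pitch period.
fs = 8000
period = 80                               # samples, i.e. 100 Hz at 8 kHz
excitation = np.zeros(1024)
excitation[::period] = 1.0                # p(n): impulse train
tract = 0.9 ** np.arange(50)              # v(n): toy vocal tract response
speech = np.convolve(excitation, tract)[:1024]

cepstrum = real_cepstrum(speech)
# Search above the low quefrencies, where the vocal tract contribution sits.
pitch_quefrency = 40 + np.argmax(cepstrum[40:200])
pitch_hz = fs / pitch_quefrency
```

Because the log turns the product of excitation and vocal tract spectra into a sum, the two show up in different quefrency ranges: the slowly varying envelope at low quefrency and the pitch as a peak near the pitch period.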
16
DYNAMIC TIME WARPING
* Incoming speech is usually compared frame by frame with a stored template
* Achieved via a pairwise comparison of feature vectors from each sequence
* Disadvantage of frame-by-frame comparison: corresponding phonemes vary in length
* DTW takes into account the non-linear relation between the lengths of the two signals
* Used as a matching algorithm
Example DTW grid
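The DTW grid can be filled with a short dynamic-programming loop. The sketch below is a minimal version under stated assumptions: Euclidean distance between frames, the basic step pattern (diagonal, vertical, horizontal), and no path constraints. The names `dtw_distance` and `_as_frames` are hypothetical.

```python
import numpy as np

def _as_frames(seq):
    """Treat a 1-D sequence as frames with a single feature each."""
    seq = np.asarray(seq, dtype=float)
    return seq.reshape(-1, 1) if seq.ndim == 1 else seq

def dtw_distance(template, query):
    """Accumulated DTW cost between two feature sequences (hypothetical helper)."""
    a, b = _as_frames(template), _as_frames(query)
    n, m = len(a), len(b)
    # Pairwise Euclidean distance between every template and query frame.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # cost[i, j]: cheapest alignment of the first i template and j query frames.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
            cost[i, j] = local[i - 1, j - 1] + best_prev
    return cost[n, m]

# Usage: the same pattern spoken twice as slowly still matches the template.
template = [1, 2, 3, 4]
slow = [1, 1, 2, 2, 3, 3, 4, 4]
reversed_pattern = [4, 3, 2, 1]
d_slow = dtw_distance(template, slow)
d_rev = dtw_distance(template, reversed_pattern)
```

In a text-dependent system, the stored template with the lowest accumulated cost would typically be taken as the best match.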
17
HIDDEN MARKOV MODELS
* The speech signal is identified implicitly during the search process rather than explicitly
* Comprises:
  * A hidden Markov chain representing temporal variability
  * An observable process representing spectral variability
* Portrayed as a stochastic pair (X, Y)
* An HMM is a finite state machine in which a probability density function p(x|s) is associated with each state s
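To make the (X, Y) pair concrete, the sketch below scores an observation sequence against an HMM with the forward algorithm in the log domain. Everything specific here is an assumption for illustration: scalar observations, a Gaussian density for p(x|s), two-state models, and the helper names `gaussian_logpdf` and `hmm_log_likelihood`.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log of a 1-D Gaussian density, used here as p(x|s) (assumed form)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def hmm_log_likelihood(obs, log_pi, log_A, means, variances):
    """Forward algorithm: log p(obs | model) for scalar observations.

    log_pi : log initial state probabilities, shape (S,)
    log_A  : log transition matrix, log_A[i, j] = log P(state j | state i)
    """
    obs = np.asarray(obs, dtype=float)
    # log p(x_t | s) for every frame t and state s: shape (T, S).
    log_b = gaussian_logpdf(obs[:, None], means[None, :], variances[None, :])
    alpha = log_pi + log_b[0]
    for t in range(1, len(obs)):
        # Sum over previous states in the log domain, then add the emission term.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

# Usage: two hypothetical speaker models that differ only in their state means.
log_pi = np.log(np.array([0.5, 0.5]))
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
variances = np.array([1.0, 1.0])
speaker_a_means = np.array([0.0, 1.0])
speaker_b_means = np.array([3.0, 4.0])

utterance = [0.1, 0.3, 0.9, 1.2, 0.8]
score_a = hmm_log_likelihood(utterance, log_pi, log_A, speaker_a_means, variances)
score_b = hmm_log_likelihood(utterance, log_pi, log_A, speaker_b_means, variances)
```

The hidden state sequence is never decoded explicitly: the forward pass sums over all of them, and the model with the higher score is the better explanation of the utterance.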
18
FUTURE RESEARCH
To extract and apply all levels of information from the speech signal that convey speaker identity:
* Acoustic: use spectral features conveying vocal tract information
* Prosodic: use features derived from pitch and energy tracks to classify information
* Phonetic: use phone sequences to characterize speaker-specific pronunciations
* Idiolect: use words to characterize user-specific word patterns
* Linguistic: use linguistic patterns to characterize speaker-specific conversation style
19
APPLICATIONS
* Access control: physical facilities, computer networks and websites
* PC login and password reset
* Secured transactions: remote banking and online credit card purchase authentication
* Time and attendance: workplaces
* Law enforcement: forensics, parole