
1 BIOMETRICS VOICE RECOGNITION

2 Meaning. Bios: life. Metron: measure. Biometrics identify an input sample by comparing it to a template, and are used to identify specific people by certain characteristics. Alternatives: possession based, knowledge based.

3 Characteristics. Biometrics: physiological, behavioral.

4

5 Physiological characteristics are related to the shape of the body. The oldest trait, used for more than 100 years, is the fingerprint. Other examples are face recognition, hand geometry and iris recognition.

6 Behavioral characteristics are related to the behavior of a person. The first characteristic to be used, still widely used today, is the signature. More modern approaches are the study of keystroke dynamics and of voice. Strictly speaking, voice is also a physiological trait because every person has a different pitch, but voice recognition is mainly based on the study of the way a person speaks, so it is commonly classified as behavioral.

7 Introduction. Speaker recognition has a history dating back some four decades and uses the acoustic features of speech that have been found to differ between individuals.

8 There is a difference between speaker recognition (recognizing who is speaking) and speech recognition (recognizing what is being said). These two terms are frequently confused, as is voice recognition. Voice recognition is a synonym for speaker recognition, not speech recognition. In addition, there is a difference between the act of authentication (commonly referred to as speaker verification or speaker authentication) and identification.

9 If the speaker claims to be of a certain identity and the voice is used to verify this claim, this is called verification or authentication. On the other hand, identification is the task of determining an unknown speaker's identity. In a sense speaker verification is a 1:1 match where one speaker's voice is matched to one template (also called a "voice print") whereas speaker identification is a 1:N match where the voice is compared against N templates.
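
The difference between the 1:1 and 1:N match can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-number "voice prints", the cosine similarity measure and the 0.9 threshold are all hypothetical and are not taken from the slides.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(sample, claimed_template, threshold=0.9):
    """1:1 match: accept the claim if the sample is close enough to one template."""
    return cosine_similarity(sample, claimed_template) >= threshold

def identify(sample, templates):
    """1:N match: return the name of the best-matching template among N."""
    return max(templates, key=lambda name: cosine_similarity(sample, templates[name]))

# Hypothetical voice prints; real ones would come from feature extraction.
templates = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}
sample = [0.85, 0.15, 0.35]

print(verify(sample, templates["alice"]))  # True: the claim "I am alice" is accepted
print(verify(sample, templates["bob"]))    # False: the claim "I am bob" is rejected
print(identify(sample, templates))         # alice
```

Verification needs a claimed identity and a threshold; identification needs neither, but always returns the closest enrolled speaker, even when the true speaker is not enrolled.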

10 Variants of speaker recognition. Each speaker recognition system has two phases: enrollment and verification. Enrollment: during enrollment, the speaker's voice is recorded and typically a number of features are extracted to form a voice print, template, or model.

11 Speech samples are waveforms, with time on the horizontal axis and loudness on the vertical axis. The speaker recognition system analyses the frequency content and compares characteristics such as the quality, duration, intensity, dynamics and pitch of the signal.
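
As a rough illustration of what "analysing the frequency content" of a waveform means, the sketch below applies a naive discrete Fourier transform to one short frame and reports the strongest frequency. The 200 Hz synthetic tone, the 8 kHz sample rate and the function names are assumptions made for the example; a real system would use a fast FFT routine on recorded speech.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(total))
    return spectrum

def dominant_frequency(frame, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin in the frame."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / len(frame)

# Synthetic stand-in for a voiced frame: a 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(400)]
print(dominant_frequency(frame, sample_rate))  # 200.0
```

With 400 samples at 8 kHz each bin is 20 Hz wide, so the 200 Hz tone falls exactly on bin 10.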

12 In the verification phase, a speech sample or "utterance" is compared against a previously created voice print.

13

14 Front-end processing: the "signal processing" part, which converts the sampled speech signal into a set of feature vectors that characterize the properties of speech that can separate different speakers. Front-end processing is performed in both the training and recognition phases. Speaker modeling: this part performs a reduction of feature data by modeling the distributions of the feature vectors.

15 Speaker database: the speaker models are stored here. Decision logic: makes the final decision about the identity of the speaker by comparing unknown feature vectors to all models in the database and selecting the best matching model.
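
The four components (front-end processing, speaker modeling, speaker database, decision logic) can be tied together in a toy sketch. Everything here is an assumption for illustration: the front end emits a crude [mean, energy] vector per frame instead of real speech features, and each speaker is modeled by a single diagonal Gaussian rather than any model named in the slides.

```python
import math

def extract_features(signal, frame_size=4):
    """Front-end stand-in: one [mean, energy] feature vector per frame."""
    vectors = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        mean = sum(frame) / frame_size
        energy = sum(x * x for x in frame) / frame_size
        vectors.append([mean, energy])
    return vectors

def train_model(vectors):
    """Speaker modeling: reduce many vectors to per-dimension mean and variance."""
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / len(vectors), 1e-6)
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(vectors, model):
    """Average log-likelihood of the vectors under a diagonal Gaussian model."""
    means, variances = model
    total = 0.0
    for v in vectors:
        for x, m, var in zip(v, means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - m) ** 2 / var)
    return total / len(vectors)

def decide(vectors, database):
    """Decision logic: pick the stored model that best explains the vectors."""
    return max(database, key=lambda name: log_likelihood(vectors, database[name]))

# Hypothetical enrollment recordings (made-up sample values).
quiet = [0.1, -0.1, 0.2, -0.2, 0.1, -0.2, 0.2, -0.1, 0.15, -0.15, 0.1, -0.1]
loud = [1.0, -0.9, 1.1, -1.0, 0.9, -1.1, 1.0, -1.0, 1.2, -0.8, 1.0, -1.1]

# Speaker database: one stored model per enrolled speaker.
database = {
    "quiet_speaker": train_model(extract_features(quiet)),
    "loud_speaker": train_model(extract_features(loud)),
}

unknown = [0.9, -1.0, 1.0, -1.1, 1.1, -0.9, 1.0, -1.0]
print(decide(extract_features(unknown), database))  # loud_speaker
```

The same `extract_features` function is used for enrollment and for the unknown utterance, which mirrors the point that front-end processing runs in both phases.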

16 Speaker recognition systems fall into two categories: text-dependent and text-independent. If the text is the same for enrollment and verification, this is called text-dependent recognition. In a text-dependent system, prompts can either be common across all speakers (e.g. a common pass phrase) or unique. In addition, shared secrets (e.g. passwords and PINs) or knowledge-based information can be employed in order to create a multi-factor authentication scenario.

17 Text-independent systems are most often used for speaker identification, as they require very little if any cooperation by the speaker. In this case the text during enrollment and test is different. In fact, the enrollment may happen without the user's knowledge, as is the case for many forensic applications. As text-independent technologies do not compare what was said at enrollment and verification, verification applications tend to also employ speech recognition to determine what the user is saying at the point of authentication.

18 Speaker Verification and Speaker Recognition

19 Errors. False Match Rate (FMR). False Non-Match Rate (FNMR). Failure To Enroll Rate (FER).

20 FMR. The system gives a false positive, matching one user's biometrics with another user's biometrics (Type 1 error). Occurs when two people have a high degree of similarity. It may be used to eliminate the non-matches and continue the process again.

21 FNMR. The user's template is compared with the enrolled templates and an incorrect decision of non-match is made (Type 2 error). Caused by environment, aging, or sickness.

22 FER. The biometric data of some users may not be clear enough to enroll.
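
The three error rates can be computed from counts, as in the sketch below. It assumes similarity scores where a higher score means a better match and a single decision threshold; the score lists, the threshold of 0.8 and the function names are made up for the example.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FMR, FNMR) for scores where higher means a better match."""
    false_matches = sum(1 for s in impostor_scores if s >= threshold)
    false_non_matches = sum(1 for s in genuine_scores if s < threshold)
    fmr = false_matches / len(impostor_scores)
    fnmr = false_non_matches / len(genuine_scores)
    return fmr, fnmr

def failure_to_enroll_rate(enroll_attempts):
    """Fraction of enrollment attempts (True = success) that failed."""
    return enroll_attempts.count(False) / len(enroll_attempts)

# Hypothetical scores: genuine = same speaker, impostor = different speaker.
genuine = [0.91, 0.85, 0.78, 0.95, 0.60]
impostor = [0.20, 0.35, 0.82, 0.10, 0.45]

print(error_rates(genuine, impostor, threshold=0.8))        # (0.2, 0.4)
print(error_rates(genuine, impostor, threshold=0.9))        # (0.0, 0.6)
print(failure_to_enroll_rate([True, True, False, True]))    # 0.25
```

Raising the threshold from 0.8 to 0.9 removes the false match but rejects more genuine attempts, which is the usual trade-off between the two rates.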

23 Technology. The various technologies used to process and store voice prints include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation and decision trees. Some systems also use "anti-speaker" techniques, such as cohort models and world models.

24 VQ Speaker Verification Speech Feature Extraction

25 Mel Frequency Cepstral Coefficients

26 Cepstral Coefficients. The power in each triangular filter is summed. The log is calculated. The log values are converted to the time domain using the Discrete Cosine Transform (DCT). The result is called the mel frequency cepstral coefficients (MFCC).
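
The last two steps (log, then DCT) can be sketched as follows. The sketch assumes the summed power of each triangular filter is already available and strictly positive; the six energy values are hypothetical, and the unnormalized DCT-II used here is one common choice, not necessarily the variant the presenter had in mind.

```python
import math

def cepstral_coefficients(filter_energies, num_coeffs):
    """Log of each filter's summed power, followed by an unnormalized DCT-II."""
    log_energies = [math.log(e) for e in filter_energies]
    n = len(log_energies)
    coeffs = []
    for k in range(num_coeffs):
        c = sum(
            log_energies[m] * math.cos(math.pi * k * (m + 0.5) / n)
            for m in range(n)
        )
        coeffs.append(c)
    return coeffs

# Hypothetical summed powers from six triangular mel filters for one frame.
energies = [4.0, 9.5, 12.0, 6.3, 2.1, 0.8]
mfcc = cepstral_coefficients(energies, num_coeffs=4)
print([round(c, 3) for c in mfcc])
```

Coefficient 0 is simply the sum of the log energies, so it tracks overall loudness; the higher coefficients describe the shape of the spectrum across the filters.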

27 Verification. Threshold. Cohort speakers. Ratio.
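
One plausible reading of how threshold, cohort speakers and ratio fit together is a cohort-normalized likelihood ratio: the utterance's score under the claimed speaker's model is compared with its scores under a set of cohort models, and the result is tested against a threshold. The slide gives no details, so the averaging rule, the threshold of 5.0 and the log-likelihood values below are all assumptions.

```python
def normalized_score(claimed_log_likelihood, cohort_log_likelihoods):
    """Log-likelihood ratio: claimed model versus the average cohort score."""
    cohort_average = sum(cohort_log_likelihoods) / len(cohort_log_likelihoods)
    return claimed_log_likelihood - cohort_average

def accept(claimed_log_likelihood, cohort_log_likelihoods, threshold):
    """Accept the identity claim when the normalized score reaches the threshold."""
    return normalized_score(claimed_log_likelihood, cohort_log_likelihoods) >= threshold

# Hypothetical log-likelihoods of one utterance under each cohort model.
cohort = [-52.0, -49.0, -55.0]

print(accept(-40.0, cohort, threshold=5.0))  # True: 12.0 above the cohort average
print(accept(-50.0, cohort, threshold=5.0))  # False: only 2.0 above the cohort average
```

In the log domain the ratio becomes a difference, so a recording made in poor conditions lowers the claimed and cohort scores together and the decision is less affected.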

28 Speaker Verification and Speaker Recognition: accessing confidential information areas; access to remote computers; voice dialing; banking by telephone; telephone shopping; database access services; information services; voice mail; PIN code for your ATM.

