Juan Ortega 10/20/09 NTS490

Speaker recognition is the computing task of validating a user’s claimed identity using characteristics extracted from their voice. Speaker recognition determines who is speaking, whereas speech recognition determines what is being said. Voice recognition combines the two: it uses learned aspects of a speaker’s voice to determine what is being said.

Speaker verification has co-evolved with the technologies of speech recognition and speech synthesis (TTS) because of the similar characteristics and challenges associated with each. Gunnar Fant, a Swedish professor, published a model describing the physiological components of acoustic speech production, based on the analysis of x-rays of individuals making specified phonic sounds. Dr. Joseph Perkell later used motion x-rays and included the tongue and jaw to expand on Fant’s model. The original speaker recognition systems used the average output of several analog filters to perform matching, often aided by humans.

1976: Texas Instruments built a prototype system that was tested by the U.S. Air Force and The MITRE Corporation.
Mid 1980s: The National Institute of Standards and Technology (NIST) developed the NIST Speech Group to study and promote the use of speech processing techniques.
Since 1996: Under funding from the NSA, the NIST Speech Group has hosted yearly evaluations, the NIST Speaker Recognition Workshop, to foster the continued advancement of the speaker recognition community.

The physiological component of voice recognition is related to the physical shape of an individual’s vocal tract, which consists of an airway and the soft tissue cavities from which vocal sounds originate. The acoustic patterns of speech come from the physical characteristics of the airways. The source sound is altered as it travels through the vocal tract, which is configured differently depending on the position of the tongue, lips, mouth, and pharynx. Motion of the mouth and pronunciation are the behavioral components of this biometric.

Speech samples are waveforms with time on the horizontal axis and loudness on the vertical axis. The speaker recognition system analyzes the frequency content of the speech and compares characteristics such as the quality, duration, intensity, dynamics, and pitch of the signal.
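As a rough illustration of what "analyzing the frequency content" of a waveform means, the sketch below computes a discrete Fourier transform of one short frame and reports the strongest frequency. The function names, the 8000 Hz sample rate, and the synthetic 300 Hz tone are assumptions made for this example, not details from the presentation; a real system would use an FFT library and overlapping windowed frames.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..N/2 of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = magnitude_spectrum(frame)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(frame)


# Hypothetical test signal: a 300 Hz tone sampled at 8000 Hz for 80 samples.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 300 * t / SAMPLE_RATE) for t in range(80)]
peak_hz = dominant_frequency(frame, SAMPLE_RATE)
print(peak_hz)
```

With 80 samples at 8000 Hz each bin is 100 Hz wide, so the tone falls exactly on one bin.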

r eh k ao g n ay z s p iy ch: "recognize speech"
r eh k ay n ay s b iy ch: "wreck a nice beach"

Two major applications of speaker recognition technologies and methodologies exist. Speaker authentication, or verification, is the task of validating the identity a speaker claims. Verification is a 1:1 match in which one speaker’s voice is matched against one template (called a “voice print” or “voice model”). Speaker identification is the task of determining an unknown speaker’s identity. Identification is a 1:N match in which the voice is compared against N templates.
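The difference between the 1:1 and 1:N cases can be sketched in a few lines. This is only an illustration: the three-element feature vectors, the speaker names, the cosine-similarity score, and the 0.9 threshold are all assumptions chosen for the example, not part of the presentation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(sample, claimed_id, templates, threshold=0.9):
    """1:1 match: compare the sample only against the claimed speaker's template."""
    return cosine_similarity(sample, templates[claimed_id]) >= threshold


def identify(sample, templates):
    """1:N match: compare the sample against every template and return the best one."""
    return max(templates, key=lambda spk: cosine_similarity(sample, templates[spk]))


# Hypothetical enrolled voice prints (toy feature vectors).
templates = {
    "alice": [1.0, 0.2, 0.1],
    "bob": [0.1, 1.0, 0.3],
    "carol": [0.2, 0.1, 1.0],
}
sample = [0.9, 0.25, 0.15]

accepted = verify(sample, "alice", templates)   # claim matches the voice
rejected = verify(sample, "bob", templates)     # claim does not match
best = identify(sample, templates)              # no claim, search all N
```

Verification costs one comparison regardless of how many users are enrolled, while identification grows with N.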

Text-dependent methods require the speaker to utter key words or sentences, with the same text used for both training and recognition. Text-independent methods are used when predetermined key words cannot be used; human beings, after all, recognize speakers irrespective of the content of the utterance. Text-prompted methods prompt each user with a new key sentence every time the system is used.

How can speaker recognition systems normalize the variation of likelihood values in speaker verification? To compensate for these variations, two types of normalization techniques have been tried: parameter-domain and likelihood-domain normalization. Adapting the reference model, as well as the verification threshold, for each speaker is indispensable for maintaining high recognition accuracy over a long period.

Parameter domain: Spectral equalization (“blind equalization”) has been confirmed to be effective in reducing linear channel effects and long-term spectral variation. This method is especially effective for text-dependent speaker recognition applications using sufficiently long utterances.
Likelihood domain: The likelihood ratio is the ratio of the conditional probability of the observed measurements of the utterance given that the claimed identity is correct to the conditional probability of the observed measurements given that the speaker is an impostor. The a posteriori probability method is calculated using a set of speakers that includes the claimed speaker.
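One common way to realize blind spectral equalization is cepstral mean subtraction: average each cepstral coefficient over the whole utterance and subtract that average from every frame. The slide does not name this technique, so treat the sketch below as an assumed instance; the function name and the toy two-coefficient frames are made up for illustration.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient utterance mean from every frame."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[i] for f in frames) / n_frames for i in range(n_coeffs)]
    return [[f[i] - means[i] for i in range(n_coeffs)] for f in frames]


# Toy cepstral frames, and the same frames after a fixed linear channel,
# which shows up as a constant additive offset in the cepstral domain.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
channel_offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(f, channel_offset)] for f in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

After normalization the two versions coincide, which is why the constant channel effect drops out. The mean is only a good estimate of the channel when the utterance is long enough, matching the remark about sufficiently long utterances.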

1) The quality, duration, loudness, and pitch features are extracted from the submitted sample.
2) The extracted features are compared to the model of the claimed identity and to other-speaker models. The other-speaker models contain the “states” of a variety of individuals, not including that of the claimed identity.
3) The input voice sample and enrolled models are compared to produce a “likelihood ratio”, indicating how likely it is that the input sample came from the claimed speaker.
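The three steps above can be sketched as a log-likelihood ratio test. Everything concrete here is an assumption for illustration: each model is a single one-dimensional Gaussian over a pitch-like feature, the other-speaker score is the average likelihood over two hypothetical speakers, and the threshold is 0.

```python
import math


def gaussian_log_likelihood(frames, mean, variance):
    """Total log-likelihood of 1-D feature values under a single Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance)
        for x in frames
    )


def log_likelihood_ratio(frames, claimed_model, other_models):
    """log p(X | claimed speaker) minus log of the average p(X | other speaker)."""
    claimed = gaussian_log_likelihood(frames, *claimed_model)
    others = [gaussian_log_likelihood(frames, *m) for m in other_models]
    # log-mean-exp, computed stably by factoring out the largest term
    peak = max(others)
    log_mean_others = peak + math.log(
        sum(math.exp(v - peak) for v in others) / len(others)
    )
    return claimed - log_mean_others


def accept(frames, claimed_model, other_models, threshold=0.0):
    """Accept the identity claim when the ratio exceeds the threshold."""
    return log_likelihood_ratio(frames, claimed_model, other_models) > threshold


# Hypothetical (mean, variance) models; the claimed speaker is not among the others.
claimed_model = (120.0, 100.0)
other_models = [(200.0, 100.0), (90.0, 100.0)]

genuine = [118.0, 123.0, 121.0, 119.0]
impostor = [198.0, 205.0, 201.0, 196.0]

genuine_ok = accept(genuine, claimed_model, other_models)
impostor_ok = accept(impostor, claimed_model, other_models)
```

Dividing by the other-speaker likelihood is what makes the score comparable across sessions: a noisy recording lowers both numerator and denominator.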

Speaker models must be updated to cope with the gradual changes in people’s voices. Each speaker model has to be built from a small amount of data collected in a few sessions, and the model must then be updated using speech data collected when the system is used. The reference template for each speaker is updated by averaging new utterances and the present template after time registration. These methods have been extended and applied to text-independent and text-prompted speaker verification using HMMs.
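A minimal sketch of the averaging step, assuming the new utterance has already been time-registered (aligned frame by frame) to the stored template. The `weight` parameter is a hypothetical adaptation rate introduced for this example; the presentation only says the two are averaged.

```python
def update_template(template, new_utterance, weight=0.2):
    """Blend a new, already time-aligned utterance into the stored template.

    weight is a hypothetical adaptation rate: 0 keeps the old template,
    1 replaces it with the new utterance, 0.5 is a plain average.
    """
    if len(template) != len(new_utterance):
        raise ValueError("utterance must be time-registered to the template first")
    return [
        [(1 - weight) * t + weight * u for t, u in zip(t_frame, u_frame)]
        for t_frame, u_frame in zip(template, new_utterance)
    ]


# Toy template with two frames of two features each.
template = [[1.0, 2.0], [3.0, 4.0]]
session = [[2.0, 2.0], [3.0, 6.0]]
updated = update_template(template, session, weight=0.5)
```

A small weight lets the template follow slow drift in the voice without being thrown off by a single unusual session.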

A hidden Markov model (HMM) is a stochastic model that provides a statistical representation of the sounds produced by the individual. The HMM represents the underlying variations and temporal changes found in the speech states, using quality, duration, intensity dynamics, and pitch characteristics. A Gaussian mixture model (GMM) is a state-mapping model closely related to the HMM and is often used for “text-independent” applications. It uses the speaker’s voice to create a number of vector “states” representing the various sound forms. These methods all compare the similarities and differences between the input voice and the stored voice “states” to produce a recognition decision.
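To make the GMM idea concrete, the sketch below scores a sequence of feature vectors against two hypothetical speaker models, each a mixture of diagonal-covariance Gaussians whose means play the role of the vector "states". The models, frames, and function names are assumptions for illustration; training the mixtures (for example with EM) is not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of the frames under a GMM."""
    total = 0.0
    for x in frames:
        terms = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # log-sum-exp over mixture components, stabilized by the largest term
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


# Two hypothetical speaker models with two components each.
weights = [0.5, 0.5]
unit_var = [[1.0, 1.0], [1.0, 1.0]]
speaker_a_means = [[0.0, 0.0], [4.0, 4.0]]
speaker_b_means = [[10.0, 10.0], [14.0, 14.0]]

frames = [[0.2, -0.1], [3.8, 4.1], [0.1, 0.3]]
score_a = gmm_log_likelihood(frames, weights, speaker_a_means, unit_var)
score_b = gmm_log_likelihood(frames, weights, speaker_b_means, unit_var)
```

Because each frame is scored independently of its neighbors, the order of the frames does not matter, which is one reason this kind of model suits text-independent use.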

• Some companies use voiceprint recognition so people can gain access to information or give authorization without being physically present.
• Instead of stepping up to an iris scanner or hand geometry reader, someone can give authorization by making a phone call.
• Unfortunately, people can bypass some systems, particularly those that work by phone, with a simple recording of an authorized person's password. That's why some systems use several randomly chosen voice passwords or use general voiceprints instead of prints for specific words.
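The randomly chosen password defense can be sketched as a challenge generator: the system picks an unpredictable word sequence each time, so a recording of an earlier session does not contain the right words in the right order. The word list and function names are hypothetical, and the speech recognizer that would produce `recognized_words` is assumed rather than shown.

```python
import secrets


def make_challenge(enrolled_words, length=3):
    """Pick `length` distinct enrolled words at random for the user to speak."""
    if length > len(enrolled_words):
        raise ValueError("not enough enrolled words")
    pool = list(enrolled_words)
    challenge = []
    for _ in range(length):
        word = secrets.choice(pool)
        pool.remove(word)
        challenge.append(word)
    return challenge


def transcript_matches(challenge, recognized_words):
    """Check that the recognized words equal the challenge, ignoring case."""
    return [w.lower() for w in recognized_words] == [w.lower() for w in challenge]


# Hypothetical vocabulary the user spoke during enrollment.
enrolled_words = ["apple", "river", "seven", "north", "table"]
challenge = make_challenge(enrolled_words, length=3)
print("Please say:", " ".join(challenge))
```

The transcript check only confirms that the right words were spoken; the speaker's voice still has to be scored against the enrolled model separately.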

• Except for text-prompted systems, speaker recognition systems are susceptible to spoofing attacks that use a recorded voice.
• Text-dependent systems are less suitable for public use.
• Background noise can be disruptive, although equalizers may be used to reduce this problem.
• Text-independent recognition is still under research, although methods have been proposed that measure rhythm, speed, modulation, and intonation, which are shaped by personality type and parental influence.
• Authentication is based on ratios and probabilities rather than an exact match.
• Frequent enrollment is needed to deal with voice changes.
• Someone who is deaf or mute can’t use this type of biometric.

• All you need is software and a microphone.
• Many methods have been proposed:
  Text-dependent: DTW-based methods, HMM-based methods
  Text-independent: long-term-statistics-based methods, VQ-based methods, ergodic-HMM-based methods, speech-recognition-based methods
• Fast authentication.
• Authentication can be given to someone else.
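Of the methods listed, dynamic time warping (DTW) is the simplest to show in code. It aligns two utterances of the same text spoken at different speeds before comparing them. The sketch below is a textbook DTW on one-dimensional feature sequences with absolute difference as the local cost; the sequences are toy data, and a practical system would use multi-dimensional spectral features and path constraints.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy enrolled template, the same contour spoken at half speed, and a different contour.
enrolled = [1.0, 2.0, 3.0, 2.0, 1.0]
slow = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
other = [3.0, 1.0, 3.0, 1.0, 3.0]

same_text_distance = dtw_distance(enrolled, slow)
different_distance = dtw_distance(enrolled, other)
```

The slowed-down utterance aligns perfectly with the template, while the different contour cannot be warped into a match.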
