1 Bioinformatic Voice Applications: Speaker Recognition and Verification Andrew Rosenberg Biometric Seminar Day August 23, 2010.

Presentation transcript:

1 Bioinformatic Voice Applications: Speaker Recognition and Verification Andrew Rosenberg Biometric Seminar Day August 23, 2010

2 Outline Biometrics and Voice What can the Voice tell us about a Speaker Representing Speech Modeling Speakers Gaussian Mixture Model Universal Background Model

3 Biometrics and Voice Applications of Voice Biometrics Speaker Verification Are you who you say you are? Speaker Recognition Who are you? Diagnoses of Medical Pathologies and other Speaker States The voice can tell us other things about a speaker

4 Advantages of Voice Biometrics Minimally Intrusive Cheap Mechanisms to Collect Speech Data Established, low-risk, legal eavesdropping scenarios

5 Biometrics and Voice How does speech carry biometric information? How is speech produced? Articulators Vocal Tract First Language and Regional Influences Speech Pathologies Individual Differences

6 Production of Speech

7 “It's ten below outside.” From the Queen's University Speech Production and Perception Laboratory

8 Production of Speech “Why did Ken set the soggy net on top of his deck?” From the Queen's University Speech Production and Perception Laboratory

9 Influences of Native Tongue Negative Language Transfer When speaking in a non-native tongue, speakers will use some characteristics from their native tongue. Very common in pronunciation /r/ vs. /l/ in Japanese and Chinese Cognates and false cognates “elektrisch” = electric “embarazada” ≠ embarrassed Limited evidence of language transfer regarding grammar and word choice.

10 Assessment and Monitoring of medical problems How well is a patient coping with cancer treatment? Zellerman (2002) Is a patient clinically depressed? Alpert (2001) Moore (2003) Mundt (2007) Diagnosis of Schizophrenia through word choice Elvevåg (2007 & 2009) Autism Spectrum Disorders demonstrated through lexical effects and “flat” prosody Rapin & Dunn (2003) Mesibov (1992) Le Normand (2008) Van Santen (2009)

11 Automatic Detection of Pathological Speech Apraxia Green (2004) Shriberg (2004) Spasmodic Dysphonia & Muscular Tension Dysphonia Schlotthauer (2006) Stuttering Howell (1997) Czyzewski (2003) Parkinson’s Little (2008) Hammen (1989) Dyslexia Schulte-Körne (1999)

12 Speaker Verification Are you who you say you are? Security Applications Banking Restricted Facility Entry Forensics Compare stored speech against test speech Statistical modeling

13 Text Dependent vs. Text Independent Text Dependent Everyone says the same short phrase Text Independent Speakers say whatever they want. Typically no impact of the words that are said Text Dependent approaches have higher performance Text Independent approaches are more widely applicable

14 Speaker Verification Schematic Pipeline Training Testing Speech Parameterization speech data known speaker identity speech data claimed speaker identity Score Normalization Statistical Modeling Speech Parameterization Statistical Models speaker model speaker model background model Accept / Reject

15 Representation of Speech Mel-Frequency Cepstral Coefficients Typically taken every 10 ms Often 20 coefficients Also include ∆ and ∆∆ in the feature vector, for a vector of 60 elements Windowing → FFT → Filter Bank → Cepstral Transform (DCT)
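
The 60-element vector mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming the 20 static coefficients per frame are already computed and that ∆ is a simple central difference between neighbouring frames; real front ends often use a regression over a wider window. The function names `deltas` and `stack_features` are hypothetical.

```python
def deltas(frames):
    """Central-difference deltas over time; edge frames are repeated."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def stack_features(static):
    """Append delta and delta-delta coefficients to each static frame."""
    d1 = deltas(static)   # first-order change over time
    d2 = deltas(d1)       # second-order change over time
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input: 5 frames of 20 made-up "cepstral" values each.
static = [[float(t * k) for k in range(20)] for t in range(5)]
features = stack_features(static)
print(len(features), len(features[0]))
```

Each output frame holds the 20 static values first, then the 20 ∆ values, then the 20 ∆∆ values.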

16 Statistical Modeling How does statistical modeling work? Learn a function that produces a probability. (Training) These functions are commonly represented in a parametric form. Learn the parameters.

17 Gaussian Model Gaussian Model or Normal Distribution Common and Easy to Work With Has 2 parameters: mean, variance (or standard deviation)

18 Gaussian Models in Higher Dimensions Normal Distributions in higher dimensions require slightly more complicated math, but operate identically Two parameters: A mean vector with d elements, a d-by-d covariance matrix.
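
For reference, the two densities described on slides 17 and 18 can be written out as below. The notation is standard but not taken from the slides: \mu and \sigma^2 are the mean and variance, \boldsymbol{\mu} is the d-element mean vector, and \Sigma is the d-by-d covariance matrix.

```latex
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma)
  = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
    \exp\!\left(-\tfrac{1}{2}
      (\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)
```

With d = 1 and \Sigma = \sigma^2 the second expression reduces to the first, which is the sense in which the higher-dimensional model operates identically.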

19 Training a Gaussian Model The Gaussian Model that best fits a set of data (in the maximum-likelihood sense) has that data's sample mean and sample standard deviation as its parameters. Can be proven with calculus, but we’re not going to today.
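
A minimal sketch of this fit for one-dimensional data. It assumes the maximum-likelihood form of the variance, which divides by n rather than n - 1; the function name `fit_gaussian` and the sample values are made up for illustration.

```python
import math


def fit_gaussian(samples):
    """Return the maximum-likelihood mean and variance of 1-D samples."""
    n = len(samples)
    mean = sum(samples) / n
    # Maximum-likelihood variance: average squared deviation (divide by n).
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, variance


mean, variance = fit_gaussian([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
std_dev = math.sqrt(variance)
print(mean, variance, std_dev)
```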

20 Gaussian Mixture Model But a lot of data is not actually normally distributed. A Mixture of Gaussian Models (GMM) allows us to add contributions from a number of Gaussians to best fit the data.

21 Modeling with a Gaussian Mixture Model Fitting a GMM to data. There isn’t a closed form to find the best parameterization of a GMM. Expectation-Maximization Powerful iterative optimization approach. Can be slow Can fall into local optima Algorithm: Initialize Assign points to mixtures Estimate mixture parameters Repeat until convergence
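
The four algorithm steps on this slide can be sketched for one-dimensional data as below. This is an illustration under stated assumptions, not the presenter's implementation: the initialization (means at evenly spaced quantiles, a shared variance, equal weights), the variance floor, and the stopping tolerance are all choices made here. "Assign points to mixtures" is done softly, so each point receives a fractional responsibility for every component.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, max_iters=100, tol=1e-6, var_floor=1e-6):
    """Fit a k-component 1-D GMM with EM; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    # Initialize: means at evenly spaced quantiles, shared variance, equal weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, var_floor)] * k
    weights = [1.0 / k] * k

    prev_ll = None
    for _ in range(max_iters):
        # E-step: soft-assign each point to the components.
        resp = []
        log_likelihood = 0.0
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = max(sum(p), 1e-300)  # guard against underflow to zero
            log_likelihood += math.log(total)
            resp.append([pj / total for pj in p])

        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj <= 0.0:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)

        # Repeat until the log-likelihood stops improving.
        if prev_ll is not None and abs(log_likelihood - prev_ll) < tol:
            break
        prev_ll = log_likelihood
    return weights, means, variances
```

Because EM can fall into local optima, as the slide notes, a practical system would typically try several initializations and keep the fit with the highest log-likelihood.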

22 Speaker Verification Schematic Pipeline Training Testing Speech Parameterization speech data known speaker identity speech data claimed speaker identity Score Normalization Statistical Modeling Speech Parameterization Statistical Models speaker model speaker model background model Accept / Reject

23 Score normalization What does a score of 0.0005 mean? At what score should a system accept a user's claim that they are who they say they are? We want to compare the likelihood that a speaker is who they say they are to the likelihood that they are another speaker. Universal Background Model

24 Speaker Verification with UBM score normalization For each speaker we have a GMM representing their voice. Additionally, we have one UBM-GMM that represents “speech” generally.
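
One common way to use the two models is a log-likelihood ratio: score the test speech under the claimed speaker's GMM and under the UBM, subtract, and compare with a threshold. The sketch below assumes one-dimensional features, an average per-frame log-likelihood, and a threshold of 0; the model parameters and the names `gmm_log_likelihood` and `verify` are invented for illustration.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of 1-D frames under a GMM."""
    total = 0.0
    for x in frames:
        density = sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total / len(frames)


def verify(frames, speaker_gmm, ubm_gmm, threshold=0.0):
    """Return (score, accept) where score is the log-likelihood ratio."""
    score = (gmm_log_likelihood(frames, *speaker_gmm)
             - gmm_log_likelihood(frames, *ubm_gmm))
    return score, score > threshold


# Hypothetical models, each given as (weights, means, variances).
speaker_gmm = ([1.0], [5.0], [1.0])
ubm_gmm = ([0.5, 0.5], [-2.0, 2.0], [9.0, 9.0])

genuine_score, genuine_ok = verify([4.8, 5.1, 5.3], speaker_gmm, ubm_gmm)
impostor_score, impostor_ok = verify([-3.0, -2.5, -3.2], speaker_gmm, ubm_gmm)
print(genuine_ok, impostor_ok)
```

A ratio is easier to interpret than a raw likelihood such as 0.0005: a positive score means the speech fits the claimed speaker better than it fits speech in general.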

25 Speaker Verification Schematic Pipeline Training Testing Speech Parameterization speech data known speaker identity speech data claimed speaker identity Score Normalization Statistical Modeling Speech Parameterization Statistical Models speaker model speaker model background model Accept / Reject

26 Speaker Recognition Given speech from an unknown speaker can you tell me who it is? Requires some known material from the person in question. Now no longer a binary (True vs. False) question. Now a 1-of-N problem.
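
The 1-of-N decision can be sketched as picking the enrolled speaker whose model gives the test speech the highest likelihood. This assumes one-dimensional features and one GMM per known speaker; the speaker names and parameters are made up.

```python
import math


def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood under gmm = (weights, means, variances)."""
    weights, means, variances = gmm
    total = 0.0
    for x in frames:
        density = sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total / len(frames)


def identify(frames, speaker_models):
    """Return the name of the best-scoring speaker model."""
    return max(speaker_models,
               key=lambda name: avg_log_likelihood(frames, speaker_models[name]))


# Hypothetical enrolled speakers.
speaker_models = {
    "alice": ([1.0], [0.0], [1.0]),
    "bob": ([0.5, 0.5], [4.0, 6.0], [1.0, 1.0]),
    "carol": ([1.0], [10.0], [1.0]),
}
predicted = identify([4.2, 5.5, 6.1], speaker_models)
print(predicted)
```

Unlike verification, no threshold is involved here: the answer is always one of the N enrolled speakers.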

27 Verification vs. Recognition with GMMs

28 Speaker Recognition Overview Training Testing Speech Parameterization speech data known speaker identity speech data claimed speaker identity Score Normalization Statistical Modeling Speech Parameterization Statistical Models speaker model speaker models background model Speaker Prediction

29 State-of-the-art Speaker Verification What we have works fine. There has been a significant improvement to the state-of-the-art. Rather than model a speaker directly... model how the speaker differs from the average speaker (UBM). How can we do this? Move the UBM to best fit the new speaker.

30 Maximum A Posteriori Adaptation Update the UBM model parameters to best fit the new speaker data.

31 Maximum A Posteriori Adaptation Update the UBM model parameters to best fit the new speaker data.

32 Maximum A Posteriori Adaptation Store the transformation (or new value) of each parameter. Construct a new feature vector. Classifier using SVM (or another classifier) “Supervectors” Feature vectors of model parameters rather than speech features. MAP Classifier (SVM) supervectors speech representation (MFCC) UBM
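
A sketch of the adaptation step and the resulting supervector. It assumes a widely used mean-only form of MAP adaptation with a relevance factor, which the slides do not spell out: each UBM mean moves toward the speaker's data in proportion to how much data that component explains. Features are one-dimensional here, so the supervector is simply the list of adapted means; with d-dimensional features it would be the concatenation of the adapted mean vectors. The name `map_adapt_means` and the value 16 for the relevance factor are assumptions.

```python
import math


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Shift UBM means toward the speaker's frames; returns the adapted means."""
    k = len(means)
    counts = [0.0] * k  # soft count of frames per component
    sums = [0.0] * k    # responsibility-weighted sum of frames per component
    for x in frames:
        p = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
             for w, m, v in zip(weights, means, variances)]
        total = sum(p)
        if total == 0.0:
            continue  # frame is too far from every component to assign
        for j in range(k):
            r = p[j] / total
            counts[j] += r
            sums[j] += r * x

    adapted = []
    for j in range(k):
        # alpha near 1: plenty of speaker data, trust it; near 0: keep the UBM mean.
        alpha = counts[j] / (counts[j] + relevance)
        data_mean = sums[j] / counts[j] if counts[j] > 0.0 else means[j]
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[j])
    return adapted


# Hypothetical two-component UBM and 16 frames from one speaker.
ubm = ([0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
supervector = map_adapt_means([1.0] * 16, *ubm)
print(supervector)
```

The supervector, rather than the speech frames themselves, is then what a classifier such as an SVM would be trained on.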

33 UBM-MAP Overview Training Testing Speech Parameterization speech data known speaker identities speech data SVM Testing UBM-MAP Speech Parameterization UBM-MAP training supervectors testing supervectors Speaker Prediction SVM Training

34 Limitations of Current Speaker Verification and Adaptation Require Training material from the target. Can be slow to train. Best performance with Text-Dependent approaches

35 Summary of Voice Biometrics Speech carries speaker-specific information Physiology Native Language Interference Personality Speaker State Idiosyncrasies Speech is an attractive Biometric option. Inexpensive Technology requirements Minimally intrusive Low-risk surveillance GMM modeling is a powerful way to statistically model a speaker’s voice for recognition and verification. >85-95% classification accuracy

36 Questions? Feel free to