Speaker Recognition G. CHOLLET, G. GRAVIER,


Speaker Recognition G. CHOLLET, G. GRAVIER, J. KHARROUBI, D. PETROVSKA-DELACRETAZ (chollet, kharroub,petrovsk)@tsi.enst.fr ggravier@infres.enst.fr ENST/CNRS-LTCI 46 rue Barrault 75634 PARIS cedex 13 http://www.tsi.enst.fr/~chollet

Our affiliations ENST: Ecole Nationale Supérieure des Télécommunications http://www.enst.fr CNRS: Centre National de la Recherche Scientifique http://www.cnrs.fr LTCI: Laboratoire de Traitement et Communication de l’Information http://www.enst.fr/ura/ura.html

What is ENST? Ecole Nationale Supérieure des Télécommunications, ranked among the ‘Grandes Ecoles d'Ingénieurs’. 250 state-certified engineers graduate each year. Part of the ‘Groupement des Ecoles de Télécommunications’.

Modalities for Identity Verification A device you own (key, smart card, …) A code you remember (password, …) Could be lost or stolen Physiological characteristics: Face, iris, fingerprint, hand shape, … Need special equipment Behavioral characteristics: Speech, signature, keystroke, … Speech is the preferred modality over the telephone (but a ‘voice print’ is much more variable than a fingerprint)

Outline Where is the information about the speaker identity in the speech signal? How well can humans recognize a speaker? Applications of Speaker Recognition Prior knowledge on what the speaker said Combining Speech Recognition and Speaker Verification Some research activities at ENST: Speaker verification: The CAVE-PICASSO projects (text dependent) The ELISA consortium, NIST evaluations (text independent) The EUREKA !2340 MAJORDOME project Multimodal Identity Verification: The M2VTS and BIOMET projects Perspectives

Speaker Identity in Speech Differences in vocal tract shapes and muscular control Fundamental frequency (typical values): 100 Hz (male), 200 Hz (female), 300 Hz (child) Glottal waveform Phonotactics Lexical usage The difference between the voices of twins is a limiting case Voices can also be imitated or disguised

Speaker Identity [Figure: spectral envelope of /i:/ for Speaker A and Speaker B, amplitude A versus frequency f] Segmental factors (~30 ms): glottal excitation (fundamental frequency, amplitude, voice quality, e.g., breathiness); vocal tract (formant frequencies and bandwidths). Suprasegmental factors: speaking speed (timing and rhythm of speech units); intonation patterns; dialect, accent, pronunciation habits.

Inter-speaker Variability We were away a year ago.

Intra-speaker Variability We were away a year ago.

Vocal Apparatus

Speech production

Glottal Waveform Modeling Fitting a glottal pulse model to the excitation waveform allows perceptually relevant modifications to voice quality. [Figure: amplitude A versus time t; original residual in blue, synthetic residual in red]

Applications of Speaker Recognition Identification from an open set (unrealistic) Identification from a closed set (who is speaking in a videoconference?) Verification of claimed identity (risk of deliberate imposture) Human performance in speaker recognition is far from perfect (highly dependent on familiarity with the subject)

Speaker Verification Typology of approaches (EAGLES Handbook) Text dependent Public password Private password Customized password Text prompted Text independent Incremental enrolment Evaluation

What are the sources of difficulty? Intra-speaker variability of the speech signal (due to stress, pathologies, environmental conditions, …) Recording conditions (filtering, noise, …) Temporal drift Intentional imposture Voice disguise

Text-dependent Speaker Verification Uses Automatic Speech Recognition techniques (DTW, HMM, …) Client model adaptation from speaker independent HMM (‘World’ model) Synchronous alignment of client and world models for the computation of a score.

Dynamic Time Warping (DTW)
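
The slide gives only the name of the technique. As a minimal sketch of the classic DTW recursion, assuming Euclidean local distances and the three basic step directions (the slides do not specify either choice), the cumulative alignment cost between a reference utterance and a test utterance could be computed as follows. The function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(ref, test):
    """Cumulative DTW cost between two sequences of feature vectors.

    Assumptions (not from the slides): Euclidean local distance and
    the symmetric step pattern (insertion, deletion, match).
    """
    def local_cost(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_cost(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # ref frame repeated
                                 cost[i][j - 1],      # test frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

In a text-dependent verifier of this kind, a low cumulative cost against the client's enrolment template would count as evidence for the claimed identity; a practical system would typically also normalise the cost by the path length.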

HMM structure depends on the application

Signal detection theory

Score normalisation World model Cohort normalisation Discriminant techniques
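
The slide only lists the normalisation families. One common way to read the first two, given here as an assumption rather than as the exact formulation used by the authors, is to subtract from the client log-likelihood either the world-model log-likelihood or the average log-likelihood over a cohort of other speakers' models. Both function names are hypothetical.

```python
def world_normalised_score(client_loglik, world_loglik):
    """Log-likelihood ratio against a speaker-independent world model."""
    return client_loglik - world_loglik


def cohort_normalised_score(client_loglik, cohort_logliks):
    """Client log-likelihood minus the mean log-likelihood of a cohort.

    Assumption: the cohort term is an arithmetic mean of log-likelihoods;
    other variants (for example, taking the maximum) also exist.
    """
    return client_loglik - sum(cohort_logliks) / len(cohort_logliks)
```

Either normalised score can then be compared to a speaker-independent threshold, which is the practical motivation for normalising at all.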

Detection Error Tradeoff (DET) Curve
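
A DET curve plots the false rejection rate against the false acceptance rate as the decision threshold varies. The sketch below, with hypothetical function names, computes those operating points from lists of target and impostor scores and picks an approximate equal error rate (EER). It assumes that higher scores mean "more likely the client" and it omits the normal-deviate axis scaling used when the curve is actually drawn.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at a threshold."""
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return frr, far


def det_points(target_scores, impostor_scores):
    """Operating points (threshold, FRR, FAR) at every observed score."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    return [(t,) + error_rates(target_scores, impostor_scores, t)
            for t in thresholds]


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: the point where FRR and FAR are closest."""
    _, frr, far = min(det_points(target_scores, impostor_scores),
                      key=lambda p: abs(p[1] - p[2]))
    return (frr + far) / 2
```

With few trials the two error rates rarely cross exactly at an observed score, so the value returned is an approximation.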

CAVE – PICASSO http://www.picasso.ptt-telecom.nl/project/

Incremental enrolment of customised password The client chooses his password using some feedback from the system. The system attempts a phonetic transcription of the password. Incremental enrolment is achieved on further repetitions of that password. Speaker-independent phone HMMs are adapted with the client enrolment data. Synchronous alignment likelihood ratio scoring is performed on access trials.

Deliberate imposture The impostor has some recordings of the target client's voice. He can record the same sentences and align these speech signals with the recordings of the client. A transformation (Multiple Linear Regression) is computed from these aligned data. The impostor has heard the target client's password. He records that password and applies the transformation to this recording. The PICASSO reference system, with less than 1 % EER, is defeated by this procedure (more than 30 % EER).

Speaker Verification (text independent) The ELISA consortium ENST, LIA, IRISA, ... http://www.lia.univ-avignon.fr/equipes/RAL/elisa/index_en.html NIST evaluations http://www.nist.gov/speech/tests/spk/index.htm Ergodic HMM Gaussian Mixture Model

Gaussian Mixture Model Parametric representation of the probability distribution of observations:
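
The equation itself is not present in this transcript. The standard form of a GMM density, given here as an assumption about what the slide showed, is a weighted sum of M Gaussian components over D-dimensional observations x, with the symbols w_i, \mu_i, \Sigma_i and \lambda chosen for illustration:

```latex
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x; \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0

\mathcal{N}(x; \mu_i, \Sigma_i) =
\frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i) \right)
```

Here \lambda = \{ w_i, \mu_i, \Sigma_i \} denotes the full parameter set of one speaker (or world) model.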

Gaussian Mixture Models 8 Gaussians per mixture

National Institute of Standards & Technology (NIST) Speaker Verification Evaluations Annual evaluation since 1995 Common paradigm for comparing technologies

GMM speaker modeling [Diagram: world data → front-end → GMM modeling → world GMM model; target speaker data → front-end → GMM model adaptation (starting from the world GMM model) → target GMM model]

Baseline GMM method [Diagram: test speech → front-end → scoring against the hypothesized target GMM model and the world GMM model → LLR score]
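
The log-likelihood ratio (LLR) score of this baseline can be sketched as the average, over test frames, of the difference between the log-likelihood under the hypothesized target GMM and under the world GMM. The code below is an illustration under stated assumptions: diagonal covariances, models stored as hypothetical `(weights, means, variances)` tuples, and frame-level averaging. The slides do not confirm any of these details.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x | model) for a GMM, using log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, target, world):
    """Average per-frame log-likelihood ratio, target versus world.

    `target` and `world` are (weights, means, variances) tuples.
    """
    total = sum(gmm_log_likelihood(x, *target) - gmm_log_likelihood(x, *world)
                for x in frames)
    return total / len(frames)
```

A positive score favours the target hypothesis, and the accept or reject decision follows from comparing the score to a threshold.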

Support Vector Machines and Speaker Verification A hybrid GMM-SVM system is proposed. An SVM scoring model is trained on development data to separate true-target accesses from impostor accesses, using a new feature representation based on GMMs. Modeling: GMM. Scoring: SVM.

SVM principles [Figure: mapping of X from input space to feature space; separating hyperplanes H, with the optimal hyperplane Ho, determine Class(X)]

Results

Combining Speech Recognition and Speaker Verification. Speaker independent phone HMMs Selection of segments or segment classes which are speaker specific Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)

Selection of nasals in words in -ing: being, everything, getting, anything, thing, something, things, going

«MAJORDOME» Unified Messaging System (Eureka Project no. 2340) Vecsys, EDF, Software602, KTH, Mensatec, UPC, Airtel D. Bahu-Leyser, G. Chollet, K. Hallouli, J. Kharroubi, L. Likforman, D. Mostefa, D. Petrovska, M. Sigelle, P. Vaillant

Majordome’s Functionalities Speaker verification Dialogue Routing Updating the agenda Automatic summary Voice Fax E-mail

Voice technology in Majordome Server side background tasks: continuous speech recognition applied to voice messages upon reception Detection of sender’s name and subject User interaction: Speaker identification and verification Speech recognition (receiving user commands through voice interaction) Text-to-speech synthesis (reading text summaries, E-mails or faxes)

BIOMET An extension of the M2VTS and DAVID projects to include such modalities as signature, fingerprint, hand shape. Initial support (two years) is provided by GET (Groupement des Ecoles de Télécommunications). Emphasis will be on fusion of scores obtained from two or more modalities.
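
The slide does not say how scores are fused. One of the simplest options, shown here purely as an illustration, is a weighted average of per-modality scores followed by a threshold. The function names, the weights and the threshold are all hypothetical, and the sketch assumes the per-modality scores have already been brought to a comparable scale.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-modality verification scores."""
    if len(scores) != len(weights):
        raise ValueError("one weight is needed per modality score")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def accept(scores, weights, threshold):
    """Accept the identity claim if the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold
```

For example, `accept([1.0, 3.0], [1.0, 1.0], 1.5)` fuses a speech score and a signature score with equal weights and accepts the claim because the fused score is 2.0.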

Conclusions and Perspectives Evaluation trials (as conducted by NIST) help improve technology. A strategy combining speech recognition and segmental scoring seems to be a promising approach for speaker verification. Whenever possible, text-independent speaker verification should be confirmed by text-dependent verification. Whenever possible, fusion of multiple experts (preferably multimodal) should be performed.