Speech course of 9 March 2005. Instructors: Dr. Dijana Petrovska-Delacrétaz and Gérard Chollet. Speaker Recognition

1 Speech course of 9 March 2005. Instructors: Dr. Dijana Petrovska-Delacrétaz and Gérard Chollet
Speaker Recognition
1. Introduction, history, application domains
2. Cues to identity in speech
3. Speaker verification
   1. Decision theory
   2. Text-dependent / text-independent
4. Voice imposture
5. Audio-visual identity verification
6. Evaluations
7. Conclusions

2 Why should a computer recognize who is speaking?
- Protection of individual property (home, bank account, personal data, messages, mobile phone, PDA, ...)
- Restricted access (secured areas, databases)
- Personalization (respond only to its master's voice)
- Locating a particular person in an audio-visual document (information retrieval)
- Who is speaking in a meeting?
- Is the suspect the criminal? (forensic applications)

3 Tasks in Automatic Speaker Recognition
- Speaker verification (voice biometrics)
  - Are you really who you claim to be?
- Identification (speaker ID)
  - Is this speech segment coming from a known speaker?
  - How large is the set of speakers (population of the world)?
- Speaker detection, segmentation, indexing, retrieval, tracking
  - Looking for recordings of a particular speaker
- Combining speech and speaker recognition
  - Adaptation to a new speaker, speaker typology
  - Personalization in dialogue systems

4 Applications
- Access control: physical facilities, computer networks, websites
- Transaction authentication: telephone banking, e-commerce
- Speech data management: voice messaging, search engines
- Law enforcement: forensics, home incarceration

5 Voice Biometrics
Advantages
- Often the only modality available over the telephone
- Low cost (microphone, A/D converter), ubiquity
- Possible integration on a smart (SIM) card
- Natural bimodal fusion: the speaking face
Disadvantages
- Lack of discretion
- Possibility of imitation and electronic imposture
- Lack of robustness to noise, distortion, ...
- Temporal drift

6 Speaker Identity in Speech
Differences in
- Vocal tract shapes and muscular control
- Fundamental frequency (typical values): 100 Hz (male), 200 Hz (female), 300 Hz (child)
- Glottal waveform
- Phonotactics
- Lexical usage
The difference between the voices of twins is a limiting case.
Voices can also be imitated or disguised.

7 Speaker Identity
[Figure: spectral envelope of /i:/ for Speaker A and Speaker B; axis labels f and A]
Segmental factors (~30 ms)
- Glottal excitation: fundamental frequency, amplitude, voice quality (e.g., breathiness)
- Vocal tract: characterized by its transfer function and represented by MFCCs (Mel-Frequency Cepstral Coefficients)
Suprasegmental factors
- Speaking rate (timing and rhythm of speech units)
- Intonation patterns
- Dialect, accent, pronunciation habits

8 What are the sources of difficulty?
- Intra-speaker variability of the speech signal (due to stress, pathologies, environmental conditions, ...)
- Recording conditions (filtering, noise, ...)
- Channel mismatch between enrolment and testing
- Temporal drift
- Intentional imposture
- Voice disguise

9 Acoustic features Short term spectral analysis

10 Intra- and Inter-speaker variability

11 Speaker Verification
- Typology of approaches (EAGLES Handbook)
  - Text dependent
    - Public password
    - Private password
    - Customized password
  - Text prompted
  - Text independent
- Incremental enrolment
- Evaluation

12 History of Speaker Recognition

13 Current approaches

14 Dynamic Time Warping (DTW)
[Figure: best-path alignment of “Bonjour” spoken by test speaker Y against the “Bonjour” templates of speaker X and of speakers 1, 2, ..., n]
Doddington 1974, Rosenberg 1976, Furui 1981, etc.
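The best-path idea behind DTW can be sketched in a few lines. This is a minimal illustration, not the systems cited on the slide: the function name `dtw_distance`, the Euclidean local distance, the three allowed path steps and the toy one-dimensional "features" are all assumptions made for the example.

```python
import math

def dtw_distance(ref, test):
    """Cumulative cost of the best alignment path between two
    sequences of feature vectors (assumed local distance: Euclidean)."""
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = cost of the best path aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])
            # Allowed steps (an assumption): vertical, horizontal, diagonal
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy data (hypothetical): a template, a time-stretched copy, and a different pattern
template_x = [(0.0,), (1.0,), (2.0,), (1.0,)]
test_same = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
test_other = [(3.0,), (3.0,), (4.0,), (3.0,)]

score_same = dtw_distance(template_x, test_same)
score_other = dtw_distance(template_x, test_other)
```

In a verification setting the cumulative cost would be compared with a threshold; a time-stretched repetition of the template aligns at zero cost here, while a different pattern does not.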

15 Vector Quantization (VQ)
[Figure: “Bonjour” from test speaker Y is quantized (best quantization) with the codebooks of speaker X and of speakers 1, 2, ..., n]
Soong, Rosenberg 1987
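A hedged sketch of codebook-based scoring: each speaker is represented by a small set of codewords, and a test utterance is scored by its average distance to the nearest codeword. The names `vq_distortion` and `identify`, the Euclidean distance and the toy two-dimensional codebooks are assumptions for illustration only; codebook training (e.g. by clustering) is not shown.

```python
import math

def vq_distortion(frames, codebook):
    """Average distance from each frame to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks):
    """Return the speaker whose codebook quantizes the frames best."""
    return min(codebooks, key=lambda spk: vq_distortion(frames, codebooks[spk]))

# Toy codebooks (hypothetical values)
codebooks = {
    "speaker_1": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_2": [(5.0, 5.0), (6.0, 6.0)],
}
test_frames = [(0.1, 0.0), (0.9, 1.1)]

best = identify(test_frames, codebooks)
```

For verification rather than identification, the distortion against the claimed speaker's codebook would be compared with a threshold instead of taking the minimum over all speakers.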

16 Hidden Markov Models (HMM)
[Figure: best path of “Bonjour” from test speaker Y through the “Bonjour” HMMs of speaker X and of speakers 1, 2, ..., n]
Rosenberg 1990, Tseng 1992

17 Ergodic HMM
[Figure: best path of “Bonjour” from test speaker Y through the ergodic HMMs of speaker X and of speakers 1, 2, ..., n]
Poritz 1982, Savic 1990

18 Gaussian Mixture Models (GMM) REYNOLDS 1995

19 HMM structure depends on the application

20 Some issues in text-dependent speaker verification systems: the CAVE and PICASSO projects
- Sequences of digits
  - Speaker-independent HMM for each digit
  - Adaptation of these HMMs to the client's voice (during enrolment and incremental enrolment)
  - An EER of less than 1% can be achieved
- Customized password
  - The client chooses a password using some feedback from the system
- Deliberate imposture

21 Gaussian Mixture Model Parametric representation of the probability distribution of observations:
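In the usual notation, assumed here rather than taken from the slide (M components, mixture weights w_i, mean vectors \mu_i, covariance matrices \Sigma_i, feature dimension D), a Gaussian mixture density for an observation x reads:

```latex
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x; \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0,

\mathcal{N}(x; \mu_i, \Sigma_i) =
\frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i) \right)
```

The model parameters are \lambda = \{ w_i, \mu_i, \Sigma_i \}_{i=1}^{M}; diagonal covariance matrices are a common simplification.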

22 Gaussian Mixture Models 8 Gaussians per mixture

23 GMM speaker modeling
- World model: front-end -> GMM modeling -> world GMM model
- Target model: front-end -> GMM model adaptation -> target GMM model
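The slide does not say which adaptation scheme is used. One common choice, shown here purely as an assumption, is MAP adaptation of the world-model means: each mean moves toward the client data in proportion to how many frames that component accounts for. The function names, the one-dimensional features and the relevance factor of 16 are illustrative.

```python
import math

def gauss(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def map_adapt_means(world, data, relevance=16.0):
    """world: list of (weight, mean, var). Returns the adapted means;
    weights and variances are left unchanged (means-only adaptation)."""
    counts = [0.0] * len(world)   # soft frame counts per component
    sums = [0.0] * len(world)     # posterior-weighted sums of the data
    for x in data:
        lik = [w * gauss(x, m, v) for w, m, v in world]
        total = sum(lik)
        for i, l in enumerate(lik):
            post = l / total
            counts[i] += post
            sums[i] += post * x
    adapted = []
    for i, (_, mean, _) in enumerate(world):
        alpha = counts[i] / (counts[i] + relevance)
        data_mean = sums[i] / counts[i] if counts[i] > 0 else mean
        adapted.append(alpha * data_mean + (1 - alpha) * mean)
    return adapted

# Toy world model and client data (hypothetical values)
world_gmm = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
client_data = [0.8, 1.0, 1.2] * 20
target_means = map_adapt_means(world_gmm, client_data)
```

The component that explains the client data shifts most of the way toward it, while the component that sees almost no data stays close to its world-model value.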

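24 Baseline GMM method
[Diagram: test speech -> front-end -> scoring against the hypothesized target GMM model and the world GMM model -> LLR score]
LLR score = log p(X | target model) - log p(X | world model)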
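A minimal sketch of a log-likelihood ratio (LLR) score between a target model and a world model. The diagonal-covariance GMMs, the averaging over frames and all names and values below are assumptions for illustration; the slide itself only names the two models and the score.

```python
import math

def log_gmm(x, gmm):
    """Log-density of vector x under a diagonal-covariance GMM,
    given as a list of (weight, means, variances)."""
    logs = []
    for w, means, variances in gmm:
        ll = math.log(w)
        for xi, m, v in zip(x, means, variances):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def llr_score(frames, target, world):
    """Frame-averaged log-likelihood ratio (averaging is an assumption)."""
    return sum(log_gmm(x, target) - log_gmm(x, world) for x in frames) / len(frames)

# Toy models (hypothetical): the target differs from the world in one component
world = [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (4.0, 4.0), (1.0, 1.0))]
target = [(0.5, (1.0, 1.0), (1.0, 1.0)), (0.5, (4.0, 4.0), (1.0, 1.0))]

client_llr = llr_score([(1.0, 1.1), (0.9, 1.0)], target, world)
impostor_llr = llr_score([(0.0, -0.1), (0.1, 0.0)], target, world)
```

A positive score favours the claimed identity and a negative score favours the world model; the decision compares the score with a threshold.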

25 Decision theory for identity verification
Two types of errors:
- False rejection (a client is rejected)
- False acceptance (an impostor is accepted)
Decision theory: given an observation O and a claimed identity
- H0 hypothesis: it comes from an impostor
- H1 hypothesis: it comes from our client
H1 is chosen if and only if P(H1|O) > P(H0|O), which can be rewritten (using Bayes' law) as
$$\frac{P(O \mid H_1)}{P(O \mid H_0)} > \frac{P(H_0)}{P(H_1)}$$

26 Signal detection theory

27 Decision

28 Distribution of scores

29 Detection Error Tradeoff (DET) Curve
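A DET curve plots the false rejection rate against the false acceptance rate as the decision threshold varies, usually on normal-deviate axes. The sketch below only computes the underlying operating points and a rough equal error rate (EER); the function names, the "accept if score >= threshold" convention and the toy scores are assumptions.

```python
def error_rates(client_scores, impostor_scores, threshold):
    """False acceptance and false rejection rates at one threshold
    (assumed convention: accept when score >= threshold)."""
    fr = sum(s < threshold for s in client_scores) / len(client_scores)
    fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fa, fr

def det_points(client_scores, impostor_scores):
    """(threshold, FA, FR) for every observed score used as a threshold."""
    thresholds = sorted(set(client_scores) | set(impostor_scores))
    return [(t, *error_rates(client_scores, impostor_scores, t)) for t in thresholds]

def approx_eer(client_scores, impostor_scores):
    """Operating point where FA and FR are closest; returns (EER, threshold)."""
    t, fa, fr = min(det_points(client_scores, impostor_scores),
                    key=lambda p: abs(p[1] - p[2]))
    return (fa + fr) / 2, t

# Toy scores (hypothetical)
clients = [2.0, 1.5, 0.5, 3.0]
impostors = [-1.0, 0.0, 1.0, -2.0]
eer, eer_threshold = approx_eer(clients, impostors)
```

With so few scores the EER is coarse; real evaluations use thousands of trials so that the curve is smooth.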

30 Evaluation
- Decision cost (FA, FR, priors, costs, ...)
- Receiver Operating Characteristic curve
- Reference systems (open software)
- Evaluations (algorithms, field trials, ergonomics, ...)
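A decision cost combines the two error rates with the prior probability of a genuine client and the cost of each error. The sketch below assumes the usual weighted-sum form; the default values (target prior 0.01, false rejection cost 10, false acceptance cost 1) are illustrative settings of the kind used in NIST-style evaluations, not figures taken from the slide.

```python
import math

def detection_cost(fa_rate, fr_rate, p_target=0.01, cost_fr=10.0, cost_fa=1.0):
    """Expected cost of a system operating at (FA, FR)."""
    return cost_fr * p_target * fr_rate + cost_fa * (1.0 - p_target) * fa_rate

def bayes_llr_threshold(p_target=0.01, cost_fr=10.0, cost_fa=1.0):
    """Log-likelihood-ratio threshold that minimizes the expected cost:
    accept when LLR > log(cost_fa * P(impostor) / (cost_fr * P(target)))."""
    return math.log((cost_fa * (1.0 - p_target)) / (cost_fr * p_target))

cost = detection_cost(fa_rate=0.05, fr_rate=0.10)
threshold = bayes_llr_threshold()
```

With a rare target and a high false rejection cost, the threshold on the log-likelihood ratio sits above zero, so the system needs positive evidence before accepting a claim.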

31 NIST Speaker Verification Evaluations
- A reference standard to compare algorithms and stimulate new developments
- Distribution (via LDC) of development and test databases with:
  - Increasing difficulty (from landline to mobile)
  - Several hundred speakers (2 min of training data per client)
  - Several thousand test accesses (5 to 50 s per access)
- Participation of labs every year (MIT, IBM, Nuance, Queensland Univ., ELISA consortium, ...)
- Annual workshop, special issues in journals, ...

32 National Institute of Standards & Technology (NIST) Speaker Verification Evaluations
- Annual evaluation since 1995
- Common paradigm for comparing technologies

33 Speaker Verification (text independent)
- The ELISA consortium
  - ENST, LIA, IRISA, ...
  - BECARS: Balamand-ENST CEDRE Automatic Recognition of Speakers
- NIST evaluations

34 NIST evaluations : Results

35 Evaluations: NIST 2004

36 Combining Speech Recognition and Speaker Verification
- Speaker-independent phone HMMs
- Selection of segments or segment classes which are speaker specific
- Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)

37 ALISP : Automatic Language Independent Speech Processing Data-driven speech segmentation

38 Searching in client and world speech dictionaries for speaker verification purposes

39 Fusion

40 Fusion results

41 Voice Transformations and Forgery (occasional, dedicated)
- Isolated individuals with few resources, or “professional impostors” with a dedicated budget, can threaten the security of speaker recognition systems
- Voice transformation technologies (e.g. segmental synthesis using an inventory of client speech data) are now available
- Speaker recognition research should explicitly address this forgery issue and define appropriate countermeasures
  - Prevention by predicting many different forgery scenarios

42 Voice Forgery using ALISP
[Diagram; labels: impostor (the same words or not), client (the same words or not), transformation]
Transformation: a modification of a source speaker's speech to imitate a target speaker

43 Conversion system: ALISP encoder
[Block diagram; labels: Speech, MFCC analysis, MFCC + delta, HMM recognition, HMM models, HNM, Harmonic envelope, Noise envelope, Database of HNM representatives, Choice of the best representative unit, Prosody (energy + pitch); Symbol index, Representative index, DTW path]

44 Conversion system: ALISP decoder
[Block diagram; labels: Symbol index, Representative index, DTW path, Pitch, energy, timing, Concatenation of HNM parameters for each representative, HNM synthesis, Speech signal]

45 Preliminary results: DET curves
- FA before forgery: 16 ± 2.0% (1700 files)
- FA after forgery: 26 ± 2.0% (1700 files)
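The slide does not say how the ± figures were obtained. One plausible reading, offered only as an assumption, is a 95% confidence interval for a proportion under the normal approximation with independent trials. The sketch below computes that half-width for the two quoted false acceptance rates and 1700 files.

```python
import math

def binomial_half_width(rate, n_trials, z=1.96):
    """Half-width of a normal-approximation confidence interval
    for an error rate measured on n_trials independent trials."""
    return z * math.sqrt(rate * (1.0 - rate) / n_trials)

before = binomial_half_width(0.16, 1700)  # about 0.017, i.e. 1.7 points
after = binomial_half_width(0.26, 1700)   # about 0.021, i.e. 2.1 points
```

Both half-widths are of the same order as the quoted ± 2.0%, so under this reading the rise from 16% to 26% is well outside the sampling uncertainty.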

46 Preliminary results True distributions

47 Multimodal Identity Verification
- M2VTS (face and speech)
  - Front view and profile
  - Pseudo-3D with coherent light
- BIOMET (face, speech, fingerprint, signature, hand shape)
  - Data collection
  - Reuse of the M2VTS and DAVID databases
  - Experiments on the fusion of modalities

48 Speaking Faces: Motivations
- In many situations a video sequence is acquired
- Fusion of face and speech increases robustness
- Forgery is more difficult

49 Talking Face Recognition (hybrid verification)

50 Lip features Tracking lip movements

51 A talking face model
- Using Hidden Markov Models (HMMs)
- Acoustic parameters
- Visual parameters

52 Imposture Model

53 Cloning

54 Conclusions, Perspectives
- Deliberate imposture is a challenge for speech-only systems
- Verification of identity based on features extracted from talking faces should be developed
- Common databases and evaluation protocols are necessary
- Free access to reference systems will facilitate future developments

55 BioSecure Residential Workshop
- Aug. 1st - 26th, 2005 at ENST, Paris
- Reference systems for speech, face, talking face, fingerprint, iris, hand, signature, ...
- Comparative evaluations on large databases (BIOMET, BANCA, FVC, ...)
- Fusion of modalities