Published by Cecily Harrell. Modified over 9 years ago.
1
June 28th, 2004. BioSecure, SecurePhone.
Automatic Speaker Verification: Technologies, Evaluations and Possible Future
Gérard CHOLLET, CNRS-LTCI, GET-ENST, chollet@tsi.enst.fr
Biometrics in Current Security Environments
2
Outline
State of affairs (tasks, security, forensic,…)
Speaker characteristics in the speech signal
Automatic Speaker Verification:
  Decision theory
  Text dependent / Text independent
Imposture (occasional, dedicated)
Voice transformations
Audio-visual speaker verification
Evaluations (algorithms, field tests, ergonomics,…)
Conclusions, Perspectives
3
Why should a computer recognize who is speaking?
Protection of individual property (home, bank account, personal data, messages, mobile phone, PDA,...)
Limited access (secured areas, databases)
Personalization (only respond to its master’s voice)
Locating a particular person in an audio-visual document (information retrieval)
Who is speaking in a meeting?
Is a suspect the criminal? (forensic applications)
4
Tasks in Automatic Speaker Recognition
Speaker verification (Voice Biometric): Are you really who you claim to be?
Identification (Speaker ID): Is this speech segment coming from a known speaker? How large is the set of speakers (population of the world)?
Speaker detection, segmentation, indexing, retrieval, tracking: looking for recordings of a particular speaker
Combining Speech and Speaker Recognition:
  Adaptation to a new speaker, speaker typology
  Personalization in dialogue systems
5
Applications
Access Control: physical facilities, computer networks, websites
Transaction Authentication: telephone banking, e-commerce
Speech Data Management: voice messaging, search engines
Law Enforcement: forensics, home incarceration
6
Voice Biometric
Advantages:
  Often the only modality over the telephone
  Low cost (microphone, A/D), ubiquity
  Possible integration on a smart (SIM) card
  Natural bimodal fusion: speaking face
Disadvantages:
  Lack of discretion
  Possibility of imitation and electronic imposture
  Lack of robustness to noise, distortion,…
  Temporal drift
7
Speaker Identity in Speech
Differences in vocal tract shapes and muscular control
Fundamental frequency (typical values): 100 Hz (male), 200 Hz (female), 300 Hz (child)
Glottal waveform
Phonotactics
Lexical usage
The difference between the voices of twins is a limiting case
Voices can also be imitated or disguised
8
Speaker Identity
[Figure: spectral envelope of /i:/ for Speaker A and Speaker B; axes labelled f and A]
Segmental factors (~30 ms):
  Glottal excitation: fundamental frequency, amplitude, voice quality (e.g., breathiness)
  Vocal tract: characterized by its transfer function and represented by MFCCs (Mel-Frequency Cepstral Coefficients)
Suprasegmental factors:
  Speaking speed (timing and rhythm of speech units)
  Intonation patterns
  Dialect, accent, pronunciation habits
9
Acoustic features
Short-term spectral analysis
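A minimal sketch of short-term spectral analysis, assuming 30 ms Hamming-windowed frames (the segmental time scale quoted in the slides) and a 10 ms hop. The hop size, the window choice and the function name `short_term_power_spectrum` are assumptions made for illustration, not details taken from the presentation.

```python
import numpy as np

def short_term_power_spectrum(signal, sample_rate, frame_ms=30.0, hop_ms=10.0):
    """Return the power spectrum of each overlapping, Hamming-windowed frame.

    The result has one row per frame and one column per FFT bin.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)
    # Cut the signal into frames and apply the window to each one.
    frames = np.stack([
        signal[i * hop_len: i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    # Squared magnitude of the real FFT of every frame.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# Usage: one second of a 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 200.0 * t)
spectrum = short_term_power_spectrum(tone, sample_rate)
```

Cepstral features such as MFCCs are usually derived from spectra of this kind by applying a mel filter bank, a logarithm and a cosine transform; those steps are left out here.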
10
Intra- and Inter-speaker variability
11
Speaker Verification
Typology of approaches (EAGLES Handbook):
  Text dependent: public password, private password, customized password
  Text prompted
  Text independent
Incremental enrolment
Evaluation
12
History of Speaker Recognition
13
Current approaches
14
HMM structure depends on the application
15
Gaussian Mixture Model
Parametric representation of the probability distribution of observations
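The formula on this slide did not survive extraction. As a reminder, and as an assumption about what the slide showed, the standard density of a Gaussian mixture with M components can be written as follows, where the symbols (model parameters lambda, weights w_i, means mu_i, covariances Sigma_i) are conventional notation rather than the author's:

```latex
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0
```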
16
Gaussian Mixture Models
8 Gaussians per mixture
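A minimal sketch of how a sequence of feature vectors can be scored against a Gaussian mixture, assuming diagonal covariance matrices (the slides do not say which covariance structure was used). The function name `gmm_log_likelihood` and the toy 8-component model are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (M,);
    means and variances: (M, D).
    """
    frames = np.atleast_2d(np.asarray(frames, dtype=float))
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_weighted = np.log(weights) + log_comp
    # Log-sum-exp over the components, for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

# Usage: 8 identical unit-variance Gaussians in 2 dimensions (toy values).
weights = np.full(8, 1.0 / 8.0)
means = np.zeros((8, 2))
variances = np.ones((8, 2))
score = gmm_log_likelihood(np.zeros((1, 2)), weights, means, variances)
```

In a verification setting, a score is commonly formed as the difference between the log-likelihood under the client model and under a world (background) model; that step is not shown here.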
17
Decision theory for identity verification
Two types of errors:
  False rejection (a client is rejected)
  False acceptance (an impostor is accepted)
Decision theory: given an observation O and a claimed identity
  H0 hypothesis: it comes from an impostor
  H1 hypothesis: it comes from our client
H1 is chosen if and only if P(H1|O) > P(H0|O), which can be rewritten (using Bayes' law) as a test on the likelihood ratio P(O|H1) / P(O|H0).
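The rewritten form of the decision rule is missing from the extracted slide. A worked derivation, using the slide's own symbols and assuming P(O), P(H_1) and P(O | H_0) are all strictly positive, would read:

```latex
P(H_1 \mid O) > P(H_0 \mid O)
\;\Longleftrightarrow\;
\frac{P(O \mid H_1)\, P(H_1)}{P(O)} > \frac{P(O \mid H_0)\, P(H_0)}{P(O)}
\;\Longleftrightarrow\;
\frac{P(O \mid H_1)}{P(O \mid H_0)} > \frac{P(H_0)}{P(H_1)}
```

The left-hand side is the likelihood ratio and the right-hand side acts as a decision threshold set by the prior probabilities.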
18
Signal detection theory
19
Decision
20
Distribution of scores
21
Detection Error Tradeoff (DET) Curve
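A minimal sketch of how the points of a DET curve can be computed from client and impostor scores, assuming that a claim is accepted when its score is greater than or equal to the threshold. The function names and the toy scores are hypothetical. A DET plot shows the false rejection rate against the false acceptance rate on normal-deviate axes; only the error rates are computed here.

```python
import numpy as np

def error_rates(client_scores, impostor_scores):
    """False acceptance and false rejection rates at every observed score."""
    client = np.asarray(client_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([client, impostor]))
    # A claim is accepted when its score is >= the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(client < t).mean() for t in thresholds])
    return thresholds, far, frr

def equal_error_rate(far, frr):
    """Error rate at the threshold where FAR and FRR are closest."""
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0)

# Usage with toy scores (not data from the presentation).
thresholds, far, frr = error_rates([2, 3, 4, 5], [0, 1, 2, 3])
eer = equal_error_rate(far, frr)
```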
22
Evaluation
Decision cost (FA, FR, priors, costs,…)
Receiver Operating Characteristic Curve
Reference systems (open software)
Evaluations (algorithms, field trials, ergonomics,…)
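A minimal sketch of a decision cost that combines the two error rates with priors and costs, as listed above. The default values (cost of a false rejection 10, cost of a false acceptance 1, prior probability of a true client 0.01) are assumptions chosen to resemble settings often quoted for speaker verification evaluations; the slides do not state them, and the function name is hypothetical.

```python
def detection_cost(p_fr, p_fa, p_target=0.01, c_fr=10.0, c_fa=1.0):
    """Expected cost of a decision threshold.

    p_fr: false rejection rate, p_fa: false acceptance rate,
    p_target: prior probability that the claimant is the true client,
    c_fr / c_fa: cost assigned to each kind of error.
    """
    return c_fr * p_fr * p_target + c_fa * p_fa * (1.0 - p_target)

# Usage: 10% false rejections and 5% false acceptances.
cost = detection_cost(0.10, 0.05)
```

With a low client prior, the false acceptance term dominates the cost even though each false rejection is weighted ten times more heavily.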
23
National Institute of Standards & Technology (NIST) Speaker Verification Evaluations
Annual evaluation since 1995
Common paradigm for comparing technologies
24
NIST evaluations: Results
25
Combining Speech Recognition and Speaker Verification
Speaker-independent phone HMMs
Selection of segments or segment classes which are speaker specific
Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)
26
ALISP data-driven speech segmentation
27
Searching in client and world speech dictionaries for speaker verification purposes
28
Fusion
29
Fusion results
30
Speaking Faces: Motivations
A person speaking in front of a camera offers two modalities for identity verification (speech and face).
The sequence of face images and the synchronisation of speech and lip movements could be exploited.
Imposture is much more difficult than with single modalities.
Many PCs, PDAs and mobile phones are equipped with a camera.
Audio-visual identity verification will offer non-intrusive security for e-commerce, e-banking,…
31
Talking Face Recognition (hybrid verification)
32
Lip features
Tracking lip movements
33
A talking face model
Using Hidden Markov Models (HMMs)
Acoustic parameters
Visual parameters
34
Morphing, avatars
35
Conclusions, Perspectives
Deliberate imposture is a challenge for speech-only systems
Verification of identity based on features extracted from talking faces should be developed
Common databases and evaluation protocols are necessary
Free access to reference systems will facilitate future developments