1
Signature with Text-Dependent and Text-Independent Speech for Robust Identity Verification
B. Ly-Van*, R. Blouet**, S. Renouard**, S. Garcia-Salicetti*, B. Dorizzi*, G. Chollet**
* GET/INT, dept EPH, 9 rue Charles Fourier, 91011 EVRY, France
** GET/ENST, Lab. CNRS-LTCI, 46 rue Barrault, 75634 Paris
Emails: {Bao.Ly_van, Sonia.Salicetti, Bernadette.dorizzi}@int-evry.fr; {Blouet, Renouard, Chollet}@tsi.enst.fr
2
Outline
Introduction: Why Speech and Signature?
BIOMET database: brief description
–Signature data
–Speech data
Writer verification systems
Speaker verification systems
Fusion systems
Results and Conclusions
3
Introduction
Multimodality to improve biometric authentication
Two well-accepted, non-intrusive modalities: speech and signature
Easy to implement on mobile devices such as PDAs or mobile phones
Verification systems for both modalities were already available in our respective teams
4
The BIOMET Database
5 modalities: hand shape, fingerprints, on-line signatures, talking faces (face and voice: video with digits and sentences)
131 persons: 50% male, 50% female
Data from 68 persons used for fusion (the remaining persons were used to build a world model for speech verification)
Time variability: two sessions 5 months apart
–S. Garcia-Salicetti, C. Beumier, G. Chollet, B. Dorizzi, J. Leroux-Les Jardins, J. Lunter, Y. Ni, D. Petrovska-Delacretaz, "BIOMET: a Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities", 4th International Conference on Audio and Video-Based Biometric Person Authentication, 2003.
5
Signature capture
Captured on a digitizer at 200 Hz
–WACOM Intuos2 A6
5 parameters:
–Coordinates (x, y)
–Axial pressure
–Azimuth and altitude
15 genuine trials per person
12 impostor trials per person
[Figure: pen orientation angles, altitude (0°-90°) and azimuth (0°-359°)]
6
Signature modeling
Preprocessing (filtering)
Feature extraction: 12 parameters
Signature model: continuous HMM
–2 states, 3 Gaussians per state
–Bagging: 10 models are combined into an «aggregated» model (average score)
–Training: 10 signatures from one session
Normalized score: $|S_i(O) - S_i^*|$
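The bagging and score-normalization steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: each continuous HMM is replaced by a hypothetical 1-D Gaussian stand-in (`fit_gaussian`), a signature is a list of scalar features rather than 12 parameters, and $S_i^*$ is assumed to be the mean aggregated score of the client's own training signatures, since the slide does not define it.

```python
import math
import random
from statistics import mean, pstdev

def fit_gaussian(frames):
    """Stand-in for HMM training (hypothetical): a 1-D Gaussian fitted to pooled frames."""
    return mean(frames), max(pstdev(frames), 1e-3)  # floor avoids a zero variance

def avg_log_likelihood(model, signature):
    """Per-frame average log-likelihood of one signature (a list of floats)."""
    mu, sigma = model
    return mean(-0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
                for x in signature)

def train_bagged(train_sigs, n_models=10, seed=0):
    """Bagging: each model is fitted on a bootstrap resample of the training signatures."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(train_sigs) for _ in train_sigs]
        models.append(fit_gaussian([x for sig in boot for x in sig]))
    return models

def aggregated_score(models, signature):
    """S_i(O): the average of the bagged models' scores."""
    return mean(avg_log_likelihood(m, signature) for m in models)

def normalized_score(models, signature, s_ref):
    """|S_i(O) - S_i*|: distance to the client's reference score (smaller = more genuine)."""
    return abs(aggregated_score(models, signature) - s_ref)

# Toy usage: 10 training signatures of 50 one-dimensional feature frames each
rng = random.Random(1)
train = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(10)]
models = train_bagged(train)
s_ref = mean(aggregated_score(models, sig) for sig in train)  # assumed definition of S_i*
genuine = [rng.gauss(0.0, 1.0) for _ in range(50)]
forgery = [rng.gauss(3.0, 1.0) for _ in range(50)]
print(normalized_score(models, genuine, s_ref) < normalized_score(models, forgery, s_ref))  # True
```

Averaging over bootstrap-trained models reduces the variance that comes from having only 10 training signatures; the absolute distance then turns the likelihood into a score where small values indicate a genuine signature.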
7
Speech
Two verification systems:
–Data:
Text-dependent: only a 4-digit sequence out of the 10 available digits (5 templates per speaker)
Text-independent: sentences extracted from the original data
client model: trained on digits (15 seconds) and tested on sentences
world model: trained on all the data available from 53 persons, drawn from the 63 (131 − 68) not used for fusion
–Methods:
Text-dependent: DTW (Dynamic Time Warping)
Text-independent: GMM (Gaussian Mixture Model)
8
Text-dependent (DTW)
The DTW algorithm computes the spectral distance between two patterns, a stored template and a test sample, after aligning them optimally in time
[Figure: template speech signal and sample speech signal → DTW → score]
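A minimal sketch of the DTW distance described above, assuming each utterance is a list of spectral feature vectors (for example cepstra) compared with a Euclidean frame distance. The length normalization and the rule of keeping the closest of a speaker's templates (`dtw_score`) are illustrative assumptions, not details given in the slides.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two spectral feature vectors (cepstra assumed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(template, sample):
    """Minimum cumulative frame distance over all monotonic time alignments."""
    n, m = len(template), len(sample)
    inf = float("inf")
    # D[i][j] = best cumulative cost aligning template[:i] with sample[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(template[i - 1], sample[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)  # length normalization: one common choice, assumed here

def dtw_score(templates, sample):
    """Hypothetical decision rule: distance to the closest of the speaker's templates."""
    return min(dtw_distance(t, sample) for t in templates)

template = [[0.0], [1.0], [2.0], [3.0]]
slow = [[0.0], [0.0], [1.0], [2.0], [3.0]]  # same pattern, spoken more slowly
print(dtw_distance(template, slow))  # 0.0
```

The repeated first frame of `slow` is absorbed by the warping path, which is exactly the speaking-rate variability DTW is meant to remove.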
9
Text-independent (GMM)
World model: front-end → GMM modeling → world GMM model
Target model: front-end → GMM model adaptation (starting from the world model) → target GMM model
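The adaptation step can be illustrated with MAP adaptation of the world-model means, the usual way a target GMM is derived from a world model. The slides do not name the adaptation method, so MAP on the means only, the relevance factor of 16, and the diagonal covariances are all assumptions of this sketch; the world GMM is taken as already trained.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def posteriors(x, weights, means, variances):
    """Responsibility of each mixture component for frame x (log-sum-exp for stability)."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """MAP adaptation of the world-model means to a client's frames (weights/variances kept)."""
    K, D = len(means), len(means[0])
    counts = [0.0] * K                      # n_k: soft frame counts per component
    sums = [[0.0] * D for _ in range(K)]    # first-order statistics per component
    for x in frames:
        for k, g in enumerate(posteriors(x, weights, means, variances)):
            counts[k] += g
            for d in range(D):
                sums[k][d] += g * x[d]
    adapted = []
    for k in range(K):
        alpha = counts[k] / (counts[k] + relevance)   # data-dependent adaptation weight
        data_mean = [s / counts[k] if counts[k] > 0 else 0.0 for s in sums[k]]
        adapted.append([alpha * e + (1 - alpha) * m for e, m in zip(data_mean, means[k])])
    return adapted

# Toy 1-D world model with two components; client frames cluster near 2.5
world_w, world_mu, world_var = [0.5, 0.5], [[-2.0], [2.0]], [[1.0], [1.0]]
client_frames = [[2.5]] * 40
target_mu = map_adapt_means(client_frames, world_w, world_mu, world_var)
print(target_mu)  # second mean moves toward 2.5, first stays near -2.0
```

Components that see little client data keep their world-model values, which is why adaptation works with only about 15 seconds of client speech.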
10
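Baseline GMM method
Test speech → front-end → scored against the hypothesized target GMM model and the world GMM model
LLR score = $\log p(X \mid \lambda_{\text{target}}) - \log p(X \mid \lambda_{\text{world}})$, where $X$ is the sequence of test feature vectors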
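A sketch of LLR scoring against a target and a world GMM with diagonal covariances. Dividing by the number of frames is a common convention that the slides do not confirm, and the `(weights, means, variances)` tuple layout is a hypothetical representation chosen for this example.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, gmm):
    """log p(x | GMM) via log-sum-exp; gmm = (weights, means, variances)."""
    weights, means, variances = gmm
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(lg - top) for lg in logs))

def llr_score(frames, target, world):
    """Frame-averaged log p(X | target) - log p(X | world) (averaging assumed)."""
    return sum(gmm_loglik(x, target) - gmm_loglik(x, world) for x in frames) / len(frames)

world = ([0.5, 0.5], [[-2.0], [2.0]], [[1.0], [1.0]])
target = ([0.5, 0.5], [[-2.0], [2.4]], [[1.0], [1.0]])  # e.g. a MAP-adapted world model
print(llr_score([[2.5], [2.4], [2.6]], target, world) > 0)  # True: frames fit the target better
```

Because the world model appears in the denominator, frames that are typical of speech in general contribute little, and the score mainly reflects what is specific to the claimed speaker.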
11
Fusion systems
Additive Tree Classifier (ATC)
–Boosting on binary trees independently trained with the CART algorithm
Support Vector Machine (SVM)
–Linear kernel
Inputs:
–Normalized signature score
–Text-dependent LLR score
–Text-independent LLR score
12
Tree-based Approach for score fusion
Goal: find an optimal partition $R = \{R_k\}_{1 \le k \le K}$ of the score space $S = (s_1, s_2, s_3)$ according to an information-theoretic criterion $C(R)$
Best partition: $R^* = \arg\min_R C(R)$
A sub-optimal solution is obtained with CART
Score estimation based on $P(\text{client} \mid R_k)$ and $P(\text{world} \mid R_k)$ at each leaf of a given tree
Real AdaBoost is used to build 50 trees per client and to obtain a robust estimation of $P(\text{client} \mid R_k)$ and $P(\text{world} \mid R_k)$
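One greedy CART step on the 3-D score space can be sketched as follows, taking the weighted child entropy as the information-theoretic criterion $C(R)$. The entropy choice, the binary labels (1 = client, 0 = world) and the toy data are assumptions; the slides only say that the criterion comes from information theory and that CART gives a sub-optimal partition.

```python
import math

def entropy(labels):
    """Binary entropy (bits) of a list of 0/1 labels (1 = client, 0 = world)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(scores, labels):
    """One greedy CART step: the axis-aligned threshold minimizing weighted child entropy.
    Returns (cost, dimension, threshold); region R_left is s[dim] <= threshold."""
    best = (float("inf"), None, None)
    n = len(labels)
    for dim in range(len(scores[0])):
        for thr in sorted({s[dim] for s in scores}):
            left = [y for s, y in zip(scores, labels) if s[dim] <= thr]
            right = [y for s, y in zip(scores, labels) if s[dim] > thr]
            if not left or not right:
                continue
            cost = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if cost < best[0]:
                best = (cost, dim, thr)
    return best

# Toy (s1, s2, s3) score vectors where only s2 separates clients from impostors
scores = [(0.3, 0.9, 0.3), (0.1, 0.8, 0.1), (0.2, 0.1, 0.4), (0.4, 0.2, 0.2)]
labels = [1, 1, 0, 0]
print(best_split(scores, labels))  # (0.0, 1, 0.2)
```

Applying this step recursively yields the leaves $R_k$; choosing the locally best split at each node is what makes the resulting partition sub-optimal with respect to $C(R)$.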
13
Verification based on ATC
A score vector $S = (s_1, s_2, s_3)$ is presented to the system, which is composed of 50 trees:
each tree outputs a decision score based on the region $R_k$ into which $S$ falls: the LLR score computed from $P(\text{client} \mid R_k)$ and $P(\text{world} \mid R_k)$
the 50 scores are then averaged
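The scoring stage can be sketched with hypothetical depth-1 trees (`Stump`) in place of full CART trees. Each leaf stores plain training counts, whereas Real AdaBoost would use boosting-weighted counts; the add-one smoothing and the per-leaf LLR $\log P(\text{client} \mid R_k) / P(\text{world} \mid R_k)$ are assumptions of this sketch.

```python
import math
from dataclasses import dataclass

@dataclass
class Stump:
    """Hypothetical depth-1 tree: region R_0 if s[dim] <= thr, else R_1.
    Counts are the (client, world) training examples that fell in each region."""
    dim: int
    thr: float
    client_counts: tuple
    world_counts: tuple

    def leaf_llr(self, s):
        k = 0 if s[self.dim] <= self.thr else 1
        # Add-one smoothing (an assumption) keeps both estimates non-zero;
        # log(c / w) equals log P(client|R_k) / P(world|R_k) since both share c + w
        c = self.client_counts[k] + 1
        w = self.world_counts[k] + 1
        return math.log(c / w)

def atc_score(trees, s):
    """Average of the per-tree LLR scores for the score vector s = (s1, s2, s3)."""
    return sum(t.leaf_llr(s) for t in trees) / len(trees)

trees = [Stump(1, 0.5, (0, 9), (12, 1)), Stump(0, 0.4, (2, 8), (10, 3))]
print(atc_score(trees, (0.6, 0.9, 0.2)) > 0)  # True: lands in client-dominated regions
print(atc_score(trees, (0.1, 0.1, 0.1)) > 0)  # False
```

With 50 trees instead of 2, averaging smooths the piecewise-constant output of any single tree.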
14
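SVM principles
A mapping $\Phi$ sends $X$ from the input space to a feature space, where the data are separated with the optimal hyperplane $H_o$; $\text{Class}(X)$ is given by the side of $H_o$ on which $\Phi(X)$ falls
[Figure: input space → $\Phi$ → feature space; optimal hyperplane $H_o$ versus another separating hyperplane $H$]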
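With the linear kernel used for fusion, $\Phi$ is the identity and the SVM reduces to a hyperplane in the 3-D score space. The sketch below trains one with a Pegasos-style stochastic subgradient method, a swapped-in solver chosen for brevity (the slides do not say which solver was used); folding the bias into a constant feature also regularizes it, a common simplification. The score vectors are toy values.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.
    Labels are +1 (client) / -1 (impostor); a constant feature 1.0 stands in for the bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)                       # decreasing step size
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]         # shrink: gradient of (lam/2)||w||^2
            if margin < 1:                                 # hinge loss active: move toward t * x
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w

def svm_decision(w, x):
    """Signed score relative to the hyperplane; > 0 means accept as client."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

clients = [(2.0, 2.5, 3.0), (3.0, 2.0, 2.5), (2.5, 3.0, 2.0)]
impostors = [(-2.0, -2.5, -3.0), (-3.0, -2.0, -2.5), (-2.5, -3.0, -2.0)]
w = train_linear_svm(clients + impostors, [1, 1, 1, -1, -1, -1])
print([svm_decision(w, x) > 0 for x in clients + impostors])
# [True, True, True, False, False, False]
```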
15
Fusion experiments
The 68-person database is split into 2 equal parts:
–34 people: Fusion Learning Base (also used to estimate the thresholds of the unimodal systems with the min TE (total error) criterion)
–34 people: Fusion Test Base (also used to test the unimodal systems)
Per person:
–5 genuine bimodal values
–12 impostor bimodal values
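Threshold estimation with the min TE criterion can be sketched as a search over candidate thresholds on the learning base. Here TE is assumed to be FAR + FRR and a claim is accepted when its score is at or above the threshold; both are assumptions, since the slides give neither the exact definition of TE nor the score orientation (a distance such as the normalized signature score would need its comparison flipped).

```python
def min_te_threshold(genuine, impostor):
    """Threshold minimizing TE = FAR + FRR on learning-base scores (accept if score >= thr)."""
    candidates = sorted(set(genuine) | set(impostor)) + [float("inf")]
    best_thr, best_te = None, float("inf")
    for thr in candidates:
        frr = sum(g < thr for g in genuine) / len(genuine)       # genuine claims rejected
        far = sum(i >= thr for i in impostor) / len(impostor)    # impostor claims accepted
        if far + frr < best_te:
            best_thr, best_te = thr, far + frr
    return best_thr, best_te

# Toy learning-base scores (5 genuine and 4 impostor values)
genuine = [2.0, 3.0, 4.0, 5.0, 6.0]
impostor = [-1.0, 0.0, 1.0, 2.5]
print(min_te_threshold(genuine, impostor))  # (3.0, 0.2)
```

The threshold fixed on the learning base is then applied unchanged to the test base, so the reported test errors are not tuned on the test data.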
16
Fusion Performances
17
Conclusions
ATC and SVM give equivalent results:
–role of boosting (ATC)
Fusion improves performance by a factor of 2 relative to the best unimodal system (in clean or noisy environments)
Other ways of creating noisy environments should be tested (real noise rather than Gaussian white noise)
Fusion should also be studied on the 2 speech verification systems alone, since no noise was introduced in the signature modality