Signature with Text-Dependent and Text-Independent Speech for Robust Identity Verification
B. Ly-Van*, R. Blouet**, S. Renouard**, S. Garcia-Salicetti*, B. Dorizzi*, G. Chollet**
* INT, Dept. EPH, 9 rue Charles Fourier, Evry, France
** ENST, Lab. CNRS-LTCI, 46 rue Barrault, Paris, France
Overview
Introduction: why speech and signature?
BIOMET database: brief description
–Signature data
–Speech data
Writer verification
Speaker verification systems
Fusion systems
Results and conclusions
The BIOMET Database
5 modalities: hand shape, fingerprints, on-line signatures, talking faces (face and voice)
131 people: 50% male, 50% female
Data from 68 people used for fusion
Time variability: two sessions spaced 5 months apart
–S. Garcia-Salicetti, C. Beumier, G. Chollet, B. Dorizzi, J. Leroux-Les Jardins, J. Lunter, Y. Ni, D. Petrovska-Delacretaz, "BIOMET: a Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities", 4th International Conference on Audio- and Video-Based Biometric Person Authentication, 2003.
Signatures capture
Captured on a WACOM Intuos2 A6 digitizer at 200 Hz
5 parameters per sampled point:
–Coordinates (x, y)
–Axial pressure
–Azimuth and altitude of the pen
15 genuine signatures per person
12 forgeries per person
[Diagram: pen orientation angles, azimuth (0°-359°) and altitude (0°-90°)]
Signatures modeling
Preprocessing (filtering)
Feature extraction: 12 parameters
Signature modeling: continuous HMM
–2 states, 3 Gaussians per state
–Bagging techniques: 10 models combined into an «aggregated» model (average score)
–Training: 10 signatures from one session
Normalized score: |S_i(O) - S_i*|
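A minimal sketch of the bagged-HMM signature scoring outlined above. Assumptions not stated on the slide: hmmlearn's GMMHMM stands in for the continuous HMM (2 states, 3 mixtures per state), each signature is a (T, 12) array of feature vectors, and S_i* is taken as the average score on the training signatures.

```python
# Sketch only: library choice, resampling scheme and S_i* definition are assumptions.
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_bagged_models(train_sigs, n_models=10, seed=0):
    """train_sigs: list of (T_k, 12) arrays, the 10 genuine training signatures."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap resample of the training signatures (bagging).
        picks = rng.integers(0, len(train_sigs), size=len(train_sigs))
        X = np.vstack([train_sigs[p] for p in picks])
        lengths = [len(train_sigs[p]) for p in picks]
        hmm = GMMHMM(n_components=2, n_mix=3, covariance_type="diag", n_iter=20)
        hmm.fit(X, lengths)
        models.append(hmm)
    return models

def aggregated_score(models, sig):
    """Average log-likelihood over the 10 bagged models (the 'aggregated' model)."""
    return np.mean([m.score(sig) for m in models])

def normalized_score(models, sig, s_star):
    """|S_i(O) - S_i*|, with S_i* assumed to be the mean training-set score."""
    return abs(aggregated_score(models, sig) - s_star)
```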
Speech
Two verification systems on voluntarily degraded data:
–Text-dependent: only sequences of 4 digits among the 10 digits (5 templates per speaker)
–Text-independent: sentences extracted from the original data
 –client model: trained on digits (15 seconds) and tested on sentences
 –world model: trained on data from other speakers
Methods:
–Text-dependent: DTW (Dynamic Time Warping)
–Text-independent: GMM (Gaussian Mixture Model)
Text-dependent (DTW)
DTW computes the spectral distance between a stored template and the test speech sample
[Diagram: template speech signal and sample speech signal aligned by DTW to produce a score]
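A short sketch of the DTW alignment used for text-dependent verification. The front-end is assumed to produce MFCC-style frame sequences; the authors' exact features and local path constraints are not given here.

```python
# Minimal DTW sketch under the assumptions stated above.
import numpy as np

def dtw_distance(template, sample):
    """template, sample: (T, d) arrays of spectral feature frames.
    Returns the accumulated frame-to-frame Euclidean distance along the best alignment."""
    n, m = len(template), len(sample)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - sample[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def dtw_score(templates, sample):
    """Keep the best (smallest) distance over the 5 stored templates per speaker."""
    return min(dtw_distance(t, sample) for t in templates)
```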
Text-independent (GMM)
[Diagram: training speech → front-end → GMM modeling → world GMM model;
 client speech → front-end → GMM model adaptation of the world model → target GMM model]
Baseline GMM method
[Diagram: test speech → front-end → scored against the hypothesized target GMM model and the world GMM model]
LLR score = log-likelihood(test | target GMM) - log-likelihood(test | world GMM)
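A sketch of the GMM/world-model LLR scoring step, using scikit-learn's GaussianMixture as a stand-in. The number of components, the cepstral front-end and the adaptation scheme for the target model are assumptions, not the authors' settings.

```python
# Hypothetical GMM-UBM style scoring sketch; model sizes are placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_world_model(world_frames, n_components=64):
    """world_frames: (N, d) array of cepstral frames pooled over many speakers."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(world_frames)
    return ubm

def llr_score(test_frames, target_gmm, world_gmm):
    """Average per-frame log-likelihood ratio between target and world models."""
    return np.mean(target_gmm.score_samples(test_frames)
                   - world_gmm.score_samples(test_frames))
```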
Fusion systems
Additive Tree Classifier (ATC)
–Boosting techniques on binary trees
–CART algorithm
Support Vector Machine (SVM)
–Linear kernel
Input:
–Normalized signature score
–Text-dependent LLR score
–Text-independent LLR score
Tree-based approach for score fusion
Goal: find an optimal partition R = {R_k}, 1 ≤ k ≤ K, of the score space S = (s_1, s_2, s_3) according to an Information Theory criterion
A sub-optimal solution, based on CART:
–Best partition: R* = arg min_R C(R)
–Score estimation based on P(client|R_k) and P(world|R_k) at each node of a given tree
Use of RealAdaBoost to build 50 trees per client and to obtain a robust estimation of P(client|R_k) and P(world|R_k)
Verification based on ATC
A score vector S = (s_1, s_2, s_3) is presented to the system composed of 50 trees:
–each tree finds the region R_k that S falls in and outputs an LLR score computed from P(client|R_k) and P(world|R_k)
–the final score is the average of the 50 tree scores
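An illustrative sketch of the ATC scoring step described above. It assumes each boosted tree exposes the posteriors P(client|R_k) and P(world|R_k) of the leaf region a score vector falls in; the `leaf_posteriors` accessor is hypothetical.

```python
# Sketch only: the tree interface below is hypothetical, not the authors' implementation.
import numpy as np

def atc_score(trees, s):
    """s: fused score vector (s1, s2, s3); trees: the 50 boosted CART trees.
    Each tree yields an LLR from the posteriors of the leaf region containing s;
    the final score averages over the trees."""
    llrs = []
    for tree in trees:
        p_client, p_world = tree.leaf_posteriors(s)   # hypothetical accessor
        llrs.append(np.log(p_client) - np.log(p_world))
    return np.mean(llrs)

def verify(trees, s, threshold):
    """Accept the claimed identity if the averaged LLR exceeds the threshold."""
    return atc_score(trees, s) >= threshold
```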
SVM principles
The input space X is mapped by Φ(X) into a feature space
Separating hyperplanes H, with the optimal hyperplane H_0
Decision: Class(X)
[Diagram: mapping from input space to feature space, with the separating hyperplanes H and the optimal hyperplane H_0]
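A sketch of the linear-kernel SVM fusion stage, assuming the three unimodal scores are stacked into 3-dimensional vectors; hyperparameters and the decision rule shown are illustrative, not the authors' exact setup.

```python
# Linear-kernel SVM fusion sketch using scikit-learn.
import numpy as np
from sklearn.svm import SVC

def train_fusion_svm(scores, labels):
    """scores: (N, 3) array of (signature, text-dependent, text-independent) scores;
    labels: 1 for genuine (client) accesses, 0 for impostor accesses."""
    svm = SVC(kernel="linear")
    svm.fit(scores, labels)
    return svm

def fused_decision(svm, s):
    """Signed distance to the separating hyperplane; accept if positive."""
    return svm.decision_function(np.asarray(s).reshape(1, -1))[0] > 0
```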
Fusion experiments
The 68-people database is split into 2 equal parts:
–34 people: Fusion Learning Base (also used for threshold estimation of the unimodal systems with the min TE criterion)
–34 people: Fusion Test Base (also used to test the unimodal systems)
Per person:
–5 genuine bimodal score vectors
–12 impostor bimodal score vectors
Fusion Performances
Conclusions
ATC and SVM give equivalent results:
–role of Boosting (ATC)
Fusion improves performance by a factor of 2 relative to the best unimodal system (in clean or noisy environments)
Other methods to create noisy environments should be tested (real noise rather than Gaussian white noise!)
Fusion performance should also be studied on the 2 speech verification systems alone, since no noise was introduced in the signature modality