Designing multiple biometric systems: Measure of ensemble effectiveness
Allen Tang, NTUIM

Agenda
• Introduction
• Measures of performance
• Measures of ensemble effectiveness
• Combination Rules
• Experimental Results
• Conclusion

INTRODUCTION

Introduction
• Multimodal biometric systems perform better than unimodal ones
• They fuse the results of multiple biometric experts
• Fusion at the matching-score level is easier

Introduction
• Which biometric experts should we choose?
• How can ensemble effectiveness be evaluated?
• Which measure best predicts the performance of the combination?

MEASURES OF PERFORMANCE

Measures of performance: Notation
• E = {E_1 … E_j … E_N}: a set of N experts
• U = {u_i}: the set of users
• s_j: the set of all scores produced by E_j over all users
• s_ij: the score assigned by E_j to user u_i
• f_j(u_i): the function by which E_j produces s_ij for user u_i
• th: threshold; gen: genuine; imp: impostor

Measures of performance: Basic
• False Rejection Rate (FRR) for expert E_j: FRR_j(th) = ∫_{−∞}^{th} p(s_j | gen) ds_j
• False Acceptance Rate (FAR) for expert E_j: FAR_j(th) = ∫_{th}^{+∞} p(s_j | imp) ds_j

Measures of performance: Basic
• p(s_j | gen): probability distribution of E_j's scores for genuine users
• p(s_j | imp): probability distribution of E_j's scores for impostor users
• The threshold th changes with the requirements of the application at hand
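In the finite-sample case these two error rates reduce to counting scores on the wrong side of the threshold. A minimal sketch (Python/NumPy; not part of the original slides):

    import numpy as np

    def frr(genuine_scores, th):
        # Fraction of genuine users whose score falls below the threshold.
        return (np.asarray(genuine_scores) < th).mean()

    def far(impostor_scores, th):
        # Fraction of impostors whose score reaches or exceeds the threshold.
        return (np.asarray(impostor_scores) >= th).mean()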

Measures of performance
• Area under the ROC curve (AUC)
• Equal error rate (EER)
• The "decidability" index d'

Measures of performance [figure slide omitted]

Measures of performance: AUC
• The AUC can be estimated by the Wilcoxon–Mann–Whitney (WMW) statistic:
  AUC_j = (1 / (n^+ · n^-)) · Σ_{p=1..n^+} Σ_{q=1..n^-} I(s_pj^+ > s_qj^-)
• This formulation of the AUC is also called the "probability of correct pair-wise ranking", as it computes the probability P(s_j^+ > s_j^-)

Measures of performance: AUC
• n^+ / n^-: number of genuine / impostor users
• s_j^+: the set of scores assigned by E_j to genuine users
• s_j^-: the set of scores assigned by E_j to impostor users
• I(·): indicator function, 1 if its argument is true and 0 otherwise

Measures of performance: AUC
• Properties of the AUC estimated by the WMW statistic:
• It is theoretically equivalent to the value obtained by integrating the ROC curve
• It yields a more reliable estimate of the AUC in real cases (finite samples)
• All scores s_ij are divided into two sets: s_j^+ (genuine) and s_j^- (impostor)
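A minimal sketch of the WMW estimate for one expert, assuming two hypothetical arrays of genuine and impostor scores:

    import numpy as np

    def auc_wmw(genuine_scores, impostor_scores):
        # Fraction of (genuine, impostor) score pairs ranked correctly;
        # ties count as 1/2, matching the WMW statistic.
        g = np.asarray(genuine_scores, dtype=float)
        i = np.asarray(impostor_scores, dtype=float)
        diff = g[:, None] - i[None, :]   # all n+ x n- pairwise differences
        correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
        return correct / (g.size * i.size)

    # Example: perfect separation gives AUC = 1.0
    print(auc_wmw([0.9, 0.8, 0.7], [0.2, 0.4]))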

Measures of performance: EER
• The EER is the point on the ROC curve where FAR and FRR are equal
• The lower the EER, the better the performance of a biometric system
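One simple way to estimate the EER from finite score sets is to sweep the threshold over the observed scores and take the point where FAR and FRR cross; a sketch under the same assumptions as above:

    import numpy as np

    def eer(genuine_scores, impostor_scores):
        g = np.asarray(genuine_scores, dtype=float)
        i = np.asarray(impostor_scores, dtype=float)
        thresholds = np.unique(np.concatenate([g, i]))
        frr = np.array([(g < th).mean() for th in thresholds])   # genuines rejected
        far = np.array([(i >= th).mean() for th in thresholds])  # impostors accepted
        k = np.argmin(np.abs(far - frr))   # threshold closest to the crossing
        return (far[k] + frr[k]) / 2.0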

Measures of performance: d'
• In biometrics, d' measures the separability of the distributions of genuine and impostor scores:
  d' = |μ_gen − μ_imp| / √((σ_gen² + σ_imp²) / 2)

Measures of performance: d'
• μ_gen / μ_imp: mean of the genuine / impostor score distribution
• σ_gen / σ_imp: standard deviation of the genuine / impostor score distribution
• The larger the d', the better the performance of a biometric system
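The d' index is a one-liner given the two score sets; a sketch:

    import numpy as np

    def decidability(genuine_scores, impostor_scores):
        # Separation of the two score distributions in units of their
        # pooled standard deviation.
        g = np.asarray(genuine_scores, dtype=float)
        i = np.asarray(impostor_scores, dtype=float)
        return abs(g.mean() - i.mean()) / np.sqrt((g.var() + i.var()) / 2.0)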

MEASURES OF ENSEMBLE EFFECTIVENESS

Measures of ensemble effectiveness
• Four measures for estimating the effectiveness of an ensemble of biometric experts: AUC, EER, d', and the Score Dissimilarity (SD) index
• The differences in performance among the experts must also be taken into account

Measures of ensemble effectiveness
• Generic, weighted and normalized performance-measure (pm) formulation:
  pm_δ = μ_pm · (1 − tanh(σ_pm))
• For AUC: AUC_δ = μ_AUC · (1 − tanh(σ_AUC))
• The higher the average AUC, the better the performance of an ensemble of experts

Measures of ensemble effectiveness
• For EER: EER_δ = μ_EER · (1 − tanh(σ_EER))
• The lower the average EER, the better the performance of an ensemble of experts
• Since d' can take values much larger than 1, the normalized D' = log_b(1 + d') is used instead of d', with base b = 10 chosen according to the values of d' observed in the experiments
• Thus D'_δ = μ_D' · (1 − tanh(σ_D')) is used
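All three measures share the same aggregation scheme; a sketch of the generic pm_δ, with the optional log normalization used to turn d' into D' (function and argument names are illustrative, not from the paper):

    import numpy as np

    def pm_delta(per_expert_values, log_normalize=False, base=10.0):
        # pm_delta = mean * (1 - tanh(std)): rewards ensembles whose experts
        # perform both well and consistently.
        v = np.asarray(per_expert_values, dtype=float)
        if log_normalize:                       # d' -> D' = log_b(1 + d')
            v = np.log1p(v) / np.log(base)
        return v.mean() * (1.0 - np.tanh(v.std()))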

Measures of ensemble effectiveness: SD index
• The SD index is based on the WMW formulation of the AUC, and is designed to measure the improvement in AUC attainable by combining an ensemble of experts
• The SD index measures the amount of AUC that can be "recovered" by exploiting the complementarity of the experts

Measures of ensemble effectiveness: SD index
• Consider two experts E1 & E2 and all possible (genuine, impostor) score pairs; divide these pairs into 4 subsets S_00, S_10, S_01, S_11, where the first / second index is 1 if E1 / E2 ranks the pair correctly (genuine score above impostor score) and 0 otherwise

Measures of ensemble effectiveness: SD index
• The AUCs of E1 & E2, where card(S_uv) is the cardinality of the subset S_uv:
  AUC_1 = (card(S_11) + card(S_10)) / (n^+ · n^-)
  AUC_2 = (card(S_11) + card(S_01)) / (n^+ · n^-)
• The SD index is defined as:
  SD = (card(S_10) + card(S_01)) / (n^+ · n^-)

Measures of ensemble effectiveness: SD index
• The higher the value of SD, the higher the maximum AUC that could be obtained from the combined scores
• The actual increment of the AUC depends on the combination method, and high SD values are usually related to low-performance experts
• Performance-measure formulation for SD: SD_δ = μ_SD · (1 − tanh(σ_SD))
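A sketch of the SD index for a pair of experts, assuming the genuine (and impostor) score arrays of the two experts are aligned user-by-user:

    import numpy as np

    def sd_index(gen1, imp1, gen2, imp2):
        # r1[p, q] / r2[p, q]: does E1 / E2 rank the (genuine p, impostor q)
        # pair correctly?
        r1 = np.asarray(gen1)[:, None] > np.asarray(imp1)[None, :]
        r2 = np.asarray(gen2)[:, None] > np.asarray(imp2)[None, :]
        # Pairs ranked correctly by exactly one expert: S10 and S01.
        return np.logical_xor(r1, r2).mean()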

COMBINATION RULES

Combination Rules
• Combination (fusion) in this work is performed at the score level, the most widely used and flexible combination level
• The performance of 4 combination methods is investigated: the mean rule, the product rule, linear combination by LDA, and DSS
• LDA & DSS require a training phase to estimate the parameters needed to perform the combination

Combination Rules: Mean Rule
• The mean rule is applied directly to the matching scores produced by the set of N experts:
  s_i = (1/N) · Σ_{j=1..N} s_ij

Combination Rules: Product Rule
• The product rule is applied directly to the matching scores produced by the set of N experts:
  s_i = Π_{j=1..N} s_ij
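Both fixed rules are one-liners on a users × experts score matrix; a sketch, assuming the scores have already been normalized to a common range:

    import numpy as np

    # scores: shape (n_users, n_experts), one column per expert,
    # assumed already normalized to a common range such as [0, 1].
    def mean_rule(scores):
        return scores.mean(axis=1)

    def product_rule(scores):
        return scores.prod(axis=1)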

Combination Rules: Linear Combination by LDA
• Linear discriminant analysis (LDA) can be used to compute the weights of a linear combination of the scores
• The rule aims at a fused score with minimum within-class variation and maximum between-class variation:
  s_i = W^t · S_i

Combination Rules: Linear Combination by LDA
• W = S_w^{-1} · (μ_gen − μ_imp)
• W^t (W): transformation vector computed on a training set
• S_i: vector of the scores assigned to user u_i by all the experts
• μ_gen / μ_imp: mean vector of the genuine / impostor score distribution
• S_w: within-class scatter matrix
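A sketch of the weight computation under this standard Fisher-discriminant reading of the slide (training matrices are hypothetical, one row per user, one column per expert; the scatter matrix is computed up to a scale factor, which leaves the direction of W unchanged):

    import numpy as np

    def lda_weights(train_gen, train_imp):
        # W = Sw^{-1} (mu_gen - mu_imp); the fused score is then W @ S_i.
        mu_g = train_gen.mean(axis=0)
        mu_i = train_imp.mean(axis=0)
        sw = np.cov(train_gen, rowvar=False) + np.cov(train_imp, rowvar=False)
        return np.linalg.solve(sw, mu_g - mu_i)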

Combination Rules: DSS
• Dynamic score selection (DSS) selects one of the scores s_ij available for each user u_i, instead of fusing them into a new score
• The ideal selector is based on knowledge of the state of nature of each user:
  s_i = max_j s_ij if u_i is genuine; s_i = min_j s_ij if u_i is an impostor (5)

Combination Rules: DSS
• DSS selects the scores according to an estimate of the state of nature of each user; the estimation is based on a quadratic discriminant classifier (QDC)
• For the estimation, a vector space is built whose components are the scores assigned to the user by the N experts

Combination Rules: DSS
• A classifier is trained on this vector space using a training set of genuine and impostor users
• The classifier estimates the state of nature of each user
• Given the estimated state of nature, the user's score is selected according to (5)
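A sketch of DSS using scikit-learn's QuadraticDiscriminantAnalysis as the QDC; the slides do not name an implementation, so this choice is an assumption:

    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

    def dss(train_gen, train_imp, test_scores):
        # Train a QDC on the N-dimensional score vectors (1 = genuine).
        X = np.vstack([train_gen, train_imp])
        y = np.concatenate([np.ones(len(train_gen)), np.zeros(len(train_imp))])
        qdc = QuadraticDiscriminantAnalysis().fit(X, y)
        # Apply rule (5) with the estimated state of nature:
        # max score if labeled genuine, min score if labeled impostor.
        labels = qdc.predict(test_scores)
        return np.where(labels == 1,
                        test_scores.max(axis=1),
                        test_scores.min(axis=1))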

EXPERIMENTAL RESULTS

Experimental Results: Goal
• Investigate the correlation between the measures of ensemble effectiveness and the final performance achieved by the combined experts
• Identify the measure that best predicts this performance

Experimental Results: Preparation
• Score source: 41 experts and 4 databases from the open category of the 3rd Fingerprint Verification Competition (FVC2004)
• Number of scores: for each sensor and each expert, a total of 7750 scores, 2800 from genuine attempts and 4950 from impostor attempts
• For LDA & DSS training, the scores are divided into 4 subsets of 700 genuine and 1238 impostor scores each

Experimental Results: Process
• Number of expert pairs: 13,120 (41 × 40 × 2 × 4)
• For each pair, compute the measures of effectiveness based on AUC, EER, d', and the SD index
• Combine the pairs using the 4 combination rules, then compute the resulting AUC and EER of the combination
• The results of the experiments are shown graphically

Experimental Results: AUC_δ plotted against AUC [figure slides omitted]

Experimental Results: AUC_δ plotted against AUC
• According to the graphs, AUC_δ is not useful: it shows no clear relationship with the AUC of the combination rules
• A high AUC_δ yields a high AUC, but lower values of AUC_δ correspond to AUCs spread over a wide range
• A high AUC_δ corresponds to pairs of experts with high performance and similar behavior
• The mean rule achieves the best results

Experimental Results: AUC_δ plotted against EER [figure slides omitted]

Experimental Results: AUC_δ plotted against EER
• AUC_δ is uncorrelated with the EER too
• For any value of AUC_δ, the EER spans a wide range of values
• The performance of the combination in terms of EER cannot be predicted from AUC_δ

Experimental Results: EER_δ plotted against AUC [figure slides omitted]

Experimental Results: EER_δ plotted against AUC
• The behavior is better than that of AUC_δ, but there is still no clear relationship between EER_δ and AUC
• The mean rule again achieves the best results

Experimental Results: EER_δ plotted against EER [figure slides omitted]

Experimental Results: EER_δ plotted against EER
• There is no correlation between EER_δ and EER
• The graphs of AUC_δ against EER and of EER_δ against EER show similar results
• Thus AUC and EER are not suitable for evaluating combinations of experts, although they are widely used for unimodal biometric systems

Experimental Results: D'_δ plotted against AUC [figure slides omitted]

Experimental Results: D'_δ plotted against AUC
• Higher values of D'_δ guarantee smaller ranges of values of the performance of the combination
• D'_δ shows a higher and clearer correlation with the performance of the combination
• The mean rule achieves the best results; the product rule is the worst

Experimental Results: D'_δ plotted against EER [figure slides omitted]

Experimental Results: D'_δ plotted against EER
• D'_δ also correlates better with the EER
• D'_δ is much better than AUC_δ and EER_δ
• D'_δ is a good measure for evaluating the effectiveness of candidate ensembles of biometric experts

Experimental Results: SD_δ plotted against AUC [figure slides omitted]

Experimental Results: SD_δ plotted against AUC
• SD_δ shows some correlation with the AUC, as SD is designed to predict the maximum improvement in AUC obtainable by combining experts, but the relationship is still not clear enough
• Small values of SD_δ guarantee high performance, especially for pairs of high-performance experts: the higher the AUC of the individual experts, the smaller their complementarity

Experimental Results: SD_δ plotted against EER [figure slides omitted]

Experimental Results: SD_δ plotted against EER
• The behavior of SD_δ with respect to EER is not as good as with respect to AUC
• The results from the product rule are still poor

CONCLUSION

Conclusion
• In predicting performance improvement, the product rule is the worst and the mean rule is the best; LDA & DSS are not far behind the mean rule
• Under the mean rule, LDA & DSS give similar results
• In general, the performance of the combined experts is not highly correlated with that of the individual experts

Conclusion
• The best measure of ensemble effectiveness is D'_δ; AUC_δ and EER_δ are not good enough, and SD_δ behaves like AUC_δ
• Based on these results, D'_δ with the mean rule outperforms every other pairing of measure and combination rule, and is the most suitable measure of ensemble effectiveness

THANKS FOR LISTENING! It’s Q&A time!