Creating and Maintaining Databases
Dr. Pushkin Kachroo
Enrollment
–Collect private information, e.g., fingerprints
–Follow an “enrollment policy”
–The policy should be:
  –acceptable to the public
  –clear on how, where, and when the private information will be used
Enrollment Steps
–Positive enrollment:
  –Trusted individuals
  –Enrollment policy $E_M$
  –Authentication through seed documents (birth certificate, passport)
  –Store a machine representation of each enrolled subject in the verification database $M$
Enrollment Steps
–Negative enrollment:
  –Criminal identification
  –Enrollment policy $E_N$
  –Store a machine representation of each enrolled subject in the screening database $N$
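The two enrollment paths can be sketched in code. This is a minimal sketch under stated assumptions. `similarity`, `Database`, `enroll_positive`, `enroll_negative`, and `THRESHOLD` are hypothetical names, and the toy similarity stands in for a real biometric matcher. The specific checks (a seed-document flag, screening against $N$, a duplicate search in $M$) are one possible reading of policies $E_M$ and $E_N$; the slides do not prescribe them.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.9  # hypothetical decision threshold for the toy similarity below


def similarity(a: list[float], b: list[float]) -> float:
    """Toy stand-in for a biometric matcher: 1 / (1 + squared Euclidean distance)."""
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Database:
    """A named store of machine representations (templates), keyed by subject ID."""
    name: str
    records: dict[str, list[float]] = field(default_factory=dict)

    def best_match(self, template: list[float], threshold: float):
        """Return the ID of the best record scoring at least `threshold`, else None."""
        best_id, best_score = None, threshold
        for rid, stored in self.records.items():
            score = similarity(template, stored)
            if score >= best_score:
                best_id, best_score = rid, score
        return best_id


def enroll_negative(n_db: Database, subject_id: str, template: list[float]) -> str:
    """Negative enrollment: add a criminal-identification record to screening database N."""
    n_db.records[subject_id] = template
    return "enrolled in N"


def enroll_positive(m_db: Database, n_db: Database, subject_id: str,
                    template: list[float], seed_documents_ok: bool) -> str:
    """Positive enrollment into verification database M (checks are assumed, not prescribed)."""
    if not seed_documents_ok:  # authentication through seed documents failed
        return "rejected: seed documents not authenticated"
    if n_db.best_match(template, THRESHOLD) is not None:  # screening against N
        return "flagged: matches screening database N"
    if m_db.best_match(template, THRESHOLD) is not None:  # possible double enrollment
        return "rejected: possible duplicate in M"
    m_db.records[subject_id] = template  # store the machine representation
    return "enrolled in M"
```

On a screening hit, the sketch only flags the applicant. What happens next is a policy decision, and a real system would route it to a human operator rather than a return string.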
General Enrollment
–Target population: the world $W$
–Ground truth: legacy databases
  –Criminal or civil
  –Can contain fake and duplicate identities
Fake Identity
–Created identity:
  –A non-existent person
  –Biometric screening against criminal databases might catch the “fake”
–Stolen identity
The Zoo
–Sheep: real-world biometrics that are distinctive and stable
–Goats: difficult to authenticate
–Lambs: enrolled subjects who are easy to imitate (cause passive false accepts)
–Wolves: good at imitating others (cause active false accepts)
–Chameleons: both easy to imitate and good at imitating
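The zoo categories are informal. As a hedged illustration, the sketch below makes them concrete with mean match scores. A goat has low genuine scores. A lamb attracts high impostor scores as a target. A wolf produces high impostor scores as an attacker. A chameleon does both. `zoo_labels` and both thresholds are hypothetical choices, not a standard procedure.

```python
from statistics import mean


def zoo_labels(genuine, impostor, goat_thr=0.5, imitation_thr=0.5):
    """
    Label users with the zoo categories (thresholds are hypothetical).

    genuine:  {user: [scores of the user's samples against the user's own template]}
    impostor: {(attacker, target): [scores of attacker's samples against target's template]}
    """
    labels = {}
    for u in genuine:
        as_target, as_attacker = [], []
        for (attacker, target), scores in impostor.items():
            if target == u:
                as_target.extend(scores)      # others trying to imitate u
            if attacker == u:
                as_attacker.extend(scores)    # u trying to imitate others
        easy_to_imitate = bool(as_target) and mean(as_target) >= imitation_thr
        good_imitator = bool(as_attacker) and mean(as_attacker) >= imitation_thr

        tags = set()
        if mean(genuine[u]) < goat_thr:
            tags.add("goat")                  # hard to authenticate
        if easy_to_imitate and good_imitator:
            tags.add("chameleon")
        elif easy_to_imitate:
            tags.add("lamb")                  # source of passive false accepts
        elif good_imitator:
            tags.add("wolf")                  # source of active false accepts
        if not tags:
            tags.add("sheep")                 # distinctive and stable
        labels[u] = tags
    return labels
```

A user can carry more than one label (e.g., a goat who is also a lamb), which is why the function returns a set per user.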
Sample Quality Control
–Random false rejects/accepts are caused by adverse signal acquisition
–Solutions:
  –A better user interface
  –Better probabilistic models built into feature extraction/matching
  –Interactively improving the input
Quality Control
–Define “desirable” quality in terms of processability
–Quantify quality so that the action taken depends on the quality level, e.g., present information to the user differently, apply image enhancement, etc.
–There is a compromise between convenience and quality:
  –It affects the FTE (failure-to-enroll rate), as well as FA and FR (false accepts and false rejects)
–The ROC can be improved by eliminating poor data
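One way to make “quantify quality to decide action” concrete is a tiered policy. In this sketch the quality score is assumed to be normalized to [0, 1]. `quality_action`, `failure_to_enroll_rate`, and the thresholds are hypothetical.

```python
def quality_action(quality: float, accept_thr: float = 0.7, enhance_thr: float = 0.4) -> str:
    """Map a normalized quality score in [0, 1] to an action (thresholds are hypothetical)."""
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be in [0, 1]")
    if quality >= accept_thr:
        return "accept"
    if quality >= enhance_thr:
        return "enhance"    # e.g., apply image enhancement, then re-check quality
    return "reacquire"      # e.g., present instructions differently and capture again


def failure_to_enroll_rate(attempts_per_user: list[list[float]], enhance_thr: float = 0.4) -> float:
    """Fraction of users for whom no attempt reaches even the 'enhance' level."""
    failed = sum(1 for qualities in attempts_per_user if all(q < enhance_thr for q in qualities))
    return failed / len(attempts_per_user)
```

Raising `enhance_thr` demands better data but pushes more users into failure to enroll. For example, the attempt lists `[[0.2, 0.3], [0.5], [0.1, 0.8]]` give an FTE of 1/3 at 0.4 but 2/3 at 0.6. This is the convenience/quality compromise in miniature.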
ROC-Quality Control
[Figure: ROC with axes FMR (False Match Rate) and FNMR (False Non-Match Rate), illustrating the effect of throwing out bad data]
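The effect of throwing out bad data on FMR and FNMR can be checked directly. This sketch assumes each comparison score comes with the quality value of the sample that produced it. The functions are hypothetical, and the toy numbers are made up for illustration; they are not measurements.

```python
def error_rates(genuine: list[float], impostor: list[float], threshold: float) -> tuple[float, float]:
    """(FMR, FNMR) at one threshold; a score >= threshold is declared a match."""
    fmr = sum(s >= threshold for s in impostor) / len(impostor)    # impostors falsely matched
    fnmr = sum(s < threshold for s in genuine) / len(genuine)      # genuine pairs missed
    return fmr, fnmr


def roc(genuine, impostor, thresholds):
    """One (FMR, FNMR) operating point per threshold."""
    return [error_rates(genuine, impostor, t) for t in thresholds]


def drop_low_quality(pairs: list[tuple[float, float]], min_quality: float) -> list[float]:
    """Keep the scores of (score, quality) pairs whose quality is at least min_quality."""
    return [score for score, quality in pairs if quality >= min_quality]


# Toy (score, quality) pairs, made up for illustration
genuine_pairs = [(0.90, 0.8), (0.85, 0.9), (0.30, 0.2), (0.80, 0.7)]
impostor_pairs = [(0.10, 0.9), (0.20, 0.8), (0.70, 0.1), (0.15, 0.6)]

all_g = [s for s, _ in genuine_pairs]
all_i = [s for s, _ in impostor_pairs]
good_g = drop_low_quality(genuine_pairs, 0.5)
good_i = drop_low_quality(impostor_pairs, 0.5)
print(error_rates(all_g, all_i, 0.5))    # (0.25, 0.25)
print(error_rates(good_g, good_i, 0.5))  # (0.0, 0.0)
```

The improvement is not free. The discarded samples do not vanish; they reappear as failures to acquire or enroll, so the FTE should be reported alongside the improved ROC.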
Training
–Like machine learning: relate scores to the probability that the biometric does or does not match someone
–Training
–Testing
Enrollment as System Training
–Assigning IDs to subjects has three possible outcomes:
  –Correct
  –Someone faking who is already enrolled (duplicate)
  –Someone faking who is not enrolled (fake)
–$P_D$ = Prob(duplicate)
–$P_F$ = Prob(fake)
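Database Integrity
–How well the database reflects the ground truth
–Database de-duplication: purge detected duplicates
–$P_D = \mathrm{FNMR}_E \times P_{DEA}$
  –Probability of a duplicate: a match between two samples of the same subject goes undetected during a double-enrollment attack (DEA)
–$P_F = \mathrm{FMR}_E \times P_{IA}$
  –Probability of a fake enrollment: a match between two samples is falsely declared during an impersonation attack (IA)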
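Each integrity formula is the product of an attack probability and a matcher error rate. Here is a minimal sketch that reads the subscript E as the error rate in force at enrollment. All numbers are hypothetical, and the expected count assumes independent enrollments.

```python
def duplicate_probability(fnmr_e: float, p_dea: float) -> float:
    """P_D = FNMR_E * P_DEA: a double-enrollment attempt occurs AND the matcher misses it."""
    return fnmr_e * p_dea


def fake_probability(fmr_e: float, p_ia: float) -> float:
    """P_F = FMR_E * P_IA: an impersonation attempt occurs AND the matcher falsely matches it."""
    return fmr_e * p_ia


def expected_count(probability: float, enrollments: int) -> float:
    """Expected number of bad records, assuming independent enrollments."""
    return probability * enrollments


# Hypothetical rates, for illustration only
p_d = duplicate_probability(fnmr_e=0.02, p_dea=0.01)
p_f = fake_probability(fmr_e=0.001, p_ia=0.005)
print(round(expected_count(p_d, 1_000_000)), round(expected_count(p_f, 1_000_000)))  # prints 200 5
```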
$P_D$ vs. $P_F$
[Figure: trade-off between $P_F$ (driven by FMR) and $P_D$ (driven by FNMR)]
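A single threshold sets both error rates, so lowering $P_D$ raises $P_F$ and vice versa. The sketch sweeps the threshold under an assumed Gaussian score model to trace this trade-off. All parameter values and names are hypothetical.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tradeoff(thresholds, mu_g=0.8, sd_g=0.1, mu_i=0.3, sd_i=0.1, p_dea=0.01, p_ia=0.005):
    """
    (threshold, P_D, P_F) rows under an ASSUMED Gaussian score model:
    genuine scores ~ N(mu_g, sd_g), impostor scores ~ N(mu_i, sd_i).
    """
    rows = []
    for t in thresholds:
        fnmr = phi((t - mu_g) / sd_g)          # genuine score falls below t
        fmr = 1.0 - phi((t - mu_i) / sd_i)     # impostor score reaches t
        rows.append((t, fnmr * p_dea, fmr * p_ia))
    return rows


for t, p_d, p_f in tradeoff([0.4, 0.55, 0.7]):
    print(f"t={t:.2f}  P_D={p_d:.2e}  P_F={p_f:.2e}")
```

Plotting $P_D$ against $P_F$ over a fine grid of thresholds gives a curve of the kind shown on the slide. The operating point is then chosen by weighing the cost of a duplicate against the cost of a fake.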
Probabilistic Enrollment
–Enrollment process goal:
  –Build access control for … from … that are authorized
  –Likelihood of $d_i$ given the stored token $B_i$
Probabilistic Enrollment
–Enrollment process goal:
  –A machine representation of the “real” biometric
–Assumption about the score: it is the likelihood that we have the same subject
  –True if … (equivalently, …)
Probabilistic Enrollment (cont.)
–For realistic assumptions, we need to model the world
–The probability can be approximated, unrealistically, by …
–We need … (given the biometric data $O$ collected during enrollment)
Modeling the World-1
–Prior probability that subject $d_i$ is present
–Prior probability that this observation will occur
–Modeling the numerator on the right is a matter of fitting a model to the data; the remaining terms are impractical or impossible to model
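The quantities named on this slide match the factors of Bayes' rule. Suppose the quantity needed is the posterior probability of subject $d_i$ given the enrollment data $O$ (a reading inferred from the term descriptions). Then $P(d_i)$ is the prior that subject $d_i$ is present, and $P(O)$ is the prior that this observation occurs. Expanding $P(O)$ over every subject in the world $W$ shows why it is impractical to compute directly:

```latex
P(d_i \mid O) = \frac{P(O \mid d_i)\, P(d_i)}{P(O)},
\qquad
P(O) = \sum_{j \in W} P(O \mid d_j)\, P(d_j)
```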
Modeling the World-2
–Cohorts: models of the most similar subjects
–World modeling: reduce the cohorts to a single model
Modeling the World-3
–For cohort modeling
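The following sketch of cohort and world modeling follows the style of cohort normalization in speaker verification. It compares the claimed subject's log-likelihood against the average likelihood of its cohort, or against a single world model. The 1-D Gaussian models, cohort selection by nearest model mean, and every name here are hypothetical simplifications.

```python
import math


def log_lik(x: float, model: tuple[float, float]) -> float:
    """Log-density of a 1-D Gaussian model (mu, sigma) at observation x."""
    mu, sigma = model
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cohort(models: dict, claimed: str, k: int) -> list[str]:
    """The k subjects whose model means are closest to the claimed subject's (hypothetical similarity)."""
    others = [s for s in models if s != claimed]
    others.sort(key=lambda s: abs(models[s][0] - models[claimed][0]))
    return others[:k]


def cohort_score(x: float, models: dict, claimed: str, k: int = 2) -> float:
    """log P(x | claimed) minus the log of the average cohort likelihood."""
    members = cohort(models, claimed, k)
    log_avg = logsumexp([log_lik(x, models[c]) for c in members]) - math.log(len(members))
    return log_lik(x, models[claimed]) - log_avg


def world_score(x: float, models: dict, claimed: str, world: tuple[float, float]) -> float:
    """Same idea with the cohorts reduced to a single world model."""
    return log_lik(x, models[claimed]) - log_lik(x, world)
```

A positive score means the observation is better explained by the claimed subject than by the most similar subjects (or the world model). Using the cohort average as a stand-in for $P(O)$ avoids summing over the whole population.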
Updating Probabilities
Use of Probabilities
–Accuracy improvements
–Define a measure of biometric integrity
–The integrity of different biometrics can be combined, etc.
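One standard way to combine probabilities from different biometrics is to add their log-odds. This is valid only if the biometrics are conditionally independent given the identity. The rule below is a naive-Bayes fusion offered as an illustration, not necessarily the method intended here. `combine_posteriors` is a hypothetical name, and each posterior must lie strictly between 0 and 1.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Numerically stable inverse of logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def combine_posteriors(posteriors: list[float], prior: float = 0.5) -> float:
    """
    Fuse per-biometric posteriors P(same subject | evidence_k) that share one prior,
    ASSUMING conditional independence: posterior odds = prior odds * product of
    likelihood ratios, where likelihood ratio_k = odds_k / prior odds.
    """
    k = len(posteriors)
    log_odds = sum(logit(p) for p in posteriors) - (k - 1) * logit(prior)
    return sigmoid(log_odds)
```

Suppose two modalities each report 0.8 with a 0.5 prior. The fused value is 16/17, about 0.94. Correlated modalities would make this overconfident.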