Published by Ira Dickerson. Modified over 9 years ago.
1
Frequentist and Bayesian Measures of Association Quality in Algorithmic Toolmark Identification
2
Outline
Introduction
Details of Our Approach
The Data
Some alternative (testable!) measures of association quality
Confidence: Vovk et al. Conformal Prediction
Believability: Efron 2-groups “Empirical Bayes”
Future directions
3
Background Information
All impressions made by tools and firearms can be represented as numerical patterns
Machine learning trains a computer to recognize patterns
Can give “…the quantitative difference between an identification and non-identification” (Moran)
Can yield identification error rate estimates
Maybe even confidence measures for I.D.s
4
Data Acquisition
Obtain striation/impression patterns from 3D microscopy
Store files in an ever-expanding database (NIST; Zheng)
Data files are available to the practitioner and researcher community (NIST; Zheng)
5
9mm-Glock Fired Cartridges
6
Screwdriver Striation Patterns in Lead
[Figure panels: 2D profiles; 3D surfaces (interactive)]
7
Profile Simulator
We can simulate profiles as well (Baiker)
Based on discrete wavelet transform (DWT) multiresolution analysis (MRA)
May shed light on the processes which generate the surfaces
8
Surface Simulator
Working on a simulator for 2D toolmarks
[Figure labels: LH4, HL4, HH4] + simulate stochastic detail
9
CMS-space features
Toolmarks (screwdriver striation profiles) from database
Biasotti-Murdock Dictionary
For striated toolmarks: Consecutive Matching Striae (CMS) space
10
Visually explore: 3D PCA of CMS-space for 1740 real and simulated screwdriver striation patterns (~9% variance retained)
11
How good of a “match” is it? Conformal Prediction (Vovk)
Data should be IID but that’s it
Can give a judge or jury an easy-to-understand measure of the reliability of a classification result
This is an orthodox “frequentist” approach
Roots in Algorithmic Information Theory
Confidence on a scale of 0%-100%
Testable claim: long-run I.D. error rate should be the chosen significance level
[Plot: cumulative number of errors vs. sequence of unknown observation vectors. 80% confidence, 20% error: slope = 0.2. 95% confidence, 5% error: slope = 0.05. 99% confidence, 1% error: slope = 0.01.]
12
How Conformal Prediction works for us
Given a “bag” of obs with known identities and one obs of unknown identity (Vovk)
Estimate how “wrong” labelings are for each observation with a non-conformity score (“wrong-iness”)
For us, the scores come from one-vs-one SVMs
Looking at the “wrong-iness” of known observations in the bag: does labeling i for the unknown have an unusual amount of “wrong-iness”?
If not: p(possible ID i) ≥ ε, the chosen level of significance
Put ID i in the (1 - ε)*100% confidence interval
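The bag-and-unknown procedure on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the non-conformity score here is a hypothetical distance-to-class-mean on 1-D toy data, swapped in for the SVM-based score, and all names (`nonconformity`, `conformal_p_values`, `prediction_region`, the toy `bag`) are assumptions made for the example.

```python
from statistics import mean

def nonconformity(x, label, bag):
    """Hypothetical "wrong-iness": distance from x to the mean of the
    observations in `bag` that carry `label` (stand-in for an SVM score)."""
    members = [v for v, lab in bag if lab == label]
    return abs(x - mean(members))

def conformal_p_values(x_unknown, bag):
    """For each candidate label, the fraction of observations (bag plus the
    unknown) that are at least as "wrong" as the unknown under that label."""
    p_values = {}
    for candidate in sorted({lab for _, lab in bag}):
        extended = bag + [(x_unknown, candidate)]  # tentatively label the unknown
        scores = [nonconformity(v, lab, extended) for v, lab in extended]
        a_unknown = scores[-1]
        p_values[candidate] = sum(s >= a_unknown for s in scores) / len(scores)
    return p_values

def prediction_region(p_values, epsilon):
    """Keep every label whose p-value exceeds the significance level epsilon."""
    return [lab for lab, p in p_values.items() if p > epsilon]

# Toy 1-D "features" for two hypothetical tools, A and B (made-up numbers).
bag = [(v, "A") for v in (0.8, 0.9, 1.0, 1.1, 1.2, 0.95, 1.05, 0.85, 1.15, 1.0)]
bag += [(v, "B") for v in (4.8, 4.9, 5.0, 5.1, 5.2, 4.95, 5.05, 4.85, 5.15, 5.0)]

p = conformal_p_values(1.02, bag)
region_95 = prediction_region(p, epsilon=0.05)  # 95% confidence region
region_99 = prediction_region(p, epsilon=0.01)  # 99% confidence region
```

With only 20 known observations the smallest attainable p-value is 1/21, so a label can be excluded at the 95% level but never at the 99% level; a larger bag is needed for higher confidence levels.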
13
Conformal Prediction
14D PCA-SVM decision model for screwdriver striation patterns
Theoretical (long-run) error rate: 5%
Empirical error rate: 5.3%
For 95%-CPT (PCA-SVM), confidence intervals will not contain the correct I.D. 5% of the time in the long run
Straightforward validation/explanation picture for court
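The long-run error-rate claim can be checked by simulation. The sketch below assumes an inductive (split) variant of conformal prediction with synthetic IID non-conformity scores, which is not necessarily the variant used for the results on this slide; the function names and defaults are hypothetical.

```python
import random

def conformal_p_value(calibration_scores, test_score):
    """Fraction of scores (calibration plus test) at least as large as the test score."""
    n = len(calibration_scores)
    return (sum(s >= test_score for s in calibration_scores) + 1) / (n + 1)

def simulated_error_rate(epsilon, n_calibration=99, n_trials=5000, seed=0):
    """Estimate how often the true label is rejected at significance level epsilon.

    Every score is drawn from the same distribution, so the true label's
    p-value is uniform on {1/(n+1), ..., 1} and is rejected with
    probability close to epsilon.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_trials):
        calibration = [rng.random() for _ in range(n_calibration)]
        true_label_score = rng.random()
        if conformal_p_value(calibration, true_label_score) <= epsilon:
            errors += 1  # the true label fell outside the confidence region
    return errors / n_trials

rate_95 = simulated_error_rate(0.05)  # should be near 0.05
rate_80 = simulated_error_rate(0.20)  # should be near 0.20
```

Plotting cumulative errors against trial number for such a run would give a line with slope close to the chosen significance level.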
14
How good of a “match” is it? Efron Empirical Bayes’
An I.D. is output for each questioned toolmark
This is a computer “match”
What’s the probability the tool is truly the source of the toolmark?
Similar problem in genomics for detecting disease from microarray data
They use data and Bayes’ theorem to get an estimate
15
Bayesian Statistics
The basic Bayesian philosophy:
Prior Knowledge × Data = Updated Knowledge
A better understanding of the world
Prior × Data = Posterior
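Written out with Bayes' theorem, the slogan reads as follows. The symbols H (a hypothesis, such as "this tool made the mark") and D (the observed data) are notation assumed for this illustration and do not appear on the slide.

```latex
\underbrace{P(H \mid D)}_{\text{posterior}}
  = \frac{P(D \mid H)\, P(H)}{P(D)}
  \;\propto\;
  \underbrace{P(D \mid H)}_{\text{data (likelihood)}}
  \times
  \underbrace{P(H)}_{\text{prior}}
```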
16
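Empirical Bayes’
From Bayes’ Theorem we can get (Efron): $\widehat{\Pr}(\text{not a true match} \mid z) = \hat{\pi}_0 \hat{f}_0(z) / \hat{f}(z)$
Here $\hat{f}(z)$ is the “mixture” density of z-scores, $\hat{f}_0(z)$ is the z-density given a known non-match (KNM), and $\hat{\pi}_0$ is the estimated prior for KNM
This is the estimated probability of not a true “match” given the algorithm's output z-score associated with its “match”
Names: Posterior error probability (PEP) (Käll); local false discovery rate (lfdr) (Efron)
Suggested interpretation for casework: $1 - \widehat{\Pr}(\text{not a true match} \mid z)$ = Estimated “believability” that the specific tool produced the toolmark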
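A numerical sketch of the two-groups calculation may help. It assumes both groups are Gaussian with made-up parameters and that larger z means more match-like; in the talk these densities are estimated from data, so every number and name below is hypothetical.

```python
from statistics import NormalDist

def local_fdr(z, pi0, null, alternative):
    """Two-groups local false discovery rate: pi0 * f0(z) / f(z),
    where f(z) = pi0 * f0(z) + (1 - pi0) * f1(z) is the mixture density."""
    f0 = null.pdf(z)
    f1 = alternative.pdf(z)
    f = pi0 * f0 + (1.0 - pi0) * f1
    return pi0 * f0 / f

def believability(z, pi0, null, alternative):
    """Estimated probability of a true association: 1 - lfdr(z)."""
    return 1.0 - local_fdr(z, pi0, null, alternative)

# Hypothetical ingredients, for illustration only.
null = NormalDist(0.0, 1.0)         # z-density for known non-matches
alternative = NormalDist(3.0, 1.0)  # z-density for known matches (assumed)
pi0 = 0.9                           # assumed prior proportion of non-matches

lfdr_at_0 = local_fdr(0.0, pi0, null, alternative)  # close to 1: not believable
lfdr_at_4 = local_fdr(4.0, pi0, null, alternative)  # close to 0: believable
```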
17
Empirical Bayes’
Use SVM to get KM (known match) and KNM (known non-match) “Platt-score” distributions (Platt; e1071)
Use a “Training” set
Bootstrap procedure to get an estimate of the KNM distribution of “Platt-scores”
Inspired by Storey and Tibshirani’s null estimation method (Storey)
Use this to get p-values/z-values on a “Validation” set
From the fit to the z-score histogram by Efron’s method (use a “Validation” set) get:
f(z): the “mixture” density
f0(z): the z-density given KNM => should be Gaussian
π0: the estimate of the prior for KNM
What’s the point?? We can test the fits to f(z) and f0(z)!
18
Rough Procedure
Sample to get a set of IID simulated log(KNM-scores) (“reusing data” less too…??)
Compute p-values for the validation set from the fit null
Compute KM scores off the validation set
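The resample-then-score steps might look roughly like this. The sketch makes several assumptions not stated on the slide: it uses an empirical (rank-based) p-value rather than a parametric fit to the null, it treats larger scores as more match-like, and it skips the log transform. All function names are hypothetical.

```python
import random
from statistics import NormalDist

def bootstrap_null(knm_scores, n_boot, seed=0):
    """Resample known non-match scores with replacement to simulate a null sample."""
    rng = random.Random(seed)
    return [rng.choice(knm_scores) for _ in range(n_boot)]

def empirical_p_value(score, null_sample):
    """Share of null scores at least as large as `score`, with the usual +1
    correction so the p-value is never exactly zero."""
    exceed = sum(s >= score for s in null_sample)
    return (exceed + 1) / (len(null_sample) + 1)

def z_value(p, tiny=1e-12):
    """Map a p-value to a z-value through the standard normal quantile function.
    The p-value is clamped away from 0 and 1, where the quantile is undefined."""
    p = min(max(p, tiny), 1.0 - tiny)
    return NormalDist().inv_cdf(p)

# Made-up scores, for illustration only.
knm_scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25]
null_sample = bootstrap_null(knm_scores, n_boot=1000)
validation_scores = [0.07, 0.90]
p_vals = [empirical_p_value(s, null_sample) for s in validation_scores]
z_vals = [z_value(p) for p in p_vals]
```

Under this sign convention a strongly match-like score gets a small p-value and a large negative z-value; the opposite convention is equally workable as long as it is used consistently in the later density fits.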
19
Fit local-fdr models
Use locfdr (locfdr): fit classic Poisson regression for f(z)
Use modified locfdr/JAGS (JAGS; Plummer) or Stan (Stan): fit Bayesian hierarchical Poisson regressions
Check: is the null density underneath here about N(0,1)?
[Plots: fitted z histograms]
20
Check Calibration
Do right answers get high “believability” (low PEP)?
Do wrong answers get low “believability” (high PEP)?
[Figure annotation: the SVM algorithm got these primer shear IDs wrong]
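One simple way to run such a check is to bin the validation results by estimated PEP and compare each bin's mean PEP with the fraction of IDs in it that were actually wrong. The sketch below is one possible layout under that assumption, not the procedure used in the talk; the function name and toy numbers are hypothetical.

```python
def calibration_table(peps, is_wrong, n_bins=5):
    """Group results into equal-width PEP bins and return, for each non-empty
    bin, (bin_low, bin_high, count, mean_pep, observed_error_fraction)."""
    bins = [[] for _ in range(n_bins)]
    for pep, wrong in zip(peps, is_wrong):
        idx = min(int(pep * n_bins), n_bins - 1)  # pep == 1.0 goes in the last bin
        bins[idx].append((pep, wrong))
    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue
        mean_pep = sum(p for p, _ in members) / len(members)
        observed = sum(1 for _, w in members if w) / len(members)
        rows.append((idx / n_bins, (idx + 1) / n_bins, len(members), mean_pep, observed))
    return rows

# Made-up validation results: estimated PEP and whether the ID was wrong.
peps = [0.05, 0.10, 0.15, 0.90, 0.95]
is_wrong = [False, False, False, True, True]
rows = calibration_table(peps, is_wrong, n_bins=2)
```

For a well-calibrated model the last two columns of each row should roughly agree once the bins hold enough observations.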
21
Posterior Association Probability: Believability Curve
12D PCA-SVM locfdr fit for Glock primer shear patterns
+/- 2 standard errors
22
Empirical Bayes’
Model’s use with crime scene “unknowns”:
Computer outputs “match” for: unknown crime scene toolmarks with knowns from “Bob the burglar” tools
This is the est. post. prob. of no association = 0.00027 = 0.027%
This is an uncertainty in the estimate
23
Future Directions
Exploit scientific image processing toolkits
OpenCV, scikit-image, vlfeat (features)
OpenCV, Panorama Tools, Hugin, Montage Image Mosaic (stitching)
Better toolmark features
CADRE Research Labs doing impressive work
Parallel implementation of computationally intensive routines
Forensic algorithm architectural review board (ARB) for law applications, akin to Khronos Group (Khronos) or Boost.org (Boost)
OpenFMC
24
Acknowledgements
Alan Zheng (NIST)
Erich Smith (If I tell you where he works, I have to kill you)
Ryan Lillian (CADRE)
JoAnn Buscaglia (You all know where she works)
Professor Chris Saunders (SDSU)
Research Team: Ms. Tatiana Batson; Dr. Martin Baiker; Ms. Julie Cohen; Dr. Peter Diaczuk; Mr. Antonio Del Valle; Ms. Carol Gambino; Dr. James Hamby; Mr. Nick Natalie; Mr. Mike Neel; Ms. Alison Hartwell, Esq.; Ms. Loretta Kuo; Ms. Frani Kammerman; Dr. Brooke Kammrath; Mr. Chris Lucky; Off. Patrick McLaughlin; Dr. Linton Mohammed; Ms. Diana Paredes; Mr. Nicholas Petraco; Ms. Stephanie Pollut; Dr. Peter Pizzola; Dr. Jacqueline Speir; Dr. Peter Shenkin; Mr. Chris Singh; Mr. Peter Tytell; Dr. Peter Zoon
25
Website (data, codes, reprints and preprints): http://jjcweb.jjay.cuny.edu/npetraco/main.html
Contact: npetraco@gmail.com