Pushing Out the Frontiers of Forensic Science Application of Chemometrics and Advanced Pattern Recognition to Trace Evidence Analysis Pushing Out the Frontiers of Forensic Science
Outline Admissibility of Scientific Evidence is a problem! Frye and the Daubert Standards How chemistry, math and computers can help forensic science Current Projects At John Jay: Gasoline Tool Marks Footwear Questioned Documents
Admissibility of scientific evidence! Principal legal standards: Frye and Daubert Frye (1923) – Testimony offered as “scientific” must “...have gained general acceptance in the particular field in which it belongs”. North Carolina and New York are still a “Frye States”
Frye and Daubert Daubert (1993)- Judges are the “gatekeepers” of scientific evidence. Must determine if the science is reliable Has empirical testing been done? Falsifiability Has the science been subject to peer review? Are there known error rates? Is there general acceptance? Federal Government and 26(-ish) States are “Daubert States”
Raising Standards with Data and Statistics DNA profiling the most successful application of statistics in forensic science. Responsible for current interest in “raising standards” of other branches in forensics. No protocols for the application of statistics to physical evidence. Our goal: application of objective, numerical computational pattern comparison to physical evidence
What Statistics Can Be Used? Statistical pattern comparison! Modern algorithms are called machine learning Idea is to measure features of the physical evidence that characterize it Train algorithm to recognize “major” differences between groups of features while taking into account natural variation and measurement error.
Statistical Methods Principal Component Analysis (PCA) Method minimizes the dimensionality of the data by discounting variables with minimal contributions to the overall spread of the data. 3D PCA 2D PCA Raw Data
Canonical Variate Analysis (CVA) Maximize the difference between groups by exploiting the inter and intra group variance in determining clustering. 2D PCA 2D CVA Raw Data
For Classifying Data: Neural Networks Support Vector Machines “Train” the decision rules: Support Vector Machines “Train” the decision rules: 0 ≤ li ≤ C i = {1, …, n}
Error Rates: Hold-Out Bootstrap Use a big test set with trained algorithm Hold-one-out most common Hold-none-out generates apparent error rate. Biased and overly optimistic Bootstrap Sample from data set many times (n-bootstrapped samples) Train algorithm n times with n-bootstrapped samples Use each set of bootstrap trained decision rules on entire data set Average the n-error rates together - bootstrap error rate
Conformal Prediction Theory New, but has roots in 1960’s with Kolmogorov’s ideas on randomness and algorithmic complexity. Can be used with any statistical pattern classification algorithm. Independent of data’s underlying probability distribution. This is a very important property for forensic tool mark analysis!! For identification of patterns, method produces Level of confidence, 1-ε Measure of how likely identification is to be correct Results are valid:
Fire Debris Analysis Casework Liquid gasoline samples recovered during investigation: Unknown history Subjected to various real world conditions. If an individual sample can be discriminated from the larger group, this can be of forensic interest. Gas-Chromatography Commonly Used to ID gas. Peak comparisons of chromatograms difficult and time consuming. Does “eye-balling” satisfy Daubert, or even Frye .....????
Study Design This study was undertaken to examine the variability of gasoline components in Twenty liquid gasoline samples Samples from fire investigations in the New York City area All samples analyzed using Gas Chromatography-Mass Spectrometry Keto and Wineman target compounds Fifteen peaks were chosen in this study that represented the common components present in gasoline. Normalized GC-MS peak areas were utilized to test the discrimination potential of multiple multivariate methods for discrimination.
Chosen Peaks
2D PCA 97.3% variance retained Avg. LDA HOO correct classification rate: 83%
3D PCA 98.7% variance retained Avg. LDA HOO correct classification rate: 88%
2D CVA Avg. LDA HOO correct classification rate: 92%
3D CVA Avg. correct classification rate: 100%.
G. Petillo 5/8” Consecutively manufactured chisels Known Match Comparisons G. Petillo
Current Approach For Striated Tool Marks Obtain striation pattern profiles form 3D confocal microscopy
Glock 19 firing pin impression Primer shear Glock 19 firing pin impression
3D confocal image of entire shear pattern
Shear marks on primer of two different Glock 19s
Mean “waviness” profile: Mean total profile: Mean “waviness” profile: Mean “roughness” profile:
3D PCA-SVM Bootstrap error rate ~1%:
Accidental Patterns on Footwear Shoe prints contain marks and patterns due to various circumstances that can be used to distinguish one shoe print from another. How reliable are the accidental patterns for identifying particular shoes?
Facial Recognition Approach to Accidental Pattern Identification 3D PCA 59.7% of variance
Questioned Documents: Photocopier Identification Mordente, Gestring, Tytell Photocopy of a blank sheet of paper
Just Getting Started: Things to Come Dust Soil Wrenches Chisels Hammers Tire Tracks Hair Blood Spatter Gun Shot Residue
Acknowledgements National Institute of Justice New York City Police Department Crime Lab John Jay College of Criminal Justice Research Team: Mr. Peter Diaczuk Ms. Carol Gambino Dr. James Hamby Dr. Thomas Kubic Off. Patrick McLaughlin Mr. Jerry Petillo Mr. Nicholas Petraco Dr. Peter A. Pizzola Dr. Graham Rankin Dr. Jacqueline Speir Dr. Peter Shenkin Mr. Peter Tytell Helen Chan Manny Chaparro Aurora Ghita Eric Gosslin Frani Kammerman Brooke Kammrath Loretta Kuo Dale Purcel Stephanie Pollut Rebecca Smith Elizabeth Willie Chris Singh Melodie Yu