Probabilistic genotyping

Presentation transcript:

Probabilistic genotyping Dan E. Krane, Wright State University, Dayton, OH Forensic DNA Profiling Video Series Forensic Bioinformatics (www.bioforensics.com)

Do these profiles match? But ambiguities can arise… [Figure: evidence profile]

Why has this become an issue? More challenging evidence samples: touch DNA (guns, steering wheels, doorknobs, etc.). Resulting DNA profiles often involve: small amounts of DNA; complex mixtures (3 or more persons); degradation (differential degradation); minor components in major/minor mixtures; stochastic effects! Existing test kits were not designed to test these kinds of samples. Existing statistical methods used in the US cannot simultaneously handle drop-out and an unknown number of contributors.

Stochastic: from Greek στόχος (stokhos), “aim” or “guess”. Having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely. The term applies when a system's outcome is determined not just by the predictable performance characteristics of the system, but by random elements as well.

The four stochastic effects: (1) peak height imbalance, (2) increased stutter, (3) drop-out, (4) drop-in. [Figure panel for comparison: no stochastic effects.] The danger of dropout: How do you know when you’ve got something missing? How do you know what it is that you’ve not got? Answer = threshold.

The stochastic threshold: the amount of template DNA where random factors influence test results as much as the actual template. Effects: exaggerated peak height imbalance; exaggerated stutter; allelic drop-in; allelic drop-out. Sampling error is at the heart of it all.
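The role of sampling error can be illustrated with a small calculation. The sketch below is a deliberate simplification and not a model taken from the slides: it assumes a heterozygous contributor, that each template molecule drawn into the reaction independently carries one allele or the other with equal probability, and it ignores PCR efficiency and detection thresholds. The function name `dropout_probability` is hypothetical.

```python
def dropout_probability(n_copies: int, p_allele: float = 0.5) -> float:
    """Probability that at least one allele of a heterozygote is absent
    from a sample of n_copies independently drawn template molecules."""
    if n_copies < 1:
        # No template at all: both alleles are missing.
        return 1.0
    # Allele A is missing if every copy is B, and vice versa.
    # The two events are mutually exclusive when n_copies >= 1.
    return (1.0 - p_allele) ** n_copies + p_allele ** n_copies


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        print(f"{n:>3} template copies: P(drop-out) = {dropout_probability(n):.3g}")
```

Under these assumptions the chance of losing an allele falls off very quickly as the number of template copies grows, which is one way to see why drop-out is mainly a concern for low-level samples.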

STR Kit Amplification with conventional SOP and with LCN protocol. Data from Debbie Hobson (FBI) – LCN Workshop AAFS 2003. [Figure: electropherograms at two levels of input DNA. Labels: SOP, 1 ng, 50 µL PCR; LCN, 8 pg, 5 µL PCR; PHR = 87%; PHR = 50%; Allele Drop Out; Allele Drop In; Peak Height Imbalance.]

How have labs dealt with low levels of DNA? RULES and THRESHOLDS, based on VALIDATION STUDIES (experiments): developmental validation (manufacturer) and internal validation (crime lab). Documented in INTERPRETATION GUIDELINES, which are specific to the crime lab and specific to the test platform (test kit, instrumentation, etc.).

Analytical and Stochastic Thresholds. Stochastic threshold: drop-out possible? Set at 200 RFU. Analytical (detection) threshold: real or noise? Set at 50 RFU.
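A minimal sketch of how two such thresholds partition peaks, using the 50 RFU and 200 RFU values shown on the slide. Actual values are lab-specific, and the function name, the category labels, and the treatment of peaks exactly at a threshold are assumptions made for illustration.

```python
ANALYTICAL_THRESHOLD_RFU = 50    # slide's example value: "real or noise?"
STOCHASTIC_THRESHOLD_RFU = 200   # slide's example value: "drop-out possible?"


def classify_peak(height_rfu: float) -> str:
    """Place a peak height into one of three zones defined by the thresholds."""
    if height_rfu < ANALYTICAL_THRESHOLD_RFU:
        # Cannot be distinguished from baseline noise.
        return "noise"
    if height_rfu < STOCHASTIC_THRESHOLD_RFU:
        # Detected, but low enough that a sister allele may have dropped out.
        return "stochastic"
    # High enough that drop-out of a sister allele is considered unlikely.
    return "above_stochastic"


if __name__ == "__main__":
    for height in (30, 120, 450):
        print(height, "RFU ->", classify_peak(height))
```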

Peak Height Ratios and Stutter. Peak Height Ratio (PHR): height of lower peak divided by height of higher peak, as a percentage. Stutter: height of the -4 peak compared to height of the parent peak. If the -4 peak exceeds a certain value then it is considered a real allele.
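The two ratios can be written out as a short sketch. The function names are hypothetical, and the 15% stutter cutoff is a placeholder chosen only for illustration; real cutoffs come from each lab's validation studies and often vary by locus.

```python
def peak_height_ratio(height_a: float, height_b: float) -> float:
    """PHR: lower peak height divided by higher peak height, as a percentage."""
    lower, higher = sorted((height_a, height_b))
    if higher <= 0:
        raise ValueError("peak heights must be positive")
    return 100.0 * lower / higher


def stutter_ratio(stutter_height: float, parent_height: float) -> float:
    """Height of the -4 (stutter-position) peak as a percentage of its parent peak."""
    if parent_height <= 0:
        raise ValueError("parent peak height must be positive")
    return 100.0 * stutter_height / parent_height


def treated_as_allele(stutter_height: float, parent_height: float,
                      max_stutter_pct: float = 15.0) -> bool:
    """True if the -4 peak is too tall to be dismissed as stutter.

    max_stutter_pct is a hypothetical cutoff, not a value from the slides.
    """
    return stutter_ratio(stutter_height, parent_height) > max_stutter_pct


if __name__ == "__main__":
    print(peak_height_ratio(870, 1000))    # two sister peaks
    print(treated_as_allele(80, 1000))     # small -4 peak
    print(treated_as_allele(300, 1000))    # tall -4 peak
```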

Importance of Interpretation Guidelines. Labs are required to establish thresholds and rules based on validation research in order to be accredited. The values for these thresholds may differ significantly from one lab to another, even for the same test kit and instrument platform. Labs are expected to follow their Interpretation Guidelines to the letter. Departures from a lab’s Interpretation Guidelines are typically a fruitful area of cross-examination.

LCN statistics No generally accepted method for attaching weight to mixed samples with an unknown number of contributors where dropout may have occurred. No stats = not admissible.

Likelihood ratios (LRs). Compare two alternative hypotheses: “Prosecution” explanation Hp (or H1) and “Defense” explanation Hd (or H2). LRs are better able to deal with continuous data. They enable the scientist to model stochastic effects and complex mixtures. Complicated: need computer assistance. Track record: widely used in UK, Europe, Australia & New Zealand; not much in US (other than Paternity Index).

Likelihood ratio = Pr(E|Hp) / Pr(E|Hd). Prosecution explanation of the DNA (Hp): DNA evidence is a mixture of two persons consisting of victim and defendant. Defense explanation of the DNA (Hd): DNA evidence is a mixture of two persons consisting of victim and an unknown person.
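As a worked illustration of this ratio, consider the simplest textbook case, which is an assumption here and not a scenario from the slides: a single locus showing four alleles, the victim carries two of them, the defendant carries the other two (call them c and d), there is no drop-out or drop-in, and the unknown person is drawn from a population in Hardy-Weinberg equilibrium. Under Hp the evidence is fully explained, so Pr(E|Hp) = 1. Under Hd the unknown person must have genotype (c, d), which has probability 2·p_c·p_d. The allele frequencies below are made-up numbers.

```python
def two_person_mixture_lr(p_c: float, p_d: float) -> float:
    """LR at one locus for Hp: victim + defendant vs Hd: victim + unknown.

    Assumes four distinct alleles observed, victim accounts for two of them,
    defendant is heterozygous (c, d), no drop-out or drop-in, and
    Hardy-Weinberg proportions for the unknown contributor.
    """
    if not (0.0 < p_c < 1.0 and 0.0 < p_d < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    pr_e_given_hp = 1.0               # victim + defendant explain all four alleles
    pr_e_given_hd = 2.0 * p_c * p_d   # unknown must be the (c, d) heterozygote
    return pr_e_given_hp / pr_e_given_hd


if __name__ == "__main__":
    # Hypothetical allele frequencies, chosen only for illustration.
    print(two_person_mixture_lr(p_c=0.1, p_d=0.2))  # about 25
```

In this simple case the LR is the reciprocal of the probability of the unknown contributor's required genotype, which is why, for an unambiguous profile, an LR can be read as 1 / RMP.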

[Figure: likelihood ratio scale running in factors of 10 from <0.000001 through 1 to 1,000,000+. Values above 1 support the prosecution explanation of the DNA; values below 1 support the defense explanation of the DNA. An LR of 1,000,000+ is labeled “VERY STRONG” support for the prosecution explanation.]

[Figure: the same likelihood ratio scale (<0.000001 to 1,000,000+) with PROSECUTION and DEFENSE sides and an INCONCLUSIVE region. Labels: Evidence Genotype, Population Genotype.]

Who stole my biscuit? [Figure: the same likelihood ratio scale (<0.000001 to 1,000,000+) with PROSECUTION and DEFENSE sides. Labels: Evidence Genotype, Population Genotype.]

Some DNA profiles can be interpreted confidently. What features make you confident? Peak heights and shapes; number of alleles; peak height balance; trend in peak heights; baseline noise levels; stutter peaks. What else?

But ambiguities can arise… What can be done with difficult samples? [Figure: evidence profile]

Software Models. SEMI-CONTINUOUS MODELS (do NOT take peak height into account): Lab Retriever (Rudin et al.), LRmix Studio (Haned et al.), Forensic Statistical Tool (OCME NY), LikeLTD (Balding). CONTINUOUS MODELS (take peak height into account): ArmedXpert (Niche Vision), DNA View (Brenner), STRmix (Buckleton et al.), TrueAllele (Perlin).

STRmix and TrueAllele use MCMC. They never give the same numerical answer twice, because of MCMC. Run the very same data twice and get different LRs: LR is 2.1523 × 10^14 (215 trillion); LR is 2.0499 × 10^14 (204 trillion).
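The run-to-run variation comes from the random number stream that drives the sampler. The toy Metropolis sampler below is only a sketch of that general behavior and is not how STRmix or TrueAllele work internally: it estimates the mean of a single made-up "mixture proportion" parameter whose target density, starting value, and step size are all assumptions chosen for illustration.

```python
import math
import random


def log_target(w: float, a: int = 30, b: int = 70) -> float:
    """Unnormalized log density of a Beta(a + 1, b + 1) distribution on (0, 1)."""
    if not 0.0 < w < 1.0:
        return -math.inf
    return a * math.log(w) + b * math.log(1.0 - w)


def metropolis_mean(seed: int, n_steps: int = 20000, step: float = 0.05) -> float:
    """Estimate the mean of the target with a random-walk Metropolis sampler."""
    rng = random.Random(seed)   # all randomness in the run flows from this seed
    w = 0.5                     # arbitrary starting value
    total = 0.0
    for _ in range(n_steps):
        proposal = w + rng.gauss(0.0, step)
        log_accept = log_target(proposal) - log_target(w)
        # Accept with probability min(1, target(proposal) / target(w)).
        if rng.random() < math.exp(min(0.0, log_accept)):
            w = proposal
        total += w
    return total / n_steps


if __name__ == "__main__":
    print("seed 1:      ", metropolis_mean(seed=1))
    print("seed 2:      ", metropolis_mean(seed=2))
    print("seed 1 again:", metropolis_mean(seed=1))
```

Runs with different seeds agree closely but not exactly, while repeating a seed reproduces the same number. The two LRs quoted on the slide differ by about 5%, or roughly 0.02 on a log10 scale.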

Where do things stand? President’s Council of Advisors on Science and Technology (PCAST) 2016 report: “It is often impossible to tell with certainty which alleles are present in a mixture or how many separate individuals contributed to the mixture, let alone accurately infer the DNA profile of each individual.” “Objective analysis of complex DNA mixtures with probabilistic genotyping software is a relatively new and promising approach.”

On September 20, 2016 the President’s Council of Advisors on Science and Technology (PCAST) released a report to the President of the United States addressing Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods. One goal of the “PCAST report” was to determine if there were additional steps that could be taken that could help ensure the validity of forensic evidence used in the Nation’s legal system. One issue that received careful consideration was the interpretation of complex-mixed DNA samples. In section 5.1 (pg 75) of the report, the PCAST notes that “DNA analysis of complex mixtures – defined as mixtures with more than two contributors – is inherently difficult and even more for small amounts of DNA (SWGDAM). Such samples result in a DNA profile that superimposes multiple individual DNA profiles. Interpreting a mixed profile is different for multiple reasons: each individual may contribute two, one or zero alleles at each locus; the alleles may overlap with one another; the peak heights may differ considerably, owing to differences in the amount and state of preservation of the DNA from each source; and the “stutter peaks” that surround alleles (common artifacts of the DNA amplification process) can obscure alleles that are present or suggest alleles that are not present (Butler, J.M., 2015). It is often impossible to tell with certainty which alleles are present in a mixture or how many separate individuals contributed to the mixture, let alone accurately to infer the DNA profile of each individual (Thompson, W.C., 2009).”

Where do things stand? President’s Council of Advisors on Science and Technology (PCAST) 2016 report: “At present, published evidence supports the foundational validity of analysis, with some programs, of DNA mixtures of 3 individuals in which the minor contributor constitutes at least 20 percent of the intact DNA in the mixture and in which the DNA amount exceeded the minimum required level for the method.”

These difficulties surrounding the interpretation and significance of complex mixtures have led to the development of “probabilistic genotyping” computer programs that apply various algorithms to interpret these mixtures. OCME’s FST can be considered a probabilistic genotyping program. While several probabilistic genotyping programs appear to show promise, the PCAST report notes that “Objective analysis of complex DNA mixtures with probabilistic genotyping software is a relatively new and promising approach. Empirical evidence is required to establish the foundational validity of each such method within specified ranges. At present, published evidence supports the foundational validity of analysis, with some programs, of DNA mixtures of 3 individuals in which the minor contributor constitutes at least 20 percent of the intact DNA in the mixture and in which the DNA amount exceeded the minimum required level for the method. The range in which foundational validity has been established is likely to grow as adequate evidence for more complex mixtures is obtained and published” (pg 82).

Challenges about black boxes Black box: “A device which performs intricate functions but whose internal mechanisms may not readily be inspected or understood.” Conflict between protection of intellectual property and the constitutional right to confront an opposing witness. Steele, Christopher D., and David J. Balding. "Statistical evaluation of forensic DNA profile evidence." Annual Review of Statistics and Its Application 1 (2014): 361-384.


Post-test on “Probabilistic genotyping”
Why can’t random match probability (RMP) statistics be used for samples with an unknown number of contributors?
Why can’t combined probability of inclusion statistics be used for samples where drop-out may have occurred?
How do you convert an RMP statistic to a likelihood ratio (LR)?
What features of an electropherogram do probabilistic genotyping approaches consider?
For what kinds of results have probabilistic genotyping approaches been foundationally validated for use?