DNA LiRa 2.0 R. Puch-Solis, M. Barron, R. Young, H. Tazeem

Presentation transcript:

DNA LiRa 2.0
R. Puch-Solis, M. Barron, R. Young, H. Tazeem, T. Clayton, J. Thomson
9th November 2016
PROTECT

The Team
R. Puch-Solis: Statistician
M. Barron, R. Young, H. Tazeem: Software Developers
T. Clayton: DNA Interpretation Lead
J. Thomson: DNA Technical Lead
T. Graversen: Statistician (Collaborator)

Overview
1. Statistical Evaluation Requirement
2. Software architecture
3. Bayesian Networks
4. Gamma models
5. Preliminary result
6. Further work
7. Conclusion

Requirement
1. Up to four-person mixtures
2. Low-level DNA
3. Multiple profiles (replicates)
4. Dropin
5. Dropout
6. Stutters
7. About 100 users in several sites
8. About ten thousand cases per year
9. LRs and Mixture Deconvolution

Software Architecture
[Diagram: Webpage; Calculation Servers (scalable); Calculations Database; Profiles Database]
Accessible from any LGC site and externally through VPN

Stutters
Allele
Back Stutter: one STR less
Double Back Stutter: two STR less
Forward Stutter: one STR more

Stutter proportions by peak position (allele at 17):
Locus   15      16      17        18
D3      0.007   0.083   (allele)  0.006
D22     0.009   0.107   (allele)  0.057

Back Stutter Proportion = Back Stutter Height / (Allele Height + All Stutter Heights)
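The proportion defined on this slide can be computed directly from peak heights. A minimal sketch in Python; the function name and the example heights are illustrative assumptions, not values from the slides.

```python
def back_stutter_proportion(allele_height, back_stutter_height, other_stutter_heights=()):
    """Back stutter height divided by the allele height plus all stutter heights."""
    total = allele_height + back_stutter_height + sum(other_stutter_heights)
    if total <= 0:
        raise ValueError("total peak height must be positive")
    return back_stutter_height / total


# Hypothetical heights: allele, back stutter, then double back and forward stutters.
proportion = back_stutter_proportion(1000.0, 90.0, other_stutter_heights=(8.0, 7.0))
print(round(proportion, 4))
```

The denominator includes every stutter peak, so the back, double back and forward proportions for one allele can never sum to more than one.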

Stutters – D22
Dropin peaks can be at the same height as stutters

Main task
[Diagram labels: ω, parameters; Stain Profile; Putative Genotype; Mixing Proportion; Other Parameters]

Gamma Dist. per Peak
The probability density of a stain profile is obtained by multiplying the length of each red line and the area of the blue triangle.
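One way to read this slide, sketched below under stated assumptions: each line length is a gamma density evaluated at an observed peak height, and the shaded area is the probability that an expected peak falls below a detection threshold (a dropout). The function names, the threshold and all parameter values are hypothetical; the slides do not give them.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - shape * math.log(scale) - math.lgamma(shape))
    return math.exp(log_pdf)


def gamma_cdf(x, shape, scale, terms=500):
    """P(X <= x) from the series for the regularized lower incomplete gamma function.

    Adequate for moderate values of x / scale; not tuned for extreme tails.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, terms):
        term *= z / (shape + n)
        total += term
        if term < 1e-16 * total:
            break
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def stain_profile_density(observed, dropped_out, threshold):
    """Multiply pdf values for observed peaks and below-threshold probabilities for dropouts.

    observed: list of (height, shape, scale); dropped_out: list of (shape, scale).
    """
    density = 1.0
    for height, shape, scale in observed:
        density *= gamma_pdf(height, shape, scale)
    for shape, scale in dropped_out:
        density *= gamma_cdf(threshold, shape, scale)
    return density


# Hypothetical example: two observed peaks and one expected peak that dropped out.
value = stain_profile_density(
    observed=[(850.0, 20.0, 40.0), (210.0, 5.0, 40.0)],
    dropped_out=[(1.5, 40.0)],
    threshold=50.0,
)
print(value)
```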

Extending the methodology
Question: Model all stutters while using this methodology
Two issues:
1. Number of Parameters
2. Size of Conditional Probability Tables

[Figure: Genotypes of Contributor 1; Peak heights; Genotypes of Contributor 2]

Size of Conditional Probability Tables
O3 conditional probability table is of size 3^6 × 2 = 1,458
[Figure: node O3 with parent nodes n1,3, n1,4, n2,3, n2,4, n3,3, n3,4]

Size of Conditional Probability Tables
O3 conditional probability table is of size 3^12 × 2 = 1,062,882
[Figure: node O3 with parent nodes n1,2, n1,3, n1,4, n1,5, n2,2, n2,3, n2,4, n2,5, n3,2, n3,3, n3,4, n3,5]
Clique tables in a junction tree become too large!
Go back to the standard method: list genotypes
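The table sizes quoted on these two slides follow from counting parent configurations. A small check, assuming (as the totals 1,458 and 1,062,882 suggest) that each of the 6 or 12 parent nodes has three states and O3 has two; the function name is illustrative.

```python
def cpt_size(num_parents, parent_states=3, child_states=2):
    """Number of entries in a conditional probability table."""
    return (parent_states ** num_parents) * child_states


print(cpt_size(6))   # 1458
print(cpt_size(12))  # 1062882
```

Doubling the number of parent nodes squares the number of parent configurations, which is why modelling more stutter types per peak makes the tables unmanageable.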

To extend the methodology
Number of parameters
[Diagram: Stutter Proportion; Scale Par. Gamma Dist.; Shape Parameter; Height of an Allele]
It is not possible to estimate the parameters from stain profiles (112 parameters)
Estimate from profiles of known origin (dilution series)
Following Puch-Solis et al. (2013), Evaluating forensic DNA profiles using peak heights, allowing for multiple donors, allelic dropout and stutters

Estimating Parameters
D3
Gamma regression through the origin
All stutters and alleles
[Figure: Back Stutter Height against DNA Qty Proxy, with 95% and 99% probability intervals]
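A minimal sketch of a gamma regression through the origin, assuming the mean peak height is proportional to the DNA quantity proxy (mean = slope × x) and the gamma shape is constant. Under those assumptions the ratios height / x are identically gamma distributed, so the maximum-likelihood slope is their average, and the shape can be estimated by the method of moments from the same ratios. The function name and data are hypothetical, and this may differ from the estimation procedure actually used in LiRa.

```python
from statistics import mean, variance


def fit_gamma_through_origin(x_values, heights):
    """Return (slope, shape) for a gamma model with mean slope * x and constant shape."""
    ratios = [h / x for x, h in zip(x_values, heights)]
    slope = mean(ratios)                   # maximum-likelihood estimate of the slope
    shape = slope ** 2 / variance(ratios)  # moment estimate: squared CV of ratios = 1 / shape
    return slope, shape


# Hypothetical dilution-series data: DNA quantity proxy and back stutter height.
x_values = [500.0, 1000.0, 2000.0, 4000.0]
heights = [45.0, 70.0, 170.0, 330.0]
slope, shape = fit_gamma_through_origin(x_values, heights)
print(slope, shape)
```

With a fitted slope and shape, probability intervals such as the 95% and 99% bands on the slide would come from the quantiles of the corresponding gamma distribution at each value of x.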

Preliminary Result
Promega ESI 17 Profile (16 loci + Amelogenin)
Target Mixing Proportion: (0.8, 0.2)
Estimated: (0.805, 0.195)
Correct Genotype Pair:
In position 1 in 11 loci
In position 2 in 4 loci
In position 3 in 1 locus
The overall testing will contain hundreds of single profiles and two-, three- and four-person mixtures.
However ...

Mixture of Distributions
D2S1
Allele Dependence
[Figure: Back Stutter Height against DNA Qty Proxy for Allele 16 and Allele 25]

Back Stutter Proportion: Allele Effect
D2S1
[Figure: Back Stutter Proportion against Allele]

Back Stutter Proportion: Motif Effect
D2S1
[Figure: Back Stutter Proportion against Length of LUS]
LUS: Longest Uninterrupted Sequence
Brookes et al. (2012); Bright, Curran, Buckleton (2014), Modelling PowerPlex Y stutters and Artifacts

Motifs
D2S1

Allele  Motif                           LLUS
17      [TGCC]4[TTCC]13                 13
17      [TGCC]5[TTCC]12                 12
17      [TGCC]6[TTCC]11                 11
20      [TGCC]6[TTCC]14                 14
20      [TGCC]7[TCCC][TTCC]12           12
20      [TGCC]7[TTCC]10[GTCC][TTCC]2    10
20      [TGCC]7[TTCC]13                 13
20      [TGCC]7[TTCC]2[TTTC][TTCC]10    10
20      [TGCC]8[TTCC]12                 12
23      [TGCC]6[TTCC]14[GTCC][TTCC]2    14
23      [TGCC]7[TTCC]13[GTCC][TTCC]2    13
23      [TGCC]7[TTCC]16                 16
23      [TGCC]9[TTCC]14                 14

Source: Gettings et al. (2016), STRBase
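The LLUS value for each motif can be derived from the bracketed notation: it is the largest repeat count among the uninterrupted blocks, and the repeat counts sum to the allele designation. A short sketch; the function names are illustrative, and a block without a number is assumed to be a single repeat.

```python
import re

# One bracketed repeat unit followed by an optional repeat count.
BLOCK = re.compile(r"\[([ACGT]+)\](\d*)")


def repeat_blocks(motif):
    """Parse '[TGCC]7[TTCC]10[GTCC][TTCC]2' into [('TGCC', 7), ('TTCC', 10), ...]."""
    return [(unit, int(count) if count else 1) for unit, count in BLOCK.findall(motif)]


def llus(motif):
    """Length of the longest uninterrupted sequence, in repeat units."""
    return max(count for _, count in repeat_blocks(motif))


def allele_length(motif):
    """Total number of repeat units across all blocks."""
    return sum(count for _, count in repeat_blocks(motif))


for motif in ["[TGCC]4[TTCC]13", "[TGCC]7[TTCC]10[GTCC][TTCC]2", "[TGCC]7[TTCC]16"]:
    print(motif, allele_length(motif), llus(motif))
```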

D3, Allele 15
[Figure: Back Stutter Height against DNA Qty Proxy, with 95% and 99% probability intervals]
Based on estimation of stutter proportion as a function of LLUS
Higher coefficient of variation at low levels

Mixture of Gammas
If an allele has two motifs with different lengths of LUS, the pdf of a stutter is:
f(h) = p1 · g1(h) + p2 · g2(h)
where g1 is the Gamma pdf for motif 1, g2 is the Gamma pdf for motif 2, p1 is the prevalence of motif 1 in a population, and p2 is the prevalence of motif 2 in a population.
The addition of two mixture distributions is a mixture distribution.
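The two-motif mixture density on this slide, written out as a sketch. The parameterisation is an assumption: the prevalences of the two motifs sum to one, and each motif has its own gamma shape and scale. All numeric values are hypothetical.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - shape * math.log(scale) - math.lgamma(shape))
    return math.exp(log_pdf)


def stutter_mixture_pdf(x, prevalence1, shape1, scale1, shape2, scale2):
    """prevalence1 * Gamma pdf for motif 1 + (1 - prevalence1) * Gamma pdf for motif 2."""
    if not 0.0 <= prevalence1 <= 1.0:
        raise ValueError("prevalence1 must lie between 0 and 1")
    return (prevalence1 * gamma_pdf(x, shape1, scale1)
            + (1.0 - prevalence1) * gamma_pdf(x, shape2, scale2))


# Hypothetical: motif 1 (70% prevalence) produces lower stutters than motif 2 (30%).
density = stutter_mixture_pdf(80.0, 0.7, 16.0, 5.0, 16.0, 7.0)
print(density)
```

Because the weights sum to one, the mixture integrates to one whenever each component does, so it can replace a single gamma density in the stain-profile calculation without further normalisation.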

Dropin
Extraction Negative profiles
[Figure: Dropin Heights pdf]

Dropin
Stain Profile
Putative donor is 10,11
Peak 13 is explained as a dropin
Probability density of the stain profile is multiplied by:
1. Probability of a dropin (about 0.02)
2. Probability that the dropin is allele 13
3. Probability density of the dropin height
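The three dropin factors listed on this slide multiply the stain-profile density. A sketch with assumed inputs: the dropin probability of 0.02 is the approximate value quoted on the slide, while the allele probability, the exponential form of the dropin height density, its rate and the detection threshold are hypothetical placeholders, not the model of Puch-Solis (2014).

```python
import math


def dropin_height_density(height, threshold=50.0, rate=0.02):
    """Hypothetical dropin height pdf: exponential decay above a detection threshold."""
    if height < threshold:
        return 0.0
    return rate * math.exp(-rate * (height - threshold))


def dropin_factor(height, allele_probability, dropin_probability=0.02):
    """Product of the three terms: P(dropin) * P(allele) * density of the dropin height."""
    return dropin_probability * allele_probability * dropin_height_density(height)


# Peak 13 at a hypothetical height of 75, with a hypothetical allele 13 probability of 0.1.
factor = dropin_factor(75.0, allele_probability=0.1)
print(factor)
```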

Conclusions
1. Model is currently being implemented in LiRa 2.0
2. Method also includes uncertain peaks (other artefacts)
3. Extensive validation using hundreds of mixtures of known origin
4. It is a collage of several models. Paper in preparation.
Thank you for your kind attention.

References
Gettings et al. (2016). Sequence variation of 22 autosomal STR loci detected by next generation sequencing. Forensic Sci Int Genet. 21.
Graversen & Lauritzen (2015). Computational aspects of DNA mixture analysis: exact inference using auxiliary variables in a Bayesian network. Stat Comput. 25, pp 527-541.
Puch-Solis (2014). A dropin peak height model. Forensic Sci Int Genet. 11, pp 80-84.
Brookes et al. (2012). Characterising stutter in forensic STR multiplexes. Forensic Sci Int Genet. 6, pp 58-63.
Bright et al. (2014). Modelling PowerPlex Y stutters and artifacts. Forensic Sci Int Genet. 11, pp 126-136.
Puch-Solis et al. (2013). Evaluating forensic DNA profiles using peak heights, allowing for multiple donors, allelic dropout and stutters. Forensic Sci Int Genet. 7, pp 555-563.