DNA LiRa 2.0
R. Puch-Solis, M. Barron, R. Young, H. Tazeem, T. Clayton, J. Thomson
9th November 2016
PROTECT

The Team
R. Puch-Solis: Statistician
M. Barron, R. Young, H. Tazeem: Software Developers
T. Clayton: DNA Interpretation Lead
J. Thomson: DNA Technical Lead
T. Graversen: Statistician (Collaborator)

Overview
1. Statistical Evaluation Requirement
2. Software architecture
3. Bayesian Networks
4. Gamma models
5. Preliminary result
6. Further work
7. Conclusion

Requirement
1. Up to four-person mixtures
2. Low level DNA
3. Multiple profiles (replicates)
4. Dropin
5. Dropout
6. Stutters
7. About 100 users in several sites
8. About ten thousand cases per year
9. LRs and Mixture Deconvolution

Software Architecture
[Figure: Webpage, Calculation Servers (scalable), Calculations Database, Profiles Database]
Accessible from any LGC site and externally through VPN

Stutters
Allele
Back Stutter: one STR less
Double Back Stutter: two STR less
Forward Stutter: one STR more

Stutter   15      16      17   18
D3        0.007   0.083   -    0.006
D22       0.009   0.107   -    0.057

Back Stutter Proportion = Back Stutter Height / (Allele Height + All Stutter Heights)

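As a minimal sketch of the back stutter proportion defined above, the following Python function divides each stutter height by the sum of the allele height and all of its stutter heights. Applying the same denominator to the double back and forward stutters is an assumption, and the function name and the peak heights in the usage example are hypothetical rather than taken from the presentation.

```python
def stutter_proportions(allele_height, back, double_back=0.0, forward=0.0):
    """Return each stutter height as a proportion of the total signal.

    The total is the allele height plus all of its stutter heights,
    following the ratio on the slide. All names here are illustrative.
    """
    total = allele_height + back + double_back + forward
    if total <= 0:
        raise ValueError("total peak height must be positive")
    return {
        "back": back / total,
        "double_back": double_back / total,
        "forward": forward / total,
    }


# Hypothetical peak heights, chosen only for illustration.
props = stutter_proportions(allele_height=1000.0, back=90.0,
                            double_back=8.0, forward=6.0)
```

With these made-up heights the denominator is 1104, so the back stutter proportion is 90 / 1104.
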
Stutters – D22
Dropin peaks can be at the same height as stutters

Main task
[Figure: Stain Profile, Putative Genotype, Mixing Proportion (ω), Other Parameters]

Gamma Dist. per Peak
The probability density of a stain profile is obtained by multiplying the length of each red line and the area of the blue triangle

Extending the methodology
Question: Model all stutters while using this methodology
Two issues:
1. Number of Parameters
2. Size of Conditional Probability Tables

[Figure: Genotypes of Contributor 1, Genotypes of Contributor 2, Peak heights]

Size of Conditional Probability Tables
O3 conditional probability table is of size 3^6 × 2 = 1,458
[Figure: O3 with nodes n1,3 n1,4 n2,3 n2,4 n3,3 n3,4]

Size of Conditional Probability Tables
O3 conditional probability table is of size 3^12 × 2 = 1,062,882
[Figure: O3 with nodes n1,2 n1,3 n1,4 n1,5 n2,2 n2,3 n2,4 n2,5 n3,2 n3,3 n3,4 n3,5]
Clique tables in a junction tree become too large!
Go back to the standard method: list genotypes

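The two table sizes quoted on these slides, 1,458 and 1,062,882, can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes, as the quoted figures suggest, that each of the n nodes takes three states and that O3 takes two. The function name is hypothetical.

```python
def cpt_size(n_parents, parent_states=3, child_states=2):
    """Number of entries in a conditional probability table.

    Assumes every parent node has the same number of states.
    """
    return child_states * parent_states ** n_parents


small = cpt_size(6)    # six n nodes around O3
large = cpt_size(12)   # twelve n nodes around O3
print(f"{small:,} {large:,}")  # prints "1,458 1,062,882"
```

Doubling the number of nodes multiplies the table size by 3^6 = 729, which is why the clique tables grow out of reach so quickly.
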
Number of parameters
[Figure: Height of an Allele, Gamma Dist., Shape Parameter, Scale Par., Stutter Proportion]
To extend the methodology:
It is not possible to estimate the parameters from stain profiles (112 parameters)
Estimate them from profiles of known origin (dilution series)
Following Puch-Solis et al. (2013), Evaluating forensic DNA profiles using peak heights, allowing for multiple donors, allelic dropout and stutters

Estimating Parameters
[Figure: D3, Back Stutter Height against DNA Qty Proxy, with 95% Prob. Int. and 99% Prob. Int.]
Gamma regression through the origin
All stutters and alleles

Preliminary Result
Promega ESI 17 Profile (16 loci + Amelogenin)
Target Mixing Proportion: (0.8, 0.2)
Estimated: (0.805, 0.195)
Correct Genotype Pair:
In position 1 in 11 loci
In position 2 in 4 loci
In position 3 in 1 locus
The overall testing will contain hundreds of single profiles and two-, three- and four-person mixtures.
However ...

Mixture of Distributions
[Figure: D2S1, Back Stutter Height against DNA Qty Proxy, Allele 16 and Allele 25]
Allele Dependence

Back Stutter Proportion Allele Effect
[Figure: D2S1, Back Stutter Proportion against Allele]

Back Stutter Proportion Motif Effect
[Figure: D2S1, Back Stutter Proportion against Length of LUS]
LUS: Longest Uninterrupted Sequence
Brookes et al. (2012)
Bright, Curran, Buckleton (2014). Modelling PowerPlex Y Stutters and Artifacts

Motifs
D2S1
Allele 17 (LLUS 13, 12, 11):
[TGCC]4[TTCC]13
[TGCC]5[TTCC]12
[TGCC]6[TTCC]11
Allele 20 (LLUS 14, 10):
[TGCC]6[TTCC]14
[TGCC]7[TCCC][TTCC]12
[TGCC]7[TTCC]10[GTCC][TTCC]2
[TGCC]7[TTCC]13
[TGCC]7[TTCC]2[TTTC][TTCC]10
[TGCC]8[TTCC]12
Allele 23 (LLUS 16):
[TGCC]6[TTCC]14[GTCC][TTCC]2
[TGCC]7[TTCC]13[GTCC][TTCC]2
[TGCC]7[TTCC]16
[TGCC]9[TTCC]14
Source: Gettings et al. (2016), STRBase

[Figure: D3, Allele 15, Back Stutter Height against DNA Qty Proxy, with 95% Prob. Int. and 99% Prob. Int.]
Based on estimation of the stutter proportion as a function of LLUS
Higher coefficient of variation at low levels

Mixture of Gammas
If an allele has two motifs with different lengths of LUS, the pdf of a stutter is:
pdf of a stutter = (Prevalence of motif 1 in a population) × (Gamma pdf for motif 1) + (Prevalence of motif 2 in a population) × (Gamma pdf for motif 2)
The sum of two mixture distributions is itself a mixture distribution

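A minimal sketch of the two-motif mixture, assuming a shape/scale parameterisation of the gamma density and assuming that the two prevalences sum to one. The function names and all numeric values in the usage example are hypothetical, chosen for illustration only, and are not parameters from LiRa.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale, evaluated at x.

    Returns 0 for non-positive x.
    """
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def stutter_mixture_pdf(x, prevalence_1, motif_1, motif_2):
    """Mixture of two gamma densities weighted by motif prevalence.

    motif_1 and motif_2 are (shape, scale) pairs; prevalence_1 is the
    population prevalence of motif 1, and motif 2 gets the remainder
    (an assumption: only two motifs exist for the allele).
    """
    if not 0.0 <= prevalence_1 <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    return (prevalence_1 * gamma_pdf(x, *motif_1)
            + (1.0 - prevalence_1) * gamma_pdf(x, *motif_2))


# Hypothetical values: motif 1 has prevalence 0.7 in the population.
density = stutter_mixture_pdf(80.0, 0.7, (10.0, 8.0), (10.0, 10.0))
```

Setting the prevalence of motif 1 to one recovers the single-motif gamma density, which gives a quick sanity check on the weighting.
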
Dropin
[Figure: Dropin Heights pdf]
Extraction Negative profiles

Dropin
[Figure: Stain Profile]
Putative donor is 10,11
Peak 13 is explained as a dropin
Probability density of the stain profile is multiplied by:
1. Probability of a dropin (about 0.02)
2. Probability that the dropin is allele 13
3. Probability density of the dropin height

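The three factors listed above can be sketched as a single multiplication. The function and argument names are hypothetical. The dropin probability of 0.02 is the approximate value given on the slide, while the allele probability and the height density are invented numbers for illustration.

```python
def dropin_factor(p_dropin, p_allele, height_density):
    """Factor applied to the stain profile density for one dropin peak."""
    for p in (p_dropin, p_allele):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if height_density < 0.0:
        raise ValueError("a density cannot be negative")
    return p_dropin * p_allele * height_density


# Peak 13 explained as a dropin: 0.02 is from the slide, the other two
# values are made up for this example.
factor = dropin_factor(p_dropin=0.02, p_allele=0.1, height_density=0.005)
```

Because all three terms are typically small, explaining a peak as a dropin lowers the density of the stain profile substantially under that putative genotype.
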
Conclusions
1. Model is currently being implemented in LiRa 2.0
2. Method also includes uncertain peaks (other artefacts)
3. Extensive validation using hundreds of mixtures of known origin
4. It is a collage of several models. Paper in preparation.
Thank you for your kind attention.

References
Gettings et al. (2016). Sequence variation of 22 autosomal STR loci detected by next generation sequencing. Forensic Sci Int Genet. 21
Graversen & Lauritzen (2015). Computational Aspects of DNA Mixture Analysis – Exact Inference Using Auxiliary Variables in a Bayesian Network. Stat Comput. 25, pp 527-541.
Puch-Solis (2014). A dropin peak height model. Forensic Sci Int Genet. 11, pp 80-84.
Brookes et al. (2012). Characterising stutter in forensic STR multiplexes. Forensic Sci Int Genet. 6, pp 58-63.
Bright et al. (2014). Modelling PowerPlex Y Stutters and Artifacts. Forensic Sci Int Genet. 11, pp 126-136.
Puch-Solis et al. (2013). Evaluating forensic DNA profiles using peak heights, allowing for multiple donors, allelic dropout and stutters. Forensic Sci Int Genet. 7, pp 555-563.