DNA Identification: Mixture Weight & Inference Mark W Perlin, PhD, MD, PhD Cybergenetics, Pittsburgh, PA TrueAllele® Lectures Fall, 2010 Cybergenetics © 2003-2010 Cybergenetics © 2003-2010
Mixture Weight: Uncertain Quantity Infer mixture weight from STR experiments: • quantitative peak data • contributor genotypes Pr(W=w | data, G1=g1, G2=g2, …) hierarchical Bayesian model Perlin MW, Legler MM, Spencer CE, Smith JL, Allan WP, Belrose JL, Duceman BW. Validating TrueAllele® DNA mixture interpretation. Journal of Forensic Sciences. 2011;56(November):in press. Cybergenetics © 2003-2010
Mixture Weight Model kth contributor experiment data dk,1 dk,2 … dk,N locus weights Wk,1 Wk,2 … Wk,N template weight Wk prior probability W0 Cybergenetics © 2003-2010
Experiment Estimate sum of peak heights from kth contributor wk = from all contributors Cybergenetics © 2003-2010
D16S539 Cybergenetics © 2003-2010
Three Alleles Allele 11 12 13 Quantity 500 6,750 250 11 12 13 Cybergenetics © 2003-2010
Experiment Estimate Allele 11 12 13 Quantity 500 6,750 250 500 + 250 500 + 6,750 + 250 750 11 12 13 = = 10% 7,500 Cybergenetics © 2003-2010
D7S820 Cybergenetics © 2003-2010
Four Alleles Allele 8 10 12 13 Quantity 4,000 250 3,000 8 10 12 13 Cybergenetics © 2003-2010
Experiment Estimate Allele 8 10 12 13 Quantity 4,000 250 3,000 250 + 250 4,000 + 250 + 3,000 + 250 8 10 12 13 500 = = 6.7% 7,500 Cybergenetics © 2003-2010
Overlapping Alleles Cybergenetics © 2003-2010
Template Average mean variance Cybergenetics © 2003-2010
Template Mixture Weight Probability Distribution mean = 6.7% standard deviation = 0.9% Cybergenetics © 2003-2010
Central Limit Theorem • more data experiments for a template provide greater mixture weight precision • double the precision by doing four times the number of experiments • combine evidence from multiple experiments to obtain a more informative result Cybergenetics © 2003-2010
Probability Solution interacting random variables w | d, g1, g2, … g1 | d, g2, w, … g2 | d, g1, w, … zi | d, g1, g2, w, … find probability distributions by iterative sampling Gelfand, A. and Smith, A. (1990). Sampling based approaches to calculating marginal densities. J. American Statist. Assoc., 85:398-409. Cybergenetics © 2003-2010
Markov Chain Monte Carlo
Prior Probability genotype mixture weight DNA quantity variance parameters
Joint Likelihood Function data pattern variation
Posterior Probability genotype mixture weight data variation
Generally Accepted Method mixture weight genotype James Curran. A MCMC method for resolving two person mixtures. Science & Justice. 2008;48(4):168-77. data variation
Hierarchical Bayesian Model with MCMC Solution • standard approach in modern science • describes uncertainty using probability • the "new calculus" • replaces hard calculus with easy computing • can solve virtually any problem • well-suited to interpreting DNA evidence