DNA Identification: Stochastic Effects Mark W Perlin, PhD, MD, PhD Cybergenetics, Pittsburgh, PA TrueAllele® Lectures Fall, 2010 Cybergenetics © 2003-2010 Cybergenetics © 2007-2010
PCR is a random process COPY NO COPY PCR efficiency is not 100% efficient. A strand copies with probability p, and doesn't copy with probability 1-p. Cybergenetics © 2007-2010
STR peak is a random variable p = 80%, n = 6 2,900 1,298 One amplification Another amplification Cybergenetics © 2007-2010
STR peak height measurement reflects probability distribution mean = 2,145 stdev = 284 Cybergenetics © 2007-2010
Relative peak certainty: coefficient of variation stdev = 12 mean = 85 CV = 14% standard deviation CV = mean value Cybergenetics © 2007-2010
Four times the peak height, gives twice the peak certainty stdev = 25 mean = 350 CV = 7% standard deviation CV = mean value Cybergenetics © 2007-2010
Peaks are probabilities Cybergenetics © 2007-2010
STR data is a random variable Cybergenetics © 2007-2010
Genotype pattern vs. peak data Cybergenetics © 2007-2010
Calculate stochastic effects Computers can solve for genotype probabilities and other random variables, like peak variation By modeling peak variation as just another parameter, computers can calculate DNA stochastic effects Cybergenetics © 2007-2010
Some apply a threshold Over threshold, peaks are treated as allele events. CPI model of data fit - all given equal likelihood. Previous threshold guidelines Under threshold, alleles do not exist. list of included alleles Cybergenetics © 2007-2010 11
Since peaks are probabilities, thresholds introduce error Cybergenetics © 2007-2010
False allele exclusion rate
Probability preserves information Cybergenetics © 2007-2010
Data probability • arises from PCR randomness • models stochastic effects • helps explain allele drop out • compares with genotype patterns • preserves identification information • established normative science Cybergenetics © 2007-2010