Thinking About DNA Database Searches William C. Thompson Dept. of Criminology, Law & Society University of California, Irvine
Value of DNA Match for Proving Identity
Prior Odds x Likelihood Ratio = Posterior Odds
–Prior odds may be very low
–Likelihood ratio = 1 / (RMP + FPP)*
–1:1 million x 1 billion:1 = 1000:1
–1:1 million x 1 million:1 = 1:1
–1:1000 x 10,000:1 = 10:1
*Actually the denominator is RMP + [FPP x (1-RMP)], see Thompson, Taroni & Aitken
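The odds arithmetic on this slide can be checked with a short sketch. The function names below are illustrative, not from the slides, and the false positive probability of 1 in 10,000 in the last line is a hypothetical value chosen only to show how FPP can dominate a very small RMP.

```python
from fractions import Fraction

def likelihood_ratio(rmp, fpp=Fraction(0)):
    """Likelihood ratio for a reported match: 1 / (RMP + FPP * (1 - RMP))."""
    return 1 / (rmp + fpp * (1 - rmp))

def posterior_odds(prior_odds, lr):
    """Posterior odds = prior odds x likelihood ratio."""
    return prior_odds * lr

# The three examples from the slide, as (prior odds, likelihood ratio).
examples = [
    (Fraction(1, 10**6), Fraction(10**9)),
    (Fraction(1, 10**6), Fraction(10**6)),
    (Fraction(1, 10**3), Fraction(10**4)),
]
for prior, lr in examples:
    print(f"prior {prior} x LR {lr} = posterior odds {posterior_odds(prior, lr)}")

# Hypothetical: RMP of 1 in 1 billion, FPP of 1 in 10,000 (assumed value).
lr_with_errors = likelihood_ratio(Fraction(1, 10**9), Fraction(1, 10**4))
print(f"LR with lab error considered: about {float(lr_with_errors):,.0f} to 1")
```

Exact fractions are used so that the products come out as whole-number odds rather than floating-point approximations.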
Mysterious Clusters and the Law of Truly Large Numbers
In a truly large sample space, seemingly unusual events are bound to occur
–E.g., double lottery winners; cancer clusters
–See Diaconis & Mosteller (1989). Methods for studying coincidences, JASA,
Taking Account of Coincidence When Searching Truly Large DNA Databases
Should the frequency of the matching profile be presented to the jury?
Standard answers:
–No
  –NRC I: test additional loci; report only the frequency of those
  –NRC II: multiply the frequency by N (the size of the database)
–Yes
  –Friedman/Donnelly: present the LR, but keep in mind that prior odds may be very low
  –Prosecutors everywhere: the jury should hear the most impressive number possible “because it’s relevant”
My Solution: Present Profile Frequency Only When It Equals the RMP*
Multiple Tests of Different Hypotheses
–Search unsolved crime evidence against offender database
–For each offender, p(match|not source) = frequency
Multiple Tests of Same Hypothesis
–Search suspect against unsolved crime database to see if he matches any unsolved crime
–For this suspect, p(match|not source) ≈ Freq. x N (N = number of profiles searched)
*RMP = p(match|suspect not the source)
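A minimal sketch of why the same-hypothesis case scales with N, assuming the N comparisons are independent and every profile has the same frequency (both simplifications). The frequency and N used here are hypothetical values, not figures from the slides.

```python
import math

def p_any_coincidental_match(freq, n):
    """P(at least one match in n independent comparisons | not the source).

    Computes 1 - (1 - freq)**n in a numerically stable way.
    """
    return -math.expm1(n * math.log1p(-freq))

freq = 1e-6   # hypothetical profile frequency
n = 10_000    # hypothetical number of unsolved-crime profiles searched

exact = p_any_coincidental_match(freq, n)
approx = freq * n  # the "Freq. x N" shortcut

print(f"exact : {exact:.6f}")
print(f"approx: {approx:.6f}")
```

The product Freq. x N is an upper bound that stays close to the exact value as long as the product is small; once it approaches 1 the shortcut overstates the probability.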
My Solution: Present Profile Frequency Only When It Equals the RMP*
Testing relatives of people who almost match
–For most suspects, p(match|not source) = frequency of matching profile
–For relatives of people who almost match, p(match|not source) >> frequency
–Therefore it is misleading to present the frequency of the matching profile in cases where the suspect is selected because a relative almost matches
Database Searches and the Birthday Problem
The probability that a randomly chosen person will have my birthday is 1 in 365
The probability that some two people in a room share a birthday can be far higher
–With 23 people in a room, the likelihood that two will share a birthday exceeds 1 in 2
–With 60 people in the room, the probability is nearly 1 in 1
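The two birthday figures can be reproduced with a short calculation. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def p_shared_birthday(n_people, days=365):
    """Probability that at least two of n_people share a birthday."""
    if n_people > days:
        return 1.0
    p_all_distinct = 1.0
    for k in range(n_people):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

for n in (23, 60):
    print(f"{n} people: {p_shared_birthday(n):.3f}")
```

The contrast with the 1 in 365 figure arises because the question is about any pair in the room, and the number of pairs grows roughly with the square of the group size.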
Database Searches and the Birthday Problem
Suppose the probability of a random match between any two DNA profiles is between 1 in 10 billion and 1 in 1 trillion
What is the probability of finding a match between two such profiles in a database of:
–1,000
–100,000
–1,000,000
Approximate likelihood that two profiles in a DNA database will match

                 Profile Frequency
Database Size    1 in 10 billion    1 in 100 billion    1 in 1 trillion
1,000            1 in 20,000        1 in 200,000        1 in 2 million
10,000           1 in 200           1 in 2,000          1 in 20,000
100,000          1 in 2.5           1 in 20             1 in 200
1,000,000        1 in 1             1 in 1              1 in 2.5
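The approximate figures in the table above can be checked with the same pairwise reasoning as the birthday problem. This is a sketch under simplifying assumptions that are not stated on the slides: every pair of profiles has the same match probability and the pairwise comparisons are treated as independent. The function name is illustrative.

```python
import math

def p_any_pair_matches(db_size, pair_match_prob):
    """P(at least one matching pair among db_size profiles).

    Treats each of the C(db_size, 2) pairs as an independent trial.
    """
    n_pairs = math.comb(db_size, 2)
    return -math.expm1(n_pairs * math.log1p(-pair_match_prob))

for db_size in (1_000, 10_000, 100_000, 1_000_000):
    cells = []
    for p in (1e-10, 1e-11, 1e-12):
        prob = p_any_pair_matches(db_size, p)
        cells.append(f"1 in {1 / prob:,.1f}")
    print(f"{db_size:>9,}", *cells, sep="   ")
```

A database of 1,000,000 profiles contains roughly 500 billion distinct pairs, which is why a match somewhere in the database becomes likely even when the per-pair probability is 1 in 1 trillion.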
Why present a birthday statistic in database cases? Because it is relevant…