
1 INTERPRETING COMPLEX DNA PROFILE EVIDENCE:  BAYESIAN NETWORKS TO THE RESCUE
Philip Dawid, University of Cambridge

2 Difficulties of Formalizing Reasoning
Classical logic does not readily handle “non-monotonic” reasoning
Reasoning with uncertainty is especially delicate
but specification and manipulation of probabilities appear problematic

3 Example: “Explaining Away”
Burglar alarm is ringing: Break-in? Earthquake?
Radio reports earthquake in vicinity
report ⇒ earthquake
earthquake ⇒ alarm
alarm ⇒ break-in
So report ⇒ break-in ???
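The alarm example can be checked numerically with a tiny network evaluated by brute-force enumeration. This is only a sketch: every probability below is a hypothetical value chosen for illustration, not a figure from the talk. It shows that the radio report, far from supporting a break-in, lowers its probability once the alarm has been heard.

```python
from itertools import product

# Hypothetical numbers, chosen only to illustrate the effect.
P_BREAKIN = 0.01
P_QUAKE = 0.001

def p_alarm(breakin, quake):
    """P(alarm rings | break-in, earthquake)."""
    if breakin and quake:
        return 0.99
    if breakin:
        return 0.95
    if quake:
        return 0.30
    return 0.001

def p_report(quake):
    """P(radio reports an earthquake | earthquake)."""
    return 0.90 if quake else 0.0001

def posterior_breakin(report=None):
    """P(break-in | alarm rings), optionally also conditioning on the radio report."""
    num = den = 0.0
    for b, q in product([True, False], repeat=2):
        w = (P_BREAKIN if b else 1 - P_BREAKIN) * (P_QUAKE if q else 1 - P_QUAKE)
        w *= p_alarm(b, q)                      # evidence: the alarm is ringing
        if report is not None:
            pr = p_report(q)
            w *= pr if report else 1 - pr       # evidence: the radio report
        den += w
        if b:
            num += w
    return num / den

before = posterior_breakin()             # alarm only
after = posterior_breakin(report=True)   # alarm + earthquake report
print(f"P(break-in | alarm) = {before:.3f}")
print(f"P(break-in | alarm, report) = {after:.3f}")
```

With these made-up numbers the earthquake "explains away" the alarm: the second probability is much smaller than the first.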

4 PROBABILISTIC REASONING IN INTELLIGENT SYSTEMS: Networks of Plausible Inference (Pearl 1988)

5 Go with the (causal) flow ?

6 BAYESIAN NETWORKS
Handle complex problems involving probabilistic uncertainty
Modular structure
Intuitive graphical representation
Precise semantics: relevance (conditional independence)
Correct accounting for evidence
Computational algorithms: elegant and efficient

7 AN APPLICATION Forensic Identification DNA Profiling
Disputed Paternity

8 FORENSIC USES FOR DNA PROFILES
Murder/Rape/…: Is A the culprit?
Paternity: Is A the father of B?
Immigration: Is A the mother of B? How are A and B related?
Disasters: 9/11, tsunami, Romanovs,…

9 DNA Profile From blood, saliva, semen, hair root,…
Can be amplified from a single cell
Record genotypes for 12–20 DNA markers
unlinked (different chromosomes)

10 A typical DNA profile

11 Short Tandem Repeat markers:
hypervariable “junk” (nuclear) DNA
**|GTAC|GTAC|GTAC|GTAC|**  4 repeats (allele)
genotype, e.g. 7/13 or 14
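An STR allele is named by its number of tandem repeats, so it can be read off a sequence by counting back-to-back copies of the motif. The sketch below is illustrative only: the helper name and the flanking bases are hypothetical, and real allele calling works from fragment lengths rather than from raw sequence.

```python
import re

def longest_tandem_run(sequence, motif):
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Hypothetical flanking bases around the 4-repeat allele shown on the slide.
allele = "AC" + "GTAC" * 4 + "TT"
print(longest_tandem_run(allele, "GTAC"))
```

A genotype such as 7/13 is then just the pair of repeat counts carried on the two chromosomes.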

12 D7S820
D7S820 is one of the 13 core CODIS STR genetic loci. This DNA is found on human chromosome 7. The DNA sequence of a representative allele of this locus is shown below. The tetrameric repeat sequence of D7S820 is GATA. Different alleles of this locus have from 6 to 15 tandem repeats of the GATA sequence.
Repeat number = 12, or possibly 14??
001 AATTTTTGTATTTTTTTTAGAGACGGGGTTTCACCATGTTGGTCAGGCTGACTATGGAGT
061 TATTTTAAGGTTAATATATATAAAGGGTATGATAGAACACTTGTCATAGTTTAGAACGAA
121 CTAACGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGAT
181 TGATAGTTTTTTTTTATCTCACTAAATAGTCTATAGTAAACATTTAATTACCAATATTTG
241 GTGCAATTCTGTCAATGAGGATAAATGTGGAATCGTTATAATTCTTAAGAATATATATTC
301 CCTCTGAGTTTTTGATACCTCAGATTTTAAGGCC

13 Disputed Paternity (Essen-Möller 1938)
We have DNA data D from a disputed child c, its mother m and the putative father pf
If the true father tf is not pf, he is a “random” alternative father af
Straightforward to compute the evidence (LIKELIHOOD RATIO) in favor of paternity

14 Disputed Paternity LIKELIHOOD RATIO (Essen-Möller 1938)
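For a single marker the likelihood ratio compares P(D | tf = pf) with P(D | tf = af). Because mother and putative father are founders under both hypotheses, their genotype probabilities cancel and only the child's genotype matters. The sketch below assumes no mutation and no silent or missed alleles. The function names and the example trio are hypothetical. The allele frequencies are the vWA values quoted elsewhere in the talk, with p22 = .0003 as stated on the incompatible-triplet slide.

```python
# vWA allele frequencies quoted in the talk (p22 taken from the incompatible-triplet slide).
VWA = {12: .0003, 13: .0018, 14: .1009, 15: .1004, 16: .1949, 17: .2834,
       18: .2162, 19: .0866, 20: .0137, 21: .0015, 22: .0003}

def p_child(child, mother, paternal_gene_dist):
    """P(child genotype | mother's genotype, distribution of the paternal gene)."""
    target = tuple(sorted(child))
    total = 0.0
    for mg in mother:                          # each maternal gene is passed w.p. 1/2
        for pg, p in paternal_gene_dist.items():
            if tuple(sorted((mg, pg))) == target:
                total += 0.5 * p
    return total

def paternity_lr(child, mother, putative_father, freq):
    """Single-marker likelihood ratio in favour of paternity."""
    pf_dist = {}
    for g in putative_father:                  # fair coin flip between pf's two genes
        pf_dist[g] = pf_dist.get(g, 0.0) + 0.5
    # Under non-paternity the paternal gene is a random draw from the population.
    return p_child(child, mother, pf_dist) / p_child(child, mother, freq)

# Hypothetical trio: mother 16/17, child 17/18, putative father homozygous 18.
lr = paternity_lr(child=(17, 18), mother=(16, 17), putative_father=(18, 18), freq=VWA)
print(round(lr, 2))
```

Here the child must have received allele 18 from the father, so the ratio reduces to 1/p18.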

15 MISSING DNA DATA
What if we cannot obtain DNA from the suspect (or other relevant individual)?
Sometimes we can obtain indirect information by DNA profiling of relatives
But analysis is complex and subtle…

16 Network Representation
We have DNA data D from a disputed child c, its mother m and the putative father pf
If pf is not the true father tf, this is a “random” alternative father af
[Network diagram with nodes labelled founder, founder, query, hypothesis and child]
Building blocks: founder, child, query

17 Disputed Paternity Case
[Network diagram with nodes labelled founder, founder, query, hypothesis and child]
Building blocks: founder, child, query

18 Complex Paternity Case
We have DNA from a disputed child c1 and its mother m1, but not from the putative father pf.
We do have DNA from c2, an undisputed child of pf, and from her mother m2, as well as from two undisputed full brothers b1 and b2 of pf.
[Network diagram with founder, child, query and hypothesis nodes]
Building blocks: founder, child, query

19 Criminal Identification Case
A body has been found, burnt beyond recognition, but there is reason to believe it might be that of a missing criminal CR.
DNA is available from the body, from the wife of CR, and from two children c1 and c2 of CR and wife.
[Network diagram with founder, query, hypothesis and child nodes]
Building blocks: founder, child, query

20 Object-Oriented Bayesian Network
HUGIN 6
Each building block (founder / child / query) in a pedigree can be an INSTANCE of a generic CLASS network, which can itself have further structure
The pedigree is built up using simple mouse clicks to insert new nodes/instances and connect them up
Genotype data are entered and propagated using simple mouse clicks

21 Under the microscope…
Each CLASS is itself a Bayesian Network, with internal structure
Recursive: can contain instances of further class networks
Communication via input and output nodes

22 Single-marker analysis
(multiply LRs across markers)
Marker vWA (Austro-German population allele frequencies)
Allele  Frequency
12      .0003
13      .0018
14      .1009
15      .1004
16      .1949
17      .2834
18      .2162
19      .0866
20      .0137
21      .0015
22      .0003
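Since the markers are unlinked, each one can be analysed on its own and the single-marker likelihood ratios multiplied. A minimal sketch of that final step follows. The per-marker values are hypothetical placeholders, not results from the talk.

```python
import math

# Hypothetical single-marker likelihood ratios (placeholders for illustration).
per_marker_lr = {"vWA": 4.6, "D8": 2.1, "D18": 7.5}

# Unlinked markers are treated as independent, so the LRs multiply.
overall_lr = math.prod(per_marker_lr.values())

# Equivalent on the log scale, which is safer when many markers are combined.
log10_lr = sum(math.log10(v) for v in per_marker_lr.values())

print(round(overall_lr, 2), round(log10_lr, 3))
```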

23 Lowest Level Building Blocks
gene: DNA MARKER, having associated repertory of alleles together with their frequencies
genotype: GENOTYPE, consisting of maximum and minimum of paternal and maternal genes
mendel: MENDELIAN SEGREGATION. Child’s gene copies paternal or maternal gene, according to outcome of fair coin flip

24 founder
FOUNDER INDIVIDUAL represented by a pair of genes pgin and mgin (instances of gene), sampled independently from the population distribution and combined in instance gt of genotype

25 child
CHILD INDIVIDUAL
paternal [maternal] gene selected by instance fmeiosis [mmeiosis] of mendel from father’s [mother’s] two genes, and combined in instance cgt of genotype
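The gene, genotype, mendel, founder and child blocks can be mimicked with a few functions returning exact probability tables. This is a plain-Python sketch of the idea, not the HUGIN class networks themselves. The three-allele frequency table is a hypothetical toy.

```python
from collections import defaultdict

def genotype(g1, g2):
    """genotype: (minimum, maximum) of the paternal and maternal genes."""
    return (min(g1, g2), max(g1, g2))

def founder(freq):
    """founder: two genes sampled independently from the population distribution."""
    dist = defaultdict(float)
    for a, pa in freq.items():
        for b, pb in freq.items():
            dist[genotype(a, b)] += pa * pb
    return dict(dist)

def mendel(parent_gt):
    """mendel: fair coin flip between the parent's two genes."""
    dist = defaultdict(float)
    for g in parent_gt:
        dist[g] += 0.5
    return dict(dist)

def child(father_gt, mother_gt):
    """child: one gene from each parent via mendel, combined by genotype."""
    dist = defaultdict(float)
    for pg, pp in mendel(father_gt).items():
        for mg, pm in mendel(mother_gt).items():
            dist[genotype(pg, mg)] += pp * pm
    return dict(dist)

toy_freq = {1: 0.5, 2: 0.3, 3: 0.2}   # hypothetical allele frequencies
print(founder(toy_freq)[(1, 2)])       # heterozygote probability 2 * p1 * p2
print(child((1, 2), (2, 3)))           # four equally likely genotypes
```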

26 query
QUERY INDIVIDUAL
Choice of true father’s paternal gene tfpg [maternal gene mfpg] as either that of f1 or that of f2, according as tf=f1? is true or false.
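The query block is a switch on the hypothesis node. A hedged sketch of how a likelihood ratio could be read off such a node is given below, assuming the hypothesis is given a 50:50 prior so that posterior odds equal the likelihood ratio. The function names and the two likelihood values are hypothetical illustrations.

```python
def query(f1_genes, f2_genes, tf_is_f1):
    """query: the true father's genes are those of f1 if tf=f1? is true, else those of f2."""
    return f1_genes if tf_is_f1 else f2_genes

def hypothesis_posterior(likelihood_given, prior_true=0.5):
    """Posterior probability that tf=f1? is true, given the data likelihood under each value."""
    w_true = prior_true * likelihood_given[True]
    w_false = (1 - prior_true) * likelihood_given[False]
    return w_true / (w_true + w_false)

# Hypothetical likelihoods of the child's genotype under paternity / non-paternity.
post = hypothesis_posterior({True: 0.5, False: 0.1081})
lr = post / (1 - post)   # with a 50:50 prior, posterior odds equal the likelihood ratio
print(query((18, 18), (16, 17), tf_is_f1=True), round(lr, 2))
```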

27 Complex Paternity Case
Measurements for 12 DNA markers on all 6 individuals
Enter data, “propagate” through system
Overall Likelihood Ratio in favour of paternity: 1300
[Network diagram with founder, child, query and hypothesis nodes]

28 MORE COMPLEX DNA CASES
Mutation
Silent/missed alleles,…
Mixed crime stains: rape, scuffle
Multiple perpetrators and stains
Database search
Contamination, laboratory errors
Mixed stains in O. J. Simpson case. Interpretation contested between Defence and Prosecution, but both got it wrong! OJS case also featured alternative explanation: framed by police.

29 MUTATION
mendel + appropriate network mut to describe mutation process

30 e.g. proportional mutation:
founder Prob(otherg) ~ mutation rate mut – or build other, more realistic, models
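One possible reading of the proportional model, offered here as an assumption rather than as the exact network on the slide: with probability mut the transmitted gene is replaced by otherg, a fresh draw from the population (founder) distribution. Under that reading the population frequencies are left unchanged by mutation, which the sketch checks. The function name is hypothetical.

```python
# vWA allele frequencies quoted in the talk (p22 taken from the incompatible-triplet slide).
VWA = {12: .0003, 13: .0018, 14: .1009, 15: .1004, 16: .1949, 17: .2834,
       18: .2162, 19: .0866, 20: .0137, 21: .0015, 22: .0003}

def mutate(gene_dist, freq, mut):
    """Assumed proportional mutation: keep the gene w.p. 1 - mut, else redraw it from freq."""
    return {a: (1 - mut) * gene_dist.get(a, 0.0) + mut * freq[a] for a in freq}

# A parent certainly transmitting allele 12, passed through the mutation step.
after_mutation = mutate({12: 1.0}, VWA, mut=0.005)
print(after_mutation[13])        # small chance that the child receives 13 instead

# Population frequencies are stationary under this model.
stationary = mutate(VWA, VWA, mut=0.005)
```

Note that under this parameterisation the chance that allele a actually changes is mut × (1 − p_a), slightly below mut.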

31 SILENT ALLELES
unobserved + inherited, e.g. 5 = 5/5 or 5/s
Code by additional allele (99)
gene: probsil takes 9 values from 0 to 0.01
  silent := Binomial (1, probsil)
  gene := max (99 * silent, gene0_gene)
genotype:
  gtmax := if (gtmax0 == 99, gtmin, gtmax0)
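The node expressions above translate almost line for line into ordinary code. The following sketch mirrors them with exact distributions instead of sampling. The function names are hypothetical, and the frequency table in the usage lines is a toy.

```python
SILENT = 99   # code for the silent allele; larger than any real repeat number

def gene_dist(freq, probsil):
    """gene := max(99 * silent, gene0), with silent ~ Binomial(1, probsil)."""
    dist = {a: (1 - probsil) * p for a, p in freq.items()}
    dist[SILENT] = probsil
    return dist

def observed_genotype(pg, mg):
    """gtmax := if (gtmax0 == 99, gtmin, gtmax0): a silent allele is masked by its partner."""
    gtmin = min(pg, mg)
    gtmax0 = max(pg, mg)
    gtmax = gtmin if gtmax0 == SILENT else gtmax0
    return (gtmin, gtmax)

print(observed_genotype(5, SILENT))   # 5/s is seen as 5/5
print(observed_genotype(5, 5))        # indistinguishable from the true homozygote
print(gene_dist({5: 0.4, 7: 0.6}, probsil=0.01))   # toy frequencies
```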

32 MISSED ALLELES
unobserved + non-inherited
[Network diagram with nodes labelled geneobs, genotype and geneobs]

33 COMBINATION
Can combine any or all of above features (and others), by using all appropriate subnetworks
Can use any desired pedigree network: no visible difference at top level
Simply enter data (and desired parameter-values) and propagate…

34 Effect of accounting for silent allele
Simple paternity testing
Paternity testing with additional measured individuals

35 Marker vWA (Austro-German population allele frequencies)
Allele  Frequency
12      .0003
13      .0018
14      .1009
15      .1004
16      .1949
17      .2834
18      .2162
19      .0866
20      .0137
21      .0015
22      .0003

36 Simple paternity testing – allowing for silent alleles

37 Paternal incompatibility
mgt = 12/20   pfgt = 13   cgt = 12
with mutation ~ 0.005
pr(silent) LR 3.8 26 30 0.0001 125 127 0.001 203
The mother's and child's genotypes are the same as in Example 9.1, while the putative father's observed allele is now the relatively rare allele 13, with p13 = 0.2%. The combined effects of silence and missingness are displayed in Table 5. The impact of introducing the possibility of silence is overwhelming: for example, when pr(silent) = 0.01% the paternity ratio is 125. Compared with Example 9.1, the greater rarity of the putative father's observed allele now makes the presence of a silent allele still more plausible. However, the sheer magnitude of this effect is perhaps unexpected.
p12 = – rare allele
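The figure of 125 can be checked by brute-force enumeration over the true (possibly silent) genotypes. The sketch assumes silent alleles only: no mutation and no missed alleles. The real allele frequencies are rescaled by 1 − pr(silent), which is an assumption of this sketch. The function names are hypothetical. The vWA frequencies are those quoted in the talk, with p22 = .0003 from the incompatible-triplet slide.

```python
from itertools import product

SILENT = 99   # code for the silent allele

VWA = {12: .0003, 13: .0018, 14: .1009, 15: .1004, 16: .1949, 17: .2834,
       18: .2162, 19: .0866, 20: .0137, 21: .0015, 22: .0003}

def with_silent(freq, probsil):
    """Gene distribution once a silent allele with probability probsil is allowed."""
    dist = {a: (1 - probsil) * p for a, p in freq.items()}
    dist[SILENT] = probsil
    return dist

def observed(g1, g2):
    """What is seen in the profile: the set of non-silent alleles."""
    return frozenset(g for g in (g1, g2) if g != SILENT)

def likelihood(dist, m_obs, pf_obs, c_obs, paternity):
    """P(observed profiles of m, pf and c) under paternity or non-paternity."""
    total = 0.0
    for m1, m2, f1, f2 in product(dist, repeat=4):
        if observed(m1, m2) != m_obs or observed(f1, f2) != pf_obs:
            continue
        p_parents = dist[m1] * dist[m2] * dist[f1] * dist[f2]
        for mg in (m1, m2):                    # maternal gene: fair coin flip
            if paternity:                      # paternal gene: coin flip between pf's genes
                hits = sum(0.5 for pg in (f1, f2) if observed(mg, pg) == c_obs)
            else:                              # paternal gene: random draw from population
                hits = sum(dist[pg] for pg in dist if observed(mg, pg) == c_obs)
            total += p_parents * 0.5 * hits
    return total

def paternity_lr(probsil, m_obs, pf_obs, c_obs, freq=VWA):
    dist = with_silent(freq, probsil)
    args = (dist, frozenset(m_obs), frozenset(pf_obs), frozenset(c_obs))
    return likelihood(*args, paternity=True) / likelihood(*args, paternity=False)

lr = paternity_lr(0.0001, m_obs={12, 20}, pf_obs={13}, c_obs={12})
print(round(lr, 1))
```

Under these assumptions the ratio for pr(silent) = 0.0001 comes out close to the 125 quoted on the slide, and the same function can be rerun for other values of pr(silent).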

38 Maternal incompatibility
mgt = pfgt = 18 cgt = 18
The mother must have passed a silent allele to the child, who must have inherited allele 18 from his father
pr(silent) LR Impossible 4.6 0.0001 0.001
The undisputed mother is apparently incompatible with the child: she must therefore have a missed allele, or have transmitted a silent or mutated allele to her child. Given that p18 = 21% is much larger than any value considered for pr(silent) or pr(missed), we can be pretty sure, first, that both pfgt and cgt are truly homozygous, and then that the child inherited allele 18 from its father. This has probability close to 1 under paternity, and close to p18 = 0.2162 under non-paternity. Correspondingly the paternity ratio is close to 1/0.2162 = 4.6 for any combination of the above explanations. This can be confirmed by calculations (not shown), using our networks.

39 Paternity testing

40 Paternity testing with brother too

41 Consider additional evidence (likelihood ratio) LRB carried by the brother’s data B
Overall likelihood ratio is LRoverall = LRD × LRB, where D denotes data on triplet (pf, c, m)

42 Incompatible triplet
mgt = 12/15   pfgt = 14   cgt = 12
B = 16/20, 12/14, 14, 22
p(silent) LRD LRB 1 0.55 3334 0.5 1.00 1595 0.0001 2.5 404 0.001 7.5 46
*Maximum LRoverall is 1027, at p(silent) = *
p12 = .0003   p22 = .0003

43 Compatible triplet
mgt = 12/15   pfgt = 13   cgt = 12/13
B = 13, 13/16, 21/22, 22
p(silent) LRD LRB 556 1 551 1.00 0.51 0.0001 528 1.02 0.52 0.001 410 1.11 0.61
There is no effect whatsoever when the brother is heterozygous with no allele in common with the child (bgt = {21, 22}); otherwise there is some effect, which is most apparent in column 5, where b is apparently homozygous but different from pfgt: it then becomes more plausible that pf is in fact heterozygous with one silent allele.

44 Extensions
Estimation of mutation rates from paternity data
Peak area data
  mixtures
  contamination
  low copy number

45 Network to estimate mutation rate

46 Mixed crime trace
Marker  Allele  Peak Area (RFUs)
D8      10      6416
D8      11      383
D8      14      5659
D18     13      38985
D18     16      1914
D18     17      1991
D21     59      1226
D21     65      1434
D21     67      8816
D21     70      8894
Suspect alleles in yellow
Excerpt of data on 6 markers from Evett et al. (1998)

47 Mixed crime trace – alleles only
Bayesian network that can be used to infer which of a suspect, victim and up to 6 possible unknown individuals might have contributed DNA to a mixed crime trace.

48 Mixed crime trace – peak areas

49 REFERENCES
Dawid, A. P., Mortera, J., Pascali, V. L. and van Boxel, D. W. (2002). Probabilistic expert systems for forensic inference from genetic markers. Scand. J. Statist. 29, 577–595.
Dawid, A. P. (2003). An object-oriented Bayesian network for estimating mutation rates. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, January 3–6 2003, Key West, Florida, edited by Christopher M. Bishop and Brendan J. Frey. ISBN Online at
Mortera, J., Dawid, A. P. and Lauritzen, S. L. (2003). Probabilistic expert systems for DNA mixture profiling. Theor. Pop. Biol. 63, 191–205.
Cowell, R. G., Lauritzen, S. L. and Mortera, J. (2006). Identification and separation of DNA mixtures using peak area information. Forensic Science International (to appear).
Dawid, A. P., Mortera, J. and Vicard, P. (2007). Object-oriented Bayesian networks for complex forensic DNA profiling problems. Forensic Science International: Genetics 1 (to appear).

50 Mixed crime trace
Marker  Allele  Peak area
D8      10      6416
D8      11      383
D8      14      5659
D18     13      38985
D18     16      1914
D18     17      1991
D21     59      1226
D21     65      1434
D21     67      8816
D21     70      8894
+ 3 more…
LR (alleles only): 25,000
LR (peak areas too): 170,000,000

51 Thanks to:
Julia Mortera
Paola Vicard
Steffen Lauritzen
Robert Cowell
and The Leverhulme Trust

52 and especially to JUDEA PEARL
who made it all possible

