1
Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms
Yufeng Wu, UC Davis
RECOMB 2007
2
Association Mapping of Diseases
[Figure: SNP sequences for cases and controls]
Diploid: two sequences per individual.
Problem: where are the (unobserved) disease mutations?
This talk: a genealogy-based approach.
3
Genealogy: Evolutionary History of Genomic Sequences
- Tells how individuals in a population are related.
- Helps to explain diseases: disease mutations occur on branches, and all descendants carry the mutations.
- Problem: how to determine the genealogy for “unrelated” individuals? Not easy with recombination.
[Figure: genealogy of the individuals in the current population, marking diseased (case) and healthy (control) individuals and the disease mutation]
4
Recombination
- One of the principal genetic forces shaping sequence variation within species.
- Two equal-length sequences generate a third, new sequence of the same length in the genealogy.
[Figure: 110001111111001 (contributes the prefix) and 000110000001111 (contributes the suffix) recombine at a breakpoint to give 110000000001111]
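The single-crossover operation on this slide can be sketched in a few lines of Python. The function name `recombine` and the 0-based `cut` convention are assumptions made for illustration; they do not come from the talk.

```python
def recombine(prefix_parent: str, suffix_parent: str, cut: int) -> str:
    """Single-crossover recombination of two equal-length sequences.

    The child copies positions [0, cut) from prefix_parent and
    positions [cut, end) from suffix_parent.
    """
    if len(prefix_parent) != len(suffix_parent):
        raise ValueError("sequences must have equal length")
    if not 0 <= cut <= len(prefix_parent):
        raise ValueError("cut position out of range")
    return prefix_parent[:cut] + suffix_parent[cut:]


# The two parent sequences shown on the slide, cut after the fifth site.
child = recombine("110001111111001", "000110000001111", 5)
print(child)
```

With the cut placed after the fifth site, the child equals the third sequence shown on the slide.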
5
Ancestral Recombination Graph (ARG)
Assumption: at most one mutation per site.
[Figure: two example genealogies. Labeled "Mutations": S1 = 00, S2 = 01, S3 = 10, S4 = 10. Labeled "Recombination": S1 = 00, S2 = 01, S3 = 10, S4 = 11.]
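Under the at-most-one-mutation-per-site assumption, a pair of sites that shows all four combinations 00, 01, 10 and 11 cannot be explained by mutations alone. This is why the sequence set with S4 = 11 needs a recombination. The check below is commonly called the four-gamete test. It is a sketch of that observation rather than code from the talk, and the function name is a hypothetical choice.

```python
from itertools import combinations


def needs_recombination(sequences):
    """Return True if some pair of sites shows all four gametes.

    Assumes binary sequences (characters '0' and '1') of equal length.
    """
    n_sites = len(sequences[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(s[i], s[j]) for s in sequences}
        if len(gametes) == 4:  # 00, 01, 10 and 11 are all present
            return True
    return False


mutations_only = ["00", "01", "10", "10"]
with_recombination = ["00", "01", "10", "11"]
print(needs_recombination(mutations_only))
print(needs_recombination(with_recombination))
```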
6
Mapping Disease Genes with Inferred Genealogy
“..the best information that we could possibly get about association is to know the full coalescent genealogy…” (Zollner and Pritchard, 2005)
But we do not know the true ARG!
Goal: infer ARGs from sequences for association mapping.
- Not easy, and approximation is often used (e.g. Zollner and Pritchard).
7
The ARG Approaches
First practical ARG association mapping method (Minichiello and Durbin, 2006):
- Uses plausible ARGs: a heuristic.
My work: generate ARGs with a provable property, and work with a well-defined complex disease model.
- minARGs: most parsimonious ARGs, which use the minimum number of recombinations.
- Uniform sampling of minARGs: generate one minARG from the space of all minARGs with equal probability. (Sampling is a scheme often used in genealogy-based approaches.)
8
Counting minARGs by Dynamic Programming (This paper)
Input sequences: 00000, 01000, 01100, 01101, 11100, 00010, 11011, 00011.
Recursion on the choice of the last row to derive:
- 11011 last (factor 1), rows ordered 00000, 01000, 01100, 01101, 11100, 00010, 00011, 11011: N1 = 124.
- 01101 last (factor 2), rows ordered 00000, 01000, 01100, 11100, 00010, 11011, 00011, 01101: N2 = 32.
N = 124*1 + 32*2 = 188. It turns out no other row choices contribute to the minARG space.
Assume only input sequences are generated.
9
[Figure: the input matrix 00000, 01000, 01100, 01101, 11100, 00010, 11011, 00011 (188 minARGs), with 11011 (factor 1) and 01101 (factor 2) as candidate last rows; the row order ending in 11011 has N1 = 124 and the row order ending in 01101 has N2 = 32]
Idea: use the counting of minARGs to select the order in which sequences are generated.
1. Draw a random value, e.g. Rnd = 0.3 < 0.66. Select 11011 with prob = 124/188 = 0.66, and 01101 with prob = 32*2/188 = 0.34.
2. Pick 11011 as the last row to derive.
3. Move to the reduced matrix (N1 = 124).
This can easily be extended to weighted sampling, e.g. to generate less frequent sequences later.
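The selection step can be sketched as a draw proportional to the minARG counts. The counts 124 and 32 and the factors 1 and 2 are taken from the slides. The function name `choose_last_row` and the `(row, count, factor)` layout are assumptions for illustration, and the dynamic program that produces the counts is not shown.

```python
import random


def choose_last_row(candidates, rnd=None):
    """Pick the last row to derive, proportionally to count * factor.

    candidates: list of (row, count, factor) tuples.
    rnd: optional number in [0, 1); drawn at random when omitted.
    """
    weights = [count * factor for _, count, factor in candidates]
    total = sum(weights)
    if rnd is None:
        rnd = random.random()
    cumulative = 0.0
    for (row, _, _), weight in zip(candidates, weights):
        cumulative += weight / total
        if rnd < cumulative:
            return row
    return candidates[-1][0]  # guard against floating-point round-off


candidates = [("11011", 124, 1), ("01101", 32, 2)]
total = sum(count * factor for _, count, factor in candidates)
print(total)
print(choose_last_row(candidates, rnd=0.3))
```

Repeating the same draw on each reduced matrix, with counts recomputed for that matrix, would give one complete derivation order.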
10
ARGs Represent a Set of Marginal Trees
Clear separation of cases/controls: NOT expected for complex diseases!
[Figure: marginal trees with case and control leaves and a possible disease mutation]
11
Realities of Mapping Complex Diseases
[Figure: SNP sequences for cases and controls, with labels 1 and 2]
- Multiple disease mutations!
- Incomplete penetrance.
- Diploid: two sequences per individual.
Trying to find one tree branch that clearly separates cases and controls may not work for complex diseases!
Solution: inference on a well-defined disease model.
12
Complex Disease Model: How a Disease Affects a Population (Zollner & Pritchard, 2005)
- Disease mutations: Poisson process.
- Two alleles: wild-type and mutant.
[Figure: tree whose branches are labeled with the probability that a disease mutation occurs on the branch (computed from mutation rate and branch length): 0.02, 0.05, 0.07, 0.06, 0.08, 0.1, 0.01, 0.03]
A formal model of the complex disease is needed to assess the significance of a chosen marginal tree for real data.
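For a Poisson process with a constant rate, the probability of at least one mutation on a branch is 1 - exp(-rate * length). The sketch below applies that standard formula. The exact parameterisation used by Zollner and Pritchard may differ, and the function name and example values are assumptions.

```python
import math


def branch_mutation_probability(mutation_rate: float, branch_length: float) -> float:
    """Probability of at least one mutation on a branch.

    Assumes mutations arrive as a Poisson process with a constant rate,
    so the number of mutations on the branch is Poisson(rate * length).
    """
    if mutation_rate < 0 or branch_length < 0:
        raise ValueError("rate and length must be non-negative")
    return 1.0 - math.exp(-mutation_rate * branch_length)


# Hypothetical rate and branch lengths, chosen only for illustration.
for length in (0.5, 1.0, 2.0):
    print(length, branch_mutation_probability(0.05, length))
```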
13
Disease Penetrance (Zollner & Pritchard)
- P_{A,1}: probability that a mutant sequence becomes a case; P_{C,1} = 1.0 - P_{A,1}.
- P_{A,0}: probability that a wild-type sequence becomes a case; P_{C,0} = 1.0 - P_{A,0}.
Example: P_{A,1} = 0.8, P_{C,1} = 0.2; P_{A,0} = 0.1, P_{C,0} = 0.9.
[Figure: tree with branch mutation probabilities 0.02, 0.05, 0.07, 0.06, 0.08, 0.1, 0.01, 0.03 and leaves marked cAse or Control]
14
Phenotype Likelihood: How Likely Are the Phenotypes Generated on a Marginal Tree? (Zollner and Pritchard)
The disease model specifies a probabilistic way of assigning phenotypes for a given tree. But we have many trees: at which tree do the disease mutations occur?
Given a tree T and the case/control phenotypes of its leaves, what is the probability of observing these phenotypes on T?
- High phenotype likelihood: disease mutations may occur in T.
- Computable in linear time and adopted in this work.
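One way a linear-time computation of this kind could look is a bottom-up recursion that visits each node once. The sketch below makes several assumptions not stated on the slides. Each leaf is a single (haploid) sequence, the root is wild-type, a mutation on a branch makes every leaf below it mutant, and the per-branch mutation probabilities are given. It is not necessarily the exact procedure of Zollner and Pritchard, and all names are hypothetical.

```python
def phenotype_likelihood(children, root, is_case, p_case_mutant, p_case_wild):
    """Probability of the observed leaf phenotypes on one tree.

    children: dict mapping a node to a list of (child, p_mut) pairs, where
              p_mut is the probability of a disease mutation on that branch.
    is_case:  dict mapping each leaf to True (case) or False (control).
    """
    def leaf_term(leaf, mutant):
        p_case = p_case_mutant if mutant else p_case_wild
        return p_case if is_case[leaf] else 1.0 - p_case

    def visit(node):
        # Returns (likelihood below node if node is wild-type,
        #          likelihood below node if node is mutant).
        branches = children.get(node, [])
        if not branches:
            return leaf_term(node, False), leaf_term(node, True)
        wild, mutant = 1.0, 1.0
        for child, p_mut in branches:
            child_wild, child_mutant = visit(child)
            wild *= (1.0 - p_mut) * child_wild + p_mut * child_mutant
            mutant *= child_mutant
        return wild, mutant

    return visit(root)[0]


# Toy tree: the root has two leaves, "a" (case) and "b" (control).
tree = {"root": [("a", 0.1), ("b", 0.2)]}
status = {"a": True, "b": False}
likelihood = phenotype_likelihood(tree, "root", status, 0.8, 0.1)
print(likelihood)
```

For the toy tree the value is (0.9*0.1 + 0.1*0.8) * (0.8*0.9 + 0.2*0.2) = 0.17 * 0.76 = 0.1292, using the penetrance values P_{A,1} = 0.8 and P_{A,0} = 0.1.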
15
This Paper: Expected Phenotype Likelihood
We need to assess the statistical significance of the computed phenotype likelihood.
- Null model: randomly permute the case/control status of the leaves in the given tree.
- P-value by permutation tests: computational bottleneck!
My result: an O(n^3) algorithm computing the expected value (and variance) of the phenotype likelihood.
- Exact, fully deterministic method.
- But computing the P-value precisely and efficiently remains open.
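The permutation test named as the bottleneck can be sketched as follows. This is the generic Monte Carlo procedure, not the O(n^3) algorithm of the paper. The `statistic` argument stands in for the phenotype likelihood of a fixed tree, and the function name, defaults, and toy statistic are assumptions.

```python
import random


def permutation_p_value(statistic, labels, n_permutations=1000, seed=0):
    """Monte Carlo P-value under random relabeling of the leaves.

    statistic: callable taking a list of labels and returning a number;
               larger values are treated as more extreme.
    labels:    observed case/control labels, one per leaf.
    """
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # one random permutation of the labels
        if statistic(shuffled) >= observed:
            hits += 1
    # Count the observed labeling itself so the estimate is never zero.
    return (hits + 1) / (n_permutations + 1)


# Toy statistic: number of cases among the first two leaves.
labels = ["case", "case", "control", "control", "control", "control"]
p = permutation_p_value(lambda ls: ls[:2].count("case"), labels, n_permutations=200)
print(p)
```

Each permutation re-evaluates the statistic, so the cost grows with the number of permutations. That cost is what motivates computing the expected value and variance exactly instead.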
16
This Paper: Diploid Penetrance Is Hard
Diploid (e.g. humans): two sequences per individual.
Diploid penetrance:
- P_{A,00}: probability that an individual with two wild-type sequences becomes a case.
- P_{A,01}: probability that an individual with one wild-type and one mutant sequence becomes a case.
- P_{A,11}: …
[Figure: case and control individuals]
Efficient computation of phenotype likelihood: stated but unresolved in Zollner and Pritchard.
My result: computing phenotype likelihood with diploid penetrance is NP-hard.
17
Simulation Results
Comparison: TMARG, LATAG (Zollner and Pritchard), MARGARITA (Minichiello and Durbin).
TMARG (my program) and MARGARITA are much faster (20 times or more) than LATAG. This is important for whole-genome scans.
[Figure: average mapping error for 50 simulated datasets from Zollner and Pritchard; average over 50 genealogies]
Date: January, 2007
18
Acknowledgement
Software available at: http://wwwcsif.cs.ucdavis.edu/~wuyu
I want to thank:
- Dan Gusfield
- Dan Brown
- Chuck Langley
- Yun S. Song