Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms. Yufeng Wu, UC Davis. RECOMB 2007.

Association Mapping of Diseases. Diploid: two sequences per individual. Problem: Where are the (unobserved) disease mutations? This talk: a genealogy-based approach. [Figure labels: SNPs, cases, controls.]

Genealogy: Evolutionary History of Genomic Sequences. Tells how individuals in a population are related. Helps to explain diseases: disease mutations occur on branches, and all descendants carry the mutations. Problem: How to determine the genealogy for “unrelated” individuals? Not easy with recombination. [Figure labels: individuals in current population; diseased (case); healthy (control); disease mutation.]

Recombination. One of the principal genetic forces shaping sequence variation within species. Two equal-length sequences generate a third, new equal-length sequence in the genealogy. [Figure labels: prefix, suffix, breakpoint.]

Ancestral Recombination Graph (ARG). Assumption: at most one mutation per site. [Figure labels: mutations; recombination; S1 = 00, S2 = 01, S3 = 10, S4 = 10; S1 = 00, S2 = 01, S3 = 10, S4 = (value missing).]
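A minimal sketch of why recombination can be forced under the "at most one mutation per site" assumption: the standard four-gamete test flags a pair of binary sites that shows all four combinations 00, 01, 10 and 11, which no tree with one mutation per site can explain. The function name and the value S4 = 11 are assumptions made for illustration only; the slide's own S4 value is not recoverable from this transcript.

```python
from itertools import combinations

def incompatible_site_pairs(sequences):
    """Return site pairs (i, j) at which all four gametes 00, 01, 10, 11 appear.

    Assumes equal-length binary (0/1) strings.
    """
    n_sites = len(sequences[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(s[i], s[j]) for s in sequences}
        if len(gametes) == 4:  # all four combinations of two binary sites
            pairs.append((i, j))
    return pairs

# Hypothetical input: S4 = "11" is assumed here, not taken from the slide.
with_all_gametes = ["00", "01", "10", "11"]
without = ["00", "01", "10", "10"]
print(incompatible_site_pairs(with_all_gametes))
print(incompatible_site_pairs(without))
```

A non-empty result means at least one recombination is needed between the flagged sites if each site mutates at most once.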

Mapping Disease Gene with Inferred Genealogy. “...the best information that we could possibly get about association is to know the full coalescent genealogy...” (Zollner and Pritchard, 2005). But we do not know the true ARG! Goal: infer ARGs from sequences for association mapping. –Not easy, and an approximation is often used (e.g. Zollner and Pritchard).

The ARG Approaches. First practical ARG association mapping method (Minichiello and Durbin, 2006). –Uses plausible ARGs: heuristic. My work: generates ARGs with a provable property and works on a well-defined complex disease model. –minARGs: most parsimonious ARGs, i.e. those that use the minimum number of recombinations. –Uniform sampling of minARGs: generate one minARG from the space of all minARGs with equal probability. (Sampling is a scheme often used in genealogy-based approaches.)

Counting minARGs by Dynamic Programming (this paper). Recursion: N1 = 124, N2 = 32, so N = 124*1 + 32*2 = 188. It turns out that no other row choices contribute to the minARG space. Assumption: only input sequences are generated.

Idea: use the counts of minARGs (N1 = 124, N2 = 32) to select the order in which sequences are generated. 1. Draw a random value, e.g. Rnd = 0.3; select one row choice with prob = 124/188 = 0.66 and the other with prob = 32*2/188. 2. Pick the selected row as the last row to derive. 3. Move to the reduced matrix. This can easily be extended to weighted sampling, e.g. generating less frequent sequences later.
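The count-proportional selection step can be sketched as follows. This is only an illustration of choosing a row with probability proportional to the number of minARGs it accounts for; the labels `row_a` and `row_b` and the function name are hypothetical, and the counts 124*1 and 32*2 are the ones quoted on the slides.

```python
import random

def pick_last_row(counts, rng):
    """Pick a row label with probability proportional to its minARG count."""
    total = sum(counts.values())
    threshold = rng.random() * total  # uniform in [0, total)
    running = 0
    for row, count in counts.items():
        running += count
        if threshold < running:
            return row
    return row  # only reached through floating-point edge cases

# Hypothetical row labels; the counts come from the slide (N = 188).
counts = {"row_a": 124 * 1, "row_b": 32 * 2}
total = sum(counts.values())
probs = {row: count / total for row, count in counts.items()}
print(total, {row: round(p, 2) for row, p in probs.items()})
print(pick_last_row(counts, random.Random(0)))
```

Repeating this choice on each reduced matrix, with counts supplied by the dynamic program, would draw one minARG uniformly, since every complete sequence of choices is then selected with probability 1/N.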

ARGs Represent a Set of Marginal Trees. A clear separation of cases and controls is NOT expected for complex diseases! [Figure labels: case, control, possible disease mutation.]

Realities of Mapping Complex Diseases. Multiple disease mutations! Incomplete penetrance. Diploid: two sequences per individual. Trying to find one tree branch that clearly separates cases and controls may not work for complex diseases! Solution: inference on a well-defined disease model. [Figure labels: SNPs 1, 2; cases; controls.]

Complex Disease Model: How a Disease Affects a Population (Zollner & Pritchard, 2005). Disease mutations: Poisson process. Two alleles: wild-type and mutant. The probability that a disease mutation occurs on a branch is computed from the mutation rate and the branch length. A formal model of the complex disease is needed to assess the significance of a chosen marginal tree for real data.
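As a hedged illustration of how a branch probability can follow from a mutation rate and a branch length: if mutations arrive as a Poisson process with a constant rate, the chance of at least one mutation on a branch is 1 - exp(-rate * length). The function name and the numeric inputs below are assumptions for illustration; the slides do not give the exact parameterization used.

```python
import math

def branch_mutation_probability(rate, branch_length):
    """P(at least one mutation on a branch) under a Poisson process.

    rate: assumed mutation rate per unit of branch length.
    branch_length: length of the branch in the same units.
    """
    return 1.0 - math.exp(-rate * branch_length)

# Hypothetical values: longer branches are more likely to carry a mutation.
for length in (0.1, 1.0, 5.0):
    print(length, branch_mutation_probability(0.5, length))
```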

Disease Penetrance (Zollner & Pritchard). $P_{A,1}$: probability that a mutant sequence becomes a case; $P_{C,1} = 1 - P_{A,1}$. $P_{A,0}$: probability that a wild-type sequence becomes a case; $P_{C,0} = 1 - P_{A,0}$. Example: $P_{A,1} = 0.8$, $P_{C,1} = 0.2$, $P_{A,0} = 0.1$, $P_{C,0} = 0.9$. [Figure labels: case, control.]

Phenotype Likelihood: How Likely Are the Phenotypes Generated on a Marginal Tree? (Zollner and Pritchard). The disease model specifies a probabilistic way of assigning phenotypes for a given tree. But we have many trees: on which tree do the disease mutations occur? Given a tree T and the case/control phenotypes of its leaves, what is the probability of observing those phenotypes on T? –High phenotype likelihood: disease mutations may occur in T. –Computable in linear time and adopted in this work.
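One way a linear-time computation of this kind can work is a bottom-up pass over the tree that tracks, for every node, the likelihood of the phenotypes below it given that the node is wild-type or mutant. The sketch below is not the author's or Zollner and Pritchard's code. It assumes a wild-type root, an independent mutation probability on each branch, no back mutation, and the haploid penetrance values from the earlier slide; the tree encoding and all names are hypothetical.

```python
def phenotype_likelihood(tree, phenotypes, branch_prob,
                         p_case_mutant, p_case_wild, root="root"):
    """Likelihood of leaf phenotypes on one tree (illustrative model only).

    tree: dict mapping a node to its list of children (leaves are absent).
    phenotypes: dict mapping each leaf to "case" or "control".
    branch_prob: dict mapping each non-root node to the assumed probability
        that a mutation occurs on the branch above it.
    """
    def partial(node):
        # Returns (likelihood | node wild-type, likelihood | node mutant).
        children = tree.get(node, [])
        if not children:
            is_case = phenotypes[node] == "case"
            wild = p_case_wild if is_case else 1.0 - p_case_wild
            mutant = p_case_mutant if is_case else 1.0 - p_case_mutant
            return wild, mutant
        like_wild, like_mutant = 1.0, 1.0
        for child in children:
            c_wild, c_mutant = partial(child)
            p = branch_prob[child]
            like_wild *= (1.0 - p) * c_wild + p * c_mutant
            like_mutant *= c_mutant  # assumption: mutant stays mutant
        return like_wild, like_mutant

    return partial(root)[0]  # assumption: the root is wild-type

# Hypothetical two-leaf tree using the penetrance values from the slides.
tree = {"root": ["a", "b"]}
phenotypes = {"a": "case", "b": "control"}
branch_prob = {"a": 0.5, "b": 0.0}
print(phenotype_likelihood(tree, phenotypes, branch_prob, 0.8, 0.1))
```

Each node is visited once, so the work is linear in the number of nodes; the recursion is used for brevity and would need an explicit stack for very deep trees.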

This Paper: Expected Phenotype Likelihood. We need to assess the statistical significance of the computed phenotype likelihood. –Null model: randomly permute the case/control status of the leaves in the given tree. –P-value by permutation tests: a computational bottleneck! My result: an O(n^3) algorithm that computes the expected value (and variance) of the phenotype likelihood. –Exact, fully deterministic method. –But computing the P-value precisely and efficiently remains open.
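For contrast with the exact O(n^3) result, the permutation baseline described in the null model can be sketched as a Monte Carlo loop. This is a generic permutation test, not the paper's algorithm: the statistic used here (number of cases inside a fixed clade) is a toy stand-in for the phenotype likelihood, and all names are hypothetical.

```python
import random

def permutation_test(labels, statistic, n_perm, rng):
    """Estimate the null mean and a one-sided p-value by shuffling labels.

    labels: dict mapping each leaf to "case" or "control".
    statistic: function taking such a dict and returning a number.
    """
    observed = statistic(labels)
    leaves = list(labels)
    values = [labels[leaf] for leaf in leaves]
    null_scores = []
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)  # permute case/control status over the leaves
        null_scores.append(statistic(dict(zip(leaves, shuffled))))
    null_mean = sum(null_scores) / n_perm
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + sum(s >= observed for s in null_scores)) / (n_perm + 1)
    return observed, null_mean, p_value

def cases_in_clade(labels, clade=("a", "b")):
    # Toy statistic standing in for the phenotype likelihood.
    return sum(labels[leaf] == "case" for leaf in clade)

labels = {"a": "case", "b": "case", "c": "control", "d": "control"}
observed, null_mean, p_value = permutation_test(
    labels, cases_in_clade, 2000, random.Random(42))
print(observed, null_mean, p_value)
```

Every permutation requires a fresh evaluation of the statistic, which is why the slide calls this a bottleneck when the statistic is a tree likelihood and many trees must be tested.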

This Paper: Diploid Penetrance Is Hard. Diploid (e.g. humans): two sequences per individual. Diploid penetrance: $P_{A,00}$: probability that an individual with two wild-type sequences becomes a case; $P_{A,01}$: probability that an individual with one wild-type and one mutant sequence becomes a case; $P_{A,11}$: … Efficient computation of the phenotype likelihood: stated but unresolved in Zollner and Pritchard. My result: computing the phenotype likelihood with diploid penetrance is NP-hard. [Figure labels: case, control.]

Simulation Results. Comparison: TMARG, LATAG (Z. P.), MARGARITA (M. D.). TMARG (my program) and MARGARITA are much faster (20 times or more) than LATAG, which is important for whole-genome scans. Average mapping error for 50 simulated datasets from Zollner and Pritchard. Average over 50 genealogies. Date: January, 2007.

Acknowledgement. Software available at: [link missing]. I want to thank: –Dan Gusfield –Dan Brown –Chuck Langley –Yun S. Song