CSB 2006 Efficient Computation of Minimum Recombination With Genotypes (Not Haplotypes) Yufeng Wu and Dan Gusfield University of California, Davis.

Presentation transcript:

CSB Efficient Computation of Minimum Recombination With Genotypes (Not Haplotypes) Yufeng Wu and Dan Gusfield University of California, Davis

2 Haplotypes/Genotypes Diploid organisms have two (not necessarily identical) copies of each chromosome. A single copy is a haplotype, a vector of 0s and 1s. The mixed description is a genotype, a vector of 0s, 1s, and 2s. At each site: –If both haplotypes are 0, the genotype is 0 –If both haplotypes are 1, the genotype is 1 –If one is 0 and the other is 1, the genotype is 2. Key fact: genotypes are easier to collect, but many downstream applications work better with haplotypes.
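The encoding rule on this slide can be written as a short function. This is a minimal sketch; the name `genotype_of` is an illustrative assumption, not something from the talk.

```python
def genotype_of(h1, h2):
    """Combine two 0/1 haplotypes into one 0/1/2 genotype.

    Sites where the haplotypes agree keep that value (0 or 1);
    sites where they differ are encoded as 2.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have equal length")
    return [a if a == b else 2 for a, b in zip(h1, h2)]


print(genotype_of([0, 1, 0, 1], [0, 1, 1, 0]))  # [0, 1, 2, 2]
```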

3 Haplotyping [Figure: a genotype over several sites and the haplotype pair obtained by phasing the 2s] Haplotype Inference (HI) Problem: given a set of n genotypes, infer the n real haplotype pairs that form the given genotypes.
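To see why the HI problem is hard to pin down, it helps to count the candidate answers. A genotype with k sites equal to 2 can be phased into 2^(k-1) distinct unordered haplotype pairs (one pair when k = 0). The sketch below counts them; the function names are illustrative assumptions, and each genotype is treated as phased independently of the others.

```python
from math import prod


def count_phasings(genotype):
    """Number of unordered haplotype pairs consistent with one genotype."""
    k = sum(1 for v in genotype if v == 2)
    return 2 ** (k - 1) if k > 0 else 1


def count_hi_solutions(genotypes):
    """Number of ways to pick one haplotype pair per genotype."""
    return prod(count_phasings(g) for g in genotypes)


print(count_hi_solutions([[2, 2, 2], [0, 1, 2], [2, 2, 0]]))  # 8
```

The count grows exponentially with the number of heterozygous sites, which is why an HI method must choose among many solutions.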

4 Two-stage Approach Given a set of genotypes G, we are interested in downstream problems. There are many HI solutions for G. Two-stage: first infer the “correct” HI solution from the genotypes, then do the downstream analysis with the inferred haplotypes. Haplotype inference: extensively studied and believed to be accurate to a certain extent.

5 One-stage Approach What effect does haplotyping inaccuracy have on downstream questions? Our work: directly use genotype data for downstream problems –Without fixing a choice for the HI solution –Minimum recombination problem

6 Recombination: Single Crossover Recombination is one of the principal genetic forces shaping variation within species. Two equal-length sequences generate a third equal-length sequence: a prefix of one followed by a suffix of the other, joined at a breakpoint.
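A single-crossover event is easy to state in code. This is a minimal sketch; the function name and the convention that `breakpoint` counts the sites taken from the first sequence are assumptions made for illustration.

```python
def single_crossover(a, b, breakpoint):
    """Return the recombinant made of a's prefix and b's suffix.

    `breakpoint` is the number of leading sites copied from `a`;
    the remaining sites are copied from `b`.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not 0 <= breakpoint <= len(a):
        raise ValueError("breakpoint out of range")
    return a[:breakpoint] + b[breakpoint:]


print(single_crossover("00000", "11111", 2))  # 00111
```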

7 Kreitman’s Data (1983) Question: what is the minimum number of recombinations needed to derive these sequences? Assume at most 1 mutation per site

8 Minimizing Recombination Compute the minimum number of recombinations (Rmin) for deriving a set of haplotypes, assuming at most 1 mutation per site –NP-hard in general –Heuristics –Lower bounds on Rmin

9 Lower Bounds on Genotypes For a particular recombination lower bound method L, what is the range of possible bounds for L over all possible HI solutions? –MinL(G): minimum L over all HI solutions for G. –MaxL(G): maximum L over all HI solutions for G. This paper: HK bound, connected component bound and relaxed haplotype bound. –Polynomial-time algorithms for MaxHK, MinCC. –Heuristic method for relaxed haplotype bound.
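The definitions of MinL(G) and MaxL(G) can be made concrete by brute force on tiny inputs. The sketch below enumerates every HI solution and evaluates a bound on each. It is exponential and only illustrates the definitions, not the polynomial-time algorithms of the paper. All names are assumptions, and `trivial_bound` is a deliberately simple stand-in for L: it returns 1 if any pair of sites shows all four gametes and 0 otherwise.

```python
from itertools import combinations, product


def phasings(g):
    """All unordered haplotype pairs consistent with genotype g."""
    het = [i for i, v in enumerate(g) if v == 2]
    pairs = []
    # Fix the first heterozygous site to 0 in h1 so each pair is listed once.
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        h1, h2 = list(g), list(g)
        for site, b in zip(het, (0,) + bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def trivial_bound(haps):
    """Stand-in lower bound: 1 if some site pair has all four gametes."""
    m = len(haps[0])
    for p, q in combinations(range(m), 2):
        if len({(h[p], h[q]) for h in haps}) == 4:
            return 1
    return 0


def bound_range(genotypes, bound):
    """(MinL(G), MaxL(G)) by enumerating every HI solution."""
    values = []
    for choice in product(*(phasings(g) for g in genotypes)):
        haps = [h for pair in choice for h in pair]
        values.append(bound(haps))
    return min(values), max(values)


print(bound_range([(2, 2), (2, 2)], trivial_bound))  # (0, 1)
```

In the example, phasing both genotypes the same way gives only two gametes (bound 0), while phasing them differently gives all four (bound 1), so the same genotype data supports a range of bounds.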

10 Lower Bound: Incompatibility Two sites (columns) p, q are incompatible if columns p, q contain all four ordered pairs (gametes): 00, 01, 10, 11. Sites p, q are incompatible ⇒ a recombination must occur between p and q. Incompatibility Graph (IG): a node for each site, an edge between each incompatible pair. [Figure: matrix M with rows a to g and its incompatibility graph]
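The four-gamete test and the resulting incompatibility graph can be sketched as follows, assuming haplotypes are given as equal-length sequences of 0s and 1s and sites are indexed from 0. The function names are illustrative.

```python
from itertools import combinations


def incompatible(haps, p, q):
    """True if columns p and q contain all four gametes 00, 01, 10, 11."""
    return len({(h[p], h[q]) for h in haps}) == 4


def incompatibility_graph(haps):
    """Edges (p, q), p < q, between every incompatible pair of sites."""
    m = len(haps[0])
    return [(p, q) for p, q in combinations(range(m), 2)
            if incompatible(haps, p, q)]


haps = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(incompatibility_graph(haps))  # [(0, 1), (0, 2), (1, 2)]
```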

11 HK Bound (1985) Arrange the nodes of the incompatibility graph on a line in the order in which the sites appear in the sequence. HK bound = maximum number of non-overlapping edges in the incompatibility graph (IG). Easy to compute for haplotype data. [Figure: example with HK lower bound = 1]
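One way to compute the maximum number of non-overlapping edges is the standard greedy for interval scheduling: scan edges by right endpoint and keep each edge that starts at or after the end of the last one kept. The sketch below assumes edges are pairs (p, q) with p < q and that two edges sharing only an endpoint count as non-overlapping, which matches the example E(G) = {12, 23, 35} on the Maximal Incompatibility Graph slide. The function name is illustrative.

```python
def hk_bound(edges):
    """Maximum number of pairwise non-overlapping edges (intervals)."""
    count = 0
    last_right = None
    # Greedy: always take the edge that ends earliest among those that fit.
    for p, q in sorted(edges, key=lambda e: e[1]):
        if last_right is None or p >= last_right:
            count += 1
            last_right = q
    return count


print(hk_bound([(1, 2), (2, 3), (3, 5), (1, 5)]))  # 3
```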

12 IG for HI Solutions [Figure: incompatibility graphs of two HI solutions for the same genotypes, one with HK = 1 and one with HK = 3]

13 HK Bounds on Genotypes Known efficient algorithm for MinHK(G) (Wiuf, 2004). This paper: polynomial-time algorithm for MaxHK(G)

14 Maximal Incompatibility Graph An edge between sites p and q if there is a phasing of p, q so that p and q are incompatible –Each pair of sites is considered independently. E(G): a maximum-sized set of non-overlapping edges in MIG(G). [Figure: genotypes G and MIG(G), with E(G) = {12, 23, 35}]
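Whether a pair of sites gets an edge in MIG(G) can be decided by looking only at those two columns. Rows that are not 2 at both sites contribute fixed gametes; a row that is 2 at both sites contributes either {00, 11} or {01, 10}, depending on the phasing. The sketch below follows that case analysis; the function names are assumptions made for illustration.

```python
from itertools import combinations


def can_be_incompatible(genotypes, p, q):
    """True if some phasing of columns p, q yields all four gametes."""
    forced = set()   # gametes present under every phasing
    double_het = 0   # rows with a 2 at both sites
    for g in genotypes:
        a, b = g[p], g[q]
        if a == 2 and b == 2:
            double_het += 1
        else:
            xs = (0, 1) if a == 2 else (a,)
            ys = (0, 1) if b == 2 else (b,)
            forced |= {(x, y) for x in xs for y in ys}
    if double_het >= 2:
        return True  # phase one row as 00/11 and another as 01/10
    if double_het == 1:
        return (len(forced | {(0, 0), (1, 1)}) == 4
                or len(forced | {(0, 1), (1, 0)}) == 4)
    return len(forced) == 4


def maximal_incompatibility_graph(genotypes):
    m = len(genotypes[0])
    return [(p, q) for p, q in combinations(range(m), 2)
            if can_be_incompatible(genotypes, p, q)]


print(maximal_incompatibility_graph([(2, 2, 0), (0, 0, 0), (1, 1, 1)]))  # [(0, 1)]
```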

15 MaxHK(G) Claim: MaxHK(G) = |E(G)|. MaxHK(G) ≤ |E(G)| –MIG(G): supergraph of IG(H) for any HI solution H. If we can find an HI solution H in which every pair of sites in E(G) is incompatible, then HK(H) ≥ |E(G)|. Together, MaxHK(G) = |E(G)|.

16 Finding such an H Phase sites from left to right. Each component in E(G) is a simple path. Each site is constrained by at most one site to the left. [Figure: MIG(G)]

17 Phasing G for Incompatibility [Figure: partially phased genotype columns for a pair of sites p, q] No matter how a previous site p is phased, we can always phase this site q to make p, q incompatible.
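The left-to-right phasing idea can be sketched as follows. This is an illustrative reading of the slides, not the authors' implementation. It assumes `edges` is a set of non-overlapping MIG edges (p, q) with p < q, so each site q has at most one constraining site p to its left. For rows that are 2 at both p and q, the code chooses q's orientation relative to p ("same" gives gametes 00 and 11, "opposite" gives 01 and 10) to supply whichever gametes are still missing.

```python
def phase_for_incompatibility(genotypes, edges):
    """Return one haplotype pair per genotype so every edge is incompatible."""
    n, m = len(genotypes), len(genotypes[0])
    left = {q: p for p, q in edges}
    h1 = [[0] * m for _ in range(n)]
    h2 = [[0] * m for _ in range(n)]
    all_four = {(0, 0), (0, 1), (1, 0), (1, 1)}
    same, opposite = {(0, 0), (1, 1)}, {(0, 1), (1, 0)}
    for q in range(m):
        p = left.get(q)
        have = set()
        if p is not None:
            # Gametes at (p, q) that appear regardless of phasing.
            for g in genotypes:
                if not (g[p] == 2 and g[q] == 2):
                    xs = (0, 1) if g[p] == 2 else (g[p],)
                    ys = (0, 1) if g[q] == 2 else (g[q],)
                    have |= {(x, y) for x in xs for y in ys}
        for r, g in enumerate(genotypes):
            if g[q] != 2:
                h1[r][q] = h2[r][q] = g[q]
            elif p is not None and g[p] == 2:
                # Orient q relative to however p was already phased.
                if (all_four - have) & same:
                    h1[r][q] = h1[r][p]
                    have |= same
                else:
                    h1[r][q] = 1 - h1[r][p]
                    have |= opposite
                h2[r][q] = 1 - h1[r][q]
            else:
                h1[r][q], h2[r][q] = 0, 1  # unconstrained: any phasing works
    return [(tuple(a), tuple(b)) for a, b in zip(h1, h2)]


G = [(2, 2, 2), (2, 2, 2), (0, 0, 0)]
for pair in phase_for_incompatibility(G, [(0, 1), (1, 2)]):
    print(pair)
```

Because the choice at q depends only on its orientation relative to p, it never needs to revisit how p was phased, which is the point made on the slide.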

18 Haplotyping With Minimum Number of Recombinations Compute Rmin(G) –Haplotyping on a network with the fewest recombinations. NP-hard. This paper: a branch and bound method computing the exact Rmin(G) for data with a small number of sites. APOE data: 47 non-trivial genotypes, 9 sites –Our method: 2 minutes, Rmin(G) = 5

19 Application: Recombination Hotspot Recombination hotspots: regions where the recombination rate is much higher than in neighboring regions. Previous study (Bafna and Bansal, 2005): a recombination lower bound computed on inferred haplotypes was used to identify recombination hotspots. Our work: compute the exact Rmin(G) with genotypes for a sliding window of a small number of SNPs to detect recombination hotspots.
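The sliding-window scan itself is simple to sketch. The code below only shows the windowing, with the per-window score passed in as a function. Computing the exact Rmin(G) is not implemented here, and the `count_twos` scorer in the example is a placeholder with no biological meaning. All names are assumptions made for illustration.

```python
def window_scores(genotypes, width, score, step=1):
    """Apply `score` to each window of `width` consecutive sites.

    Returns a list of (start_site, score) pairs. In the setting of the
    talk, `score` would compute Rmin on the windowed genotypes.
    """
    m = len(genotypes[0])
    results = []
    for start in range(0, m - width + 1, step):
        window = [tuple(g[start:start + width]) for g in genotypes]
        results.append((start, score(window)))
    return results


def count_twos(window):
    """Placeholder scorer: number of heterozygous entries in the window."""
    return sum(row.count(2) for row in window)


print(window_scores([(0, 2, 2, 1), (2, 2, 0, 0)], 2, count_twos))
# [(0, 3), (1, 3), (2, 1)]
```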

20 MS32 data (Jeffreys et al., 2001) [Figure: two panels, result from haplotypes (Bafna and Bansal, 2005) and result from original genotypes (this paper)]

21 Other Applications Finding the true Rmin from genotypes G –Two-stage approach: run PHASE to get an HI solution H, and compute Rmin(H) –One-stage approach: directly compute Rmin(G). Accuracy of haplotype inference on a minimum network. Simulation results: comparable, slightly weaker, and non-conclusive.

22 Summary Main goal of this paper: develop computational tools for the minimum recombination problem with genotypes –Polynomial-time algorithms for the MaxHK and MinCC problems –Practical heuristics for other problems –Simulation results for several application questions are not conclusive –Our tools facilitate the study of these problems

23 Thank You Software: available upon request