Recombination, Phylogenies and Parsimony

Slides:



Advertisements
Similar presentations
Exact Computation of Coalescent Likelihood under the Infinite Sites Model Yufeng Wu University of Connecticut DIMACS Workshop on Algorithmics in Human.
Advertisements

Efficient Computation of Close Upper and Lower Bounds on the Minimum Number of Recombinations in Biological Sequence Evolution Yun S. Song, Yufeng Wu,
Preview What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.
Inferring Local Tree Topologies for SNP Sequences Under Recombination in a Population Yufeng Wu Dept. of Computer Science and Engineering University of.
An Algorithm for Constructing Parsimonious Hybridization Networks with Multiple Phylogenetic Trees Yufeng Wu Dept. of Computer Science & Engineering University.
An introduction to maximum parsimony and compatibility
A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks Luay Nakhleh Department of Computer Sciences UT Austin.
Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants Yufeng Wu and Dan Gusfield UC Davis CPM 2007.
School of CSE, Georgia Tech
Bioinformatics Phylogenetic analysis and sequence alignment The concept of evolutionary tree Types of phylogenetic trees Measurements of genetic distances.
Population Genetics, Recombination Histories & Global Pedigrees Finding Minimal Recombination Histories Global Pedigrees Finding.
GENE TREES Abhita Chugh. Phylogenetic tree Evolutionary tree showing the relationship among various entities that are believed to have a common ancestor.
Preview What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.
Phylogenetic reconstruction
Molecular Evolution Revised 29/12/06
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
D. Gusfield, V. Bansal (Recomb 2005) A Fundamental Decomposition Theory for Phylogenetic Networks and Incompatible Characters.
Inference of Complex Genealogical Histories In Populations and Application in Mapping Complex Traits Yufeng Wu Dept. of Computer Science and Engineering.
Exact Computation of Coalescent Likelihood under the Infinite Sites Model Yufeng Wu University of Connecticut ISBRA
WABI 2005 Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombnation Event Yun S. Song, Yufeng Wu and Dan Gusfield University.
Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms Yufeng Wu UC Davis RECOMB 2007.
Algorithms to Distinguish the Role of Gene-Conversion from Single-Crossover recombination in populations Y. Song, Z. Ding, D. Gusfield, C. Langley, Y.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Haplotyping via Perfect Phylogeny Conceptual Framework and Efficient (almost linear-time) Solutions Dan Gusfield U.C. Davis RECOMB 02, April 2002.
Inferring Evolutionary History with Network Models in Population Genomics: Challenges and Progress Yufeng Wu Dept. of Computer Science and Engineering.
Fast Computation of the Exact Hybridization Number of Two Phylogenetic Trees Yufeng Wu and Jiayin Wang Department of Computer Science and Engineering University.
My wish for the project-examination It is expected to be 3 days worth of work. You will be given this in week 8 I would expect 7-10 pages You will be given.
CSE182-L17 Clustering Population Genetics: Basics.
Estimating and Reconstructing Recombination in Populations: Problems in Population Genomics Dan Gusfield UC Davis Different parts of this work are joint.
Phylogenetic Networks of SNPs with Constrained Recombination D. Gusfield, S. Eddhu, C. Langley.
Building Phylogenies Parsimony 1. Methods Distance-based Parsimony Maximum likelihood.
Algorithms for estimating and reconstructing recombination in populations Dan Gusfield UC Davis Different parts of this work are joint with Satish Eddhu,
RECOMB Satellite Workshop, 2007 Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs Yufeng Wu UC Davis.
Association Mapping by Local Genealogies Bioinformatics Research Center University of Aarhus Thomas Mailund.
Algorithms to Distinguish the Role of Gene-Conversion from Single-Crossover Recombination in Populations Y. Song, Z. Ding, D. Gusfield, C. Langley, Y.
Phylogenetic trees Sushmita Roy BMI/CS 576
Molecular phylogenetics
Haplotype Blocks An Overview A. Polanski Department of Statistics Rice University.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
Linear Reduction for Haplotype Inference Alex Zelikovsky joint work with Jingwu He WABI 2004.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Trees & Topologies Chapter 3, Part 1. Terminology Equivalence Classes – specific separation of a set of genes into disjoint sets covering the whole set.
Estimating and Reconstructing Recombination in Populations: Problems in Population Genomics Dan Gusfield UC Davis Different parts of this work are joint.
What is of interest to calculate ? for open problems Semple and Steel.
Ch.6 Phylogenetic Trees 2 Contents Phylogenetic Trees Character State Matrix Perfect Phylogeny Binary Character States Two Characters Distance Matrix.
More statistical stuff CS 394C Feb 6, Today Review of material from Jan 31 Calculating pattern probabilities Why maximum parsimony and UPGMA are.
Models and their benefits. Models + Data 1. probability of data (statistics...) 2. probability of individual histories 3. hypothesis testing 4. parameter.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Estimating Recombination Rates. LRH selection test, and recombination Recall that LRH/EHH tests for selection by looking at frequencies of specific haplotypes.
Introduction to Phylogenetic trees Colin Dewey BMI/CS 576 Fall 2015.
Meiotic Recombination (single-crossover) PrefixSuffix  Recombination is one of the principal evolutionary forces responsible for shaping genetic variation.
Subtree Prune Regraft & Horizontal Gene Transfer or Recombination.
Estimating Recombination Rates. Daly et al., 2001 Daly and others were looking at a 500kb region in 5q31 (Crohn disease region) 103 SNPs were genotyped.
Fixed Parameters: Population Structure, Mutation, Selection, Recombination,... Reproductive Structure Genealogies of non-sequenced data Genealogies of.
Recombination and Pedigrees Genealogies and Recombination: The ARG Recombination Parsimony The ARG and Data Pedigrees: Models and Data Pedigrees & ARGs.
Minimal Recombinations Histories and Global Pedigrees Finding Minimal Recombination Histories Acknowledgements Yun Song - Rune Lyngsø - Mike Steel - Carsten.
Yufeng Wu and Dan Gusfield University of California, Davis
Algorithms for estimating and reconstructing recombination in populations Dan Gusfield UC Davis Different parts of this work are joint with Satish Eddhu,
An Algorithm for Computing the Gene Tree Probability under the Multispecies Coalescent and its Application in the Inference of Population Tree Yufeng Wu.
WABI: Workshop on Algorithms in Bioinformatics
Gil McVean Department of Statistics
Multiple Alignment and Phylogenetic Trees
L4: Counting Recombination events
Estimating Recombination Rates
ReCombinatorics The Algorithmics and Combinatorics of Phylogenetic Networks with Recombination Dan Gusfield U. Oregon , May 8, 2012.
The coalescent with recombination (Chapter 5, Part 1)
Outline Cancer Progression Models
Presentation transcript:

Recombination, Phylogenies and Parsimony 21.11.05 Overview: The History of a set of Sequences The Ancestral Recombination Graph (ARG) & the minimal ARG Dynamical programming algorithm for finding the minimal ARG Branch and Bound algorithm for minimal ARGs Domains of Application: Sequence Variation Fine scale mapping of disease genes Pathogen Evolution

Mutations, Duplications/Coalescents & Recombinations Duplication/ Coalescent Recombination At most one mutation per position. Make emphasis of labelling by ancestry. NEW FIGURE

“The minimal number of recombinations for a set of sequences” 1 Time accgttgataggaaat…………gta accgttgataggaaat…………gta accgttgataggaaat…………gta

Recombination-Coalescence Illustration Copied from Hudson 1991 Intensities Coales. Recomb. 0  1 (1+b) b 3 (2+b) 6 2 NY FIGUR Lav ogsaa slide med hovedkonklusioner. 3 2 1 2

The 1983 Kreitman Data & the infinite site assumption (M. Kreitman 1983 Nature from Hartl & Clark 1999) Infinite Site Assumption (Otha & Kimura, 1971) Each position is at most hit by one mutation Recoded Kreitman data i. (0,1) ancestor state known. ii. Multiple copies represented by 1 sequences iii. Non-informative sites could be removed

Compatibility i. 3 & 4 can be placed on same tree without extra cost. 1 2 3 4 5 6 7 1 A T G T G T C 2 A T G T G A T 3 C T T C G A C 4 A T T C G T A i i i i. 3 & 4 can be placed on same tree without extra cost. ii. 3 & 6 cannot. Definition: Two columns are incompatible, if they are more expensive jointly, than separately on the cheapest tree. Compatibility can be determined without reference to a specific tree!! 1 4 3 2

Hudson & Kaplan’s RM 1985 (k positions can have at most (k+1) types without recombination) ex. Data set: A underestimate for the number of recombination events: ------------------- --------------- ------- --------- ------- ----- If you equate RM with expected number of recombinations, this could be used as an estimator. Unfortunately, RM is a gross underestimate of the real number of recombinations. Remove

Myers-Griffiths’ RM S Basic Idea: 1 (2002) S Basic Idea: 1 Define R: Rj,k is optimal solution to restricted interval., then: Do the fonts right!! Rj,i Bj,i j i k Rj,k

# of rec events obtained 11 sequences of alcohol dehydrogenase gene in Drosophila melanogaster. Can be reduced to 9 sequences (3 of 11 are identical). 3200 bp long, 43 segregating sites. Methods # of rec events obtained Hudson & Kaplan (1985) 5 Myers & Griffiths (2002) 6 Song & Hein (2002). Set theory based approach. 7 Song & Hein (2003). Current program using rooted trees. We have checked that it is possible to construct an ancestral recombination graph using only 7 recombination events.

Recombination Parsimony Hein, 1990,93 & Song & Hein, 2002+ 1 2 3 T i-1 i L Data Trees Illustrate better

Metrics on Trees based on subtree transfers. Trees including branch lengths Unrooted tree topologies Rooted tree topologies Tree topologies with age ordered internal nodes Pretending the easy problem (unrooted) is the real problem (age ordered), causes violation of the triangle inequality:

Tree Combinatorics and Neighborhoods Observe that the size of the unit-neighbourhood of a tree does not grow nearly as fast as the number of trees Due to Yun Song Allen & Steel (2001) Song (2003+)

1 4 2 3 5 6 7

The Good News: Quality of the estimated local tree ((1,2),(1,2,3)) True ARG 1 2 3 4 5 Reconstructed ARG 1 3 2 4 5 ((1,3),(1,2,3)) n=7 r=10 Q=75

The Bad News: Actual, potentially detectable and detected recombinations Leaves Root Edge-Length Topo-Diff Tree-Diff 2 1.0 2.0 0.0 .666 3 1.33 3.0 0.0 .694 4 1.50 3.66 0.073 .714 5 1.60 4.16 0.134 .728 6 1.66 4.57 0.183 .740 10 1.80 5.66 0.300 .769 15 1.87 6.50 0.374 .790 500 1.99 0.670 1 2 3 4 Minimal ARG True ARG 4 Mb

Branch and Bound Algorithm 289920 0 3 0 1 91 94 2 1314 1312 3 8618 9618 4 30436 30436 5 62794 62794 6 78970 79970 7 63049 63049 8 32451 32451 9 10467 3467 10 1727 1727 Lower bound ? Upper Bound Exact length k k-recombinatination neighborhood AC’s encountered on k-recombi. ARG 1. The number of ancestral sequences in the ACs. 2. Number of ancestral sequences in the ACs for neighbor pairs 3. AC compatible with the minimal ARG. 4. AC compatible with close-to-minimal ARG.

Recombination, Phylogenies and Parsimony Overview: The History of a set of Sequences The Ancestral Recombination Graph (ARG) & the minimal ARG Dynamical programming algorithm for finding the minimal ARG Branch and Bound algorithm for minimal ARGs Domains of Application: Sequence Variation Fine scale mapping of disease genes Pathogen Evolution

References Allen, B. and Steel, M., Subtree transfer operations and their induced metrics on evolutionary trees,Annals of Combinatorics 5, 1-13 (2001) Baroni, M., Grunewald, S., Moulton, V., and Semple, C. Bounding the number of hybridisation events for a consistent evolutionary history. Journal of Mathematical Biology 51 (2005), 171-182 Bordewich, M. and Semple, C. On the computational complexity of the rooted subtree prune and regraft distance. Annals of Combintorics 8 (2004), 409-423 Griffiths, R.C. (1981). Neutral two-locus multiple allele models with recombination. Theor. Popul. Biol. 19, 169-186. J.J.Hein: Reconstructing the history of sequences subject to Gene Conversion and Recombination. Mathematical Biosciences. (1990) 98.185-200. J.J.Hein: A Heuristic Method to Reconstruct the History of Sequences Subject to Recombination. J.Mol.Evol. 20.402-411. 1993 Hein,J.J., T.Jiang, L.Wang & K.Zhang (1996): "On the complexity of comparing evolutionary trees" Discrete Applied Mathematics 71.153-169. Hein, J., Schierup, M. & Wiuf, C. (2004) Gene Genealogies, Variation and Evolution, Oxford University Press Hudson, 1993 Properties of a neutral allele model with intragenic recombination.Theor Popul Biol. 1983 23(2):183-2 Kreitman, M. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster.Nature. 1983 304(5925):412-7. Lyngsø, R.B., Song, Y.S. & Hein, J. (2005) Minimum Recombination Histories by Branch and Bound. Lecture Notes in Bioinformatics: Proceedings of WABI 2005 3692: 239–250. Myers, S. R. and Griffiths, R. C. (2003). Bounds on the minimum number of recombination events in a sample history. Genetics 163, 375-394. Song, Y.S. (2003) On the combinatorics of rooted binary phylogenetic trees. Annals of Combinatorics, 7:365–379 Song, Y.S., Lyngsø, R.B. & Hein, J. (2005) Counting Ancestral States in Population Genetics. Submitted. Song, Y.S. & Hein, J. (2005) Constructing Minimal Ancestral Recombination Graphs. J. Comp. Biol., 12:147–169 Song, Y.S. & Hein, J. (2004) On the minimum number of recombination events in the evolutionary history of DNA sequences. J. Math. Biol., 48:160–186. Song, Y.S. & Hein, J. (2003) Parsimonious reconstruction of sequence evolution and haplotype blocks: finding the minimum number of recombination events, Lecture Notes in Bioinformatics, Proceedings of WABI'03, 2812:287–302. Song YS, Wu Y, Gusfield D. Efficient computation of close lower and upper bounds on the minimum number of recombinations in biological sequence evolution.Bioinformatics. 2005 Jun 1;21 Suppl 1:i413-i422. Wiuf, C. Inference on recombination and block structure using unphased data.Genetics. 2004 Jan;166(1):537-45.