Preview What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.



Haploid Reproduction Model (i.e. no recombination). Each generation consists of individuals 1, 2, ..., 2N.
i. Individuals are made by sampling with replacement from the previous generation.
ii. The probability that 2 alleles have the same ancestor in the previous generation is 1/(2N).
iii. The probability that k alleles have fewer than k-1 ancestors in the previous generation is vanishing (of order 1/N^2).
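Points ii and iii can be checked numerically. The sketch below assumes only the sampling-with-replacement model stated above; the function names are illustrative and not from the slides.

```python
from math import comb, prod

def prob_pair_shares_parent(two_n: int) -> float:
    """Two alleles pick parents independently among 2N; they agree w.p. 1/(2N)."""
    return 1.0 / two_n

def prob_fewer_than_k_minus_1(k: int, two_n: int) -> float:
    """P(k alleles have fewer than k-1 distinct parents in the previous generation)."""
    # All k parents distinct: product of (1 - i/2N) for i = 0..k-1.
    all_distinct = prod(1 - i / two_n for i in range(k))
    # Exactly k-1 distinct parents: one pair shares a parent, the rest are distinct.
    falling = prod(two_n - i for i in range(k - 1))
    exactly_k_minus_1 = comb(k, 2) * falling / two_n ** k
    return 1.0 - all_distinct - exactly_k_minus_1

print(prob_pair_shares_parent(1000))
print(prob_fewer_than_k_minus_1(3, 1000))
```

For k = 3 the second quantity is the probability that all three alleles pick the same parent, 1/(2N)^2, which is why multiple mergers are ignored in the coalescent limit.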

0 recombinations imply a traditional phylogeny

Diploid Model with Recombination. An individual is made by: 1. Picking a random father. 2. Letting that father's two chromosomes recombine to create the individual's paternal chromosome. The maternal chromosome is made in the same way.

The Diploid Model Back in Time. A recombinant sequence has two different ancestor sequences in the grandparent generation.

The ancestral recombination graph (figure; axis labels: N, 1, Time)

1-recombination histories I: Branch length change

1-recombination histories II: Topology change

1-recombination histories III: Same tree

1-recombination histories IV: Coalescent time must be further back in time than recombination time (figure labels: c, r)

Recombination Histories V: Multiple Ancestries.

Recombination Histories VI: Non-ancestral bridges

Summarising new phenomena in recombination-genealogies
Consequence of 1 recombination:
- Branch length change
- Topology change
- No change
- Time ranking of internal nodes
- Multiple ancestries
- Non-ancestral bridges
Recombination genealogies are called "ancestral recombination graphs" (ARGs).
What is the probability of different histories?

Adding Recombination
r: recombination probability per nucleotide pair per generation.
L: sequence length.
R = r*(L-1): recombination probability per allele per generation.
2N_e: number of alleles.
ρ := 4N_e*R: recombination intensity in the scaled process.
Scaling: discrete time and discrete sequence become continuous time and continuous sequence; one sequence step is 1/(L-1) and one time step is 1/(2N_e).
Recombination event: rate ρ/2, so the waiting time is Exp(ρ/2); the position is uniform along the sequence.
Recombination versus mutation: as events, they are distributed identically in position and time. A mutation creates a difference in the sequence; a recombination can create a local shift in the genealogy.
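The scaling can be written out as a small sketch. It assumes the sequence is rescaled to the unit interval [0, 1) and that the waiting time refers to a single lineage; the function names and the example parameter values are illustrative only.

```python
import random

def scaled_recombination_rate(r: float, seq_len: int, n_e: float) -> float:
    """rho = 4 * N_e * R with R = r * (L - 1)."""
    big_r = r * (seq_len - 1)  # recombination probability per allele per generation
    return 4 * n_e * big_r

def sample_recombination_event(rho: float, rng: random.Random):
    """Draw (waiting time, position) for one lineage in the scaled process."""
    waiting_time = rng.expovariate(rho / 2)  # exponential with rate rho/2
    position = rng.random()                  # uniform on the rescaled sequence [0, 1)
    return waiting_time, position

rho = scaled_recombination_rate(r=1e-8, seq_len=1001, n_e=10_000)
print(rho, sample_recombination_event(rho, random.Random(1)))
```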

Recombination-Coalescence Illustration (copied from Hudson 1991)
Intensities:
  Coales.   Recomb.
  1         2ρ
  3         2ρ
  6         2ρ
  3         (2+b)ρ
  1         (1+b)ρ
  0         ρb

Age to oldest most recent common ancestor (from Wiuf and Hein, 1999, Genetics). Figure: age to oldest most recent common ancestor as a function of the scaled recombination rate ρ; sequence axis from 0 kb to 250 kb.

Properties of Neighboring Trees (partially from Hudson & Kaplan 1985). Table columns: Leaves, Topo-Diff, Tree-Diff.

Grand Most Recent Common Ancestor: GMRCA (Griffiths & Marjoram, 96)
i. Track all sequences, including those that have lost all ancestral material.
ii. The G-ARG contains the ARG. The graph is too large, but the process is simpler.
With k sequences (states 1, 2, 3, ..., k): birth rate ρ*k/2, death rate k(k-1)/2.
E(events until {1}) = (asymp.) exp(ρ) + ρ log(k)
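The birth-death description can be simulated directly. The sketch below assumes the death rate is the usual pairwise coalescence rate k(k-1)/2 and counts events of the embedded jump chain; `events_until_gmrca` is an illustrative name, not taken from the slides.

```python
import random

def events_until_gmrca(k: int, rho: float, rng: random.Random) -> int:
    """Count recombination + coalescence events until one lineage remains."""
    events = 0
    while k > 1:
        birth = rho * k / 2        # a recombination adds a lineage
        death = k * (k - 1) / 2    # a coalescence removes a lineage
        if rng.random() < birth / (birth + death):
            k += 1
        else:
            k -= 1
        events += 1
    return events

rng = random.Random(0)
mean_events = sum(events_until_gmrca(10, 1.0, rng) for _ in range(2000)) / 2000
print(mean_events)
```

With ρ = 0 there are no births, so the count is exactly k-1 coalescences; each recombination adds two events (the birth and the extra coalescence it requires).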

Old + Alternative Coalescent Algorithm. Alternative: adding alleles one-by-one to a growing genealogy. (Figure panel label: Old.)

Spatial Coalescent-Recombination Algorithm (Wiuf & Hein 1999 TPB)
1. Make the coalescent for the starting position, p = 0.
2. Wait Exp(total branch length) until the next recombination point, p.
3. Pick the recombination point (*) uniformly on the tree branches.
4. Let the new sequence coalesce into the genealogical structure.
Repeat steps 2-4 until p > L.
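Steps 2 and 3 of the algorithm can be sketched in isolation. This is only a fragment under stated assumptions: the current local tree is represented by a hypothetical dictionary of branch lengths, the exponential rate is taken to be the total branch length as on the slide, and step 4 (coalescing the new lineage into the structure) is not implemented.

```python
import random

def next_recombination(branch_lengths: dict, p: float, rng: random.Random):
    """Return (next point p, branch hit, distance along that branch)."""
    total = sum(branch_lengths.values())
    # Step 2: distance along the sequence to the next recombination point.
    p_next = p + rng.expovariate(total)
    # Step 3: uniform point on the tree = branch chosen in proportion to its
    # length, then a uniform offset along that branch.
    names = list(branch_lengths)
    weights = [branch_lengths[name] for name in names]
    branch = rng.choices(names, weights=weights)[0]
    offset = rng.uniform(0.0, branch_lengths[branch])
    return p_next, branch, offset

tree = {"a": 0.4, "b": 0.4, "c": 1.1, "ab": 0.7}  # hypothetical branch lengths
print(next_recombination(tree, 0.0, random.Random(42)))
```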

Properties of the spatial process
i. The process is non-Markovian.
ii. The trees cannot be reduced to topologies.

Compatibility

       1 2 3 4 5 6 7
  1    A T G T G T C
  2    A T G T G A T
  3    C T T C G A C
  4    A T T C G T A
           i i   i

i. Columns 3 & 4 can be placed on the same tree without extra cost.
ii. Columns 3 & 6 cannot.
Definition: Two columns are incompatible if they are more expensive jointly than separately on the cheapest tree.
Compatibility can be determined without reference to a specific tree!
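For two-state columns, the tree-free check is the four-gamete test: two columns are compatible exactly when fewer than four distinct state pairs occur. The sketch below applies it to the four sequences on this slide, read as rows of seven characters; the helper names are illustrative, and the test is only claimed for columns with two states.

```python
def compatible(col_a, col_b) -> bool:
    """Four-gamete test for two binary columns: compatible iff < 4 distinct pairs."""
    return len(set(zip(col_a, col_b))) < 4

rows = ["ATGTGTC", "ATGTGAT", "CTTCGAC", "ATTCGTA"]

def column(i: int):
    """Column i (1-based) of the data matrix."""
    return [row[i - 1] for row in rows]

print(compatible(column(3), column(4)))  # True: both give the split {1,2} | {3,4}
print(compatible(column(3), column(6)))  # False: {1,2} | {3,4} against {1,4} | {2,3}
```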

Hudson & Kaplan's R_M (k positions can have at most (k+1) types without recombination). R_M is an underestimate of the number of recombination events. If you equate R_M with the expected number of recombinations, this would be an analogue of Watterson's estimator. Unfortunately, R_M is a gross underestimate of the real number of recombinations.
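A common way to compute R_M, sketched here under the assumption of binary (two-state) sites: mark every pair of sites that shows all four gametes, treat each such pair as an interval that must contain a recombination, and count the largest set of intervals that do not overlap (intervals may share an endpoint). The function names are illustrative.

```python
def four_gametes(col_a, col_b) -> bool:
    """True if the two binary columns show all four state pairs."""
    return len(set(zip(col_a, col_b))) == 4

def hudson_kaplan_rm(haplotypes) -> int:
    """Lower bound R_M from a list of equal-length binary haplotype strings."""
    n_sites = len(haplotypes[0])
    cols = [[h[i] for h in haplotypes] for i in range(n_sites)]
    intervals = [(i, j)
                 for i in range(n_sites)
                 for j in range(i + 1, n_sites)
                 if four_gametes(cols[i], cols[j])]
    # Greedy selection by right endpoint gives the maximum number of
    # intervals whose interiors are disjoint.
    intervals.sort(key=lambda iv: iv[1])
    count, last_right = 0, -1
    for left, right in intervals:
        if left >= last_right:
            count += 1
            last_right = right
    return count

print(hudson_kaplan_rm(["000", "010", "100", "110", "001", "011"]))
```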

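Myers-Griffiths' R_M. Basic idea: for positions 1, ..., S, define R, where R_{j,k} is the optimal solution restricted to the interval from j to k. (Figure: positions j, k, i on the axis from 1 to S, with the quantities B_{j,i}, R_{j,k} and R_{j,i}.)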

Recombination Parsimony (figure: data columns 1, 2, ..., i-1, i, ..., L with a local tree T at each column)
Recursion: $W(T,i) = \min_{T'} \{ W(T',i-1) + \mathrm{subst}(T,i) + d_{rec}(T,T') \}$
Initialisation: $W(T,1) = \mathrm{subst}(T,1)$
$W(T,i)$: cost of the history of the first i columns if the local tree at i is T.
$\mathrm{subst}(T,i)$: substitution cost of column i using tree T.
$d_{rec}(T,T')$: recombination distance between T and T'.
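The recursion is a standard dynamic programme over columns. The sketch below assumes the candidate trees are indexed 0..m-1 and that the substitution costs and recombination distances are supplied as precomputed tables (both hypothetical inputs here); taking the minimum of W(T, L) over T as the final answer is an assumption not spelled out on the slide.

```python
def recombination_parsimony(subst, d_rec) -> int:
    """Minimum total cost over all assignments of local trees to columns.

    subst[t][i] : substitution cost of column i on tree t
    d_rec[t][u] : recombination distance between trees t and u (d_rec[t][t] == 0)
    """
    n_trees = len(subst)
    n_cols = len(subst[0])
    # Initialisation: W(T, 1) = subst(T, 1)
    w = [subst[t][0] for t in range(n_trees)]
    # Recursion: W(T, i) = min_T' { W(T', i-1) + d_rec(T, T') } + subst(T, i)
    for i in range(1, n_cols):
        w = [min(w[u] + d_rec[t][u] for u in range(n_trees)) + subst[t][i]
             for t in range(n_trees)]
    return min(w)

# Toy example: two trees, three columns.
subst = [[0, 1, 1],
         [1, 0, 0]]
d_rec = [[0, 1],
         [1, 0]]
print(recombination_parsimony(subst, d_rec))
```

In the toy example the cheapest history uses tree 0 for the first column and pays one recombination to switch to tree 1 for the remaining columns.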

Metrics on Trees based on subtree transfers.
- Trees including branch lengths
- Unrooted tree topologies
- Rooted tree topologies
- Tree topologies with age ordered internal nodes
Pretending the easy problem is the real problem causes violation of the triangle inequality.

Observe that the size of the unit-neighbourhood of a tree does not grow nearly as fast as the number of trees. (Table entries: Allen & Steel (2001); Song (2003); explicit computation; no known formula.)
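The growth gap can be illustrated for unrooted topologies. The number of unrooted binary topologies on n leaves is (2n-5)!!; the neighbourhood size 2(n-3)(2n-7) under subtree transfers is the formula as recalled from Allen & Steel (2001) and should be treated as an assumption here. Function names are illustrative.

```python
def unrooted_topologies(n: int) -> int:
    """Number of unrooted binary tree topologies on n >= 3 labelled leaves: (2n-5)!!."""
    result = 1
    for odd in range(3, 2 * n - 4, 2):
        result *= odd
    return result

def uspr_neighbourhood(n: int) -> int:
    """Unit-neighbourhood size under unrooted subtree transfers (assumed formula)."""
    return 2 * (n - 3) * (2 * n - 7)

for n in (4, 6, 8, 10):
    print(n, unrooted_topologies(n), uspr_neighbourhood(n))
```

The neighbourhood grows quadratically in n while the number of trees grows super-exponentially.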

The 1983 Kreitman Data (M. Kreitman 1983 Nature from Hartl & Clark 1999)

Methods and number of recombination events obtained:
  Hudson & Kaplan (1985): 5
  Myers & Griffiths (2002): 6
  Song & Hein (2002), set theory based approach: 7
  Song & Hein (2003), current program using rooted trees: 7
Data: 11 sequences of the alcohol dehydrogenase gene in Drosophila melanogaster. Can be reduced to 9 sequences (3 of 11 are identical). ... bp long, 43 segregating sites.
We have checked that it is possible to construct an ancestral recombination graph using only 7 recombination events.


Quality of the estimated local tree (due to Yun Song). n = 7, ρ = 10, θ = 75. Figure: true ARG against reconstructed ARG, with local tree splits (1,2) - (3,4,5); (1,2,3) - (4,5); (1,3) - (2,4,5); (1,2,3) - (4,5).

Actual, potentially detectable and detected recombinations (due to Yun Song). n = 8, ρ = 15, θ = 40. Figure: true ARG against minimal ARG along the sequence (kb). Table columns: Leaves, Topo-Diff, Tree-Diff.

Ancestral states (Yun Song). Figure: sequences s_1, ..., s_5, with positions marked x.

Ancestral configurations to 2 sequences with 2 segregating sites (figure labels: k_1, k_2): $(k_2 + 1) k_1 + 1$ possible ancestral columns.

Enumeration of Ancestral States (via counting restricted non-negative integer matrices with given row and column sums). Due to Yun Song.
- Asymptotic growth?
- Enumerating ancestral states in minimal histories?
- Branch and bound method for computing the likelihood?

Ignoring recombination in phylogenetic analysis
- Mimics decelerations/accelerations of evolutionary rates.
- No recombination and infinite recombination both imply a molecular clock.
- General practice in analysis of viral evolution!
(Figure panels: Recombination; Assuming No Recombination.)

Simulated Example

Gene Conversion. Figure panels: Recombination; Gene Conversion. Compatibilities among triples: + - -

Gene Conversions & Treeness. Pairwise distances as sequences get longer and longer, under recombination and under gene conversion. (Figure labels: Coalescent; Star tree.)

Summary: What does recombination do to sequence histories? Probabilities of such histories. Quantities of interest. Detecting & reconstructing recombinations.