Preview What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.

Slides:



Advertisements
Similar presentations
Exact Computation of Coalescent Likelihood under the Infinite Sites Model Yufeng Wu University of Connecticut DIMACS Workshop on Algorithmics in Human.
Advertisements

Preview What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.
Coalescent Module- Faro July 26th-28th 04 Monday H: The Basic Coalescent W: Forest Fire W: The Coalescent + History, Geography.
Evolution & Design Principles in Biology: a consequence of evolution and natural selection Rui Alves University of Lleida Course Website:
Population Genetics, Recombination Histories & Global Pedigrees Finding Minimal Recombination Histories Global Pedigrees Finding.
Recombination and genetic variation – models and inference
Sampling distributions of alleles under models of neutral evolution.
A New Model for Coalescent with Recombination Zhi-Ming Ma ECM2013 PolyU
N-gene Coalescent Problems Probability of the 1 st success after waiting t, given a time-constant, a ~ p, of success 5/20/2015Comp 790– Continuous-Time.
Molecular Evolution Revised 29/12/06
Forward Genealogical Simulations Assumptions:1) Fixed population size 2) Fixed mating time Step #1:The mating process: For a fixed population size N, there.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
Islands in Africa: a study of structure in the source population for modern humans Rosalind Harding Depts of Statistics, Zoology & Anthropology, Oxford.
Inference of Complex Genealogical Histories In Populations and Application in Mapping Complex Traits Yufeng Wu Dept. of Computer Science and Engineering.
Exact Computation of Coalescent Likelihood under the Infinite Sites Model Yufeng Wu University of Connecticut ISBRA
Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms Yufeng Wu UC Davis RECOMB 2007.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Haplotyping via Perfect Phylogeny Conceptual Framework and Efficient (almost linear-time) Solutions Dan Gusfield U.C. Davis RECOMB 02, April 2002.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Continuous Coalescent Model
Dispersal models Continuous populations Isolation-by-distance Discrete populations Stepping-stone Island model.
Inferring Evolutionary History with Network Models in Population Genomics: Challenges and Progress Yufeng Wu Dept. of Computer Science and Engineering.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Probabilistic methods for phylogenetic trees (Part 2)
RECOMB Satellite Workshop, 2007 Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs Yufeng Wu UC Davis.
Combinatorics & the Coalescent ( ) Tree Counting & Tree Properties. Basic Combinatorics. Allele distribution. Polya Urns + Stirling Numbers. Number.
Molecular phylogenetics
Gil McVean Department of Statistics, Oxford Approximate genealogical inference.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Lecture 25 - Phylogeny Based on Chapter 23 - Molecular Evolution Copyright © 2010 Pearson Education Inc.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Population assignment likelihoods in a phylogenetic and demographic model. Jody Hey Rutgers University.
Trees & Topologies Chapter 3, Part 1. Terminology Equivalence Classes – specific separation of a set of genes into disjoint sets covering the whole set.
Genomic diversity and differentiation heading toward exam 3.
Models and their benefits. Models + Data 1. probability of data (statistics...) 2. probability of individual histories 3. hypothesis testing 4. parameter.
Getting Parameters from data Comp 790– Coalescence with Mutations1.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Coalescent Models for Genetic Demography
Estimating Recombination Rates. LRH selection test, and recombination Recall that LRH/EHH tests for selection by looking at frequencies of specific haplotypes.
Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.
FINE SCALE MAPPING ANDREW MORRIS Wellcome Trust Centre for Human Genetics March 7, 2003.
Population genetics. coalesce 1.To grow together; fuse. 2.To come together so as to form one whole; unite: The rebel units coalesced into one army to.
Introduction to Phylogenetic trees Colin Dewey BMI/CS 576 Fall 2015.
Phylogeny Ch. 7 & 8.
Selecting Genomes for Reconstruction of Ancestral Genomes Louxin Zhang Department of Mathematics National University of Singapore.
By Mireya Diaz Department of Epidemiology and Biostatistics for EECS 458.
Coalescent theory CSE280Vineet Bafna Expectation, and deviance Statements such as the ones below can be made only if we have an underlying model that.
Estimating Recombination Rates. Daly et al., 2001 Daly and others were looking at a 500kb region in 5q31 (Crohn disease region) 103 SNPs were genotyped.
Testing the Neutral Mutation Hypothesis The neutral theory predicts that polymorphism within species is correlated positively with fixed differences between.
Restriction enzyme analysis The new(ish) population genetics Old view New view Allele frequency change looking forward in time; alleles either the same.
Bioinf.cs.auckland.ac.nz Juin 2008 Uncorrelated and Autocorrelated relaxed phylogenetics Michaël Defoin-Platel and Alexei Drummond.
Fixed Parameters: Population Structure, Mutation, Selection, Recombination,... Reproductive Structure Genealogies of non-sequenced data Genealogies of.
A Little Intro to Statistics What’s the chance of rolling a 6 on a dice? 1/6 What’s the chance of rolling a 3 on a dice? 1/6 Rolling 11 times and not getting.
Recombination and Pedigrees Genealogies and Recombination: The ARG Recombination Parsimony The ARG and Data Pedigrees: Models and Data Pedigrees & ARGs.
Minimal Recombinations Histories and Global Pedigrees Finding Minimal Recombination Histories Acknowledgements Yun Song - Rune Lyngsø - Mike Steel - Carsten.
An Algorithm for Computing the Gene Tree Probability under the Multispecies Coalescent and its Application in the Inference of Population Tree Yufeng Wu.
Population Genetics As we all have an interest in genomic epidemiology we are likely all either in the process of sampling and ananlysising genetic data.
Montgomery Slatkin  The American Journal of Human Genetics 
COALESCENCE AND GENE GENEALOGIES
L4: Counting Recombination events
Estimating Recombination Rates
Testing the Neutral Mutation Hypothesis
The coalescent with recombination (Chapter 5, Part 1)
Recombination, Phylogenies and Parsimony
Montgomery Slatkin  The American Journal of Human Genetics 
Trees & Topologies Chapter 3, Part 2
Trees & Topologies Chapter 3, Part 2
Outline Cancer Progression Models
The Divergence of Neandertal and Modern Human Y Chromosomes
The Divergence of Neandertal and Modern Human Y Chromosomes
Presentation transcript:

Preview What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.

Haploid Reproduction Model (i.e. no recombination) 122N Individuals are made by sampling with replacement in the previous generation. The probability that 2 alleles have same ancestor in previous generation is 1/2N. The probability that k alleles have less than k-1 ancestors in previous generation is vanishing.

0 recombinations implies traditional phylogeny

Diploid Model with Recombination 12NmNm 12NfNf 12NmNm 12 NfNf FemalesMales

A recombinant sequence will have have two different ancestor sequences in the grandparent. The Diploid Model Back in Time.

1- recombination histories I: Branch length change

1- recombination histories II: Topology change

1- recombination histories III: Same tree

1- recombination histories IV: Coalescent time must be further back in time than recombination time c r

Recombination Histories V: Multiple Ancestries.

Recombination Histories VI: Non-ancestral bridges

Summarising new phenomena in recombination-phylogenies Consequence of 1 recombination Branch length change Topology change No change Time ranking of internal nodes Multiple Ancestries Non-ancestral bridges What is the probability of different histories?

Coalescence +Recombination (Hudson(1983)) r = probability for a recombination within a dinucleotide pr. generation.  = r*(L-1)*4N= Expected number of recombinations/(gene*4N generations). 1. Waiting time backward until first recombination is expo(  ) distributed. ex. gene 1000 bp r = 10 -8, N = 10 4, generation span 30 years. Waiting time for a recombination/coalescence: 10 5 /2*10 4 generations. 2. The position will be chosen uniformly on the gene.

Recombination-Coalescence Illustration Copied from Hudson 1991 Intensities Coales. Recomb. 1 2  3 2  6 2  3 (2+b)  1 (1+b)  0  b

Back-in-Time Process Two kinds of operations on sequence sets going backward in time. Each sequence is consists of intervals and each interval is labelled with subsets of {1,..,k} - possibly the empty set. A recombination takes one sequence and a position and generates two sequences: Coalescent: {5} {1,2,4} {5} {Ø} {1,2,4}{1,4} {Ø} {1,4} {1,2,4}{1,4} {Ø} {5} {6}{Ø} {1,4} {1,2,4} {6} {1,2,4,6} {5,6} Rates: 1 2 k Example: Recombination:  *length of red Coalescent:

Grand Most Recent Common Ancestor: GMRCA (griffiths & marjoram, 96) i. Track all sequences including those that has lost all ancestral material. ii. The G-ARG contains the ARG. The graph is too large, but the process is simpler. Sequence number - k. Birth rate:  *k/2 Death rate: 123k E(events until {1}) = (asymp.) exp(  ) +  log(n)

Properties of Neighboring Trees. (partially from Hudson & Kaplan 1985) /\ /\ / \ /\ \ /\ /\ /\ \ \ ! Leaves Root Edge-Length Topo-Diff Tree-Diff

Old +Alternative Coalescent Algorithm Adding alleles one-by-one to a growing genealogy Old

Spatial Coalescent-Recombination Algorithm (Wiuf & Hein 1999 TPB) 1. Make coalescent for position Wait Expo(Total Branch length) until recombination point, p. 3. Pick recombination point (*) uniformly on tree branches. 4. Let new sequence coalesce into genealogical structure. Continue 1-4 until p > L.

Properties of the spatial process i. The process is non-Markovian ii. The trees cannot be reduced to Topologies * = *

How many Genetic Ancestors does a population have? MRCA No recombination Mitochondria Y-chromosome Recombination X + autosomal chromosomes Present sample: MRCA at each position: (Sample dependent) Recombination- Coalescence Equilibrium: (Sample independent )

Tracing one sequence back in time. From Wiuf & Hein 1997

One realisation of a set of ancestors Number of Ancestors seen as function of sequence length: Number of Segments seen as function of sequence length:

From Wiuf & Hein 1997

Number of ancestors to the Human Genome S  – number of Segments L  – amount of ancestral material on sequence 1.  = 4N e *r – N e : Effective population size, r: expected number of recombinations per generation. Theoretical Results E(S  ) = 1 +  E(L  ) = log(1+  ) P(number of segments in [0,  ]) as  -->infinity) > 0 Applications to Human Genome Parameters used 4N e Chromos. 1: 263 Mb. 263 cM Chromosome 1: Segments Ancestors All chromosomes Ancestors Physical Population Mill.

Gene Conversion Recombination: Gene Conversion:

From Wiuf & Hein 2000

Consequences of Recombination Incompatible Sites Varying Divergence Along Sequence Convergence from high correlation to no correlation along sequence. Topology Shifts along sequences.

Compatibility A T G T G T C 2 A T G T G A T 3 C T T C G A C 4 A T T C G T A i i i i. 3 & 4 can be placed on same tree without extra cost. ii. 3 & 6 cannot Definition: Two columns are incompatible, if they are more expensive jointly, than separately on the cheapest tree. Compatibility can be determined without reference to a specific tree!!

Hudson’s R M (k positions can at most have (k+1) types without recombination) ex. Data set: A underestimate for the number of recombination events: If you equate R M with expected number of recombinations, this would be an analogue to Watterson’s estimators. Unfortunately, R M is a gross underestimate of the real number of recombinations.

Recombination Parsimony T i-1iL 21 Data Trees Recursion:W(T,i)= min T’ {W(T’,i-i) + subst(T,i) + d rec (T,T’)} Fast heuristic version can be programmed.

Recombination Parsimony: Example - HIV Costs: Recombination Substitutions - (2-5)

Likelihood approach to recombination ATTCGTAATGTGT C ATGTGA T CTTCGA C Data Mutation Coalescence Recombination Griffiths,Tavaré (1994), Griffiths, Marjoram (1996) & Fearnhead, Donnelly (2001) i. Probability of Data as function of parameters (likelihood) ii. Statements about sequence history (ancestral analysis) iii. Hypothesis testing iv. Model Testing

Likelihood approach to recombination (Griffiths, Marjoram (1996) )

Summary What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.