Fixed Parameters: Population Structure, Mutation, Selection, Recombination,... Reproductive Structure Genealogies of non-sequenced data Genealogies of.

Presentation transcript:

Coalescent Theory in Biology (www.coalescent.dk)
Fixed Parameters: Population Structure, Mutation, Selection, Recombination, ...
Reproductive Structure
Genealogies of non-sequenced data
Genealogies of sequenced data (TGTTGT, CATAGT, CGTTAT)
Parameter Estimation
Model Testing

Wright-Fisher Model of Population Reproduction
Haploid Model:
i. Individuals are made by sampling with replacement in the previous generation.
ii. The probability that 2 alleles have the same ancestor in the previous generation is 1/(2N).
Diploid Model: Individuals are made by sampling a chromosome from the female and one from the male previous generation with replacement.
Assumptions:
1. Constant population size
2. No geography
3. No selection
4. No recombination
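
The sampling-with-replacement step of the haploid model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the value 2N = 20, and the number of trials are hypothetical choices, not part of the slides. The sketch estimates how often two alleles pick the same parent and compares it with 1/(2N).

```python
import random

def wright_fisher_parents(two_n, k, rng):
    """Pick a parent index (0 .. two_n - 1) for each of k alleles, with replacement."""
    return [rng.randrange(two_n) for _ in range(k)]

def share_parent_frequency(two_n, trials, seed=1):
    """Estimate the probability that 2 alleles have the same parent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = wright_fisher_parents(two_n, 2, rng)
        hits += (a == b)
    return hits / trials

two_n = 20  # hypothetical number of alleles (2N)
freq = share_parent_frequency(two_n, 200_000)
print(freq, 1 / two_n)  # the two numbers should be close
```

A small 2N is used here only so that the event is frequent enough to estimate quickly.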

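P(k) := P{k alleles had k distinct parents} = $\frac{2N(2N-1)\cdots(2N-(k-1))}{(2N)^k} =: \frac{(2N)_{[k]}}{(2N)^k}$
Ancestor choices:
k -> any: $(2N)^k$
k -> k: $(2N)_{[k]}$
k -> k-1: $\binom{k}{2}(2N)_{[k-1]}$
k -> j: $S_{k,j}(2N)_{[j]}$, where $S_{k,j}$ is the number of ways to group k labelled objects into j groups (Stirling numbers of the second kind).
For k << 2N: $P(k) \approx 1 - \binom{k}{2}\frac{1}{2N} \approx e^{-\binom{k}{2}/2N}$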
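
As a numerical check, the exact probability that k alleles have k distinct parents can be compared with the usual exponential approximation exp(-C(k,2)/2N) for k << 2N. The sketch below is illustrative only; the function names and the value 2N = 20,000 are assumptions made for the example.

```python
from math import comb, exp

def prob_distinct_parents(k, two_n):
    """Exact P(k): falling factorial (2N)(2N-1)...(2N-k+1) divided by (2N)^k."""
    p = 1.0
    for i in range(k):
        p *= (two_n - i) / two_n
    return p

def approx_distinct_parents(k, two_n):
    """Approximation exp(-C(k,2)/2N), reasonable when k << 2N."""
    return exp(-comb(k, 2) / two_n)

two_n = 20_000  # hypothetical value of 2N
for k in (2, 5, 10):
    print(k, prob_distinct_parents(k, two_n), approx_distinct_parents(k, two_n))
```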

Waiting for most recent common ancestor - MRCA
Distribution of the time until 2 alleles had a common ancestor, $X_2$:
$P(X_2 > 1) = (2N-1)/2N = 1 - 1/(2N)$
$P(X_2 > j) = (1 - 1/(2N))^j$
$P(X_2 = j) = (1 - 1/(2N))^{j-1}(1/(2N))$
Mean, $E(X_2) = 2N$. Ex.: 2N = [value missing], generation time 30 years, $E(X_2)$ = [value missing] years.
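
The geometric waiting time can be checked numerically. The sketch below assumes a hypothetical 2N = 100 (the numeric example in the slide is not preserved), sums j * P(X_2 = j) over a long range, and converts the mean to years using the 30-year generation time mentioned above.

```python
def prob_wait_exceeds(j, two_n):
    """P(X_2 > j): no common ancestor in any of the first j generations."""
    return (1 - 1 / two_n) ** j

def prob_wait_equals(j, two_n):
    """P(X_2 = j): first common ancestor exactly j generations back."""
    return (1 - 1 / two_n) ** (j - 1) * (1 / two_n)

two_n = 100            # hypothetical value of 2N
generation_years = 30  # generation time from the slide

# Truncated sum for the mean; the neglected tail is vanishingly small here.
mean_generations = sum(j * prob_wait_equals(j, two_n) for j in range(1, 20_000))
print(mean_generations)                     # close to 2N
print(mean_generations * generation_years)  # expected waiting time in years
```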

10 Alleles’ Ancestry for 15 generations

Multiple and Simultaneous Coalescents
1. Simultaneous events.
2. Multifurcations.
3. Underestimation of coalescent rates.

Discrete → Continuous Time
$t_c := t_d / 2N_e$, so one unit of continuous time corresponds to $2N_e$ generations (for example, discrete time 6 becomes $6/2N_e$).

The Standard Coalescent
Two independent processes:
Continuous: exponential waiting times.
Discrete: choosing pairs to coalesce.
[Figure: alternating waiting and coalescing steps for 5 alleles: {1}{2}{3}{4}{5}, {1}{2}{3}{4,5}, {1}{2}{3,4,5}, {1,2}{3,4,5}, {1,2,3,4,5}; resulting tree (1,2)--(3,(4,5)).]
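
The two processes can be combined in a short simulation. This is a minimal sketch, assuming time is measured in units of 2N generations so that j lineages wait an exponential time with rate C(j,2); the function name, seeds, and sample size are hypothetical.

```python
import random
from math import comb

def simulate_coalescent(k, rng):
    """Return (height, history) for k alleles; history lists (time, partition) pairs."""
    blocks = [frozenset([i]) for i in range(1, k + 1)]
    time = 0.0
    history = [(time, list(blocks))]
    while len(blocks) > 1:
        # Continuous part: exponential waiting time with rate C(j, 2).
        time += rng.expovariate(comb(len(blocks), 2))
        # Discrete part: a uniformly chosen pair of lineages coalesces.
        a, b = rng.sample(range(len(blocks)), 2)
        merged = blocks[a] | blocks[b]
        blocks = [blk for idx, blk in enumerate(blocks) if idx not in (a, b)]
        blocks.append(merged)
        history.append((time, list(blocks)))
    return time, history

height, history = simulate_coalescent(5, random.Random(11))
for t, partition in history:
    print(round(t, 3), [sorted(block) for block in partition])

rng = random.Random(7)
heights = [simulate_coalescent(5, rng)[0] for _ in range(20_000)]
mean_height = sum(heights) / len(heights)
print(mean_height)  # compare with the expected height 2 * (1 - 1/5)
```

Keeping the waiting-time draw and the pair choice as separate steps mirrors the independence of the two processes.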

Expected Height and Total Branch Length
Expected total height of tree: $H_k = 2(1 - 1/k)$
i. Infinitely many alleles find 1 ancestor in finite time.
ii. It takes less than twice as long for k alleles to find 1 ancestor as it does for 2 alleles.
Expected total branch length in tree: $L_k = 2(1 + 1/2 + 1/3 + \cdots + 1/(k-1)) \approx 2\ln(k-1)$
[Table: time epoch with j lineages (j = 2, ..., k); expected epoch duration $2/(j(j-1))$; expected branch length contributed $2/(j-1)$.]
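
Both expectations follow from summing over epochs: with j lineages the expected epoch duration is 2/(j(j-1)) in units of 2N generations, and j branches are present during it. A small sketch with exact fractions (function names are illustrative) makes the sums concrete:

```python
from fractions import Fraction
from math import log

def expected_height(k):
    """Sum of expected epoch durations 2/(j(j-1)) for j = 2..k."""
    return sum(Fraction(2, j * (j - 1)) for j in range(2, k + 1))

def expected_total_length(k):
    """Each epoch with j lineages contributes j * 2/(j(j-1)) = 2/(j-1)."""
    return sum(Fraction(2, j - 1) for j in range(2, k + 1))

for k in (2, 5, 25, 100):
    print(k, float(expected_height(k)), float(expected_total_length(k)), 2 * log(k - 1))
```

The last column is the logarithmic approximation, which is rough for small k and tracks the growth of the exact sum for large k.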

Effective Population Size, $N_e$
In an idealised Wright-Fisher model:
i. The fraction of variation retained per generation is 1 - 1/(2N).
ii. The expected waiting time for two random alleles to find a common ancestor is 2N.
Factors that influence $N_e$:
i. Variance in offspring number. In WF it is 1. If the variance is higher, then the effective population size is smaller.
ii. Population size variation. Example, k-cycle $N_1, N_2, \ldots, N_k$: $k/N_e = 1/N_1 + \cdots + 1/N_k$. With $N_1 = 10$, $N_2 = 1000$: $N_e = 2/(1/10 + 1/1000) \approx 19.8$.
iii. Two sexes: $N_e = 4N_fN_m/(N_f + N_m)$. E.g. $N_f = 10$, $N_m$ = [value missing], $N_e \approx 40$.
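
The two formulas for N_e are easy to evaluate directly. In the sketch below the function names are illustrative, and the value N_m = 1000 in the two-sex example is a hypothetical choice, since the slide's own value for N_m is not preserved.

```python
def ne_cycle(sizes):
    """Harmonic mean of a cycle of population sizes: k / N_e = sum of 1 / N_i."""
    return len(sizes) / sum(1 / n for n in sizes)

def ne_two_sexes(n_f, n_m):
    """Effective size with N_f females and N_m males: 4 N_f N_m / (N_f + N_m)."""
    return 4 * n_f * n_m / (n_f + n_m)

print(ne_cycle([10, 1000]))    # dominated by the smallest population size
print(ne_two_sexes(10, 1000))  # N_m = 1000 is a hypothetical value
print(ne_two_sexes(500, 500))  # equal sexes recover the census size
```

In both cases the smaller quantity dominates: a bottleneck generation or a rare sex pulls N_e far below the census size.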

6 Realisations with 25 leaves. Observations: Variation is great close to the root. Trees are unbalanced.

Sampling more sequences. The probability that the most recent common ancestor of the sample of size n is already the ancestor of a sub-sample of size k is $\frac{(k-1)(n+1)}{(k+1)(n-1)}$. Letting n go to infinity gives (k-1)/(k+1), i.e. even for quite small samples it is quite large.
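
To see how quickly this probability grows with k, the sketch below evaluates the standard closed form (k-1)(n+1)/((k+1)(n-1)) and its limit (k-1)/(k+1). The closed form is taken here as an assumption about the formula shown on the slide, and the function names are illustrative.

```python
from fractions import Fraction

def prob_shared_mrca(k, n):
    """P(a sub-sample of size k shares its MRCA with the full sample of size n)."""
    return Fraction((k - 1) * (n + 1), (k + 1) * (n - 1))

def limit_shared_mrca(k):
    """Limit of the probability as n goes to infinity."""
    return Fraction(k - 1, k + 1)

for k in (2, 5, 10, 20):
    print(k, float(prob_shared_mrca(k, 1000)), float(limit_shared_mrca(k)))
```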

Adding Mutations
m: mutation probability per nucleotide per generation. L: sequence length. µ = m*L: mutation probability per allele per generation. $2N_e$: number of alleles. $\theta := 4N_e\mu$: mutation intensity in the scaled process.
[Figure: discrete time and discrete sequence (steps $1/(2N_e)$ in time and 1/L along the sequence) become continuous time and continuous sequence; in the scaled process, mutation has rate $\theta/2$ per lineage and coalescence of a pair has rate 1.]
Probability for two genes being identical: P(Coalescence < Mutation) = $1/(1+\theta)$.
Note: Mutation rate and population size usually appear together as a product, making separate estimation difficult.
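
The identity probability is a race between two exponential clocks: coalescence of the pair at rate 1, and mutation at rate θ/2 on each of the two lineages, so θ in total. A minimal simulation sketch of that race follows; θ = 0.5, the seed, and the function name are hypothetical choices for illustration.

```python
import random

def identical_frequency(theta, trials, seed=3):
    """Estimate P(coalescence happens before any mutation) for two genes."""
    rng = random.Random(seed)
    identical = 0
    for _ in range(trials):
        t_coal = rng.expovariate(1.0)   # pair coalesces at rate 1
        t_mut = rng.expovariate(theta)  # two lineages, rate theta/2 each
        identical += (t_coal < t_mut)
    return identical / trials

theta = 0.5  # hypothetical mutation intensity
freq = identical_frequency(theta, 200_000)
print(freq, 1 / (1 + theta))  # the two numbers should be close
```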

Three Models of Alleles and Mutations
Infinite Allele: i. Only identity, non-identity is determinable. ii. A mutation creates a new type.
Infinite Site: i. Allele is represented by a line. ii. A mutation always hits a new position.
Finite Site: i. Allele is represented by a sequence. ii. A mutation changes nucleotide at chosen position.
[Figure sequences: acgtgctt, acgtgcgt, acctgcat, tcctgcat; acgtgctt, acgtgcgt, acctgcat, tcctggct, tcctgcat.]

Infinite Allele Model

Final Aligned Data Set: Infinite Site Model

Labelling and unlabelling: positions and sequences. [Figure: ignoring mutation position; ignoring sequence label.]
The forward-backward argument: 9 coalescence events incompatible with data; 4 classes of mutation events incompatible with data.

Infinite Site Model: An example Theta=

Impossible Ancestral States

Finite Site Model. Final Aligned Data Set: acgtgctt, acgtgcgt, acctgcat, tcctgcat.

Diploid Model with Recombination
An individual is made by:
1. Picking a random father.
2. Making that father's chromosomes recombine to create the individual's paternal chromosome.
Similarly for the maternal chromosome.

The Diploid Model Back in Time. A recombinant sequence will have two different ancestor sequences in the grandparent generation.

1-recombination histories I: Branch length change

1-recombination histories II: Topology change

1-recombination histories III: Same tree

1-recombination histories IV: Coalescent time (c) must be further back in time than recombination time (r)

Recombination-Coalescence Illustration (copied from Hudson 1991)
Intensities:
Coales.  Recomb.
1        2ρ
3        2ρ
6        2ρ
3        (2+b)ρ
1        (1+b)ρ
0        ρb

Age to oldest most recent common ancestor (from Wiuf and Hein, 1999, Genetics). [Figure: age to oldest most recent common ancestor against scaled recombination rate ρ; sequence from 0 kb to 250 kb.]

Number of genetic ancestors to the Human Genome
$S_\rho$: number of segments. $E(S_\rho) = 1 + \rho$.
[Figure: sequence against time, with recombination (R) and coalescence (C) events.]
Statements about number of ancestors are much harder to make. Simulations.

Applications to Human Genome (Wiuf and Hein, 97)
Parameters used: $4N_e$ [value missing]; chromosome 1: 263 Mb, 263 cM.
[Table, values missing: chromosome 1: segments, ancestors; all chromosomes: ancestors; physical population (mill.).]
A randomly picked ancestor: (ancestral material comes in batteries!) [Figure: ancestral segments along a chromosome, scale in Mb and kb.]

Ignoring recombination in phylogenetic analysis. General practice in analysis of viral evolution!!!
Mimics decelerations/accelerations of evolutionary rates.
No and infinite recombination both imply a molecular clock.
[Figure panels: Recombination; Assuming No Recombination.]

Simulated Example

Genotype and Phenotype Covariation: Gene Mapping
Sampling genotypes and phenotypes.
Genotype: decay of local dependency over time (Reich et al., 2001). Result: the mapping function.
Genotype --> Phenotype function: dominant/recessive, penetrance, spurious occurrence, heterogeneity.
Phenotype: a set of characters; binary decision (0,1); quantitative character.