N-gene Coalescent Problems: probability of the 1st success after waiting t, given a time constant, a ~ p, of success. Comp 790, Continuous-Time Coalescence, 5/20/2015.



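Review N-genes
The likelihood that k genes have distinct lineages, that is, k different parents in the previous generation of 2N genes, is:
$$P(k \text{ distinct parents}) = \frac{2N-1}{2N}\cdot\frac{2N-2}{2N}\cdots\frac{2N-(k-1)}{2N} = \prod_{i=1}^{k-1}\left(1-\frac{i}{2N}\right)$$
The 1st gene can choose its parent freely, but the next k-1 must each choose from the remainder (the genes in the parent generation still without a child).
Manipulating a little:
$$\prod_{i=1}^{k-1}\left(1-\frac{i}{2N}\right) = 1 - \sum_{i=1}^{k-1}\frac{i}{2N} + O\!\left(\frac{1}{N^2}\right) = 1 - \binom{k}{2}\frac{1}{2N} + O\!\left(\frac{1}{N^2}\right)$$
where, for large N, the terms of order $1/N^2$ are negligible.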

Approx N-gene Coalescence
Approximate probability that k genes have different parents:
$$1 - \binom{k}{2}\frac{1}{2N}$$
The probability that two or more have a common parent:
$$p = \binom{k}{2}\frac{1}{2N}$$
Repeated distinct lineages for j generations lead to a geometric distribution with success probability p: the lineages stay distinct for j generations with probability $(1-p)^{j}$, and the first common parent appears exactly in generation j with probability $(1-p)^{j-1}\,p$.
Recall that the 2-gene case had a similar form, but with 1 in place of the combinatorial term. Here the combinatorial term accounts for all k-choose-2 possible pairs, which are treated independently.

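Impact of Approximation
The approximation is not "proper" for all values of k < 2N: the approximate coalescence probability $\binom{k}{2}/(2N)$ exceeds 1 once k(k-1) > 4N.
Consider the following values of N:
(Table with columns N and k; the values are not preserved in this transcript.)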

Fix N and Vary k
Comparing the actual probability to the approximation. (Figure not preserved in this transcript.)
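The comparison on this slide can be reproduced numerically. The sketch below is not taken from the slides: it assumes the standard Wright-Fisher expressions, namely the exact probability that k genes pick k different parents, (2N-1)/2N · (2N-2)/2N · ... · (2N-k+1)/2N, and the approximate coalescence probability C(k,2)/2N. The function names and the choice 2N = 100 are illustrative.

```python
from math import comb


def exact_distinct(k, two_n):
    """Exact probability that k genes pick k different parents among two_n."""
    p = 1.0
    for i in range(1, k):
        p *= (two_n - i) / two_n
    return p


def approx_coalescence(k, two_n):
    """Approximate probability that at least two of k genes share a parent."""
    return comb(k, 2) / two_n


# Fix 2N (an arbitrary illustrative value) and vary the sample size k.
two_n = 100
for k in (2, 3, 5, 10, 15, 20):
    exact = 1.0 - exact_distinct(k, two_n)
    approx = approx_coalescence(k, two_n)
    flag = "  <- not a valid probability" if approx > 1.0 else ""
    print(f"k={k:2d}  exact={exact:.4f}  approx={approx:.4f}{flag}")
```

The approximation is the sum of the pairwise probabilities, so it never falls below the exact value, and for k large relative to 2N it grows past 1.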

Concrete Example
In a population of 2N = 10, the probability that 3 genes have one ancestor in the previous generation is:
$$\frac{10}{10}\cdot\frac{1}{10}\cdot\frac{1}{10} = 0.01$$
(The 1st gene can choose its parent freely, while the next 2 must choose the same one.)
The probability that all 3 have a different ancestor is:
$$\frac{10}{10}\cdot\frac{9}{10}\cdot\frac{8}{10} = 0.72$$
(The 1st gene can choose its parent from the 10, while the next 2 must choose from the remainder.)
The remaining probability, 1 - 0.01 - 0.72 = 0.27, is that the 3 genes have two parents in the previous generation.

Example Continued
The probability that 2 or more genes have a common parent in the previous generation is:
$$0.27 + 0.01 = 0.28$$
(The probability that exactly 2 have a common parent plus the probability that all 3 have a common parent.)
By our approximation, the probability that two or more genes share a common parent is:
$$\binom{3}{2}\frac{1}{10} = 0.30$$
Leads to an MRCA estimate of (value not preserved in this transcript).
Error in approximation for k = 3, 2N = 10: 0.30 against the exact 0.28.
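The numbers in this example are small enough to check by brute force. The following sketch, with a hypothetical helper name, enumerates all 10^3 equally likely parent assignments for k = 3 genes in a population of 2N = 10 and tallies how many distinct parents each assignment uses. It assumes every gene picks its parent uniformly and independently, as in the Wright-Fisher model.

```python
from collections import Counter
from itertools import product


def parent_count_distribution(k, two_n):
    """Map each possible number of distinct parents to its probability.

    Enumerates all two_n**k assignments of parents to the k genes,
    each assignment being equally likely.
    """
    counts = Counter(
        len(set(parents)) for parents in product(range(two_n), repeat=k)
    )
    total = two_n ** k
    return {m: c / total for m, c in sorted(counts.items())}


dist = parent_count_distribution(3, 10)
print(dist)                 # probabilities of 1, 2 and 3 distinct parents
print(dist[1] + dist[2])    # probability that 2 or more genes share a parent
```

The enumeration gives 0.01, 0.27 and 0.72 for one, two and three distinct parents, so the exact probability of at least one shared parent is 0.28, against 0.30 from the approximation.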

For Large N and Small k
For 2N > 100, the agreement improves, so long as k << 2N.
The advantage of the approximation is that it fits the "form" of a geometric distribution, and thus can be generalized to a continuous-time model.

Continuous-time Coalescent
In the Wright-Fisher model, time is measured in discrete units: generations.
A continuous-time approximation is conceptually more useful and, via the given approximation, computationally simple.
Moreover, a continuous model can be constructed that is independent of the population size (2N), so long as our sample size, k, is much smaller (one of those rare cases where a small sample size simplifies matters).
The only time we will need to consider population size (2N) is when we want to convert from time back into generations.

Continuous-time Derivation
As before, let $t = j/(2N)$, where j is now time measured in generations.
It follows that j = 2Nt translates continuous time, t, back into generations j. In practice floor(2Nt) is used to assign a discrete generation number.
The waiting time, $T_k$, for k genes to have k - 1 or fewer ancestors is exponentially distributed, $T_k \sim \mathrm{Exp}\!\left(\binom{k}{2}\right)$, derived from t = j/2N, M = 2N and
$$\lim_{M\to\infty}\left(1-\frac{a}{M}\right)^{Mt} = e^{-at}, \qquad a = \binom{k}{2}$$
Giving:
$$P(T_k \ge t) = e^{-\binom{k}{2}\,t}$$
This is the probability that k genes will have k-1 or fewer ancestors at some time greater than or equal to t.
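The limit behind this derivation can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original slides: it takes the per-generation coalescence probability to be the approximation C(k,2)/2N, converts continuous time to generations with floor(2Nt) as described, and compares the resulting discrete probability of no coalescence with the exponential tail exp(-C(k,2) t). The function names and the test values k = 3, t = 0.5 are arbitrary.

```python
from math import comb, exp, floor


def discrete_survival(k, two_n, t):
    """P(no coalescence among k lineages for floor(2N*t) generations).

    Uses the approximate per-generation coalescence probability C(k,2)/2N.
    """
    generations = floor(two_n * t)
    return (1.0 - comb(k, 2) / two_n) ** generations


def continuous_survival(k, t):
    """Exponential tail P(T_k >= t) with rate C(k,2)."""
    return exp(-comb(k, 2) * t)


k, t = 3, 0.5
for two_n in (20, 200, 2000, 20000):
    d = discrete_survival(k, two_n, t)
    c = continuous_survival(k, t)
    print(f"2N={two_n:6d}  discrete={d:.5f}  continuous={c:.5f}")
```

As 2N grows with k held fixed, the discrete value approaches the exponential one, which is why the population size drops out of the continuous-time model.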

Visualization
Plots of the distribution of the waiting time $T_k$ for k = 3, 4, 5, 6. (Figure not preserved in this transcript; one curve each for k=3, k=4, k=5, k=6.)

Continuous Coalescent Time Scale
In the continuous-time model, the time constant is a measure of ancestral population size, with the original at time 0, ½ the original at time 0.5, and ¼ at 1.0.
(Figure not preserved in this transcript: a time axis t marked at 0, N, 2N and 2.6N, with the label "Population size".)

A Coalescent Model
The continuous coalescent lends itself to generative models.
The following algorithm constructs a plausible genealogy for n genes.
This model is backwards: it begins from the current population and posits ancestry, in contrast to a forward algorithm like those used in the first lecture.
1. Start with k = n genes.
2. Simulate the waiting time, $T_k \sim \mathrm{Exp}\!\left(\binom{k}{2}\right)$, to the next event.
3. Choose a random pair (i, j) with 1 ≤ i < j ≤ k uniformly among the $\binom{k}{2}$ pairs.
4. Merge i and j into one gene and decrease the sample size by one, k → k - 1.
5. Repeat from step 2 while k > 1.
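The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name, the event tuple layout and the labelling of ancestors with fresh integers are all assumptions made here. The waiting time is drawn with `random.expovariate`, whose argument is the rate, taken to be C(k,2).

```python
import random
from math import comb


def simulate_coalescent(n, rng=None):
    """Simulate one genealogy for n genes, backwards in time.

    Returns a list of events (time, child_a, child_b, ancestor), where
    time is cumulative continuous time (units of 2N generations).
    """
    rng = rng or random.Random()
    lineages = list(range(n))      # step 1: start with k = n genes
    next_label = n                 # label for the next ancestor created
    t = 0.0
    events = []
    while len(lineages) > 1:       # step 5: repeat while k > 1
        k = len(lineages)
        t += rng.expovariate(comb(k, 2))   # step 2: T_k ~ Exp(rate C(k,2))
        i, j = rng.sample(range(k), 2)     # step 3: uniform random pair
        a, b = lineages[i], lineages[j]
        # step 4: merge the pair into one ancestor, so k drops by one
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(next_label)
        events.append((t, a, b, next_label))
        next_label += 1
    return events


for event in simulate_coalescent(5, random.Random(0)):
    print(event)
```

The time of the last event is the height of the tree. To express it in generations, multiply by 2N and take the floor, as in the continuous-time derivation.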

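Properties of a Coalescent Tree
The height, $H_n$, of the tree is the sum of the time epochs, $T_j$, during which there are j = n, n-1, n-2, ..., 2 ancestors:
$$H_n = \sum_{j=2}^{n} T_j$$
The distribution of $H_n$ amounts to a convolution of the exponential variables, whose result is:
$$P(H_n \le t) = \sum_{k=1}^{n} e^{-\binom{k}{2}\,t}\;\frac{(-1)^{k-1}\,(2k-1)\,F_k(n)}{G_k(n)}$$
where
$$F_k(n) = n(n-1)\cdots(n-k+1), \qquad G_k(n) = n(n+1)\cdots(n+k-1)$$
with
$$E(H_n) = \sum_{j=2}^{n} \frac{1}{\binom{j}{2}} = 2\left(1-\frac{1}{n}\right)$$
As n → ∞, E(H_n) → 2, and, if n = 2, E(H_2) = 1. Thus, the expected waiting time for n genes to find their common ancestor is less than twice the expected time for 2.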
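The limits quoted on this slide follow from the mean of each exponential epoch, E(T_j) = 1/C(j,2), and can be verified directly. The sketch below sums those means and also evaluates a closed form for P(H_n ≤ t). The closed form used is the standard alternating sum over k = 1..n of exp(-C(k,2) t) (-1)^(k-1) (2k-1) F_k(n)/G_k(n), with F_k(n) = n(n-1)...(n-k+1) and G_k(n) = n(n+1)...(n+k-1). That this is the expression the slide displayed is an assumption, and the function names are illustrative.

```python
from math import comb, exp


def expected_height(n):
    """E(H_n): sum of the epoch means 1/C(j,2) for j = 2..n."""
    return sum(1.0 / comb(j, 2) for j in range(2, n + 1))


def height_cdf(n, t):
    """P(H_n <= t) from the alternating-sum closed form (assumed form)."""
    total = 0.0
    for k in range(1, n + 1):
        falling = 1.0   # F_k(n) = n (n-1) ... (n-k+1)
        rising = 1.0    # G_k(n) = n (n+1) ... (n+k-1)
        for i in range(k):
            falling *= n - i
            rising *= n + i
        sign = (-1) ** (k - 1)
        total += exp(-comb(k, 2) * t) * sign * (2 * k - 1) * falling / rising
    return total


for n in (2, 5, 10, 100):
    print(n, expected_height(n), 2 * (1 - 1 / n))
print(height_cdf(2, 1.0), 1 - exp(-1.0))   # for n = 2 the height is Exp(1)
```

For n = 2 the closed form reduces to 1 - exp(-t), which matches the single exponential waiting time of the 2-gene case.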
