Lecture 3: population genetics I: mutation and recombination

Genome evolution, Lecture 3: population genetics I: mutation and recombination

Population genetics
Drift: the process by which allele frequencies change across generations through random sampling
Mutation: the process by which new alleles are introduced
Recombination: the process by which multi-allelic genomes are mixed
Selection: the effect of fitness on the dynamics of allele frequencies
Epistasis: the effects of fitness dependencies among different alleles
"Organismal" effects: ecology, geography, behavior

Wright-Fisher model for genetic drift
N individuals → ∞ gametes → N individuals → ∞ gametes → ...
We follow the number of copies X of an allele A in the population until fixation (X = 2N) or loss (X = 0).
We can model X as a Markov process whose transition probabilities come from sampling j copies of A out of a population of 2N gene copies that currently holds i copies of A:
$$P_{ij} = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}$$
In a larger population the frequency changes more slowly: the variance of the sampled frequency is pq/2N (with p = i/2N and q = 1 - p), so one round of sampling shifts it only slightly.
[Diagram: states 0 (loss), 1, ..., 2N-1, 2N (fixation)]
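As a minimal sketch of the Wright-Fisher chain described above, the binomial transition probability and a direct forward simulation can be written as follows. The function names and the use of a seeded `random.Random` are choices made for this illustration, not part of the lecture.

```python
import math
import random

def wf_transition(i, j, two_n):
    """P(X_{t+1} = j | X_t = i): binomial sampling of 2N gene copies."""
    p = i / two_n
    return math.comb(two_n, j) * p**j * (1 - p)**(two_n - j)

def wf_simulate(i0, two_n, rng):
    """Run the chain from i0 copies until loss (0) or fixation (2N).

    Returns (final_count, generations_elapsed).
    """
    x, t = i0, 0
    while 0 < x < two_n:
        p = x / two_n
        # Each of the 2N gene copies in the next generation is A with prob. p.
        x = sum(1 for _ in range(two_n) if rng.random() < p)
        t += 1
    return x, t

if __name__ == "__main__":
    rng = random.Random(1)
    runs = [wf_simulate(5, 20, rng) for _ in range(2000)]
    fixed = sum(1 for x, _ in runs if x == 20)
    print("fraction fixed:", fixed / len(runs))  # neutral theory predicts about 5/20
```

Each row of the transition matrix sums to 1 and has mean i, which is the sense in which drift leaves the expected allele count unchanged.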

Mutations vs. drift
Diversity (θ) is measured through the chance that two randomly chosen individuals carry the same genotype: the lower that chance, the higher the diversity.
Mutations generate diversity in the population.
Drift eliminates the population's diversity through fixation.
Mutation happens at some biologically dependent rate μ (more on that later in the course).
Fixation of a neutral allele takes on the order of 4N generations.
What will the population look like given both forces?

Stationary distribution when drift is dominating
If mutation is slow compared to drift, we can model the whole population as a single random variable. Evolution is then a Markov process on two or more states of that variable.
Simplest model: assume two alleles, A and a, with a mutation probability in each direction. [Equations and A ⇄ a diagram not preserved in the transcript.]
If the process runs long enough, it converges to a stationary distribution.
Remember: under these assumptions, whenever we sample the population we are likely to find it entirely in state A or entirely in state a.
Think: what conditions on the mutation rate can justify this model?
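The two-allele chain can be illustrated with a short sketch. The symbols `u` (probability of A → a per step) and `v` (probability of a → A per step) are hypothetical names introduced here, since the slide's own notation did not survive. For such a chain the stationary distribution is (v/(u+v), u/(u+v)), and iterating the chain from any start converges to it.

```python
def stationary_two_state(u, v):
    """Stationary (pi_A, pi_a) when A->a has probability u and a->A has probability v."""
    return v / (u + v), u / (u + v)

def iterate_chain(u, v, steps, start=(1.0, 0.0)):
    """Push a distribution over (A, a) through the chain for a number of steps."""
    p_a_upper, p_a_lower = start  # P(state A), P(state a)
    for _ in range(steps):
        p_a_upper, p_a_lower = (
            p_a_upper * (1 - u) + p_a_lower * v,
            p_a_upper * u + p_a_lower * (1 - v),
        )
    return p_a_upper, p_a_lower

if __name__ == "__main__":
    print(stationary_two_state(0.01, 0.03))
    print(iterate_chain(0.01, 0.03, 2000))
```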

What happens when mutations are rapid?
If mutation is rapid compared to drift, we lose all population structure: this is just a random mixing process.
Evolution cannot work in this way, because information must be propagated.
In practice, populations maintain a non-trivial balance between mutation and drift.
But we do not know the mutation rate (or the effective population size).

A coalescent model approach: infinite alleles model
When alleles were measured at the protein level, it was reasonable to assume that mutations generate new variants (isozymes), never reversing or repeating a variant.
Adding mutations with probability μ per generation, the coalescent process is extended by killing lineages (time is sped up by a factor of 2N). Going back in time with k lineages:
Coalescence: rate $k(k-1)/2$
Mutation: rate $k\theta/2$, with $\theta = 4N\mu$
This is the "coalescent with killing".

Hoppe's urn
Probability model (Hoppe's urn): we select from an urn holding one black ball of mass θ and n balls of other colors, each of mass 1.
Each time the black ball is selected (probability θ/(n+θ)), a new ball with a new color is added to the urn.
If a colored ball is selected (probability 1/(n+θ) for each ball), the selected ball and another ball of the same color are returned to the urn.
Theorem: Hoppe's urn and the coalescent with killing are equivalent.
(This is also known as the Chinese restaurant process.)
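The urn scheme can be sketched directly. This is an illustrative simulation only: the function name and the integer color labels are assumptions made here, and θ stands for the mass of the black ball (the scaled mutation rate 4Nμ).

```python
import random
from collections import Counter

def hoppe_urn(n, theta, rng):
    """Draw n colored balls from Hoppe's urn; return the list of color labels."""
    colors = []
    next_color = 0
    for k in range(n):  # k = number of colored balls currently in the urn
        if rng.random() < theta / (k + theta):
            # Black ball drawn: introduce a brand-new color (a new allele).
            colors.append(next_color)
            next_color += 1
        else:
            # Colored ball drawn uniformly: add another ball of the same color.
            colors.append(rng.choice(colors))
    return colors

if __name__ == "__main__":
    sample = hoppe_urn(20, 1.5, random.Random(7))
    print("allele counts:", sorted(Counter(sample).values(), reverse=True))
```

The first draw always creates a new color, because the urn then holds only the black ball.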

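Testing the infinite alleles model
Theorem (Ewens sampling formula): let $a_i$ be the number of alleles present i times in a sample of size n. When the scaled mutation rate is $\theta = 4N\mu$,
$$P(a_1, \ldots, a_n) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^{n} \frac{(\theta/j)^{a_j}}{a_j!}$$
A simpler statistic is the number of distinct alleles $K_n$, which has the expected value
$$E[K_n] = \sum_{i=1}^{n} \frac{\theta}{\theta + i - 1}$$
Proof: at step i of Hoppe's process the urn holds i-1 colored balls, so we draw the black ball (and create a new allele) with probability $\theta/(\theta + i - 1)$. Summing over the n steps gives the expectation.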
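A small numerical sketch of the two quantities named on this slide, assuming the standard form of the Ewens sampling formula with θ = 4Nμ. The function names are illustrative. The allele configuration is passed as a list whose entry j-1 holds the number of alleles seen exactly j times.

```python
import math

def expected_num_alleles(n, theta):
    """E[K_n]: sum over urn steps of the chance of drawing the black ball."""
    return sum(theta / (theta + i) for i in range(n))

def ewens_probability(a, theta):
    """Ewens sampling formula for a configuration a = [a_1, a_2, ..., a_n]."""
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    rising = math.prod(theta + i for i in range(n))  # theta (theta+1) ... (theta+n-1)
    prob = math.factorial(n) / rising
    for j, a_j in enumerate(a, start=1):
        prob *= (theta / j) ** a_j / math.factorial(a_j)
    return prob

if __name__ == "__main__":
    theta = 1.0
    # All configurations for n = 3: three singletons, one singleton plus one pair, one triple.
    configs = [[3, 0, 0], [1, 1, 0], [0, 0, 1]]
    print([ewens_probability(c, theta) for c in configs])
    print(expected_num_alleles(3, theta))
```

Summing the formula over all configurations of a given sample size gives 1, which is a convenient sanity check.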

Testing the infinite alleles model
[Figures 7.16 and 7.17; panels labeled "Not quite neutral" and "Highly non neutral"]
VNTR locus in humans: observed (open columns) and Ewens-predicted allele counts.
F computed from the number of Xdh alleles in 89 D. pseudoobscura lines (52 had a common allele, 8 were singletons), compared to a simulation assuming the infinite alleles model.

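Infinite sites model
In the infinite sites model, every mutation occurs at a distinct site, so each site mutates at most once. This model is appropriate for long DNA sequences.
Theorem: let μ be the mutation rate for the locus under consideration, and set θ = 4Nμ. Under the infinite sites model, the expected number of segregating sites is
$$E[S_n] = \theta \sum_{i=1}^{n-1} \frac{1}{i}$$
Proof: let $t_j$ be the amount of time in the coalescent during which there are j lineages. We showed earlier that $t_j$ has approximately an exponential distribution with mean $2/(j(j-1))$. The total amount of time in the tree for a sample of size n is
$$T_{tot} = \sum_{j=2}^{n} j\, t_j, \qquad E[T_{tot}] = \sum_{j=2}^{n} \frac{2}{j-1} = 2 \sum_{i=1}^{n-1} \frac{1}{i}$$
Mutations occur at rate $2N\mu = \theta/2$ per unit of tree length, so
$$E[S_n] = \frac{\theta}{2}\, E[T_{tot}] = \theta \sum_{i=1}^{n-1} \frac{1}{i}$$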

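Infinite sites model
Theorem: let θ = 4Nμ. Under the infinite sites model, the number of segregating sites $S_n$ has variance
$$\mathrm{Var}(S_n) = \theta \sum_{i=1}^{n-1} \frac{1}{i} + \theta^2 \sum_{i=1}^{n-1} \frac{1}{i^2}$$
Proof: let $s_j$ be the number of segregating sites created while there were j lineages. While there are j lineages, mutations arrive at rate $2N\mu j = \theta j/2$ and coalescence at rate $j(j-1)/2$. A mutation occurs before coalescence with probability
$$\frac{\theta j/2}{\theta j/2 + j(j-1)/2} = \frac{\theta}{\theta + j - 1}$$
The probability of k successes before the coalescence is
$$P(s_j = k) = \left(\frac{\theta}{\theta + j - 1}\right)^{k} \frac{j-1}{\theta + j - 1}$$
This is a shifted geometric distribution, with variance $\theta/(j-1) + \theta^2/(j-1)^2$. The $s_j$ are independent, so summing over j = 2, ..., n gives the result.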

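Watterson's estimator, using the infinite sites model
We can estimate θ = 4Nμ from an empirical $S_n$:
$$\hat{\theta}_W = \frac{S_n}{h_n}, \qquad h_n = \sum_{i=1}^{n-1} \frac{1}{i}$$
Theorem: for Watterson's estimator,
$$E[\hat{\theta}_W] = \theta, \qquad \mathrm{Var}(\hat{\theta}_W) = \frac{\theta}{h_n} + \theta^2 \frac{g_n}{h_n^2}, \qquad g_n = \sum_{i=1}^{n-1} \frac{1}{i^2}$$
So we can build a model of the population from as little data as $S_n$.
What happens if we want to incorporate more complex models (e.g., expansion, migration)?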
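A hedged sketch of Watterson's estimator applied to a toy alignment: the number of segregating sites is divided by the harmonic sum over 1/i for i = 1, ..., n-1. The helper names and the four short sequences are invented for illustration and carry no biological meaning.

```python
def harmonic(n):
    """h_n = sum_{i=1}^{n-1} 1/i, the expected tree length divided by 2."""
    return sum(1.0 / i for i in range(1, n))

def segregating_sites(seqs):
    """Count alignment columns that show more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def watterson_theta(seqs):
    """Watterson's estimate of theta = 4*N*mu for the whole locus."""
    return segregating_sites(seqs) / harmonic(len(seqs))

if __name__ == "__main__":
    toy = ["AAAA", "AATA", "ACAA", "ACAA"]  # hypothetical aligned sample, n = 4
    print(segregating_sites(toy), watterson_theta(toy))
```

The estimate is for the locus as a whole; dividing by the sequence length would give a per-site value.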

Finite alleles model
If we think of a single DNA base, we have only 4 possible alleles (A, G, T, C), so our model must then include recurrent mutations.
Even if we assume neutrality, mutations can become dependent:
We may have different rates at different sites.
We may have coupling between one base and the bases nearby.
We may need to consider insertions and deletions.
Importantly, if all these are neutral, the basic coalescent structure is not affected.
The Poisson process: expected number of events in time t = λt

Using simulations
The sampling procedure:
Generate a large number of populations (using the model we presented).
Compute the distribution of your statistic over these random cases.
Compare it to the value you observe in your population.
If you find a significant bias, some modeling assumption must be wrong.
In principle, we can sample generation after generation, for sufficient time (how much?).
Direct simulation using Wright-Fisher is painfully expensive (why?). If you are only interested in the current population, most of your coin tossing will be useless.
We can use the coalescent approach and just sample genealogies, going back in time, for example using the coalescent with killing.
Important: this is analogous to first sampling a tree and then scattering the mutations on it.
We can also think of simulating evolution while ignoring the population, based on the Markov process shown above (what are the limitations here?).
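The sampling procedure can be sketched for one statistic, the number of segregating sites under the infinite sites model. The sketch assumes the standard neutral coalescent: while j lineages remain, each successive event is a mutation with probability θ/(θ + j - 1) and a coalescence otherwise, with θ = 4Nμ. Function names, the seed and the example numbers are illustrative assumptions.

```python
import random

def harmonic(n):
    return sum(1.0 / i for i in range(1, n))

def simulate_segregating_sites(n, theta, rng):
    """Sample S_n by walking back through the coalescent, n lineages down to 2."""
    s = 0
    for j in range(n, 1, -1):
        p_mutation = theta / (theta + j - 1)
        # Count mutations until the next coalescence removes a lineage.
        while rng.random() < p_mutation:
            s += 1
    return s

def empirical_p_value(observed_s, n, theta, reps, rng):
    """Fraction of simulated samples with at least observed_s segregating sites."""
    hits = sum(
        1 for _ in range(reps)
        if simulate_segregating_sites(n, theta, rng) >= observed_s
    )
    return hits / reps

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [simulate_segregating_sites(10, 2.0, rng) for _ in range(20000)]
    print("simulated mean:", sum(draws) / len(draws))
    print("theory        :", 2.0 * harmonic(10))
    print("p-value for S = 15:", empirical_p_value(15, 10, 2.0, 5000, rng))
```

No population is ever simulated forward in time here, which is why this is so much cheaper than running Wright-Fisher generation by generation.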

Recombination and linkage
Assume two loci with alleles A1, A2 and B1, B2.
Only double heterozygotes allow recombination to change haplotype frequencies: A1B1/A2B2 produces the recombinant gametes A1B2 and A2B1, and A1B2/A2B1 produces A1B1 and A2B2.
Linkage equilibrium: the frequency of each haplotype equals the product of the frequencies of its two alleles.
The recombination fraction r: the proportion of recombinant gametes generated from a double heterozygote.
For loci on different chromosomes: r = 0.5.
For loci on the same chromosome: r is a function of the distance and possibly other factors.

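Linkage disequilibrium (LD)
Let $x_{11}$ be the frequency of the A1B1 haplotype, $p_1$ the frequency of A1 and $q_1$ the frequency of B1.
With probability 1-r there is no recombination and an A1B1 gamete is passed on intact; with probability r there is recombination, which yields A1B1 from any A1- / -B1 pairing.
Next generation:
$$x_{11}' = (1-r)\,x_{11} + r\,p_1 q_1$$
Define the linkage disequilibrium parameter D as
$$D = x_{11} - p_1 q_1, \qquad \text{so that} \qquad D' = (1-r)\,D \quad \text{and} \quad D_t = (1-r)^t D_0$$
[Figure: D versus generation for r = 0.05, 0.2 and 0.5]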
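A minimal sketch of how linkage disequilibrium decays under random mating, assuming the standard two-locus recursion in which D shrinks by a factor (1 - r) each generation. The haplotype ordering (A1B1, A1B2, A2B1, A2B2) and the function names are choices made for this example.

```python
def disequilibrium(x):
    """D for haplotype frequencies x = (x11, x12, x21, x22)."""
    x11, x12, x21, x22 = x
    return x11 * x22 - x12 * x21  # equals x11 - p1*q1 when the frequencies sum to 1

def next_generation(x, r):
    """One round of random mating with recombination fraction r."""
    d = disequilibrium(x)
    x11, x12, x21, x22 = x
    return (x11 - r * d, x12 + r * d, x21 + r * d, x22 - r * d)

def ld_trajectory(d0, r, generations):
    """Closed form D_t = (1 - r)^t * D_0 for t = 0..generations."""
    return [d0 * (1 - r) ** t for t in range(generations + 1)]

if __name__ == "__main__":
    x = (0.5, 0.0, 0.0, 0.5)  # hypothetical start: complete association, D = 0.25
    for r in (0.05, 0.2, 0.5):
        print(r, [round(d, 4) for d in ld_trajectory(disequilibrium(x), r, 5)])
```

Allele frequencies are untouched by the recursion; only the association between the loci decays, slowly when r is small.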

Linkage disequilibrium (LD) - example
Blood group genotypes M/N and S/s. Both loci are in Hardy-Weinberg equilibrium.
For M/N: p1 = 0.5425, p2 = 0.4575
For S/s: q1 = 0.3080, q2 = 0.6920

Haplotype   Observed   Expected if unlinked
MS          474        334.2
Ms          611        750.8
NS          142        281.8
Ns          773        633.2

Linkage equilibrium is highly unlikely!
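The comparison in this example can be reproduced with a short calculation. The sketch uses an MS count of 474, which is the value consistent with the stated allele frequencies p1 = 0.5425 and q1 = 0.3080 for 2000 gametes; treat that count, along with the function name and the use of a Pearson chi-square statistic, as assumptions of this illustration.

```python
def ld_summary(observed):
    """Allele frequencies, D, expected counts and chi-square from haplotype counts."""
    total = sum(observed.values())
    p1 = (observed["MS"] + observed["Ms"]) / total  # frequency of M
    q1 = (observed["MS"] + observed["NS"]) / total  # frequency of S
    expected = {
        "MS": p1 * q1 * total,
        "Ms": p1 * (1 - q1) * total,
        "NS": (1 - p1) * q1 * total,
        "Ns": (1 - p1) * (1 - q1) * total,
    }
    d = observed["MS"] / total - p1 * q1
    chi2 = sum((observed[h] - expected[h]) ** 2 / expected[h] for h in observed)
    return p1, q1, d, expected, chi2

if __name__ == "__main__":
    counts = {"MS": 474, "Ms": 611, "NS": 142, "Ns": 773}
    p1, q1, d, expected, chi2 = ld_summary(counts)
    print("p1 =", p1, "q1 =", q1, "D =", round(d, 4))
    print({h: round(e, 1) for h, e in expected.items()})
    print("chi-square (1 degree of freedom):", round(chi2, 1))
```

A chi-square value this far above the usual 1-degree-of-freedom thresholds is what backs the slide's conclusion that the two loci are not in linkage equilibrium.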

Sources of linkage disequilibrium
LD in the original population that has not yet decayed because r is low.
Genetic coadaptation: regions of the genome that are not subject to recombination (for example, inverted chromosomal fragments).
Admixture of populations with different allele frequencies.

Recombination rates in the human population: LD blocks

Recombination rates in the human population
Recombination rates are highly non-uniform, with major effects on genome structure!

Selection
Fitness: the relative reproductive success of an individual (or genome).
Fitness is only defined with respect to the current population, and is unlikely to remain constant across conditions and environments.
In the sampling model, the sampling probability of an allele is multiplied by a selection factor 1+s.
Mutations can change fitness:
A deleterious mutation decreases fitness. It is therefore selected against; this process is called negative or purifying selection.
An advantageous or beneficial mutation increases fitness. It is therefore subject to positive selection.
A neutral mutation is one that does not change fitness.
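To make the "selection factor 1+s" concrete, here is a minimal sketch of how the sampling probability of allele A changes when its weight is multiplied by 1+s and the weights are renormalized. It tracks only the deterministic expectation and ignores drift. The function names are illustrative.

```python
def select_frequency(p, s):
    """Sampling probability of A after weighting it by the selection factor 1+s."""
    weighted = p * (1 + s)
    return weighted / (weighted + (1 - p))

def deterministic_trajectory(p0, s, generations):
    """Expected allele frequency over time when drift is ignored."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(select_frequency(trajectory[-1], s))
    return trajectory

if __name__ == "__main__":
    for s in (-0.05, 0.0, 0.05):  # deleterious, neutral, beneficial
        print(s, [round(p, 3) for p in deterministic_trajectory(0.1, s, 5)])
```

In a Wright-Fisher simulation, the selected probability would simply replace p in the binomial sampling step.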

The Moran model
Instead of working with discrete generations, we replace at most one individual at each time step: one individual (X) is removed and replaced by sampling from the current population.
[Diagram: a population of A and a individuals before and after one replacement]
We assume time steps are small. What kind of mathematical model describes the process?

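Continuous time Markov processes
Conditions on the transition probabilities $P_{ij}(t)$, labeled "Markov" and "Kolmogorov" on the slide: [equations not preserved in the transcript]
Theorem:
$q_i = \lim_{t \to 0^+} \frac{1 - P_{ii}(t)}{t}$ exists (may be infinite)
$q_{ij} = \lim_{t \to 0^+} \frac{P_{ij}(t)}{t}$, for $i \neq j$, exists and is finite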

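Rates and transition probabilities
The process's rate matrix Q has off-diagonal entries $q_{ij}$ (the rate of jumping from i to j) and diagonal entries $q_{ii} = -\sum_{j \neq i} q_{ij}$.
Transition differential equations (backward form):
$$\frac{d}{dt} P(t) = Q\,P(t), \qquad \text{i.e.} \qquad P_{ij}'(t) = \sum_k q_{ik} P_{kj}(t)$$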

The Moran model
Replace individual X by sampling from the current population.
Assume the rate of replacement for each individual is 1. We derive a model similar to Wright-Fisher, but in continuous time: a process on a random variable counting the number of copies of allele A.
[Diagram: states 0 (loss), 1, ..., i-1, i, i+1, ..., 2N-1, 2N (fixation)]
Rates:
"Birth" (i → i+1): $b_i = (2N - i)\,\frac{i}{2N}$
"Death" (i → i-1): $d_i = i\,\frac{2N - i}{2N}$

Fixation probability
[Diagram: states 0 (loss), 1, ..., i-1, i, i+1, ..., 2N-1, 2N (fixation), with "birth" and "death" rates]
In fact, in the limit, the Moran model converges to the Wright-Fisher model. For example:
Theorem: when going backward in time, the Moran model generates the same distribution of genealogies as Wright-Fisher, except that time runs twice as fast.
Theorem: in the Moran model, the probability that A becomes fixed when there are initially i copies is i/2N.
Proof: as in the proof for the Wright-Fisher model. The expected value of X is unchanged, since the rates of births and deaths are the same.
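The fixation probability can be checked numerically with the general hitting-probability formula for a birth-death chain. The sketch assumes the Moran rates in which each individual is replaced at rate 1, so that the birth and death rates in state i are both i(2N - i)/2N. The function names and the pluggable `rates` argument are choices made for this illustration.

```python
def moran_rates(i, two_n):
    """Neutral Moran model: birth (i -> i+1) and death (i -> i-1) rates in state i."""
    rate = i * (two_n - i) / two_n
    return rate, rate

def fixation_probability(i, two_n, rates=moran_rates):
    """P(hit 2N before 0 | start at i) for a birth-death chain on 0..2N."""
    # terms[k] = product over m = 1..k of death_m / birth_m  (terms[0] = 1)
    terms = [1.0]
    for m in range(1, two_n):
        birth, death = rates(m, two_n)
        terms.append(terms[-1] * death / birth)
    return sum(terms[:i]) / sum(terms)

if __name__ == "__main__":
    two_n = 20
    print([round(fixation_probability(i, two_n), 3) for i in (1, 5, 10)])
```

Because births and deaths are equally likely in every state, all the ratio terms equal 1 and the result reduces to i/2N, matching the theorem. Passing a different `rates` function (for example, one with a selective advantage) reuses the same calculation.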

Fixation time
Expected fixation time, conditional on fixation.
Theorem: in the Moran model, let p = i/2N; then
$$E_i[\tau \mid \text{fixation}] \approx -\frac{2N(1-p)}{p}\,\log(1-p)$$
Proof: not here.