Lecture 3: population genetics II: selection


Genome evolution. Lecture 3: population genetics II: selection

Population genetics
Drift: the process by which allele frequencies change through generations.
Mutation: the process by which new alleles are introduced.
Recombination: the process by which multi-allelic genomes are mixed.
Selection: the effect of fitness on the dynamics of allele frequencies.
Epistasis: the effect of fitness dependencies among different alleles on these dynamics.
"Organismal" effects: ecology, geography, behavior.

Wright-Fisher model for genetic drift
N individuals → ∞ gametes → N individuals → ∞ gametes
We follow the number of copies of an allele A in the population until fixation (X = 2N) or loss (X = 0). We can model this as a Markov process on the variable X (the number of A alleles), whose transition probabilities come from sampling j alleles from a population of 2N alleles of which i are A:
$$P_{ij} = \binom{2N}{j}\left(\frac{i}{2N}\right)^{j}\left(1-\frac{i}{2N}\right)^{2N-j}$$
In a larger population the frequency changes more slowly: the variance of the allele frequency after one round of sampling is pq/2N, so sampling changes it less.
[Figure: Markov chain on the states 0 (loss), 1, …, 2N-1, 2N (fixation)]
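A minimal Monte Carlo sketch of the chain above, assuming neutral binomial sampling; the function names, run count, and seed are illustrative choices, not part of the lecture. For a neutral allele the estimated fixation probability should be close to i/2N.

```python
import random

def wright_fisher_fixation(i, two_n, rng):
    """Run one neutral Wright-Fisher chain starting from i copies of A among 2N genes.

    Returns True if A is fixed (X = 2N) and False if it is lost (X = 0).
    """
    x = i
    while 0 < x < two_n:
        p = x / two_n
        # Binomial(2N, p) draw: each of the 2N new genes copies A with probability p
        x = sum(1 for _ in range(two_n) if rng.random() < p)
    return x == two_n

def estimate_fixation(i, two_n, runs=5000, seed=1):
    """Fraction of independent runs in which A is fixed."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fixation(i, two_n, rng) for _ in range(runs))
    return fixed / runs

if __name__ == "__main__":
    # Neutral expectation: P(fixation) = i / 2N = 0.15 here
    print(estimate_fixation(i=3, two_n=20), 3 / 20)
```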

The Moran model
Instead of working with discrete generations, we replace at most one individual at each time step: an individual (X) is removed and replaced by a copy of an individual sampled from the current population.
[Figure: a population of A and a alleles in which one individual is replaced at each step]
If we assume the time steps are small, what kind of mathematical model describes the process?

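Continuous-time Markov processes
Conditions on the transition probabilities $P_{ij}(t) = \Pr(X(s+t) = j \mid X(s) = i)$:
Markov: the future depends on the past only through the current state.
Kolmogorov (Chapman-Kolmogorov): $P_{ij}(t+s) = \sum_k P_{ik}(t)\,P_{kj}(s)$, with $\lim_{t\to 0} P_{ij}(t) = \delta_{ij}$.
Theorem: $q_i = \lim_{h\to 0} \frac{1 - P_{ii}(h)}{h}$ exists (may be infinite), and $q_{ij} = \lim_{h\to 0} \frac{P_{ij}(h)}{h}$ exists and is finite.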

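Rates and transition probabilities
The process's rate matrix $Q$ has off-diagonal entries $Q_{ij} = q_{ij}$ and diagonal entries $Q_{ii} = -q_i$, so each row sums to zero.
Transition differential equations (backward form):
$$P'_{ij}(t) = \sum_{k \ne i} q_{ik}\,P_{kj}(t) - q_i\,P_{ij}(t), \qquad \text{i.e.} \qquad P'(t) = Q\,P(t)$$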
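The backward equation with $P(0) = I$ is solved by $P(t) = e^{tQ}$. The sketch below computes it for a small chain by uniformization, a standard technique swapped in here for illustration (the lecture does not specify a numerical method); the function name and tolerances are hypothetical.

```python
import math

def transition_matrix(Q, t, tol=1e-12, max_terms=100_000):
    """Return P(t) = exp(tQ), the solution of P'(t) = Q P(t) with P(0) = I.

    Uniformization: P(t) = sum_k Pois(k; lam*t) R^k with R = I + Q/lam and
    lam >= max_i q_i. Meant for small chains and moderate lam*t.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n)) or 1.0
    if lam * t > 700:
        raise ValueError("lam * t too large for this simple sketch (exp underflow)")
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    R = [[identity[i][j] + Q[i][j] / lam for j in range(n)] for i in range(n)]
    term = identity                  # R**k, starting at k = 0
    weight = math.exp(-lam * t)      # Poisson probability of k = 0 jumps
    P = [[weight * term[i][j] for j in range(n)] for i in range(n)]
    mass, k = weight, 0
    while 1.0 - mass > tol and k < max_terms:
        k += 1
        term = [[sum(term[i][m] * R[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)]
        weight *= lam * t / k        # Poisson probability of k jumps
        mass += weight
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * term[i][j]
    return P
```

As a check, for a two-state chain with rates a (0 → 1) and b (1 → 0), $P_{00}(t)$ should equal $\frac{b}{a+b} + \frac{a}{a+b}e^{-(a+b)t}$.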

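The Moran model
An individual (X) is replaced by a copy of an individual sampled from the current population. Assume each individual is replaced at rate 1. We obtain a model similar to Wright-Fisher, but in continuous time: a birth-death process on the random variable counting the number of A alleles.
[Figure: birth-death chain on the states 0 (loss), 1, …, i-1, i, i+1, …, 2N-1, 2N (fixation)]
Rates: "birth" (i → i+1) $b_i = \frac{i(2N-i)}{2N}$; "death" (i → i-1) $d_i = \frac{i(2N-i)}{2N}$.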

Fixation probability
[Figure: the same birth-death chain, with equal "birth" and "death" rates $b_i = d_i = \frac{i(2N-i)}{2N}$]
In fact, in the limit the Moran model converges to the Wright-Fisher model. For example:
Theorem: going backward in time, the Moran model generates the same distribution of genealogies as Wright-Fisher, except that time runs twice as fast.
Theorem: in the Moran model, the probability that A becomes fixed when there are initially i copies is i/2N.
Proof: as for the Wright-Fisher model. The expected value of X is unchanged over time because births and deaths occur at the same rate, so $i = E[X_\infty] = 2N \cdot \Pr(\text{fixation})$.
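A small event-by-event (Gillespie-style) simulation of the birth-death chain, assuming the neutral rates $b_i = d_i = i(2N-i)/2N$; the helper names and seed are hypothetical. The fraction of runs that fix should be close to i/2N.

```python
import random

def moran_run(i, two_n, rng):
    """Simulate the neutral Moran birth-death chain until absorption.

    From state x the birth (x -> x+1) and death (x -> x-1) rates are both
    x(2N - x)/2N. Returns (fixed, elapsed_time).
    """
    x, t = i, 0.0
    while 0 < x < two_n:
        rate = x * (two_n - x) / two_n          # birth rate = death rate
        t += rng.expovariate(2 * rate)           # waiting time to the next event
        x += 1 if rng.random() < 0.5 else -1     # birth and death equally likely
    return x == two_n, t

def fixation_fraction(i, two_n, runs=5000, seed=2):
    """Fraction of simulated runs that end in fixation of A."""
    rng = random.Random(seed)
    return sum(moran_run(i, two_n, rng)[0] for _ in range(runs)) / runs

print(fixation_fraction(5, 20), 5 / 20)
```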

Fixation time
Expected fixation time, conditioned on fixation.
Theorem: in the Moran model, let p = i/2N; then $E_i[\bar\tau] \approx -2N\,\frac{1-p}{p}\,\log(1-p)$.
Proof: not given here.
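A numerical check of the approximation, under a derivation that is our own assumption rather than the lecture's proof: conditioning the neutral Moran chain on fixation (a Doob h-transform with h(x) = x/2N) gives an expected time of 1 in each state k ≥ i and k(2N-i)/(i(2N-k)) in each state k < i. Summing these gives an exact value to compare with the formula; for i = 1 it is 2N - 1 ≈ 2N.

```python
import math

def moran_conditional_fixation_time(i, two_n):
    """Exact E[fixation time | fixation] for the neutral Moran chain (sketch).

    Assumed derivation: expected time in state k is 1 for k >= i and
    k(2N - i) / (i(2N - k)) for k < i. Time unit: each individual is
    replaced at rate 1.
    """
    m = two_n
    tail = sum(k / (m - k) for k in range(1, i))
    return (m - i) + (m - i) / i * tail

def diffusion_approximation(p, two_n):
    """-2N (1 - p) / p * log(1 - p)."""
    return -two_n * (1 - p) / p * math.log(1 - p)

print(moran_conditional_fixation_time(100, 1000), diffusion_approximation(0.1, 1000))
```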

Selection
Fitness: the relative reproductive success of an individual (or genome). Fitness is defined only with respect to the current population, and it is unlikely to remain constant across all conditions and environments.
In the models, the sampling probability of an allele is multiplied by a selection factor 1+s.
Mutations can change fitness:
A deleterious mutation decreases fitness, so it is selected against. This process is called negative or purifying selection.
An advantageous (beneficial) mutation increases fitness, so it is subject to positive selection.
A neutral mutation does not change fitness.

Neutrality
Don't let it confuse you…
[Figure: background, purifying, and negative selection (forces that drive genomic conservation) vs. directed, adaptive, and positive selection (forces that drive genome change)]

Adaptive evolution in a tumor model
Selection: human fibroblasts + telomerase, passaged in the lab for many months, spontaneously increase their growth rate (V. Rotter).

Selection in haploids: infinite populations, discrete generations
This is a common situation: bacteria gaining antibiotic resistance, yeast evolving to adapt to a new environment, tumor cells taking over a tissue.
Allele: A | a
Frequency: p | q
Relative fitness: w | 1
The fitness w represents the relative growth rate of the strain carrying allele A. It is common to write w = 1+s, which defines the selection coefficient s.
Gametes after selection: $p' = \frac{wp}{wp+q}$, $q' = \frac{q}{wp+q}$, so $\frac{p'}{q'} = w\,\frac{p}{q}$.
Generation t: $\frac{p_t}{q_t} = w^{t}\,\frac{p_0}{q_0}$.
Ratio as a function of time: $\log\frac{p_t}{q_t} = \log\frac{p_0}{q_0} + t\log w$.
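A short sketch that iterates the one-generation update and compares it with the closed-form odds ratio $p_t/q_t = w^t p_0/q_0$; the function names and the example values (1% starting frequency, w = 1.1) are illustrative assumptions.

```python
def haploid_step(p, w):
    """One generation of haploid selection: p' = w p / (w p + q), with q = 1 - p."""
    return w * p / (w * p + (1 - p))

def iterate_haploid(p0, w, generations):
    """Apply haploid_step repeatedly."""
    p = p0
    for _ in range(generations):
        p = haploid_step(p, w)
    return p

def haploid_closed_form(p0, w, t):
    """Frequency implied by p_t / q_t = w**t * p_0 / q_0."""
    odds = w ** t * p0 / (1 - p0)
    return odds / (1 + odds)

# e.g. a rare strain (1%) with w = 1.1 after 50 generations
print(iterate_haploid(0.01, 1.1, 50), haploid_closed_form(0.01, 1.1, 50))
```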

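Selection in haploid populations: dynamics
[Figure: allele-frequency trajectories for growth = 1.5 and growth = 1.2]
We can model it in continuous time: $\frac{dp}{dt} = s\,p(1-p)$.
In an infinite population, we can just consider the ratios: $\frac{p(t)}{q(t)} = e^{st}\,\frac{p(0)}{q(0)}$.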
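A sketch comparing the logistic closed form with a numerical solution of $dp/dt = s\,p(1-p)$ (classical Runge-Kutta, chosen here only for illustration). With s = ln w, the closed form matches the discrete-generation odds ratio $p_t/q_t = w^t p_0/q_0$ exactly at integer t; the growth factor 1.5 is taken from the figure labels.

```python
import math

def logistic_closed_form(p0, s, t):
    """p(t) solving dp/dt = s p (1 - p); the odds p/q grow as exp(s t)."""
    e = math.exp(s * t)
    return p0 * e / (1 - p0 + p0 * e)

def logistic_rk4(p0, s, t_end, dt=0.01):
    """Integrate dp/dt = s p (1 - p) with 4th-order Runge-Kutta."""
    def f(p):
        return s * p * (1 - p)
    p = p0
    for _ in range(round(t_end / dt)):
        k1 = f(p)
        k2 = f(p + dt * k1 / 2)
        k3 = f(p + dt * k2 / 2)
        k4 = f(p + dt * k3)
        p += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return p

print(logistic_rk4(0.01, 0.2, 40.0), logistic_closed_form(0.01, 0.2, 40.0))
print(logistic_closed_form(0.2, math.log(1.5), 3))  # growth = 1.5, three generations
```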

Computing w
Example (Hartl and Dykhuizen 1981): E. coli with two gnd alleles, one of which is beneficial for growth on gluconate. A population of E. coli was tracked for 35 generations on two media; the observed frequencies of that allele were:
Gluconate: 0.455 → 0.898
Ribose: 0.594 → 0.587
For gluconate: log(0.898/0.102) - log(0.455/0.545) = 35 log w, so log₁₀ w = 0.0292 and w = 1.0696.
Compare with w = 0.999 on ribose.
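The same calculation as a small helper (the name `estimate_w` is ours), using natural logarithms; the base cancels once the result is exponentiated.

```python
import math

def estimate_w(p_start, p_end, generations):
    """Haploid relative fitness from log(p_t/q_t) - log(p_0/q_0) = t log w."""
    delta = math.log(p_end / (1 - p_end)) - math.log(p_start / (1 - p_start))
    return math.exp(delta / generations)

print(round(estimate_w(0.455, 0.898, 35), 4))  # gluconate: 1.0696
print(round(estimate_w(0.594, 0.587, 35), 3))  # ribose: 0.999
```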

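Fixation probability: selection in the Moran model
When the population is finite, we should consider the effect of selection more carefully.
[Figure: birth-death chain on the states 0 (loss), 1, …, i-1, i, i+1, …, 2N-1, 2N (fixation)]
The model assumes that fitness is the probability that an offspring is viable; if it is not viable, no replacement takes place. With A of fitness 1 and a of fitness 1-s:
Rates: "birth" $b_i = \frac{i(2N-i)}{2N}$; "death" $d_i = (1-s)\,\frac{i(2N-i)}{2N}$.
Theorem: in the Moran model with selection s > 0, $\Pr_i(\text{fixation}) = \frac{1-(1-s)^{i}}{1-(1-s)^{2N}}$.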

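Fixation probability: selection in the Moran model
Theorem: in the Moran model with selection s > 0, $\Pr_i(\text{fixation}) = \frac{1-(1-s)^{i}}{1-(1-s)^{2N}}$.
Note: for a single new mutant (i = 1) in a large population, the fixation probability is approximately s.
Note: as s → 0 it tends to i/2N, the neutral result.
Variant (Kimura 1962): the probability of fixation in the Wright-Fisher model with (additive) selection s is $u(p) = \frac{1-e^{-4Nsp}}{1-e^{-4Ns}}$.
Reminder: we should be using the effective population size $N_e$.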
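A sketch evaluating both formulas for a single new mutant, assuming additive selection with advantage s per copy of A in Kimura's case and viability selection in the Moran case; the function names are hypothetical. For large N the Wright-Fisher value is close to 2s and the Moran value close to s, the same factor of two seen in the neutral time scales.

```python
import math

def kimura_fixation(p, s, n_e):
    """Kimura (1962): u(p) = (1 - exp(-4 Ne s p)) / (1 - exp(-4 Ne s)); u -> p as s -> 0."""
    if s == 0:
        return p
    # 1 - e^{-x} = -expm1(-x); the minus signs cancel in the ratio
    return math.expm1(-4 * n_e * s * p) / math.expm1(-4 * n_e * s)

def moran_fixation(i, two_n, s):
    """Moran model with viability selection: (1 - (1-s)^i) / (1 - (1-s)^{2N})."""
    if s == 0:
        return i / two_n
    return (1 - (1 - s) ** i) / (1 - (1 - s) ** two_n)

n = 1000
print(kimura_fixation(1 / (2 * n), 0.01, n), moran_fixation(1, 2 * n, 0.01))
```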

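Fixation probability: selection in the Moran model
Theorem: in the Moran model with selection s > 0, $\Pr_i(\text{fixation}) = \frac{1-(1-s)^{i}}{1-(1-s)^{2N}}$.
Proof: first define the hitting time $T_k = \min\{t : X_t = k\}$ and the fixation probability given i initial A's, $h(i) = \Pr_i(T_{2N} < T_0)$. The rate of births is $b_i$ and the rate of deaths is $d_i = (1-s)\,b_i$, so the probability that a birth occurs before a death is $b_i/(b_i+d_i) = 1/(2-s)$. Therefore:
$$h(i) = \frac{1}{2-s}\,h(i+1) + \frac{1-s}{2-s}\,h(i-1), \qquad h(0) = 0,\; h(2N) = 1.$$
Rearranging gives $h(i+1) - h(i) = (1-s)\,\big(h(i) - h(i-1)\big)$, so the increments form a geometric series, and summing them yields the formula in the theorem.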
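A Monte Carlo check of the theorem that follows the proof's reasoning: only the sequence of births and deaths matters for which boundary is hit, so we simulate the embedded jump chain with up-probability 1/(2-s). The helper names, population size, and seed are illustrative assumptions.

```python
import random

def moran_selection_closed_form(i, two_n, s):
    """Pr_i(fixation) = (1 - (1-s)^i) / (1 - (1-s)^{2N})."""
    return (1 - (1 - s) ** i) / (1 - (1 - s) ** two_n)

def simulate_fixation(i, two_n, s, runs=20000, seed=3):
    """Embedded jump chain of the Moran model with selection.

    Since d_x = (1 - s) b_x, each event is a birth with probability 1 / (2 - s)
    regardless of x; holding times do not affect which boundary is hit first.
    """
    rng = random.Random(seed)
    up = 1 / (2 - s)
    fixed = 0
    for _ in range(runs):
        x = i
        while 0 < x < two_n:
            x += 1 if rng.random() < up else -1
        fixed += (x == two_n)
    return fixed / runs

print(simulate_fixation(1, 20, 0.1), moran_selection_closed_form(1, 20, 0.1))
```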

Fixation probabilities and population size

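Selection and fixation
Recall that the fixation time of a neutral mutation (assuming fixation occurred) equals the coalescent time of the population:
Theorem: in the Moran model, $E[\bar\tau] \approx 2N$.
Theorem (Kimura): in the Wright-Fisher model, $E[\bar\tau] \approx 4N$ generations (as said: twice slower).
With selection, the fixation process has three phases:
1. The allele is rare: the number of A's is a supercritical branching process.
2. The allele is at intermediate frequency (0 ≪ p ≪ 1): its frequency follows the logistic differential equation and is essentially deterministic.
3. The allele is close to fixation: the number of a's is a subcritical branching process.
[Figure: the three phases, annotated with selection and drift]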

Selection in diploids
Assume:
Genotype: AA | Aa | aa
Fitness: w11 | w12 | w22
Frequency (Hardy-Weinberg!): p² | 2pq | q²
There are several alternatives for the interaction between alleles:
a is completely dominant (one a is enough): f(Aa) = f(aa).
a is completely recessive: f(Aa) = f(AA).
Codominance: f(AA) = 1, f(Aa) = 1+s, f(aa) = 1+2s.
Overdominance: f(Aa) > f(AA), f(aa).
The simple (linear) cases are not qualitatively different from the haploid scenario.
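A sketch of the standard diploid recursion under Hardy-Weinberg proportions, $q' = (q^2 w_{22} + pq\,w_{12})/\bar w$ with $\bar w = p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22}$; function names are hypothetical. Under codominance with small s, the log-odds of a grow by almost exactly log(1+s) per generation, which illustrates why the linear case behaves like the haploid one.

```python
import math

def diploid_step(q, w11, w12, w22):
    """One generation of selection on allele a (frequency q); genotype
    frequencies p^2, 2pq, q^2 for AA, Aa, aa with fitness w11, w12, w22."""
    p = 1 - q
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (q * q * w22 + p * q * w12) / w_bar

def log_odds_gain(q0, w11, w12, w22, generations):
    """Change in log(q/p) after the given number of generations."""
    q = q0
    for _ in range(generations):
        q = diploid_step(q, w11, w12, w22)
    return math.log(q / (1 - q)) - math.log(q0 / (1 - q0))

s = 0.01
print(log_odds_gain(0.1, 1.0, 1 + s, 1 + 2 * s, 100), 100 * math.log(1 + s))
```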

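Mutation-selection balance
When an allele is weakly deleterious, mutation can play a major role in driving allele frequencies.
Genotype: AA | Aa | aa
Fitness: 1 | 1-hs | 1-s
Frequency (HW): p² | 2pq | q²
New allele frequency of a, without mutation: $q' = \frac{pq(1-hs) + q^2(1-s)}{\bar w}$, with $\bar w = p^2 + 2pq(1-hs) + q^2(1-s)$.
New allele frequency, assuming mutation A → a at rate μ (we ignore back mutation a → A, since q ≪ 1): $q'' = q' + \mu(1-q')$.
What is the equilibrium frequency of the deleterious allele?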
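A numerical sketch that iterates selection followed by mutation until the frequency settles, and compares the result with the standard approximations $\hat q \approx \mu/(hs)$ for h > 0 and $\hat q \approx \sqrt{\mu/s}$ for h = 0. The parameter values are arbitrary illustrations.

```python
import math

def mutation_selection_step(q, s, h, mu):
    """Selection (fitness 1, 1 - hs, 1 - s), then A -> a mutation at rate mu."""
    p = 1 - q
    w_bar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
    q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / w_bar
    return q_sel + mu * (1 - q_sel)   # back mutation a -> A ignored

def equilibrium(s, h, mu, generations=50_000):
    """Iterate from q = 0 for long enough to reach the balance point."""
    q = 0.0
    for _ in range(generations):
        q = mutation_selection_step(q, s, h, mu)
    return q

mu, s = 1e-6, 0.1
print(equilibrium(s, 0.5, mu), mu / (0.5 * s))     # partial dominance
print(equilibrium(s, 0.0, mu), math.sqrt(mu / s))  # complete recessivity
```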

Mutation-selection balance: Huntington disease
A neurological genetic disease that usually appears after age 35.
It results from a dominant mutation. How does this disease persist in the human population?
Although it may be fatal, the fitness of carriers is not very low because of the late age of onset (estimated w12 = 0.81).
Human population: 70 per million (Europe) to 1 per million (Africa).
Since h > 0, we can estimate the mutation rate at the Huntington locus from $\mu \approx hsq$: $10^{-6} \times (1-0.81) = 1.9\times10^{-7}$ to $70\times10^{-6} \times (1-0.81) = 1.3\times10^{-5}$.
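The same arithmetic as a tiny helper (its name is hypothetical). Following the slide, the prevalence is plugged in as q and hs is taken as 1 - w12.

```python
def mutation_rate_dominant(q, w12):
    """mu ~ h s q, with h s = 1 - w12 (fitness loss of heterozygous carriers)."""
    return q * (1 - w12)

low = mutation_rate_dominant(1e-6, 0.81)    # 1 per million
high = mutation_rate_dominant(70e-6, 0.81)  # 70 per million
print(f"{low:.1e} to {high:.1e}")  # 1.9e-07 to 1.3e-05
```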

Mutation-selection balance: Haldane-Muller
The average fitness of the population, given recurrent mutations at rate μ at a locus where the mutant allele reduces fitness by s.
Assuming perfect recessivity (h = 0): $\hat q \approx \sqrt{\mu/s}$, so $\bar w \approx 1 - s\hat q^2 = 1 - \mu$.
Assuming partial dominance (h > 0): $\hat q \approx \mu/(hs)$, so $\bar w \approx 1 - 2hs\hat q = 1 - 2\mu$.
The Haldane-Muller principle: the effect of mutation on the average population fitness depends only on the mutation rate, not on the fitness of the alleles!
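A sketch illustrating the principle numerically: at the mutation-selection equilibrium the load 1 - w̄ comes out near μ for h = 0 and near 2μ for h = 0.5, whatever the value of s. The mutation rate and selection coefficients are arbitrary illustrative choices.

```python
def mean_fitness(q, s, h):
    """w_bar for genotype fitnesses 1, 1 - hs, 1 - s under Hardy-Weinberg."""
    p = 1 - q
    return p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)

def equilibrium_load(s, h, mu, generations=50_000):
    """Iterate selection then A -> a mutation from q = 0; return 1 - w_bar."""
    q = 0.0
    for _ in range(generations):
        p = 1 - q
        q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / mean_fitness(q, s, h)
        q = q_sel + mu * (1 - q_sel)
    return 1 - mean_fitness(q, s, h)

if __name__ == "__main__":
    for h in (0.0, 0.5):
        for s in (0.01, 0.1):
            # load in units of mu: about 1 for h = 0 and about 2 for h = 0.5
            print(h, s, equilibrium_load(s, h, 1e-5) / 1e-5)
```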

Overdominance
A SNP in the beta-globin gene makes the encoded protein defective. The resulting red blood cells are curved and elongated (sickled) and are removed from the circulation.
Individuals homozygous for the mutation usually die from anemia without intensive care.
Heterozygous individuals have mild anemia but cope better with the malaria parasite Plasmodium falciparum (perhaps because infected red cells become sickled).
[Figures (Wikipedia): historical malaria distribution; sickle-cell anemia distribution]
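A sketch of why overdominance keeps both alleles: with hypothetical genotype fitnesses AA = 1-t (malaria susceptibility), AS = 1, SS = 1-s (anemia), the frequency of S converges to the stable equilibrium $\hat q = t/(s+t)$. The values t = 0.1 and s = 0.8 are illustrative, not measured.

```python
def overdominance_step(q, t, s):
    """Selection on allele S with fitnesses AA = 1 - t, AS = 1, SS = 1 - s."""
    p = 1 - q
    w_bar = p * p * (1 - t) + 2 * p * q + q * q * (1 - s)
    return (q * q * (1 - s) + p * q) / w_bar

def iterate_overdominance(q0, t, s, generations):
    """Apply overdominance_step repeatedly."""
    q = q0
    for _ in range(generations):
        q = overdominance_step(q, t, s)
    return q

# Hypothetical costs: t = 0.1 for AA, s = 0.8 for SS; equilibrium t / (s + t) = 1/9
print(iterate_overdominance(0.01, 0.1, 0.8, 2000), 0.1 / (0.1 + 0.8))
```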

Other types of selection
Different fitness for different individuals, e.g., males vs. females.
For example, paternally inherited genes that take up maternal resources in mammals. This was suggested to lead to the phenomenon of imprinting, where cells express only the maternal or only the paternal allele.
Imprinted genes behave much like haploids.

Other types of selection
Frequency- and density-dependent selection: fitness depends on the frequency of the allele or on the population size.
Fecundity selection: different reproductive potential for different mating pairs.
Effects of heterogeneous environments.
Effects that apply directly to the haplotype: gametic selection / meiotic drive (e.g., reducing the reproductive potential of the homologous chromosome).
Sexual selection: males advertising their reproductive potential, or confronting other males.
Kin selection ("origin of altruism").

Recombination and selection

Linkage and selection
Linkage interferes with the purging of deleterious mutations and reduces the efficiency of positive selection!
[Figure: haplotypes labeled "beneficial" and "weakly deleterious"]
Selective sweep, also called the hitchhiking effect or genetic draft (Gillespie)
Hill-Robertson effect

Linkage and selection
The variance in allele frequency is used to define the effective population size.
Simplistically, assume a neutral locus is evolving while selective sweeps affect a fully linked locus at rate δ. A sweep fixes the neutral allele with probability p, and we further assume that the sweep happens instantly:
This is very rough, but it demonstrates the basic intuition: sweeps reduce the efficacy of selection in a way that can be quantified as a reduction in the effective population size.
C: the average frequency of the neutral allele after the sweep.

Infinite alleles model
Adding mutations with probability μ per generation, the coalescent process is extended by killing lineages (time is sped up by a factor of 2N, and θ = 4Nμ). With k lineages:
Coalescence: rate k(k-1)/2.
Mutation (killing): rate kθ/2.
Probability model (Hoppe's urn): we draw from an urn holding one black ball of mass θ and other balls of various colors, each of mass 1. Each time the black ball is drawn, a ball of a new color is added to the urn. If another color is drawn, the drawn ball and another ball of the same color are returned to the urn.
Theorem: Hoppe's urn and the coalescent with killing are equivalent (read back in time).
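A minimal simulation of Hoppe's urn as described above; the function names and parameters are illustrative. The mean number of distinct alleles in a sample of n can be checked against the known expectation under the infinite alleles model, $E[K_n] = \sum_{i=0}^{n-1} \theta/(\theta+i)$.

```python
import random

def hoppe_urn_alleles(n, theta, rng):
    """Draw n genes from Hoppe's urn; return the number of distinct colors (alleles).

    The black ball has mass theta and each colored ball has mass 1.
    """
    balls = []      # color label of every colored ball in the urn
    n_colors = 0
    for _ in range(n):
        if rng.random() < theta / (theta + len(balls)):
            balls.append(n_colors)           # black ball: add a ball of a new color
            n_colors += 1
        else:
            balls.append(rng.choice(balls))  # add another ball of the drawn color
    return n_colors

def expected_alleles(n, theta):
    """E[K_n] = sum_{i=0}^{n-1} theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))

rng = random.Random(4)
runs = 20000
print(sum(hoppe_urn_alleles(20, 2.0, rng) for _ in range(runs)) / runs,
      expected_alleles(20, 2.0))
```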

Infinite sites model
In the infinite sites model, each mutation occurs at a distinct site. It is better suited to current datasets, which include vast amounts of DNA sequence.
Theorem: let u be the mutation rate for the locus under consideration, and set θ = 4Nu. Under the infinite sites model, the expected number of segregating sites is
$$E[S_n] = \theta \sum_{i=1}^{n-1} \frac{1}{i}.$$
Proof: let $t_j$ be the amount of time in the coalescent during which there are j lineages. We showed earlier that $t_j$ is approximately exponentially distributed with mean $2/(j(j-1))$. The total amount of time in the tree for a sample of size n is
$$T_{tot} = \sum_{j=2}^{n} j\,t_j, \qquad E[T_{tot}] = \sum_{j=2}^{n} \frac{2}{j-1} = 2\sum_{i=1}^{n-1} \frac{1}{i}.$$
Mutations occur at rate 2Nu = θ/2 along the tree, so $E[S_n] = \frac{\theta}{2}\,E[T_{tot}]$.
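A simulation sketch that follows the proof step by step: draw the coalescent waiting times $t_j$, form the tree length $\sum_j j\,t_j$, and drop Poisson mutations on it at rate θ/2. The Poisson sampler (Knuth's method) and all names are our own illustrative choices.

```python
import math
import random

def poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def segregating_sites(n, theta, rng):
    """One draw of S_n: coalescent tree length, then Poisson mutations.

    Time is in units of 2N generations; with j lineages the waiting time is
    exponential with rate j(j-1)/2, and mutations fall at rate theta/2.
    """
    tree_length = sum(j * rng.expovariate(j * (j - 1) / 2) for j in range(2, n + 1))
    return poisson(theta / 2 * tree_length, rng)

def expected_sites(n, theta):
    """E[S_n] = theta * sum_{i=1}^{n-1} 1/i."""
    return theta * sum(1 / i for i in range(1, n))

rng = random.Random(6)
print(sum(segregating_sites(10, 5.0, rng) for _ in range(20000)) / 20000,
      expected_sites(10, 5.0))
```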

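Infinite sites model
Theorem: let θ = 4Nu. Under the infinite sites model, the number of segregating sites $S_n$ has
$$E[S_n] = \theta \sum_{i=1}^{n-1} \frac{1}{i}, \qquad \mathrm{Var}(S_n) = \theta \sum_{i=1}^{n-1} \frac{1}{i} + \theta^2 \sum_{i=1}^{n-1} \frac{1}{i^2}.$$
Proof: let $s_j$ be the number of segregating sites created while there were j lineages. While there are j lineages, mutations occur at rate $2Nuj = j\theta/2$ and coalescence at rate $j(j-1)/2$. A mutation occurs before the next coalescence with probability
$$\frac{j\theta/2}{j\theta/2 + j(j-1)/2} = \frac{\theta}{\theta + j - 1}.$$
Probability of k successes before the first failure: $\Pr(s_j = k) = \left(\frac{\theta}{\theta+j-1}\right)^{k} \frac{j-1}{\theta+j-1}$.
It's a shifted geometric distribution, with $E[s_j] = \theta/(j-1)$ and $\mathrm{Var}(s_j) = \theta/(j-1) + \theta^2/(j-1)^2$. Summing over the independent $s_j$, j = 2, …, n, gives the theorem.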
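A sketch that builds $S_n$ from the geometric pieces $s_j$ of the proof and compares the sample mean and variance with the theorem. Function names, sample size, and seed are illustrative assumptions.

```python
import random
import statistics

def sites_while_j_lineages(j, theta, rng):
    """s_j: mutations before the next coalescence while there are j lineages.

    Each event is a mutation with probability theta / (theta + j - 1),
    so s_j is geometric on {0, 1, 2, ...}.
    """
    p_mut = theta / (theta + j - 1)
    k = 0
    while rng.random() < p_mut:
        k += 1
    return k

def simulate_sn(n, theta, rng):
    """S_n as the sum of the independent s_j, j = 2..n."""
    return sum(sites_while_j_lineages(j, theta, rng) for j in range(2, n + 1))

def moments(n, theta):
    """Theoretical E[S_n] and Var(S_n)."""
    h = sum(1 / i for i in range(1, n))
    g = sum(1 / i ** 2 for i in range(1, n))
    return theta * h, theta * h + theta ** 2 * g

def summarize(n, theta, reps=50000, seed=5):
    """Sample mean and variance of simulated S_n."""
    rng = random.Random(seed)
    draws = [simulate_sn(n, theta, rng) for _ in range(reps)]
    return statistics.mean(draws), statistics.variance(draws)

print(summarize(10, 5.0), moments(10, 5.0))
```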

Watterson's estimator, using the infinite sites model
We can estimate θ = 4Nu from $S_n$:
$$\hat\theta_W = S_n \Big/ \sum_{i=1}^{n-1} \frac{1}{i}.$$
Theorem: Watterson's estimator is unbiased, $E[\hat\theta_W] = \theta$, with $\mathrm{Var}(\hat\theta_W) = \frac{\theta h_n + \theta^2 g_n}{h_n^2}$, where $h_n = \sum_{i=1}^{n-1} 1/i$ and $g_n = \sum_{i=1}^{n-1} 1/i^2$.
It is possible to compute other statistics using the infinite sites model and compare them to the neutral expectation. Today this can be done very generally by sampling:
1. Generate a large number of random genealogies (using the model we presented).
2. Compute the distribution of your statistic on these random cases.
3. Compare it to the value you observe in your population.
If you find a significant bias, then the model is wrong; possibly the locus is not neutral.
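A sketch of Watterson's estimator together with the sampling procedure above. For simplicity the statistic is $S_n$ itself and θ is an assumed value supplied by the user; in practice θ would have to be estimated, and other statistics would be used. All helper names are hypothetical.

```python
import random

def harmonic(n):
    """h_n = sum_{i=1}^{n-1} 1/i."""
    return sum(1 / i for i in range(1, n))

def watterson(s_n, n):
    """Watterson's estimator theta_W = S_n / h_n."""
    return s_n / harmonic(n)

def simulate_sn(n, theta, rng):
    """S_n under the neutral coalescent with infinite sites: while there are j
    lineages, each event is a mutation with probability theta / (theta + j - 1)."""
    total = 0
    for j in range(2, n + 1):
        p_mut = theta / (theta + j - 1)
        while rng.random() < p_mut:
            total += 1
    return total

def neutral_p_value(observed_sn, n, theta, reps=10_000, seed=0):
    """Two-sided Monte Carlo p-value of an observed S_n against simulated neutral
    genealogies with an assumed theta (hypothetical helper)."""
    rng = random.Random(seed)
    null = [simulate_sn(n, theta, rng) for _ in range(reps)]
    lower = sum(x <= observed_sn for x in null) / reps
    upper = sum(x >= observed_sn for x in null) / reps
    return min(1.0, 2 * min(lower, upper))

print(watterson(10, 5), neutral_p_value(0, 20, 10.0), neutral_p_value(35, 20, 10.0))
```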