Molecular evolution: traditional tests of neutrality

Slides:



Advertisements
Similar presentations
Micro Evolution -Evolution on the smallest scale
Advertisements

Population Genetics 1 Chapter 23 in Purves 7 th edition, or more detail in Chapter 15 of Genetics by Hartl & Jones (in library) Evolution is a change in.
Population Genetics: Selection and mutation as mechanisms of evolution Population genetics: study of Mendelian genetics at the level of the whole population.
IMPRS workshop Comparative Genomics 18 th -21 st of February 2013 Lecture 4 Positive selection.
Quick Lesson on dN/dS Neutral Selection Codon Degeneracy Synonymous vs. Non-synonymous dN/dS ratios Why Selection? The Problem.
Option D: Evolution D4: The Hardy- Weinberg Principle.
Alleles = A, a Genotypes = AA, Aa, aa
Sampling distributions of alleles under models of neutral evolution.
BIOE 109 Summer 2009 Lecture 5- Part I Hardy- Weinberg Equilibrium.
Plant of the day! Pebble plants, Lithops, dwarf xerophytes Aizoaceae
Atelier INSERM – La Londe Les Maures – Mai 2004
Forward Genealogical Simulations Assumptions:1) Fixed population size 2) Fixed mating time Step #1:The mating process: For a fixed population size N, there.
Population Genetics. Mendelain populations and the gene pool Inheritance and maintenance of alleles and genes within a population of randomly breeding.
14 Molecular Evolution and Population Genetics
Molecular Evolution with an emphasis on substitution rates Gavin JD Smith State Key Laboratory of Emerging Infectious Diseases & Department of Microbiology.
Population Genetics What is population genetics?
Scott Williamson and Carlos Bustamante
Brachydactyly and evolutionary change
Population Genetics Learning Objectives
- any detectable change in DNA sequence eg. errors in DNA replication/repair - inherited ones of interest in evolutionary studies Deleterious - will be.
Lecture 21: Tests for Departures from Neutrality November 9, 2012.
Broad-Sense Heritability Index
The Hardy-Weinberg Theorem describes a nonevolving population
14 Population Genetics and Evolution. Population Genetics Population genetics involves the application of genetic principles to entire populations of.
Evolution Chapters Evolution is both Factual and the basis of broader theory What does this mean? What are some factual examples of evolution?
Lab 11 :Test of Neutrality and Evidence for Selection.
Lecture 3: population genetics I: mutation and recombination
Course outline HWE: What happens when Hardy- Weinberg assumptions are met Inheritance: Multiple alleles in a population; Transmission of alleles in a family.
Identifying and Modeling Selection Pressure (a review of three papers) Rose Hoberman BioLM seminar Feb 9, 2004.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Coalescent Models for Genetic Demography
Selectionist view: allele substitution and polymorphism
Lecture 20 : Tests of Neutrality
NEW TOPIC: MOLECULAR EVOLUTION.
By Mireya Diaz Department of Epidemiology and Biostatistics for EECS 458.
Lab 11 :Test of Neutrality and Evidence for Selection
Testing the Neutral Mutation Hypothesis The neutral theory predicts that polymorphism within species is correlated positively with fixed differences between.
In populations of finite size, sampling of gametes from the gene pool can cause evolution. Incorporating Genetic Drift.
Modelling evolution Gil McVean Department of Statistics TC A G.
The Hardy-Weinberg theorem describes the gene pool of a nonevolving population. This theorem states that the frequencies of alleles and genotypes in a.
LBA ProtPars. LBA Prot Dist no Gamma and no alignment.
Lecture 6 Genetic drift & Mutation Sonja Kujala
Evolution and Population Genetics
Hudson Kreitman Aguadé 1987
Polymorphism Polymorphism: when two or more alleles at a locus exist in a population at the same time. Nucleotide diversity: P = xixjpij considers.
Causes of Variation in Substitution Rates
Signatures of Selection
Hardy-Weinberg Theorem
Neutrality Test First suggested by Kimura (1968) and King and Jukes (1969) Shift to using neutrality as a null hypothesis in positive selection and selection.
Linkage and Linkage Disequilibrium
The Neutral Theory M. Kimura, 1968
The Evolution of Populations
Evolution as Genetic Change
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
Diversity of Individuals and Evolution of Populations
Testing the Neutral Mutation Hypothesis
Is the CFTR allele maintained by mutation/selection balance?
I. Population Evolution
Lecture 4: Testing for Departures from Hardy-Weinberg Equilibrium
Hardy Weinberg: Population Genetics
Population genetics and Hardy-Weinberg
The Mechanisms of Evolution
Modern Evolutionary Biology I. Population Genetics
Modern Evolutionary Biology I. Population Genetics
Hardy-Weinberg Principle
Evolution by Genetic Drift : Main Points (p. 231)
Modern Evolutionary Biology I. Population Genetics
Population Genetics: The Hardy-Weinberg Law
Population Genetics Population
Evolution by Genetic Drift : Main Points (p. 231)
Presentation transcript:

Molecular evolution: traditional tests of neutrality Computational Biology 6.047 10/09/08 Guest Lecture: Molecular evolution: traditional tests of neutrality Dr. Daniel Neafsey Research Scientist, Broad Institute

Mutation+Selection=Evolution Relative importance of each for maintaining variation in population?

Early Criticism of Darwin Blending inheritance, ‘gemmules’ Fleeming Jenkin (1867): Var[X(t+1)] = ½ Var[X(t)] X =

Mendelian Inheritance published 1865-66, rediscovered 1900 Law of Segregation: allelic variation offspring receive 1 allele from each parent dominance/recessivity parental alleles ‘segregate’ to form gametes Law of Independent Assortment

Simple case: no selection The Hardy-Weinberg Law (1908) Requires: infinite population size random mating non-overlapping generations no selection, mutation, or migration

The Hardy-Weinberg Law Genotype: AA Aa aa Frequency at time 0: u0 v0 w0 u0 + v0 + w0 = 1 frequency of A (p0) = u0 + v0/2 frequency of a (q0) = w0 + v0/2 p0 + q0 = 1

The Hardy-Weinberg Law Genotype: AA Aa aa Frequency at time 0: u0 v0 w0 Mating Pair Frequency Offspring AA Aa aa AA x AA u02 1 0 0 AA x Aa u0v0 ½ ½ 0 Aa x AA u0v0 ½ ½ 0 Aa x Aa v02 ¼ ½ ¼ Frequency of AA in next generation: u1 = u02 + u0v0 + 1/4 v02 = (u0 + v0 /2)2 = p02

The Hardy-Weinberg Law If assumptions met: allele frequencies don’t change after a single generation of random mating, genotype frequencies are: u = p2 v = 2pq w = q2 entire system characterized by one parameter (p) Deviation from expectations indicates failure of 1 or more assumptions—selection?

HW application: Sickle cell anemia Observed Expected Counts Counts SS 834 Ss 161 ss 5 =SS =ss 2pq *1000= 129 p = Ö0.834 = 0.91 q= Ö0.005 = 0.071

Approach: Detect selection through comparison to neutral expectation Kimura: neutral theory Ewens: sampling formula Coalescence

Neutral Theory History Motoo Kimura (1924-1994) 1968: a large proportion of genetic change is not driven by selection Adapted diffusion approximations to genetics Dealt with finite pops Paph. Kimura Birdmoor x Kimura’s Flight ‘In-Charm’

Genetic Drift no drift infinite pop drift finite pop

Neutral allele diffusion (from Kimura 1954)

Ewens sampling formula (1972) built on foundations of diffusion theory extended idea of ‘identity by descent’ (ibd) sample-based shifted focus to inferential methods introduced ‘infinite alleles’ model How much variation do we expect in a sample?

Infinite alleles model infinite number of states into which an allele can mutate, therefore each mutation assumed unique (protein-centric) 2Nm new alleles introduced each generation, derived from existing alleles initial allele frequency = 1/(2N) every allele eventually lost

Infinite alleles model Under diffusion, probability of an allele whose frequency is between x and x+dx is: where N = population size m = mutation rate

Expected Site Frequencies E(ni) i

Ewens Sampling formula Probability that a sample of n gene copies contains k alleles and that there are a1, a2, …, an alleles represented 1,2, …,n times in the sample: where and aj is the number of alleles found in j copies

The Coalescent Alternate, ‘backwards’ approach to generating expected allele frequency distributions t4 t3 t2 i = 3 or 1 infer tree structure (genealogy), because tree structure dictates pattern of polymorphism in data i = 2 i = 1

The Coalescent How far back in time did a sample share a common ancestor? Tpop» 4N generations Tsamp time present

Coalescent inference summary statistics obviate need to actually sum over all genealogies Sample of size 2: P(coal) = 1/2N t2 Assuming 2N is large, t2 is exponentially distributed Probability of k mutations occuring before coalescene is then geometrically distributed p(mutation|event)=2u/(2u+1/2N) p(coal|event)=1/2N/(1/2N+2u) Probability of k mutation events before two sequences coalesce P(mutation|event) P(coalescence|event)

Turning neutral models into tests of neutrality Three polymorphism summary statistics: S no. of segregating sites in sample p avg. no. of pairwise differences hi no. of sites that divide the sample into i and n-i sequences

Turning neutral models into tests of neutrality S no. of segregating sites in sample p avg. no. of pairwise differences ni no. of sites that divide the sample into i and n-i sequences

Turning neutral models into tests of neutrality Q = 4Nm Q Estimator p

Frequency-based neutrality tests Tajima (1989) proposed: Fu and Li (1993) proposed:

Frequency-based neutrality tests S = h1 h1 maximized p minimized S = h[n/2] h1 minimized p maximized Negative Positive

Neutral Expectation (no selection, no structure, constant population size) D, D*, F* » 0

Positive Selection (Sweep) Negative D, D*, F*

Balancing Selection Positive D, D*, F*

Population Structure/Subdivision Positive D, D*, F*

Population Expansion Negative D, D*, F*

Polymorphism vs. Divergence Species A poly div Species B Divergence between species should reflect variation within species

HKA Test apply chi-squared test to summary statistics of polymorphism, divergence Conclusion: Adh exhibits excessive polymorphism

Polymorphism/divergence with a twist: site classes Synonymous changes: don’t affect amino acid UCU Þ UCC=Serine Nonsynonymous (replacement) changes: new amino acid UCU Þ UUC= Phenylalanine

MK Test MK test requires only 1 locus, but polymorphism data from 2 species. Adh exhibits an excessive proportion of replacement fixed differences.

Rate-based selection metric: dN/dS dN = no. nonsynonymous changes/ no. nonsynonymous sites dS = no. synonymous changes/ no. synonymous sites Counting codon ‘sites’ example: CAT Histidine is encoded by only one other codon: CAC CAT P(TÞC) = fractional syn sites P(TÞG or A) = fractional nonsyn sites full nonsyn sites fractional site

Rate-based selection metric: dN/dS dN/dS < 1 purifying selection dN/dS = 1 neutral expectation dN/dS > 1 positive selection

Rate-based selection metric: dN/dS Can be calculated using various methods Goldman & Yang implementation (PAML): nucleotide changes modelled as continuous-time Markov chain with state space = 61 codons 0: if the two codons differ at > 1 position pj: synonymous transversion kpj: synonymous transition wpj: nonsynoymous transversion wkpj: nonsynonymous transition instantaneous rate matrix pi(j) is stationary frequency of codon j qij =

Rate-based selection metric: dN/dS Are syn sites really neutral?

Codon Bias and Translation Codon bias: the unequal usage of synonymous codons -Thought to reflect selection for optimal translational efficiency and/or translational accuracy. Low Codon Bias (low tx efficiency) % Genes Distribution of Codon Bias Estimates for 6,453 Cryptococcus Genes High (high tx efficiency) 1.0

Correlates with dN/dS (or just dN) expression level (-) dispensability (+) protein abundance (-) codon bias (-) gene length (+) number of protein-protein interactions (-) centrality in interaction network (-)

Neutrality Tests Summary Allelic frequency spectrum tests (Tajima’s D) Polymorphism/divergence tests (HKA, MK) Rate-based metric: dN/dS

The future: empirical tests based on genomic data that are not dependent on demographic assumptions (Pardis Sabeti) tests that incorporate biophysical properties of amino acids into calculation of syn, nonsyn changes?