Molecular Evolution with an emphasis on substitution rates Gavin JD Smith State Key Laboratory of Emerging Infectious Diseases & Department of Microbiology.

Presentation transcript:

Molecular Evolution with an emphasis on substitution rates Gavin JD Smith State Key Laboratory of Emerging Infectious Diseases & Department of Microbiology The University of Hong Kong Bioinformatic and Comparative Genome Analysis Course HKU-Pasteur Research Centre - Hong Kong August 17 - August 29, 2009

Why? “Understanding the selective pressures that have shaped genetic variation is a central goal in the study of evolutionary biology” (Pond et al.)

Dynamics of evolution: The diversity exhibited by a population reflects the organism's natural history. The genetic diversity of a population is a combination of: - biological properties (e.g. mutation rates, generation time) - evolutionary forces (e.g. molecular adaptation, genetic drift). Three principal mechanisms are responsible for viral genetic variation: - mutation - selection - recombination

Mutations: Because nonsynonymous (β) mutations directly alter proteins (and potentially their function), they are more likely to affect organism fitness than synonymous (α) mutations, which leave the amino acid sequence unchanged.

Mutations: Mutations that result in amino acid changes are nonsynonymous. Mutations that do not result in amino acid changes are silent, or synonymous.
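The distinction above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the standard genetic code (NCBI translation table 1), and the names `CODON_TABLE` and `classify_change` are hypothetical, not taken from the presentation. A change to or from a stop codon is reported as nonsynonymous here, which is a simplification.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_change(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid,
    otherwise 'nonsynonymous' (hypothetical helper for illustration)."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


if __name__ == "__main__":
    print(classify_change("CTT", "CTC"))  # Leu -> Leu
    print(classify_change("GAA", "GTA"))  # Glu -> Val
```

Most third-position changes are synonymous because of codon degeneracy, which is why the third position is often treated separately in the models discussed later.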

Selective pressures: Selective pressure on coding sequences can be calculated by comparing the relative rates of synonymous (α) and nonsynonymous (β) mutations. The ratio ω = β/α (also referred to as dN/dS or K_A/K_S) is a standard measure of selective pressure.

Selective pressures: ω ≈ 1 indicates neutral evolution, ω < 1 negative (or purifying) selection, and ω > 1 positive (or diversifying) selection. To infer selective pressures it is necessary to estimate nonsynonymous and synonymous rates accurately; this is where models come in (discussed later).

Evolutionary rates and selection: Mutations have evolutionary consequences ONLY if they are successfully transmitted to the next generation. MUTATION RATE: number of nucleotide alterations per round of replication. SUBSTITUTION (or EVOLUTION) RATE: number of nucleotide alterations fixed in a population per unit of time. The rate of evolution of a virus reflects the relative proportion of advantageous, neutral or deleterious evolutionary forces exerted on it.
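As a worked illustration of how the two rates relate, the standard neutral-theory result is sketched below. It is not taken from the slides, and the symbols N, μ and k are assumptions introduced here. It applies only to strictly neutral mutations in an idealised population of constant size, with both rates measured per generation.

```latex
% N   : number of genome copies in the population (assumed constant)
% \mu : neutral mutation rate per genome copy per generation
% k   : substitution rate per generation
k \;=\; \underbrace{N\mu}_{\text{new neutral mutations per generation}}
   \times \underbrace{\frac{1}{N}}_{\text{fixation probability of each}}
   \;=\; \mu
```

Under selection the fixation probability is no longer 1/N, so the substitution rate can fall below the mutation rate (negative selection) or rise above it (positive selection).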

Selective pressures: Under negative selection, less ‘fit’ nonsynonymous substitutions accumulate more slowly than synonymous substitutions. Alternatively expressed, negative selection exerts pressure to remove deleterious substitutions from a population. Positive selection acts to fix more ‘fit’, or advantageous, substitutions in a population.

Evolutionary models: Necessary for accurate rate estimation. Current models take either the nucleotide or the codon as the unit of evolution. The structure of the genetic code means that realistic models of evolution should consider triplets of nucleotides (i.e. codons) to be the basic unit of evolution.

Nucleotide-based models: In nucleotide substitution models, each nucleotide position of an alignment is treated independently. Codon position substitution models partition the nucleotide data so that codon positions 1, 2 & 3 may have different parameters; the SRD06 model, for example, has two categories, 1+2 and 3.
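A two-category partition of the kind described for SRD06 (positions 1+2 versus position 3) can be sketched as follows. The function name `partition_codon_positions` is hypothetical, and the sketch assumes an in-frame, ungapped coding sequence. It shows only the data partitioning, not the substitution model fitted to each partition.

```python
def partition_codon_positions(sequence):
    """Split an in-frame coding sequence into two partitions:
    codon positions 1+2 together, and codon position 3 alone."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Indices 0 and 1 of every codon are positions 1 and 2.
    first_second = "".join(
        base for index, base in enumerate(sequence) if index % 3 != 2
    )
    # Index 2 of every codon is position 3.
    third = sequence[2::3]
    return first_second, third


if __name__ == "__main__":
    print(partition_codon_positions("ATGGCC"))
```

Each partition can then be given its own substitution parameters, so that the faster-evolving third positions do not distort estimates for positions 1 and 2.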

Codon-based models: A model of DNA sequence evolution applicable to coding regions. Uses the codon, as opposed to the nucleotide, as the unit of evolution. Accounts for dependencies among nucleotides within a codon. The most commonly used are GY94 (Goldman & Yang) and MG94 (Muse & Gaut).

Nucleotide substitution models as an example

Models of nucleotide evolution: Several probabilistic models of evolution have been developed to convert observed nucleotide distances into measures of actual evolutionary distances. The relative complexity of these models is a function of the extent of the biological, biochemical and evolutionary assumptions (i.e. parameters) they incorporate. Substitutions are usually described as probabilities of mutational events, mathematically modeled by matrices of relative rates.

Jukes-Cantor (JC): The first proposed model. It assumes that the four bases have equal frequencies and that all substitutions are equally likely.
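Under these assumptions, the observed proportion of differing sites p can be converted into an estimated number of substitutions per site with the well-known Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The sketch below is illustrative only: the function names are hypothetical, and it assumes two aligned, ungapped sequences of equal length.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for x, y in zip(seq1.upper(), seq2.upper()) if x != y)
    return differences / len(seq1)


def jc69_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site) from p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the JC correction")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTACGTAA")
    print(p, jc69_distance(p))
```

The corrected distance always exceeds p (for p > 0) because it accounts for multiple substitutions at the same site that the raw comparison cannot see.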

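Kimura’s 2 parameter: Transitions are generally more frequent than transversions. The K2P model assumes that the rate of transitions per site (α) differs from the rate of transversions per site (β). Note that α and β here are not the synonymous and nonsynonymous rates denoted by the same symbols earlier.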
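The corresponding K2P distance treats the two kinds of difference separately: with P the observed proportion of transitions and Q the observed proportion of transversions, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The following is a minimal sketch under the assumption of aligned, ungapped sequences containing only A, C, G and T. The names `TRANSITIONS` and `k2p_distance` are hypothetical.

```python
import math

# Transitions are purine<->purine (A, G) or pyrimidine<->pyrimidine (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance (substitutions per site)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        if (x, y) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq1)    # proportion of transitional differences
    q = transversions / len(seq1)  # proportion of transversional differences
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


if __name__ == "__main__":
    # One transition (A->G) and one transversion (A->C) in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```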

Felsenstein (1981): If some nucleotides are more common than others in a sequence, some substitutions may be more frequent than others. The F81 model allows the frequencies (π) of the four nucleotides to differ.

Hasegawa, Kishino and Yano: The HKY85 model allows rates of transitions and transversions to differ and base frequencies to vary.
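One common way to write HKY85 is as an instantaneous rate matrix in which the rate from base i to base j is κ·π_j for a transition and π_j for a transversion, with each diagonal entry set so that its row sums to zero. The sketch below builds that matrix in plain Python. The function names and the A, C, G, T ordering are assumptions made for illustration, and the matrix is left unnormalised.

```python
NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True if x -> y stays within purines or within pyrimidines."""
    return x != y and ((x in PURINES) == (y in PURINES))


def hky85_rate_matrix(kappa, frequencies):
    """Unnormalised HKY85 rate matrix as a list of rows (order A, C, G, T).

    kappa       -- transition/transversion rate ratio
    frequencies -- dict mapping each base to its equilibrium frequency (pi)
    """
    matrix = []
    for i, x in enumerate(NUCLEOTIDES):
        row = []
        for y in NUCLEOTIDES:
            if x == y:
                row.append(0.0)  # placeholder, filled in below
            elif is_transition(x, y):
                row.append(kappa * frequencies[y])
            else:
                row.append(frequencies[y])
        row[i] = -sum(row)  # rows of a rate matrix sum to zero
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
    for row in hky85_rate_matrix(2.0, pi):
        print(row)
```

Setting κ = 1 with unequal frequencies gives F81, unequal κ with equal frequencies gives K2P, and κ = 1 with equal frequencies gives Jukes-Cantor, which shows how the simpler models nest inside HKY85.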

General Time Reversible: The GTR/REV model allows each type of substitution to have its own rate. Substitutions are reversible (i.e. at equilibrium, a substitution from i to j is as likely to be observed as a substitution from j to i).
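Reversibility can be made concrete with a small sketch. In the usual parameterisation the rate from i to j is r_ij·π_j, where the exchangeability r_ij equals r_ji, so that π_i·q_ij = π_j·q_ji for every pair. The function names, the dictionary keys ("AC", "AG", ...) and the example values below are hypothetical choices made for illustration.

```python
NUCLEOTIDES = "ACGT"


def gtr_rate_matrix(exchangeabilities, frequencies):
    """Unnormalised GTR rate matrix as a list of rows (order A, C, G, T).

    exchangeabilities -- dict keyed by unordered pair: AC, AG, AT, CG, CT, GT
    frequencies       -- dict mapping each base to its equilibrium frequency
    """
    size = len(NUCLEOTIDES)
    matrix = [[0.0] * size for _ in range(size)]
    for i, x in enumerate(NUCLEOTIDES):
        for j, y in enumerate(NUCLEOTIDES):
            if i != j:
                pair = "".join(sorted(x + y))  # "CA" and "AC" share one key
                matrix[i][j] = exchangeabilities[pair] * frequencies[y]
        matrix[i][i] = -sum(matrix[i])  # diagonal is still 0.0 at this point
    return matrix


def is_time_reversible(matrix, frequencies, tolerance=1e-12):
    """Check detailed balance: pi_i * q_ij == pi_j * q_ji for all pairs."""
    for i, x in enumerate(NUCLEOTIDES):
        for j, y in enumerate(NUCLEOTIDES):
            forward = frequencies[x] * matrix[i][j]
            backward = frequencies[y] * matrix[j][i]
            if abs(forward - backward) > tolerance:
                return False
    return True


if __name__ == "__main__":
    rates = {"AC": 1.0, "AG": 4.0, "AT": 0.5, "CG": 0.7, "CT": 5.0, "GT": 1.0}
    pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
    Q = gtr_rate_matrix(rates, pi)
    print(is_time_reversible(Q, pi))
```

Note that q_ij and q_ji themselves differ whenever the base frequencies differ. It is the frequency-weighted flow between the two bases that is equal in both directions.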

After Whelan et al

Rate heterogeneity: Different regions of RNA/DNA may have different probabilities of change, and variable rates of substitution can have considerable impact on sequence divergence. Typically, a gamma distribution is used to describe heterogeneity in nucleotide substitution rate across sites. The range of rate variation among sites is dictated by the shape parameter α of the distribution.
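The effect of the shape parameter can be illustrated by simulation. The sketch below assumes the common convention of a gamma distribution with mean 1 (shape α, scale 1/α), whose variance is then 1/α, so small α means strong rate variation among sites. The function name `draw_site_rates` and the fixed seed are assumptions made for illustration. Phylogenetic software typically uses a discretised gamma rather than random draws.

```python
import random


def draw_site_rates(alpha, n_sites, seed=0):
    """Draw relative substitution rates for n_sites from a gamma
    distribution with shape alpha and scale 1/alpha (mean 1)."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def mean_and_variance(values):
    """Return the sample mean and population variance of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


if __name__ == "__main__":
    for alpha in (0.5, 5.0):
        rates = draw_site_rates(alpha, 20000)
        print(alpha, mean_and_variance(rates))
```

With α = 0.5 most sites evolve very slowly while a few evolve very fast. With α = 5 the rates cluster near the mean, approaching the equal-rates assumption.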

Beware of recombination!! Many phylogenetic methods implicitly assume that all sites in a sequence share a common evolutionary history. However, recombination can violate this assumption by allowing sites to move freely between different genetic backgrounds. This may cause different sections of an alignment to lead to contradictory estimates of the tree and subsequently confuse model inferences.

Global vs. local ω models: Global – fits a single model to a given alignment & tree (i.e. all branches are treated equally). Local – can fit a unique set of substitution rates to every branch in a tree.

Acknowledgements: HKU: Vijaykrishna Dhanasekaran & Justin Bahl for help with preparing the presentation & practical component. Estimating selection pressures on alignments of coding sequences: Analyses using HyPhy. Edited by Sergei L. Kosakovsky Pond, Art F.Y. Poon, and Simon D.W. Frost.