Quick Lesson on dN/dS: Neutral Selection, Codon Degeneracy, Synonymous vs. Non-synonymous, dN/dS ratios, Why Selection?, The Problem.


What does selection “look” like? (Yokoyama S et al. PNAS 2008;105:) When moving into new dim-light environments, vertebrate ancestors adjusted their dim-light vision by modifying their rhodopsins. Functional changes have occurred. Biologically significant shifts have occurred multiple times. How do we know whether these shifts are adaptive or random?

Neutral Selection: with no selective pressure, mutations will occur evenly throughout the genome. Does that hold for pseudogenes? Introns? Promoters? Coding regions?

Codon Degeneracy: an AA can be coded for by more than one codon (the wobble effect). 1st position = strongly conserved; 2nd position = conserved; 3rd position = “wobbly”. [Figure labels: AA #1, AA #2, AA #3; Pos #1, Pos #2, Pos #3]
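The degeneracy described on this slide can be checked directly against the standard genetic code. The following is a minimal Python sketch, not part of the original slides: the table layout (`BASES`, `AMINO`) and the function name `synonymous_fraction_by_position` are illustrative choices, and changes to or from stop codons are skipped as a simplifying assumption.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_fraction_by_position():
    """Fraction of single-base changes that keep the amino acid, per codon position.

    Assumption: stop codons are ignored both as starting codons and as mutants.
    """
    syn = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    continue
                total[pos] += 1
                if CODON_TABLE[mutant] == aa:
                    syn[pos] += 1
    return [s / t for s, t in zip(syn, total)]


fractions = synonymous_fraction_by_position()
for pos, frac in enumerate(fractions, start=1):
    print(f"position {pos}: {frac:.2%} of changes are synonymous")
```

Under the standard code, no second-position change between sense codons is synonymous, so the second position is at least as constrained as the first, while the third position tolerates the most change.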

Synonymous vs Non-synonymous. Synonymous: no AA change. Non-synonymous: AA change.
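As a small illustration of this distinction, the sketch below classifies a change between two codons by translating both with the standard genetic code. The function name `classify_change` and the example codons are hypothetical and chosen only for illustration.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_change(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "non-synonymous"


# CTA and CTG both encode leucine (L): no AA change.
print(classify_change("CTA", "CTG"))
# GAA encodes glutamate (E), AAA encodes lysine (K): AA change.
print(classify_change("GAA", "AAA"))
```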

dN/dS ratios. N = non-synonymous change; S = synonymous change. dN = rate of non-synonymous changes (per non-synonymous site); dS = rate of synonymous changes (per synonymous site). dN/dS = the rate of non-synonymous changes over the rate of synonymous changes.
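To make the ratio concrete, here is a simplified pairwise counting sketch in Python. It is an illustration under stated assumptions, not the method used in the paper or in any particular software package: the two sequences are assumed to be in-frame, gap-free, of equal length, and free of internal stop codons; each differing base is classified on its own against the first sequence's codon; and a Jukes-Cantor correction is applied to the raw proportions. All function names are hypothetical.

```python
import math

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def synonymous_sites(seq):
    """Number of synonymous sites: per position, the fraction of the three
    possible base changes that leave the amino acid unchanged."""
    total = 0.0
    for codon in codons(seq):
        for pos in range(3):
            mutants = [codon[:pos] + b + codon[pos + 1:] for b in BASES if b != codon[pos]]
            total += sum(CODON_TABLE[m] == CODON_TABLE[codon] for m in mutants) / 3
    return total


def count_differences(seq_a, seq_b):
    """Count (synonymous, non-synonymous) base differences.

    Simplification: each differing base is evaluated independently against
    the codon from seq_a, ignoring the order of multiple changes in one codon.
    """
    syn = nonsyn = 0
    for ca, cb in zip(codons(seq_a), codons(seq_b)):
        for pos in range(3):
            if ca[pos] != cb[pos]:
                mutant = ca[:pos] + cb[pos] + ca[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[ca]:
                    syn += 1
                else:
                    nonsyn += 1
    return syn, nonsyn


def jukes_cantor(p):
    """Correct a raw proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq_a, seq_b):
    """Return (dN, dS) for two aligned coding sequences."""
    s_sites = (synonymous_sites(seq_a) + synonymous_sites(seq_b)) / 2
    n_sites = 3 * len(codons(seq_a)) - s_sites
    syn, nonsyn = count_differences(seq_a, seq_b)
    return jukes_cantor(nonsyn / n_sites), jukes_cantor(syn / s_sites)


d_n, d_s = dn_ds("ATGCTAGAA", "ATGCTGAAA")
print(f"dN = {d_n:.3f}, dS = {d_s:.3f}, dN/dS = {d_n / d_s:.3f}")
```

The toy sequences differ by one synonymous and one non-synonymous change; because there are far fewer synonymous sites than non-synonymous sites, the ratio still comes out below 1. A sequence with no synonymous sites or no synonymous differences would need extra handling that this sketch omits.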

Selection and dN/dS
dN/dS = 1 => neutral selection: no selective pressure
dN/dS < 1 => negative selection: selective pressure to stay the same
dN/dS > 1 => positive selection: selective pressure to change
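The three cases can be written as a small helper. This is only a sketch: the function name `interpret_omega` and the `tolerance` band around 1 are assumptions added for illustration, and a real analysis would rely on a statistical test rather than a point estimate.

```python
def interpret_omega(omega, tolerance=0.05):
    """Map a dN/dS estimate to the three regimes listed above.

    `tolerance` is a hypothetical band around 1 treated as neutral.
    """
    if omega > 1 + tolerance:
        return "positive selection"
    if omega < 1 - tolerance:
        return "negative selection"
    return "neutral"


for value in (0.2, 1.0, 2.5):
    print(value, "=>", interpret_omega(value))
```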

Why Selection? Identify important gene regions. Find drug resistance. Locate thrifty genes or mutations.

dN/dS Problem: the ratio analyzes a whole gene or large segments, but selection occurs at the amino acid level. This method lacks statistical power. Thus the purpose of this paper.

SLAC: single likelihood ancestor counting. The basic idea: count the number of synonymous and nonsynonymous changes at each codon over the evolutionary history of the sample. NN[D_s | T, A], NS[D_s | T, A]
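The counting step can be sketched as follows, assuming the tree and the ancestral sequences have already been reconstructed by some other tool. This is not the SLAC implementation: the data layout (a list of parent/child branches plus a dictionary of sequences), the function name, and the toy tree are all hypothetical, and a codon that changes on a branch is counted as a single change even if more than one base differs.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_changes_per_site(branches, sequences):
    """Count synonymous and nonsynonymous codon changes at each site.

    branches  : list of (parent, child) node names
    sequences : node name -> in-frame DNA, including reconstructed ancestors
    """
    n_codons = len(next(iter(sequences.values()))) // 3
    counts = [{"syn": 0, "nonsyn": 0} for _ in range(n_codons)]
    for parent, child in branches:
        for site in range(n_codons):
            before = sequences[parent][3 * site:3 * site + 3]
            after = sequences[child][3 * site:3 * site + 3]
            if before == after:
                continue
            kind = "syn" if CODON_TABLE[before] == CODON_TABLE[after] else "nonsyn"
            counts[site][kind] += 1
    return counts


# Toy example: a root, one internal node, and three tips (all hypothetical).
toy_branches = [("root", "anc1"), ("root", "C"), ("anc1", "A"), ("anc1", "B")]
toy_sequences = {
    "root": "ATGCTAGAA",
    "anc1": "ATGCTAGAA",
    "A": "ATGCTGGAA",  # CTA -> CTG, leucine stays leucine
    "B": "ATGCTAAAA",  # GAA -> AAA, glutamate becomes lysine
    "C": "ATGCTAGAA",
}
for site, c in enumerate(count_changes_per_site(toy_branches, toy_sequences), start=1):
    print(f"codon {site}: {c['syn']} synonymous, {c['nonsyn']} nonsynonymous")
```

Because the counts depend entirely on the supplied ancestral states, any error in the reconstruction carries straight through to the per-site totals.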

SLAC [Figure labels: E40K, L10I]

SLAC. Strengths: computationally inexpensive; more powerful than other counting methods in simulation studies. Weaknesses: we are assuming that the reconstructed states are correct; adding the number of substitutions over all the branches may hide significant events; simulation studies show that SLAC underestimates the substitution rate. Runtime estimates: less than a minute for sequence datasets.

FEL: fixed effects likelihood. The basic idea: use the principles of maximum likelihood to estimate the ratio of nonsynonymous to synonymous rates at each site.

FEL Likelihood Ratio Test. H0: α = β. Ha: α ≠ β. (fixed)
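A likelihood ratio test of this kind compares the maximized log-likelihood of a site under the constrained model (α = β) with that under the unconstrained model. The sketch below assumes those two log-likelihoods have already been obtained from a fitting tool, takes α and β to be the synonymous and nonsynonymous rates as is usual in this literature, and uses the common one-degree-of-freedom chi-square approximation. The function name and the example numbers are hypothetical.

```python
import math


def lrt_pvalue(loglik_null, loglik_alt):
    """P-value for a one-parameter likelihood ratio test.

    loglik_null : maximized log-likelihood with alpha = beta
    loglik_alt  : maximized log-likelihood with alpha and beta free
    Uses the chi-square(1) survival function, erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_alt - loglik_null)
    statistic = max(statistic, 0.0)  # guard against tiny negative values from optimization noise
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods for one codon site.
p = lrt_pvalue(-100.0, -98.079)
print(f"p = {p:.3f}")
```

A small p-value rejects α = β at that site; comparing the fitted α and β then indicates whether the departure points toward negative or positive selection.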

FEL. Strengths: in simulation studies, substitution rates estimated by FEL closely approximate the actual values; models variation in both the synonymous and nonsynonymous substitution rates; easily parallelized, and computational cost grows linearly. Weaknesses: to avoid estimating too many parameters, we fix the tree topology, branch lengths and rate parameters. Runtime estimates: a few hours on a small cluster for several hundred sequences.

REL: random effects likelihood. The basic idea: estimate the full likelihood nucleotide substitution model and the synonymous and nonsynonymous rates simultaneously. Compromise: use discrete categories for the rate distributions.

REL. 1. Posterior probability. 2. Ratio of the posterior and prior odds of having ω > 1.
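Both quantities follow from Bayes' rule once a site's likelihood is known under each discrete rate category. The sketch below assumes the category weights (priors), the ω value of each category, and the per-category site likelihoods are supplied by an earlier fitting step; the function name and all numbers are hypothetical.

```python
def posterior_and_bayes_factor(priors, omegas, site_likelihoods):
    """Posterior probability that omega > 1 at a site, and the Bayes factor.

    priors           : prior weight of each rate category (sums to 1)
    omegas           : dN/dS value of each category
    site_likelihoods : likelihood of the site's data under each category
    Assumes at least one category has omega > 1 and at least one does not.
    """
    joint = [p * lik for p, lik in zip(priors, site_likelihoods)]
    evidence = sum(joint)
    posteriors = [j / evidence for j in joint]

    post_positive = sum(q for q, w in zip(posteriors, omegas) if w > 1)
    prior_positive = sum(p for p, w in zip(priors, omegas) if w > 1)

    posterior_odds = post_positive / (1 - post_positive)
    prior_odds = prior_positive / (1 - prior_positive)
    return post_positive, posterior_odds / prior_odds


# Three hypothetical categories: negative, neutral, positive.
prob, bayes_factor = posterior_and_bayes_factor(
    priors=[0.5, 0.3, 0.2],
    omegas=[0.1, 1.0, 3.0],
    site_likelihoods=[0.01, 0.02, 0.05],
)
print(f"P(omega > 1 | data) = {prob:.3f}, Bayes factor = {bayes_factor:.2f}")
```

The Bayes factor reports how much the data shift the odds of ω > 1 relative to the prior, which makes sites comparable even when the prior weight on the positive category is small.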

REL. Strengths: estimates synonymous, nonsynonymous and nucleotide rates simultaneously; most powerful of the three methods for large numbers of sequences. Weaknesses: performs poorly with small numbers of sequences; computationally demanding. Runtime estimates: not mentioned.

Simulation Performance [Figure panels: 64 sequences; 8 sequences]
