What is positive selection?

Presentation transcript:

What is positive selection?
dN = rate of nonsynonymous substitution
dS = rate of synonymous substitution
Let ω = dN/dS

Positive selection occurs when the ω ratio exceeds unity
Type of selection      Outcome
Purifying selection    dN/dS < 1
No selection           dN/dS = 1
Positive selection     dN/dS > 1
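The table above can be read as a simple decision rule on a point estimate of dN/dS. The following is a minimal sketch only; the function name and the tolerance used to treat the ratio as equal to 1 are illustrative assumptions, and a real analysis would need a statistical test rather than a bare comparison.

```python
def classify_selection(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Label the mode of selection implied by a dN/dS point estimate.

    `tol` is a hypothetical tolerance for treating the ratio as exactly 1.
    """
    if ds <= 0:
        raise ValueError("dS must be positive for dN/dS to be defined")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "no selection"
    return "positive selection" if omega > 1.0 else "purifying selection"
```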

How do we test for positive selection?
1. Estimate means and variances of dN and dS for all pairwise species comparisons.
2. Use a t-test to determine whether dN and dS differ significantly.
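The two steps above can be sketched as follows. The slide does not say which form of the t-test is meant, so this example assumes Welch's unequal-variance version; the function name is hypothetical, and the input lists stand for dN and dS estimates from the pairwise comparisons.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(dn_values, ds_values):
    """Return Welch's t statistic and approximate degrees of freedom
    for the difference between mean dN and mean dS."""
    n1, n2 = len(dn_values), len(ds_values)
    m1, m2 = mean(dn_values), mean(ds_values)
    v1, v2 = variance(dn_values), variance(ds_values)  # sample variances
    se2 = v1 / n1 + v2 / n2                            # squared standard error
    t = (m1 - m2) / sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```

The statistic would then be compared against a t distribution with `df` degrees of freedom; a positive, significant t indicates dN > dS.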

Some problems…
1. Averages over all amino acid positions in a protein.
2. Averages over all lineages.
3. Can detect positive selection only when it is very strong and consistent through evolutionary time.

Some more problems… Ignores the phylogenetic framework in which adaptive molecular evolution occurs!

Suppose a significant ω ratio is detected between cow and pig
               1     2     3     4     5     6
1. Cow       ---    ns    ns    ns  3.45    ns
2. Deer            ---    ns    ns    ns    ns
3. Whale                 ---    ns    ns    ns
4. Hippo                       ---    ns    ns
5. Pig                               ---    ns
6. Camel                                   ---
ns = not significant

A phylogenetic perspective
[Figure: phylogenetic tree of cow, deer, whale, hippo, pig, camel and an outgroup, with several branches labeled "ω > 1?"]

An example: lysozyme evolution in colobine monkeys
• Colobine monkeys are leaf-eaters that have evolved a complex foregut (like ruminants).
• The stomach expresses a high level of the bacteriolytic enzyme lysozyme.

Phylogeny of Colobines and Cercopithecines
[Figure, from Messier & Stewart (1997): phylogenetic tree annotated "Foregut fermentation evolved".
Colobines: Hanuman langur, purple-faced langur, dusky langur, Francois' langur, proboscis monkey, guereza colobus, Angolan colobus.
Cercopithecines: patas monkey, vervet, talapoin, rhesus macaque, Allen's monkey, olive baboon, sooty mangabey.
Also shown: chimpanzee.]

Phylogeny of Colobines and Cercopithecines
[Figure, from Messier & Stewart (1997): the same tree, annotated "ω = 4.7".]

A Maximum-Likelihood (ML) approach to the detection of positive selection
ML methods evaluate the probability (i.e., likelihood) of obtaining a set of DNA sequences given:
• a specific phylogenetic tree
• an explicit model of nucleotide substitution
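To make "probability of the sequences given a tree and a model" concrete, here is a minimal sketch for the simplest possible case: two aligned sequences joined by a single branch, under the Jukes-Cantor (JC69) nucleotide model. JC69 is used purely for illustration and is not the codon model discussed in these slides; the function names are assumptions.

```python
from math import exp, log


def jc69_log_likelihood(seq_a: str, seq_b: str, d: float) -> float:
    """Log-likelihood of two aligned sequences separated by distance d
    (expected substitutions per site, d > 0) under JC69."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    decay = exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # P(no net change at a site)
    p_diff = 0.25 - 0.25 * decay   # P(change to one specific other base)
    total = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the starting base under JC69
        total += log(0.25 * (p_same if x == y else p_diff))
    return total


def jc69_ml_distance(seq_a: str, seq_b: str) -> float:
    """Closed-form ML estimate of d; valid only if fewer than 75% of sites differ."""
    p = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
    return -0.75 * log(1.0 - 4.0 * p / 3.0)
```

Evaluating `jc69_log_likelihood` over a range of `d` values shows the likelihood peaking at the value returned by `jc69_ml_distance`; ML programs do the same kind of search over many branch lengths and model parameters at once.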

Some details of the model…
• Implemented in the PAML package of Yang (1997)
• Uses a Markov process to describe substitutions between sense codons
• Parameters include:
  - transition/transversion ratio (κ)
  - codon frequencies (π)
  - branch lengths scaled for time (t)
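The parameters above combine into an instantaneous rate for each pair of sense codons. The sketch below follows one widely used parameterization for codon models of this family (rates are zero unless codons differ at a single position, are multiplied by κ for transitions and by ω for nonsynonymous changes, and are proportional to the target codon frequency π_j). It is an illustration under the standard genetic code, not PAML's source code, and the function name is an assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def codon_rate(i: str, j: str, kappa: float, omega: float, pi_j: float) -> float:
    """Relative substitution rate from codon i to codon j (i != j)."""
    if "*" in (CODON_TABLE[i], CODON_TABLE[j]):
        return 0.0  # the process is defined on sense codons only
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:
        return 0.0  # changes at two or three positions are not instantaneous
    a, b = diffs[0]
    rate = pi_j
    if (a in PURINES) == (b in PURINES):
        rate *= kappa  # transition: purine<->purine or pyrimidine<->pyrimidine
    if CODON_TABLE[i] != CODON_TABLE[j]:
        rate *= omega  # nonsynonymous change
    return rate
```

For example, with κ = 2, ω = 0.5 and π_j = 0.1, the synonymous transition TTT to TTC has rate 0.2, while the nonsynonymous transversion TTT to TTA has rate 0.05.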

Testing for positive selection involves comparing two models:
Model M7: Assumes ω ratios follow a beta distribution (i.e., constrained to the interval 0-1).
Model M8: Adds a second class of sites to M7 at which ω ratios can exceed unity (i.e., positive selection).

Statistical testing can be done by likelihood ratio tests (LRTs)
1. Obtain the log-likelihood score from model M7, ℓM7 (null model).
2. Obtain the log-likelihood score from model M8, ℓM8 (positive selection).
3. Test for significance: χ² = 2(ℓM8 - ℓM7) with 1 d.f.
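The three steps above can be sketched in a few lines. The function names are assumptions, and any log-likelihood values passed in would come from the two model fits. The slide specifies 1 d.f.; the sketch also supports 2 d.f., since the M7 versus M8 comparison is often run with 2 d.f. on the grounds that M8 adds two parameters.

```python
from math import erfc, exp, sqrt


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood difference between alternative and null."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return erfc(sqrt(x / 2.0))
    if df == 2:
        return exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")
```

Usage: `chi2_sf(lrt_statistic(lnl_m7, lnl_m8), df=1)` gives the p-value for the test as stated on the slide; passing `df=2` gives the more conservative version.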

Advantages of ML approach
1. Allows for formal statistical testing by likelihood ratio tests.
2. Allows for individual codons subject to positive selection to be identified.
3. Allows for positive selection to be inferred along individual branches of a phylogeny.
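Point 2 is commonly realized by computing, for each codon, the posterior probability that it belongs to each site class of the fitted model. The sketch below shows that Bayes-rule step only; the function name is an assumption, and the class proportions and per-class site likelihoods are taken as given inputs that would come from a fitted site model.

```python
def site_class_posteriors(class_props, site_likelihoods):
    """Posterior probability of each site class for one codon.

    class_props[k]      : estimated proportion of sites in class k
    site_likelihoods[k] : likelihood of this codon's data given class k
    """
    weighted = [p * lk for p, lk in zip(class_props, site_likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]
```

A codon whose posterior probability for the ω > 1 class is high (a cutoff such as 0.95 is a typical but arbitrary choice) would be flagged as a candidate positively selected site.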

Application to the pantophysin gene in marine gadid fishes
• Pantophysin is an integral membrane protein localized to small (<100 nm) cytoplasmic microvesicles
• Believed to function in a variety of intracellular shuttling pathways
• Exact function remains unknown

Transmembrane structure of pantophysin
[Figure: schematic of the pantophysin amino acid sequence, from M-NH2 to T-COOH, threaded through the microvesicle membrane. Labeled regions: lumen of microvesicle, microvesicle membrane, cytoplasm. Labeled domains: intravesicular domains (IV1, IV2), transmembrane (TM) domains, cytoplasmic (Cyt) domains.]

Genealogy of PanI alleles in the Atlantic cod
[Figure: genealogy with two allele classes, PanIA alleles (N = 64) and PanIB alleles (N = 64), each carrying the value 100 on the tree, plus Gadus ogac. Tip labels use the prefixes BA, BS, IC, NS and NF. Scale bar: 1 change.]

Amino acid differences between PanIA and PanIB alleles
[Figure: the transmembrane structure of pantophysin with marked positions. Labels: lumen of microvesicle, IV1, microvesicle membrane, cytoplasm.]

Amino acid substitutions within PanI allelic classes
___________________________________________________________________________________
Allele   Codon      Amino acid   Location   Classification (a)   Distribution
         position   change                                       in sample
PanIA    61         Lys to Gln   IV1        Radical              Fixed
         64         Asn to Thr   IV1        Radical              Fixed
         79         Ser to Thr   IV1        Radical              Fixed
PanIB    43         Glu to Val   IV1        Radical              Fixed
         61         Lys to Asn   IV1        Radical              Fixed
         64         Asn to Asp   IV1        Radical              Fixed
(a) following Taylor (1986)