Detecting selection using phylogeny

Presentation transcript:

1 Detecting selection using phylogeny

2 Evaluation of prediction methods
• Comparing our results to experimentally verified sites

Our prediction gives:           Positive (hit)                  Negative
Prediction is correct (True):   True-positive                   True-negative
Prediction is wrong (False):    False-positive (false alarm)    False-negative (miss)

3 Calibrating the method
• All methods have a parameter (cutoff) that can be calibrated to improve the accuracy of the method.
• For example: the E-value cutoff in BLAST

4 Calibrating E-value cutoff
Is this a homolog?

Our prediction gives:           Positive (hit)                                 Negative
Prediction is correct (True):   True-positive (real homolog)                   True-negative (real non-homolog)
Prediction is wrong (False):    False-positive (false alarm: not a homolog)    False-negative (missed a homolog)

5 Calibrating the E-value
• What will happen if we raise the E-value cutoff (for instance, work with all hits with an E-value < 10)?

6 Calibrating the E-value
• On the other hand, what will happen if we lower the E-value cutoff (look only at hits with an E-value below a stricter cutoff)?

7 Improving prediction
• Trade-off between specificity and sensitivity

8 Sensitivity vs. specificity
• Sensitivity: how well we hit real homologs.
  $$\text{Sensitivity} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}}$$
  The denominator represents all the proteins which are really homologous.
• Specificity: how well we avoid real non-homologs.
  $$\text{Specificity} = \frac{\text{True negative}}{\text{True negative} + \text{False positive}}$$
  The denominator represents all the proteins which are really NOT homologous.
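The two ratios on this slide can be written as a minimal Python sketch. The function names and the counts below are hypothetical, chosen only for illustration; they are not from the lecture.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real homologs that were reported as hits."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of real non-homologs that were correctly rejected."""
    return tn / (tn + fp)


# Hypothetical counts, for illustration only.
tp, fn, tn, fp = 40, 10, 900, 50
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```

Both functions assume a non-zero denominator, i.e. the verified set contains at least one real homolog and at least one real non-homolog.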

9
• Raising the E-value cutoff to 10: sensitivity increases, specificity decreases
• Lowering the E-value cutoff: sensitivity decreases, specificity increases
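The trade-off can be made concrete with a small sketch. The hit list and the strict cutoff of 1e-4 are invented for illustration (the slide's own strict cutoff is not given), and `evaluate` is a hypothetical helper, not part of BLAST.

```python
# Hypothetical search results: (E-value, is the hit a real homolog?).
hits = [
    (1e-50, True), (1e-20, True), (1e-5, True), (0.5, True),
    (1e-3, False), (2.0, False), (8.0, False), (50.0, False),
]


def evaluate(hits, cutoff):
    """Return (sensitivity, specificity) when hits with E-value < cutoff count as positive."""
    tp = sum(1 for e, real in hits if e < cutoff and real)
    fn = sum(1 for e, real in hits if e >= cutoff and real)
    fp = sum(1 for e, real in hits if e < cutoff and not real)
    tn = sum(1 for e, real in hits if e >= cutoff and not real)
    return tp / (tp + fn), tn / (tn + fp)


# A permissive cutoff (10) versus an assumed strict cutoff (1e-4).
for cutoff in (10.0, 1e-4):
    sens, spec = evaluate(hits, cutoff)
    print(f"cutoff {cutoff:g}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

On this toy data the permissive cutoff recovers every real homolog but admits most non-homologs, while the strict cutoff rejects every non-homolog at the price of missing one real homolog.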

10 Functional prediction in proteins (purifying and positive selection)

11 Darwin – the theory of natural selection
• Adaptive evolution: Favorable traits will become more frequent in the population

12 Adaptive evolution
• When natural selection favors a single allele, the allele frequency continuously shifts in one direction

13 Kimura – the theory of neutral evolution
• Neutral evolution: Most molecular changes do not change the phenotype
• Selection operates to preserve a trait (no change)

14 Purifying Selection
• Stabilizes a trait in a population:
  Small babies → more illness
  Large babies → more difficult birth
  …
• Baby weight is stabilized around 3-4 kg

15 Purifying selection (conservation) - the molecular level
• Histone 3

16 Synonymous vs. non-synonymous substitutions Purifying selection: excess of synonymous substitutions

17 Synonymous vs. non-synonymous substitutions
Purifying selection: excess of synonymous substitutions
Synonymous substitution: GUU → GUC (Val → Val)
Non-synonymous substitution: GUU → GCU (Val → Ala)
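A minimal sketch of how the two kinds of substitution can be told apart, assuming the standard genetic code. `classify_substitution` is a hypothetical helper name; the table is built from the conventional U, C, A, G codon ordering.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(before: str, after: str) -> str:
    """Label a codon change as synonymous or non-synonymous."""
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "non-synonymous"


# The two examples from the slide.
for before, after in (("GUU", "GUC"), ("GUU", "GCU")):
    print(before, "->", after, ":", classify_substitution(before, after))
```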

18 Conservation as a means of predicting function
Infer the rate of evolution at each site
Low rate of evolution → constraints on the site to prevent disruption of function: active sites, protein-protein interactions, etc.

19 Conservation as a means of predicting function

Human          DMAAHAM
Chimp          DEAAGGC
Cow            DQAAWAP
Fish           DLAACAL
S. cerevisiae  DDGAFAA
S. pombe       DDGALGE

20 Which site is more conserved?
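One naive way to approach the question is to score each alignment column by the fraction of sequences that share its most common residue. The sketch below does only that; `column_identity` is a hypothetical helper, and this is not how ConSurf scores sites, since it ignores the phylogeny.

```python
from collections import Counter

# The toy alignment from the slide.
alignment = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def column_identity(seqs):
    """For each column, the fraction of sequences carrying the most common residue."""
    scores = []
    for column in zip(*seqs):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


scores = column_identity(list(alignment.values()))
for site, score in enumerate(scores, start=1):
    print(f"site {site}: {score:.2f}")
```

Note that sites 3 and 6 receive the same score under this measure (four A and two G each), which is why counting residues alone cannot rank them.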

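21 Use Phylogenetic information

Human          DMAAHAM
Chimp          DEAAGGC
Cow            DQAAWAP
Fish           DLAACAL
S. cerevisiae  DDGAFAA
S. pombe       DDGALGE

[Figure: residues of two alignment sites mapped onto the species in the order listed above: A G A A A G (site 6) and A A A A G G (site 3)]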
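A sketch of why the tree matters: sites 3 and 6 both contain four A and two G, yet they require a different minimum number of changes on a tree. The code uses Fitch parsimony as a simple stand-in for rate estimation. The tree topology is an assumption (the usual species relationships), and the lecture's tools estimate rates with more elaborate methods.

```python
# The toy alignment from the slide.
alignment = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}

# Assumed topology: vertebrates nested as (((Human, Chimp), Cow), Fish), yeasts as a sister pair.
TREE = (((("Human", "Chimp"), "Cow"), "Fish"), ("S. cerevisiae", "S. pombe"))


def fitch(node, states):
    """Return (candidate ancestral residues, minimum number of changes) for a subtree."""
    if isinstance(node, str):  # leaf: the observed residue, no changes yet
        return {states[node]}, 0
    left_set, left_changes = fitch(node[0], states)
    right_set, right_changes = fitch(node[1], states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def min_changes(site):
    """Minimum number of substitutions at a 1-based alignment site."""
    states = {name: seq[site - 1] for name, seq in alignment.items()}
    return fitch(TREE, states)[1]


for site in (3, 6):
    print(f"site {site}: at least {min_changes(site)} change(s)")
```

Under this assumed tree, site 3 needs a single change (on the branch separating yeasts from vertebrates), whereas site 6 needs two independent changes, so site 3 is the more conserved of the two.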

22 Prediction of conserved residues by estimating evolutionary rates at each site ConSurf/ConSeq web servers:

23 Working process
1. Input a protein with a known 3D structure (PDB ID or file provided by the user)
2. Find homologous protein sequences (PSI-BLAST)
3. Perform multiple sequence alignment (removing duplicates)
4. Construct an evolutionary tree
5. Calculate the conservation score for each site
6. Project the results on the 3D structure

24 The KcsA potassium channel
• An outstanding mystery: how does the KcsA potassium channel conduct only K+ ions and not Na+?

25 The KcsA potassium channel structure
• The structure of the KcsA channel was solved in 1998
• KcsA is a homotetramer with a four-fold symmetry axis about its pore.

26 The KcsA potassium selectivity filter
• The selectivity filter identifies water molecules bound to K+
• When water is bound to Na+: no passage

27 Conservation analysis of KcsA
• Use ConSurf to study KcsA conservation

28 ConSurf results

29 ConSeq
• ConSeq performs the same analysis as ConSurf but displays the results on the sequence.
• Predicts whether each residue is buried or exposed
• exposed & conserved → functionally important site
• buried & conserved → structurally important site

30 ConSeq analysis
Exposed & conserved → functionally important site
Buried & conserved → structurally important site
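The rule on this slide amounts to a two-way lookup, sketched below. `classify_site` is a hypothetical helper and does not reflect ConSeq's actual interface; how "conserved" and "exposed" are decided (score thresholds, burial prediction) is left outside the sketch.

```python
def classify_site(conserved: bool, exposed: bool) -> str:
    """Apply the slide's rule to a single residue."""
    if not conserved:
        return "no prediction"  # the rule only speaks about conserved sites
    if exposed:
        return "functionally important site"
    return "structurally important site"


print(classify_site(conserved=True, exposed=True))
print(classify_site(conserved=True, exposed=False))
```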

31 Positive selection & drug resistance

32 Darwin – the theory of natural selection
• Adaptive evolution: Favorable traits will become more frequent in the population

33 Adaptive evolution on the molecular level

34 Adaptive evolution on the molecular level Look for changes which confer an advantage

35 Naïve detection
• Observe multiple sequence alignment: variable regions = adaptive evolution??

36 Naïve detection
• The problem: how do we know which sites are simply sites with no selection pressure (“non-important” sites) and which are under adaptive evolution?

37 Solution – look at the DNA
synonymous / non-synonymous

38 Solution – look at the DNA
Purifying selection: Syn > Non-syn
Adaptive evolution = Positive selection: Non-syn > Syn
Neutral evolution: Syn = Non-syn

39 Also known as … Ka/Ks (or dN/dS, or ω)
Ka = non-synonymous substitution rate; Ks = synonymous substitution rate
• Purifying selection: Ka < Ks (Ka/Ks < 1)
• Neutral evolution: Ka = Ks (Ka/Ks = 1)
• Positive selection: Ka > Ks (Ka/Ks > 1)
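The three cases can be sketched as a small function. `classify_selection` and its `tolerance` parameter are assumptions made for illustration: estimated rates are never exactly equal, so some band around 1 is treated as neutral here. Real analyses use a statistical test rather than a fixed band.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05):
    """Return (omega, label) for given non-synonymous (ka) and synonymous (ks) rates."""
    if ks == 0:
        raise ValueError("Ks is zero: the Ka/Ks ratio is undefined")
    omega = ka / ks
    if abs(omega - 1.0) <= tolerance:  # assumed neutrality band, not a statistical test
        return omega, "neutral evolution"
    if omega < 1.0:
        return omega, "purifying selection"
    return omega, "positive selection"


# Hypothetical rate estimates, for illustration only.
for ka, ks in ((0.1, 0.5), (0.3, 0.3), (0.6, 0.2)):
    omega, label = classify_selection(ka, ks)
    print(f"Ka={ka}, Ks={ks}: omega={omega:.2f} -> {label}")
```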

40 Examples for positive selection
• Proteins involved in immune system
• Proteins involved in host-pathogen interaction ‘arms-race’
• Proteins following gene duplication
• Proteins involved in reproduction systems

41 Selecton – a server for the detection of purifying and positive selection

42 Detecting drug resistance using Selecton

43 HIV: molecular evolution paradigm
Rapidly evolving virus:
1. High mutation rate (low fidelity of reverse transcriptase)
2. High replication rate

44 HIV Protease Protease is an essential enzyme for viral replication Drugs against Protease are always part of the “cocktail”

45 Ritonavir Inhibitor
• Ritonavir (RTV) is a specific protease inhibitor (drug)
C37H48N6O5S2

46 Drug resistance
[Figure panels: No drug; Drug]
Adaptive evolution (positive selection)

47 Used Selecton to analyse HIV-1 protease gene sequences from patients that were treated with RTV only

48

49 Example: HIV Protease
• Primary mutations
• Secondary mutations
• Novel predictions (experimental validation)

50 Summary
• Sequence analysis can provide valuable information about protein function
• Conservation on the amino acid level
• Positive “Darwinian” selection and purifying selection