1 HW Clarifications Homology implies shared ancestry Partial sequence identity does not necessarily imply homology A high coverage of sequence identity.

Presentation transcript:

1 HW Clarifications: Identity and Homology
• Homology implies shared ancestry
• Partial sequence identity does not necessarily imply homology
• A high coverage of sequence identity can imply homology

2 HW Clarifications: Insertions and Deletions

3 Prediction of functional/structural sites in a protein using conservation and hyper-variation (ConSeq, ConSurf, Selecton)

4 Empirical findings of conservation variation among sites: functional/structural sites evolve slower than nonfunctional/nonstructural sites

5 Conservation = functional/structural importance

6 Histone 3 protein

7 Alignment of pre-pro-insulin

Xenopus MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos     MALWTRLRPLLALLALWPPPPARAFVNQHL
        **** :  * *.*: *:..* :. : ****

Xenopus CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos     CGSHLVEALYLVCGERGFFYTPKARREVEG
        **************:***** ** :*::*

Xenopus AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos     PQVG---ALELAGGPGAGGLEGPPQKRGIV
        .**.     ** *       *    *****

Xenopus EQCCHSTCSLFQLENYCN
Bos     EQCCASVCSLYQLENYCN
        **** *.***:*******
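
As a rough illustration of how sequence identity is read off an alignment like this one, the sketch below counts identical columns between two aligned sequences. The function name `percent_identity` and the choice to skip columns that contain a gap are assumptions made for this example; the slides do not specify how identity was computed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identical, compared, percent) for two aligned sequences.

    Columns in which either sequence has a gap are skipped
    (an assumption of this sketch; other conventions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / compared if compared else 0.0
    return identical, compared, percent


# Last block of the pre-pro-insulin alignment on this slide.
xenopus = "EQCCHSTCSLFQLENYCN"
bos = "EQCCASVCSLYQLENYCN"
identical, compared, percent = percent_identity(xenopus, bos)
print(identical, compared, round(percent, 1))  # 15 18 83.3
```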

8

9

10 Conservation based inference
• Conserved sites:
  • Important for the function or structure
  • Not allowed to mutate
  • "Slow evolving" sites: low rate of evolution
• Variable sites:
  • Less important (usually)
  • Change more easily
  • "Fast evolving" sites: high rate of evolution

11 Detecting conservation: evolutionary rates
Rate = distance / time
Distance (d) = number of substitutions per site
Time = 2 × number of years (doubled because the two sequences evolved independently since their common ancestor)
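
The rate definition on this slide can be written as a one-line calculation. The sketch below is only an illustration: the function name `substitution_rate` and the example numbers (0.1 substitutions per site, 80 million years since divergence) are made up and do not come from the slides.

```python
def substitution_rate(distance, divergence_years):
    """Rate = distance / time, with time = 2 * years since divergence.

    distance: number of substitutions per site between the two sequences.
    divergence_years: years since the two lineages split.
    """
    if divergence_years <= 0:
        raise ValueError("divergence_years must be positive")
    return distance / (2.0 * divergence_years)


# Hypothetical example values, chosen only for illustration.
rate = substitution_rate(distance=0.1, divergence_years=80e6)
print(f"{rate:.3e} substitutions per site per year")  # 6.250e-10 ...
```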

12 Rate computation
Inputs: MSA, phylogeny, evolutionary model

MSA:
Human          DMAAHAM
Chimp          DEAAGGC
Cow            DQAAWAP
Fish           DLAACAL
S. cerevisiae  DDGAFAA
S. pombe       DDGALGE
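
To get a feel for which columns of this toy MSA are conserved, the sketch below counts the distinct residues in each column. This is a deliberately crude stand-in and not the rate computation described on the slide, which also uses the phylogeny and an evolutionary model; the function name `distinct_residues_per_column` is an assumption of this example.

```python
# Toy MSA from the slide.
msa = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def distinct_residues_per_column(sequences):
    """Count distinct residues per alignment column (crude variability proxy)."""
    seqs = list(sequences)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) iterates over columns of the alignment.
    return [len(set(column)) for column in zip(*seqs)]


print(distinct_residues_per_column(msa.values()))  # [1, 5, 2, 1, 6, 2, 6]
```

Columns 1 and 4 are invariant in this toy example, while columns 5 and 7 differ in every sequence.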

13 Site-specific rate computation tool

14 Locating the active site of pyruvate kinase (glycolysis pathway)

15

16

17

18 Conservation scores:
• The scores are standardized: the average score of all residues is 0, and the standard deviation is 1
• Negative values: slowly evolving (= low evolutionary rate), conserved sites. The most conserved site in the protein has the lowest score
• Positive values: rapidly evolving (= high evolutionary rate), variable sites. The most variable site in the protein has the highest score
Scores are relative to the protein and cannot be compared between different proteins!
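
The standardization described on this slide is an ordinary z-score. The sketch below shows the idea on made-up raw rates; the function name `standardize`, the example values, and the use of the population standard deviation are all assumptions of this example rather than details taken from the slides.

```python
from statistics import mean, pstdev


def standardize(rates):
    """Convert raw per-site rates to scores with mean 0 and standard deviation 1."""
    mu = mean(rates)
    sigma = pstdev(rates)  # population standard deviation (assumed here)
    if sigma == 0:
        raise ValueError("all rates are identical; scores are undefined")
    return [(r - mu) / sigma for r in rates]


# Hypothetical raw evolutionary rates for seven sites.
raw_rates = [0.2, 1.5, 0.4, 0.1, 2.3, 0.5, 2.0]
scores = standardize(raw_rates)

most_conserved = scores.index(min(scores))  # site with the lowest score
most_variable = scores.index(max(scores))   # site with the highest score
print(most_conserved, most_variable)  # 3 4
```

Because the mean and standard deviation are computed per protein, the same score can correspond to very different raw rates in two proteins, which is why scores are not comparable between proteins.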

19

20 SWISS-PROT

21 Combining protein structure
• Each protein has a particular 3D structure that determines its function
• Protein structure is better conserved than protein sequence and more closely related to function
• Analyzing a protein structure is more informative than analyzing its sequence for function inference

22 Conservation in the structure
• Protein core: structurally constrained, usually conserved
• Active site: functionally constrained, usually conserved
• Surface: tolerant to mutations, usually variable

23 Same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein

24 The structure-function of the potassium channel (figure labels: transmembrane region, cytoplasm)

25

26

27

28

29 ConSeq/ConSurf user intervention (advanced options)
1. Choosing the method for calculating the amino-acid conservation scores: (Bayesian/Maximum Likelihood)
2. Entering your own MSA file
3. Performing the MSA using: (MUSCLE/CLUSTALW)
4. Collecting the homologs from: (SWISS-PROT/UniProt)
5. Max. number of homologs: (50)
6. No. of PSI-BLAST iterations: (1)
7. PSI-BLAST E-value cutoff: (0.001)
8. Model of substitution for proteins: (JTT/Dayhoff/mtREV/cpREV/WAG)
9. Entering your own PDB file
10. Entering your own TREE file

30 Codon-level selection
• ConSeq/ConSurf:
  • Compute the evolutionary rate of amino-acid sites → the data are amino acids
  • Compute only the rate of non-synonymous substitutions
UUU → UUC (Phe → Phe): synonymous
UUU → CUU (Phe → Leu): non-synonymous
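
The two codon examples on this slide can be checked mechanically by translating both codons and comparing the amino acids. The sketch below assumes the standard genetic code; the names `CODON_TABLE`, `translate_codon` and `classify_substitution` are introduced only for this illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Translate one DNA or RNA codon to a one-letter amino acid ('*' = stop)."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_substitution(before, after):
    """Label a codon change as synonymous or non-synonymous."""
    same = translate_codon(before) == translate_codon(after)
    return "synonymous" if same else "non-synonymous"


print(classify_substitution("UUU", "UUC"))  # synonymous (Phe -> Phe)
print(classify_substitution("UUU", "CUU"))  # non-synonymous (Phe -> Leu)
```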

31 Synonymous vs. non-synonymous substitutions
For most proteins, the rate of synonymous substitutions is much higher than the non-synonymous rate. This is called purifying selection (= conservation in ConSeq/ConSurf).

32 Synonymous vs. non-synonymous substitutions
There are rare cases where the non-synonymous rate is much higher than the synonymous rate. This is called positive (Darwinian) selection.

33 Positive Selection
The hypothesis: promotes the fitness of the organism
Examples:
• Pathogen proteins evading the host immune system
• Proteins of the immune system detecting pathogen proteins
• Pathogen proteins that are drug targets
• Proteins that are products of gene duplication
• Proteins involved in the reproductive system

34 Computing synonymous and non-synonymous rates
Inputs: codon MSA, phylogeny, evolutionary model

35 Inferring positive selection
Look at the ratio between the non-synonymous rate (Ka) and the synonymous rate (Ks)

36 Inferring positive selection
Ka/Ks < 1: purifying selection
Ka/Ks > 1: positive selection
Ka/Ks = 1: no selection (neutral)
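
The three cases on this slide map directly onto a small decision function. This is a minimal sketch: the function name `classify_selection` and the optional `tolerance` around 1 are assumptions of this example, and in practice the decision rests on a statistical test rather than on the raw ratio.

```python
def classify_selection(ka, ks, tolerance=0.0):
    """Classify the selection regime from Ka (non-synonymous) and Ks (synonymous).

    tolerance: hypothetical half-width around 1 treated as neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "no selection (neutral)"
    if ratio > 1.0:
        return "positive selection"
    return "purifying selection"


print(classify_selection(ka=0.02, ks=0.2))  # purifying selection
print(classify_selection(ka=0.3, ks=0.1))   # positive selection
print(classify_selection(ka=0.1, ks=0.1))   # no selection (neutral)
```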

37
• Our evolutionary model assumes there is positive selection in the data
• By chance alone we expect our model to find a few sites with Ka/Ks > 1
• Is this really indicative of positive selection or plain randomness? Maybe there's no positive selection after all?
(Inputs: codon MSA, phylogeny, evolutionary model)

38 Solution: statistically compare between hypotheses
• H0: There's no positive selection
• H1: There is positive selection
• H0: compute the probability (likelihood) of the data using a model that does not account for positive selection
• H1: compute the probability (likelihood) of the data using a model that does account for positive selection
• Perform a statistical test to accept or reject H0 (likelihood ratio test)
• P-value > 0.05: accept H0 (that is, H0 cannot be rejected)
• P-value < 0.05: reject H0
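
A likelihood ratio test of this kind can be sketched in a few lines. The example assumes the usual setup in which twice the log-likelihood difference is compared against a chi-square distribution; the degrees of freedom depend on the pair of models being compared and are not given on the slide, so `df` is left as a parameter. The log-likelihood values are invented for illustration, and the function names are assumptions of this sketch.

```python
import math


def chi2_survival(x, df):
    """P(chi-square with df degrees of freedom > x), closed form for df 1 or 2."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(lnl_h0, lnl_h1, df, alpha=0.05):
    """Return (statistic, p_value, reject_h0) for nested models H0 and H1."""
    # H1 contains H0, so its log-likelihood should not be lower;
    # clamp at zero to absorb small numerical noise.
    statistic = max(0.0, 2.0 * (lnl_h1 - lnl_h0))
    p_value = chi2_survival(statistic, df)
    return statistic, p_value, p_value < alpha


# Hypothetical log-likelihoods, not taken from any real analysis.
statistic, p_value, reject_h0 = likelihood_ratio_test(-1234.5, -1229.0, df=2)
print(statistic, reject_h0)  # 11.0 True
```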

39 Note: saturation of synonymous substitutions
• Human and wheat are too evolutionarily remote: synonymous substitutions are saturated
• Pick closer sequences for positive selection analysis
(Figure labels: Syn., Nonsyn.)

40

41 Selecton input: codon-level sequences!
• Coding sequences: only ORFs
• No stop codons
• If an MSA is provided it must be codon aligned (RevTrans)
• The user must provide the sequences: no PSI-BLAST option
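
Before submitting sequences, the input requirements above can be checked with a short script. The sketch below is not part of Selecton; the function name `check_coding_sequence`, the messages it returns, and the use of the standard stop codons are assumptions of this example.

```python
# Stop codons of the standard genetic code (assumed here).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_coding_sequence(seq):
    """Return a list of problems found in an unaligned coding sequence."""
    problems = []
    dna = seq.upper().replace("U", "T")
    if set(dna) - set("ACGT"):
        problems.append("contains characters other than A, C, G, T/U")
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    stops = [i + 1 for i, codon in enumerate(codons) if codon in STOP_CODONS]
    if stops:
        problems.append(f"stop codon at codon position(s) {stops}")
    return problems


print(check_coding_sequence("ATGGCCAAA"))  # []
print(check_coding_sequence("ATGTAAGCC"))  # ['stop codon at codon position(s) [2]']
```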

42 Positive selection in the primate TRIM5α

43 Primate TRIM5α
• TRIM5α from humans, rhesus monkeys, and African green monkeys are all unable to restrict retroviruses isolated from their own species, yet are able to restrict retroviruses from the other species
• TRIM5α is an important natural barrier to cross-species retrovirus transmission
• TRIM5α is in an antagonistic conflict with the retroviral capsid proteins
• TRIM5α is under positive selection

44 Positive selection analysis

45 Positive selection analysis in Selecton: H0 and H1

46 Comparing H0 and H1 in Selecton

47 Comparing H0 and H1 in Selecton

48

49 Selecton results:

50

51 Results
Human/rhesus swaps at sites 332, (SPRY) significantly elevate human resistance to HIV and rhesus resistance to SIV