SDPpred: a method for identification of amino acid residues that determine differences in functional specificity of homologous proteins and application thereof to the MIP family of membrane transporters

SDPpred: a method for identification of amino acid residues that determine differences in functional specificity of homologous proteins and application thereof to the MIP family of membrane transporters
Olga V. Kalinina, Pavel S. Novichkov, Andrey A. Mironov, Mikhail S. Gelfand, Aleksandra B. Rakhmaninova

Large families of proteins: generally similar biochemical function but many different specificities. Example: ~800 transcription factors of the LacI family, with an average sequence identity of 30%, bind different effectors and operators. Some effectors: lactose (LacI); D-fructose-6-phosphate (FruR); guanine, hypoxanthine (PurR); cytidine, adenosine (CytR); trehalose-6-phosphate (TreR); D-gluconate (GntR); D-galactose (GalR); D-ribose (RbsR); maltose (MalR); raffinose (RafR); …; X?? (unknown effectors)

Input: an alignment plus a description of specificity groups (Group A: No. 1-10, 13…; Group B: No. 12, 14-16…; Group C: No. …). SDPpred uses them to find the positions that account for specificity. These positions are then used for assignment of specificity to new proteins and for experiment; the method is tested on families that include proteins with resolved 3D structure.
Alignment (fragment):
Q9KDW      MSPFLGEVIGTMILIILGGGVVAGVVLKGTK
Q8Y6Z1     ----MIDTSLATQFLGEVIGTAILIILGAGVVAGVSLKRSK
Q97JG      MTIFFAELVGTLLLILLGDGVVANVVLKNSK
GLPF_ECOLI MSQT---STLKGQCIAEFLGTGLLIFFGVGCVA--ALKVAG
Q8ZJK5     MSQTA-SSTLKGQCIAEFLGTGLLIFFGAGCVA--ALKLAG
GLPF_HAEIN MDKS-----LKANCIGEFLGTALLIFFGVGCVA--ALKVAG
GLPF_PSEAE MTTAAPTPSLFGQCLAEFLGTALLIFFGTGCVA--ALKVAG
AQPZ_BRUME MLNKLSAEFFGTFWLVFGGCGSAILAA--AFP
Q92NM      MFRKLSVEFLGTFWLVLGGCGSAVLAA--AFP
Q8UJW      MGRKLLAEFFGTFWLVFGGCGSAVFAA--AFP
AQPZ_ECOLI MFRKLAAECFGTFWLVFGGCGSAVLAA--GFP

What are SDPs? (SDP = Specificity Determining Position)
Specificity group = a group of proteins that have the same specificity (from experimental data, genome analysis, etc.)
SDP = an alignment position that is conserved within specificity groups but differs between them
An SDP is not equivalent to a functionally important position! A position conserved across the whole family may be functionally important, but it does not distinguish the groups.
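
To make the definition concrete, here is a minimal Python sketch of its strictest reading: a column counts as an SDP only if every group is fully conserved and each group carries a different residue. The function name `is_ideal_sdp` and the toy columns are hypothetical; SDPpred itself scores columns statistically rather than applying this all-or-nothing test.

```python
from typing import Dict, List


def is_ideal_sdp(column_by_group: Dict[str, List[str]]) -> bool:
    """Strict toy test: conserved within every group, different between groups."""
    consensus = []
    for residues in column_by_group.values():
        if not residues or len(set(residues)) != 1:  # not conserved within the group
            return False
        consensus.append(residues[0])
    # every group must have its own residue at this position
    return len(set(consensus)) == len(consensus)


print(is_ideal_sdp({"AQP": ["F", "F", "F"], "GLP": ["W", "W"]}))  # True
print(is_ideal_sdp({"AQP": ["F", "F", "F"], "GLP": ["F", "F"]}))  # False
print(is_ideal_sdp({"AQP": ["F", "L", "F"], "GLP": ["W", "W"]}))  # False
```

The second column is conserved across the whole family, so it may well be functionally important, yet it is not an SDP.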

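Algorithm
Mutual information $I_p$ reflects the extent to which alignment position $p$ tends to be an SDP:
$$I_p = \sum_{i=1}^{N} \sum_{\alpha} f_p(\alpha, i) \log \frac{f_p(\alpha, i)}{f(i)\, f_p(\alpha)}$$
where $N$ is the number of groups, $f(i)$ is the fraction of proteins in group $i$, $f_p(\alpha, i)$ is the number of occurrences of amino acid $\alpha$ in group $i$ at position $p$ divided by the length of the whole alignment column, and $f_p(\alpha)$ is the frequency of amino acid $\alpha$ in the whole alignment column at position $p$.
Statistical significance of $I_p$: the observed value is compared with the expected mutual information $I_p^{\mathrm{exp}}$ of the alignment column via a Z-score (Mirny & Gelfand, 2002, J Mol Biol, 321(1)):
$$Z_p = \frac{I_p - \langle I_p^{\mathrm{exp}} \rangle}{\sigma\left(I_p^{\mathrm{exp}}\right)}$$
Smoothed amino acid frequencies: a leucine is more of a methionine than a valine, and any arginine has a dash of lysine…
Are 5 SDPs with Z-score > 10.5 better than 10 SDPs with Z-score > 9.0? A Bernoulli estimator is used to select the proper number of SDPs.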
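
The mutual information formula and the Z-score can be sketched in Python as follows. This is a minimal illustration, not SDPpred's implementation: it uses raw rather than smoothed frequencies, treats a gap as an ordinary symbol, and estimates the expected value by randomly permuting the group labels, which is an assumption about how $I_p^{\mathrm{exp}}$ is obtained. The function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Sequence


def mutual_information(column: Sequence[str], groups: Sequence[str]) -> float:
    """I_p = sum_i sum_a f_p(a,i) * log(f_p(a,i) / (f(i) * f_p(a))), raw counts."""
    n = len(column)
    joint = Counter(zip(column, groups))  # occurrences of (amino acid, group)
    aa = Counter(column)                  # occurrences of each amino acid in the column
    grp = Counter(groups)                 # number of proteins in each group
    mi = 0.0
    for (a, g), count in joint.items():
        f_ai = count / n
        mi += f_ai * math.log(f_ai / ((grp[g] / n) * (aa[a] / n)))
    return mi


def z_score(column: Sequence[str], groups: Sequence[str],
            n_shuffles: int = 1000, seed: int = 0) -> float:
    """Z-score of I_p against a null built by shuffling group labels (assumption)."""
    rng = random.Random(seed)
    observed = mutual_information(column, groups)
    labels = list(groups)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        null.append(mutual_information(column, labels))
    mean = sum(null) / len(null)
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / len(null))
    return (observed - mean) / sd if sd > 0 else 0.0


groups = ["AQP"] * 3 + ["GLP"] * 3
print(mutual_information("FFFWWW", groups))  # log 2 ≈ 0.693: perfect group split
print(mutual_information("FFFFFF", groups))  # 0.0: conserved column carries no group information
print(z_score("FFFWWW", groups))
```

A fully conserved column gets zero mutual information and a zero Z-score, matching the point that conservation alone does not make a position an SDP.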

Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci 13(2).
Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucl Acids Res 32(Web Server issue): W424-8.

Web interface
Input: a multiple alignment of proteins divided into specificity groups
=== AQP ===
%sp|Q9L772|AQPZ_BRUME mlnklsaeffgtfwlvfggcgsa ilaa--afp elgigflgvalafgltvltmayavggisg--ghfnpavslgltv iiilgsts slap qlwlfwvaplvgavigaiiwkgllgrd
%sp|P48838|AQPZ_ECOLI mfrklaaecfgtfwlvfggcgsa vlaa--gfp elgigfagvalafgltvltmafavghisg--ghfnpavtiglwa lvihgatd kfap qlwffwvvpivggiiggliyrtllekrd
%tr|Q92ZW mfkklcaeflgtcwlvlggcgsa vlas--afp qvgigllgvsfafgltvltmaytvggisg--ghfnpavslglav iiilgsth rrvp qlwlfwiaplfgaaiagivwksvgeefrpvd
=== GLP ===
%sp|P11244|GLPF_ECOLI msqt---stlkgqciaeflgtglliffgvgcv aalkvag a-sfgqweisviwglgvamaiyltagvsg--ahlnpavtialwl glilaltd dgn g-vpr -flvplfgpivgaivgafayrkligrhlpcdicvveek--etttpseqkasl
%sp|P44826|GLPF_HAEIN mdks-----lkancigeflgtalliffgvgcv …
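
A reader for this input format might look like the sketch below. It assumes, from the example only, that a line `=== NAME ===` opens a specificity group, that a line starting with `%` opens an entry whose first token is the identifier, and that the rest of that line plus any following lines form the aligned sequence; the real server may accept other variants. `parse_grouped_alignment` is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

# Assumed group header, e.g. "=== AQP ==="
GROUP_RE = re.compile(r"^===\s*(.+?)\s*===$")


def parse_grouped_alignment(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Return {group name: [(sequence id, aligned sequence), ...]}."""
    raw: Dict[str, List[Tuple[str, List[str]]]] = {}
    group = None
    entry = None  # (identifier, list of sequence fragments) currently being read
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = GROUP_RE.match(line)
        if header:
            group = header.group(1)
            raw.setdefault(group, [])
            entry = None
        elif line.startswith("%"):
            if group is None:
                raise ValueError("entry found before any '=== group ===' line")
            parts = line[1:].split(maxsplit=1)
            if not parts:
                raise ValueError("entry line without an identifier")
            entry = (parts[0], parts[1:])  # remainder of the header line, if any
            raw[group].append(entry)
        elif entry is not None:
            entry[1].append(line)  # continuation line of the current sequence
    # Join fragments, drop all internal whitespace, normalize to upper case
    return {
        g: [(ident, "".join("".join(frags).split()).upper()) for ident, frags in entries]
        for g, entries in raw.items()
    }


example = """=== AQP ===
%sp|P48838|AQPZ_ECOLI mfrklaaecfgtfwlvfggcgsa vlaa--gfp
=== GLP ===
%sp|P11244|GLPF_ECOLI
msqt---stlkgqciaeflgtglliffgvgcv
"""
print(parse_grouped_alignment(example))
```

Stripping whitespace inside sequences makes the parser tolerant of wrapped lines such as those in the example above.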

Web interface
Output:
–Alignment of the family with the SDPs highlighted (Alignment view)
–Detailed description of each SDP (List of SDPs)
–Plot of probabilities used by the Bernoulli estimator to set the cutoff (Probability plot view)
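
The slides do not spell out the Bernoulli estimator, so the sketch below is only one plausible reading. It assumes that Z-scores of non-SDP columns follow a standard normal distribution, and that the cutoff $k$ is the one minimizing the binomial probability of seeing at least $k$ of the $n$ columns reach $Z_{(k)}$ (the $k$-th highest Z-score) by chance. All function names are hypothetical, and the returned curve (log-probabilities) is only an analogue of the probability plot.

```python
import math
from typing import List, Tuple


def normal_sf(z: float) -> float:
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def log_binom_tail(n: int, k: int, p: float) -> float:
    """log P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0
    terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log1p(-p)
        for j in range(k, n + 1)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def choose_sdp_cutoff(z_scores: List[float]) -> Tuple[int, List[float]]:
    """Return (k, curve): keep the k top-scoring columns, where k minimizes the
    chance of >= k null columns scoring at least Z_(k)."""
    zs = sorted(z_scores, reverse=True)
    n = len(zs)
    curve = [log_binom_tail(n, k, normal_sf(zs[k - 1])) for k in range(1, n + 1)]
    best_k = min(range(1, n + 1), key=lambda k: curve[k - 1])
    return best_k, curve


k, curve = choose_sdp_cutoff([12.0, 11.0, 10.5, 1.0, 0.5, -0.3, 0.2, -1.0, 0.8, -0.5])
print(k)  # 3: the three high-scoring columns are kept
```

Framed this way, the question "5 SDPs with Z > 10.5 or 10 SDPs with Z > 9.0?" becomes a comparison of two chance probabilities rather than of two arbitrary thresholds.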

Examples: the LacI family of bacterial transcription factors
Training set: 459 sequences, average length 338 amino acids, 85 specificity groups; 44 SDPs:
–10 residues contact NPF (an analog of the effector)
–6 residues make up intersubunit contacts
–7 residues contact the operator sequence
–7 residues in the effector contact zone (5 Å < d_min < 10 Å)
–5 residues in the intersubunit contact zone (5 Å < d_min < 10 Å)
–6 residues in the operator contact zone (5 Å < d_min < 10 Å)
[Figure: LacI from E. coli]

Examples: bacterial membrane channels of the MIP family
Training set: 17 sequences, average length 280 amino acids, 2 specificity groups (aquaporins and aquaglyceroporins); 21 SDPs:
–8 residues contact glycerol, the substrate (d_min < 5 Å)
–8 residues are oriented toward the channel
–5 residues make up contacts with other subunits
[Figure: GlpF from E. coli]

Why does the prediction make sense? LacI from E. coli
[Chart: all 348 amino acids and the 44 SDPs, each divided into contacting residues (distance to the DNA, effector, or the other subunit < 5 Å), the contact zone (may be functional), and non-contacting residues (distance > 10 Å).]

Why does the prediction make sense? GlpF from E. coli
[Chart: all 281 amino acids and the 21 SDPs, each divided into contacting residues (distance to the substrate or another subunit < 5 Å), the contact zone (may be functional), and non-contacting residues (distance > 10 Å).]
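
The three distance classes used in these charts can be expressed as a small classifier plus a tally. The boundary handling (exactly 5 Å counted as contact zone, exactly 10 Å as non-contacting) is an assumption, since the slides give only strict inequalities; the function names, residue numbers, and distances in the usage example are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def contact_class(d_min: float) -> str:
    """Bin a residue by its minimal distance (Å) to the ligand, DNA, or other subunit."""
    if d_min < 5.0:
        return "contacting"
    if d_min < 10.0:          # assumption: exactly 5.0 Å falls in the contact zone
        return "contact zone"
    return "non-contacting"   # assumption: exactly 10.0 Å counts as non-contacting


def tally(d_min_by_residue: Dict[int, float], residues: Iterable[int]) -> Counter:
    """Count how many of the given residues fall into each distance class."""
    return Counter(contact_class(d_min_by_residue[r]) for r in residues)


# Made-up minimal distances for six residues, three of them predicted SDPs
d_min = {1: 2.9, 2: 4.1, 3: 7.5, 4: 12.0, 5: 15.3, 6: 22.8}
sdps = [1, 2, 3]
print(tally(d_min, d_min))  # all residues
print(tally(d_min, sdps))   # predicted SDPs only
```

Comparing the two tallies is the point of the charts: if SDPs were placed at random, their class proportions would resemble those of all residues.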

GlpF from E. coli, a membrane channel of the MIP family: SDPs either interact with the substrate or are located on the outer surface of the monomer.
[Figure: structure of the GlpF monomer with the predicted SDPs and glycerol.]

SDPs located on the outer surface of the GlpF monomer form subunit contacts:
–Glu43 from all four subunits
–20Leu, 24Phe, 108Tyr of one subunit and 193Ser of another subunit

SDPs located on the outer surface of the GlpF monomer (continued)
Contacts of Glu43 in subunit I with subunits II and IV:
Subunit I (residue, atom) | Partner subunit (residue, atom) | Distance (Å)
Glu43 OE1 | Ser38 O | 4.8
Glu43 OE2 | Glu43 OE2 | 4.1
Glu43 CG | Trp42 CD1 | 3.7
Glu43 OE2 | Glu43 OE2 | 4.1
Contacts between subunit I and subunit II:
Subunit I (residue, atom) | Subunit II (residue, atom) | Distance (Å)
Leu20 CD2 | Ile158 CD1 | 4.3
Leu20 CD1 | Leu162 CD2 | 4.5
Phe24 CZ | Ile158 CG2 | 3.9
Phe24 CZ | Leu186 CD1 | 3.9
Phe24 CE2 | Val189 CG2 | 3.8
Phe24 CE2 | Ile190 CG1 | 3.7
Phe24 CA | Ser193 CB | 3.9
Phe24 O | Ser193 OG | 4.2
Phe24 O | Ser193 CB | 3.3
Gly27 O | Ser193 O | 3.2
Cys28 CA | Ser193 CA | 3.8
Tyr108 OH | Ser193 O | 2.6
Tyr108 CE1 | Met194 CE | 3.7
Tyr108 CE1 | Leu197 CD1 | 3.9
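
Contact tables like these can be produced by listing all atom pairs from two subunits that lie within a distance cutoff. The sketch below does this by brute force over (residue, atom, coordinates) tuples; the 5 Å cutoff matches the contact definition used in these slides, but the function name and the toy coordinates (chosen only to reproduce two distances from the table) are hypothetical.

```python
import math
from typing import List, Tuple

# (residue label, atom name, (x, y, z) in Å)
Atom = Tuple[str, str, Tuple[float, float, float]]


def interchain_contacts(chain_a: List[Atom], chain_b: List[Atom],
                        cutoff: float = 5.0) -> List[Tuple[str, str, str, str, float]]:
    """All atom pairs (one per chain) closer than the cutoff, nearest first."""
    hits = []
    for res_a, name_a, xyz_a in chain_a:
        for res_b, name_b, xyz_b in chain_b:
            d = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
            if d <= cutoff:
                hits.append((res_a, name_a, res_b, name_b, round(d, 1)))
    return sorted(hits, key=lambda hit: hit[4])


# Toy coordinates, not taken from any structure
subunit_1 = [("Tyr108", "OH", (0.0, 0.0, 0.0)), ("Leu20", "CD2", (10.0, 0.0, 0.0))]
subunit_2 = [("Ser193", "O", (2.6, 0.0, 0.0)), ("Ile158", "CD1", (14.3, 0.0, 0.0))]
for contact in interchain_contacts(subunit_1, subunit_2):
    print(contact)
```

The quadratic loop is fine for a handful of surface residues; whole-structure scans would normally use a spatial index instead.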

SDPs located on the outer surface of the GlpF monomer (continued)
[Figures: structure of contacts in the type A cluster; structure of contacts in the type B cluster.]

Conclusions I. SDPpred: the SDP prediction method
A method for identification of amino acid residues that account for differences in protein functional specificity:
–does not rely on the protein 3D structure;
–automatically determines the number of significant positions;
–considers substitutions according to the chemical properties of the substituted amino acids.
Results agree with available structural and experimental data.
Applicable to any protein family in a standard way.

Conclusions II. SDPs for GlpF from E. coli
–In protein families whose members function as oligomers, predicted SDPs are often localized on the contact surface between subunits.
–5 “surface” SDPs in GlpF: 20Leu, 24Phe, 43Glu, 108Tyr, 193Ser. All of them participate in forming the quaternary structure → evolutionary pressure on amino acids that establish intersubunit contacts correlates with evolutionary pressure on amino acids that account for the correct recognition of the substrate.
–These residues form compact spatial clusters → “structural clasps” for recognition of proper subunits.

Olga V. Kalinina, Pavel S. Novichkov, Andrey A. Mironov, Mikhail S. Gelfand, Aleksandra B. Rakhmaninova
–Department of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia
–Institute for Information Transmission Problems RAS, Moscow, Russia
–State Scientific Center GosNIIGenetika, Moscow, Russia
Acknowledgements
–Leonid A. Mirny
–Olga Laikova
–Vsevolod Makeev
–Roman Sutormin
–Shamil Sunyaev
–Aleksey Finkelstein