Repetitive Beta Folds: Form, Function, and Properties


Overview of Presentation
Introduction to the traditional beta-helix fold
– Structural and functional properties
– Structure prediction by the BetaWrap program
Introduction to trimeric viral attachment fibers
– Structural comparison to traditional beta-helices
– Current computational approaches

Basic Parallel Beta-Helix
The beta-helix is an all-beta fold
Three strands per rung
Right-handed beta-helix (RHBH)
Also left-handed beta-helices (LHBH)
Function: sugar cleavage
Mainly occurs in bacterial pathogens
King lab studies the RHBH tailspike
Three faces form a prism

Analyzing Beta-Helices
Solved structures
– RHBH: 5 SCOP superfamilies
– LHBH: 2 SCOP superfamilies
– 48 solved structures in the PDB
– 8 HSSP representatives
Predicting novel beta-helices
– Homology modeling, threading, and HMMs fail to predict the fold's occurrence in cross-validation
BetaWrap (King, Berger et al. 2001) successfully predicts RHBH

Lessons from BetaWrap
Joint sequence-structure analysis is important
– Discovered a conserved hairpin turn
– Discovered internally packed asparagines
Beta-strand packing interactions are important
– BetaWrap's energy function uses strand-to-strand packing probabilities
Prediction is not enough
– BetaWrap does not predict the active site, etc.
– Other methods (rotamer libraries, etc.) may supplement the initial prediction
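One ingredient of a BetaWrap-style energy function, scoring how well residues on vertically adjacent strands stack, can be sketched as a sum of log-odds terms. This is a minimal illustration under stated assumptions, not BetaWrap's implementation: the pair probabilities in `PAIR_PROB`, the background value, and the floor for unseen pairs are made-up placeholders, and the two strands are assumed to be already placed in register.

```python
import math

# Hypothetical stacking probabilities for residue pairs on adjacent parallel
# strands. These are NOT BetaWrap's published values; illustration only.
PAIR_PROB = {
    ("V", "V"): 0.08, ("I", "V"): 0.07, ("N", "N"): 0.06,
    ("L", "I"): 0.05, ("G", "S"): 0.01,
}
BACKGROUND = 0.0025  # assumed background pair probability (1/400)
FLOOR = 1e-4  # assumed probability for pairs missing from the table


def pair_prob(a: str, b: str) -> float:
    """Symmetric lookup of a pair probability, falling back to FLOOR."""
    return PAIR_PROB.get((a, b), PAIR_PROB.get((b, a), FLOOR))


def stacking_score(strand_i: str, strand_j: str) -> float:
    """Sum of log-odds over residues stacked position by position.

    Assumes position k of strand_i sits directly above position k of strand_j.
    """
    if len(strand_i) != len(strand_j):
        raise ValueError("strands must be the same length once registered")
    return sum(
        math.log(pair_prob(a, b) / BACKGROUND) for a, b in zip(strand_i, strand_j)
    )


print(stacking_score("VNI", "VNV") > 0, stacking_score("VNI", "GSG") < 0)  # True True
```

Under these toy values a well-packed pair of strands scores positive and a poorly packed pair scores negative; a real scorer would also account for turns and rung geometry.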

Trimeric Viral Attachment Fiber Proteins

Trimeric Viral Attachment Fibers
King's interest in the beta-helix led to interest in two new folds:
1. Triple beta-helix (TBH)
2. Triple beta-spiral (TBS)
These two folds are our current research area
– Consist of three identical interacting chains
– TBH is structurally similar to the beta-helix
– TBS is structurally distinct
Both folds are characterized by unusual stability to heat, protease, and detergent

Triple Beta-Helix
Described by van Raaij et al. in JMB (2001)
Homotrimeric (consists of three identical chains)
Two solved structures
– Portion of the T4 short tail fibre, SwissProt: P10390
– Cell-puncturing device of T4, SwissProt: P16009

Triple Beta-Spiral
Described by van Raaij et al. in Nature (1999)
Homotrimeric (three identical chains)
Two solved structures
– Human Adenovirus 2 fibre, SwissProt: P03275
– Reovirus attachment fibre, SwissProt: P03528
Characterized in the literature by a regular repeat pattern

Preliminary Analysis
TBS is more regular than TBH
– TBS is characterized by a sequence repeat
– Standard regular-expression techniques (like PROSITE patterns) can find many putative TBSs (see supplemental slides)
TBH has so far defied basic characterization
– Only two solved structures
– Its quasi-repeat is less regular than the TBS repeat
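A PROSITE-style pattern translates directly into an ordinary regular expression, which is one way to run the kind of repeat search described above. The sketch below handles the common PROSITE elements (`x`, `[...]`, `{...}`, `e(n)`/`e(n,m)` counts, `<`/`>` anchors) but not every corner of the syntax, such as `>` inside brackets. The example pattern and toy sequences are hypothetical, and the default threshold of seven occurrences simply mirrors the "more than 6" figure in the supplemental slides; the actual adenovirus and reovirus repeat patterns are not given here.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supports x, [..], {..}, repeat counts e(n) / e(n,m), and the < / > anchors.
    Does not handle '>' inside a bracketed class.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        count = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, count = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."  # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element  # single residue or [..] class: already valid regex
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


def count_repeats(seq: str, pattern: str) -> int:
    """Count non-overlapping matches of a PROSITE-style pattern in seq."""
    return len(re.findall(prosite_to_regex(pattern), seq))


def putative_repeat_proteins(
    seqs: dict[str, str], pattern: str, min_count: int = 7
) -> list[str]:
    """Names of sequences containing at least min_count copies of the pattern."""
    return [name for name, seq in seqs.items() if count_repeats(seq, pattern) >= min_count]


# Hypothetical pattern and toy sequences, for illustration only.
toy = {"repeat_rich": "LAAG" * 7, "repeat_poor": "LAAG" * 3}
print(putative_repeat_proteins(toy, "[LIV]-x(2)-G"))  # ['repeat_rich']
```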

Current Research
What are we trying to do with TBH and TBS?
– There are too few solved structures to build a rigorous prediction tool
– Right now we are just "characterizing" them
– Searching for sequence-structure patterns
– Searching for unique properties
– Searching for repetitive sequence motifs
Regular expressions are a first attempt
Search with a PSSM sequence profile
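A PSSM search of the kind listed above can be sketched in three steps: count residues in each column of an ungapped alignment of repeat instances, convert the counts to log-odds against a background, and slide the matrix along a query sequence. The uniform 1/20 background, the flat pseudocount, and the toy repeats in the usage line are simplifying assumptions, not values from this project.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_repeats: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds (bits) PSSM from equal-length, gap-free repeat instances.

    Assumes a uniform background and adds the same pseudocount to every residue.
    Non-standard residues (e.g. 'X') are ignored when counting.
    """
    width = len(aligned_repeats[0])
    if any(len(r) != width for r in aligned_repeats):
        raise ValueError("repeat instances must be aligned to equal length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for repeat in aligned_repeats:
            if repeat[col] in counts:
                counts[repeat[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS})
    return pssm


def scan(seq: str, pssm: list[dict[str, float]]) -> list[tuple[int, float]]:
    """Score every window of seq as (start, score); unknown residues add 0."""
    width = len(pssm)
    return [
        (start, sum(pssm[i].get(seq[start + i], 0.0) for i in range(width)))
        for start in range(len(seq) - width + 1)
    ]


# Toy repeat instances (hypothetical), then report the best-scoring window.
pssm = build_pssm(["LAG", "LSG", "IAG"])
best_start, best_score = max(scan("PPLAGPP", pssm), key=lambda hit: hit[1])
print(best_start)  # 2
```

Restricting hits to sequences with many high-scoring windows, rather than taking only the best one, would be the natural next step for repeat detection.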

Repetitive Sequence Motif Search
Existing methods for repetitive motif search
– RADAR (Heger & Holm) and others attempt this
– Existing methods do not find the adenovirus repeat
– The TBH repeat is not regular enough to search for
Our approaches (tried so far…)
– Basic regular expression (more in supplemental slides)
– PSSM characterizing the repeat (in progress)
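A crude first look at internal periodicity, far simpler than what RADAR does, is to measure how often a residue is identical to the residue k positions downstream; a peak at offset k suggests a repeat of period k. This sketch uses exact identity only, which is exactly why it would miss degenerate repeats such as the TBH quasi-repeat; the toy sequence is hypothetical.

```python
def offset_identity(seq: str, max_offset: int) -> dict[int, float]:
    """Fraction of positions i with seq[i] == seq[i + k], for k = 1..max_offset."""
    profile = {}
    for k in range(1, max_offset + 1):
        pairs = len(seq) - k
        if pairs <= 0:
            break  # offset too large for this sequence
        matches = sum(seq[i] == seq[i + k] for i in range(pairs))
        profile[k] = matches / pairs
    return profile


def best_period(seq: str, max_offset: int) -> int:
    """Smallest offset with the highest self-identity."""
    profile = offset_identity(seq, max_offset)
    if not profile:
        raise ValueError("sequence too short for any offset")
    return max(profile, key=profile.get)


print(best_period("LSGKTV" * 5, 10))  # 6
```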

Thank you Peter Weigele and Eben Scanlon

Supplemental Slides

TBS Information
– Human Adenovirus 2 fiber and Reovirus attachment protein σ1 share 27% sequence identity and 52% sequence similarity
– Searching SwissProt for the adenovirus repeat (regex) pattern, requiring more than 6 occurrences, finds 3158 matches
– Searching SwissProt for the reovirus repeat (regex) pattern finds matches
– PDB IDs: 1KKE, 1QIU
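Percent identity and similarity figures like these depend on how they are computed. A minimal sketch, assuming a pairwise alignment is already in hand: count identical and "similar" residue pairs over columns where neither sequence has a gap. The residue groups in `SIMILAR_GROUPS` are a hypothetical simplification; many tools instead count a pair as similar when its substitution-matrix score (for example BLOSUM62) is positive, and tools also differ in which length they divide by.

```python
# Hypothetical grouping of physicochemically similar residues.
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def is_similar(a: str, b: str) -> bool:
    """True if the residues are identical or fall in the same group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_similarity(aln_a: str, aln_b: str) -> tuple[float, float]:
    """Percent identity and similarity over gap-free columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(a == b for a, b in columns)
    similar = sum(is_similar(a, b) for a, b in columns)
    return 100 * identical / len(columns), 100 * similar / len(columns)


print(identity_similarity("ILKD-", "VLRDA"))  # (50.0, 100.0)
```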

TBH Information
– The T4 short tail fiber TBH and the T4 cell-puncturing TBH share 32% sequence identity and 61% sequence similarity
– There is no clear repeat pattern in TBH
– Tried PSSM and HMM models with alignments derived from known repeat strands in TBH
  – Have not yet found a way to restrict to matches with a large number of recurrent repeats
  – May also want to add a high non-affine gap penalty beyond a certain gap length
– PDB IDs: 1H6W, 1K28
– Need to use PQS (http://pqs.ebi.ac.uk) to get the trimer image
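The idea of a high non-affine gap penalty beyond a certain length can be written as a gap-cost function that is affine for short gaps and rises steeply past a cutoff, discouraging alignments that bridge long insertions between repeat strands. All parameter values below are hypothetical. Because the cost is not affine overall, it does not drop unchanged into Gotoh's standard affine-gap recurrence; a dynamic-programming aligner that accepts an arbitrary gap function (at higher computational cost) can use it as-is.

```python
def gap_penalty(
    length: int,
    open_cost: float = 10.0,
    extend_cost: float = 1.0,
    cap_length: int = 5,
    steep_cost: float = 20.0,
) -> float:
    """Cost of a gap of `length` residues (all default values are hypothetical).

    Affine (open + extend per extra residue) up to cap_length residues;
    every residue beyond cap_length adds steep_cost instead of extend_cost.
    """
    if length <= 0:
        return 0.0
    affine_part = open_cost + extend_cost * (min(length, cap_length) - 1)
    excess = max(0, length - cap_length)
    return affine_part + steep_cost * excess


print([gap_penalty(n) for n in (1, 5, 7)])  # [10.0, 14.0, 54.0]
```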