Mathematical Challenges in Protein Motif Recognition
Bonnie Berger, MIT


Approaches to Structural Motif Recognition
- Alignments
- Multiple alignments & HMMs
- Threading
- Profile methods (1D, 3D)
- Statistical methods *

Structural Motif Recognition
1) Collect a database of positive examples of a motif (e.g., coiled coil, beta helix).
2) Devise a method to determine if an unknown sequence folds as the motif or not.
3) Verification in lab.

Our Coiled-Coil Programs
- PairCoil [Berger, Wilson, Wolf, Tonchev, Milla, Kim, 1995]: predicts 2-stranded CCs
- MultiCoil [Wolf, Kim, Berger, 1997]: predicts 3-stranded CCs
- LearnCoil-Histidine Kinase [Singh, Berger, Kim, Berger, Cochran, 1998]: predicts CCs in histidine kinase linker domains
- LearnCoil-VMF [Singh, Berger, Kim, 1999]: predicts CCs in viral membrane fusion proteins

Long Distance Correlations
In beta structures, amino acids close in the folded 3D structure may be far away in the linear sequence.

Biological Importance of Beta Helices
- Surface proteins in human infectious disease:
  - virulence factors (plants, too)
  - adhesins
  - toxins
  - allergens
- Amyloid fibrils (e.g., Alzheimer’s, Creutzfeldt-Jakob (mad cow) disease)
- Potential new materials

What is Known
- Solved beta-helix structures: 12 structures in PDB in 7 different SCOP families
- Related work:
  - 1D profile of pectate lyase (Heffron et al. '98)
  - HMM (e.g., HMMER)
  - Threading (e.g., 3D-PSSM)

Key Databases
- Solved structures: Protein Data Bank (PDB) (100’s of non-redundant structures)
- Sequence databases:
  - GenBank (100’s of thousands of protein sequences)
  - SWISS-PROT (10’s of thousands of protein sequences)

BetaWrap Program [Bradley, Cowen, Menke, King, Berger: RECOMB 2001]
Performance:
- On PDB: no false positives & no false negatives.
- Recognizes beta helices in PDB across SCOP families in cross-validation.
- Recognizes many new potential beta helices.
- Runs in linear time (~5 min. on SWISS-PROT).

Histogram of protein scores for:
- beta helices not in database (12 proteins)
- non-beta helices in PDB (1346 proteins)

Single Rung of a Beta Helix

3D Pairwise Correlations
- Stacking residues in adjacent beta-strands exhibit strong correlations.
- Residues in the T2 turn have special correlations (asparagine ladder, aliphatic stacking).
[Figure labels: B1, B2, B3, T2]

Question: how can we find these correlations, which lie a variable distance apart in sequence? [Tailspike: 63-residue turn]

Finding Candidate Wraps
- Assume we have the correct locations of a single T2 turn (fixed B2 & B3).
- Generate the 5 best-scoring candidates for the next rung.
[Figure labels: B2, B3, T2, Candidate Rung]

Scoring Candidate Wraps (rung-to-rung)
Similar to probabilistic framework, plus:
- Pairwise probabilities taken from amphipathic beta (not beta helix) structures in PDB.
- Additional stacking bonuses on internal pairs.
- Incorporates distribution on turn lengths.
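
The rung-to-rung score described above can be illustrated with a minimal sketch. This is not the BetaWrap implementation: the log-odds form, the function names, the `inward` position set, the `bonus` value, and the pseudo-probability `1e-6` for unseen events are all assumptions made for illustration. Only the three ingredients (pairwise probabilities, a stacking bonus on internal pairs, a turn-length distribution) come from the slide.

```python
import math

def pair_log_odds(a, b, pair_prob, background):
    """Log-odds that residues a and b stack, versus occurring independently.

    pair_prob and background are hypothetical lookup tables; a tiny floor
    (an assumption) stands in for pairs never observed in the training set.
    """
    p = pair_prob.get((a, b), pair_prob.get((b, a), 1e-6))
    return math.log(p / (background[a] * background[b]))

def rung_to_rung_score(strand_a, strand_b, turn_len, pair_prob, background,
                       turn_len_prob, inward=(), bonus=0.0):
    """Score two aligned strands from adjacent rungs (illustrative only)."""
    if len(strand_a) != len(strand_b):
        raise ValueError("stacked strands must have equal length")
    score = 0.0
    for i, (a, b) in enumerate(zip(strand_a, strand_b)):
        score += pair_log_odds(a, b, pair_prob, background)
        if i in inward:          # assumed: 'internal pairs' = inward-facing positions
            score += bonus
    # Assumed way of folding in the turn-length distribution: add its log-probability.
    score += math.log(turn_len_prob.get(turn_len, 1e-6))
    return score
```

With toy tables, `rung_to_rung_score("VN", "VN", 3, pair_prob, background, turn_len_prob)` returns the sum of two pair terms plus the turn-length term; real tables would be estimated from the amphipathic beta structures the slide mentions.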

Scoring Candidate Wraps (5 rungs)
Iterate out to 5 rungs, generating candidate wraps.
Score each wrap:
- sum the rung-to-rung scores
- B1 correlations filter
- screen for alpha-helical content
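
The iteration over rungs can be sketched as a small beam search. This is a hedged illustration, not the published algorithm: `next_candidates` and `rung_score` are hypothetical callbacks supplied by the caller, counting the starting rung among the 5 is an assumption, and pruning the whole set of partial wraps back to `beam` at each step is a simplification chosen to keep the search small. The B1 correlations filter and the alpha-helical screen are not modeled here.

```python
import heapq

def best_wraps(start, next_candidates, rung_score, n_rungs=5, beam=5):
    """Extend a starting rung into wraps of n_rungs rungs.

    start            -- position of the known rung (e.g. a T2 turn location)
    next_candidates  -- hypothetical callback: position -> iterable of positions
    rung_score       -- hypothetical callback: (prev, nxt) -> float
    Returns up to `beam` (total_score, [positions]) pairs, best first.
    """
    wraps = [(0.0, [start])]
    for _ in range(n_rungs - 1):
        extended = []
        for total, rungs in wraps:
            prev = rungs[-1]
            scored = [(rung_score(prev, c), c) for c in next_candidates(prev)]
            # Keep the `beam` best-scoring candidates for the next rung.
            for s, c in heapq.nlargest(beam, scored, key=lambda t: t[0]):
                # Wrap score = sum of rung-to-rung scores.
                extended.append((total + s, rungs + [c]))
        wraps = heapq.nlargest(beam, extended, key=lambda w: w[0])
    return wraps
```

For example, with a toy `rung_score` that prefers rungs spaced 22 residues apart, `best_wraps(0, lambda p: range(p + 10, p + 30), lambda a, b: -abs((b - a) - 22))` ranks the evenly spaced wrap first.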

Potential Beta Helices
- Toxins:
  - Vacuolating cytotoxin from the human gastric pathogen H. pylori
  - Toxin B from the enterohemorrhagic E. coli strain O157:H7
- Allergens:
  - Antigen AMB A II, major allergen from A. artemisiifolia (ragweed)
  - Major pollen allergen CRY J II, from C. japonica (Japanese cedar)
- Adhesins:
  - AIDA-I, involved in diffuse adherence of diarrheagenic E. coli
- Other cell surface proteins:
  - Outer membrane protein B from Rickettsia japonica
  - Putative outer membrane protein F from Chlamydia trachomatis
  - Toxin-like outer membrane protein from Helicobacter pylori

The Problem
Given an amino acid residue subsequence, does it fold as a coiled coil? A beta helix?
Very difficult:
- peptide synthesis (1-2 months)
- X-ray crystallization, NMR (>1 year)
- molecular dynamics
Our goal: predict folded structure based on a template of positive examples.

Collaborators
- Math / CS: Mona Singh, Ethan Wolf, Phil Bradley, Lenore Cowen, Matt Menke, David Wilson, Theo Tonchev
- Biologists: Peter S. Kim, Jonathan King, Andrea Cochran, James Berger, Mari Milla