A network-based representation of protein fold space. Spencer Bliven, Qualifying Examination, 6/6/2011.



Overview: 1. Background & Motivation; 2. Preliminary Research; 3. Proposed Future Research.

Fold Space. What protein folds are possible? Discrete or continuous? Both? Neither? What portion of fold space is utilized by nature? Long-debated questions. Why? Understanding of the structure-function relationship; protein design/engineering; protein evolution; classification.

Previous Work. Orengo, Flores, Taylor, Thornton. Protein Eng (1993) vol. 6 (5) pp; Holm and Sander. J Mol Biol (1993) vol. 233 (1) pp; Holm and Sander. Science (1996) vol. 273 (5275) pp; Shindyalov and Bourne. Proteins (2000) vol. 38 (3) pp; Hou, Sims, Zhang, Kim. PNAS (2003) vol. 100 (5) pp; Taylor. Curr Opin Struct Biol (2007) vol. 17 (3) pp; Sadreyev et al. Curr Opin Struct Biol (2009) vol. 19 (3) pp. [Figure labels: α, α+β, β, α/β]

Why can we do better? More structures. Sampling of globular folds is "saturated": few novel folds are being discovered, and there are geometric arguments for saturation of small protein folds. Recent all-vs-all computation: sequences clustered to 40% identity; 17,852 representatives (updated weekly); 189 million FATCAT rigid-body alignments. [Growth chart source: entGrowthChart.do?content=total&seqid=100, accessed 5/31/2011]

Structural Similarity Graph. Nodes: PDB chains, non-redundant to 40% sequence identity. Edges: FATCAT-rigid alignments. "Significant" edges: p < 0.001, length > 25, coverage > 50. Hierarchically clustered to reduce complexity in visualization. [Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]
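The edge filter on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (chain_a, chain_b, p_value, length, coverage), the chain identifiers, and the reading of "coverage > 50" as a percentage are all hypothetical, and the slide does not say how the actual pipeline stores or filters its alignments.

```python
from collections import defaultdict

# Thresholds quoted on the slide. The slide gives no unit for coverage;
# it is treated here as a percentage (an assumption).
P_MAX = 0.001
MIN_LENGTH = 25
MIN_COVERAGE = 50


def is_significant(aln):
    """Return True if an alignment record passes all three thresholds."""
    return (aln["p_value"] < P_MAX
            and aln["length"] > MIN_LENGTH
            and aln["coverage"] > MIN_COVERAGE)


def build_graph(alignments):
    """Build an undirected adjacency map from the significant alignments."""
    graph = defaultdict(set)
    for aln in alignments:
        if is_significant(aln):
            a, b = aln["chain_a"], aln["chain_b"]
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Made-up records with a hypothetical layout; "c1", "c2", "c3" are placeholder chains.
alignments = [
    {"chain_a": "c1", "chain_b": "c2", "p_value": 1e-5, "length": 120, "coverage": 80},
    {"chain_a": "c1", "chain_b": "c3", "p_value": 0.2, "length": 40, "coverage": 60},
    {"chain_a": "c2", "chain_b": "c3", "p_value": 1e-4, "length": 20, "coverage": 90},
]
graph = build_graph(alignments)
```

Only the first record survives the filter: the second fails the p-value cutoff and the third is too short. Chains with no significant edge do not appear in the map at all, which may or may not be the desired behavior for visualization.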

Agreement with SCOP. Class: p < 10^-6. Fold: p < 10^-7. Superfamily: p < 10^-10.

Continuity. Grishin. J Struct Biol (2001) vol. 134 (2-3) pp. Skolnick claims ≤ 7 intermediates between any two proteins. We observe a network diameter of 15. Interesting paths can be found.
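For reference, the diameter of an unweighted graph is the longest shortest path between any two nodes, and it can be computed by running a breadth-first search from every node. The sketch below assumes the graph is stored as a dictionary of neighbor sets (a hypothetical representation) and uses a toy path graph; it is not the code used to obtain the value of 15 reported here.

```python
from collections import deque


def eccentricity(graph, source):
    """Longest shortest-path distance (in edges) from source within its component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())


def diameter(graph):
    """Largest eccentricity over all nodes.

    For a disconnected graph this is the largest diameter of any component.
    """
    return max(eccentricity(graph, node) for node in graph)


# Toy path graph a-b-c-d (placeholder node names).
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
toy_diameter = diameter(toy)
```

A shortest path of d edges passes through d - 1 intermediate nodes, so diameters and counts of intermediates differ by one when the two are compared. Running one search per node costs O(V * (V + E)) time, which is worth keeping in mind for a graph with many thousands of chains.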

Symmetry: Beta Propellers. [Figure labels: C4, C5, C6, C7]

Symmetry. Functionally important: protein evolution (e.g. beta-trefoil), DNA binding, allosteric regulation, cooperativity. Widespread (~20% of proteins). Focus of algorithmic work. [Figure labels: FGF-1 (Lee & Blaber. PNAS 2011); TATA Binding Protein, 1TGH; Hemoglobin, 4HHB]

Cross-class example. 3GP6.A: PagP, modifies lipid A; f.4.1 (transmembrane beta-barrel). 1KT6.A: Retinol-binding protein; b.60.1 (Lipocalins).

Summary of Preliminary Research. Calculated all-vs-all alignment (Prlić A, Bliven S, Rose PW, Bluhm WF, Bizon C, Godzik A, Bourne PE. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics (2010) vol. 26 (23) pp). Built network of significant alignments; approximately matches SCOP classifications. Improved structural alignment algorithms: identify symmetry, circular permutations, and topology-independent alignments; discussed more in report.

Future Research. Improve the network: 1. Improve all-vs-all comparison algorithm; 2. Tune parameters during graph generation. Annotate the network & draw biological inferences: 3. Annotate nodes with functional information; 4. Compare with other networks. Create new networks: 5. Enhance structural comparison algorithms.

1. Improve all-vs-all comparison algorithm. Need domain decomposition. Use Combinatorial Extension (CE).

2. Tune parameters during graph generation. Don’t use p-values: shouldn’t compare p-values, statistically*; not normalized by secondary structure; not accurate due to the multiple testing problem. Use TM-score; RMSD, normalized to the alignment length. Determine optimal thresholds for determining “significance”, for instance by training an SVM. * Technically OK here, since one-to-one with the FATCAT score.
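As a reminder of what the TM-score measures, here is a minimal sketch of the standard formula (Zhang & Skolnick), TM = (1/L) * sum over aligned pairs of 1 / (1 + (d_i/d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8. It assumes the per-residue distances d_i come from one fixed superposition; the full TM-score takes the maximum over superpositions, which is not attempted here. The function names are illustrative, and rejecting short chains is a simplification of this sketch.

```python
def d0(target_length):
    """Distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, in angstroms."""
    if target_length <= 15:
        raise ValueError("this sketch requires more than 15 residues")
    scale = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if scale <= 0:
        # Very short chains give a non-positive scale; they are rejected here.
        raise ValueError("target too short for a positive d0")
    return scale


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distance in angstroms between each aligned residue pair.
    target_length: number of residues in the structure used for normalization.
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length


# Toy inputs: a perfect full-length match, and a perfect match over half the target.
perfect = tm_score([0.0] * 100, 100)
half = tm_score([0.0] * 50, 100)
```

Because the sum is divided by the target length rather than the alignment length, a short but precise alignment cannot score highly, which is one reason a length-normalized score is attractive as an edge threshold.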

FATCAT p-value by Class. Performs poorly on all-alpha proteins in the "twilight zone". Terrible on membrane proteins. Probably reflects non-structural considerations in SCOP assignment.

(Dis)agreement with SCOP by Class

3. Annotate nodes with functional information. SCOP/CATH classifications; GO terms; metal binding; ligand binding; symmetry. [Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]

4. Compare with other networks. Define other types of network over the set of protein representatives: protein-protein interactions; co-expression. Correlate to the structural similarities. [Figure labels: Structural similarity; Protein-protein interaction]
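One simple way to quantify how two networks over the same set of proteins relate is the Jaccard index of their edge sets. The slide does not specify a correlation measure, so this is offered only as one possible starting point; the dictionary-of-sets representation and the protein names are hypothetical.

```python
def edge_set(graph):
    """Collect undirected edges as frozensets so (a, b) and (b, a) match."""
    return {frozenset((a, b)) for a, nbrs in graph.items() for b in nbrs}


def jaccard(graph_a, graph_b):
    """Shared edges divided by all edges present in either graph."""
    edges_a, edges_b = edge_set(graph_a), edge_set(graph_b)
    union = edges_a | edges_b
    if not union:
        return 0.0
    return len(edges_a & edges_b) / len(union)


# Toy networks over the same three placeholder proteins.
structural = {"p1": {"p2", "p3"}, "p2": {"p1"}, "p3": {"p1"}}
interaction = {"p1": {"p2"}, "p2": {"p1", "p3"}, "p3": {"p2"}}
overlap = jaccard(structural, interaction)
```

In this toy case the networks share one edge (p1-p2) out of three distinct edges. A raw overlap like this says nothing about significance on its own; comparing it against overlaps from degree-preserving randomized networks would be a natural next step.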

5. Enhance structural comparison algorithms. Improve automated pseudo-symmetry detection. Find topology-independent relationships. [Figure label: C3]

Summary. Fold space as network. Improve network creation. Annotate network with functional information. Improve structural similarity detection.

Remaining Challenges. Short term: hierarchical clustering amplifies errors; bias towards short, helical alignments; better metric of clustering accuracy; correct p-value calculation (remove secondary structure bias), or use TM-score as threshold. Long term: including more functional characteristics (metal ions, GO terms, HDX profiles); use other types of similarity to construct graph.

Acknowledgments. Bourne Lab: Philip Bourne, Andreas Prlić, Lab & PDB members. Qualifying Exam Committee: Ruben Abagyan, Patricia Jennings, Andy McCammon. Collaborators: Philippe Youkharibache, Jean-Pierre Changeux. Rotation Advisors: Pavel Pevzner, Philip Bourne, José Onuchic & Pat Jennings, Mike MacCoss, Virgil Woods.