How to use computational tools to maximize the coverage of protein sequence/structure/function space

Presentation transcript:

Murray Lab: Nebojsa Mirkovic, Tonya Silkov, Hunjoong Lee, Frank Indiviglio, Janey Li
Honig Lab: Markus Fischer and Donald Petrey

PSI Bottlenecks
1) Not enough connection between modeling and biology/experiment
2) “Modelability” not used in defining families or a dynamic target selection strategy
3) Incomplete use of functional information in model building

Phosphoinositide signaling processes
[Figure: the marked symbol denotes a phosphoinositide headgroup]

Biophysical properties of cellular protein/membrane interactions
Intracellular membranes contain distinct lipid compositions and carry different charge densities.
[Figure: binding behavior of a +8e peptide to membranes carrying different negative charge densities]

Proteins that function in phosphoinositide pathways contain multiple membrane binding motifs

Motif 1      Motif 2      Protein
C1/DAG       C2/Ca2+      Protein kinase C-α, -β, -γ
PH/PIP2      C2/Ca2+      Phospholipase C-δ
PH/PIP2      PX/PI3P      Phospholipase D
FYVE/PI3P    PH/PI        FGD1 (a Rho/Rac GEF)
Basic/PS     PH/PIP2      GPCR kinase
C2/Ca2+      Nonpolar     Cytosolic phospholipase A2
ENTH/PIP2    Prot/prot    Epsin1, AP180
Myristate    Basic/PS     Src, MARCKS, (HIV-1 Gag)

Multiple inputs: Temporal and spatial control of subcellular targeting through coincidence counting

Many peripheral proteins, especially those involved in subcellular targeting, are either highly basic or charge polarized.
[Figure: electrostatic potentials shown at +25 mV and -25 mV]

Quantitative physical theory for the interaction of proteins with membrane surfaces

Connection among biophysical properties, membrane binding behavior, and subcellular localization
[Figure panels: no calcium vs. calcium]
Phospholipase C-δ C2 domains: homology models of all isoforms
5-lipoxygenase C2 domain: homology model

Structural genomics and proteomics-level studies of lipid-interacting domains: Northeast Structural Genomics and Arabidopsis 2010
Apply what we have learned to whole families
Domain families: BAR, C1, C2, ENTH, FERM, FYVE, GRAM, PDZ, PH, PHD, PX, Sec14, START, VHS
High-throughput comparative modeling: Leverage structure information

All lipid-binding domains in all model genomes
Use what we have learned computationally and experimentally to develop:
1. More complete lists of peripheral proteins of known structure from the PDB;
2. Detect and model all instances of peripheral proteins in sequence databases;
3. Discover new instances, novel functionalities, new families;
4. Create databases to house this information;
5. Use this information to annotate protein sequences of unknown function.

SkyLine: High-throughput comparative modeling (Nebojsa Mirkovic, Proteins 66:766)
“Modelability”: Create “reliable” models using known structures as templates

Pipeline stages (from the flowchart): PDB structure; sequence homologues; non-redundant & unsolved sequences; models; model quality
Tools: DSSP, PSI-BLAST, Modeller or Nest, PROSA, pG score, ClustalW
Model quality cutoff: pG > 0.7
Data produced: secondary structure; multiple alignments; modeling alignments; homologous structures; data on homologues (species, IDs, coverage, length, e-value, seq. id.)
Outputs: Leverage (unique models); MarkUs (function annotation); family analysis; specialized databases; web-accessible models database; target reprioritization
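
The model-quality cutoff in the SkyLine flowchart (pG > 0.7) can be illustrated with a short sketch. The `Model` record, its field names, the helper functions, and the example scores are all hypothetical; only the 0.7 threshold comes from the slide, and this is not the SkyLine implementation.

```python
from dataclasses import dataclass

PG_CUTOFF = 0.7  # threshold taken from the slide; everything else is illustrative


@dataclass(frozen=True)
class Model:
    target_id: str    # hypothetical identifier of the modeled sequence
    template_id: str  # hypothetical identifier of the PDB template
    pg: float         # pG reliability score of the model


def reliable_models(models, cutoff=PG_CUTOFF):
    """Keep only models whose pG score is strictly above the cutoff."""
    return [m for m in models if m.pg > cutoff]


def best_model_per_target(models):
    """For each target, keep the single model with the highest pG score."""
    best = {}
    for m in models:
        if m.target_id not in best or m.pg > best[m.target_id].pg:
            best[m.target_id] = m
    return best


# Made-up example data, not results from the presentation.
EXAMPLE = [
    Model("seqA", "tmpl1", 0.95),
    Model("seqA", "tmpl2", 0.72),
    Model("seqB", "tmpl1", 0.40),
    Model("seqC", "tmpl2", 0.70),
]

kept = reliable_models(EXAMPLE)
best = best_model_per_target(kept)
```

With this example data, both models of `seqA` pass the cutoff and the one built on `tmpl1` is retained as the best; `seqB` and `seqC` are dropped because their scores do not exceed 0.7.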

NESG Models Database (Frank Indiviglio)

Models Database (Hunjoong Lee)
“Leverage”: Number and quality of 3D models produced from a set of structures as templates
PSI1 and PSI2: NESG leverage ~220 sequence-unique models
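
One way to read the “leverage” definition is as a count of sequence-unique targets that receive at least one reliable model from a given set of template structures. A minimal sketch under that reading follows. The tuple layout, the function name `leverage`, and the example values are hypothetical, and the default pG cutoff of 0.7 is borrowed from the SkyLine slide rather than stated here.

```python
def leverage(models, templates, cutoff=0.7):
    """Count distinct target sequences reliably modeled from the given templates.

    models    -- iterable of (template_id, target_sequence, pg) tuples
                 (hypothetical layout)
    templates -- set of template identifiers whose leverage is being measured
    cutoff    -- pG value a model must exceed to count as reliable
    """
    unique_targets = {
        seq
        for template_id, seq, pg in models
        if template_id in templates and pg > cutoff
    }
    return len(unique_targets)


# Made-up data, not results from the presentation.
models = [
    ("tmpl1", "MKTAYIAK", 0.91),
    ("tmpl2", "MKTAYIAK", 0.80),  # same sequence again: counted once
    ("tmpl1", "GSHMLEDP", 0.88),
    ("tmpl2", "AAPLVWQR", 0.35),  # below cutoff: not counted
    ("tmpl3", "TTQEDLNK", 0.99),  # template outside the set: not counted
]

n = leverage(models, {"tmpl1", "tmpl2"})
```

Collecting sequences in a set is what makes the count sequence-unique: a target modeled from several templates contributes only once.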

Alternative models based on different PDB templates, reliability measures and sequence coverage

Additional search mechanisms: Expand methodology to the entire PDB, create specialized family and genome databases

C2 domains from phospholipase C isoforms: Comparative functionality
Kd values: 2.3 x 10^-9 M; 2.6 x 10^-9 M

C2 domains from phospholipase C isoforms: Comparative functionality
Kd values: 8.9 x 10^-8 M → 6.2 x 10^-9 M; 4.0 x 10^-8 M

Differences between δ1 and δ4: Detection of specificity determinants leads to hypotheses for differential regulation
Kd values: 2.3 x 10^-9 M; 8.9 x 10^-8 M → 6.2 x 10^-9 M
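
The Kd values on these slides can be compared as fold changes or, equivalently, as differences in binding free energy through the standard relation ΔG = RT ln Kd. The sketch below applies that relation to the two values joined by the arrow. The temperature of 298.15 K is an assumption, since the slides do not state the measurement conditions, and the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change(kd_from, kd_to):
    """Ratio of two dissociation constants; > 1 means binding got tighter."""
    return kd_from / kd_to


def delta_delta_g(kd_from, kd_to, temperature=298.15):
    """Change in binding free energy (kcal/mol) going from kd_from to kd_to.

    Uses dG = RT ln(Kd); a negative result means tighter binding.
    The default temperature is an assumption, not a value from the slides.
    """
    return R_KCAL * temperature * math.log(kd_to / kd_from)


ratio = fold_change(8.9e-8, 6.2e-9)  # Kd values taken from the slide
ddg = delta_delta_g(8.9e-8, 6.2e-9)
print(f"fold change: {ratio:.1f}, ddG: {ddg:.2f} kcal/mol")
```

Under the assumed temperature, the shift from 8.9 x 10^-8 M to 6.2 x 10^-9 M is roughly a 14-fold tightening of binding.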

Whole family modeling: FYVE domains
FYVE domain family: Electrostatic properties of models correlate with in vitro binding measurements and subcellular localization
Comparison of different members

FYVE domain family: Electrostatic properties of models correlate with in vitro binding measurements and subcellular localization
Residue substitution of a single family member

[Diagram: Model/Computation, Experiment, Structure]
There is no straightforward prescription: each family has to be dealt with individually.
“Modelability”: Create “reliable” models using known structures as templates.
Dynamic target re-prioritization is an important strategy.

START domain leverage
Modelability (7378) versus 30% sequence identity (2767)
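
This slide contrasts two membership criteria: a sequence-identity cutoff and the ability to build a reliable model. A sketch of how the two counts might be tallied for one family follows. The candidate records, their field layout, and the example numbers are hypothetical; the 30% identity cutoff comes from this slide and the pG > 0.7 cutoff from the SkyLine slide.

```python
def count_members(candidates, identity_cutoff=30.0, pg_cutoff=0.7):
    """Tally family members under two alternative criteria.

    candidates -- list of (sequence_id, percent_identity, pg) tuples
                  (hypothetical layout: identity to the closest template,
                  pG score of the best model)
    Returns (n_by_identity, n_by_modelability).
    """
    by_identity = sum(1 for _, ident, _ in candidates if ident >= identity_cutoff)
    by_model = sum(1 for _, _, pg in candidates if pg > pg_cutoff)
    return by_identity, by_model


# Made-up candidates: a remote homologue can model well despite low identity.
candidates = [
    ("s1", 62.0, 0.99),
    ("s2", 31.0, 0.85),
    ("s3", 18.0, 0.92),
    ("s4", 12.0, 0.30),
]
by_identity, by_model = count_members(candidates)

# Ratio of the two counts reported on the slide for START domains.
slide_ratio = 7378 / 2767
```

For the START domain numbers on the slide, the modelability criterion yields about 2.7 times as many sequences as the 30% identity criterion.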

Detailed computational analysis and function annotation
Characterize different START domains based on structural information
Discriminate whether START domains bind cholesterol or PC (PI) or other ligands
Provide leads for chemical library studies for function-interfering compounds
Fine-grain structure analysis in the absence and presence of potential ligand
Experimental characterization: Protein production, SPR analysis, cellular studies

Collaborations with Experimental Groups
Cho Lab: High-throughput analysis of Human and Arabidopsis START domains
Clark Lab: Docking studies of ubiquinone into nematode START domain, electron transport

START domains in the Arabidopsis thaliana genome
SkyLine produces quality models for 58 non-redundant sequences versus 35 Arabidopsis START domains detected by sequence searches (Genome Biology 5:R41)

Key Findings (Tonya Silkov)
1. 45 sequences are of the Birch antigen class
2. Two sequences correspond to AHA1 domains (Activator of Hsp90 ATPase); SCOP classifies AHA domains as belonging to the Birch antigen superfamily
3. Two sequences are predicted in databases as integral membrane proteins of unknown function
4. Five sequences for related models apparently represent a group of uncharacterized plant START domains

Cross-genomic studies
Structure similarity among lipid-binding domains (Tonya Silkov)
[Fig. 1 panels: ENTH domain, ANTH domain, VHS domain; PIP2 labeled]

ENTH and ANTH: similar topology, different membrane binding mechanism
[Figure labels: Helix 0, ANTH, ENTH]
J Biol Chem. 278:28993, with Cho Lab

Arabidopsis domain with novel dual ENTH and ANTH functionality (Tonya Silkov)
[Figure labels: Helix 0; view from above; ENTH, ANTH]
Cho Lab: First 25 amino acids are required for both PIP2 binding and membrane penetration.
Produce enough protein to obtain crystals.

A novel functional subclass of VHS domains (Tonya Silkov)
[Fig. 1 panels: ENTH domain, ANTH domain, VHS domain]

A new VHS-related family, “VR domains”, found in other genomes (Tonya Silkov)
KIAA1530 (Homo sapiens)
XP_ (Strongylocentrotus purpuratus)
CAB71110 (Arabidopsis thaliana)
XP_ (Gallus gallus)

VR domain family of membrane-binding VHS domains (Tonya Silkov)
Among this subset of VHS domains, the basic surface patch is conserved.
Hypothesis: It constitutes a phosphoinositide-specific binding site.
Human and Arabidopsis constructs are being examined in the Cho lab.

The ability to construct a quality model of a sequence is a more strategic definition of a protein family member.
Allows for the discovery of distantly related members.
With function annotation, allows for the discovery of new sub-groups.
Structures + Sequences -> Models + Function annotation (Markus)
More comprehensive coverage of protein sequence/structure/function space.
By constantly updating resources as new information becomes available, we produce a more relevant (dynamic) target selection strategy.