An analysis of pdb-care (PDB CArbohydrate REsidue check): a program to support annotation of complex carbohydrate structures in PDB files by Thomas Lütteke.


An analysis of pdb-care (PDB CArbohydrate REsidue check): a program to support annotation of complex carbohydrate structures in PDB files, by Thomas Lütteke and Claus-W von der Lieth
By David Chapman

Background
- The Protein Data Bank (PDB) includes 3-D data for carbohydrate structures as well as amino acid structures.
- 3-D data on protein/carbohydrate interactions are obtained by X-ray crystallography and nuclear magnetic resonance (NMR).
  - The absence of 3-D glycan data in a PDB entry does not necessarily mean that a potential glycosylation site is unoccupied.

Background
- The crystallography may have been done on plasmid-expressed proteins, which may not carry the same carbohydrates as the human form.
- Glycosylation usually occurs at asparagine residues in Asn-X-Ser/Thr sequons, where X is any amino acid except proline.
- Approximately 30% of the 1663 carbohydrate-containing PDB entries (Sep 2003) contain errors in the glycan description.
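The sequon rule above can be illustrated with a short pattern search. This is only a sketch: the function name `find_sequons` and the use of one-letter amino acid codes are assumptions made for illustration, and it is not part of pdb-care. A match marks a potential site only; as noted above, it says nothing about whether the site is actually occupied.

```python
import re

# Asn-X-Ser/Thr with X != Pro, written in one-letter codes: N, any residue but P, then S or T.
# A lookahead is used so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return the 1-based positions of Asn residues that start a sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]
```

For example, `find_sequons("MNGSANPTQNNTS")` reports positions 2, 10 and 11, and skips the Asn in "NPT" because proline occupies the X position.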

Biological Significance
- Protein/carbohydrate interactions are important because they are involved in a variety of biological processes:
  - Fertilization
  - Embryonic development
  - Cellular differentiation

Background
- The high error rate in PDB glycan descriptions is mainly due to incorrect assignment of saccharide units.
  - Sequences for complex carbohydrates differ significantly from single-letter amino acid sequences.
  - The number of naturally occurring residues is much larger for carbohydrates.
  - Each pair of monosaccharide residues can be linked in several ways.
  - A residue can be connected to three or four others (branching).

Background
- Unlike amino acids, carbohydrates use three-letter codes, which are defined in the HET dictionary of the PDB.
- A new residue name is required for each stereochemically different sugar unit.
- This makes correct assignment complicated, tedious and error-prone.

Background
- Examples of definitions of carbohydrate residues:
  - AGC: alpha-D-Glucopyranose
  - BGC: beta-D-Glucopyranose
  - FCA: alpha-D-Fucose
  - FCB: beta-D-Fucose
- More than 200 carbohydrate residues are used in the PDB.
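To show where these three-letter codes appear in practice, the sketch below scans the HETATM records of a PDB file and looks each residue name up in a small table. The table holds only the four examples from the slide, and the names `KNOWN_CARBOHYDRATES` and `carbohydrate_residues` are hypothetical. The column positions follow the fixed-width PDB format (residue name in columns 18-20, chain in column 22, residue number in columns 23-26).

```python
# Only the four codes listed on the slide; the real HET dictionary is far larger.
KNOWN_CARBOHYDRATES = {
    "AGC": "alpha-D-Glucopyranose",
    "BGC": "beta-D-Glucopyranose",
    "FCA": "alpha-D-Fucose",
    "FCB": "beta-D-Fucose",
}

def carbohydrate_residues(pdb_lines):
    """Map (chain, residue number, code) to the residue definition for known codes."""
    found = {}
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue  # standard amino acids are stored in ATOM records
        code = line[17:20].strip()        # columns 18-20: residue name
        if code in KNOWN_CARBOHYDRATES:
            chain = line[21]              # column 22: chain identifier
            number = line[22:26].strip()  # columns 23-26: residue sequence number
            found[(chain, number, code)] = KNOWN_CARBOHYDRATES[code]
    return found
```

A lookup like this only reports what the depositor declared. It cannot tell whether the declared code matches the deposited coordinates, which is the check pdb-care adds.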

Implementation
- pdb-care is based on the pdb2linucs carbohydrate detection program.
  - pdb2linucs is able to identify and assign carbohydrate structures using only the reported atom types and their 3-D coordinates.
  - The program output is in LINUCS notation and is used to normalize complex carbohydrate structures.
- pdb-care uses a translation table built in XML to compare the LINUCS notation from pdb2linucs with the residue assignments in the PDB group dictionary.
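The comparison step can be pictured roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the presentation does not describe the real translation table, so the XML layout, the element and attribute names, and the functions `load_table` and `check_residue` are all invented for illustration. Plain residue names stand in for the LINUCS strings that pdb2linucs would produce, and pdb-care itself is written in C rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical table layout; the actual pdb-care XML schema is not given in the source.
TABLE_XML = """
<translation_table>
  <residue pdb="AGC" name="alpha-D-Glucopyranose"/>
  <residue pdb="BGC" name="beta-D-Glucopyranose"/>
  <residue pdb="FCA" name="alpha-D-Fucose"/>
  <residue pdb="FCB" name="beta-D-Fucose"/>
</translation_table>
"""

def load_table(xml_text):
    """Build a dict from PDB residue code to the residue name it should denote."""
    root = ET.fromstring(xml_text.strip())
    return {entry.get("pdb"): entry.get("name") for entry in root.iter("residue")}

def check_residue(table, assigned_code, detected_name):
    """Compare the code assigned in the PDB file with the residue detected from coordinates."""
    expected = table.get(assigned_code)
    if expected is None:
        return "unknown residue code"
    if expected != detected_name:
        return "mismatch: code denotes " + expected + ", coordinates suggest " + detected_name
    return "ok"
```

With this table, a residue labelled `AGC` whose coordinates are detected as beta-D-Glucopyranose would be reported as a mismatch, which is the kind of incorrect saccharide assignment described earlier.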

Implementation
- The translation table contains:
  - 141 monosaccharides
  - 31 oligosaccharides
  - 77 combined residues
- pdb-care was written in the C language.
- The front end is a web interface implemented in PHP.

Implementation
- The pdb-care web interface accepts input in three ways: pasting the contents of a PDB file directly, selecting a file on a local hard drive, or entering a PDB ID.
- The pdb-care protocol reports the types of problems, inconsistencies and errors detected.

Program Example
- pdb-care examples

Conclusion
- The authors made relevant points regarding the biological significance of protein-carbohydrate interactions and the need for accurate glycan residue information in the PDB.
- However, the authors did not go into detail regarding the actual implementation of the translation table used in pdb-care, so it is difficult to judge the accuracy of their program.