Published by Sharon Sparks. Modified over 9 years ago.
1
Dictionaries and Ontologies in Structural Biology
2
Scope of Ontology
  PDB Exchange Dictionary
  Meta Data
    Experimental information
    Molecular description
    Structural description
  Coordinates
    Macromolecule
    Ligands
    Solvent
3
History of Project
  1990  mmCIF project begins
  1992  NDB serves as testbed
  1998  PDB adopts mmCIF as core data representation
  2001  PDB Exchange Dictionary incorporates X-ray, NMR and cryoEM
  2003  Direct translation of mmCIF data and dictionaries into XML (PDBML)
4
Challenges in Creating an Ontology
  Appropriate coverage and level of detail
  Acquiring and organizing expert input
  Getting consensus
  Evolution with the science
  Creating a rigorous syntax that can be translated (e.g. mmCIF -> XML)
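The last challenge, a syntax rigorous enough to be translated mechanically, can be illustrated with a small sketch. It assumes that a category has already been parsed into rows of item/value pairs and turns them into nested XML elements. The function name, the element naming scheme and the sample row are assumptions made for illustration; they do not reproduce the actual PDBML schema.

```python
import xml.etree.ElementTree as ET


def category_to_xml(category, rows):
    """Translate one category (a table) into an XML element tree.

    category: category name, e.g. "em_detector"
    rows:     list of dicts mapping item (column) names to string values
    The "<category>Category" wrapper name is a hypothetical convention.
    """
    parent = ET.Element(f"{category}Category")
    for row in rows:
        record = ET.SubElement(parent, category)
        for item, value in row.items():
            ET.SubElement(record, item).text = value
    return parent


# Made-up sample row; only the 'type' value comes from the slides.
sample_rows = [{"entry_id": "1", "type": "GATAN 673"}]
xml_text = ET.tostring(category_to_xml("em_detector", sample_rows),
                       encoding="unicode")
print(xml_text)
```

Because every data item belongs to exactly one category, a table-by-table walk like this is enough to carry the data across formats without hand-written rules per item.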
5
mmCIF (PDB Exchange) is an Ontology
  Relationships among data items are explicit
6
Features of Dictionary
  Data Items
    Definitions
    Examples
    Data types
    Ranges or enumerations
  Simple organization
    Tables and columns (categories)
    Related data item sets (subcategories)
    Chapters (category groups)
  Associations
    Parent-child relationships
    Interdependencies/exclusivity
  Methods
7
Dictionary Definition Example

save__em_detector.type
    _item_description.description
;   The detector type used for recording images.
    Usually film or CCD camera.
;
    _item.name                  '_em_detector.type'
    _item.category_id           em_detector
    _item.mandatory_code        no
    _item_type.code             line
    loop_
    _item_enumeration.value
        'KODAK SO163 FILM'
        'GATAN 673'
        'GATAN 676'
        'TVIPS TEMCAM F224'
        'TVIPS FASTSCAN F114'
        PROSCAN
        AMT
save_

Slide callouts: Controlled vocabulary, Data type, Schema, Semantics
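To show how such a definition can drive validation, here is a minimal sketch that checks a data value against the definition above. The dictionary layout and the function `check_item` are hypothetical and are not the API of CIFPARSE or any other tool named in this presentation. The sketch assumes the `line` type code means a single-line string, and it treats `?` and `.` as the usual CIF placeholders for unknown and inapplicable values.

```python
# Hypothetical in-memory form of the _em_detector.type definition.
EM_DETECTOR_TYPE = {
    "name": "_em_detector.type",
    "category_id": "em_detector",
    "mandatory": False,          # _item.mandatory_code no
    "type_code": "line",         # _item_type.code line
    "enumeration": {
        "KODAK SO163 FILM",
        "GATAN 673",
        "GATAN 676",
        "TVIPS TEMCAM F224",
        "TVIPS FASTSCAN F114",
        "PROSCAN",
        "AMT",
    },
}


def check_item(definition, value):
    """Return a list of problems for one data value (empty list if none)."""
    name = definition["name"]
    # '?' and '.' are CIF placeholders for unknown / inapplicable values.
    if value is None or value in ("?", "."):
        if definition["mandatory"]:
            return [f"{name} is mandatory but has no value"]
        return []
    problems = []
    # Assumption: a 'line' value must not span several lines.
    if definition["type_code"] == "line" and "\n" in value:
        problems.append(f"{name} must fit on a single line")
    allowed = definition["enumeration"]
    if allowed and value not in allowed:
        problems.append(f"{name} value {value!r} is not in the controlled vocabulary")
    return problems


print(check_item(EM_DETECTOR_TYPE, "GATAN 673"))   # accepted value
print(check_item(EM_DETECTOR_TYPE, "POLAROID"))    # not in the enumeration
```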
8
Dictionary Definition Example

save__struct_biol.id
    _item_description.description
;   The value of _struct_biol.id must uniquely identify a record in
    the STRUCT_BIOL list.

    Note that this item need not be a number; it can be any unique
    identifier.
;
    _item.name                  '_struct_biol.id'
    _item.category_id           struct_biol
    _item.mandatory_code        yes
    _item_type.code             line
    loop_
    _item_linked.child_name
    _item_linked.parent_name
        '_struct_biol_gen.biol_id'        '_struct_biol.id'
        '_struct_biol_keywords.biol_id'   '_struct_biol.id'
        '_struct_biol_view.biol_id'       '_struct_biol.id'
        '_struct_ref.biol_id'             '_struct_biol.id'
save_

Slide callouts: Parent-child (foreign key) relationships, Data type, Schema, Semantics
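The parent-child links in this definition behave like foreign keys, so they can be checked mechanically. The sketch below assumes the data have been loaded into a plain mapping from category name to a list of row dictionaries; that layout and the helper names are hypothetical, chosen only to illustrate the check.

```python
# Child -> parent links copied from the _item_linked loop above.
ITEM_LINKS = {
    "_struct_biol_gen.biol_id": "_struct_biol.id",
    "_struct_biol_keywords.biol_id": "_struct_biol.id",
    "_struct_biol_view.biol_id": "_struct_biol.id",
    "_struct_ref.biol_id": "_struct_biol.id",
}


def split_item(name):
    """Split '_category.attribute' into ('category', 'attribute')."""
    category, _, attribute = name.lstrip("_").partition(".")
    return category, attribute


def find_orphans(data, links=ITEM_LINKS):
    """Return (child item, value) pairs whose value has no parent record.

    data: dict mapping category name -> list of row dicts (hypothetical layout).
    """
    orphans = []
    for child, parent in links.items():
        child_cat, child_attr = split_item(child)
        parent_cat, parent_attr = split_item(parent)
        parent_keys = {row.get(parent_attr) for row in data.get(parent_cat, [])}
        for row in data.get(child_cat, []):
            value = row.get(child_attr)
            # Skip missing values and the CIF placeholders '?' and '.'.
            if value in (None, "?", "."):
                continue
            if value not in parent_keys:
                orphans.append((child, value))
    return orphans


# Made-up example data: the second child row points at a missing parent.
example = {
    "struct_biol": [{"id": "1"}],
    "struct_biol_gen": [{"biol_id": "1"}, {"biol_id": "2"}],
}
print(find_orphans(example))
```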
9
Molecular Description
  Macromolecular sequence
  Macromolecular source
  Detailed chemical descriptions of monomers
  Detailed chemical descriptions of ligands and solvent
10
Molecular Hierarchy (diagram)
  Labels: Molecular Description, Macromolecular Polymer Sequence, Biological Source, Non-polymer Chemical Details, Molecular Component Dictionary
11
Structural Description
  Coordinates of the experimental subunit
  Symmetry operations required to build functional assemblies
  Structural annotation
    Secondary structure
    Hydrogen bonding classification
    Base pairs and base pair steps
    Backbone torsions and base morphology
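As an illustration of how symmetry operations turn an experimental subunit into a functional assembly, the sketch below applies x' = R x + t to each atom for a list of operations. The function names, the coordinates and the two-fold axis are made up for the example; how the operations are actually stored in the dictionary is not shown on the slide.

```python
def apply_operation(coords, rotation, translation):
    """Apply x' = R x + t to every (x, y, z) coordinate.

    rotation:    3x3 matrix as a list of three rows
    translation: sequence of three numbers
    """
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return moved


def build_assembly(subunit, operations):
    """Return one transformed copy of the subunit per (rotation, translation)."""
    return [apply_operation(subunit, rot, trans) for rot, trans in operations]


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
TWO_FOLD_Z = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]  # 180 degree rotation about z

# Made-up single-atom subunit, used only to exercise the functions.
subunit = [(1.0, 2.0, 3.0)]
assembly = build_assembly(subunit, [(IDENTITY, (0, 0, 0)),
                                    (TWO_FOLD_Z, (0, 0, 0))])
print(assembly)
```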
12
Structural Hierarchy (diagram)
  Labels: Molecular Description, Atomic Coordinates, Experimental Subunits, Functional Units, Secondary Structure, Hydrogen Bonding, Base Pairs, Base Pair Steps, Backbone Torsions, Base Morphology
13
Connection between Molecular and Structure Descriptions
  Macromolecular sequences are explicitly aligned to experimentally determined chemical sequences
  Monomers, ligands and solvent are matched with chemical descriptions in the PDB molecular components dictionary
  Diagram labels: Molecular Description, Structural Description
14
Relationships with other Resources
  Sequence database correspondences
    Domain/family annotation
    Functional annotation (GO/EC/OMIM)
  Structural database correspondences
    SCOP/CATH/RNAML structural classifications
    Functional annotation
  Citation and related literature
15
Supporting Software Tools
  Dictionaries, Data Files and Databases
  Validating parsers for files and dictionaries (CIFPARSE)
  Dictionary access and presentation tools (CIFOBJ)
  File format translation tools (MAXIT, CIFTr)
  PDB Validation Suite
  Data acquisition and editor tool (ADIT)
  Database builder and loader (mmCIFLOADER)
  XML translation tool
  Data extraction and merging tools (PDB_EXTRACT)
16
Availability
  http://sw-tools.pdb.org/
  WWW and CDROM distribution
  Source and binary distributions
  Open source license
  Supported on Linux, IRIX, ALPHA, SUNOS, and Mac OS X
17
Structure Related Data Dictionaries
  DDL2
  mmCIF
  RNAML
  Ligand data
  NMR
  Cryo-EM
  Modeling
  Crystallization
  Symmetry
  Image data
  BIOSYNc
  Protein Production
18
Access
  RCSB Protein Data Bank Site           http://www.pdb.org/
  RCSB/PDB Beta Data Site               http://pdbbeta.rcsb.org/
  RCSB/PDB Dictionary Resource Site     http://mmcif.pdb.org/
  RCSB/PDB Deposition Site              http://deposit.pdb.org/
  PDBML Site                            http://pdbml.pdb.org/
  RCSB/PDB Software Download Site       http://sw-tools.pdb.org/