Enabling Rapid Interaction with the Protein Data Bank


Enabling Rapid Interaction with the Protein Data Bank
Alexy Khrabrov, Rutgers University
John D. Westbrook

Goals
- Provide application and database access to macromolecular structure data
- Follow a standards-based approach (OMG MMS finalized 2001)
- Build on the informatics structure of the PDB data ontology
- Provide high-performance access: direct access to compact binary data structures (e.g. coordinates)
- Provide broad granularity of access (individual atoms to biological assemblies)

Program Level Access to the Details of Molecular Structure
- Ligand: Which ligands are contained within the entry?
- Chain/Entity: Extract the sequence and coordinates for each molecular entity.
- Secondary Structure: Extract helices and sheets for the entry.
- Residues/Atoms: What is the environment of this residue? Extract the coordinates for a selection of atoms or residues.
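The residue-environment question above can be illustrated with a simple distance cutoff. This is only a minimal sketch and not part of the API: the `Atom` record, the `residue_environment` function, the 4.0 angstrom default cutoff, and the sample coordinates are all hypothetical. A real client would obtain its atoms from the server.

```python
import math
from collections import namedtuple

# Hypothetical minimal atom record; the real API returns richer AtomSite objects.
Atom = namedtuple("Atom", "atom_id chain res_seq x y z")

def residue_environment(atoms, chain, res_seq, cutoff=4.0):
    """Return sorted (chain, res_seq) pairs of residues that have at least one
    atom within `cutoff` angstroms of any atom of the target residue."""
    target_key = (chain, res_seq)
    target = [a for a in atoms if (a.chain, a.res_seq) == target_key]
    neighbours = set()
    for a in atoms:
        key = (a.chain, a.res_seq)
        if key == target_key:
            continue  # skip atoms of the target residue itself
        for t in target:
            if math.dist((a.x, a.y, a.z), (t.x, t.y, t.z)) <= cutoff:
                neighbours.add(key)
                break
    return sorted(neighbours)

# Made-up coordinates, for illustration only.
atoms = [
    Atom("1", "A", 1, 0.0, 0.0, 0.0),
    Atom("2", "A", 2, 3.0, 0.0, 0.0),
    Atom("3", "A", 3, 10.0, 0.0, 0.0),
    Atom("4", "B", 1, 0.0, 3.5, 0.0),
]
env = residue_environment(atoms, "A", 1)
print(env)
```

The all-pairs loop is quadratic in the number of atoms. For whole entries, a spatial index would be the usual refinement.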

API Architecture Features
- API organization is based on the PDB Exchange Data Dictionary: access methods are provided at the level of data categories/classes
- The PDB Exchange Dictionary provides the content to automatically generate:
  - OMG Interface Definition Language (IDL) and access classes
  - SQL queries required to support the CORBA server
  - Software to load PDB data files in memory or into a supporting relational database engine

Current Data Dictionaries
http://deposit.pdb.org/mmcif/
- PDB data exchange (XML Schema/CIF), including structural genomics and data harvesting extensions
- mmCIF
- NMR
- 3D-EM
- Modeling
- Crystallization
- Symmetry
- Image data
- BIOSYNC

Extending Data Dictionaries for Deposition
- X-ray: macromolecular naming, source organism, crystallization and cell parameters, data collection, structure solution and phasing, model building, refinement, model quality
- NMR: explicit details on sample preparation, contents and conditions, constraints, force constants, related statistics
- Protein Production: source information, target gene production, bacterial cloning, bacterial expression, purification

Elements of Dictionary Metadata
- Data attributes: definition, examples, data type (primitive type/regular expression patterns), range or allowed values
- Classes: categories, subcategories, category groups
- Associations: parent-child relationships, interdependencies/exclusivity
- Methods
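To make the attribute-level metadata concrete, the sketch below checks a data value against a data type pattern, an allowed-value list, and a numeric range. The `ITEMS` table and the `validate` function are hypothetical and hand-written for illustration. They are not taken from the PDB Exchange Dictionary or its software.

```python
import re

# Hypothetical metadata for two data items. Each entry carries a regular
# expression for the data type plus an optional range or list of allowed values.
ITEMS = {
    "occupancy": {"pattern": r"[0-9]+(\.[0-9]+)?", "minimum": 0.0, "maximum": 1.0},
    "group_PDB": {"pattern": r"[A-Z]+", "allowed": {"ATOM", "HETATM"}},
}

def validate(item_name, text):
    """Return True if `text` satisfies the metadata recorded for `item_name`."""
    meta = ITEMS[item_name]
    if re.fullmatch(meta["pattern"], text) is None:
        return False  # wrong data type
    if "allowed" in meta and text not in meta["allowed"]:
        return False  # not one of the enumerated values
    if "minimum" in meta and float(text) < meta["minimum"]:
        return False  # below the permitted range
    if "maximum" in meta and float(text) > meta["maximum"]:
        return False  # above the permitted range
    return True

print(validate("occupancy", "0.50"))
print(validate("occupancy", "1.5"))
print(validate("group_PDB", "ATOM"))
```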

Automatic Production of Macromolecular Structure API Components
- Input: Metamodel Framework (PDB Exchange Dictionary + API Specific Data Dictionaries)
- Output: CORBA IDL, SQL Schema, XML DTD/Schemas, Data Loaders, Database Access Classes

Macromolecular Structure API Data Flow
[Diagram components: mmCIF Data Files (Data Reference Standard), mmCIF Parsers, XML Files, Relational Database, CORBA Server, Applications]

Metadata Framework
- PDB Exchange Dictionary
  - Defines the content model
- Grouping Dictionary
  - Maps dictionary content to API organization
  - Assigns attributes to API aggregate data types and indices
- Schema Mapping Dictionary
  - Maps content to the physical storage layer

Automatic Generation of IDL
- The metadata framework is the input data for automated generation of CORBA IDL
- IDL is a platform-independent definition of the API
- IDL is used to produce client stubs and server skeleton classes on any platform
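The idea of generating IDL from metadata can be sketched in a few lines. This is a toy illustration under stated assumptions, not the project's generator: the `CATEGORY` dictionary layout and the `generate_idl_struct` function are hypothetical, and the attribute names are borrowed from the AtomSite excerpt shown later in the slides.

```python
# Hypothetical in-memory form of one category's metadata:
# a struct name plus (attribute name, IDL type) pairs.
CATEGORY = {
    "name": "AtomSite",
    "attributes": [
        ("id", "string"),
        ("occupancy", "float"),
        ("b_iso_or_equiv", "float"),
    ],
}

def generate_idl_struct(category):
    """Render one category description as the text of an IDL struct."""
    lines = ["struct %s {" % category["name"]]
    for name, idl_type in category["attributes"]:
        lines.append("  %s %s;" % (idl_type, name))
    lines.append("};")
    return "\n".join(lines)

idl_text = generate_idl_struct(CATEGORY)
print(idl_text)
```

Because the struct text is derived entirely from the metadata, extending the dictionary and rerunning the generator is enough to extend the interface definition.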

Automatic Generation of API Server
- The metadata framework is the input data for automated generation of server access classes (SQL access methods)
- Implementation of abstract skeleton methods using DB2 CLI
- Integrate with any custom server methods
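A generated SQL access method might look roughly like the sketch below. It rests on several assumptions: SQLite from the Python standard library stands in for DB2 CLI, and the `SCHEMA_MAP` table, the column names, the `entry_id` key, and the inserted coordinates are hypothetical rather than taken from the actual schema.

```python
import sqlite3

# Hypothetical schema mapping: category name -> (table name, column names).
SCHEMA_MAP = {
    "atom_site": ("atom_site", ["id", "cartn_x", "cartn_y", "cartn_z"]),
}

def generate_select(category):
    """Build a parameterized SELECT statement from the schema mapping."""
    table, columns = SCHEMA_MAP[category]
    return "SELECT %s FROM %s WHERE entry_id = ?" % (", ".join(columns), table)

def make_accessor(category):
    """Return a function that fetches all rows of `category` for one entry."""
    sql = generate_select(category)
    def accessor(connection, entry_id):
        return connection.execute(sql, (entry_id,)).fetchall()
    return accessor

# In-memory database with one made-up row, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atom_site "
    "(entry_id TEXT, id TEXT, cartn_x REAL, cartn_y REAL, cartn_z REAL)"
)
conn.execute("INSERT INTO atom_site VALUES ('4HHB', '1', 1.0, 2.0, 3.0)")

get_atom_sites = make_accessor("atom_site")
rows = get_atom_sites(conn, "4HHB")
print(rows)
```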

API Server Extension
- Extend the content model through the PDB exchange data dictionary
- Extend supporting dictionaries in the metadata framework
- Autogenerate IDL
- Autogenerate skeleton implementations
- Integrate custom code

Supporting Alternative APIs
- Adapt the IDL autogenerator: revise MDF->IDL to MDF->new API spec
- Adapt the autogenerator of server skeleton implementations
- Integrate custom methods

Server Availability
- The OpenMMS toolkit provides a Java interface to Oracle/MySQL using JDBC (core mmCIF classes)
- C++ server using a native interface to DB2 (EEE), implemented on a 4-node Linux cluster (NDB beta test in Sept.)
- Installation of DB2 (EEE) at SDSC is underway to support high-performance access

Client Program Examples
DsMmsMacromolecularStructure.idl excerpt:

    struct AtomSite {
      string id;
      IndexId type_symbol;
      AtomIndex label;
      IndexId label_entity;
      VectorXYZ cartn;
      float occupancy;
      float b_iso_or_equiv;
    };
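On the client side, a struct like this typically surfaces as an object with matching attributes. The sketch below is a hand-written, partial Python mirror for illustration and not the generated stub: the `IndexId` and `AtomIndex` fields are left out because their definitions are not in the excerpt, and the `centroid` helper and sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VectorXYZ:
    x: float
    y: float
    z: float

@dataclass
class AtomSite:
    # Partial mirror of the IDL struct: type_symbol, label and label_entity
    # are omitted because IndexId and AtomIndex are not defined in the excerpt.
    id: str
    cartn: VectorXYZ
    occupancy: float
    b_iso_or_equiv: float

def centroid(atoms):
    """Return the unweighted mean position of a non-empty list of atoms."""
    n = len(atoms)
    return VectorXYZ(
        sum(a.cartn.x for a in atoms) / n,
        sum(a.cartn.y for a in atoms) / n,
        sum(a.cartn.z for a in atoms) / n,
    )

# Made-up values, for illustration only.
sample = [
    AtomSite("1", VectorXYZ(0.0, 0.0, 0.0), 1.0, 20.0),
    AtomSite("2", VectorXYZ(2.0, 4.0, 6.0), 1.0, 25.0),
]
center = centroid(sample)
print(center)
```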

Client Program Examples
A primary design requirement was an interface that is clearly defined and easy to use when developing new applications. The code examples in this section illustrate how client programs can use the API to quickly access macromolecular structure data. As a simple example, the following Python code fragment prints the atom identifier and the Cartesian (x, y, z) position of the first ten atoms in the macromolecule 4HHB.

Example 1. Retrieving the AtomSite list for hemoglobin (4HHB) and printing the atomic coordinates.

    import sys

    # ef is assumed to be the entry factory object obtained from the server
    # (the connection setup is not shown on the slide).
    sid = "4HHB"
    try:
        e = ef.get_entry_from_id(sid)
    except Exception:
        print("cannot get entry %s, exiting!" % sid)
        sys.exit(1)
    print("got entry!")

    # Get the atom site list
    atoms = e.get_atom_site_list()
    print("got %d atoms total" % len(atoms))
    print("A few atoms:")
    for a in atoms[:10]:
        print("%s\t%.3f %.3f %.3f" % (a.id, a.cartn.x, a.cartn.y, a.cartn.z))

Example 2. Listing symmetry information and the residue ranges for the helices of hemoglobin (4HHB).

    # Get the symmetry information (e is the entry retrieved in Example 1)
    s = e.get_sym_info()
    print("space group: %s" % s.space_group)
    print("cell constants: ")
    c = s.acell.unit_cell
    print("a=%.3f, b=%.3f, c=%.3f" %
          (c.length_a, c.length_b, c.length_c))
    print("alpha=%.3f, beta=%.3f, gamma=%.3f" %
          (c.angle_alpha, c.angle_beta, c.angle_gamma))

    # Get the secondary structures
    sconfs = e.get_struct_conf_list()
    print("Secondary structures:")
    for a in sconfs:
        # Print the start and end residue (chain, component, sequence number)
        print("%s\t%s %s %s\t--> %s %s %s" % (
            a.id,
            a.beg_auth.asym.id, a.beg_auth.comp.id, a.beg_auth.seq.id,
            a.end_auth.asym.id, a.end_auth.comp.id, a.end_auth.seq.id))

Client Availability
- Example clients provide category-level access in Java OpenMMS and C++ native servers
- Clients available in Java, C++ and Python
- C++ API extended to support efficient detailed molecular selections (e.g. coordinates of secondary structure elements, symmetry related molecular elements, biological assemblies)

Access
- Protein Data Bank Site: http://www.pdb.org/
- OpenMMS site (Java implementation): http://openmms.sdsc.edu
- PDB Software Download Site (C++ and Python implementation): http://deposit.pdb.org/mmcif/FILM/
- PDB Dictionary Resource Site: http://deposit.pdb.org/mmcif/
- PDB Beta Data Site: ftp://beta.rcsb.org/pub/pdb/uniformity/data/