Memops Data modelling and automatic code generation Edinburgh 9 September 2008.

Memops - main points
■ Code generation framework
■ Data access subroutine libraries
■ Fully automatic code generation from model
■ Several programming languages in parallel
■ Precise, detailed, validated data

Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops

The CCPN Project
■ Collaborative Computing Project for NMR
■ Since 1999
■ Unifying platform for NMR software, similar to CCP4 for X-ray crystallography
■ Community-based, open-source software development
■ Code generation, data model, applications, meetings

NMR Structural Biology Pipeline
[Figure: pipeline stages Sample Preparation, NMR Machine, Data Processing, Spectrum Analysis, Structure Calculation, Repository Database; annotation: "Slow, complex, interactive"]

Native Anarchy
[Figure: tasks (Task1, Task2, Task3) exchange data through many separate pairwise "Convert" steps]

With Data Standard
[Figure: each task (Task1, Task2, Task3) connects through its own "Convert" step to a central Data Standard]
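The two diagrams carry a counting argument. A minimal sketch, assuming that without a standard every ordered pair of programs needs its own converter, and that with a standard each program needs one importer and one exporter (the function names are illustrative, not part of Memops):

```python
def converters_without_standard(n_programs: int) -> int:
    """Direct conversion: one converter for every ordered pair of programs."""
    return n_programs * (n_programs - 1)


def converters_with_standard(n_programs: int) -> int:
    """Central standard: each program needs one importer and one exporter."""
    return 2 * n_programs


for n in (3, 5, 10, 30):
    print(f"{n:>2} programs: {converters_without_standard(n):>3} direct converters "
          f"vs {converters_with_standard(n):>2} via the standard")
```

Under these assumptions the two approaches break even at three programs; beyond that the pairwise count grows quadratically while the standard grows linearly.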

Data standard - objectives
● Lossless data transfer between programs with different approaches and architectures
● All data needed for pipeline software
  ■ Creating data, not analysing end results
  ■ Intermediate results needed
  ■ Comprehensive, detailed, complex
● Completeness, integrity of changing data
● Precisely defined standard
  ■ A single central description
  ■ Validation directly against standard

CCPN approach
■ Standard API, no stable format
  ● Easier to maintain as model changes
■ Abstract data model
  ● Exact correspondence to APIs
■ API implementations for several languages
■ Transparent access to XML or DB storage
■ Complete validation of model rules and constraints
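The idea of a standard API with transparent XML or database storage can be sketched as below. This is a hypothetical illustration, not generated CCPN code: the `Storage` protocol, the `XmlStorage` and `SqliteStorage` backends and the `Chain` class with its `code`/`molType` attributes are all assumptions. Application code talks only to the API object, so the backend can be swapped without touching it.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Protocol


class Storage(Protocol):
    """Hypothetical backend interface; application code never sees the format."""

    def save(self, key: str, attrs: dict[str, str]) -> None: ...

    def load(self, key: str) -> dict[str, str]: ...


class XmlStorage:
    """Keeps each object as an <obj> element under one XML root (in memory)."""

    def __init__(self) -> None:
        self.root = ET.Element("data")

    def _find(self, key: str):
        for elem in self.root.iter("obj"):
            if elem.get("key") == key:
                return elem
        return None

    def save(self, key: str, attrs: dict[str, str]) -> None:
        old = self._find(key)
        if old is not None:
            self.root.remove(old)
        ET.SubElement(self.root, "obj", {"key": key, **attrs})

    def load(self, key: str) -> dict[str, str]:
        elem = self._find(key)
        if elem is None:
            raise KeyError(key)
        return {k: v for k, v in elem.attrib.items() if k != "key"}


class SqliteStorage:
    """Keeps each attribute as one row in an in-memory SQLite table."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE attr (key TEXT, name TEXT, value TEXT, PRIMARY KEY (key, name))")

    def save(self, key: str, attrs: dict[str, str]) -> None:
        with self.conn:  # one transaction per save
            self.conn.execute("DELETE FROM attr WHERE key = ?", (key,))
            self.conn.executemany("INSERT INTO attr VALUES (?, ?, ?)",
                                  [(key, n, v) for n, v in attrs.items()])

    def load(self, key: str) -> dict[str, str]:
        rows = self.conn.execute(
            "SELECT name, value FROM attr WHERE key = ?", (key,)).fetchall()
        if not rows:
            raise KeyError(key)
        return dict(rows)


class Chain:
    """Hypothetical API class: the same code works with either backend."""

    def __init__(self, storage: Storage, code: str, molType: str = "protein"):
        self._storage = storage
        self.code = code
        self.molType = molType

    def save(self) -> None:
        self._storage.save(self.code, {"code": self.code, "molType": self.molType})

    @classmethod
    def load(cls, storage: Storage, code: str) -> "Chain":
        attrs = storage.load(code)
        return cls(storage, attrs["code"], attrs["molType"])


for backend in (XmlStorage(), SqliteStorage()):
    Chain(backend, "A", "DNA").save()
    chain = Chain.load(backend, "A")
    print(type(backend).__name__, chain.code, chain.molType)
```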

Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops

Automatic code generation
■ Model will change over time
  ● Several parallel implementations
  ● Synchronisation between APIs and model
  ● Maintenance and debugging
  ● Resources are limited
■ Automatic code generation
  ● Write and debug once and for all
  ● Any domain, from Astrophysics to Zoology
  ● Quick and simple to extend model
    ■ E.g. application-specific packages

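Code Generation Framework
[Figure: domain experts define a UML model divided into packages (Package 1, Package 2, Package 3); the MEMOPS framework autogenerates APIs (Python, Java, C), storage code (SQL, XML) and user documentation, with hand-coded parts under 1%; software developers build applications and deposition tools on the APIs, using wrappers]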

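Code Generation
[Figure: the UML model is edited in ObjectDomain (off-the-shelf) and exported, following the UML MetaModel, to an in-memory model of Python objects that is stored on disk as an XML file; autogeneration from it produces API code, schemas, mappings, etc. Legend: off-the-shelf, CCPN code, CCPN-generated files]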

API generator
[Figure: class diagram with ModelTraverse, TextWriter, ApiGen, PyLanguage, PyType, FileApiGen, PyApiGen and PyFileApiGen]
■ Written in Python
■ Modular
■ Different generators share code
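The class names on the slide suggest a layered design in which traversing the model is kept separate from writing language-specific text. The sketch below imitates that split under stated assumptions: the toy model format (a dict from class name to attribute names) and the classes `TextWriter`, `ModelTraverse`, `PyApiGen` and `JavaApiGen` here are hypothetical simplifications only loosely modelled on the slide, not the real Memops generators.

```python
class TextWriter:
    """Shared helper: collects output lines and tracks indentation."""

    def __init__(self, indent: str = "    ") -> None:
        self.lines: list[str] = []
        self.level = 0
        self.indent = indent

    def write(self, text: str = "") -> None:
        self.lines.append(self.indent * self.level + text if text else "")

    def text(self) -> str:
        return "\n".join(self.lines) + "\n"


class ModelTraverse:
    """Shared traversal: walks the model and calls language-specific hooks."""

    def __init__(self, model: dict[str, list[str]]) -> None:
        self.model = model  # toy model: class name -> attribute names
        self.out = TextWriter()

    def generate(self) -> str:
        for class_name, attrs in self.model.items():
            self.start_class(class_name)
            self.out.level += 1
            self.write_body(class_name, attrs)
            self.out.level -= 1
            self.end_class()
        return self.out.text()

    def start_class(self, name: str) -> None:
        raise NotImplementedError

    def write_body(self, name: str, attrs: list[str]) -> None:
        raise NotImplementedError

    def end_class(self) -> None:
        pass


class PyApiGen(ModelTraverse):
    """Python-specific hooks."""

    def start_class(self, name):
        self.out.write(f"class {name}:")

    def write_body(self, name, attrs):
        self.out.write(f"def __init__({', '.join(['self'] + attrs)}):")
        self.out.level += 1
        for a in attrs:
            self.out.write(f"self.{a} = {a}")
        if not attrs:
            self.out.write("pass")
        self.out.level -= 1

    def end_class(self):
        self.out.write()


class JavaApiGen(ModelTraverse):
    """Java-specific hooks (every attribute typed as String, for brevity)."""

    def start_class(self, name):
        self.out.write(f"public class {name} {{")

    def write_body(self, name, attrs):
        for a in attrs:
            self.out.write(f"private String {a};")
        params = ", ".join(f"String {a}" for a in attrs)
        self.out.write(f"public {name}({params}) {{")
        self.out.level += 1
        for a in attrs:
            self.out.write(f"this.{a} = {a};")
        self.out.level -= 1
        self.out.write("}")

    def end_class(self):
        self.out.write("}")


model = {"Atom": ["name", "elementName"], "Residue": ["code3Letter"]}
print(JavaApiGen(model).generate())

namespace: dict = {}
exec(PyApiGen(model).generate(), namespace)  # generated Python runs as-is
print(namespace["Atom"]("CA", "C").elementName)  # C
```

Only the hook methods differ between languages; traversal order and indentation handling are written once, which is the kind of code sharing the slide refers to.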

Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops

Model features
■ Packages to subdivide model, code, and data files
■ Objects: unique context, compare-by-identity
■ Complex data types: different contexts, compare-by-value
■ Simple data types: PositiveInt, enumerations, …
■ Attributes and links:
  ● Cardinality, frozen/modifiable, derived
  ● Unique/ordered collections (sets, lists, unique lists)
■ Ad-hoc constraints on attributes, simple and complex data types, and objects
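The difference between objects (compare-by-identity) and complex data types (compare-by-value), together with frozen attributes and cardinality limits, maps naturally onto Python. A minimal sketch, assuming hypothetical `Residue` (object) and `AtomName` (complex data type) classes and a made-up cardinality of 4; in the generated APIs these rules are enforced by generated code, not by dataclasses.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class AtomName:
    """Complex data type: no identity of its own, equal if all values are equal."""
    name: str
    isotope: int


class Residue:
    """Object: unique context, so two instances are never equal by value."""
    MAX_ATOM_NAMES = 4  # hypothetical upper cardinality of the atomNames link

    def __init__(self, seqCode: int) -> None:
        self._seqCode = seqCode  # frozen: set once at creation
        self._atomNames: list[AtomName] = []  # unique, ordered collection

    @property
    def seqCode(self) -> int:
        return self._seqCode  # read-only: no setter is defined

    def addAtomName(self, value: AtomName) -> None:
        if value in self._atomNames:
            raise ValueError("atomNames is a unique collection")
        if len(self._atomNames) >= self.MAX_ATOM_NAMES:
            raise ValueError("cardinality exceeded")
        self._atomNames.append(value)


print(AtomName("CA", 13) == AtomName("CA", 13))  # True: compared by value
print(Residue(1) == Residue(1))                  # False: compared by identity

try:
    AtomName("CA", 13).name = "CB"
except FrozenInstanceError:
    print("complex data type values are immutable")
```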

Molstructure model package

CCPN APIs
■ Application Programming Interface
  ● Object oriented
  ● Data accessed in memory as if stored in the data model
■ Implementations come with:
  ● Integrated, transparent I/O (file or database)
  ● Complete validity checking
  ● Protection against casual change (data encapsulation)
  ● Versioning and backwards compatibility
  ● Event notifier system
  ● Slot for application-specific data
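Of these features, the event notifier system is perhaps the least self-explanatory: application code registers callbacks that the API calls whenever data change, so that, for example, a user interface can refresh itself. The sketch below is a hypothetical illustration; the `registerNotify` name, the `(className, funcName)` keying and the `Peak` class are assumptions, not the documented CCPN signatures.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical registry: (class name, API function name) -> callbacks
_notifiers: dict[tuple[str, str], list[Callable]] = defaultdict(list)


def registerNotify(func: Callable, className: str, funcName: str) -> None:
    """Call func(obj) after className.funcName has changed obj."""
    _notifiers[(className, funcName)].append(func)


def unregisterNotify(func: Callable, className: str, funcName: str) -> None:
    _notifiers[(className, funcName)].remove(func)


def _notify(obj, funcName: str) -> None:
    for func in list(_notifiers[(type(obj).__name__, funcName)]):
        func(obj)


class Peak:
    """Hypothetical API class: validated setter that fires notifiers."""

    def __init__(self, height: float = 0.0) -> None:
        self._height = height  # encapsulated: change it through setHeight

    def getHeight(self) -> float:
        return self._height

    def setHeight(self, value: float) -> None:
        if not isinstance(value, (int, float)):
            raise TypeError("height must be a number")  # validity check
        self._height = float(value)
        _notify(self, "setHeight")


changed = []
registerNotify(changed.append, "Peak", "setHeight")
peak = Peak()
peak.setHeight(1.5e6)
print(changed == [peak])  # True
```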

Python+XML at runtime
[Figure: layered architecture. User application (science code, user interface, utility functions) → Python API (data get/set, validity check) → XML I/O code (generic XML read/write), driven by XML I/O mappings ("what to do for which element") and an off-the-shelf XML parser → data storage (user data in CCPN XML format, as XML files). Legend: application code, CCPN code, CCPN-generated files, off-the-shelf]
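The Python+XML layering separates generic XML read code from generated mappings that say what to do for each element. A rough sketch of that split follows, assuming an invented element layout and a hypothetical `MAPPINGS` table; the real CCPN XML format and mapping structure are different and more detailed.

```python
import xml.etree.ElementTree as ET


class Molecule:
    def __init__(self, name: str) -> None:
        self.name = name
        self.residues = []


class Residue:
    def __init__(self, seqCode: int, code3Letter: str) -> None:
        self.seqCode = seqCode
        self.code3Letter = code3Letter


# "Generated" part: per element, which class to build, how to convert each
# XML attribute, and which parent collection receives the new object.
MAPPINGS = {
    "Molecule": {"class": Molecule, "attrs": {"name": str}, "parentList": None},
    "Residue": {"class": Residue,
                "attrs": {"seqCode": int, "code3Letter": str},
                "parentList": "residues"},
}


def read(elem: ET.Element, parent=None):
    """Generic reader: knows nothing about NMR, it only follows the mappings."""
    mapping = MAPPINGS[elem.tag]
    kwargs = {name: convert(elem.get(name))
              for name, convert in mapping["attrs"].items()}
    obj = mapping["class"](**kwargs)
    if parent is not None:
        getattr(parent, mapping["parentList"]).append(obj)
    for child in elem:
        read(child, obj)
    return obj


xml_text = """
<Molecule name="ubiquitin">
  <Residue seqCode="1" code3Letter="MET"/>
  <Residue seqCode="2" code3Letter="GLN"/>
</Molecule>
"""
mol = read(ET.fromstring(xml_text.strip()))
print(mol.name, [r.code3Letter for r in mol.residues])  # ubiquitin ['MET', 'GLN']
```

Supporting a new class would only require a new mapping entry, which is the part that lends itself to generation from the model.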

Java+DB at runtime
[Figure: layered architecture. User application (science code, user interface, utility functions, presentation layer) → Java API → Hibernate (off-the-shelf), configured by Hibernate mappings → database with database schema; optional custom queries in HQL (Hibernate Query Language). Legend: application code, CCPN code, CCPN-generated files, off-the-shelf]

Now Available
■ Version 2.0 just released
■ Python+XML, Java+XML, C+XML, Java+DB (with Hibernate)
■ Available under the GPL from SourceForge or …
■ CCPN Data Standard:
  ● NMR, macromolecules, LIMS
  ● 46 packages
  ● 552 classes and data types
  ● Python+XML implementation: 800,000+ lines of code

Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops

CcpNmr Suite
■ Analysis
  ● Interactive NMR analysis
■ FormatConverter
  ● Convert between 30+ NMR and structure formats
■ Built on top of CCPN model (Python+XML)
■ Version 2.0 released
■ Widely used in macromolecular NMR

CcpNmr Analysis

ExtendNMR NMR pipeline
■ Integrated macromolecular NMR pipeline, from sample to structure
■ Pre-existing programs from 8 groups
■ In-memory conversion to internal data structures
■ Integrated versions released:
  ● ARIA (NMR structure generation)
  ● Bruker TOPSPIN, the manufacturer's processing/analysis package

BIOXDM
■ Software pipeline for on-synchrotron crystallography
  ● Exploit new technology (… goniometers)
  ● Experiment optimisation, acquisition, and on-line processing
■ Independent data model, with Memops machinery
■ Java+DB implementation for runtime concurrent access

EUROCarbDB
■ Distributed deposition database
  ● Glycobiology and glycomics
  ● NMR, MS, HPLC and topology
■ Java, with database storage using Hibernate
■ CCPN model Java+DB implementation slots in as-is

Funding acknowledgements
■ BBSRC CCPN grants
■ European Union grants
  ● EXTEND-NMR, EU-NMR, NMR-Life, NMRQUAL, and TEMBLOR contracts
■ Industry support
  ● AstraZeneca, Dupont Pharma (now BMS), Genentech, GlaxoSmithKline
  ● Peter Keller (BIOXDM) thanks Synchrotron ‘Soleil’, the Global Phasing Consortium and EU FP6 ‘BIOXHIT’

People
■ Authors: Prof. Ernest Laue, Wayne Boucher, Rasmus Fogh, Tim Stevens, John Ionides, Wim Vranken (EBI), Peter Keller (Global Phasing)
■ Collaborators at U. Cambridge: Dan O’Donovan, Wolfgang Rieping, Alan da Silva, Darima Lamazhapova
■ Collaborators at EBI (MSD), Hinxton: Kim Henrick, Anne Pajon, Chris Penkett
■ Special thanks to: Bruker Biospin GmbH (TOPSPIN), Michael Nilges (ARIA), Bas Leeflang (EUROCarbDB; FP6 contract RIDS-CT

END

Overview ●Packages ●The Implementation package ■Objects ■DataTypes and DataObjTypes ●Access control

ARIA – structure generation from NMR data
[Figure: custom conversion between the ARIA data model (stored as ARIA XML) and the CCPN data model (stored as CCPN XML), linking ARIA to the CCPN application]
■ ARIA imports
  ● Peak lists
  ● Constraints
  ● Sequences
  ● Chemical shifts
■ ARIA exports
  ● Peak assignments
  ● Filtered constraints
  ● Violations
  ● Structures

API functions
■ ‘get’ and ‘set’ (attributes and links)
■ ‘add’ and ‘remove’ (collection attributes and links)
■ ‘sorted’ (unordered collection links)
■ ‘findFirst’ and ‘findAll’ (collection links)
  ● Simple filtering (attribute == value)
■ create and ‘new’ (objects)
  ● Normal and ‘factory function’ object creation
■ delete (objects)
  ● ‘delete’ function: cascades to objects rendered invalid by the deletion
■ checkValid, checkAllValid (objects)
■ API classes are strongly coupled: for efficiency, object-to-object links are two-way.
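Several of these conventions can be shown together: two-way parent-child links kept in step, a factory function on the parent, simple `attribute == value` filtering with findFirst/findAll, and a delete that cascades to children. The `Chain`/`Residue` classes and the method names `newResidue`, `findFirstResidue` and `findAllResidues` are hypothetical, chosen to resemble the naming pattern on the slide rather than copied from the generated CCPN API.

```python
class Residue:
    def __init__(self, chain: "Chain", seqCode: int, code3Letter: str) -> None:
        self.chain = chain  # link to parent
        self.seqCode = seqCode
        self.code3Letter = code3Letter
        self.isDeleted = False
        chain._residues.append(self)  # reverse link kept in step

    def delete(self) -> None:
        self.chain._residues.remove(self)
        self.isDeleted = True


class Chain:
    def __init__(self, code: str) -> None:
        self.code = code
        self._residues: list[Residue] = []
        self.isDeleted = False

    def newResidue(self, **kwargs) -> Residue:
        """Factory function: the new child is created already linked to self."""
        return Residue(self, **kwargs)

    @property
    def residues(self) -> tuple:
        return tuple(self._residues)  # read-only view protects the link

    def findAllResidues(self, **conditions) -> list:
        """Simple filtering: every condition is attribute == value."""
        return [r for r in self._residues
                if all(getattr(r, k) == v for k, v in conditions.items())]

    def findFirstResidue(self, **conditions):
        return next((r for r in self._residues
                     if all(getattr(r, k) == v for k, v in conditions.items())),
                    None)

    def delete(self) -> None:
        """Cascade: residues cannot exist without their chain."""
        for residue in list(self._residues):
            residue.delete()
        self.isDeleted = True


chain = Chain("A")
chain.newResidue(seqCode=1, code3Letter="MET")
gly = chain.newResidue(seqCode=2, code3Letter="GLY")
print(chain.findFirstResidue(code3Letter="GLY") is gly)  # True
print(gly.chain is chain)                                 # True: two-way link
chain.delete()
print(gly.isDeleted, chain.residues)                      # True ()
```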

FormatConverter - The NMR Translator
[Figure: format-specific readers (XEasy, NmrView, Bruker, Varian, NMRPipe, Azara, ...) feed generic converters for peaks, chemical shifts, acquisition parameters and processing parameters into a CCPN data model entry; format-specific writers (e.g. XEasy, NmrView) export peaks and chemical shifts from it]
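The FormatConverter layout, with format-specific readers and writers around one central data model entry, can be sketched as a registry. Everything below is hypothetical: the two toy peak-list formats `"simple"` (whitespace-separated) and `"csv"` stand in for real formats such as XEasy or NmrView, whose actual syntax is not reproduced here, and the `Peak` dataclass stands in for the CCPN data model.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Peak:
    """Central, format-neutral representation (stands in for the data model)."""
    ppm: tuple
    height: float


READERS = {}
WRITERS = {}


def reader(fmt):
    def register(func):
        READERS[fmt] = func
        return func
    return register


def writer(fmt):
    def register(func):
        WRITERS[fmt] = func
        return func
    return register


@reader("simple")
def read_simple(text: str) -> list:
    # Each line: ppm values followed by the height, whitespace-separated.
    peaks = []
    for line in text.strip().splitlines():
        *ppm, height = map(float, line.split())
        peaks.append(Peak(tuple(ppm), height))
    return peaks


@writer("simple")
def write_simple(peaks: list) -> str:
    return "".join(" ".join(str(v) for v in (*p.ppm, p.height)) + "\n" for p in peaks)


@reader("csv")
def read_csv(text: str) -> list:
    peaks = []
    for row in csv.reader(io.StringIO(text)):
        if row:
            *ppm, height = map(float, row)
            peaks.append(Peak(tuple(ppm), height))
    return peaks


@writer("csv")
def write_csv(peaks: list) -> str:
    out = io.StringIO()
    w = csv.writer(out, lineterminator="\n")
    for p in peaks:
        w.writerow([*p.ppm, p.height])
    return out.getvalue()


def convert(text: str, src: str, dst: str) -> str:
    """Any reader can feed any writer, because both go through Peak."""
    return WRITERS[dst](READERS[src](text))


print(convert("8.25 120.5 1.0e6", "simple", "csv"), end="")
```

Adding a format means adding one reader and one writer, never a converter per pair of formats.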

ExtendNMR: ARIA
■ Structure generation from macromolecular NMR data, using ambiguous distance constraints
■ One of the two leading programs
■ Python and scripts, with the CNS dynamics engine
■ All input and output integrated with the CCPN standard

ARIA: CCPN object selection

ExtendNMR: Bruker TOPSPIN
■ NMR processing program from a major NMR instrument company
■ Java; in-memory conversion to the CCPN Java+XML implementation
■ CCPN output in the current TOPSPIN release, expanded in the upcoming release

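Data Model v. Data Format
■ Abstract model (UML):
  ● Atom, with +elementName: String = C
  ● Bond, with +bondOrder: Float = 1.0
  ● Association: an Atom has any number (*) of +bonds; a Bond has exactly 2 +atoms
■ Relational database:
  ● Atom table: Atom_ID, elementName
  ● Bond table: Bond_ID, bondOrder
  ● Atom_Bond_Connect table: Atom_ID, Bond_ID
■ XML: [example not recoverable]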
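The same Atom-Bond model can be expressed as tables. A minimal sketch using Python's built-in `sqlite3`, following the three tables and the UML defaults on the slide (`elementName` defaults to `'C'`, `bondOrder` to `1.0`). The rule that each bond links exactly two atoms is not captured by the link table itself, so the hypothetical helper `new_bond` enforces it; `new_atom` is likewise an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Atom (
    Atom_ID     INTEGER PRIMARY KEY,
    elementName TEXT NOT NULL DEFAULT 'C'
);
CREATE TABLE Bond (
    Bond_ID   INTEGER PRIMARY KEY,
    bondOrder REAL NOT NULL DEFAULT 1.0
);
-- Many-to-many link table: one row per (atom, bond) pair.
CREATE TABLE Atom_Bond_Connect (
    Atom_ID INTEGER NOT NULL REFERENCES Atom(Atom_ID),
    Bond_ID INTEGER NOT NULL REFERENCES Bond(Bond_ID),
    PRIMARY KEY (Atom_ID, Bond_ID)
);
""")


def new_atom(elementName=None) -> int:
    if elementName is None:
        cur = conn.execute("INSERT INTO Atom DEFAULT VALUES")  # uses 'C'
    else:
        cur = conn.execute("INSERT INTO Atom (elementName) VALUES (?)", (elementName,))
    return cur.lastrowid


def new_bond(atom1: int, atom2: int, bondOrder: float = 1.0) -> int:
    """Create a bond together with its two required atom links."""
    if atom1 == atom2:
        raise ValueError("a bond needs two distinct atoms")
    cur = conn.execute("INSERT INTO Bond (bondOrder) VALUES (?)", (bondOrder,))
    bond_id = cur.lastrowid
    conn.executemany("INSERT INTO Atom_Bond_Connect VALUES (?, ?)",
                     [(atom1, bond_id), (atom2, bond_id)])
    return bond_id


c = new_atom()  # element defaults to 'C'
o = new_atom("O")
b = new_bond(c, o, 2.0)
rows = conn.execute("""
    SELECT a.elementName FROM Atom a
    JOIN Atom_Bond_Connect abc ON abc.Atom_ID = a.Atom_ID
    WHERE abc.Bond_ID = ? ORDER BY a.Atom_ID""", (b,)).fetchall()
print(rows)  # [('C',), ('O',)]
```

In the abstract model the navigation is simply `bond.atoms`; in the relational form the same question needs a join through the link table.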

Packages

■ Partition model, code, and data
■ Can import each other
■ Can be omitted
■ All import Implementation and AccessControl
■ Each has a TopObject
■ No links between data from rival TopObjects (different extents of data)

Root and TopObjects

TopObjects
■ One in every package
  ● Ultimate parent to all objects in package
■ Have a globally unique identifier (‘guid’)
■ currentXyz links from root
■ Links can constrain links between descendants
■ In file implementations:
  ● Hold links to storage and backup locations
  ● Live in Implementation as an almost empty shell
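The TopObject rules can be illustrated with a small sketch: every TopObject gets a globally unique identifier, every other object reaches its TopObject through its parents, and a link is refused when its two ends belong to rival TopObjects. Interpreting "rival" as "different TopObjects of the same package", the use of `uuid.uuid4` for the `guid`, and the names `MolSystem`, `NmrProject`, `DataObject` and `link` are all assumptions for illustration.

```python
import uuid


class TopObject:
    """Ultimate parent of every object in one extent of a package's data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.guid = str(uuid.uuid4())  # globally unique identifier

    @property
    def topObject(self) -> "TopObject":
        return self


class MolSystem(TopObject):   # hypothetical TopObject of one package
    pass


class NmrProject(TopObject):  # hypothetical TopObject of another package
    pass


class DataObject:
    """Any other object: its TopObject is reached by following parent links."""

    def __init__(self, parent) -> None:
        self.parent = parent
        self.links: list = []

    @property
    def topObject(self) -> TopObject:
        return self.parent.topObject


def link(a: DataObject, b: DataObject) -> None:
    """Two-way link, refused between rival TopObjects of the same package."""
    topA, topB = a.topObject, b.topObject
    if type(topA) is type(topB) and topA is not topB:
        raise ValueError("no links between data from rival TopObjects")
    a.links.append(b)
    b.links.append(a)


molA, molB = MolSystem("molA"), MolSystem("molB")
project = NmrProject("project")
chainA, chainB = DataObject(molA), DataObject(molB)
experiment = DataObject(project)

link(experiment, chainA)  # allowed: different packages
try:
    link(chainA, chainB)  # refused: two MolSystems are rivals
except ValueError as err:
    print(err)
```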

Overview ●Packages ●The Implementation package ■Objects ■DataTypes and DataObjTypes ●Access control

CcpNmr Analysis
■ NMR assignment program
  ● Inspired by ANSIG and Sparky
  ● Demonstrates CCPN approach
  ● Modern interface and scripting
  ● Scalable and extensible
■ Operating systems
  ● Linux, Sun, SGI, OS X, Windows
■ Languages
  ● Python
    ■ Data model interaction
    ■ Tk graphical interface
    ■ Scripting
  ● C
    ■ OpenGL/Tk contours
    ■ Structure display
    ■ Mathematical operations

Implementation Package
■ Model and code:
  ● Supertypes that define all objects
    ■ Objects
    ■ DataTypes
    ■ DataObjTypes
  ● Basic data types
■ Data: how to access the real data
  ● Data location pointers
  ● Current-package pointers
  ● Implementation data are not part of the data set, and are not in the database.
  ● Represent view or session?

Data Location

Objects and their Supertypes

Simple Data Types
All of the following are DataTypes: Boolean, Int, Float, String, Line, Text, Long, Double, Word, PositiveInt, SingleLine, NonNegativeInt, Dict, DateTime, StringKeyDict, Any, Token, NonNegativeFloat, FloatRatio, PositiveFloat, SpacelessString, LongWord, PositiveDouble, NonNegativeDouble, UrlProtocol
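Each simple data type can be read as a base type plus a constraint. A minimal sketch of validators for a few of the listed types follows; the constraints themselves (for example, that a `SpacelessString` contains no whitespace and a `SingleLine` no line break) are assumptions read off the type names, not the model's actual definitions, and `SIMPLE_TYPES`/`is_valid` are hypothetical names.

```python
from typing import Any, Callable

# Hypothetical constraint table: type name -> (accepted base types, extra check)
SIMPLE_TYPES: dict = {
    "PositiveInt":      ((int,), lambda v: v > 0),
    "NonNegativeInt":   ((int,), lambda v: v >= 0),
    "PositiveFloat":    ((int, float), lambda v: v > 0),
    "NonNegativeFloat": ((int, float), lambda v: v >= 0),
    "SpacelessString":  ((str,), lambda v: not any(c.isspace() for c in v)),
    "SingleLine":       ((str,), lambda v: "\n" not in v and "\r" not in v),
}


def is_valid(type_name: str, value: Any) -> bool:
    base_types, check = SIMPLE_TYPES[type_name]
    if isinstance(value, bool):  # bool is an int subclass in Python; reject it
        return False
    return isinstance(value, base_types) and check(value)


print(is_valid("PositiveInt", 3), is_valid("PositiveInt", 0))            # True False
print(is_valid("SpacelessString", "H1'"), is_valid("SpacelessString", "H 1"))  # True False
```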

Complex Data Types