Published by Annis McGee. Modified over 9 years ago.
1
Memops Data modelling and automatic code generation Edinburgh 9 September 2008
2
Memops - main points ■Code generation framework ■Data access subroutine libraries ■Fully automatic code generation from model ■Several programming languages in parallel ■Precise, detailed, validated data
3
Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops
4
The CCPN Project ■Collaborative Computing Project for NMR ■Since 1999 ■Unifying platform for NMR software similar to CCP4 for X-ray crystallography ■Community-based, open-source, software development ■Code generation, data model, applications, meetings
5
NMR Structural Biology Pipeline Sample Preparation NMR Machine Structure Calculation Data Processing Spectrum Analysis Repository Database Slow, complex, interactive
6
Native Anarchy Convert Task1 Task2 Convert Task2 Task1 Convert Task3 Convert Task3 Convert Task3
7
With Data Standard Data Standard Convert Task1 Convert Task2 Task1 Convert Task1 Convert Task3 Convert Task3 Convert Task3
8
Data standard - objectives ●Lossless data transfer between programs - different approaches and architectures ●All data needed for pipeline software ■Creating data, not analysing end results ■Intermediate results needed ■Comprehensive, detailed, complex ●Completeness, integrity of changing data ●Precisely defined standard ■A single central description ■Validation directly against standard
9
CCPN approach ■Standard API, no stable format ●Easier to maintain as model changes ■Abstract data model ●Exact correspondence to APIs ■API implementations for several languages ■Transparent access to XML or DB storage ■Complete validation of model rules and constraints
10
Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops
11
Automatic code generation ■Model will change over time ●Several parallel implementations ●Synchronisation between APIs and model ●Maintenance and debugging ●Resources are limited ■Automatic Code Generation ●Write and debug once and for all ●Any domain, from Astrophysics to Zoology ●Quick and simple to extend model ■E.g. Application-specific packages
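The idea of generating API code from a model can be illustrated with a minimal sketch. It is not the Memops generator: the `MODEL` dictionary, the `generate_class` function and the `getXyz`/`setXyz` naming are assumptions made for illustration only. The sketch turns a tiny model description into Python source with type-checked accessors, then loads the result.

```python
# Hypothetical miniature "model": class name -> {attribute name: Python type}.
# The real Memops model is a UML model, not a dictionary.
MODEL = {"Atom": {"elementName": str, "serial": int}}


def _cap(attr):
    """Capitalise the first letter only, e.g. elementName -> ElementName."""
    return attr[0].upper() + attr[1:]


def generate_class(name, attrs):
    """Return Python source for a class with validated get/set methods."""
    lines = [f"class {name}:"]
    lines.append(f"    def __init__(self, {', '.join(attrs)}):")
    for attr in attrs:
        # The constructor goes through the setters so validation always runs.
        lines.append(f"        self.set{_cap(attr)}({attr})")
    for attr, typ in attrs.items():
        lines += [
            f"    def get{_cap(attr)}(self):",
            f"        return self._{attr}",
            f"    def set{_cap(attr)}(self, value):",
            f"        if not isinstance(value, {typ.__name__}):",
            f"            raise TypeError('{name}.{attr} must be {typ.__name__}')",
            f"        self._{attr} = value",
        ]
    return "\n".join(lines) + "\n"


# Generate the source once, then load it as ordinary Python code.
source = generate_class("Atom", MODEL["Atom"])
namespace = {}
exec(source, namespace)
Atom = namespace["Atom"]
atom = Atom("C", 1)
```

Adding an attribute to `MODEL` and regenerating is enough to extend the class; no accessor is written by hand, which is the property the slide emphasises.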
12
Code Generation Framework Domain Experts MEMOPS framework Software Developers User Documentation Application Deposition APIs Python Java C Storage SQL XML Handcoded(< 1%) UML Model Package 1 Package 2 Package 3 Autogeneration Wrappers
13
Code Generation ObjectDomain UML data edit UML MetaModel In-Memory Model Python objects On-disk model XML file API code Schemas Mappings etc. Autogeneration CCPN code Off-the-shelf files CCPN generated Legend: Export
14
API generator ModelTraverse TextWriter ApiGen PyLanguage PyFileApiGen FileApiGen PyApiGen PyType ■Written in Python ■Modular ■Different generators share code
15
Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops
16
Model features ■Packages to subdivide model, code, and data files ■Objects. Unique context, compare-by-identity ■Complex data types. Different contexts, compare-by-value ■Simple data types, PositiveInt, enumerations, … ■Attributes and links: ●Cardinality, frozen/modifiable, derived ●Unique/ordered collections (sets, lists, unique lists) ■Ad-hoc constraints on attributes, simple and complex datatypes, and objects.
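The distinction between objects (compare-by-identity) and complex data types (compare-by-value), together with a constrained simple type such as PositiveInt, might be sketched as follows. All class and function names here (`ModelObject`, `Coordinate`, `check_positive_int`) are hypothetical and are not taken from the generated CCPN code.

```python
from dataclasses import dataclass


class ModelObject:
    """An 'object' in the model sense: two instances are never equal
    unless they are the same instance (Python's default behaviour)."""

    def __init__(self, name):
        self.name = name


@dataclass(frozen=True)
class Coordinate:
    """A 'complex data type': equality is decided by the field values."""
    x: float
    y: float
    z: float


def check_positive_int(value):
    """Validate a PositiveInt-style simple type: an integer greater than 0."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{value!r} is not a positive integer")
    return value
```

With these definitions, `ModelObject("a") == ModelObject("a")` is false while `Coordinate(1.0, 2.0, 3.0) == Coordinate(1.0, 2.0, 3.0)` is true.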
17
Molstructure model package
18
CCPN APIs ■ Application Programming Interface ●Object oriented ●Data accessed in memory as if stored in the data model ■Implementations come with: ●Integrated, transparent I/O (file or database) ●Complete validity checking ●Protection against casual change (data encapsulation) ●Versioning and backwards compatibility ●Event notifier system ●Slot for application-specific data
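How validity checking, data encapsulation and an event notifier can sit behind a single setter is sketched below. This is an illustration under assumptions, not the CCPN API: the `Spectrum` class, the `register_notify` method and the `"setName"` event key are hypothetical names.

```python
from collections import defaultdict


class Spectrum:
    # Event name -> list of callbacks, shared by all instances of the class.
    _notifiers = defaultdict(list)

    def __init__(self, name):
        self._name = None  # private storage; callers use getName/setName
        self.setName(name)

    @classmethod
    def register_notify(cls, event, callback):
        """Ask for callback(obj) to be called whenever `event` happens."""
        cls._notifiers[event].append(callback)

    def getName(self):
        return self._name

    def setName(self, value):
        # Validity check runs before any state is changed.
        if not isinstance(value, str) or not value:
            raise ValueError("name must be a non-empty string")
        self._name = value
        # Notify listeners only after the change has succeeded.
        for callback in self._notifiers["setName"]:
            callback(self)
```

A user interface could register a redraw function with `Spectrum.register_notify("setName", redraw)` and would then be called on every successful change, while an invalid value raises before any listener runs.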
19
Science code User Interface Utility functions Python+XML at runtime Python API XML I/O code XML I/O mappings Data Storage XML files User application Data get, set. Validity check Generic XML read/write User data in CCPN XML format What to do for which element CCPN code Off-the-shelf Application code files CCPN generated Legend: XML parser
20
Java+DB at runtime CCPN code Off-the-shelf Application code files CCPN generated Legend: HQL Science code User Interface Utility functions Java API Hibernate Hibernate mappings Database Presentation layer Database Schema Hibernate Optional Custom queries (Hibernate Query Language)
21
Now Available ■Version 2.0 just released ■Python+XML, Java+XML, C+XML, Java+DB (with Hibernate) ■Available under GPL license from Sourceforge or www.ccpn.ac.uk ■CCPN Data Standard: ●NMR, Macromolecules, LIMS ●46 packages ●552 classes and data types ●Python+XML implementation 800,000+ lines of code
22
Memops ●Introduction ●Code generation ●Generated libraries ●Applications of Memops
23
CcpNmr Suite ■Analysis ●Interactive NMR analysis ■FormatConverter ●Convert between 30+ NMR and structure formats ■Built on top of CCPN model (Python+XML) ■Version 2.0 released ■Widely used in macromolecular NMR
24
CcpNmr Analysis
25
ExtendNMR NMR pipeline ■Integrated macromolecular NMR pipeline - from sample to structure ■Pre-existing programs from 8 groups ■In-memory conversion to internal data structures ■Integrated versions released: ●ARIA (NMR structure generation) ●Bruker TOPSPIN (manufacturer's processing/analysis package)
26
BIOXDM ■Software pipeline for on-synchrotron crystallography ●Exploit new technology ( goniometers) ●Experiment optimisation, acquisition, and on-line processing ■Independent data model, with Memops machinery ■Java+DB implementation for runtime concurrent access
27
EUROCarbDB ■Distributed deposition database ●Glycobiology and glycomics ●NMR, MS, HPLC and topology ■Java. Database storage using Hibernate ■CCPN model Java+DB implementation slot in as-is
28
Funding acknowledgements ■BBSRC CCPN grants ■European Union grants ●EXTEND-NMR, EU-NMR, NMR-Life, NMRQUAL, and TEMBLOR contracts ■Industry support ●AstraZeneca, Dupont Pharma (now BMS), Genentech, GlaxoSmithKline ●Peter Keller (BIOXDM) thanks Synchrotron ‘Soleil’, the Global Phasing Consortium and EU FP6 ‘BIOXHIT’
29
People ■Authors: Prof. Ernest Laue, Wayne Boucher, Rasmus Fogh, Tim Stevens, John Ionides, Wim Vranken (EBI), Peter Keller (Global Phasing) ■Collaborators at U. Cambridge: Dan O’Donovan, Wolfgang Rieping, Alan da Silva, Darima Lamazhapova ■Collaborators at EBI (MSD), Hinxton: Kim Henrick, Anne Pajon, Chris Penkett ■Special thanks to: Bruker Biospin GmbH (TOPSPIN), Michael Nilges (ARIA), Bas Leeflang (EUROCarbDB; FP6 contract RIDS-CT-2004- 01195)
30
END
31
Overview ●Packages ●The Implementation package ■Objects ■DataTypes and DataObjTypes ●Access control
32
ARIA – structure generation from NMR data Custom conversion ARIA Data Model CCPN Data Model CCPN XML Application ARIA XML ■ARIA imports ●Peak Lists ●Constraints ●Sequences ●Chemical shifts ■ARIA exports ●Peak Assignments ●Filtered Constraints ●Violations ●Structures
33
API functions ■‘get’ and ‘set’ (Attributes and links) ■‘add’ and ‘remove’ (Collection attributes and links) ■‘sorted’ (Unordered collection links) ■‘findFirst’ and ‘findAll’ (Collection links) ●Simple filtering (attribute == value) ■create and ‘new’ (Objects) ●Normal and ‘factory function’ object creation ■delete (Objects) ●‘Delete’ function – cascades to objects rendered invalid by deletion ■checkValid, checkAllValid (Objects) ■API classes are strongly coupled. For efficiency reasons object-to-object links are two-way.
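The function families listed above (factory-style `new`, `findFirst`/`findAll` with attribute == value filtering, cascading `delete`, and two-way links) can be sketched with a parent and child class. The `Molecule` and `Residue` classes and their method names are hypothetical stand-ins written for this example, not the generated CCPN classes.

```python
class Residue:
    def __init__(self, molecule, seqCode, ccpCode):
        self.molecule = molecule      # child -> parent side of the link
        self.seqCode = seqCode
        self.ccpCode = ccpCode
        self.isDeleted = False
        molecule.residues.append(self)  # parent -> child side, kept in sync

    def delete(self):
        self.isDeleted = True
        if self in self.molecule.residues:
            self.molecule.residues.remove(self)


class Molecule:
    def __init__(self, name):
        self.name = name
        self.residues = []
        self.isDeleted = False

    def newResidue(self, **attrs):
        """Factory function: create a child that is already linked to self."""
        return Residue(self, **attrs)

    def findAllResidues(self, **conditions):
        """Simple filtering: keep children where every attribute == value."""
        return [r for r in self.residues
                if all(getattr(r, k) == v for k, v in conditions.items())]

    def findFirstResidue(self, **conditions):
        matches = self.findAllResidues(**conditions)
        return matches[0] if matches else None

    def delete(self):
        # Cascade: children cannot exist without their parent.
        for residue in list(self.residues):
            residue.delete()
        self.isDeleted = True
```

Because both ends of the link are updated together, navigating from a residue to its molecule or from a molecule to its residues always gives a consistent answer. That consistency is paid for by the strong coupling between classes that the slide notes.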
34
FormatConverter - The NMR Translator CCPN Data Model Peaks Chemical shifts Acquisition parameters XEasy NmrView XEasy NmrView Bruker Varian ... Generic peak converter Generic chemical shift converter Generic acquisition parameters converter Processing parameters XEasy NmrView NMRPipe Azara ... NmrView Format specific readers Data model entry Format specific writers Chemical shifts Peaks
35
ExtendNMR: ARIA ■Structure generation from macromolecular NMR data, ambiguous distance constraints ■One of two leading programs ■Python and scripts, with CNS dynamics engine ■All input and output integrated to CCPN standard
36
ARIA: CCPN object selection
37
ExtendNMR: Bruker TOPSPIN ■NMR processing program of major NMR instrument company ■Java. In-memory conversion to CCPN Java+XML implementation ■CCPN output in current TOPSPIN release, expanded in upcoming release.
38
Data Model v. Data Format ■Relational database: tables Atom (Atom_ID, elementName), Bond (Bond_ID, bondOrder), Atom_Bond_Connect (Bond_ID, Atom_ID) ■Abstract model (UML): Atom (+elementName: String = C), Bond (+bondOrder: Float = 1.0); each Bond links to 2 +atoms, each Atom to * +bonds ■XML
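The difference between the abstract model and one storage format can be made concrete with a small sketch. It assumes the reading of the slide in which a Bond links exactly two Atoms and the relational form uses three tables named Atom, Bond and Atom_Bond_Connect; the `to_tables` function is a hypothetical helper written for this example.

```python
class Atom:
    def __init__(self, elementName="C"):
        self.elementName = elementName
        self.bonds = []  # '*' end of the association


class Bond:
    def __init__(self, atoms, bondOrder=1.0):
        if len(atoms) != 2:
            raise ValueError("a Bond must link exactly 2 atoms")
        self.atoms = tuple(atoms)  # '2' end of the association
        self.bondOrder = bondOrder
        for atom in self.atoms:
            atom.bonds.append(self)  # keep the two-way link consistent


def to_tables(atoms, bonds):
    """Flatten the in-memory objects into three relational-style tables."""
    atom_ids = {id(atom): n for n, atom in enumerate(atoms, start=1)}
    tables = {"Atom": [], "Bond": [], "Atom_Bond_Connect": []}
    for atom in atoms:
        tables["Atom"].append((atom_ids[id(atom)], atom.elementName))
    for bond_id, bond in enumerate(bonds, start=1):
        tables["Bond"].append((bond_id, bond.bondOrder))
        for atom in bond.atoms:
            # One connect row per (bond, atom) pair.
            tables["Atom_Bond_Connect"].append((bond_id, atom_ids[id(atom)]))
    return tables
```

The objects carry no IDs at all; the keys exist only in the relational form. A different writer could serialise the same objects as nested XML elements without changing the `Atom` and `Bond` classes, which is the point of keeping model and format separate.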
39
Packages
40
■Partition model, code, and data ■Import each other ■Can be omitted ■All import Implementation and AccessControl ■Each has a TopObject ■No links between data from rival TopObjects (different extents of data)
41
Root and TopObjects
42
TopObjects ■One in every package ●Ultimate parent to all objects in package ■Have globally unique identifier (‘guid’) ■currentXyz links from root ■Links can constrain links between descendants ■In file implementations: ●Hold links to storage and backup locations ●Live in Implementation as almost empty shell
43
Overview ●Packages ●The Implementation package ■Objects ■DataTypes and DataObjTypes ●Access control
44
CcpNmr Analysis ■NMR Assignment Program ●Inspired by ANSIG and Sparky ●Demonstrates CCPN approach ●Modern interface and scripting ●Scalable and extensible ■Operating Systems ●Linux, Sun, SGI, OSX, Windows ■Languages ●Python ■Data model interaction ■Tk Graphical interface ■Scripting ●C ■OpenGL/Tk contours ■Structure display ■Mathematical operations
45
Implementation Package ■Model and Code: ●Supertypes that define all objects ■Objects ■DataTypes ■DataObjTypes ●Basic data types ■Data – how to access the real data: ●Data location pointers ●Current-package pointers ●Implementation data are not part of the data set, and are not in the database. ●Represent view or session?
46
Data Location
47
Objects and their Supertypes
48
Simple Data Types Boolean DataType Int DataType Float DataType String DataType Line DataType Text DataType Long DataType Double DataType Word DataType PositiveInt DataType SingleLine DataType NonNegativeInt DataType Dict DataType DateTime DataType StringKeyDict DataType Any DataType Token DataType NonNegativeFloat DataType FloatRatio DataType PositiveFloat DataType SpacelessString DataType LongWord DataType PositiveDouble DataType NonNegativeDouble DataType UrlProtocol DataType
49
Complex Data Types