Workflow and Data Management for Nuclear Magnetic Resonance.


1 Workflow and Data Management for Nuclear Magnetic Resonance

2 ● Introduction ● CCPN and WeNMR ● Data and workflow of macromolecular NMR ● WMS Workflow Management System ● Goals ● Organization ● Example ● Plans ● Credits

3 CCPN ● Collaborative Computing Project for NMR (Nuclear Magnetic Resonance) ● Funded by BBSRC since 1999 ● Goals: ● Unifying platform for NMR software ● Community-based, open-source software development ● Meetings and courses ● Member of WeNMR project

4 CCPN Results ●Software development ●CcpNmr suite of NMR applications ●Integrating external software ●CCPN Data standard for NMR and structural biology ●Abstract data model ●Data access subroutine libraries ●Multiple programming languages ●Memops: Data modelling and code generation tools

5 WeNMR

6 WeNMR goals ●Science gateway for NMR and SAXS communities ●Virtual research platform for data storage and exchange ●Operate and expand eNMR grid infrastructure ●Support users and developers ●Extend integration with related disciplines and Grid initiatives. ●WeNMR maintains and operates web portals allowing Grid submission for over 25 NMR and structure calculation programs.

7 ● Introduction ● CCPN and WeNMR ● Data and workflow of macromolecular NMR ● WMS Workflow Management System ● Goals ● Organization ● Example ● Plans ● Credits

8 Macromolecular NMR pipeline [Diagram stages: NMR processing, Analysis, Assignment, Structure generation, Validation] ●Macromolecular structures and dynamics ●Underlying information heterogeneous and extremely complex ●Workflow often branched or recursive ●Multiple, incompatible data formats ●Multiple, complex data transformations

9 Peculiarities of NMR field ●Data in electronic form from the beginning ●No direct mathematical relationship between results and original data ●Peak-atom mapping (‘assignment’) is ‘puzzle solving’ ●Redone for each sample group ●Not fully automatic ●Semi-ambiguous ●Limited resources ●Programs often written by a single person, who has since left or become a professor

10 Programs: Native Disorganisation [Diagram: Task1, Task2 and Task3 programs connected through multiple separate Convert steps]

11 Integration with Data Standard [Diagram: Task1, Task2 and Task3 programs connected to a central Data Standard through Convert steps]

12 CCPN Data Standard ●Precisely defined ●A single central description ●Validation directly against standard ●Comprehensive – cover everything, including intermediate results ●Ensure consistency and validity for changing data ●Support different implementations in parallel ●Easy to maintain and modify

13 Pipeline and CCPN data model [Diagram labels: CCPN data model; CcpNmrFormatConverter; Reference data; External formats; CcpNmrECI; pipeline stages NMR processing, Analysis, Assignment, Structure generation, Validation] ● Deposition in Protein Data Bank (PDB) and BioMagResBank using CCPN XML files

14 ● Introduction ● CCPN and WeNMR ● Data and workflow of macromolecular NMR ● WMS Workflow Management System ● Goals ● Organization ● Example ● Plans ● Credits

15 Workflow Management Goals ●Standardized interface to WeNMR portals ●Application-independent data selection ●Standard submission and result gathering ●Submit to multiple programs ●Seamless, invisible format conversion ●Start and end on precisely defined CCPN data ●Combine jobs into workflows ●Easy use for non-specialists

16 Data Management Goals ●Central data store, with access control ●Track jobs and data flow ●NMR analysis is rarely linear ●Alternative jobs from single starting point ●Run – modify – re-run ●Identified as desirable also for non-Grid data
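
The tracking goals on this slide (alternative jobs from a single starting point, run, modify, re-run) can be pictured as a small provenance record. The following is only an illustrative sketch: the `Job` and `JobTracker` names and the string data set identifiers are hypothetical and are not taken from the WMS code base.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """One tracked run: which program turned which data set into which."""
    job_id: str
    program: str
    input_data: str   # identifier of the data set the job started from
    output_data: str  # identifier of the data set the job produced


@dataclass
class JobTracker:
    """Hypothetical in-memory record of jobs and data flow."""
    jobs: List[Job] = field(default_factory=list)

    def record(self, job_id: str, program: str,
               input_data: str, output_data: str) -> Job:
        job = Job(job_id, program, input_data, output_data)
        self.jobs.append(job)
        return job

    def branches_from(self, data_id: str) -> List[Job]:
        """All alternative jobs started from the same data set."""
        return [j for j in self.jobs if j.input_data == data_id]

    def lineage(self, data_id: str) -> List[Job]:
        """Jobs that led to a data set, oldest first."""
        produced_by = {j.output_data: j for j in self.jobs}
        chain: List[Job] = []
        seen = set()
        while data_id in produced_by and data_id not in seen:
            seen.add(data_id)  # guard against accidental cycles
            job = produced_by[data_id]
            chain.append(job)
            data_id = job.input_data
        return list(reversed(chain))


tracker = JobTracker()
tracker.record("j1", "ARIA", "v0", "v1")   # first run
tracker.record("j2", "CING", "v0", "v2")   # alternative job, same start
tracker.record("j3", "ARIA", "v1", "v3")   # re-run on modified data
```

Storing the input and output identifiers per job is enough to answer both questions raised on the slide: which alternatives branch from a given starting point, and how a given result was reached.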

17 WP7: Research Platform ● O7.1: Design and implement a grid-based multidisciplinary approach for the characterization of biomolecular interactions, based on the joint use of NMR, SAXS, bioinformatics and biophysical tools. ● O7.2: Establish a SAXS Grid-enabled infrastructure providing secure remote access to SAXS instrumentation. ● O7.3: Develop an end-user local platform making use of portals and web services. ● O7.4: Establish an infrastructure and tools for data- and structure validation. ● O7.5: Provide web services and/or simple direct upload mechanisms for the web portal applications. ● O7.6: Implement a WeNMR end-user virtual machine.

18 ● Introduction ● CCPN and WeNMR ● Data and workflow of macromolecular NMR ● WMS Workflow Management System ● Goals ● Organization ● Example ● Plans ● Credits

19 WP7 – End User Local Platform ●WMS is a web-based end-user platform for accessing web-based services and executing workflows ●Development of the Extend-NMR project ●Funded as part of WeNMR ●Accesses services through adaptor modules ●Allows direct access from CcpNmr Analysis

20 WMS – Architecture [Diagram] ● Clients: GWT (Web); Analysis (Python, Desktop) ● Server: Java / Hibernate ● Database: Postgres ● Remote Execution Server: Java web service wrapper; Python i/o and CGI code (CS-ROSETTA); Java CGI code (ARIA, CING) ● WeNMR Web Portals and Services: CS-ROSETTA, ARIA, CING ● Bioinformatics Web Services; Taverna ● Plan to use TAVERNA for the actual workflow management

21 WMS – Adaptor service [Diagram labels: Adaptor Servlet; I/O Module (CCPN in, CCPN out, nmrCalcId); Misc format; Execution Module(s) (Web, Local, GRID)] ● Format conversion. Access existing web portals using CGI approaches ● Exposed as WSDL-defined web services for consumption by TAVERNA etc.

22 Data handling ●Data stored as tarred, zipped CCPN data sets ●Repository-type storage planned when CCPN data sets become ‘diff-able’. ●Workflow tracks starting data, end data, job ●Run data and parameters stored within CCPN data set in ‘Calculation’ package. ●Run input and output transferred as CCPN data set plus calculation ID
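
As a rough illustration of transferring a run as a tarred, zipped data set plus a calculation ID, the sketch below packs a project directory into a `.tgz` archive and writes the ID to a small JSON manifest next to it. The manifest file, its field names and the functions `pack_run` and `unpack_run` are assumptions made for this example; the slides do not say how WMS actually attaches the calculation ID.

```python
import json
import tarfile
from pathlib import Path
from typing import Tuple


def pack_run(project_dir: Path, calc_id: str, out_dir: Path) -> Path:
    """Archive a project directory and record the calculation ID beside it."""
    archive = out_dir / f"{project_dir.name}.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the project name, not the absolute path.
        tar.add(project_dir, arcname=project_dir.name)
    manifest = out_dir / f"{project_dir.name}.json"
    manifest.write_text(json.dumps({
        "archive": archive.name,
        "project": project_dir.name,
        "calculationId": calc_id,   # hypothetical field name
    }))
    return manifest


def unpack_run(manifest: Path, dest: Path) -> Tuple[Path, str]:
    """Restore the archived project; return its directory and calculation ID."""
    info = json.loads(manifest.read_text())
    with tarfile.open(manifest.parent / info["archive"], "r:gz") as tar:
        tar.extractall(dest)  # only unpack archives from a trusted source
    return dest / info["project"], info["calculationId"]
```

The receiving side then needs only the archive and the ID to find the run parameters inside the data set.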

23 Protocol and interface specification ●Data selection driven from protocol specification ●Parameters: names, types and default values ●Types of data to select ●Specific widget for each data type (structures, peak lists, …) ●New protocols can be specified by users, with JSON file or protocol editor (forthcoming). ●Layout specification as part of protocol specification
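
To make the idea of a JSON protocol specification concrete, here is a minimal sketch of what such a file might contain and how it could be checked. The schema (keys such as `parameters`, `dataSelections`, `widget` and `layout`) and the protocol itself are invented for illustration; the actual WMS protocol format is not shown in the slides.

```python
import json

# Hypothetical protocol file: parameters with names, types and defaults,
# the types of data to select, a widget per data type, and a layout.
PROTOCOL_JSON = """
{
  "name": "example_structure_calculation",
  "parameters": [
    {"name": "numStructures", "type": "int", "default": 20},
    {"name": "useWaterRefinement", "type": "bool", "default": false}
  ],
  "dataSelections": [
    {"name": "peakLists", "dataType": "PeakList", "widget": "peakListSelector"},
    {"name": "startStructure", "dataType": "Structure", "widget": "structureSelector"}
  ],
  "layout": ["peakLists", "startStructure", "numStructures", "useWaterRefinement"]
}
"""

TYPES = {"int": int, "float": float, "bool": bool, "str": str}


def load_protocol(text: str) -> dict:
    """Parse a protocol and check defaults and layout entries."""
    spec = json.loads(text)
    for param in spec["parameters"]:
        if not isinstance(param["default"], TYPES[param["type"]]):
            raise TypeError(f"Bad default for {param['name']}")
    names = {p["name"] for p in spec["parameters"]}
    names |= {d["name"] for d in spec["dataSelections"]}
    unknown = [n for n in spec["layout"] if n not in names]
    if unknown:
        raise ValueError(f"Layout refers to unknown items: {unknown}")
    return spec


def resolve_parameters(spec: dict, user_values: dict) -> dict:
    """Merge user-supplied values with the declared defaults."""
    declared = {p["name"]: p for p in spec["parameters"]}
    unknown = set(user_values) - set(declared)
    if unknown:
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, param in declared.items():
        value = user_values.get(name, param["default"])
        if not isinstance(value, TYPES[param["type"]]):
            raise TypeError(f"{name} must be of type {param['type']}")
        resolved[name] = value
    return resolved
```

Because the widgets and layout are named in the specification, a generic client could build the data selection form without program-specific code.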

24 Data conversion ●Takes place in adaptor ●Decoupled from server ●Python, working on CCPN data set ●Data export ●Data selection from Calculation package ●To program-specific files ●Result import ●Re-integrated in input Calculation package ●Starting data known ●Mapping information kept as needed
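
The export and import steps can be sketched as a round trip in which the mapping between program-specific identifiers and the original data is kept, so that results can be re-integrated into the input calculation. This is a toy example: a plain dictionary stands in for the CCPN Calculation package, and the file format, `export_atoms` and `import_results` are hypothetical and do not use the CCPN API.

```python
from typing import Dict, Tuple


def export_atoms(calculation: dict) -> Tuple[str, Dict[int, str]]:
    """Write selected atoms as numbered lines for an external program.

    Returns the file text and the serial-number-to-atom-ID mapping
    needed to interpret the program's output later.
    """
    lines = []
    mapping: Dict[int, str] = {}
    for serial, atom_id in enumerate(calculation["input"]["atoms"], start=1):
        mapping[serial] = atom_id
        # The program-specific file only carries the short atom name.
        lines.append(f"{serial} {atom_id.split(':')[-1]}")
    return "\n".join(lines), mapping


def import_results(calculation: dict, result_text: str,
                   mapping: Dict[int, str]) -> dict:
    """Read 'serial value' lines and store them under the original atom IDs."""
    results = {}
    for line in result_text.splitlines():
        serial, value = line.split()
        results[mapping[int(serial)]] = float(value)
    calculation["output"] = results  # re-integrated into the input calculation
    return calculation


calc = {"input": {"atoms": ["A:1:HN", "A:2:HA"]}}
text, mapping = export_atoms(calc)
import_results(calc, "1 8.25\n2 4.5", mapping)
```

Keeping the mapping alongside the calculation is what lets the import step work even when the external program renames or renumbers its input.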

25 ● Introduction ● CCPN and WeNMR ● Data and workflow of macromolecular NMR ● WMS Workflow Management System ● Goals ● Organization ● Example ● Plans ● Credits

26 WMS – Home page

27 WMS – Running a task

28 WMS – Workflows

29 ● Introduction ● CCPN and WeNMR ● Data and workflow of macromolecular NMR ● WMS Workflow Management System ● Goals ● Organization ● Example ● Plans ● Credits

30 Status and plans ●Current: ●System working at alpha test level ●ARIA, CING, CS-ROSETTA integrated ●Short term: ●Integrate UNIO, CYANA, Autostructure ●Parallel structure determination ■ARIA, UNIO, CYANA, Autostructure, from single input selection ■Results captured together; CcpNmr Analysis to analyze. ●Longer term: ●Improve user interface and robustness ●Integrate more programs ●Replace CGI wrappers with WSDL services on the portals
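
The planned parallel structure determination, where several programs start from a single input selection and the results are captured together, could look roughly like the following. The `run_program` function is a placeholder that only echoes its arguments; it is an assumption for this sketch and does not represent how the WeNMR portals or the named programs are actually invoked.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_program(program: str, selection: str) -> dict:
    """Placeholder for submitting one job; a real adaptor would call a portal."""
    return {"program": program, "input": selection, "status": "submitted"}


def run_parallel(programs: List[str], selection: str) -> Dict[str, dict]:
    """Submit the same input selection to every program and gather the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(programs))) as pool:
        futures = {name: pool.submit(run_program, name, selection)
                   for name in programs}
        # Collect results keyed by program so they can be compared together.
        return {name: future.result() for name, future in futures.items()}


results = run_parallel(["ARIA", "UNIO", "CYANA", "Autostructure"], "selection-1")
```

Keying the combined results by program name keeps them grouped for later side-by-side analysis.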

31 ● Introduction ● CCPN and WeNMR ● Data and workflow of macromolecular NMR ● WMS Workflow Management System ● Goals ● Organization ● Example ● Plans ● Credits

32 CCPN People ■Cambridge (Biochemistry) ●Ernest Laue ●Wayne Boucher ●Rasmus Fogh ●John Ionides ●Tim Stevens ●Alan Sousa da Silva ■EBI (PDBe), Hinxton ●Kim Henrick ●Wim Vranken ■SpronkNMR ●Chris Spronk

33 Funding ■BBSRC ■Industry ●AstraZeneca, Dupont Pharma (now BMS), Genentech, GlaxoSmithKline, Vernalis, Syngenta ■European Community ●WeNMR, EXTEND-NMR, EU-NMR, NMR-Life, NMRQUAL, and TEMBLOR contracts

34 END

