Research Traceability using Provenance Services for Biomedical Analysis
Dr Peter Bloodsworth, CCCS Research Centre, UWE, Bristol, UK
HealthGrid Presentation: 29th of June 2010

Talk Structure
The neuGRID Project.
Requirements from Users.
The Bigger Picture.
A Provenance Service.
CRISTAL.
Conclusion.

The neuGRID Consortium
Vrije Universiteit Medical Centre, THE NETHERLANDS
CF consulting s.r.l., ITALY
Provincia Lombardo Veneta Fatebenefratelli, ITALY
Karolinska Institutet, SWEDEN
University of the West of England, Bristol, UK
Neuralyse Europe (Prodema Medical), SWITZERLAND
Maat Gknowledge, SPAIN
HealthGrid, FRANCE

Project Objectives
To build a new user-friendly Grid-based research e-Infrastructure.
  Collection/archiving of large amounts of imaging data.
  Paired with computationally intensive data analyses.
To enable EU neuroscientists to carry out cutting-edge research.
  Imaging of degenerative brain diseases.

neuGRID Provenance Requirements
Provenance in neuGRID relates to:
1. Data provenance (source, quality control applied and other facets).
2. Workflow provenance (author, versioning, certification, etc.).
3. Analysis Result provenance (data set, workflow chosen, settings, errors, etc.).
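
The three kinds of provenance listed above can be pictured as linked records. The following is only an illustrative sketch: the class and field names are assumptions taken from the facets named on the slide, not the actual neuGRID schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataProvenance:
    # Where the data came from and which quality checks were applied (assumed fields).
    source: str
    quality_control: List[str]


@dataclass(frozen=True)
class WorkflowProvenance:
    # Who wrote the workflow, which version it is, and whether it is certified.
    author: str
    version: str
    certified: bool


@dataclass(frozen=True)
class AnalysisResultProvenance:
    # A result points back to the data set and workflow that produced it.
    data_set: DataProvenance
    workflow: WorkflowProvenance
    settings: Dict[str, str]
    errors: List[str] = field(default_factory=list)
```

Linking the result record to the other two is one way to keep the whole chain traceable from a single analysis output.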

The Bigger Picture
Real-world end users care about doing their research and getting their results.
They don’t care about the grid, certificates or virtual organisations.
They don’t want to learn grid-speak.
They don’t all want to do the same things in the same way.
They expect services that help them to do their work.
They expect reliability and a high level of integration between services.

The neuGRID Provenance Service

The Provenance Architecture
[Diagram: Provenance API, Translator, CRISTAL Core, Provenance DB]

Service Wrapper
Provides a web service-based interface to the Provenance Service.
Consists of methods for:
  Creating workflows
  Creating workflow instances
  Storing workflow provenance
  Retrieving workflow provenance
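
The four groups of methods above could be sketched as follows. This is a minimal in-memory stand-in, not the neuGRID web service itself: the class name, method names, identifiers and record shapes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ProvenanceService:
    """Hypothetical in-memory stand-in for the wrapper's four method groups."""
    workflows: dict = field(default_factory=dict)    # workflow_id -> definition
    instances: dict = field(default_factory=dict)    # instance_id -> workflow_id
    provenance: dict = field(default_factory=dict)   # instance_id -> list of records
    _ids: count = field(default_factory=count, repr=False)

    def create_workflow(self, definition: dict) -> str:
        workflow_id = f"wf-{next(self._ids)}"
        self.workflows[workflow_id] = definition
        return workflow_id

    def create_workflow_instance(self, workflow_id: str) -> str:
        if workflow_id not in self.workflows:
            raise KeyError(f"unknown workflow: {workflow_id}")
        instance_id = f"inst-{next(self._ids)}"
        self.instances[instance_id] = workflow_id
        self.provenance[instance_id] = []
        return instance_id

    def store_workflow_provenance(self, instance_id: str, record: dict) -> None:
        self.provenance[instance_id].append(record)

    def retrieve_workflow_provenance(self, instance_id: str) -> list:
        # Return a copy so callers cannot alter the stored history.
        return list(self.provenance[instance_id])
```

A caller would create a workflow, instantiate it, and then store and retrieve records against the instance identifier.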

Translator
To prevent lock-in to a specific workflow format, the Provenance Service includes an adaptor-based translator that converts user workflows into the CRISTAL workflow format.
Acts as a bridge between users and the CRISTAL core.

CRISTAL Core
Provenance management is handled internally by CRISTAL.
Workflows need to be translated between the user format and the CRISTAL format.
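
An adaptor-based translator of this kind might look like the sketch below. The two input formats are invented toy formats, and the list-of-activities target is only a placeholder for the real CRISTAL workflow format, which is not described here.

```python
from typing import Callable, Dict, List

# Placeholder for the CRISTAL workflow format (an assumption):
# an ordered list of activities, each a dict with a "name" key.
Activity = Dict[str, str]
Adaptor = Callable[[str], List[Activity]]

_ADAPTORS: Dict[str, Adaptor] = {}


def register_adaptor(fmt: str):
    """Register one adaptor per user workflow format."""
    def decorator(fn: Adaptor) -> Adaptor:
        _ADAPTORS[fmt] = fn
        return fn
    return decorator


@register_adaptor("csv-steps")  # invented toy format: "align,segment,measure"
def from_csv_steps(source: str) -> List[Activity]:
    return [{"name": s.strip()} for s in source.split(",") if s.strip()]


@register_adaptor("line-steps")  # invented toy format: one step per line
def from_line_steps(source: str) -> List[Activity]:
    return [{"name": s.strip()} for s in source.splitlines() if s.strip()]


def translate(fmt: str, source: str) -> List[Activity]:
    """Pick the adaptor for the given format and convert the workflow."""
    try:
        adaptor = _ADAPTORS[fmt]
    except KeyError:
        raise ValueError(f"no adaptor registered for format {fmt!r}") from None
    return adaptor(source)
```

Supporting a new workflow format then means registering one more adaptor, leaving the core untouched.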

CRISTAL was designed to track the development of LHC detector components at CERN.

CRISTAL in neuGRID Overview
[Diagram: Researcher, Input Data, Derived Data, Analysis Suite, LORIS; CRISTAL Process & Data Tracking; Provenance Data (workflow steps, analysis data, histories); A Complete Analysis Knowledge Base]

CRISTAL Main Functions
Complete capture of system functionality in workflows.
As every action is represented by a workflow activity, every operation is recorded and stored in a replayable way.
Every piece of data, including descriptions, is versioned, so all previous states of items are available.
Several interfaces exist to bridge to other components for database storage, job distribution, definition management, etc.

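
The idea of recording every activity so that earlier states can be replayed can be illustrated with a toy example. This is not CRISTAL's implementation or API; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class VersionedItem:
    """Toy item whose state changes only through recorded activities."""
    # Each entry is (activity name, key changed, new value).
    history: List[Tuple[str, str, Any]] = field(default_factory=list)

    def apply(self, activity: str, key: str, value: Any) -> int:
        """Record an activity and return the resulting version number."""
        self.history.append((activity, key, value))
        return len(self.history)

    def state_at(self, version: int) -> dict:
        """Rebuild the state by replaying the first `version` activities."""
        state: dict = {}
        for _activity, key, value in self.history[:version]:
            state[key] = value
        return state

    def current(self) -> dict:
        return self.state_at(len(self.history))
```

Because state is derived from the history rather than overwritten, any previous version stays available for inspection.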
Service Architecture

Further Developments
Composite jobs. If some tasks are clustered together, they should be executed by CRISTAL as a composite activity.
In composite jobs, each sub-job should send feedback to CRISTAL as soon as it completes its execution.
The Glueing Service should hold user-related information to map users to jobs and provenance data.
The Querying Service should query both CRISTAL provenance and LORIS data.
The translation component in the pipeline service should map user workflows to CRISTAL workflows. The translation should be two-way.
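
The composite-job behaviour described above, where each sub-job reports back as soon as it finishes, might be sketched like this. The function name and the feedback callback are assumptions for illustration; the real feedback channel to CRISTAL is not specified here.

```python
from typing import Callable, Dict

# Hypothetical feedback hook: (composite name, sub-job name, status).
Feedback = Callable[[str, str, str], None]


def run_composite(name: str,
                  sub_jobs: Dict[str, Callable[[], None]],
                  report: Feedback) -> bool:
    """Run sub-jobs in order, reporting each one as soon as it ends."""
    for sub_name, job in sub_jobs.items():
        try:
            job()
        except Exception as exc:
            # Report the failure immediately and stop the composite.
            report(name, sub_name, f"failed: {exc}")
            return False
        report(name, sub_name, "completed")
    return True
```

Reporting per sub-job, rather than once at the end, means a partial history exists even when a later step fails.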

Conclusions
A robust provenance system is necessary if users are to have confidence in and use the neuGRID infrastructure for their research.
Provenance is important throughout neuGRID, from data input through to analysis output. Errors that occur at any stage may affect the final results.
It can be thought of as a chain of evidence and spans:
  Data provenance (source, quality control applied and other facets).
  Workflow provenance (source, versioning, certification, etc.).
  Analysis Result provenance (data set, workflow chosen, settings, errors, etc.).
We need CRISTAL, a resource that is both powerful and flexible in the way that it captures provenance data.

Question Time
None like this please!!

CRISTAL Enabled Provenance