Capturing provenance data
Dr Alison McKay (in place of Dr Richard Bagshaw)
University of Leeds, School of Mechanical Engineering
Distributed Aircraft Maintenance Environment - DAME
Purpose of presentation
– To present the DAME provenance research
– To discuss the experience of deploying this technology in Grid-based systems
Outline of presentation
– What do we mean by “provenance data”?
– What are we aiming for?
– What does achieving this goal entail?
– What progress has been made to date?
– What remains to be done?
Provenance data
– Recording the history of data and its place of origin
DAME Provenance Architecture (diagram). Components shown: Provenance Database; Provenance Viewer; Workflow Advisor; Workflow Script; Workflow Definition (BPEL); Workflow Instance; Service Instance; Workflow Manager.
RR Integrated Product Development process (diagram). Labels: Stage 1 New Project Planning; Business Concept Definition; Identify the Need; Preliminary Concept Definition; Stage 2 Full Concept Definition; Stage 3 Propulsion System Realisation; Stage 4 In-Service Monitoring & Technical Support; Capability Acquisition; Engine Launch; Entry into Service.
DAME provenance data users (diagram). Labels: Provenance Requirement; Legal Implications; Audit Trail; Contractual Obligations; Troubleshooting; Re-run diagnosis.
Potential benefits (figure: failure mode curves plotted against time)
– Failure mode curves: position and shape depend on engine type (from PDM/SDM), engine state (e.g. age) and events (e.g. from QUOTE data)
– Failure line: shows when failure occurs; its position and shape depend on the operating environment
– Marker: position of an engine, i.e. its current state of health
– Axis labels: Time, T extra, T
Specific tasks to be supported
– Create an audit trail (Who, What, Where, Why, When, Which, hoW)
– Re-execute a workflow process
  – repeat a workflow process (same Grid resources & services, sequence and data)
  – rerun a workflow process (same Grid resources & services and sequence, on different data)
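
The distinction between repeating and rerunning a workflow can be sketched in a few lines of Python. This is an illustration only, not DAME code: the class names, field names and the `data_ref` handle are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class AuditRecord:
    """One audit-trail entry answering the seven questions on the slide."""
    who: str        # user or service that acted
    what: str       # action performed
    where: str      # Grid resource or location
    why: str        # reason the step was taken
    when: datetime  # time and date stamp
    which: str      # data set or engine concerned
    how: str        # service or tool used


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical description of one executed workflow."""
    resources: Tuple[str, ...]  # Grid resources and services used
    sequence: Tuple[str, ...]   # ordered process steps
    data_ref: str               # handle to the input data (assumed)


def repeat(run: WorkflowRun) -> WorkflowRun:
    """Repeat: same resources, same sequence, same data."""
    return replace(run)


def rerun(run: WorkflowRun, new_data_ref: str) -> WorkflowRun:
    """Rerun: same resources and sequence, applied to different data."""
    return replace(run, data_ref=new_data_ref)
```

Under these assumptions a repeat needs everything in the record to be recoverable unchanged, whereas a rerun only needs the resources and the sequence.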
Initial requirements
– Support the re-execution of workflows with new data *
– Provide provenance data for the Workflow Advisor
– Provide a viewer for captured provenance data
* As opposed to repeating a given workflow using the same data and resources
DS&S perspective on requirements
– Origin of data fully traceable
  – (Including time and date stamps)
– Processed data traceable through application software
– Any human interaction/annotations must be captured
Research issues (matrix of Specify / Define / Execute / deploy against Product and Process)
– Product: Product Data Management system (Define); Service Data Manager (Execute / deploy)
– Process: Workflow process definition (Define); Workflow execution data (Execute / deploy)
Process definition (as defined) (data model diagram)
– Entities: process definition; process relationship; composition relationship; connection relationship; process element; process element relationship (1); [GRID] resource; GRID resource usage
– Attribute and relationship labels: start; end; date_and_time; name; description; id; resource; callee; caller; why_used; outcome; executed_by; description; id; related; relating; "*"; of
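
A small Python sketch shows how the entities named on this slide might fit together. The slide text does not say which attribute belongs to which entity, so the grouping below is an assumption, as is reading `relating` as the preceding step and `related` as the following step.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class ProcessElement:
    id: str
    name: str
    description: str = ""


@dataclass(frozen=True)
class ConnectionRelationship:
    relating: str  # id of the preceding process element (assumed meaning)
    related: str   # id of the following process element (assumed meaning)


@dataclass(frozen=True)
class GridResourceUsage:
    resource: str     # id of the Grid resource used
    caller: str       # process element that invoked the resource
    callee: str       # service or resource that was invoked
    start: datetime
    end: datetime
    why_used: str
    outcome: str
    executed_by: str


def execution_order(start_id: str,
                    connections: Iterable[ConnectionRelationship]) -> List[str]:
    """Follow connection relationships from start_id to get a linear step order."""
    next_of = {c.relating: c.related for c in connections}
    order = [start_id]
    seen = {start_id}
    while order[-1] in next_of:
        nxt = next_of[order[-1]]
        if nxt in seen:
            raise ValueError("connection relationships form a cycle")
        order.append(nxt)
        seen.add(nxt)
    return order
```

The sketch handles only a linear chain of steps; a branching workflow would need a graph traversal instead.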
Process definition (as executed)
Case: Case_id, User_id, Open_date, Close_date, Flight_start_date, Deadline_date, Tail_number, Airline, Airport, Stand, Quote_diagnosis, Quote_status, Engineer, Engineer_active, Engineer_why, Analyst, Analyst_active, Analyst_why, Expert, Expert_active, Expert_why
Workflow: Workflow_sequence_number, Workflow_id, Workflow_author_id, Workflow_name, Workflow_description, Workflow_start_date, Workflow_end_date, Workflow_ip_data_type, Workflow_op_data_type, Workflow_diagnosis, Workflow_status
Resource: Resource_sequence_number, Resource_id, Resource_name, Resource_type, Resource_description, Resource_start_time, Resource_end_time, Resource_location, Resource_configuration, Resource_version_number, Resource_status, Resource_req_no_of_processors, Resource_req_memory, Resource_req_operating_system, Resource_req_op_sys_ver_number
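
The Case, Workflow and Resource field lists could be held in a relational store and queried to produce an audit trail. The following is a minimal sketch using Python's built-in `sqlite3` module. It keeps only a subset of the columns; the table names and the foreign keys linking the three tables are assumptions, since the slide does not show how the tables are joined.

```python
import sqlite3

# Subset of the slide's columns; the REFERENCES links are assumed.
SCHEMA = """
CREATE TABLE case_record (
    Case_id TEXT PRIMARY KEY,
    User_id TEXT,
    Open_date TEXT,
    Tail_number TEXT
);
CREATE TABLE workflow (
    Workflow_id TEXT PRIMARY KEY,
    Case_id TEXT REFERENCES case_record(Case_id),
    Workflow_sequence_number INTEGER,
    Workflow_name TEXT,
    Workflow_status TEXT
);
CREATE TABLE resource (
    Resource_id TEXT,
    Workflow_id TEXT REFERENCES workflow(Workflow_id),
    Resource_sequence_number INTEGER,
    Resource_name TEXT,
    Resource_version_number TEXT,
    Resource_start_time TEXT,
    Resource_end_time TEXT
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def audit_trail(conn: sqlite3.Connection, case_id: str) -> list:
    """List the resources used for a case, in workflow and resource order."""
    query = """
        SELECT w.Workflow_sequence_number, w.Workflow_name,
               r.Resource_sequence_number, r.Resource_name,
               r.Resource_version_number
        FROM workflow AS w
        JOIN resource AS r ON r.Workflow_id = w.Workflow_id
        WHERE w.Case_id = ?
        ORDER BY w.Workflow_sequence_number, r.Resource_sequence_number
    """
    return conn.execute(query, (case_id,)).fetchall()
```

Ordering by the two sequence numbers returns the steps in the order they were executed, which is the information a repeat or rerun would need.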
MyGrid Workflow Provenance
Workflow instance capture
– Workflow overview: Workflow ID, Status, Start Time, End Time, overall inputs and outputs, Service List
– Service Invocations: Status, Start Time, End Time, WSDLURI, DataSets x 2
– Inputs and Outputs: ID, Name, Type, Value
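
The three levels of captured data listed above can be pictured as a nested record. The sketch below is not the myGrid API or its storage format; it only mirrors the field names on the slide, and it assumes that "DataSets x 2" means one input data set and one output data set per service invocation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataItem:
    """One input or output value: ID, Name, Type, Value."""
    id: str
    name: str
    type: str
    value: str


@dataclass
class ServiceInvocation:
    status: str
    start_time: str
    end_time: str
    wsdl_uri: str
    inputs: List[DataItem] = field(default_factory=list)   # assumed data set 1
    outputs: List[DataItem] = field(default_factory=list)  # assumed data set 2


@dataclass
class WorkflowProvenance:
    workflow_id: str
    status: str
    start_time: str
    end_time: str
    overall_inputs: List[DataItem] = field(default_factory=list)
    overall_outputs: List[DataItem] = field(default_factory=list)
    invocations: List[ServiceInvocation] = field(default_factory=list)

    def service_list(self) -> List[str]:
        """Services used, in invocation order, identified by WSDL URI."""
        return [inv.wsdl_uri for inv in self.invocations]

    def to_json(self) -> str:
        """Serialise the whole record, for example for a provenance store."""
        return json.dumps(asdict(self), indent=2)
```

Keeping the service list as a derived view of the invocations avoids storing the same information twice.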
Workflow example (diagram)
– Legend: Interface (transfer) resource; Data storage resource; Transient data resource; Compute resource; Application resource; Interface (search) resource; User executed process step; Data interface; GRID resource
– Resource labels: XTO Control Files; XTO; MySQL-SDM2; XTO; SDM; CR1
– Process steps: Look at SDM to select an engine; Get XTO control files for selected engine; Run XTO for selected engine
BOM data viewer (diagram). Labels: Product data database; Software (Java); Software (Java); Software (Microsoft.Net); Web service: Database; Graphical user interface; Web service: Structure constructor.
Remaining tasks
– Support the re-execution of workflows with new data
– Provide provenance data for the Workflow Advisor
– Provide a viewer for captured provenance data
– Provide an audit trail for accountability purposes
Provenance research issues
– Provenance requirements and scope
– Provenance data security
– Data storage format
– Centralised provenance data
– Stop points for audit trails
– Repeatability of GRID resources
Longer term research (matrix of Specify / Define / Execute / deploy against Product and Process)
– Product: Requirements definition (Specify); Product Data Management system (Define); Service Data Manager (Execute / deploy)
– Process: Workflow process specification (Specify); Workflow process definition (Define); Workflow execution data (Execute / deploy)