Feedback on OPM
Yogesh Simmhan, Microsoft Research
Synthesis of pairwise conversations with:
  – Roger Barga and Satya Sahoo, Microsoft Research
  – Beth Plale and Abhijit Borude, Indiana University
Roles
Using role annotations in OPM is not well defined…
  – Named relationships are used as first-class objects, as defined in the RDF model
  – Roles affect the way inferences are made
  – Are they semantically meaningful or not?
[Figure: two variants of a Bake Cake process graph with nodes John, Flour, and eggs, and edges labeled used(eggs), used(flour), wasGeneratedBy(cake), and wasGeneratedBy(unused). The first variant counts the eggs, Eggs(3) and Eggs(1); the second names them, Eggs(A, B, C) and Eggs(A).]
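The baking example can be written down as role-annotated edges to show how a role might change an inference. This is a minimal sketch in Python, not OPM syntax: the `Edge` tuple, the node names, and the `derived_candidates` helper are hypothetical, and treating the `unused` role as meaningful is exactly the assumption the Roles slide questions.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    kind: str      # "used" or "wasGeneratedBy"
    process: str
    artifact: str
    role: str      # role annotation from the figure

edges = [
    Edge("used", "BakeCake", "Eggs(A,B,C)", "eggs"),
    Edge("used", "BakeCake", "Flour", "flour"),
    Edge("wasGeneratedBy", "BakeCake", "Cake", "cake"),
    Edge("wasGeneratedBy", "BakeCake", "Eggs(A)", "unused"),
]

def derived_candidates(edges, ignore_roles=frozenset()):
    """Pair each generated artifact with each artifact used by the same
    process, skipping outputs whose role is listed in ignore_roles."""
    pairs = []
    for out in edges:
        if out.kind != "wasGeneratedBy" or out.role in ignore_roles:
            continue
        for inp in edges:
            if inp.kind == "used" and inp.process == out.process:
                pairs.append((out.artifact, inp.artifact))
    return pairs

# Roles ignored: the leftover egg appears to depend on the flour.
naive = derived_candidates(edges)
# Roles interpreted: the "unused" output is excluded from derivation.
role_aware = derived_candidates(edges, ignore_roles={"unused"})
```

If roles are opaque labels, both results are equally valid; if they carry semantics, only `role_aware` is, which is why the model needs to say which reading applies.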
Accounts
Composite processes identified in OPM
  – Different granularity?
  – Different view (client vs. service)
  – Service/workflow composition using alternate accounts?
  – Should we specify composition more explicitly in edges, as edge types? Subclasses?
[Figure: accounts of a Baking process, with labels Customer A, Customer B, Baker, Observer, and Observers.]
Data
Collections do not seem to support the idea of granularity for data products
Alternate accounts are more suited to process granularity, less to data granularity
  – Process types for data de/compositions? Subclasses?
Annotations
Causality is not the only relationship between provenance entities
  – Relevant domain-specific relationships are needed to answer a scientist's query
Subclasses: a stronger form of annotations
  – Different?
  – Subclasses are part of the model
  – Annotations depend on the representation?
Extensibility mechanisms?
Representation/Serialization
OPM maps exactly to the Resource Description Framework (RDF), the W3C-recommended standard for representing metadata
  – An OPM graph is an RDF graph under a different name
Serializations: XML, RDF, CSV…
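To illustrate the claim that an OPM graph reads as an RDF graph, the sketch below stores three OPM-style edges as subject/predicate/object triples and writes them out in two of the listed forms. It is an illustration under assumptions: the namespace `http://example.org/opm#`, the node names, and both helper functions are hypothetical, and the N-Triples output is only the basic `<s> <p> <o> .` line shape.

```python
import csv
import io

NS = "http://example.org/opm#"  # hypothetical namespace, not an official OPM vocabulary

# Each OPM edge becomes one (subject, predicate, object) triple.
triples = [
    ("BakeCake", "used", "Flour"),
    ("Cake", "wasGeneratedBy", "BakeCake"),
    ("BakeCake", "wasControlledBy", "John"),
]

def to_ntriples(triples):
    """Render triples as N-Triples-style lines with full IRIs."""
    return "\n".join(f"<{NS}{s}> <{NS}{p}> <{NS}{o}> ." for s, p, o in triples)

def to_csv(triples):
    """Render the same triples as a three-column CSV table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])
    writer.writerows(triples)
    return buf.getvalue()

print(to_ntriples(triples))
print(to_csv(triples))
```

Both outputs carry the same graph; only the serialization differs.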
Time
The OPM approach to incorporating time in provenance, using a time interval to represent an instantaneous event, is not well defined
  – Query results will vary with the granularity of the values
  – Accuracy of timestamps affects inference
  – Logical timestamps?
Do we need a time range?
  – Long-running processes (provenance is past, not current)…
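The granularity concern can be made concrete with a small sketch. It assumes each instantaneous event is known only to lie within a closed interval whose width is the timestamp granularity; the `Interval` type, `at_granularity`, and `happened_before` are hypothetical names for illustration, not part of OPM.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    earliest: int  # earliest possible instant of the event
    latest: int    # latest possible instant of the event

def at_granularity(t, step):
    """Return the interval a clock with resolution `step` would report for instant t."""
    start = t - t % step
    return Interval(start, start + step - 1)

def happened_before(a, b):
    """Three-valued answer to 'did event a occur before event b?'."""
    if a.latest < b.earliest:
        return "yes"
    if a.earliest >= b.latest:
        return "no"
    return "maybe"

# Same two events, second 100 and second 110, recorded at two resolutions.
fine = happened_before(at_granularity(100, 1), at_granularity(110, 1))
coarse = happened_before(at_granularity(100, 60), at_granularity(110, 60))
```

At one-second resolution the ordering is certain; at one-minute resolution both events fall in the same interval and the same query can only answer "maybe".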
Agent
Loose form of control flow?
  – Workflow engine?
  – Command line invoking the workflow engine?
  – Researcher who starts the command line?
  – Previous component that triggers the next component?
  – Where do we have TriggeredBy and where do we have ControlledBy?
[Figure: three diagrams of a Service with Input data and Output data. The first adds a WF Engine; the second adds a Client and a WF document; the third adds a User. Question marks appear on several elements, including Client and WF Engine.]
Vagueness in Inferences
Edge count limits?
Weak and strong semantics
P1 used A1
  – P1 MUST have used A1
  – P1 MAY have used A1
P1 used A1; A2 wasGeneratedBy P1
  – A2 MUST have been derived from A1
  – A2 MAY have been derived from A1
Weak is the lowest common denominator
  – mayHaveBeenUsed <= mustHaveBeenUsed… subclass?
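One way to read the closing subclass question is sketched below: if the strong edge type is modeled as a subclass of the weak one, a query for weak edges returns every edge, while a query for strong edges returns only the certain ones. This is an illustrative assumption, not how OPM defines its edges; the class names follow the slide, while artifact `A3` and the `select` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MayHaveBeenUsed:
    """Weak semantics: the process may have used the artifact."""
    process: str
    artifact: str

@dataclass(frozen=True)
class MustHaveBeenUsed(MayHaveBeenUsed):
    """Strong semantics: every 'must' edge is also a 'may' edge."""

edges = [
    MustHaveBeenUsed("P1", "A1"),
    MayHaveBeenUsed("P1", "A3"),  # A3 is a made-up artifact for contrast
]

def select(edges, edge_type):
    """Return the edges that satisfy the given semantics."""
    return [e for e in edges if isinstance(e, edge_type)]

weak = select(edges, MayHaveBeenUsed)     # lowest common denominator: all edges
strong = select(edges, MustHaveBeenUsed)  # only edges asserted with certainty
```

Under this reading a tool that understands only the weak type can still consume every edge, which matches the lowest-common-denominator remark.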