Provenance Challenge Simon Miles, Mike Wilde, Ian Foster and Luc Moreau.


1 Provenance Challenge Simon Miles, Mike Wilde, Ian Foster and Luc Moreau

2 Provenance In the study of fine art, provenance refers to the documented history of an art object. If the provenance of data produced by computer systems could be determined as it can for some works of art, users would be able to better interpret and judge the quality of that data.

3 The Provenance of the Challenge
Back in May: IPAW’06 (International Provenance and Annotation Workshop), www.ipaw.info
Proceedings to appear in LNCS 4145

4 Standardisation discussion at IPAW’06
How can (workflow-based or other) systems inter-operate?
–Individual systems may be able to track provenance of data
–How can we ensure that we track provenance of data across systems?
–Would a standard be useful?
At the time, it was felt premature to standardise: we first needed to understand systems’ capabilities.

5 The Challenge Aims
The provenance challenge aims to establish an understanding of the capabilities of available provenance-related systems:
–The representations that systems use to document details of processes that have occurred
–The capabilities of each system in answering provenance-related queries
–What each system considers to be within the scope of the topic of provenance (regardless of whether the system can yet address all problems in that scope)
twiki.ipaw.info

6 The Challenge Process
Each participant in the challenge will have their own page on this TWiki, following the ChallengeTemplate, where they can inform the rest of the participants of their efforts in meeting the challenge:
–Representations of the workflow in their system
–Representations of provenance for the example workflow
–Representations of the results of the core (and other) queries
–Contributions to a matrix of queries vs systems, indicating for each query that: (1) the query can be answered by the system, (2) the system cannot answer the query now but considers it relevant, or (3) the query is not relevant to the project.
Optionally, participants may contribute the following:
–Additional queries that illustrate the scope of their system
–Extensions to the example workflow that best illustrate the unique aspects of their system
–Any categorisation of queries that the project considers to have practical value

7 (slide shows a figure of the example workflow)

8 The Queries
1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see the model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
5. Find all Atlas Graphic images output from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
7. A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
8. A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
9. A user has annotated some atlas graphics with a key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations on these files.
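At their core, queries 1 and 2 ask for the transitive closure of causal dependencies behind an artifact. A minimal sketch of that idea follows, assuming provenance has been recorded as a directed dependency graph; the graph, its node names, and the `depends_on` structure are illustrative assumptions, not any participating system's actual representation.

```python
# Sketch of Query 1: find everything that caused an artifact, by walking a
# provenance graph backwards from the artifact. All names here are
# hypothetical stand-ins for the challenge workflow's artifacts/processes.
from collections import deque

# Each key was derived from / used the listed items (artifact -> causes).
depends_on = {
    "AtlasXGraphic": ["convert_x"],
    "convert_x": ["AtlasXImage"],
    "AtlasXImage": ["slicer_x"],
    "slicer_x": ["AtlasImage"],
    "AtlasImage": ["softmean"],
    "softmean": ["WarpedImage1", "WarpedImage2"],
    "WarpedImage1": ["reslice_1"],
    "reslice_1": ["AnatomyImage1"],
    "WarpedImage2": ["reslice_2"],
    "reslice_2": ["AnatomyImage2"],
}

def provenance(artifact, stop=None):
    """Return all artifacts/processes that transitively led to `artifact`.

    If `stop` names a process, traversal does not continue past it,
    which corresponds to Query 2's "excluding everything prior to
    the averaging of images with softmean"."""
    seen = set()
    queue = deque(depends_on.get(artifact, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            if node != stop:
                queue.extend(depends_on.get(node, []))
    return seen

print(sorted(provenance("AtlasXGraphic")))            # full causal history
print(sorted(provenance("AtlasXGraphic", "softmean")))  # Query 2 variant
```

The same backward traversal, filtered by process name, arguments (query 4/6), or annotations (query 8/9), underlies most of the core queries; systems differ mainly in how the graph and annotations are stored and queried.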

9 17 Participating Teams
REDUX, Database Research Group, MSR
MINDSWAP, Semantic Web Research Group, University of Maryland, College Park
Karma, Computer Science Department, Indiana University
CESNET, GRID research group, CESNET z.s.p.o., Prague, Czech Republic
myGrid, University of Manchester
VisTrails, University of Utah
Gridprovenance, Cardiff University
ES3, University of California, Santa Barbara
UPenn, University of Pennsylvania, Database Group
RWS, UC Davis and SDSC, California
DAKS, Genome Center, UC Davis, California
PASS, Harvard
SDG, Pacific Northwest National Lab
NcsaD2k and NcsaCi, National Center for Supercomputing Applications
UChicago, University of Chicago Computation Institute
Southampton, University of Southampton, PASOA and Provenance projects
USC/ISI, University of Southern California / Information Sciences Institute

10 Schedule
Session 1: Wednesday 10.00-11.30, team presentations
Session 2: Wednesday 13.00-15.00, team presentations
Session 3: Wednesday 16.00-17.30
Session 4: Thursday 9.30-11.00, analysing commonalities and differences
Session 5: Thursday 11.30-13.00, what next?
Sessions 3-5 are open; contribute ideas on the twiki: http://twiki.ipaw.info/bin/view/Challenge/WorkshopAgenda

11 10.00-10.10: Introduction
10.10-10.20: PNL
10.20-10.30: UPenn, University of Pennsylvania, Database Group
10.30-10.40: UChicago
10.40-10.50: myGrid, University of Manchester
10.50-11.00: Kepler (SDSC)
11.00-11.10: Kepler (UC Davis)
11.10-11.20: VisTrails, University of Utah
13.00-13.10: REDUX, Database Research Group, MSR
13.10-13.20: CESNET, GRID research group, CESNET z.s.p.o., Prague, Czech Republic
13.20-13.30: Karma, Computer Science Department, Indiana University
13.30-13.40: MINDSWAP, Semantic Web Research Group, University of Maryland, College Park
13.40-13.50: PASS, Harvard
13.50-14.00: Southampton, PASOA/EU Provenance
14.00-14.10: Gridprovenance, Cardiff University
14.10-14.20: ISI
14.20-14.30: NCSA
14.30-14.40: ES3, University of California, Santa Barbara
