Provenance Challenge Simon Miles, Mike Wilde, Ian Foster and Luc Moreau.



Provenance

In the study of fine art, provenance refers to the documented history of an art object. If the provenance of data produced by computer systems could be determined as it can for some works of art, users would be better able to interpret and judge the quality of that data.

The Provenance of the Challenge

Back in May: IPAW'06 (International Provenance and Annotation Workshop). Proceedings to appear in LNCS 4145.

Standardisation discussion at IPAW'06

How can (workflow-based or other) systems inter-operate?
Individual systems may be able to track provenance of data.
How can we track provenance of data across systems?
Would a standard be useful?
At the time, it was felt to be premature to standardise; we first needed to understand systems' capabilities.

The Challenge Aims

The provenance challenge aims to establish an understanding of the capabilities of available provenance-related systems:
–The representations that systems use to document details of processes that have occurred
–The capabilities of each system in answering provenance-related queries
–What each system considers to be within the scope of the topic of provenance (regardless of whether the system can yet solve all problems in that scope)

twiki.ipaw.info

The Challenge Process

Each participant in the challenge will have their own page on this TWiki, following the ChallengeTemplate, where they can inform the other participants of their efforts in meeting the challenge:
–Representations of the workflow in their system
–Representations of provenance for the example workflow
–Representations of the result of the core (and other) queries
–Contributions to a matrix of queries vs systems, indicating for each query that (1) the query can be answered by the system, (2) the system cannot answer the query now but considers it relevant, or (3) the query is not relevant to the project

Optionally, the participants may like to contribute the following:
–Additional queries that illustrate the scope of their system
–Extensions to the example workflow to best illustrate the unique aspects of their system
–Any categorisation of queries that the project considers to have practical value
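The three-way classification used in the matrix of queries vs systems can be sketched as a small data structure. This is only an illustration: the system names `SystemA` and `SystemB`, the entries, and the helper `systems_answering` are hypothetical and do not reflect any team's actual results.

```python
from enum import Enum


class Status(Enum):
    """The three possible matrix entries described on the slide."""
    ANSWERED = 1      # (1) the query can be answered by the system
    RELEVANT = 2      # (2) cannot be answered now, but considered relevant
    NOT_RELEVANT = 3  # (3) the query is not relevant to the project


# Hypothetical entries for illustration only: system -> {query number -> status}.
MATRIX = {
    "SystemA": {1: Status.ANSWERED, 2: Status.ANSWERED, 3: Status.RELEVANT},
    "SystemB": {1: Status.ANSWERED, 2: Status.NOT_RELEVANT, 3: Status.ANSWERED},
}


def systems_answering(query: int) -> list[str]:
    """Return the systems that report being able to answer the given query."""
    return sorted(
        name for name, row in MATRIX.items()
        if row.get(query) is Status.ANSWERED
    )


if __name__ == "__main__":
    for q in (1, 2, 3):
        print(f"Query {q}: {systems_answering(q)}")
```

Keeping the status as an enumeration rather than free text makes it straightforward to compare teams query by query once every participant has filled in a row.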

The Queries

1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed etc.
2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see the model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
5. Find all Atlas Graphic images output from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
6. Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
7. A user has run the workflow twice, in the second instance replacing each convert procedure in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
8. A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
9. A user has annotated some atlas graphics with a key-value pair where the key is studyModality. Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
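Queries 1, 2 and 6 are all lineage questions over a record of which invocation produced which data item. The sketch below shows one possible way to answer them with a backwards traversal. It is not any participating team's representation: the record layout, the file names, the invocation ids and the intermediate procedure names `reslice` and `slicer` are assumptions made for this toy example.

```python
from collections import deque

# Toy provenance store (hypothetical layout): invocation id -> record.
INVOCATIONS = {
    "aw1": {"procedure": "align_warp", "params": {"-m": "12"},
            "inputs": ["anatomy1.img", "reference.img"], "outputs": ["warp1.warp"]},
    "aw2": {"procedure": "align_warp", "params": {"-m": "12"},
            "inputs": ["anatomy2.img", "reference.img"], "outputs": ["warp2.warp"]},
    "rs1": {"procedure": "reslice", "params": {},
            "inputs": ["warp1.warp"], "outputs": ["resliced1.img"]},
    "rs2": {"procedure": "reslice", "params": {},
            "inputs": ["warp2.warp"], "outputs": ["resliced2.img"]},
    "sm": {"procedure": "softmean", "params": {},
           "inputs": ["resliced1.img", "resliced2.img"], "outputs": ["atlas.img"]},
    "sl": {"procedure": "slicer", "params": {},
           "inputs": ["atlas.img"], "outputs": ["atlas-x.pgm"]},
    "cv": {"procedure": "convert", "params": {},
           "inputs": ["atlas-x.pgm"], "outputs": ["atlas-x.gif"]},
}

# Index: data item -> id of the invocation that produced it.
PRODUCER = {out: inv_id
            for inv_id, rec in INVOCATIONS.items()
            for out in rec["outputs"]}


def lineage(item, stop_at=None):
    """Return ids of all invocations that contributed to `item`.

    If `stop_at` names a procedure, invocations of that procedure are
    included but the traversal does not continue past them.
    """
    seen = set()
    queue = deque([item])
    while queue:
        inv_id = PRODUCER.get(queue.popleft())
        if inv_id is None or inv_id in seen:
            continue  # a workflow input, or an invocation already visited
        seen.add(inv_id)
        rec = INVOCATIONS[inv_id]
        if rec["procedure"] != stop_at:
            queue.extend(rec["inputs"])
    return seen


def softmean_outputs_after_align_warp(model="12"):
    """Query 6: softmean outputs preceded by align_warp with -m <model>."""
    results = []
    for rec in INVOCATIONS.values():
        if rec["procedure"] != "softmean":
            continue
        ancestors = set()
        for inp in rec["inputs"]:
            ancestors |= lineage(inp)
        if any(INVOCATIONS[a]["procedure"] == "align_warp"
               and INVOCATIONS[a]["params"].get("-m") == model
               for a in ancestors):
            results.extend(rec["outputs"])
    return results


if __name__ == "__main__":
    print("Query 1:", sorted(lineage("atlas-x.gif")))
    print("Query 2:", sorted(lineage("atlas-x.gif", stop_at="softmean")))
    print("Query 6:", softmean_outputs_after_align_warp())
```

Query 2 reuses the Query 1 traversal and simply stops expanding once it reaches softmean, which is one reading of "excluding everything prior to the averaging".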

17 Participating Teams

REDUX, Database Research Group, MSR
MINDSWAP, Semantic Web Research Group, University of Maryland, College Park
Karma, Computer Science Department, Indiana University
CESNET, GRID research group, CESNET z.s.p.o. Prague, Czech Republic
myGrid, University of Manchester
VisTrails, University of Utah
Gridprovenance, Cardiff University
ES3, University of California, Santa Barbara
UPenn, University of Pennsylvania, Database Group
RWS, UC Davis and SDSC, California
DAKS, Genome Center, UC Davis, California
PASS, Harvard
SDG, Pacific Northwest National Lab
NcsaD2k and NcsaCi, National Center for Supercomputing Applications
UChicago, University of Chicago Computation Institute
Southampton, University of Southampton, PASOA and Provenance projects
USC/ISI, University of Southern California/Information Sciences Institute

Schedule

Session 1: Wednesday, team presentations
Session 2: Wednesday, team presentations
Session 3: Wednesday
Session 4: Thursday, analysing commonalities and differences
Session 5: Thursday, what next?

Sessions 3-5 are open; contribute ideas on the TWiki.

Introduction
PNL
UPenn, University of Pennsylvania, Database Group
UChicago
myGrid, University of Manchester
Kepler (SDSC)
Kepler (UCDavis)
VisTrails, University of Utah
REDUX, Database Research Group, MSR
CESNET, GRID research group, CESNET z.s.p.o. Prague, Czech Republic
Karma, Computer Science Department, Indiana University
MINDSWAP, Semantic Web Research Group, University of Maryland, College Park
PASS, Harvard (slides)
Southampton, PASOA/EU Provenance
Gridprovenance, Cardiff University
ISI
NCSA
ES3, University of California, Santa Barbara