IPAW'08 – Salt Lake City, Utah, June 2008 Exploiting provenance to make sense of automated decisions in scientific workflows Paolo Missier, Suzanne Embury,

Presentation transcript:

Exploiting provenance to make sense of automated decisions in scientific workflows
Paolo Missier, Suzanne Embury, Richard Stapenhurst
Information Management Group, School of Computer Science, The University of Manchester, UK
IPAW'08 – Salt Lake City, Utah, June 2008

Outline
Setting: the problem of quality control in scientific workflows
Quality control is an automated decision process
– accept/reject data based on user-defined criteria
– part of the workflow => quality workflow
Role of workflow provenance in explaining automated decisions
– why was data element X accepted/rejected?

Scope of provenance analysis
Model-driven quality workflows:
– automatically generated from a specification
– generation makes for a predictable workflow structure
Services in quality workflows are semantically annotated
The provenance data model exploits the semantics:
– provenance queries leverage the ontology
– provenance elements are explained in ontology terms

Practical setting
Scientific workflows accelerate the rate at which results are produced
Quality control on the results becomes paramount
– automation / high throughput limit the options for systematic human inspection
– use of public resources (data, services) may introduce noise, e.g. dirty data
Risk of producing invalid results
but: quality metrics vary with data and application domain

Example: protein identification process
[Diagram labels: “wet lab” experiment; data output; protein identification algorithm; Protein Hitlist; protein function prediction; correct entry => true positive]
Evidence:
– mass coverage (MC) measures the amount of protein sequence matched
– hit ratio (HR) gives an indication of the signal-to-noise ratio in a mass spectrum
– ELDP reflects the completeness of the digestion that precedes the peptide mass fingerprinting
This evidence is independent of the algorithm / SW package
It is readily available and inexpensive to obtain

Quality process components
Evidence: mass coverage (MC), hit ratio (HR), ELDP
Quality assertion: PMF score = (HR x 100) + MC + (ELDP x 10)
Action rules: if (score < x) then reject
Process steps: collect evidence, compute assertions, evaluate conditions, execute actions
[Diagram labels: protein identification; Protein Hitlist; quality filtering; protein function prediction]
The Qurator hypothesis [VLDB06]: quality controls have a common process representation
– regardless of their specific data and application domain
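The score formula and the reject rule on this slide can be illustrated with a minimal sketch. The class and function names, the threshold, and the evidence values below are made up for illustration; only the formula and the "if (score < x) then reject" rule come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Evidence values for one protein hit (field names are illustrative)."""
    hit_ratio: float       # HR
    mass_coverage: float   # MC
    eldp: float            # ELDP


def pmf_score(e: Evidence) -> float:
    """Quality assertion from the slide: (HR x 100) + MC + (ELDP x 10)."""
    return e.hit_ratio * 100 + e.mass_coverage + e.eldp * 10


def classify(hits: dict[str, Evidence], threshold: float) -> dict[str, str]:
    """Apply the action rule 'if (score < x) then reject' to every hit."""
    return {
        hit_id: "reject" if pmf_score(e) < threshold else "accept"
        for hit_id, e in hits.items()
    }


# Made-up evidence values and threshold, for illustration only.
hits = {
    "protein_X": Evidence(hit_ratio=0.5, mass_coverage=30.0, eldp=4.0),
    "protein_Y": Evidence(hit_ratio=0.25, mass_coverage=5.0, eldp=1.0),
}
decisions = classify(hits, threshold=50.0)
```

Keeping the assertion (`pmf_score`) separate from the action rule (`classify`) mirrors the split between "compute assertions" and "evaluate conditions / execute actions" in the process steps.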

From quality processes to quality workflows
Approach in practice: users provide a declarative specification of an abstract quality process (a “Quality View”)
The abstract process is automatically translated into a quality workflow
– this makes arbitrary Taverna workflows “quality-aware”

Example: original proteomics workflow
[Figure: workflow diagram marking the quality flow embedding point]

Example: embedded quality workflow

Qurator provenance component
Specialised for quality workflows
– scope: workflow run
– data being quality assessed
– quality metrics applied to the data
– value of metric on the data
– evidence used to compute metrics
– quality rules based on metric values
– statistics

Semantics of quality processors
– upper ontology for Information Quality
– extensions to the proteomics domain
– services and data

Provenance model
Provenance elements are individuals of ontology classes
– OWL ontology => RDF provenance data
Static model (RDF graph)
– workflow graph structure, services
– auto-generated along with the quality workflow itself
Dynamic model (RDF graph)
– populated during workflow execution
– RDF resources can be elements of the static model
– data values are literals
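The relationship between the two graphs can be sketched roughly as follows. This is not the Qurator implementation: plain Python tuples stand in for RDF triples, and every prefix, class name, and property name (`wf:`, `q:`, `dyn:`, `record_run`) is a hypothetical placeholder.

```python
# Triples are (subject, predicate, object) tuples; all names are hypothetical.
RDF_TYPE = "rdf:type"

# Static model: workflow structure and services, known before any run.
static_model = {
    ("wf:scoreProcessor", RDF_TYPE, "q:QualityAssertion"),
    ("wf:filterProcessor", RDF_TYPE, "q:QualityAction"),
    ("wf:scoreProcessor", "wf:feeds", "wf:filterProcessor"),
}


def record_run(run_id, hit_id, score, outcome):
    """Return dynamic-model triples for one data element in one run."""
    binding = f"run:{run_id}/{hit_id}"
    return {
        (binding, RDF_TYPE, "q:ProteinHitEntry"),
        (binding, "dyn:inRun", f"run:{run_id}"),
        # Resource-valued objects point back into the static model.
        (binding, "dyn:assertedBy", "wf:scoreProcessor"),
        (binding, "dyn:decidedBy", "wf:filterProcessor"),
        # Data values are stored as literals (plain Python values here).
        (binding, "dyn:score", score),
        (binding, "dyn:outcome", outcome),
    }


dynamic_model = record_run("r1", "protein_X", 120.0, "accept")

# Objects of the dynamic model that are also subjects of the static model.
static_resources = {s for s, _, _ in static_model}
links_to_static = {o for _, _, o in dynamic_model if o in static_resources}
```

Here `links_to_static` collects the points where the run-time graph refers to elements generated at compile time, while the score and outcome remain literals.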

Static model (fragment)

Dynamic model (fragment)

Provenance service interface
Java SPARQL API (Jena ARQ)
– GUI shown earlier is an example
Queries are straightforward SPARQL
– 3-layer workflow pattern => no recursion
Examples
– all evidence for data elements of class ProteinHitEntry [for a given execution]: ?x rdf:type ProteinHitEntry
– all action outcomes [for a given execution]
– values for all quality metrics [for a given execution and data element]
– ...
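To show why such queries need no recursion, here is a small sketch of the first example as a fixed-length join of triple patterns. The talk's service uses SPARQL through Jena ARQ in Java; the plain-Python matcher below only imitates a basic graph pattern, and the predicate names (`inExecution`, `hasEvidence`) and data are assumptions made for illustration.

```python
# Hypothetical provenance triples; predicate and resource names are assumptions.
triples = [
    ("hit1", "rdf:type", "ProteinHitEntry"),
    ("hit1", "inExecution", "run1"),
    ("hit1", "hasEvidence", "HR=0.5"),
    ("hit1", "hasEvidence", "MC=30"),
    ("hit2", "rdf:type", "ProteinHitEntry"),
    ("hit2", "inExecution", "run2"),
    ("hit2", "hasEvidence", "HR=0.25"),
    ("spec1", "rdf:type", "Spectrum"),
    ("spec1", "inExecution", "run1"),
]


def match(pattern, data):
    """Yield variable bindings for one (s, p, o) pattern; '?name' is a variable."""
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(patterns, data):
    """Join several patterns, much as a SPARQL basic graph pattern does."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for sol in solutions:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(sol.get(t, t) for t in pattern)
            for b in match(bound, data):
                next_solutions.append({**sol, **b})
        solutions = next_solutions
    return solutions


# Roughly: SELECT ?x ?e WHERE { ?x rdf:type ProteinHitEntry .
#                               ?x inExecution run1 . ?x hasEvidence ?e }
rows = query(
    [("?x", "rdf:type", "ProteinHitEntry"),
     ("?x", "inExecution", "run1"),
     ("?x", "hasEvidence", "?e")],
    triples,
)
evidence = sorted((r["?x"], r["?e"]) for r in rows)
```

Because the quality workflow always has the same three layers, each query is a join over a known, small number of patterns, so no transitive traversal of the graph is required.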

Conclusions
An experiment in “semantic provenance”
– restricted to quality workflows
Semantic service annotations => high-level provenance query / presentation
Key enabler: workflow is the result of a compilation step
– regular pattern facilitates analysis / presentation
Speculative conclusion:
– workflows are targets, not sources...
– model-driven generation of workflows has benefits and will happen more and more