Quality views: capturing and exploiting the user perspective on data quality

Presentation transcript:

Quality views: capturing and exploiting the user perspective on data quality Paolo Missier, Suzanne Embury, Mark Greenwood School of Computer Science University of Manchester, UK Alun Preece, Binling Jin Department of Computing Science University of Aberdeen, UK

Combining the strengths of UMIST and The Victoria University of Manchester
Integration of public data (in biology)
GenBank, UniProt, EnsEMBL, Entrez, dbSNP
Large volumes of data in many public repositories
Increasingly creative uses for this data
Their quality is largely unknown

Quality of e-science data
Defining quality can be challenging:
In-silico experiments express cutting-edge research
–Experimental data liable to change rapidly
–Definitions of quality are themselves experimental
Scientists' quality requirements are often just a hunch
–Quality tests missing or based on experimental heuristics
–Often implicit and embedded in the experiment → not reusable
A data consumer's view on quality: criteria for data acceptability within a specific data processing context

Example: protein identification
[Diagram: "Wet lab" experiment → data output → protein identification algorithm (with reference databases) → protein hitlist → quality filtering → protein function prediction]
Quality filtering: remove likely false positives → improve prediction accuracy
Support evidence: provenance metadata
Goal: to explicitly define and automatically add the additional filtering step in a principled way

Our goals
Offer e-scientists a principled way to:
–Discover quality definitions for specific data domains
–Make them explicit using a formal model
–Implement them in their data processing environment
–Test them on their data
… in an incremental refinement cycle
Benefits:
–Automated processing
–Reusability
–"Plug-in" quality components

Approach
Research hypothesis: adding quality to data can be made cost-effective
–By separating out generic quality processing from domain-specific definitions
Qurator architectural framework:
–Define abstract quality views on the data
–Map quality view to an executable process
–Execute quality views (runtime environment, data-specific quality services)

Abstract quality view model
[Diagram labels: Data; Quality metadata: evidence e1, e2, e3 (data annotation, e.g. Coverage, PeptidesCount); Assertions: Classification 1 over class space 1 (C11, C12, …), Classification 2 over class space 2 (C21, C22, …); Conditions: regions specification; Actions on regions]
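One way to read the model is as a chain: evidence annotates each data item, assertions place the item in a class of a class space, conditions select a region of that space, and actions apply to the region. The following is a minimal Python sketch of that reading, not Qurator's actual data model. The names `DataItem`, `assert_class` and `region`, and the cut-off value 20, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A data item with its quality metadata (hypothetical structure)."""
    item_id: str
    evidence: dict                                # evidence name -> value
    classes: dict = field(default_factory=dict)   # classification -> class label


def assert_class(item, classification, evidence_name, cutoffs, default):
    """Assertion: place the item in one class of a class space.

    cutoffs is a list of (lower_bound, label) pairs, highest bound first.
    The bounds used below are invented for this sketch.
    """
    value = item.evidence[evidence_name]
    label = next((lab for bound, lab in cutoffs if value >= bound), default)
    item.classes[classification] = label
    return label


def region(items, condition):
    """Condition: select the items that fall in a region of the class space."""
    return [item for item in items if condition(item)]


items = [
    DataItem("p1", {"Coverage": 30, "PeptidesCount": 9}),
    DataItem("p2", {"Coverage": 5, "PeptidesCount": 1}),
]
for item in items:
    assert_class(item, "Classification1", "Coverage", [(20, "C11")], "C12")

# Action on the region: here, simply keep the accepted items.
accepted = region(items, lambda it: it.classes["Classification1"] == "C11")
print([it.item_id for it in accepted])
```

Keeping assertions and conditions as separate functions mirrors the separation in the diagram: the same classification can feed several different region definitions.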

Semantic model for quality concepts
Quality "upper ontology" (OWL)
Quality evidence types
Evidence meta-data model (RDF)
Evidence annotations are class instances

Quality hypotheses discovery and testing
[Diagram labels: abstract quality view; compilation; execution on test data; performance assessment; targeted compilation; target-specific quality component; deployment; quality-enhanced user environment]
Multiple target environments: workflow, query processor

Generic quality process pattern
Collect evidence
- Fetch persistent annotations
- Compute on-the-fly annotations
<variables>
<var variableName="Coverage" evidence="q:Coverage"/>
<var variableName="PeptidesCount" evidence="q:PeptidesCount"/>
</variables>
Compute assertions (classifier)
<QualityAssertion serviceName="PIScoreClassifier" serviceType="q:PIScoreClassifier" tagSemType="q:PIScoreClassification" tagName="ScoreClass"
Evaluate conditions
ScoreClass in {"q:high", "q:mid"} and Coverage > 12
Execute actions
[Diagram label: persistent evidence]
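The four steps of the pattern can be sketched as a small Python pipeline. This is an illustrative reading, not the Qurator implementation. The condition is the one shown on the slide, while the function names, the `q:low` class and the classifier cut-offs on `PeptidesCount` are assumptions.

```python
def collect_evidence(item_id, persistent, on_the_fly):
    """Step 1: fetch stored annotations, then compute any that are missing."""
    evidence = dict(persistent.get(item_id, {}))
    for name, compute in on_the_fly.items():
        if name not in evidence:
            evidence[name] = compute(item_id)
    return evidence


def score_class(evidence):
    """Step 2: stand-in for the PIScoreClassifier assertion (invented cut-offs)."""
    count = evidence["PeptidesCount"]
    if count >= 5:
        return "q:high"
    if count >= 2:
        return "q:mid"
    return "q:low"  # hypothetical third class, not named in the slides


def condition(variables):
    """Step 3: the condition from the slide."""
    return variables["ScoreClass"] in {"q:high", "q:mid"} and variables["Coverage"] > 12


def run_quality_process(item_ids, persistent, on_the_fly):
    """Run all four steps; the action (step 4) is to keep accepted items."""
    accepted = []
    for item_id in item_ids:
        variables = collect_evidence(item_id, persistent, on_the_fly)
        variables["ScoreClass"] = score_class(variables)
        if condition(variables):
            accepted.append(item_id)
    return accepted


persistent = {
    "P1": {"Coverage": 25, "PeptidesCount": 6},
    "P2": {"Coverage": 8, "PeptidesCount": 3},
    "P3": {"Coverage": 40},  # PeptidesCount must be computed on the fly
}
on_the_fly = {"PeptidesCount": lambda item_id: 3}

result = run_quality_process(["P1", "P2", "P3"], persistent, on_the_fly)
print(result)
```

In this run P2 is rejected because its Coverage is not above 12, even though its score class is acceptable.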

Bindings: assertion → service
Service class → Web service endpoint
PIScoreClassifier → [endpoint URL not shown]
All services implement the same WSDL interface
–Makes concrete assertion functions homogeneous
–Facilitates compilation
Uniform input / output messages
–Input: D = {(d_i, evidence(d_i))}
–Output: {class(d_i)} or {score(d_i)}
[Diagram labels: PIScoreClassifierSvc and PI_Top_k_svc behind a common WSDL interface; (service registry)]
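The binding idea can be illustrated with an in-process analogue: every service exposes the same method with the same input and output shape, and a registry resolves a service class to a concrete implementation. The sketch below is hypothetical throughout. A Python abstract base class stands in for the common WSDL interface, a dictionary stands in for the service registry, and the service type `q:PI_Top_k` and both scoring rules are invented.

```python
from abc import ABC, abstractmethod


class QualityAssertionService(ABC):
    """Stand-in for the common interface shared by all assertion services."""

    @abstractmethod
    def annotate(self, data):
        """Take [(item_id, evidence_dict), ...] and return {item_id: tag_value}."""


class PIScoreClassifierSvc(QualityAssertionService):
    def annotate(self, data):
        # Invented rule: the output is a class label per data item.
        return {i: ("q:high" if ev["Coverage"] > 12 else "q:low") for i, ev in data}


class PITopKSvc(QualityAssertionService):
    def annotate(self, data):
        # Invented rule: the output is a numeric score per data item.
        return {i: float(ev["Coverage"]) for i, ev in data}


# Registry: service class -> implementation (a real registry would hold endpoints).
REGISTRY = {
    "q:PIScoreClassifier": PIScoreClassifierSvc(),
    "q:PI_Top_k": PITopKSvc(),
}


def resolve(service_type):
    """Look up the service bound to a service class."""
    return REGISTRY[service_type]


D = [("P1", {"Coverage": 25}), ("P2", {"Coverage": 8})]
classes = resolve("q:PIScoreClassifier").annotate(D)
scores = resolve("q:PI_Top_k").annotate(D)
print(classes, scores)
```

Because both services accept and return the same message shape, calling code does not need to know which one it has been bound to.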

Execution model for Quality views
Binding → compilation → executable component
–Sub-flow of an existing workflow
–Query processing interceptor
[Diagram labels: host workflow (D → D'); abstract quality view; QV compiler; embedded quality workflow; quality view on D'; Qurator quality framework; services registry; services implementation]
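As a rough illustration of compiling a quality view and embedding it as a sub-flow, the Python sketch below models a workflow as a list of functions and splices a compiled quality step in at an embedding point. It is an analogy only. The real compiler targets workflow and query environments, and all names, data and the acceptance rule here are assumptions.

```python
def compile_quality_view(assertions, condition):
    """Turn an abstract view (assertions + condition) into an executable step."""
    def quality_step(items):
        kept = []
        for item in items:
            variables = dict(item["evidence"])
            for tag, assertion in assertions.items():
                variables[tag] = assertion(item["evidence"])
            if condition(variables):
                kept.append(item)
        return kept
    return quality_step


def embed(host_steps, quality_step, position):
    """Return a new workflow with the quality step inserted at the embedding point."""
    return host_steps[:position] + [quality_step] + host_steps[position:]


def run(steps, data):
    """Execute the workflow steps in order, passing each output to the next step."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical host workflow: identification produces a hitlist, prediction consumes it.
def identify(_):
    return [
        {"id": "P1", "evidence": {"Coverage": 25}},
        {"id": "P2", "evidence": {"Coverage": 8}},
    ]


def predict(items):
    return [item["id"] for item in items]


host = [identify, predict]
view = compile_quality_view(
    {"ScoreClass": lambda ev: "q:high" if ev["Coverage"] > 12 else "q:low"},
    lambda v: v["ScoreClass"] == "q:high",
)

plain = run(host, None)
filtered = run(embed(host, view, 1), None)
print(plain, filtered)
```

The host workflow is left unmodified; `embed` returns a new step list, so the original and the quality-enhanced versions can be run side by side.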

Example: original proteomics workflow
Taverna (*): workflow language and enactment engine for e-science applications
(*) part of the myGrid project, University of Manchester - taverna.sourceforge.net
[Figure: workflow diagram with the quality flow embedding point marked]

Example: embedded quality workflow

Interactive conditions / actions

Quality views for queries
Actions: filtering, dump to DB / file

Qurator architecture

Summary
For complex data types, there is often no single "correct" and agreed-upon definition of data quality
Qurator provides an environment for fast prototyping of quality hypotheses
–Based on the notion of "evidence" supporting a quality hypothesis
–With support for an incremental learning cycle
Quality views offer an abstract model for making data processing environments quality-aware
–To be compiled into executable components and embedded
–Qurator provides an invocation framework for Quality Views
More info and papers:
Live demos (informal) available