Managing Information Quality in e-Science using Semantic Web technology (Alun Preece, Binling Jin, Edoardo Pignotti, Department of Computing Science, University of Aberdeen)

Presentation transcript:

Managing Information Quality in e-Science using Semantic Web technology
Alun Preece, Binling Jin, Edoardo Pignotti (Department of Computing Science, University of Aberdeen)
Paolo Missier, Suzanne Embury, Mark Greenwood (School of Computer Science, University of Manchester)
David Stead, Al Brown (Molecular and Cell Biology, University of Aberdeen)
Describing the Quality of Curated e-Science Information Resources

Information and quality in e-science
- Scientists are required to place their data in the public domain (public BioDBs).
- Scientists use other scientists' experimental results as part of their own work, in both lab experiments and in silico (e.g. workflow-based) experiments.
- How can I decide whether I can trust this data?
  - Variations in the quality of the data
  - No control over the quality of public data
  - Quality is difficult to measure and assess: no standards

A concrete scenario
Qualitative proteomics: identification of the proteins in a cell sample.
- Wet lab (step 1 ... step n): produces the candidate data for matching (peptide peak lists).
- Information service ("dry lab"): a match algorithm compares the peak lists against reference DBs (MSDB, NCBI, SwissProt/UniProt) and returns a hit list: {ID, Hit Ratio, Mass Coverage, ...}.
- False negatives: incompleteness of the reference DBs, pessimistic matching.
- False positives: optimistic matching.
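As a minimal sketch (field names and example values here are illustrative assumptions, not taken from the slides), a hit-list entry can be thought of as a protein identifier together with the indicator values that the quality model reasons over later:

```python
# Illustrative sketch of a hit-list entry; names and values are assumptions.
from dataclasses import dataclass

@dataclass
class ProteinHit:
    protein_id: str       # accession in the reference DB (e.g. SwissProt/UniProt)
    hit_ratio: float      # "Hit Ratio" indicator reported by the match algorithm
    mass_coverage: float  # "Mass Coverage" indicator
    eldp: float           # "ELDP" indicator

hit_list = [
    ProteinHit("P02768", hit_ratio=0.82, mass_coverage=0.37, eldp=5.0),
    ProteinHit("Q9Y6K9", hit_ratio=0.41, mass_coverage=0.12, eldp=1.0),
]
```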

Quality is personal
- Scientists tend to express their quality requirements for data by giving acceptability criteria.
- These criteria are personal and vary with the expected use of the data.
- "What is the right trade-off between false positives and false negatives?"

Requirements for IQ ontology
1. Establish a common vocabulary
   - Let scientists express quality concepts and criteria in a controlled way, within homogeneous scientific communities
   - Enable navigation and discovery of existing IQ concepts
2. Sharing and reuse: let users contribute to the ontology while ensuring consistency
   - Achieve cost reduction
3. Make IQ computable in practice
   - Automatically apply acceptability criteria to the data

Quality Indicators
- Quality indicators are measurable quantities that can be used to define acceptability criteria: "Hit Ratio", "Mass Coverage", "ELDP".
- They are provided by the matching algorithm in the information service ("dry lab") as part of the hit list: {proteinID, Hit Ratio, Mass Coverage, ...}.
- There is an experimentally established correlation between these indicators and the probability of a mismatch.

Data acceptability criteria
- Indicators are used as indirect "clues" to assess quality.
- Quality Assertions (QAs) formally capture these clues as functions of the indicators, i.e. as data classification or ranking functions. Example: PIClassifier, defined as f(proteinID, Hit Ratio, Mass Coverage, ELDP) → {(proteinID, rank)}, provides a custom ranking of the match results.
- Formalized acceptability criteria are conditions on QAs, e.g. accept(proteinID) if PIClassifier(proteinID, ...) > X OR ... (see the sketch below).
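The following is a minimal sketch of such a quality assertion, assuming hit-list entries shaped like the ones above; the weights, score formula and top-k acceptance rule are illustrative assumptions rather than the project's actual PIClassifier (the slide states the criterion as a threshold on the classifier output).

```python
# Sketch of a PIClassifier-style quality assertion; weights and the acceptance
# rule are illustrative assumptions, not the project's actual classifier.

def pi_classifier(hits, w_hit_ratio=1.0, w_mass_coverage=2.0, w_eldp=0.5):
    """Rank protein hits by a weighted score over their quality indicators."""
    scored = [
        (h["protein_id"],
         w_hit_ratio * h["hit_ratio"]
         + w_mass_coverage * h["mass_coverage"]
         + w_eldp * h["eldp"])
        for h in hits
    ]
    ordered = sorted(scored, key=lambda s: s[1], reverse=True)  # best score first
    return {pid: rank for rank, (pid, _) in enumerate(ordered, start=1)}

def accept(protein_id, ranking, top_k=3):
    """Acceptability criterion: keep a hit only if it ranks in the top k."""
    return ranking.get(protein_id, float("inf")) <= top_k

hits = [
    {"protein_id": "P02768", "hit_ratio": 0.82, "mass_coverage": 0.37, "eldp": 5.0},
    {"protein_id": "Q9Y6K9", "hit_ratio": 0.41, "mass_coverage": 0.12, "eldp": 1.0},
]
ranking = pi_classifier(hits)
print(accept("P02768", ranking))  # True: this hit ranks within the top 3
```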

IQ ontology backbone
- Properties:
  assertion-based-on-evidence: QualityAssertion → QualityEvidence
  is-evidence-for: QualityEvidence → DataEntity
- Class restrictions:
  MassCoverage ⊑ ∃ is-evidence-for . ImprintHitEntry
  PIScoreClassifier ⊑ ∃ assertion-based-on-evidence . HitScore
  PIScoreClassifier ⊑ ∃ assertion-based-on-evidence . MassCoverage
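A minimal sketch of how these backbone axioms could be written down in OWL using owlready2 (Python); the ontology IRI is a placeholder and this encoding is an assumption, not the project's published ontology file.

```python
# Sketch of the backbone axioms in owlready2; IRI and encoding are assumptions.
from owlready2 import get_ontology, Thing, ObjectProperty

onto = get_ontology("http://example.org/iq-backbone.owl")  # hypothetical IRI

with onto:
    class DataEntity(Thing): pass
    class QualityEvidence(Thing): pass
    class QualityAssertion(Thing): pass
    class ImprintHitEntry(DataEntity): pass
    class HitScore(QualityEvidence): pass
    class MassCoverage(QualityEvidence): pass
    class PIScoreClassifier(QualityAssertion): pass

    class is_evidence_for(ObjectProperty):
        domain = [QualityEvidence]   # is-evidence-for: QualityEvidence -> DataEntity
        range = [DataEntity]

    class assertion_based_on_evidence(ObjectProperty):
        domain = [QualityAssertion]  # assertion-based-on-evidence: QualityAssertion -> QualityEvidence
        range = [QualityEvidence]

    # MassCoverage ⊑ ∃ is-evidence-for . ImprintHitEntry
    MassCoverage.is_a.append(is_evidence_for.some(ImprintHitEntry))
    # PIScoreClassifier ⊑ ∃ assertion-based-on-evidence . HitScore (and MassCoverage)
    PIScoreClassifier.is_a.append(assertion_based_on_evidence.some(HitScore))
    PIScoreClassifier.is_a.append(assertion_based_on_evidence.some(MassCoverage))
```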

Quality properties
- Users may add to a collection of generic quality properties.
- Generic quality properties (part of the backbone): Accuracy, Currency, Consistency, Completeness, Conformity, Timeliness, Conciseness.
- User-defined quality property: PI-acceptability?
- How do we ensure consistent specialization?

Specializations of base ontology concepts
- Abstract assertion (informal): "a Quality Property is based upon one or more Quality Indicators for a Data Entity."
- Concrete assertion (informal): "the property Accuracy of Protein Identification is based upon the Hit Ratio indicator for Protein Hit data."
- In the proteomics specialization, Protein Hit specializes Data Entity, Hit Ratio specializes Quality Indicator, and Accuracy of Protein Identification specializes the Accuracy quality property.
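Continuing the owlready2 sketch above (class and property names mirror the slide but are otherwise assumptions, not the released ontology), the concrete proteomics concepts specialize the abstract backbone along these lines:

```python
# Sketch of the proteomics specializations; names are assumptions from the slide.
from owlready2 import get_ontology, Thing, ObjectProperty

onto = get_ontology("http://example.org/iq-proteomics.owl")  # hypothetical IRI

with onto:
    # Abstract backbone concepts
    class DataEntity(Thing): pass
    class QualityIndicator(Thing): pass
    class QualityProperty(Thing): pass
    class Accuracy(QualityProperty): pass

    class based_on_indicator(ObjectProperty):
        domain = [QualityProperty]
        range = [QualityIndicator]

    # Concrete proteomics specializations
    class ProteinHit(DataEntity): pass
    class HitRatio(QualityIndicator): pass
    class AccuracyOfProteinIdentification(Accuracy): pass

    # "Accuracy of Protein Identification is based upon the Hit Ratio indicator"
    AccuracyOfProteinIdentification.is_a.append(based_on_indicator.some(HitRatio))
```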

Maintaining consistency by reasoning
- Axiomatic definition for Accuracy:
  ∃ QtyProperty-from-QtyAssertion . (∃ QA-based-on-evidence . ConfidenceEvidence)
- (Diagram: nodes PI-acceptability, PI-TopK, PMF-Match Ranking, Mass Coverage, Hit Ratio, PIMatch Confidence Characterization and Accuracy, linked by QtyProperty-from-QtyAssertion, based-on-evidence, based-on, output-of, has-quality-characterization and is-a relations.)
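A minimal sketch of this axiom in owlready2, under assumed names and IRI; stating the definition as an equivalence (so the reasoner can classify user-defined properties such as PI-acceptability under Accuracy) is also an assumption made for the example.

```python
# Sketch of the Accuracy definition for DL classification; names, IRI and the
# use of an equivalence axiom are assumptions for illustration.
from owlready2 import get_ontology, Thing, ObjectProperty, sync_reasoner

onto = get_ontology("http://example.org/iq-accuracy.owl")  # hypothetical IRI

with onto:
    class ConfidenceEvidence(Thing): pass
    class QtyProperty_from_QtyAssertion(ObjectProperty): pass
    class QA_based_on_evidence(ObjectProperty): pass

    # Accuracy ≡ ∃ QtyProperty-from-QtyAssertion . (∃ QA-based-on-evidence . ConfidenceEvidence)
    class Accuracy(Thing):
        equivalent_to = [
            QtyProperty_from_QtyAssertion.some(
                QA_based_on_evidence.some(ConfidenceEvidence)
            )
        ]

# Running a DL reasoner (HermiT is bundled with owlready2; a Java runtime is
# needed) reclassifies classes that satisfy the definition as subclasses of
# Accuracy, keeping user extensions consistent with the backbone.
sync_reasoner()
```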

Computing quality in practice
- Goal: make the quality assertions defined in the ontology computable in practice.
- Annotation model: representation of indicator values as semantic annotations
  - Model: RDF schema
  - Annotation instances: RDF metadata
- Binding model: representation of the mappings
  - Data ontology classes → data resources
  - Function ontology classes → service resources

Data resource annotations
- Resource = data items at various levels of granularity
- Data item → indicator values
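A minimal sketch of such an annotation using rdflib (Python); the iq: vocabulary, property names and data-item URI are illustrative assumptions, not the project's actual RDF schema.

```python
# Sketch of quality annotations as RDF metadata; vocabulary and URIs are assumptions.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

IQ = Namespace("http://example.org/iq#")              # hypothetical annotation schema
DATA = Namespace("http://example.org/experiments/")   # hypothetical data resources

g = Graph()
g.bind("iq", IQ)

hit = DATA["run42/protein-hit/P02768"]                # one data item (a protein hit)
g.add((hit, RDF.type, IQ.ProteinHit))
g.add((hit, IQ.hitRatio, Literal(0.82, datatype=XSD.double)))
g.add((hit, IQ.massCoverage, Literal(0.37, datatype=XSD.double)))
g.add((hit, IQ.eldp, Literal(5, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```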

Data resource bindings
- Data class → data resource
- Must account for different granularities and data types

Service resource bindings
- Function class → (Web) service implementation
  - E.g. annotation functions, QA functions
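A minimal sketch of both kinds of binding as RDF, under an assumed iq: vocabulary and made-up resource URIs (the slides do not show the actual binding schema).

```python
# Sketch of data and service bindings as RDF; vocabulary and URIs are assumptions.
from rdflib import Graph, Namespace, URIRef

IQ = Namespace("http://example.org/iq#")  # hypothetical binding schema

g = Graph()
g.bind("iq", IQ)

# Data binding: the ProteinHit data class is backed by a concrete result file.
g.add((IQ.ProteinHit, IQ.boundToDataResource,
       URIRef("http://example.org/experiments/run42/hits.xml")))

# Service binding: the PIClassifier quality-assertion function is implemented
# by a web service endpoint.
g.add((IQ.PIClassifier, IQ.boundToService,
       URIRef("http://example.org/services/pi-classifier")))

print(g.serialize(format="turtle"))
```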

The complete quality model

IQ Service Example

Summary
- An extensible OWL DL ontology for Information Quality, with consistency maintained using DL reasoning.
- Used by e-scientists to share and reuse quality indicators and metrics, and formal criteria for data acceptability.
- Annotation model: a generic schema for associating quality metadata with data resources.
- Binding model: a generic schema for mapping ontology concepts to (data, service) resources.
- The model has been tested on data from proteomics experiments.