2014-May-07

Outline
▫ What is the problem?
▫ What have others done?
▫ What is our solution?
▫ Does it work?

What is the problem?
Linked Open Data (LOD):
▫ Realizing the Semantic Web by interlinking existing but dispersed data
Main components of LOD:
▫ URIs to identify things
▫ RDF to describe data
▫ HTTP to access data

What is the problem?
▫ Datasets: 295
▫ Triples: over 30,000,000,000 (30 B)
▫ Links: over 500,000,000 (500 M)

What is the problem?
Inclusion criteria for publishing and interlinking datasets into the LOD cloud:
▫ Resolvable http/https URIs
▫ Presented in one of the standard formats of the Semantic Web (RDF, RDFa, RDF/XML, Turtle, N-Triples)
▫ Contains at least 1000 triples
▫ Connected via at least 50 RDF links to the existing datasets of LOD
▫ Accessible via RDF crawling, RDF dump, or SPARQL endpoint
Is the dataset ready to publish?
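The criteria above can be turned into a simple pre-publication checklist. The sketch below is only an illustration: the `DatasetSummary` record and its field names are hypothetical, and the URI check looks at the scheme only, whereas real resolvability would need an HTTP request.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Thresholds and allowed values copied from the inclusion criteria above.
MIN_TRIPLES = 1000
MIN_RDF_LINKS = 50
STANDARD_FORMATS = {"RDF", "RDFa", "RDF/XML", "Turtle", "N-Triples"}
ACCESS_METHODS = {"RDF crawling", "RDF dump", "SPARQL endpoint"}


@dataclass
class DatasetSummary:
    """Hypothetical summary of a dataset; field names are illustrative."""
    sample_uris: list
    data_format: str
    triple_count: int
    rdf_link_count: int
    access_methods: set


def has_http_scheme(uri):
    """Check the URI scheme only; this does not dereference the URI."""
    parsed = urlparse(uri)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def unmet_criteria(ds):
    """Return the criteria the dataset fails (an empty list means all pass)."""
    problems = []
    if not ds.sample_uris or not all(has_http_scheme(u) for u in ds.sample_uris):
        problems.append("http/https URIs")
    if ds.data_format not in STANDARD_FORMATS:
        problems.append("standard format")
    if ds.triple_count < MIN_TRIPLES:
        problems.append("at least 1000 triples")
    if ds.rdf_link_count < MIN_RDF_LINKS:
        problems.append("at least 50 RDF links")
    if not (ds.access_methods & ACCESS_METHODS):
        problems.append("access method")
    return problems
```

Passing all five checks says nothing about the quality of the data itself, which is the gap the rest of the talk addresses.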

What is the problem?
Idea of the LOD: Publishing first, improving later
Results in: quality problems in the published datasets
Missing link:

What have others done?
Data quality in the context of LOD:
▫ Validators: general validators; parsing and syntax; accessibility / dereferenceability
▫ Quality assessment of published data: classifying quality problems of LOD; using metadata for quality assessment; filtering poor-quality data (WIQA); semantic annotation using ontologies

What have others done?
Limitations of related work:
▫ Syntax validation, not quality evaluation
▫ Not scalable
▫ Not fully automated
▫ Evaluation after publishing

What is our solution?
Proposing a set of metrics for inherent quality assessment of datasets before interlinking to the LOD cloud

Selecting Inherent Quality Dimensions
▫ Studying data quality models
▫ Defining inherent quality of LOD
▫ Selecting the basic model (ISO-25012)
▫ Mapping quality dimensions of ISO to LOD

Selecting Inherent Quality Dimensions
Inherent Quality of LOD:
▫ Interlinking
▫ Completeness
▫ Semantic Accuracy
▫ Syntax Accuracy
▫ Uniqueness
▫ Consistency

Proposing Metrics
▫ Defining metrics using GQM
▫ Implementing an automated tool
▫ Formal definition
Example:
Goal: Assessment of the consistency of a dataset in the context of LOD
Question: What is the degree of conflict in the context of data value?
Metric: The number of functional properties with inconsistent values
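To make the example metric concrete, here is a minimal sketch of how "the number of functional properties with inconsistent values" could be counted. The slides do not give the formal definition, so this assumes that a functional property is inconsistent when some subject carries more than one distinct value for it. Triples are plain `(subject, property, object)` tuples and the `ex:` names are made up.

```python
from collections import defaultdict


def inconsistent_functional_properties(triples, functional_properties):
    """Count functional properties that have conflicting values.

    Assumption (not from the slides): a functional property conflicts when
    any single subject has two or more distinct objects for it.
    """
    values = defaultdict(set)  # (subject, property) -> distinct objects seen
    for subject, prop, obj in triples:
        if prop in functional_properties:
            values[(subject, prop)].add(obj)
    # Collect each conflicting property once, however many subjects conflict.
    conflicting = {prop for (_, prop), objs in values.items() if len(objs) > 1}
    return len(conflicting)


# Illustrative data only.
example_triples = [
    ("ex:alice", "ex:birthYear", "1980"),
    ("ex:alice", "ex:birthYear", "1981"),  # conflicts with the line above
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),  # fine: ex:knows is not functional
    ("ex:bob", "ex:birthYear", "1975"),
]
```

With `{"ex:birthYear"}` declared functional, the example data yields a count of 1.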

3. Developing LODQM
LODQM: Linked Open Data Quality Model
▫ 6 quality dimensions
▫ 32 metrics

Theoretical Validation
▫ Using a theoretical measurement framework
▫ Identifying properties of desirable metrics
▫ Validating metrics

Metric Type | Number of metrics | Null-Value | Non-Negativity | Symmetry | Monotonicity | Disjoint Module Additivity | Merging | Cohesive Modules
Complexity | 29 | √ | √ | √ | √ | n/a | _ | _
Cohesion | 2 | √ | √ | _ | √ | _ | _ | √
Coupling | 1 | √ | √ | _ | √ | n/a | √ | _
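Some of the properties in the table can be spot-checked in code. The sketch below is an assumption-laden illustration, not the validation used in the talk: it treats a dataset as a set of triples, uses a toy metric (`distinct_property_count`, hypothetical), and reads monotonicity simply as "adding triples never lowers the value". Passing on samples is only evidence; theoretical validation requires a proof.

```python
def distinct_property_count(triples):
    """Toy metric (hypothetical): number of distinct properties in a dataset."""
    return len({prop for _, prop, _ in triples})


def satisfies_null_value(metric):
    """Null value: an empty dataset should score zero."""
    return metric(set()) == 0


def satisfies_non_negativity(metric, samples):
    """Non-negativity: the metric is never negative on the sample datasets."""
    return all(metric(sample) >= 0 for sample in samples)


def satisfies_monotonicity(metric, samples):
    """Simplified monotonicity: a dataset never scores higher than its union
    with another sample dataset."""
    return all(metric(a) <= metric(a | b) for a in samples for b in samples)


# Illustrative sample datasets, each a set of (subject, property, object).
sample_datasets = [
    set(),
    {("ex:a", "ex:p", "ex:b")},
    {("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:c")},
]
```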

Empirical Evaluation
▫ Selecting several real datasets from LOD
▫ Calculation of the metrics values for datasets
▫ Metrics interdependency study
▫ Manipulating the quality of the datasets using heuristics
▫ Comparing the trends of metrics over two observations
▫ Collecting experts' subjective perception on quality dimensions
▫ Correlation study between metrics and quality dimensions

5. Empirical Evaluation
√ Selecting several real datasets from LOD

Datasets (columns: No. of triples, No. of instances, No. of classes, No. of properties):
FAO Water Areas 10,
Water Economic Zones 29,1931,
Large Marine Ecosystems 12,
Geopolitical Entities 22,
ISSCAAP Species Classification 398,16625,
Species Taxonomic Classification 319,49011,
Commodities 56,4202,
Vessels 4,

5. Empirical Evaluation
Metrics interdependency study
Result: Three pairs of metrics are correlated:
▫ {IFP, Im_DT}
▫ {Im_DT, Sml_Cls}
▫ {Inc_Prp_Vlu, IF}
The others are independent.
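An interdependency study of this kind can be sketched as a pairwise correlation over the metric values measured on each dataset. The slides do not say which coefficient or threshold was used, so the Pearson coefficient, the 0.8 cut-off, and the metric names and numbers below are all assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant metric is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlated_pairs(metric_values, threshold=0.8):
    """Return metric pairs whose absolute correlation reaches the threshold.

    metric_values maps a metric name to its values, one value per dataset.
    The threshold is an assumed cut-off, not taken from the slides.
    """
    pairs = []
    for a, b in combinations(sorted(metric_values), 2):
        if abs(pearson(metric_values[a], metric_values[b])) >= threshold:
            pairs.append((a, b))
    return pairs


# Made-up values for three hypothetical metrics over four datasets.
example_values = {
    "metric_a": [1, 2, 3, 4],
    "metric_b": [2, 4, 6, 8],
    "metric_c": [5, 1, 4, 2],
}
```

On the made-up values, only `metric_a` and `metric_b` are flagged, since one is an exact multiple of the other.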

5. Empirical Evaluation
Result: Only one pair of quality dimensions is correlated: {Interlinking, Syntactic accuracy}
The others are independent.

Quality Prediction
▫ Applying the PCA method to select the highly correlated metrics
   Result: 20 out of 32 metrics are selected
▫ Developing predictive models
   Using neural network method: MultiLayerPerceptron
▫ Assessing the quality of new datasets using models

Datasets (columns: No. of triples, No. of instances, Domain):
Geonames 6, Geography
IMDB Movie
Anatomy 6, Anatomy
Citeseer 948, Publication
FAO 248,731 28,098 Food Science

Conclusion on Metrics
▫ Definable: proposed by GQM (32); formally defined (32)
▫ Valid: theoretically validated (32)
▫ Practical: implemented (32)
▫ Correlated with quality: experts (28); correlation study (27); PCA (20)
▫ Predictability: MLP (20)

Thank you for your attention and comments