Big Data Quality Identity in Linked Data


Big Data Quality: Identity in Linked Data
Maria Teresa Pazienza, academic year 2017-18

Introduction
The problem of «identity» is an outstanding and well-known issue in artificial intelligence. The Web of Linked Data is the first setting in which the problem is encountered by different individuals who independently attempt to knit their knowledge representations together using the same standardized language. Uses of owl:sameAs in Linked Data tend to be mutually incompatible and almost always violate the strict logical semantics of identity that owl:sameAs demands.

What is Identity?
The problem of identity does not lie within Linked Data per se; it is a long-standing and well-known issue in philosophy: the problem of identity and reference. The semantics of the owl:sameAs construct is defined as stating that «two URI references actually refer to the same thing and share all the same properties».

What is Identity on the web?
owl:sameAs can be considered just one type of «identity link»: a link that declares two individuals, typically drawn from diverse and heterogeneous datasets, to be identical in some fashion or otherwise closely related. It is unrealistic to assume that everybody will use the same name to refer to an individual; that would require some grand design, which is contrary to the spirit of the web. (Ex.: prof. Pazienza, mother of the bride, …)

Leibniz's law
a) If x is not identical to y, then there must be some property that they do not share.
b) If x and y share all properties (i.e. they are indiscernible), then they are identical.
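
The two statements can be written in second-order notation, with F ranging over properties. This is a standard textbook rendering added for illustration, not a formula taken from the slides:

```latex
\begin{align*}
\text{(a)}\quad & x \neq y \;\rightarrow\; \exists F\,\neg\bigl(F(x) \leftrightarrow F(y)\bigr) \\
\text{(b)}\quad & \forall F\,\bigl(F(x) \leftrightarrow F(y)\bigr) \;\rightarrow\; x = y
\end{align*}
```

Read this way, (a) is the contrapositive of (b), so both express the identity of indiscernibles. The converse direction, $x = y \rightarrow \forall F\,(F(x) \leftrightarrow F(y))$ (the indiscernibility of identicals), is the one that licenses copying properties across an identity statement.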

Different temporal and spatial coordinates
Is Tim Berners-Lee as an adult the same as Tim Berners-Lee of five minutes ago? Or as a child? Or if he lost his arm? In the engineering discipline of knowledge representation we can never enumerate all possible properties. As a solution, we can single out some properties as those necessary for identity: an explicit theory of identity criteria. Properties are of two kinds: those intrinsic to the identity, and those extrinsic, i.e. purely relative to other things.

Theory of identity
Theories of identity can be based on different criteria: some theories subsume weaker or stronger ones, but others are simply incommensurable. Problems arise when comparing values or asserted properties, for example:
Vague values: are 2 inches (exactly 5.08 cm) the same as 5 cm?
Contradictory properties: «Morning Star» refers to Venus and «Evening Star» refers to Venus; may we take the two stars to be equal?

Logical analysis of identity
Inference: when someone says two things are the same, the two things share all the same properties, so every property of one thing can be inferred to be a property of the other. The question is whether such a definition of identity works in a decentralized environment such as the Web of Linked Data. The real problem with using URIs as identifiers together with owl:sameAs is one of context and the implicit import of properties.
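
The implicit import of properties can be made concrete with a minimal sketch. It assumes triples are plain Python tuples, uses hypothetical `ex:` names, and only copies properties in subject position (full owl:sameAs semantics would also substitute in object position); `same_as_closure` is an illustrative helper, not part of any library.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

def same_as_closure(triples):
    """Copy every non-sameAs property to all aliases of its subject."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two sides of every owl:sameAs statement into one class.
    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    # Group the known aliases by their class representative.
    members = defaultdict(set)
    for node in list(parent):
        members[find(node)].add(node)

    inferred = set(triples)
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        for alias in members.get(find(s), {s}):
            inferred.add((alias, p, o))
    return inferred

# Hypothetical data: each source describes the planet in its own context.
triples = [
    ("ex:MorningStar", SAME_AS, "ex:EveningStar"),
    ("ex:MorningStar", "ex:visibleAt", "dawn"),
    ("ex:EveningStar", "ex:visibleAt", "dusk"),
]
inferred = same_as_closure(triples)
```

After the closure, `ex:EveningStar` also carries `ex:visibleAt dawn`: a statement made in one context has been silently imported into another.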

Varieties of identity
A possible solution: the agent's claim, i.e. the statement of identity is not necessarily true, but only stated by a particular agent. Different agents may then accept different identity statements and so draw different inferences; once an agent accepts an identity claim, it is bound to all of the claim's valid inferences. This issue comes into play when different agents describe the world at different levels of granularity.
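
A minimal sketch of identity as an agent's claim, assuming each claim is recorded as an `(agent, a, b)` tuple and that a consumer decides which agents to trust. All names, including `accepted_aliases`, are hypothetical.

```python
def accepted_aliases(claims, trusted_agents, entity):
    """Everything `entity` is identical to, using only claims from trusted agents."""
    aliases = {entity}
    changed = True
    while changed:  # repeat until no accepted claim adds a new alias
        changed = False
        for agent, a, b in claims:
            if agent not in trusted_agents:
                continue
            if a in aliases and b not in aliases:
                aliases.add(b)
                changed = True
            elif b in aliases and a not in aliases:
                aliases.add(a)
                changed = True
    return aliases

# Hypothetical claims made at different levels of granularity.
claims = [
    ("agentA", "ex:Paris_city", "geo:Paris"),
    ("agentB", "geo:Paris", "ex:Paris_metro_area"),
]
cautious = accepted_aliases(claims, {"agentA"}, "ex:Paris_city")
permissive = accepted_aliases(claims, {"agentA", "agentB"}, "ex:Paris_city")
```

The cautious consumer equates the city only with `geo:Paris`; the permissive one is also bound to the inference that the city and the metropolitan area are the same thing.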

Varieties of identity
A possible solution: a weaker notion of being similar, i.e. two different things share some, but not all, properties in their given incomplete descriptions. A wine glass and a coffee cup are similar in that both hold liquids, but they usually hold entirely different kinds of liquid and have different shapes; Leibniz's law therefore does not hold, as they are obviously different things. Similar things share properties from the higher levels of the ontology.
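
The wine glass and coffee cup example can be sketched as a comparison of two incomplete descriptions. The property names and the overlap measure (agreeing properties divided by all mentioned properties) are assumptions chosen for illustration, not definitions from the slides.

```python
def shared_properties(desc_a, desc_b):
    """Properties on which two (incomplete) descriptions agree."""
    return {p: v for p, v in desc_a.items() if p in desc_b and desc_b[p] == v}

def property_overlap(desc_a, desc_b):
    """Share of all mentioned properties on which the descriptions agree."""
    all_props = set(desc_a) | set(desc_b)
    if not all_props:
        return 0.0
    return len(shared_properties(desc_a, desc_b)) / len(all_props)

# Hypothetical descriptions of the two objects.
wine_glass = {"holds": "liquid", "typical_content": "wine", "shape": "stemmed bowl"}
coffee_cup = {"holds": "liquid", "typical_content": "coffee", "shape": "mug with handle"}

common = shared_properties(wine_glass, coffee_cup)
overlap = property_overlap(wine_glass, coffee_cup)
indiscernible = overlap == 1.0  # would be required for Leibniz-style identity
```

Only the general property `holds` is shared, so the two objects count as similar but not as identical.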

Varieties of identity
A possible solution: the related relationship, i.e. two different things share no properties in a given description but are nonetheless closely aligned in some fashion. These are complex, structured, yet hard-to-specify relationships between things that are «kind of close to identity», such as the relation between a quantity and a measurement of that quantity, or between the use of a drug in a clinical trial and the drug itself. Since on some trivial level «everything is related», there are degrees of relatedness. This family of heterogeneous and semi-structured relationships should be studied more carefully and observed empirically before any hasty judgements are made.

The Similarity Ontology
A number of new identity relationships have been proposed, based on permutations of the properties of transitivity, symmetry and reflexivity: the Similarity Ontology. We can use these properties to make inferences about a relationship in certain domain-specific cases, without thereby necessarily claiming that any two objects standing in this new kind of relationship share properties.
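
As a rough illustration of what permuting the three properties means, the sketch below enumerates the eight possible combinations and checks which one a small, hypothetical «similar to» relation satisfies. The slides do not list the ontology's property names, so none are assumed here.

```python
from itertools import product

def is_reflexive(relation, domain):
    return all((x, x) in relation for x in domain)

def is_symmetric(relation):
    return all((b, a) in relation for a, b in relation)

def is_transitive(relation):
    return all((a, d) in relation
               for a, b in relation
               for c, d in relation
               if b == c)

# All 2^3 combinations of (reflexive, symmetric, transitive).
combinations = list(product([True, False], repeat=3))

# Hypothetical relation: similarity that does not chain.
domain = {"wine_glass", "coffee_cup", "thermos"}
similar = {(x, x) for x in domain} | {
    ("wine_glass", "coffee_cup"), ("coffee_cup", "wine_glass"),
    ("coffee_cup", "thermos"), ("thermos", "coffee_cup"),
}
profile = (is_reflexive(similar, domain),
           is_symmetric(similar),
           is_transitive(similar))
```

The example relation is reflexive and symmetric but not transitive, which is one of the eight profiles; strict identity corresponds to the profile in which all three hold.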

Sub-property relationships between the properties of the Similarity Ontology and existing properties from OWL, RDFS, and SKOS

Inference (with the Similarity Ontology)
A particular property or set of properties is isomorphic across a particular kind of similarity. This kind of entailment can be performed through a property chain, a construct introduced in OWL 2, used to express «same relevant property as». This is much more structured than a vague notion of matching and similarity.
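
A property chain can be read as a rule: if x is linked to y by a first property and y is linked to z by a second one, then x is linked to z by a result property. The sketch below applies one such rule in a single pass over plain tuples; the `ex:` properties and the helper `apply_property_chain` are hypothetical, and an OWL 2 reasoner would do considerably more.

```python
def apply_property_chain(triples, first, second, result):
    """Infer (x, result, z) whenever (x, first, y) and (y, second, z) hold.

    Single pass only: inferred triples are not fed back into the rule.
    """
    values_by_subject = {}
    for s, p, o in triples:
        if p == second:
            values_by_subject.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in triples:
        if p == first:
            for z in values_by_subject.get(o, ()):
                inferred.add((s, result, z))
    return inferred

# Hypothetical data: a dose used in a trial versus the drug itself.
triples = [
    ("ex:trialDose42", "ex:sameSubstanceAs", "ex:Aspirin"),
    ("ex:Aspirin", "ex:activeIngredient", "acetylsalicylic acid"),
    ("ex:Aspirin", "ex:manufacturer", "ex:SomeCompany"),
]
inferred = apply_property_chain(
    triples,
    first="ex:sameSubstanceAs",
    second="ex:activeIngredient",
    result="ex:activeIngredient",
)
```

Only the relevant property, `ex:activeIngredient`, is carried over to the trial dose; the manufacturer is not, which is the difference from a blanket identity statement.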

Linked Data quality
Linked Data quality can be measured along several dimensions, including accessibility, interlinking, performance, syntactic validity and completeness. In each of these dimensions we can define a number of concrete metrics, which can be used to measure a given indicator of Linked Data quality precisely and objectively. Additionally, domain-specific quality metrics can be defined.
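
To show what a concrete metric might look like, here is a minimal sketch of two indicators, one for syntactic validity and one for interlinking. Both definitions are assumptions made for this example (the slides do not define them), and the URIs are placeholders.

```python
from urllib.parse import urlparse

def is_http_uri(value):
    """True if `value` parses as an http(s) URI with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def subject_validity(triples):
    """Share of distinct subjects that are well-formed HTTP URIs."""
    subjects = {s for s, _, _ in triples}
    if not subjects:
        return 0.0
    return len({s for s in subjects if is_http_uri(s)}) / len(subjects)

def interlinking_degree(triples, local_prefix):
    """Share of distinct subjects with at least one link leaving `local_prefix`."""
    subjects = {s for s, _, _ in triples}
    if not subjects:
        return 0.0
    linked = {s for s, _, o in triples
              if is_http_uri(o) and not o.startswith(local_prefix)}
    return len(linked) / len(subjects)

# Hypothetical dataset with one malformed subject and one external link.
LOCAL = "http://example.org/"
triples = [
    ("http://example.org/a", "http://www.w3.org/2002/07/owl#sameAs", "http://other.example/a"),
    ("http://example.org/b", "http://example.org/knows", "http://example.org/a"),
    ("example b", "http://example.org/label", "B"),
]
validity = subject_validity(triples)
interlinking = interlinking_degree(triples, LOCAL)
```

Here two of the three subjects are valid HTTP URIs and one of the three has an outbound link to another dataset.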

Linked Data quality
Ensuring data quality in Linked Open Data is a complex process, because Linked Open Data consists of structured information supported by models, ontologies and vocabularies, and contains queryable endpoints and links.

Objective linked data quality classification
The basic idea behind Linked Data is that its usefulness increases the more it is interlinked with other datasets. Tim Berners-Lee defined four main principles for publishing data that ensure a certain level of uniformity, which directly affects the data's usability:
1. Make the data available on the Web: assign URIs to identify things.
2. Make the data machine readable: use HTTP URIs so that looking up these names is easy.
3. Use publishing standards: when the lookup is done, provide useful information using standards like RDF.
4. Link your data: include links to other resources to enable users to discover more things.

Objective linked data quality classification
Building on the previous principles, we group the quality attributes into four main categories:
1. Quality of the entities: quality indicators that focus on the data at the instance level.
2. Quality of the dataset: quality indicators at the dataset level.
3. Quality of the semantic model: quality indicators that focus on the semantic models, vocabularies and ontologies.
4. Quality of the linking process: quality indicators that focus on the inbound and outbound links between datasets.

Modeling Quality
Reusing existing ontologies is a common practice that Linked Data publishers try to adopt. However, developing ontologies and vocabularies is often a long, error-prone process, especially when many contributors work consecutively or collaboratively. This can introduce deficiencies such as redundant concepts or conflicting relationships. Choosing the right ontology or vocabulary is vital to ensure modeling correctness and consistency.

Dataset Quality
Given the large number of datasets published as Linked Open Data, users have a hard time identifying datasets that suit a given task. The most widely adopted approaches are based on link assessment. Provenance-based and entity-based approaches are also used, to compute rankings not only of datasets but also at the entity level.
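
As a deliberately simple stand-in for link assessment, the sketch below ranks datasets by the number of inbound links they receive from other datasets. Real approaches are more elaborate; the dataset names and the helper `rank_by_inlinks` are hypothetical.

```python
from collections import Counter

def rank_by_inlinks(links):
    """Rank datasets by inbound links from other datasets.

    `links` is an iterable of (source_dataset, target_dataset) pairs;
    links that stay inside one dataset are ignored.
    """
    inbound = Counter()
    for source, target in links:
        if source != target:
            inbound[target] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical link statistics between three datasets.
links = [
    ("dataset_a", "dataset_b"),
    ("dataset_c", "dataset_b"),
    ("dataset_a", "dataset_c"),
    ("dataset_b", "dataset_b"),  # internal link, not counted
]
ranking = rank_by_inlinks(links)
```

The same counting scheme could be applied to individual entities instead of datasets to obtain an entity-level ranking.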