Characterizing Knowledge on the Semantic Web with Watson Mathieu d’Aquin, Claudio Baldassarre, Laurian Gridinoc, Sofia Angeletou, Marta Sabou, Enrico Motta.

Presentation transcript:

Characterizing Knowledge on the Semantic Web with Watson Mathieu d’Aquin, Claudio Baldassarre, Laurian Gridinoc, Sofia Angeletou, Marta Sabou, Enrico Motta The Knowledge Media Institute, The Open University

The Semantic Web is Growing
Lee, J., Goodwin, R. (2004) The Semantic Webscape: a View of the Semantic Web. IBM Research Report.

The Semantic Web is growing…

Next Generation Semantic Web applications
– Exploiting the Semantic Web rather than engineering their own knowledge/ontologies
– Need for a Gateway to the Semantic Web

Watson: a Gateway to the Semantic Web

More on Watson? See also…
– Watson Web Interface:
– Watson poster and demo at ISWC 2007…

Characterizing Knowledge in Watson?
Besides being a gateway for applications, Watson gives the opportunity, through an analysis of its repository, to better understand:
– How semantic technologies are used to publish knowledge online
– How knowledge is structured on the Semantic Web
– How ontologies and semantic documents are interconnected in a semantic network
Such an analysis provides valuable information for application and tool developers concerning the knowledge they have to manipulate.

The Watson Collection  Collecting Semantic Content:  A number of specialized crawlers for Google, ontology repositories (e.g. Swoogle), PingTheSemanticWeb, etc.  Validated by parsing with Jena, to get only RDF documents  Filters:  Before filtering, the repository was composed almost entirely of RSS and FOAF (more than 5 times the number of other documents)  Therefore, the analysis would have been more an analysis of RSS and FOAF than anything else.  These have been filtered out.  An analysis of the FOAF part of the repository separately would be interesting.

The Watson Collection
Result: almost 25,500 semantic documents

The Watson Collection  In order to index these documents, Watson extracts information about them.  Information about the content: classes, properties and individuals, the relations between them, the coverage in terms of domain topics, etc.  Information about the representation: the language used and its expressivity, the size and structure of the document, etc.  Information about the network aspects of semantic documents: identification, links between documents, etc.  It is these elements of information that we intend to analyse.  Note that all these elements of information are freely available through the Watson API.

In the Following
Measures on the following aspects:
1. Usage of semantic technologies to publish knowledge on the Web
2. Structure and coverage of semantic documents
3. The knowledge network
Focusing more on the most “debatable” elements.

Semantic Web languages…
– The majority is factual data in RDF
– OWL adopted as ontology language
– Less overlap between OWL and RDF-S than between DAML+OIL and RDF-S: better separation of the meta-models in OWL
Here a document is considered to be in a given language if it instantiates an entity of that language; e.g. it is in both OWL and RDF-S if it contains an OWL property and an rdfs:Class.
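The classification rule above can be sketched as follows. This is one possible reading, not Watson's actual implementation: "instantiates an entity of the language" is interpreted here as "contains an rdf:type statement whose object lies in the language's namespace". The function name `languages_used` is hypothetical, and documents are assumed to be lists of (subject, predicate, object) string tuples.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Namespaces of the schema languages discussed on the slide.
LANGUAGE_NAMESPACES = {
    "RDF-S": "http://www.w3.org/2000/01/rdf-schema#",
    "OWL": "http://www.w3.org/2002/07/owl#",
    "DAML+OIL": "http://www.daml.org/2001/03/daml+oil#",
}


def languages_used(triples: Iterable[Triple]) -> Set[str]:
    """Return the languages whose entities the document instantiates.

    An empty result means the document only holds plain RDF data
    (an assumption of this sketch).
    """
    found = set()
    for _, predicate, obj in triples:
        if predicate != RDF_TYPE:
            continue
        for name, namespace in LANGUAGE_NAMESPACES.items():
            if obj.startswith(namespace):
                found.add(name)
    return found


# Illustrative documents (example.org URIs are placeholders).
mixed_doc = [
    ("http://example.org/o#hasName", RDF_TYPE,
     "http://www.w3.org/2002/07/owl#DatatypeProperty"),
    ("http://example.org/o#Animal", RDF_TYPE,
     "http://www.w3.org/2000/01/rdf-schema#Class"),
]
facts_doc = [
    ("http://example.org/d#tom", RDF_TYPE, "http://example.org/o#Cat"),
]
```

Here `mixed_doc` counts as both OWL and RDF-S, which is the overlap case the slide measures, while `facts_doc` yields an empty set.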

… and their expressivity
Apparent contradiction:
– Most of the documents are in OWL Full
– But 95% use only a very restricted part of the expressive power of OWL (below OWL Lite)
– These documents are OWL Full because of simple syntactic mistakes

Size of the documents
[Charts: number of documents by number of classes, and by number of instances]
As with expressivity, a power-law distribution: lots of very small documents and a few very large ones (both for ontological knowledge and factual data, but on different scales)

Density of the representation
On average, classes are:
1. Poorly defined (small number of properties and super-classes per class)
2. Highly instantiated (high number of instances per class)
Even the best-represented class in each ontology has only 1 property on average.
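A minimal sketch of how such density figures could be computed is given below. The slides do not specify the exact counting rules, so this sketch assumes that a property belongs to a class when it declares that class as its rdfs:domain, that super-classes are counted from rdfs:subClassOf statements, and that instances are counted from rdf:type statements. The names `class_density` and `average_density` are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
MEASURES = ("properties", "superclasses", "instances")


def class_density(triples: Iterable[Triple], classes: Set[str]) -> Dict[str, Dict[str, int]]:
    """Count properties, super-classes and instances for each known class."""
    stats = {c: {m: 0 for m in MEASURES} for c in classes}
    for subject, predicate, obj in triples:
        if predicate == RDFS_DOMAIN and obj in stats:
            stats[obj]["properties"] += 1
        elif predicate == RDFS_SUBCLASSOF and subject in stats:
            stats[subject]["superclasses"] += 1
        elif predicate == RDF_TYPE and obj in stats:
            stats[obj]["instances"] += 1
    return stats


def average_density(stats: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Average each measure over all classes (empty input gives an empty dict)."""
    if not stats:
        return {}
    return {m: mean(s[m] for s in stats.values()) for m in MEASURES}


# Illustrative ontology (example.org URIs are placeholders).
EX = "http://example.org/o#"
classes = {EX + "Animal", EX + "Cat"}
triples = [
    (EX + "Cat", RDFS_SUBCLASSOF, EX + "Animal"),
    (EX + "hasName", RDFS_DOMAIN, EX + "Animal"),
    (EX + "tom", RDF_TYPE, EX + "Cat"),
    (EX + "felix", RDF_TYPE, EX + "Cat"),
    (EX + "rex", RDF_TYPE, EX + "Animal"),
]
density = average_density(class_density(triples, classes))
```

In this toy ontology the classes average 0.5 properties, 0.5 super-classes and 1.5 instances, the same "poorly defined, highly instantiated" pattern in miniature.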

Topic Domain Coverage
– Level of coverage of ontologies for the top categories in DMOZ (details in the paper)
– Very heterogeneous distribution
– Not well correlated with that of the Web

Identification of semantic documents
Identification contributes to the networked and distributed aspects of the Semantic Web.
URIs are unique identifiers, but when applied to ontologies, they may be duplicated:
– Default URI of the ontology editor (Protégé)
– Misuse of the URI of existing vocabularies (OWL)
– Different versions of an ontology having the same URI
Also, it is good practice for URIs to be dereferenceable, but only 30% of the semantic documents can be reached through their URI.
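Both checks on this slide can be sketched in a few lines. The input format, a list of (document location, declared ontology URI) pairs, and the names `duplicated_uris` and `reachable_share` are assumptions of this sketch. The reachability test is passed in as a caller-supplied function so that the example needs no network access; a real check would have to fetch each URI.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Declaration = Tuple[str, str]  # (document location, declared ontology URI)


def duplicated_uris(declared: Iterable[Declaration]) -> Dict[str, List[str]]:
    """Map each ontology URI declared at several locations to those locations."""
    by_uri = defaultdict(set)
    for location, uri in declared:
        by_uri[uri].add(location)
    return {uri: sorted(locs) for uri, locs in by_uri.items() if len(locs) > 1}


def reachable_share(declared: Iterable[Declaration],
                    can_fetch: Callable[[str], bool]) -> float:
    """Fraction of documents whose declared URI passes the supplied check."""
    declared = list(declared)
    if not declared:
        return 0.0
    return sum(1 for _, uri in declared if can_fetch(uri)) / len(declared)


# Illustrative data: two documents share a placeholder "editor default" URI.
declared = [
    ("http://a.example/onto1.owl", "http://example.org/editor-default"),
    ("http://b.example/onto2.owl", "http://example.org/editor-default"),
    ("http://c.example/pets.owl", "http://c.example/pets.owl"),
]
clashes = duplicated_uris(declared)
```

Here `clashes` reports the shared URI together with the two locations that declare it, which corresponds to the "default URI of the ontology editor" case above.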

Connectedness and Redundancy
Connectedness and redundancy are both important aspects of distributed systems.
Connectedness:
– A few large providers (W3.org, Stanford) and a few locally dense networks (Ontoworld)
– Otherwise, very local ontologies
Redundancy:
– Almost 30% of the semantic documents are duplicates
– 12% of the entities are described more than once
Better support for the network aspects of ontologies is required.
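The two redundancy measures can be approximated with a short sketch. It is not the method used for the slides, which is not described here. It assumes that two documents are duplicates when they contain exactly the same set of triples, and that an entity is "described more than once" when it appears as a subject in at least two non-identical documents. Blank nodes, which make real graph comparison harder, are ignored. All function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def fingerprint(triples: List[Triple]) -> str:
    """Order-independent hash of a document's triples."""
    canonical = "\n".join(sorted(" ".join(t) for t in set(triples)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def duplicate_share(documents: Dict[str, List[Triple]]) -> float:
    """Fraction of documents that repeat the content of another document."""
    if not documents:
        return 0.0
    distinct = {fingerprint(t) for t in documents.values()}
    return (len(documents) - len(distinct)) / len(documents)


def redescribed_entities(documents: Dict[str, List[Triple]]) -> Set[str]:
    """Entities used as subject in two or more distinct (non-duplicate) documents."""
    described_in = defaultdict(set)
    for triples in documents.values():
        mark = fingerprint(triples)
        for subject, _, _ in triples:
            described_in[subject].add(mark)
    return {e for e, marks in described_in.items() if len(marks) > 1}


# Illustrative corpus (all URIs are placeholders).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
CAT = "http://example.org/o#Cat"
DOG = "http://example.org/o#Dog"
documents = {
    "http://a.example/pets.owl": [(CAT, RDF_TYPE, OWL_CLASS)],
    "http://mirror.example/pets.owl": [(CAT, RDF_TYPE, OWL_CLASS)],
    "http://b.example/zoo.owl": [
        (CAT, "http://www.w3.org/2000/01/rdf-schema#subClassOf",
         "http://example.org/o#Animal"),
        (DOG, RDF_TYPE, OWL_CLASS),
    ],
}
```

In this corpus one of the three documents is a mirror copy, and only the `Cat` entity is described in two different documents.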

Conclusion
Our analysis allows us to draw some conclusions about the characteristics of the knowledge published online. In particular, it shows that:
– Semantic Web documents tend to be small, lightweight and weakly structured
– Efforts are still required to publish knowledge in a variety of domains
– The network aspects are not sufficiently taken into consideration in semantic technologies
These constitute valuable information for tool and application developers.

Limitations
This work can be seen as a first step towards a fine-grained characterization of the Semantic Web. But in its current state, it suffers from a number of limitations:
– Only a sample of the Semantic Web
– A snapshot of the current dataset. Should consider evolution
– Simple analysis methods. Would data mining approaches be relevant?
– The analyzed aspects are insufficient to fully capture the quality of the knowledge available online

A last word…
We believe that the field of evaluation of ontologies and ontology-based tools could provide valuable inputs to this study, so please: comment, suggest, question…
Watson is an open system; our data is available through the Watson API.