Evaluating Semantic Metadata without the Presence of a Gold Standard
Yuangui Lei, Andriy Nikolov, Victoria Uren, Enrico Motta
Knowledge Media Institute, The Open University
Focus
– A quality model that characterizes quality problems in semantic metadata
– An automatic detection algorithm
– Experiments
[Diagram: the layering of ontologies, semantic metadata, and data]
Semantic Metadata Generation, Acquisition, and Repositories
– At each of these stages, a number of problems can arise that decrease the quality of metadata
Quality Evaluation
– Metadata providers: ensure high quality
– Users: facilitate assessing the trustworthiness of data
– Applications: filter out poor-quality data
Our Quality Evaluation Framework
– A quality model
– Assessment metrics
– An automatic evaluation algorithm
The Quality Model
[Diagram: semantic metadata connects the real world, ontologies, and data sources through relations of modelling, instantiating, annotating, representing, and describing]
Quality Problems
(a) Incomplete annotation: data objects are left without corresponding semantic entities
(b) Duplicate annotation: the same data object is represented by several semantic entities
(c) Ambiguous annotation: an annotation can refer to more than one real-world entity
(d) Spurious annotation: a semantic entity has no real counterpart in the data source
(e) Inaccurate annotation: a data object is annotated with the wrong class
(f) Inconsistent annotation: the metadata violates constraints defined in the ontology
[Diagram: semantic metadata with instances I1–I4, relations R1–R2, and classes C1–C3, illustrating each problem type]
Current Support for Evaluation
Gold-standard based:
– Examples: GATE [1], LA [2], BDM [3]
– Feature: they assess the performance of the information extraction techniques used
Not suitable for evaluating semantic metadata:
– Gold-standard annotations are often not available
The Semantic Metadata Acquisition Scenario
KMi news stories → information extraction engine (ESpotter) → semantic data transformation engine (drawing on departmental databases) → raw metadata → evaluation → high-quality metadata
– Evaluation needs to take place dynamically, whenever a new entry is generated
– In such a context, a gold standard is NOT available
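As a minimal illustration of this flow, the sketch below runs per-entry evaluation at generation time. All function bodies are toy stubs, not the actual ESpotter or ASDI components:

```python
# Schematic sketch of the acquisition pipeline, with evaluation running
# per entry as it is generated. Every function here is an illustrative
# stub standing in for the real component.

def extract_entities(story):            # stand-in for the IE engine (ESpotter)
    return [w for w in story.split() if w[:1].isupper()]

def to_raw_metadata(entities):          # stand-in for the transformation engine
    return [{"instance": e, "class": "Unknown"} for e in entities]

def evaluate_entry(entry):              # stand-in for the evaluation engine
    return []                           # [] means no quality problems found

def process_story(story):
    raw = to_raw_metadata(extract_entities(story))
    # Evaluation takes place dynamically, whenever a new entry is
    # generated; no gold standard is consulted.
    return [e for e in raw if not evaluate_entry(e)]

print(process_story("Enrico Motta visited IBM"))
```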
Our Approach
Use available knowledge instead of requiring gold-standard annotations:
– Domain-specific knowledge sources: domain ontologies, data repositories, domain-specific lexicons
– Background knowledge: the Semantic Web, the Web, and general lexical resources
Advantages:
– Enables automatic operation
– Enables large-scale data evaluation
Using Domain Knowledge
1. Domain ontologies: constraints and restrictions → inconsistent annotations
– Example: one person classified as both KMi-Member and None-KMi-Member when they are disjoint classes
2. Domain lexicons: lexicon-to-instance mappings → duplicate annotations
– Example: OU and Open-University both appear as values of the same property of the same instance
3. Domain data repositories → ambiguous annotations and inaccurate annotations
(A minimal sketch of the first two checks follows this list.)
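The sketch below expresses the first two checks with rdflib. It is a simplified stand-in for the consistency checking actually used (Pellet plus Reiter's algorithm); the namespace, ontology axiom, metadata, and lexicon entries are all illustrative:

```python
# Minimal sketch of two domain-knowledge checks using rdflib (a simplified
# stand-in for Pellet + Reiter consistency checking). The namespace,
# axiom, metadata, and lexicon entries are illustrative.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, OWL

KMI = Namespace("http://kmi.example.org/ontology#")

g = Graph()
# Ontology constraint: KMi-Member and None-KMi-Member are disjoint classes.
g.add((KMI["KMi-Member"], OWL.disjointWith, KMI["None-KMi-Member"]))
# Metadata under evaluation: one person typed with both disjoint classes.
g.add((KMI.john, RDF.type, KMI["KMi-Member"]))
g.add((KMI.john, RDF.type, KMI["None-KMi-Member"]))
# Duplicate values of the same property of the same instance.
g.add((KMI.john, KMI.worksFor, KMI["OU"]))
g.add((KMI.john, KMI.worksFor, KMI["Open-University"]))

# Check 1: inconsistent annotations via owl:disjointWith axioms.
for c1, c2 in g.subject_objects(OWL.disjointWith):
    for inst in set(g.subjects(RDF.type, c1)) & set(g.subjects(RDF.type, c2)):
        print(f"Inconsistent annotation: {inst} typed as both {c1} and {c2}")

# Check 2: duplicate annotations via a lexicon mapping lexical variants
# to canonical instances (illustrative entries).
lexicon = {KMI["OU"]: KMI["Open-University"]}
for subj, pred in {(s, p) for s, p, _ in g}:
    values = list(g.objects(subj, pred))
    canonical = [lexicon.get(v, v) for v in values]
    if len(canonical) != len(set(canonical)):
        print(f"Duplicate annotation: {subj} has variant values for {pred}")
```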
When nothing can be found in the domain knowledge, the data can be:
– Correct but outside the domain (e.g., IBM in the KMi domain)
– An inaccurate annotation, i.e., a mis-classification (e.g., Sun Microsystems as a person)
– Spurious (e.g., a workshop chair as an organization)
Background knowledge is then used to investigate these cases further.
Investigating the Semantic Web (via Watson)
– Search the Semantic Web for matching entities
– No matches found → proceed to examining the Web
– Matches found → compare the classes (using WordNet): if similar, add the data to the repositories; if not, flag an inaccurate annotation
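The class-similarity test can be sketched with WordNet via nltk (assuming nltk is installed and its WordNet corpus downloaded); the 0.5 threshold is an illustrative choice, not a value from the paper:

```python
# Sketch of a WordNet-based class-similarity test (requires:
# pip install nltk, then nltk.download("wordnet") once).
from nltk.corpus import wordnet as wn

def classes_similar(label_a, label_b, threshold=0.5):
    """True if any pair of senses of the two class labels reaches the
    path-similarity threshold (threshold is illustrative)."""
    best = 0.0
    for s1 in wn.synsets(label_a):
        for s2 in wn.synsets(label_b):
            sim = s1.path_similarity(s2)
            if sim is not None and sim > best:
                best = sim
    return best >= threshold

# e.g., comparing the annotated class with a class found via Watson:
print(classes_similar("organization", "institution"))
```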
Examining the Web (via PANKOW)
– No classification found on the Web → flag a spurious annotation
– Classification found → compare it with the annotated class (using WordNet): if not similar, flag an inaccurate annotation
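A PANKOW-style check can be sketched as pattern-based hit counting. Here web_hits is a hypothetical stub standing in for a real search-engine count, and the patterns are typical Hearst-style examples rather than PANKOW's exact set:

```python
# Sketch of PANKOW-style classification by web pattern counts.
# web_hits is a hypothetical stub; wire it to a real search API in practice.
PATTERNS = ("{cls}s such as {inst}",
            "{inst} is a {cls}",
            "{inst} and other {cls}s")

def web_hits(query):
    return 0  # placeholder hit count

def pattern_score(instance, candidate_class):
    """Sum hit counts over instantiation patterns; a zero score across all
    candidate classes suggests a spurious annotation."""
    return sum(web_hits(p.format(cls=candidate_class, inst=instance))
               for p in PATTERNS)

print(pattern_score("Sun Microsystems", "company"))
```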
The Overall Picture
– Step 1: using domain knowledge: ontologies, lexical resources, and data repositories; consistency checking with Pellet + Reiter's algorithm
– Step 2: using background knowledge: the Semantic Web (via WATSON and SemSearch), the Web (via PANKOW), and WordNet
[Diagram: the evaluation engine combines domain knowledge and background knowledge to produce evaluation results]
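A condensed sketch of this two-step flow is given below; every helper is an illustrative stub for the corresponding component (Pellet + Reiter for consistency, Watson/SemSearch for the Semantic Web, PANKOW for the Web, WordNet for class similarity), not the actual APIs:

```python
# Condensed sketch of the two-step evaluation flow; all helpers are stubs.

def domain_problems(entry):     return []     # stub: step-1 checks
def known_in_domain(entry):     return False  # stub: entry found in domain?
def semantic_web_match(entry):  return None   # stub: e.g., a Watson lookup
def web_classification(entry):  return None   # stub: e.g., a PANKOW lookup
def similar(cls_a, cls_b):      return False  # stub: e.g., WordNet similarity

def evaluate(entry, annotated_class):
    # Step 1: domain knowledge (ontologies, lexicons, data repositories).
    problems = domain_problems(entry)
    if problems or known_in_domain(entry):
        return problems
    # Step 2: background knowledge, for entries the domain cannot settle.
    match = semantic_web_match(entry)
    if match is not None:
        return [] if similar(match, annotated_class) else ["inaccurate"]
    web_cls = web_classification(entry)
    if web_cls is not None:
        return [] if similar(web_cls, annotated_class) else ["inaccurate"]
    return ["spurious"]

print(evaluate("workshop chair", "organization"))  # -> ['spurious']
```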
Addressed Quality Problems
[Recap of the quality-problem diagram: (a) incomplete, (b) duplicate, (c) ambiguous, (d) spurious, (e) inaccurate, (f) inconsistent annotation]
Experiments
Data setting: gathered in our previous work [4] on the KMi Semantic Web portal
– Randomly chose 36 news stories from the KMi news archive
– Collected a metadata set using ASDI
– Constructed a gold-standard annotation
Method:
– A gold-standard-based evaluation as the comparison baseline
– Evaluating the data set using domain knowledge only
– Evaluating the data set using both domain and background knowledge (see the scoring sketch below)
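For concreteness, here is a minimal sketch of how detected problems can be scored against the gold-standard baseline; standard precision/recall is assumed, which may differ from the paper's exact metric definitions, and the problem labels are illustrative:

```python
# Sketch of scoring detected quality problems against the gold standard
# baseline using standard precision/recall.

def precision_recall(detected, gold):
    detected, gold = set(detected), set(gold)
    tp = len(detected & gold)                       # true positives
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

p, r = precision_recall({("I2", "duplicate"), ("I4", "inaccurate")},
                        {("I2", "duplicate"), ("I3", "spurious")})
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.50, recall=0.50
```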
Finding: a number of entities are not contained in the problem domain
Finding: background knowledge is useful in data evaluation
Discussion
The performance of the approach largely depends on:
– A good domain-specific knowledge source
– The entities in the data set being well publicized; otherwise there will be many false alarms
References
1. H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02), 2002.
2. P. Cimiano, S. Staab, and J. Tane. Acquisition of Taxonomies from Text: FCA meets NLP. In Proceedings of the ECML/PKDD Workshop on Adaptive Text Extraction and Mining, pages 10-17, 2003.
3. D. Maynard, W. Peters, and Y. Li. Metrics for Evaluation of Ontology-based Information Extraction. In Proceedings of the 4th International Workshop on Evaluation of Ontologies on the Web, Edinburgh, UK, May 2006.
4. Y. Lei, M. Sabou, V. Lopez, J. Zhu, V. S. Uren, and E. Motta. An Infrastructure for Acquiring High Quality Semantic Metadata. In Proceedings of the 3rd European Semantic Web Conference (ESWC 2006), 2006.