
1 Evaluating Semantic Metadata without the Presence of a Gold Standard
Yuangui Lei, Andriy Nikolov, Victoria Uren, Enrico Motta
Knowledge Media Institute, The Open University
{y.lei,a.nikolov,v.s.uren,e.motta}@open.ac.uk

2 Focuses
– A quality model which characterizes quality problems in semantic metadata
– An automatic detection algorithm
– Experiments

3 (Diagram: the layering of Ontology, Metadata, and Data)

4 (Diagram: pipeline from Semantic Metadata Generation through Semantic Metadata Acquisition to Semantic Metadata Repositories)

5 (Diagram: the same pipeline of Semantic Metadata Generation, Acquisition, and Repositories.) A number of problems can occur along this pipeline that decrease the quality of the metadata.

6 Quality Evaluation
– Metadata providers: ensuring high quality
– Users: assessing the trustworthiness of the data
– Applications: filtering out poor-quality data

7 Our Quality Evaluation Framework
– A quality model
– Assessment metrics
– An automatic evaluation algorithm

8 The Quality Model (Diagram: semantic metadata sits between the real world, ontologies, and data sources, linked by the relations modelling, instantiating, annotating, representing, and describing.)

9 Quality Problems (a) Incomplete Annotation (Diagram: mapping between data objects and semantic entities.)

10 Quality Problems (a) Incomplete Annotation (b) Duplicate Annotation

11 Quality Problems (a) Incomplete Annotation (b) Duplicate Annotation (c) Ambiguous Annotation

12 Quality Problems (a) Incomplete Annotation (b) Duplicate Annotation (c) Ambiguous Annotation (d) Spurious Annotation

13 Quality Problems (a) Incomplete Annotation (b) Duplicate Annotation (c) Ambiguous Annotation (d) Spurious Annotation (e) Inaccurate Annotation

14 Quality Problems (a) Incomplete Annotation (b) Duplicate Annotation (c) Ambiguous Annotation (d) Spurious Annotation (e) Inaccurate Annotation (f) Inconsistent Annotation (Diagram: example semantic metadata with instances I1-I4, relations R1-R2, and classes C1-C3.)

15 Current Support for Evaluation
Gold-standard based:
– Examples: GATE [1], LA [2], BDM [3]
– Feature: assesses the performance of the information extraction techniques used
Not suitable for evaluating semantic metadata:
– Gold-standard annotations are often not available

16 The Semantic Metadata Acquisition Scenario
(Diagram: KMi news stories and departmental databases are processed by an information extraction engine (ESpotter) and a semantic data transformation engine; the resulting raw metadata is evaluated to produce high-quality metadata.)
Evaluation needs to take place dynamically whenever a new entry is generated. In this context, a gold standard is NOT available.

17 Our Approach
Using available knowledge instead of asking for gold-standard annotations:
– Knowledge sources specific to the domain: domain ontologies, data repositories, domain-specific lexicons
– Background knowledge: the Semantic Web, the Web, and general lexical resources
Advantages:
– Enables fully automatic operation
– Enables large-scale data evaluation

18 Using Domain Knowledge
1. Domain Ontologies
– Constraints and restrictions detect Inconsistent Annotations
– Example: one person classified as both KMi-Member and None-KMi-Member when these are disjoint classes
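
A minimal sketch of this kind of disjointness check, written with rdflib. The namespace, class, and instance names are illustrative only; the actual system relies on Pellet together with Reiter's diagnosis (slide 24), not on this hand-rolled check.

```python
from rdflib import Graph, Namespace, RDF, OWL

# Illustrative namespace and names; the real KMi ontology differs.
KMI = Namespace("http://kmi.example.org/ontology#")

g = Graph()
g.add((KMI["KMi-Member"], OWL.disjointWith, KMI["None-KMi-Member"]))
# A deliberately faulty annotation: the same person in both classes.
g.add((KMI["john-smith"], RDF.type, KMI["KMi-Member"]))
g.add((KMI["john-smith"], RDF.type, KMI["None-KMi-Member"]))

def inconsistent_annotations(graph):
    """Return instances asserted to belong to two disjoint classes."""
    problems = []
    for c1, _, c2 in graph.triples((None, OWL.disjointWith, None)):
        members_c1 = set(graph.subjects(RDF.type, c1))
        members_c2 = set(graph.subjects(RDF.type, c2))
        for inst in members_c1 & members_c2:
            problems.append((inst, c1, c2))
    return problems

print(inconsistent_annotations(g))  # flags john-smith as inconsistently annotated
```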

19 Using Domain Knowledge
1. Domain Ontologies
– Constraints and restrictions detect Inconsistent Annotations
2. Domain Lexicons
– Lexicon–instance mappings detect Duplicate Annotations
– Example: when OU and Open-University both appear as values of the same property of the same instance
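
A small sketch of how a domain lexicon can flag such duplicates: surface forms are normalised to a canonical instance, and two values of the same property that normalise to the same instance are reported. The lexicon entries below are assumptions for illustration.

```python
# Hypothetical lexicon: surface forms mapped to canonical instance names.
LEXICON = {
    "OU": "Open-University",
    "The Open University": "Open-University",
    "Open-University": "Open-University",
}

def duplicate_values(property_values):
    """Flag property values that normalise to the same canonical instance."""
    seen = {}
    duplicates = []
    for value in property_values:
        canonical = LEXICON.get(value, value)
        if canonical in seen and seen[canonical] != value:
            duplicates.append((seen[canonical], value, canonical))
        else:
            seen.setdefault(canonical, value)
    return duplicates

# Both "OU" and "Open-University" appear as values of the same property:
print(duplicate_values(["OU", "Open-University"]))
# -> [('OU', 'Open-University', 'Open-University')]
```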

20 Using Domain Knowledge
1. Domain Ontologies
– Constraints and restrictions detect Inconsistent Annotations
2. Domain Lexicons
– Lexicon–instance mappings detect Duplicate Annotations
3. Domain Data Repositories
– Detect Ambiguous Annotations and Inaccurate Annotations
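
One plausible way to use a domain data repository for these two problem types, sketched with toy data: a label that matches several repository instances is ambiguous, and a label whose repository class disagrees with the annotated class is inaccurate. The repository contents and class names are invented for illustration.

```python
# Hypothetical slice of a domain data repository: label -> [(instance, class), ...].
REPOSITORY = {
    "John Smith": [("john-smith-1", "KMi-Member"), ("john-smith-2", "PhD-Student")],
    "ESpotter":   [("espotter", "Technology")],
}

def check_against_repository(label, annotated_class):
    """Classify an annotation by comparing it with the domain repository."""
    matches = REPOSITORY.get(label)
    if matches is None:
        return "unknown"        # escalate to background knowledge (slides 21-23)
    if len(matches) > 1:
        return "ambiguous"      # the label maps to several repository instances
    _, repo_class = matches[0]
    return "ok" if repo_class == annotated_class else "inaccurate"

print(check_against_repository("John Smith", "KMi-Member"))  # ambiguous
print(check_against_repository("ESpotter", "Person"))        # inaccurate
```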

21 When nothing can be found in the domain knowledge, the data can be:
– Correct but outside the domain (e.g., IBM in the KMi domain)
– An inaccurate annotation, i.e., a mis-classification (e.g., Sun Microsystems as a person)
– Spurious (e.g., workshop chair as an organization)
Background knowledge is then used to investigate these cases further.

22 Investigating the Semantic Web
(Flowchart: the entity is searched for on the Semantic Web using Watson. If no matches are found, the Web is examined instead (next slide). If matches are found, their classes are compared with the annotated class using WordNet: similar classes mean the data is added to the repositories; dissimilar classes indicate an Inaccurate Annotation.)
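
A rough sketch of the control flow on this slide. The helper functions are stand-ins passed as parameters (a Watson lookup, a WordNet-style class-similarity test, and the Web step of the next slide); none of them is the actual ASDI code.

```python
def evaluate_with_semantic_web(entity, annotated_class,
                               watson_search, classes_similar, examine_web):
    """Decision procedure sketched from this slide.

    watson_search(entity)    -> list of classes found for the entity on the Semantic Web
    classes_similar(c1, c2)  -> True if the two classes are similar (e.g., via WordNet)
    examine_web(entity, cls) -> verdict from the Web (PANKOW) step on the next slide
    """
    found_classes = watson_search(entity)
    if not found_classes:
        # Nothing on the Semantic Web: fall back to examining the Web.
        return examine_web(entity, annotated_class)
    if any(classes_similar(c, annotated_class) for c in found_classes):
        return "ok"            # confirmed; the data is added to the repositories
    return "inaccurate"        # matches found, but only under dissimilar classes

# Toy usage with stand-in knowledge:
verdict = evaluate_with_semantic_web(
    "Enrico Motta", "Person",
    watson_search=lambda e: ["Researcher"],
    classes_similar=lambda a, b: {a, b} <= {"Researcher", "Person"},
    examine_web=lambda e, c: "spurious",
)
print(verdict)  # ok
```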

23 Examining the Web
(Flowchart: the entity is classified from the Web using PANKOW. If no classification can be found, the annotation is Spurious. If a classification is found, it is compared with the annotated class using WordNet: dissimilar classes indicate an Inaccurate Annotation.)
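
The corresponding sketch for this Web fallback step, again with stand-in functions for the PANKOW-style classification and the WordNet similarity test (placeholder names, not the real implementation):

```python
def examine_web(entity, annotated_class, pankow_classify, classes_similar):
    """Web fallback sketched from this slide.

    pankow_classify(entity) -> a class found for the entity on the Web, or None
    classes_similar(c1, c2) -> True if the two classes are similar (e.g., via WordNet)
    """
    web_class = pankow_classify(entity)
    if web_class is None:
        return "spurious"      # the Web offers no classification for this string
    if classes_similar(web_class, annotated_class):
        return "ok"            # the Web agrees with the annotation
    return "inaccurate"        # the Web classifies it under a dissimilar class
```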

24 The Overall Picture
(Diagram: the evaluation engine (Pellet + Reiter) turns metadata into evaluation results in two steps. Step 1 uses domain knowledge: ontologies and lexical resources. Step 2 uses background knowledge: the Semantic Web via WATSON and SemSearch, the Web via PANKOW, and WordNet.)
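
Putting the two steps together, a skeleton of the overall evaluation loop; the step functions are parameters (for example the illustrative helpers sketched for the earlier slides), not the actual engine.

```python
def evaluate_entry(entity, annotated_class, domain_check, background_check):
    """Two-step evaluation loop sketched from this slide.

    domain_check(entity, cls)     -> verdict from step 1 (domain knowledge), or "unknown"
    background_check(entity, cls) -> verdict from step 2 (background knowledge)
    """
    # Step 1: domain knowledge (ontology constraints, lexicons, data repositories).
    verdict = domain_check(entity, annotated_class)
    if verdict != "unknown":
        return verdict
    # Step 2: background knowledge (Semantic Web via Watson, then the Web via PANKOW).
    return background_check(entity, annotated_class)
```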

25 Addressed Quality Problems: (a) Incomplete Annotation (b) Duplicate Annotation (c) Ambiguous Annotation (d) Spurious Annotation (e) Inaccurate Annotation (f) Inconsistent Annotation (Diagram: the same example metadata with instances I1-I4, relations R1-R2, and classes C1-C3.)

26 Experiments
Data settings (gathered in our previous work [4] on the KMi semantic web portal):
– Randomly chose 36 news stories from the KMi news archive
– Collected a metadata set using ASDI
– Constructed a gold-standard annotation
Method:
– A gold-standard based evaluation as the comparison baseline
– Evaluating the data set using domain knowledge only
– Evaluating the data set using both domain knowledge and background knowledge

27

28

29 A number of entities are not contained in the problem domain

30 Background knowledge is useful in data evaluation

31 Discussion
The performance of such an approach largely depends on:
– A good domain-specific knowledge source
– The entities in the data set being well publicised in the background knowledge sources; otherwise there will be many false alarms

32 References
1. H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL02), 2002.
2. P. Cimiano, S. Staab, and J. Tane. Acquisition of Taxonomies from Text: FCA meets NLP. In Proceedings of the ECML/PKDD Workshop on Adaptive Text Extraction and Mining, pages 10-17, 2003.
3. D. Maynard, W. Peters, and Y. Li. Metrics for Evaluation of Ontology-based Information Extraction. In Proceedings of the 4th International Workshop on Evaluation of Ontologies on the Web, Edinburgh, UK, May 2006.
4. Y. Lei, M. Sabou, V. Lopez, J. Zhu, V. S. Uren, and E. Motta. An Infrastructure for Acquiring High Quality Semantic Metadata. In Proceedings of the 3rd European Semantic Web Conference, 2006.

