© Paul Buitelaar – November 2007, Busan, South-Korea

Evaluating Ontology Search
Towards Benchmarking in Ontology Search

Paul Buitelaar, Thomas Eigner
Competence Center Semantic Web & Language Technology Lab
DFKI GmbH
Saarbrücken, Germany

Overview

- Ontology Search
  - Knowledge reuse (integration with Ontology Learning)
- OntoSelect
  - Browse (ontologies, labels, classes, properties)
  - Search by topic
- Evaluating Ontology Search
  - Benchmark (evaluation) data set
  - Experiment (compare SWOOGLE, OntoSelect)
- Conclusions

Ontology Search

- More and more ontologies are being published on the (Semantic) Web
  - Available as RDFS or OWL files (some still as DAML)
- This opens up possibilities for knowledge reuse
  - Access through ontology search engines and/or (manual or automatic) organization in ontology libraries
- But: it is increasingly hard to find the right ontology for a given application
- Growing research in ontology search/selection (Alani et al., Buitelaar et al., Ding et al., Sabou et al.)
  - Systems: SWOOGLE, OntoSelect, Watson

OntoSelect

Ontology Library and Search Engine
- Monitors the web for ontologies, with automatic harvesting and indexing
- Browse and search
  - Over ontologies, classes, properties and (multilingual) labels
  - Ontology search integrates relevance feedback from Wikipedia for the search term
- Ontology publishing
  - Submitted ontologies are integrated automatically
- Statistics
  - On formats, languages, labels used, ontology publishing

Paul Buitelaar, Thomas Eigner, Thierry Declerck. OntoSelect: A Dynamic Ontology Library with Support for Ontology Selection. In: Proc. of the Demo Session at the International Semantic Web Conference, Hiroshima, Japan, Nov

OntoSelect – Browse

Ontology Search

Keyword as Wikipedia Topic

Keyword Expansion (Extraction): Relevance Feedback from Wikipedia

Ranked Results (Browsable)

Search Criteria

Relevance criteria address ontology content, structure and status:
- Coverage: Term Matching
  - How many of the terms in a text collection are covered by labels for classes and properties?
- Structure: Properties Relative to Classes
  - How detailed is the knowledge structure that the ontology represents?
- Connectedness: Number of Included Ontologies
  - Is the ontology connected to other ontologies, and how well established are these?

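The three criteria above could be combined into a single ranking score roughly as follows. This is only a minimal sketch: the `Ontology` record, the three scoring functions and the linear weighting are assumptions made for illustration, not the formulas used in OntoSelect, which the slides do not give.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical, simplified view of an indexed ontology."""
    name: str
    class_labels: set = field(default_factory=set)
    property_labels: set = field(default_factory=set)
    imported_ontologies: int = 0  # number of included (imported) ontologies


def coverage(onto, terms):
    """Fraction of the given terms matched by a class or property label."""
    if not terms:
        return 0.0
    labels = {label.lower() for label in onto.class_labels | onto.property_labels}
    matched = sum(1 for term in terms if term.lower() in labels)
    return matched / len(terms)


def structure(onto):
    """Number of properties relative to classes (0.0 if there are no classes)."""
    if not onto.class_labels:
        return 0.0
    return len(onto.property_labels) / len(onto.class_labels)


def connectedness(onto):
    """Number of included ontologies, used as-is (an assumed simplification)."""
    return float(onto.imported_ontologies)


def score(onto, terms, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three criteria; the weights are free parameters."""
    w_cov, w_str, w_con = weights
    return (w_cov * coverage(onto, terms)
            + w_str * structure(onto)
            + w_con * connectedness(onto))
```

Ranking a result list then amounts to sorting candidate ontologies by `score` in descending order; changing `weights` corresponds to one "weighting of relevance criteria" configuration.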
Evaluation – Benchmark

Benchmark: 15 Wikipedia topics and 57 manually assigned ontologies, out of 1056 ontologies cached through OntoSelect

The 15 Wikipedia topics were selected from the set of all 37284 class/property labels in OntoSelect:
1. Filter out labels that do not correspond to a Wikipedia page -> 5658 labels/topics
2. Use the 5658 labels as search terms in SWOOGLE and filter out labels that return fewer than 10 ontologies (out of the 1056 in OntoSelect) -> 3084 labels/topics
3. Manually select useful topics from the 3084 labels, e.g. leaving out very short labels ('v') and very abstract ones ('thing') -> 50 topics
4. Randomly select 15 topics and manually check the ontologies retrieved for them from OntoSelect and SWOOGLE -> 15 topics with 57 assigned ontologies

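The topic selection steps can be sketched as a small filtering pipeline. All function and parameter names here are hypothetical: `has_wikipedia_page` and `hits_for_label` stand in for a Wikipedia lookup and a SWOOGLE query, and `is_useful` stands in for the manual selection step, which cannot really be automated.

```python
import random


def build_topic_benchmark(labels, has_wikipedia_page, hits_for_label, is_useful,
                          min_hits=10, sample_size=15, seed=0):
    """Return a random sample of topics that pass all three filters.

    labels             -- iterable of class/property labels
    has_wikipedia_page -- callable(label) -> bool (assumed lookup)
    hits_for_label     -- callable(label) -> int, ontologies returned for the label
    is_useful          -- callable(label) -> bool, stand-in for manual selection
    """
    # Step 1: keep only labels that correspond to a Wikipedia page.
    with_page = [label for label in labels if has_wikipedia_page(label)]
    # Step 2: drop labels that return fewer than `min_hits` ontologies.
    frequent = [label for label in with_page if hits_for_label(label) >= min_hits]
    # Step 3: drop labels judged not useful (very short, very abstract, ...).
    useful = [label for label in frequent if is_useful(label)]
    # Step 4: draw a reproducible random sample of topics.
    rng = random.Random(seed)
    return rng.sample(useful, min(sample_size, len(useful)))
```

The final manual check of the retrieved ontologies, which yields the relevance assessments, is outside the scope of this sketch.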
Evaluation – Benchmark by Topic

15 (Wikipedia) topics with number of assigned ontologies:
- Atmosphere (2)
- Biology (11)
- City (3) katyn/CMSC828y/location.daml
- Communication (10)
- Economy (1)
- Infrastructure (2)
- Institution (1)
- Math (3)
- Military (5)
- Newspaper (2)
- Oil (0)
- Production (1)
- Publication (6)
- Railroad (1)
- Tourism (9)

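As a quick consistency check, the per-topic counts listed on this slide can be held in a plain mapping and summed. The variable name is an assumption for illustration; the numbers are taken directly from the slide.

```python
# Number of manually assigned ontologies per Wikipedia topic (from the slide).
BENCHMARK_COUNTS = {
    "Atmosphere": 2, "Biology": 11, "City": 3, "Communication": 10,
    "Economy": 1, "Infrastructure": 2, "Institution": 1, "Math": 3,
    "Military": 5, "Newspaper": 2, "Oil": 0, "Production": 1,
    "Publication": 6, "Railroad": 1, "Tourism": 9,
}

topic_count = len(BENCHMARK_COUNTS)              # 15 topics
assignment_count = sum(BENCHMARK_COUNTS.values())  # 57 assigned ontologies
```

The totals agree with the 15 topics and 57 assigned ontologies stated for the benchmark. Note that one topic (Oil) has no assigned ontology, so any recall-style measure needs a convention for that case.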
Evaluation – Experiment

- Comparison of (average) results between SWOOGLE and OntoSelect
- Use the OntoSelect benchmark
  - 15 topics (queries)
  - 57 assigned ontologies (relevance assessments)
  - 1056 ontologies (data set)
- Use different configurations for OntoSelect
  - With/without keyword expansion/extraction
  - With/without class names (in addition to labels)
  - With/without property labels
  - Weighting of relevance criteria
  - …

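One way such an experiment could be scored is sketched below. The slides do not state which measures were averaged, so precision and recall at a cutoff `k` are an assumption here, as are all function names and the configuration keys.

```python
from itertools import product


def precision_recall_at_k(ranked, relevant, k=10):
    """Precision and recall of the top-k results against the assessments."""
    top = ranked[:k]
    hits = sum(1 for onto in top if onto in relevant)
    precision = hits / len(top) if top else 0.0
    # Convention (assumed): recall is 0.0 for a topic without assigned ontologies.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_scores(runs, assessments, k=10):
    """Average precision/recall over all topics.

    runs        -- dict: topic -> ranked list of ontology identifiers
    assessments -- dict: topic -> set of relevant ontology identifiers
    """
    pairs = [precision_recall_at_k(runs.get(topic, []), relevant, k)
             for topic, relevant in assessments.items()]
    if not pairs:
        return 0.0, 0.0
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# All on/off combinations of the three switches named on the slide.
OPTION_NAMES = ("keyword_expansion", "class_names", "property_labels")
CONFIGS = [dict(zip(OPTION_NAMES, flags))
           for flags in product((True, False), repeat=len(OPTION_NAMES))]
```

Each entry in `CONFIGS` would correspond to one OntoSelect run, whose ranked results are passed to `mean_scores` together with the benchmark assessments; the weighting of the relevance criteria would add further dimensions to this grid.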
Evaluation – Results

Evaluation – Weighting of 'title'

Conclusions

- It is too early to draw conclusions from the evaluation
  - Many more configurations (weights) remain to be compared
  - The benchmark needs to be extended
  - Comparison with other ontology search engines is still needed
- Main contribution of the presented work
  - First comprehensive benchmark for topic-driven evaluation of ontology search
  - The (extended) benchmark will be made publicly available