AT, Anja Theobald, University of the Saarland, Germany. An Ontology for Domain-oriented Semantic Similarity Search on XML Data (BTW), February 25 – 28, 2003, Leipzig, Germany
Motivation
  Query on Web data:
  Ranking based on content data and structure (XML, …)
  Grouping results by their topics
  Using ontologies for similarity search
  Example topics: movie, astronomy, sports
Outline
  0. Why do we need Ranked Retrieval and Ontologies?
  1. XXL Search Engine
  2. Ontologies - a Linguistic Challenge
  3. Graph-based Ontology
  4. Quantification: Edge Weights
  5. Similarity of Ontology Nodes
  6. Ontology-based Query Processing
XXL Search Engine
  Architecture (figure): Visual XXL; WWW; Crawler; Path Indexer, Content Indexer, Name Ontology Indexer, Content Ontology Indexer; EPI, ECI, NOI, COI; Query Processor with EPI Handler, ECI Handler, Name Ontology Handler, Content Ontology Handler
  XXL Query: SELECT * FROM INDEX WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star" …
  XML Document (excerpt): sun … light and heat …
Ontologies – a linguistic challenge
  ontology: ...representational vocabulary of words including hierarchical relationships and associative relationships between these words [Gruber93]...
  Triangle (figure) relating word, sense, and object; edge labels: symbolizes, stands for, refers to
  word: star
  sense: ...a celestial body of hot gases...
  object: (pictured in the figure)
Word – Sense – Synset
  words $w \in \Sigma^*$ + word senses + synonym relationship
  $U = \{(w,s) \mid w \in \Sigma^*,\ s \in S:\ \text{word } w \text{ has sense } s\}$
  $\mathrm{synset}(s) = \{\, w \mid (w,s) \in U \,\}$
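
The set definitions on this slide can be sketched in a few lines of Python. The relation `U` below is a toy example; the sense identifiers such as `star#1` are hypothetical labels introduced only for illustration and do not come from the talk.

```python
# Toy relation U of (word, sense) pairs. The sense ids are hypothetical.
U = {
    ("star", "star#1"),
    ("star", "star#4"),
    ("sun", "sun#1"),
    ("celestial body", "celestial_body#1"),
    ("heavenly body", "celestial_body#1"),
}


def synset(s, relation=U):
    """synset(s) = { w | (w, s) in U }: all words that share sense s."""
    return {w for (w, sense) in relation if sense == s}


def senses(w, relation=U):
    """All senses recorded for word w (a word may be ambiguous)."""
    return {sense for (word, sense) in relation if word == w}


print(synset("celestial_body#1"))
print(senses("star"))
```

The word "star" maps to two senses here, which is the ambiguity that the category on the following slide is meant to resolve.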
Disambiguation: Synset – Category
  $\mathrm{synset}(s) = \{\, w \mid (w,s) \in U \,\}$, where $U = \{(w,s) \mid \text{word } w \text{ has sense } s\}$
  + hypernym relationship
  $\mathrm{category}(s) = \{\, \mathrm{synset}(s') \mid \mathrm{synset}(s') \text{ is hypernym of } \mathrm{synset}(s) \,\}$
  sense 1: (astronomy) a celestial body of hot gases…
    synset: star
    hypernyms: celestial body, heavenly body → natural object → object, physical object → entity, physical thing
  sense 4: a plane figure with 5 or more points…
    synset: star
    hypernyms: plane figure, 2-dim. figure → figure → shape, form → attribute → abstraction
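
As a rough sketch of how a category separates the two senses of "star", the code below stores direct hypernym links and derives category(s) from them. Two points are assumptions: the sense identifiers are hypothetical, and category(s) is taken to mean the direct hypernym synsets only (the slide does not say whether transitive hypernyms are included).

```python
# Hypothetical sense ids; the words and hypernym links follow the slide.
SYNSET = {
    "star#1": frozenset({"star"}),
    "celestial_body#1": frozenset({"celestial body", "heavenly body"}),
    "natural_object#1": frozenset({"natural object"}),
    "object#1": frozenset({"object", "physical object"}),
    "entity#1": frozenset({"entity", "physical thing"}),
    "star#4": frozenset({"star"}),
    "plane_figure#1": frozenset({"plane figure", "2-dim. figure"}),
}

# Direct hypernym links: sense id -> list of hypernym sense ids.
HYPERNYMS = {
    "star#1": ["celestial_body#1"],
    "celestial_body#1": ["natural_object#1"],
    "natural_object#1": ["object#1"],
    "object#1": ["entity#1"],
    "star#4": ["plane_figure#1"],
}


def category(s):
    """Assumption: category(s) is the set of direct hypernym synsets of s."""
    return {SYNSET[h] for h in HYPERNYMS.get(s, [])}


def hypernym_chain(s):
    """Follow the first hypernym link upwards until no link is left."""
    chain = []
    current = s
    while HYPERNYMS.get(current):
        current = HYPERNYMS[current][0]
        chain.append(current)
    return chain


print(category("star#1"))
print(category("star#4"))
print(hypernym_chain("star#1"))
```

Both senses have the same synset {star}, but their categories differ, so the pair (synset, category) identifies the intended sense.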
Example Ontology
  Nodes, written as synset [category]:
  entity, physical thing [entity, physical thing]
  food [substance, matter]
  milk [foodstuff,...]
  cows' milk [milk]
  group, grouping [group, grouping]
  galaxy,... [collection,...]
  milky way [galaxy,...]
  natural object [object,...]
  sun [star]
  universe, cosmos [collection,...]
  star [celestial body,...]
  Beta Centauri [star]
  abstraction [abstraction]
  star [plane figure, 2-dim figure]
  hexagram [star]
  Edge weights shown in the figure: 0.71, 0.83, 0.94
Graph-based Ontology
  Ontology $G = (V, E)$
  $x = (\mathrm{synset}(s), \mathrm{category}(s)) \in V$
  $e = (x, y, \mathit{type}, \mathit{weight}) \in E$
  Construction:
    word: ... extracted from a document
    category, type: ... extracted from an existing thesaurus (interchangeable!)
    weight: ... expresses semantic similarity of connected words
  Use:
    sim: ... expresses semantic similarity of ontology nodes
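
One possible in-memory representation of G = (V, E) is sketched below. The class and method names are hypothetical, the weight in the example is made up, and two choices are assumptions rather than statements from the talk: weights are restricted to [0, 1], and edges are treated as undirected when looking up neighbors.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Ontology node x = (synset(s), category(s))."""
    synset: frozenset
    category: frozenset


@dataclass(frozen=True)
class Edge:
    """Ontology edge e = (x, y, type, weight)."""
    source: Node
    target: Node
    edge_type: str
    weight: float


@dataclass
class Ontology:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, x, y, edge_type, weight):
        # Assumption: weights are similarity values in [0, 1].
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")
        self.nodes.update((x, y))
        self.edges.append(Edge(x, y, edge_type, weight))

    def neighbors(self, x):
        """Neighbors of x; edges are treated as undirected (assumption)."""
        result = []
        for e in self.edges:
            if e.source == x:
                result.append((e.target, e.edge_type, e.weight))
            elif e.target == x:
                result.append((e.source, e.edge_type, e.weight))
        return result


sun = Node(frozenset({"sun"}), frozenset({"star"}))
star = Node(frozenset({"star"}), frozenset({"celestial body", "heavenly body"}))

onto = Ontology()
onto.add_edge(sun, star, "hypernym", 0.5)  # 0.5 is a made-up weight
print(onto.neighbors(sun))
```

Frozen dataclasses with `frozenset` fields are hashable, so nodes can be stored in a set and compared by value.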
Quantification: Edge Weight
  semantic similarity of connected synsets according to their concepts
  vector space measures / probabilistic measures
  Example nodes: galaxy, extragalactic nebula [collection, aggregation, accumulation, assemblage]; star [celestial body, heavenly body]; sun [star]
  DICE coefficient: …using web search engines for word frequencies…
  X := (coll … ass) (galaxy extr…)
  Y := (cel heav) (star)
  $X \cap Y := X \wedge Y$
  Edge weights shown in the figure: 0.172, 0.113
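
The Dice coefficient in its standard form is $2\,|X \cap Y| / (|X| + |Y|)$. A minimal sketch follows, assuming that |X|, |Y|, and |X ∩ Y| are approximated by hit counts obtained elsewhere, for example from a web search engine as the slide suggests. The function name and the counts in the example are hypothetical; no real search engine API is called.

```python
def dice(freq_x, freq_y, freq_xy):
    """Dice coefficient 2*|X ∩ Y| / (|X| + |Y|) from three frequency counts.

    freq_x  -- number of documents matching X
    freq_y  -- number of documents matching Y
    freq_xy -- number of documents matching both X and Y
    """
    total = freq_x + freq_y
    if total == 0:
        # Neither term occurs anywhere: define the weight as 0.
        return 0.0
    return 2.0 * freq_xy / total


# Made-up counts, for illustration only.
weight = dice(freq_x=1000, freq_y=3000, freq_xy=200)
print(weight)
```

The result always lies in [0, 1] as long as the joint count does not exceed either individual count, which makes it usable directly as an edge weight.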
Similarity of Ontology Nodes
  Nodes, written as synset [category]: entity [entity]; cows' milk [milk]; milk [liquid]; protein [macromolecule]; group [group]; galaxy [collection]; milky way [galaxy]; sun [star]; universe [collection]; star [celestial body]; Beta Centauri [star]; natural object [object]
  Edge weights shown in the figure: 0.2, 0.6, 0.5, 0.8, 0.1, 0.3, 0.6
  sim(milky way, sun), paths of length |p| = 3: … 0.8 = 1.2 and … 0.6 = 1.3
  sim(milky way, sun) = 0.42
  sim(milky way, cows' milk) = 0.2
Ontology-based Query Processing
  XXL Query: ... WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"
  XXL Query Representation (graph): ~universe, ~appearance, %, %, ~ "star"
  XML Documents (excerpt): sun … light and heat …
Ontology-based Query Processing
  XXL Query: ... WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"
  XXL Query Representation (graph): ~universe, ~appearance, %, %, ~ "star"
  XML Data Graph (nodes): galaxy, object, description, sun, appearance, location, history, "…light and heat…"
  Local scores annotated in the figure: sim(universe, galaxy); sim(app, app); sim(star, sun) * tfidf(sun); values 0.43, 1.0, 1.0
  (result graph) = 0.4
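
The slide annotates the content condition with sim(star, sun) * tfidf(sun). How the local scores are combined into the score of the result graph is not spelled out here, so the sketch below assumes a plain product of all local scores. The function names and all numbers are made up for illustration and are not the values from the figure.

```python
from math import prod


def content_score(sim_value, tfidf_value):
    """Score of a content condition: sim(query term, matched term) * tfidf."""
    return sim_value * tfidf_value


def result_score(local_scores):
    """Assumption: the result graph score is the product of all local scores."""
    return prod(local_scores)


# Made-up local scores for the three query conditions.
local_scores = [
    0.5,                      # sim(universe, galaxy)
    1.0,                      # sim(appearance, appearance): exact name match
    content_score(0.8, 0.5),  # sim(star, sun) * tfidf(sun)
]
print(result_score(local_scores))
```

With a product, an exact match contributes a factor of 1.0 and leaves the score unchanged, while any weak match lowers the score of the whole result graph.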
END - Thank you very much! Are there any questions?