Download presentation
Presentation is loading. Please wait.
Published byFlora Rose Modified over 9 years ago
1
Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006 Rapha ë l Troncy, Umberto Straccia
2
Motivation Various SW repositories, using different vocabularies, distributed on the web Already large amounts of data out there Swoogle hits 1.5M unique Semantic Web documents (05/06/2006) Problem: How to search and retrieve information in such an environment?
3
Example scenario Kim Clijsters (courtesy of AFP) Montenegro independence (courtesy of Euronews) NewsML SportsML EventsML iCalendar TimeML... EXIF EBU P/Meta MXF MPEG-7
4
Distributed Search in the SW: Resource selection Select a subset of some relevant resources Query reformulation Reformulate the information need into the vocabulary used by the resource Data fusion and rank aggregation Merged and ranked all the results together
5
Resource Selection Compute an approximation of the content of each resources For some random queries, an approximation consists of: The ontology the resource relies on Some instances (sampling annotated documents)
6
Query reformulation Transformation rules: From the query vocabulary to the vocabularies used by the resources Semantic Web: Ontology Alignment Establishing relationships holding between the entities (subsumption, equivalence, disjointness…) With a confidence measure Automatically computed an ontology alignment framework
7
oMAP: Ontology Alignment Tool
8
oMAP: A Formal Framework Sources of inspiration: Formal work in data exchange [Fagin et al., 2003] GLUE: combining several specialized components for finding the best set of mappings [Doan et al., 2003] Notation: A mapping is a triple: M = (T, S, ∑ ) S and T are the source and target ontologies S i is an OWL entity (class, datatype property, object property) of the ontology ∑ is a set of mapping rules: α ij T j ← S i
9
oMAP: Combining Classifiers Weight of a mapping rule: α ij = w (S i,T j, ∑ ) Using different classifiers: w (S i,T j, CL k ) is the classifier's approximation of the rule T j ← S i Combining the approximations: Use of a priority list: CL 1 CL 2 … CL n Weighted average of the classifiers prediction
10
Terminological Classifiers Same entity names (or URI) Same entity name stems
11
Terminological Classifiers String distance name Iterative substring matching See [Stoilos et al., ISWC'05]
12
Terminological Classifiers WordNet distance name lcs is the longest common substring between S i and T j sim =
13
Machine Learning-Based Classifiers Collecting bag of words: label for the named individuals data value for the datatype properties type for the anonymous individuals and the range of object properties … Recursion on the OWL definition: depth parameter Use statistical methods on the collected bag of words
14
Machine Learning-Based Classifiers Example Individual (x 1 type (Conference) value (label "European SW Conf") value (location x 2 ) ) Individual (x 2 type (Address) value (city "Budva") value (country "Montenegro") ) u1 = (" European SW Conf ", "Address") u2 = ("Address", "Budva", "Montenegro") Naïve Bayes text classifier kNN text classifier
15
Structural and Semantics-Based Classifier ∑ is a set of mapping rules: α ij T j ← S i ∑ sets are computed by taking the OWL definition of the entities to align recursively in the OWL structure... without looping thanks to cycles detection
16
Structural and Semantics-Based Classifier If S i and T j are property names: If S i and T j are concept names 1 : 1 Where D = D(S i ) * D(T j ) ; D(S i ) represents the set of concepts directly parent of S i
17
Structural and Semantics-Based Classifier Let C S =(QR.C) and D T =(Q’R’.D), then 1 : Let C S =(op C 1 …C m ) and D T =(op’ D 1 …D m ), then 2 : 1 Where Q,Q’ are quantifiers, R,R’ are property names and C,D concept expressions 2 Where op, op’ are concept constructors and n,m ≥ 1
18
Evaluation OAEI Contests (2004, 2005, 2006): http://oaei.ontologymatching.org/ http://oaei.ontologymatching.org/ Systematic benchmark tests on bibliographic data Tests 2xx: aligning an ontology with variations of itself where each OWL constructs are discarded or modified one per one Tests 3xx: four real bibliographic ontologies Web categories alignment http://oaei.ontologymatching.org/2005/results/
19
Benchmark Tests dublin200.92 Falcon0.91 FOAM0.90 oMAP0.85 CMS0.81 OLA0.80 ctxMatch0.72 edna0.45 Falcon0.89 OLA0.74 dublin200.72 FOAM0.69 oMAP0.68 edna0.61 ctxMatch0.20 CMS0.18 oMAP: 4 th with the global F-Measure 1 st on 3xx tests (real ontologies to align) Precision Recall
20
Aligning Web Categories Aligning Google, Loksmart and Yahoo web categories [Avesani et al., ISWC'05] Blind tests: only recall results are available ctxMatchFOAMCMSDublin20FalconOLAoMAP 9.4%11.9%14.1%26.5%31.2%32.0%34.4%
21
Distributed Search in the SW Q: retrieve course material dealing with history of the Americas and "Columbus" query(d)<- History_Americas(d,"Columbus") is re-written as two queries 0.63 query(d)<- Latin_American_History(d,"Columbus") 0.84 query(d)<- American_History(d,"Columbus") Each document score is then multiplied with the confidence score of the rule. university courses
22
Conclusion Distributed Search in the SW resource selection / query reformulation / data fusion and rank aggregation oMAP: a formal framework for aligning automatically OWL ontologies Combining several specific classifiers Terminological classifiers Machine learning-based classifiers Structural and semantics-based classifier
23
Future Work Implementing the three steps proposed Keyword-based or structured (SPARQL) queries Ranked list of results oMAP Using additional classifiers: KL-distance, other resources, background K, etc. Straightforward theoretically but practically difficult! Finding complex alignment name = firstName + lastName OWL and rule-based languages: Take into account this additional expressivity
24
http://www.cwi.nl/~troncy/oMAP/ Any questions ?
26
Structural and Semantics-Based Classifier Possible values for w op and w Q weights w op w Q ⊓⊔ ¬ ⊓ 11/40 ⊔ 10 ¬1 1 1 n n m11/3 m 1
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.