Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Wednesday 14 th of June, 2006.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

Schema Matching and Query Rewriting in Ontology-based Data Integration Zdeňka Linková ICS AS CR Advisor: Július Štuller.
The 20th International Conference on Software Engineering and Knowledge Engineering (SEKE2008) Department of Electrical and Computer Engineering
Learning to Map between Ontologies on the Semantic Web AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy Databases and Data Mining group University.
Leveraging Data and Structure in Ontology Integration Octavian Udrea 1 Lise Getoor 1 Renée J. Miller 2 1 University of Maryland College Park 2 University.
Of 27 lecture 7: owl - introduction. of 27 ece 627, winter ‘132 OWL a glimpse OWL – Web Ontology Language describes classes, properties and relations.
Leveraging Community-built Knowledge For Type Coercion In Question Answering Aditya Kalyanpur, J William Murdock, James Fan and Chris Welty Mehdi AllahyariSpring.
Data Intensive Techniques to Boost the Real-time Performance of Global Agricultural Data Infrastructures SEMAGROW U SING A POWDER T RIPLE S TORE FOR BOOSTING.
Ameet N Chitnis, Abir Qasem and Jeff Heflin 11 November 2007.
1 CS 430 / INFO 430 Information Retrieval Lecture 8 Query Refinement: Relevance Feedback Information Filtering.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 8 The Enhanced Entity- Relationship (EER) Model.
Gimme’ The Context: Context- driven Automatic Semantic Annotation with CPANKOW Philipp Cimiano et al.
Semantics For the Semantic Web: The Implicit, the Formal and The Powerful Amit Sheth, Cartic Ramakrishnan, Christopher Thomas CS751 Spring 2005 Presenter:
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
Scalable Text Mining with Sparse Generative Models
SemanTic Interoperability To access Cultural Heritage Frank van Harmelen Henk Matthezing Peter Wittenburg Marjolein van Gendt Antoine Isaac Lourens van.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
OMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR
Semantic Interoperability Jérôme Euzenat INRIA & LIG France Natasha Noy Stanford University USA.
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
Language Identification of Search Engine Queries Hakan Ceylan Yookyung Kim Department of Computer Science Yahoo! Inc. University of North Texas 2821 Mission.
Ontology Alignment/Matching Prafulla Palwe. Agenda ► Introduction  Being serious about the semantic web  Living with heterogeneity  Heterogeneity problem.
A Motivating Scenario for Designing an Extensible Audio- Visual Description Language Monday 25 th of October, 2004 Raphaël Troncy, Jean Carrive, Steffen.
RuleML-2007, Orlando, Florida1 Towards Knowledge Extraction from Weblogs and Rule-based Semantic Querying Xi Bai, Jigui Sun, Haiyan Che, Jin.
BACKGROUND KNOWLEDGE IN ONTOLOGY MATCHING Pavel Shvaiko joint work with Fausto Giunchiglia and Mikalai Yatskevich INFINT 2007 Bertinoro Workshop on Information.
Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher Laura Po and Sonia Bergamaschi DII, University of Modena and Reggio Emilia, Italy.
12th of October, 2006KEG seminar1 Combining Ontology Mapping Methods Using Bayesian Networks Ontology Alignment Evaluation Initiative 'Conference'
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Glasgow 02/02/04 NN k networks for content-based image retrieval Daniel Heesch.
© Paul Buitelaar – November 2007, Busan, South-Korea Evaluating Ontology Search Towards Benchmarking in Ontology Search Paul Buitelaar, Thomas.
Theory and Application of Database Systems A Hybrid Approach for Extending Ontology from Text He Wei.
A Panoramic Approach to Integrated Evaluation of Ontologies in the Semantic Web S. Dasgupta, D. Dinakarpandian, Y. Lee School of Computing and Engineering.
Metadata. Generally speaking, metadata are data and information that describe and model data and information For example, a database schema is the metadata.
Dimitrios Skoutas Alkis Simitsis
Chapter 6: Information Retrieval and Web Search
Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.
Problems in Semantic Search Krishnamurthy Viswanathan and Varish Mulwad {krishna3, varish1} AT umbc DOT edu 1.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Using Several Ontologies for Describing Audio-Visual Documents: A Case Study in the Medical Domain Sunday 29 th of May, 2005 Antoine Isaac 1 & Raphaël.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Aligner automatiquement des ontologies avec Tuesday 23 rd of January, 2007 Rapha ë l Troncy.
ISWC2007, Nov. 14. Discovering simple mappings between Relational database schemas and ontologies Wei Hu, Yuzhong Qu {whu,
Ontology Mapping in Pervasive Computing Environment C.Y. Kong, C.L. Wang, F.C.M. Lau The University of Hong Kong.
Semi-Automatic Quality Assessment of Linked Data without Requiring Ontology Saemi Jang, Megawati, Jiyeon Choi, and Mun Yong Yi KIRD, KAIST NLP&DBPEDIA.
Metadata Common Vocabulary a journey from a glossary to an ontology of statistical metadata, and back Sérgio Bacelar
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Class Imbalance in Text Classification
Text Based Similarity Metrics and Delta for Semantic Web Graphs Krishnamurthy Koduvayur Viswanathan Monday, June 28,
AIFB Ontology Mapping I3CON Workshop PerMIS August 24-26, 2004 Washington D.C., USA Marc Ehrig Institute AIFB, University of Karlsruhe.
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
Multilingual Information Retrieval using GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of Kaohsiung.
ISO TC37/SC4 N435 Nov 12, 2007 Presented by Miran Choi/ETRI Written by Jae Sung Lee/Chungbuk National Univ.
A Multilingual Hierarchy Mapping Method Based on GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
A Reduced Yet Extensible Audio- Visual Description Language: How to Escape From The MPEG-7 Bottleneck Thursday 28 th of October, 2004 Raphaël Troncy, Jean.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Query Refinement and Relevance Feedback.
Semantic Interoperability in GIS N. L. Sarda Suman Somavarapu.
Semantic search-based image annotation Petra Budíková, FI MU CEMI meeting, Plzeň,
Cross-Ontological Relationships
Introduction to Unified Modeling Language (UML)
NJVR: The NanJing Vocabulary Repository
Presentation 王睿.
Property consolidation for entity browsing
[jws13] Evaluation of instance matching tools: The experience of OAEI
An Interactive Approach to Collectively Resolving URI Coreference
Searching with context
Integrating Taxonomies
Semantic Resolution in a Simple E-Commerce Application
Presentation transcript:

Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Wednesday 14 th of June, 2006 Rapha ë l Troncy, Umberto Straccia

Motivation Various SW repositories, using different vocabularies, distributed on the web Already large amounts of data out there Swoogle hits 1.5M unique Semantic Web documents (05/06/2006) Problem: How to search and retrieve information in such an environment?

Example scenario Kim Clijsters (courtesy of AFP) Montenegro independence (courtesy of Euronews) NewsML SportsML EventsML iCalendar TimeML... EXIF EBU P/Meta MXF MPEG-7

Distributed Search in the SW: Resource selection Select a subset of some relevant resources Query reformulation Reformulate the information need into the vocabulary used by the resource Data fusion and rank aggregation Merged and ranked all the results together

Resource Selection Compute an approximation of the content of each resources For some random queries, an approximation consists of: The ontology the resource relies on Some instances (sampling annotated documents)

Query reformulation Transformation rules: From the query vocabulary to the vocabularies used by the resources Semantic Web: Ontology Alignment Establishing relationships holding between the entities (subsumption, equivalence, disjointness…) With a confidence measure Automatically computed  an ontology alignment framework

oMAP: Ontology Alignment Tool

oMAP: A Formal Framework Sources of inspiration: Formal work in data exchange [Fagin et al., 2003] GLUE: combining several specialized components for finding the best set of mappings [Doan et al., 2003] Notation: A mapping is a triple: M = (T, S, ∑ )  S and T are the source and target ontologies  S i is an OWL entity (class, datatype property, object property) of the ontology  ∑ is a set of mapping rules: α ij T j ← S i

oMAP: Combining Classifiers Weight of a mapping rule: α ij = w (S i,T j, ∑ ) Using different classifiers: w (S i,T j, CL k ) is the classifier's approximation of the rule T j ← S i Combining the approximations: Use of a priority list: CL 1 CL 2 … CL n Weighted average of the classifiers prediction

Terminological Classifiers Same entity names (or URI) Same entity name stems

Terminological Classifiers String distance name Iterative substring matching See [Stoilos et al., ISWC'05]

Terminological Classifiers WordNet distance name lcs is the longest common substring between S i and T j sim =

Machine Learning-Based Classifiers Collecting bag of words: label for the named individuals data value for the datatype properties type for the anonymous individuals and the range of object properties … Recursion on the OWL definition: depth parameter Use statistical methods on the collected bag of words

Machine Learning-Based Classifiers Example Individual (x 1 type (Conference) value (label "European SW Conf") value (location x 2 ) ) Individual (x 2 type (Address) value (city "Budva") value (country "Montenegro") ) u1 = (" European SW Conf ", "Address") u2 = ("Address", "Budva", "Montenegro") Naïve Bayes text classifier kNN text classifier

Structural and Semantics-Based Classifier ∑ is a set of mapping rules: α ij T j ← S i ∑ sets are computed by taking the OWL definition of the entities to align recursively in the OWL structure... without looping thanks to cycles detection

Structural and Semantics-Based Classifier If S i and T j are property names: If S i and T j are concept names 1 : 1 Where D = D(S i ) * D(T j ) ; D(S i ) represents the set of concepts directly parent of S i

Structural and Semantics-Based Classifier Let C S =(QR.C) and D T =(Q’R’.D), then 1 : Let C S =(op C 1 …C m ) and D T =(op’ D 1 …D m ), then 2 : 1 Where Q,Q’ are quantifiers, R,R’ are property names and C,D concept expressions 2 Where op, op’ are concept constructors and n,m ≥ 1

Evaluation OAEI Contests (2004, 2005, 2006): Systematic benchmark tests on bibliographic data  Tests 2xx: aligning an ontology with variations of itself where each OWL constructs are discarded or modified one per one  Tests 3xx: four real bibliographic ontologies Web categories alignment

Benchmark Tests dublin Falcon0.91 FOAM0.90 oMAP0.85 CMS0.81 OLA0.80 ctxMatch0.72 edna0.45 Falcon0.89 OLA0.74 dublin FOAM0.69 oMAP0.68 edna0.61 ctxMatch0.20 CMS0.18 oMAP: 4 th with the global F-Measure 1 st on 3xx tests (real ontologies to align) Precision Recall

Aligning Web Categories Aligning Google, Loksmart and Yahoo web categories [Avesani et al., ISWC'05] Blind tests: only recall results are available ctxMatchFOAMCMSDublin20FalconOLAoMAP 9.4%11.9%14.1%26.5%31.2%32.0%34.4%

Distributed Search in the SW Q: retrieve course material dealing with history of the Americas and "Columbus" query(d)<- History_Americas(d,"Columbus") is re-written as two queries 0.63 query(d)<- Latin_American_History(d,"Columbus") 0.84 query(d)<- American_History(d,"Columbus") Each document score is then multiplied with the confidence score of the rule. university courses

Conclusion Distributed Search in the SW resource selection / query reformulation / data fusion and rank aggregation oMAP: a formal framework for aligning automatically OWL ontologies Combining several specific classifiers Terminological classifiers Machine learning-based classifiers Structural and semantics-based classifier

Future Work Implementing the three steps proposed Keyword-based or structured (SPARQL) queries Ranked list of results oMAP Using additional classifiers:  KL-distance, other resources, background K, etc.  Straightforward theoretically but practically difficult! Finding complex alignment  name = firstName + lastName OWL and rule-based languages:  Take into account this additional expressivity

Any questions ?

Structural and Semantics-Based Classifier Possible values for w op and w Q weights w op w Q ⊓⊔ ¬ ⊓ 11/40 ⊔ 10 ¬1   1  1  n  n  m11/3  m 1