and Knowledge Graphs for Query Expansion Saeid Balaneshinkordan


Textual Data Analytics (TEANA) lab
Saeid Balaneshinkordan, saeid@wayne.edu
Alexander Kotov, kotov@wayne.edu

ConceptNet, DBpedia and Freebase

ConceptNet 5 is the largest common sense knowledge base, featuring a diverse relational ontology of 20 relationship types.
DBpedia is a structured version of Wikipedia in RDF format.
Freebase, like DBpedia, describes entities as RDF triples, and covers a more comprehensive list of concepts than DBpedia.

Problem

Difficult queries are queries for which most of the top results are irrelevant (AP < 0.1). Some of the main causes:
Vocabulary mismatch: searchers and authors of relevant documents use different terms to refer to the same concepts.
Partially specified and poorly formulated information needs.

Challenges

Query results can be improved through query expansion using explicit or pseudo-relevance feedback (RF). However, RF is ineffective for difficult queries because the initial retrieval results contain no positive relevance signals. External resources (e.g. term graphs) can be utilized instead.

Research question: how do statistical term association graphs compare with term graphs derived from knowledge bases in terms of retrieval effectiveness for normal and difficult queries?
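To make the AP < 0.1 criterion for difficult queries concrete, here is a minimal sketch of how average precision could be computed for one ranked result list and compared against that threshold. The function names and the document identifiers are hypothetical and chosen only for illustration; the poster does not say which evaluation tool was used.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision (AP) of one ranked list.

    Sums precision@k at every rank k that holds a relevant document,
    then divides by the total number of relevant documents.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def is_difficult(ap, threshold=0.1):
    """Flag a query as difficult using the AP < 0.1 rule stated above."""
    return ap < threshold
```

A query whose only retrieved relevant document sits at rank 20, out of two relevant documents overall, gets AP = (1/20)/2 = 0.025 and would be flagged as difficult under this rule.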
Term association graphs

Nodes are distinct words or phrases in the collection.
Weighted edges represent the strength of semantic relatedness between words and phrases.
Graphs can be constructed manually or automatically from the document collection using information-theoretic measures of term association, such as Mutual Information (MI) or Hyperspace Analog to Language (HAL).

Using term graphs for query LM expansion

The query expansion LM is constructed from the neighbors of query terms in the term graph.

HAL: edge weights in the term graph are calculated using Hyperspace Analog to Language
MI: edge weights in the term graph are calculated using Mutual Information
NEIGH: all neighbors of query terms are used in the query expansion LM (Bai et al., CIKM'05)
DB: term graph structure is derived from DBpedia 3.9
FB: term graph structure is derived from the last version of Freebase
CNET: term graph structure is derived from ConceptNet 5

Results

The AQUAINT, ROBUST and GOV TREC collections are used in the experiments.
KL-DIR: KL-divergence retrieval with Dirichlet prior smoothing
TM: document LM expansion using a translation model on the MI term graph (Karimzadehgan and Zhai, SIGIR'10)

Method MAP P@20 GMAP
KL-DIR 0.2413 0.3460 0.1349
TM 0.2426 0.3488 0.1360
NEIGH-MI 0.2432
NEIGH-HAL 0.2431 0.3454 0.1333
DB-MI 0.2482 0.3524 0.1397
DB-HAL 0.3444
FB-MI 0.2452 0.3526 0.1232
FB-HAL 0.2476 0.3540 0.1261
CNET 0.3472 0.1407
CNET-MI 0.2495 0.3530 0.1459
CNET-HAL 0.2503 0.3528 0.1463

Method MAP P@20 GMAP
KL-DIR 0.2333 0.0464 0.0539
TM 0.2399 0.0476 0.0551
NEIGH-MI 0.2415 0.0489 0.0518
NEIGH-HAL 0.2419 0.0456
DB-MI 0.2346 0.0467 0.0019
DB-HAL 0.2404
FB-MI 0.2420 0.0484 0.0573
FB-HAL 0.0565
CNET 0.2407 0.0584
CNET-MI 0.2416 0.0504 0.0587
CNET-HAL 0.2428 0.0516 0.0586

Method MAP P@20 GMAP
KL-DIR 0.1943 0.3940 0.1305
TM 0.2033 0.3980 0.1339
NEIGH-MI 0.2031 0.3970 0.1326
NEIGH-HAL 0.1989 0.3900 0.1319
DB-MI 0.2073 0.4160 0.1468
DB-HAL 0.2059 0.4080 0.1411
FB-MI 0.2055 0.3990 0.1336
FB-HAL 0.2056 0.3960 0.1384
CNET 0.2051 0.1388
CNET-MI 0.2042 0.3920 0.1371
CNET-HAL 0.2058

Performance on AQUAINT for all queries
Performance on ROBUST for all queries
Performance on GOV for all queries

Method MAP P@20 GMAP
KL-DIR 0.0311 0.0281 0.0140
TM 0.0343 0.0304 0.0146
NEIGH-MI 0.0333 0.0307 0.0130
NEIGH-HAL 0.0425 0.0293 0.0122
DB-MI 0.0312 0.0285 0.0136
DB-HAL 0.0306 0.0274 0.0134
FB-MI 0.0350 0.0319 0.0154
FB-HAL 0.0339 0.0152
CNET 0.0407 0.0172
CNET-MI 0.0427 0.0367 0.0176
CNET-HAL 0.0453 0.0385 0.0181

Method MAP P@20 GMAP
KL-DIR 0.0474 0.1250 0.0386
TM 0.0478
NEIGH-MI 0.0476 0.1375 0.0393
NEIGH-HAL 0.1500 0.0378
DB-MI 0.0528 0.1906 0.0452
DB-HAL 0.0544 0.1538 0.0455
FB-MI 0.0534 0.1333 0.0437
FB-HAL 0.0564 0.1444 0.0471
CNET 0.0504 0.1219 0.0440
CNET-MI 0.0496 0.1156 0.0422
CNET-HAL 0.0502 0.0436

Method MAP P@20 GMAP
KL-DIR 0.0410 0.1290 0.0261
TM 0.0458 0.0267
NEIGH-MI 0.0429 0.1323 0.0273
NEIGH-HAL 0.0419 0.1260 0.0265
DB-MI 0.0503 0.1449 0.0301
DB-HAL 0.0474 0.1437
FB-MI 0.0381 0.1222 0.0200
FB-HAL 0.0393 0.1272 0.0211
CNET 0.0559 0.1487 0.0334
CNET-MI 0.0560 0.0326
CNET-HAL 0.0558 0.1475 0.0323

Performance on AQUAINT for difficult queries
Performance on ROBUST for difficult queries
Performance on GOV for difficult queries

Conclusions

Query expansion using different types of term graphs behaves differently depending on the collection. On newswire datasets, knowledge graphs are more effective than collection term association graphs for both regular and difficult queries. However, on Web collections, statistical term association graphs perform better (for all queries) or comparably (for difficult queries) relative to knowledge graphs.
ConceptNet-based term graphs outperformed DBpedia- and Freebase-based ones on 2 out of 3 experimental collections, which indicates the importance of using commonsense knowledge repositories in addition to the ones derived from encyclopedias.
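As an illustration of the ideas under "Term association graphs" and "Using term graphs for query LM expansion", the following minimal sketch builds an MI-weighted term graph from document-level co-occurrence and interpolates the original query LM with a distribution over the neighbors of the query terms. It rests on several assumptions that are not taken from the poster: MI is computed over binary term-presence variables per document, neighbor weights are summed over query terms, and the mixing weight alpha, the top_k cutoff and all function names are hypothetical. It is not the authors' exact formulation.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def mutual_information(n11, n1, n2, n):
    """MI between the binary presence variables of two terms.

    n11: documents containing both terms; n1, n2: documents containing
    each term; n: total number of documents.
    """
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            if x and y:
                nxy = n11
            elif x:
                nxy = n1 - n11
            elif y:
                nxy = n2 - n11
            else:
                nxy = n - n1 - n2 + n11
            if nxy == 0:
                continue  # zero-probability cells contribute nothing
            px = (n1 if x else n - n1) / n
            py = (n2 if y else n - n2) / n
            pxy = nxy / n
            mi += pxy * math.log(pxy / (px * py))
    return mi


def build_term_graph(docs):
    """docs: list of token lists. Returns {term: {neighbor: MI weight}}.

    Only pairs that co-occur in at least one document get an edge.
    Note that MI is also positive for negatively associated terms, so a
    real system would likely add further filtering (assumption).
    """
    n = len(docs)
    df = Counter()
    co = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        df.update(terms)
        co.update(combinations(terms, 2))
    graph = defaultdict(dict)
    for (u, v), n11 in co.items():
        weight = mutual_information(n11, df[u], df[v], n)
        if weight > 1e-12:  # drop statistically independent pairs
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


def expand_query(query_terms, graph, alpha=0.5, top_k=10):
    """Interpolate the query LM with a LM over neighbors of query terms.

    Assumes query_terms is non-empty. Returns {term: probability}.
    """
    counts = Counter(query_terms)
    total = sum(counts.values())
    original = {t: c / total for t, c in counts.items()}
    scores = defaultdict(float)
    for q in counts:
        for w, weight in graph.get(q, {}).items():
            scores[w] += weight
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    norm = sum(weight for _, weight in top)
    if norm == 0:
        return original  # no usable neighbors: keep the original query LM
    expanded = {t: (1 - alpha) * p for t, p in original.items()}
    for w, weight in top:
        expanded[w] = expanded.get(w, 0.0) + alpha * weight / norm
    return expanded
```

The same `expand_query` function could be fed a graph whose structure comes from a knowledge base (the DB, FB and CNET variants) with MI or HAL edge weights; only the construction of `graph` would change.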