Published by Stewart Berry. Modified over 8 years ago.
1
Query Dependent Pseudo-Relevance Feedback based on Wikipedia
University of Seoul, Ubiquitous Sensor Network Lab
2010.05.04
Department of Electrical and Computer Engineering, USN Lab
G201049005 Kim Eun-hwan (김은환)
2
Ubiquitous Sensor Network Lab, University of Seoul, Electrical and Computer Engineering
Contents
Introduction
Query Categorization
  Wikipedia Data Set
  Query Categorization
Query Expansion Methods
  Relevance Model
  Strategy for Entity/Ambiguous Queries
  Field Evidence for Query Expansion
Experiments
  Experiment Settings
  Baselines
  Using Entity Pages for Relevance Feedback
  Field Based Expansion
Conclusion
3
Introduction
The aim of this study is to explore the utility of Wikipedia as a resource for improving information retrieval (IR) through pseudo-relevance feedback (PRF).
Query expansion has long been a focus for researchers because it has the potential to enhance IR effectiveness.
Relevant vs. irrelevant; supervised and unsupervised methods.
Three types of query:
  Query about a specific entity (EQ)
  Ambiguous query (AQ)
  Broader query (BQ)
4
Introduction
Of all query expansion methods, pseudo-relevance feedback (PRF) is attractive because it requires no user input.
PRF assumes that the top-ranked documents in the initial retrieval are relevant.
However, this assumption is often invalid, which can hurt PRF performance.
Meanwhile, as the volume of data on the web grows, other resources have emerged that can potentially supplement the initial search in PRF better, e.g. Wikipedia.
5
Query Categorization
Wikipedia Data Set
A topic in Wikipedia is a distinct entity: a person, place, organization, or miscellaneous.
In addition, important information about the topic of a given article may also be found in other Wikipedia articles.
With the help of this enriched text, we can expect to bridge the gap between the large volume of information on the web and the simple queries issued by users.
6
Query Categorization
Queries about a specific entity (EQ)
These are queries that have a specific meaning and cover a narrow topic.
The corresponding entity page is the page whose title field is the same as the query.
Queries exactly matching the title of an entity page or a redirect page are classified as EQ, e.g. “Seoul”.
Thus an EQ can be mapped directly to the entity page with the same title.
7
Query Categorization
Ambiguous Queries (AQ)
These are queries with terms that have more than one potential meaning, e.g. “Apple”.
A disambiguation process is needed to determine the sense of such a query.
Broader Queries (BQ)
The remaining queries are denoted BQ because they are neither ambiguous nor focused on a specific entity, e.g. “Orange”.
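The three query types can be sketched as a small lookup routine. This is only an illustration under stated assumptions: the slides give the EQ rule (exact match with an entity or redirect title), while the AQ test used here (exact match with a disambiguation page title), the lowercase normalization, the order of the checks, and all sample titles are hypothetical.

```python
def categorize_query(query, entity_titles, redirect_titles, disambiguation_titles):
    """Return 'EQ', 'AQ', or 'BQ' for a query string.

    All three title collections are assumed to hold lowercase strings.
    """
    key = query.strip().lower()  # assumption: case-insensitive exact matching
    # EQ rule from the slides: exact match with an entity or redirect title.
    if key in entity_titles or key in redirect_titles:
        return "EQ"
    # Assumption: a query is ambiguous if it matches a disambiguation page title.
    if key in disambiguation_titles:
        return "AQ"
    # Everything else is a broader query.
    return "BQ"


# Hypothetical toy title sets, not taken from a real Wikipedia dump.
ENTITY_TITLES = {"seoul"}
REDIRECT_TITLES = {"seoul city"}
DISAMBIGUATION_TITLES = {"apple"}

for q in ["Seoul", "Apple", "query expansion methods"]:
    print(q, categorize_query(q, ENTITY_TITLES, REDIRECT_TITLES, DISAMBIGUATION_TITLES))
```

With a real dump, a title can appear both as an entity page and as a disambiguation page, so the order of the two checks matters and would need to follow the original method.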
8
Query Expansion Methods
Strategy for Ambiguous Queries
Lee et al.
9
Query Expansion Methods
Relevance model
Language modeling framework
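The slide names the relevance model without showing its formula. A minimal sketch of one common form, assuming a uniform document prior and maximum-likelihood document models, weights each term by the sum over feedback documents of P(w|D) times the normalized P(Q|D). The function name, its parameters, and the sample data are hypothetical and may differ from the variant used in the presentation.

```python
from collections import Counter


def relevance_model(feedback_docs, query_likelihoods, num_terms=10):
    """Estimate P(w|Q) from feedback documents.

    feedback_docs: list of token lists (the top-ranked documents).
    query_likelihoods: P(Q|D) for each document, in the same order.
    Returns the num_terms highest-scoring (term, probability) pairs.
    """
    total = sum(query_likelihoods)
    if total <= 0:
        raise ValueError("query likelihoods must sum to a positive value")
    scores = Counter()
    for tokens, q_lik in zip(feedback_docs, query_likelihoods):
        if not tokens:
            continue  # skip empty documents
        doc_weight = q_lik / total  # P(D|Q) under a uniform prior (assumption)
        length = len(tokens)
        for term, tf in Counter(tokens).items():
            scores[term] += doc_weight * tf / length  # maximum-likelihood P(w|D)
    return scores.most_common(num_terms)


# Hypothetical feedback documents and query likelihoods.
docs = [["seoul", "city", "seoul"], ["seoul", "river"]]
likelihoods = [0.6, 0.2]
print(relevance_model(docs, likelihoods, num_terms=3))
```

For a Wikipedia-based variant, the feedback documents would be Wikipedia articles instead of documents from the test collection.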
10
Query Expansion Methods
Field Evidence for Query Expansion
Supervised Method: Training Data, Machine Learning, Regression, Classification
Unsupervised Method: Machine Learning, Random Variable, Bayesian Inference, Clustering
11
Experiments
Experiment Setting
In our experiments, documents are retrieved for a given query by the query likelihood language model with Dirichlet smoothing.
Experiments were conducted using four standard Text Retrieval Conference (TREC) collections: AP, Robust2004, WT10G, and Gov2.
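Query likelihood with Dirichlet smoothing scores a document by summing, over query terms, log((tf(w, D) + mu * P(w|C)) / (|D| + mu)). The sketch below illustrates that formula only. The value of mu, the handling of terms unseen in the collection, and the toy documents are assumptions, since the slide does not state them.

```python
import math
from collections import Counter


def dirichlet_query_log_likelihood(query_terms, doc_tokens, collection_counts,
                                   collection_length, mu=1000.0):
    """Log P(Q|D) under a Dirichlet-smoothed unigram document model."""
    doc_counts = Counter(doc_tokens)
    doc_length = len(doc_tokens)
    score = 0.0
    for term in query_terms:
        p_collection = collection_counts.get(term, 0) / collection_length
        if p_collection == 0.0:
            continue  # assumption: ignore terms that never occur in the collection
        smoothed = (doc_counts[term] + mu * p_collection) / (doc_length + mu)
        score += math.log(smoothed)
    return score


# Hypothetical two-document collection.
doc_a = ["seoul", "sensor", "network"]
doc_b = ["query", "expansion", "wikipedia"]
counts = Counter(doc_a + doc_b)
length = len(doc_a) + len(doc_b)

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name, dirichlet_query_log_likelihood(["sensor"], doc, counts, length))
```

Documents are then ranked by this score in descending order. A larger mu pulls every document model closer to the collection model.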
12
Experiments
Baseline
Query likelihood language model (QL)
Relevance model (RMC)
Relevance model based on Wikipedia (RMW)
All baselines are evaluated on the test collections and test topics using Mean Average Precision (MAP).
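Mean Average Precision can be sketched as follows: average precision for a topic is the mean of the precision values at the rank of each relevant document, and MAP is the mean over topics. The function names and the sample rankings are hypothetical, and each ranking is assumed to contain no duplicate document IDs.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per topic."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical rankings for two topics.
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
print(mean_average_precision(runs))
```

In practice the standard TREC evaluation tooling would be used for reported numbers. The sketch only shows what the measure computes.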
13
Experiments
Using Entity Pages for Relevance Feedback
Note that in our proposed method, not all queries can be mapped to a specific Wikipedia entity page, so the method is only applicable to EQ and AQ.
14
Experiments
Field Based Expansion
The first is to add the top-ranked 100 good terms (SL).
The second is to add the top-ranked 10 good terms, each given its classification probability as weight (SLW).
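The difference between the two strategies can be sketched as one function with a weighting switch. This is an illustration under assumptions: original query terms are given weight 1.0, unweighted expansion terms are also given 1.0, and the term probabilities are hypothetical stand-ins for the classifier output.

```python
def expand_query(query_terms, term_probabilities, top_n, use_weights):
    """Return a {term: weight} mapping for the expanded query.

    term_probabilities: {candidate_term: probability of being a good term}.
    use_weights=False adds terms unweighted (SL-style);
    use_weights=True uses the probability as the term weight (SLW-style).
    """
    ranked = sorted(term_probabilities.items(), key=lambda item: item[1], reverse=True)
    expanded = {term: 1.0 for term in query_terms}  # assumption: original terms weigh 1.0
    for term, probability in ranked[:top_n]:
        if term in expanded:
            continue  # keep the original weight of terms already in the query
        expanded[term] = probability if use_weights else 1.0
    return expanded


# Hypothetical classifier probabilities for candidate expansion terms.
candidates = {"korea": 0.9, "capital": 0.7, "han": 0.4}
print(expand_query(["seoul"], candidates, top_n=2, use_weights=False))
print(expand_query(["seoul"], candidates, top_n=2, use_weights=True))
```

With the settings on the slide, the first call would use top_n=100 without weights and the second top_n=10 with weights.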
15
Conclusion
We have explored the utilization of Wikipedia in PRF.
Three query types based on Wikipedia.
Four TREC collections and topics.
Finally, in this paper, we focused on using Wikipedia as the sole source of PRF information.
However, we believe both the initial result from the test collection and Wikipedia have their own advantages for PRF.
By combining them, one may be able to develop an expansion strategy that is robust to the query being degraded by either resource.