Published by Lisa Wilkinson. Modified over 9 years ago.
1
Probabilistic Query Expansion Using Query Logs
Hang Cui (Tianjin University, China), Ji-Rong Wen (Microsoft Research Asia, China), Jian-Yun Nie (University of Montreal), Wei-Ying Ma (Microsoft Research Asia, China)
2
Outline: Motivations; Central ideas; Establishing correlations between query terms and document terms; Query expansion based on term correlations; Evaluations; Conclusions
3
Motivations
Web search poses more severe challenges: very short queries (fewer than two words), and inconsistent term usage between the two sides (the Web is not well organized, and users express queries in their own vocabulary).
Most search engines are keyword based.
Previous query expansion techniques focus on one side only: the documents.
Our solution: concentrate on both sides.
4
Big gap between the query space and the document space
Each document is represented in both the query space and the document space. For each document, measure the cosine of the angle between its two representations.
Big gap: 73.68 degrees on average (cos A = 0.28).
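The gap measurement can be illustrated with a minimal sketch. It assumes each document has two sparse term-weight vectors, one built from its own index terms and one from the terms of the queries that led to it; the toy vectors and the function names `cosine` and `angle_degrees` are hypothetical, not taken from the slides.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector has no direction; treat as no overlap
    return dot / (norm_a * norm_b)


def angle_degrees(cos_value):
    """Angle in degrees for a cosine value, clamped to acos's valid domain."""
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))


# Hypothetical toy data: raw term counts stand in for real term weights.
doc_vector = Counter("microsoft windows os software company windows".split())
query_vector = Counter("bill gates microsoft".split())

similarity = cosine(doc_vector, query_vector)
gap = angle_degrees(similarity)
```

For reference, an angle of 73.68 degrees corresponds to a cosine of about 0.28, which is consistent with the two figures quoted on the slide.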
5
Outline: Motivations; Central ideas; Establishing correlations between query terms and document terms; Query expansion based on term correlations; Evaluations; Conclusions
6
Principle of exploiting query logs
Query logs: a means to explore the query side. A query session is defined as: session := <query text> [clicked document]*
Central idea: log-based query expansion, built on probabilistic correlations between query terms and the index terms of the documents clicked for the respective queries.
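A query session of this kind, a query text followed by its clicked documents, could be represented as below. This is only a sketch: the log record layout `(session_id, query_text, doc_id)` and the names `QuerySession` and `build_sessions` are assumptions made for illustration, not the format of the logs used in the study.

```python
from dataclasses import dataclass, field


@dataclass
class QuerySession:
    """One query text plus the documents the user clicked for it."""
    query_text: str
    clicked_docs: list = field(default_factory=list)


def build_sessions(records):
    """Group raw log records into query sessions.

    records: iterable of (session_id, query_text, doc_id) tuples, where
    doc_id is None when the user clicked nothing (hypothetical layout).
    """
    sessions = {}  # dicts preserve insertion order in Python 3.7+
    for session_id, query_text, doc_id in records:
        session = sessions.setdefault(session_id, QuerySession(query_text))
        if doc_id is not None and doc_id not in session.clicked_docs:
            session.clicked_docs.append(doc_id)
    return list(sessions.values())


# Hypothetical toy log.
log = [
    (1, "bill gates", "Doc1"),
    (1, "bill gates", "Doc2"),
    (2, "java", "Doc3"),
    (3, "windows", None),
]
sessions = build_sessions(log)
```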
7
Assumption
The clicked documents are relevant to the given query. This is reasonable because users do not click documents randomly, the behavior is stable from a statistical point of view, and our previous work on query clustering supports it.
8
Compared with Local Feedback and Relevance Feedback
9
Characteristics of log-based query expansion
A local technique in general. Computationally feasible. No initial retrieval is needed. Reflects most users’ intentions (an example). Evolves as user usage accumulates.
10
Outline: Motivations; Central ideas; Establishing term correlations; Query expansion based on term correlations; Evaluations; Conclusions
11
Query sessions as a bridge
[Figure: query sessions (*Query1, *Query2, *Query3, each with clicked documents among #Doc1 to #Doc4) connect the query space to the document space. Terms shown: Netscape, Bill Gates, Java, Microsoft, Programming, Windows, OS.]
12
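Correlations between query terms and document terms
[Figure: weighted links between terms in the query space and terms in the document space. Terms shown: Bill Gates, Java, Windows, Netscape, Microsoft, Programming, OS. Link weights shown: 0.83, 0.89, 0.24, 0.17, 0.67, 0.04.]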
13
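Term-term probabilistic correlations
Term-term correlations are represented as the conditional probability of an index term given a query term.
[Figure: a query term is linked to an index term through a query session (*Query) and its clicked documents (#Doc1, #Doc2).]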
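One way such a conditional probability could be written is sketched below. It assumes that an index term depends on a query term only through the clicked documents; the symbols are introduced here for illustration: $w_i^{(q)}$ is a query term, $w_j^{(d)}$ an index term, and $S$ the set of documents clicked in sessions whose query contains $w_i^{(q)}$.

```latex
P\left(w_j^{(d)} \mid w_i^{(q)}\right)
  = \sum_{D_k \in S} P\left(w_j^{(d)} \mid D_k\right) \, P\left(D_k \mid w_i^{(q)}\right)
```

Under this assumption the correlation splits into two factors: how likely document $D_k$ is to be clicked given the query term, and how strongly the index term represents that document.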
14
Term-term probabilistic correlations (cont.)
Estimating the two conditional probabilities.
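A minimal sketch of how two such probabilities might be estimated from session data follows. The estimators are assumptions chosen for illustration: the probability of a document given a query term is taken as that document's share of the clicks recorded for the term, and the probability of an index term given a document is approximated by the term's weight divided by the largest term weight in the document. The function name and input layout are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_correlations(sessions, doc_terms):
    """Estimate P(index term | query term) through clicked documents.

    sessions:  list of (query_terms, clicked_doc_ids) pairs
    doc_terms: dict mapping doc_id -> {index term: weight}
    returns:   dict mapping query term -> {index term: correlation}
    """
    # clicks[q][d]: sessions in which query term q led to a click on doc d
    clicks = defaultdict(Counter)
    for query_terms, clicked in sessions:
        for q in set(query_terms):
            for d in clicked:
                clicks[q][d] += 1

    correlations = {}
    for q, per_doc in clicks.items():
        total_clicks = sum(per_doc.values())
        scores = defaultdict(float)
        for d, n in per_doc.items():
            p_doc_given_q = n / total_clicks  # assumed estimator
            weights = doc_terms.get(d, {})
            max_weight = max(weights.values(), default=0.0)
            if max_weight <= 0.0:
                continue  # document has no usable index terms
            for term, w in weights.items():
                # w / max_weight is a normalized weight standing in for
                # P(index term | document); it is not a true distribution.
                scores[term] += (w / max_weight) * p_doc_given_q
        correlations[q] = dict(scores)
    return correlations


# Hypothetical toy data.
toy_sessions = [(["java"], ["d1", "d2"]), (["java", "windows"], ["d1"])]
toy_doc_terms = {
    "d1": {"programming": 2.0, "sun": 1.0},
    "d2": {"microsoft": 1.0},
}
toy_correlations = estimate_correlations(toy_sessions, toy_doc_terms)
```

In the toy data, "java" leads to d1 twice and d2 once, so terms from d1 receive twice the click weight of terms from d2.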
15
Outline: Motivations; Central ideas; Establishing term correlations; Query expansion based on term correlations; Evaluations; Conclusions
16
Query expansion based on term correlations
For a whole query, candidate expansion terms are selected by considering their correlations with all of the query's terms. The top-ranked document terms are added to the original query to form a new one.
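The selection step can be sketched as follows. The scoring rule is an assumption: each candidate's per-term correlations are combined as a sum of log(1 + p) over the query terms, so a candidate related to several query terms outranks one related to a single term. The function name `expand_query` and the toy correlation table are hypothetical.

```python
import math


def expand_query(query_terms, correlations, top_k=3):
    """Append the top_k best-scoring candidate terms to the query.

    correlations: dict mapping query term -> {document term: P(doc term | query term)}
    """
    # Candidates are all document terms correlated with any query term.
    candidates = set()
    for q in query_terms:
        candidates.update(correlations.get(q, {}))
    candidates -= set(query_terms)  # never re-add an original term

    scores = {
        term: sum(
            math.log(correlations.get(q, {}).get(term, 0.0) + 1.0)
            for q in query_terms
        )
        for term in candidates
    }
    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return list(query_terms) + ranked[:top_k]


# Hypothetical correlation table.
toy_table = {
    "steve": {"apple": 0.8, "ceo": 0.3},
    "jobs": {"apple": 0.6, "career": 0.5},
}
expanded = expand_query(["steve", "jobs"], toy_table, top_k=2)
```

Here "apple" ranks first because it is correlated with both query terms, while "career" and "ceo" are each supported by only one.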
17
Outline: Motivations; Central ideas; Establishing term correlations; Query expansion based on term correlations; Evaluations; Conclusions
18
Data and methodology
Data: two months of query logs (Oct 2000 to Dec 2000); 41,942 documents; 30 evaluation queries (mostly short queries). Document relevance was judged by human assessors.
Methodology: compare our method with the baseline and with Local Context Analysis (LCA).
19
Experiment I: Retrieval effectiveness
Average improvement: 75.42% over the baseline and 38.95% over LCA. The improvement is statistically significant.
20
Experiment II: Quality of expansion terms
Examining 50 expansion terms obtained by the log-based method and LCA.
                     LC Analysis (base)   Log Based   Improvement (%)
Relevant Terms (%)   23.27                30.73       +32.03
Example query “Steve Jobs”; expansion terms: “Apple Computer”, “CEO”, “Macintosh”, “Microsoft”, “GUI”, “Personal Computers”
21
Experiment III: Impact of phrases
For TREC queries, phrases may not be as effective as expected. This is not the case for short queries. An example. Phrases are extracted from user logs. Experiments show an average improvement of 11.37% when phrases are used.
22
Experiment IV: Impact of the number of expansion terms
The more expansion terms, the better? The best performance is achieved by adding 40 to 60 expansion terms.
23
Summary of the evaluation
Log-based query expansion produces significant improvements over the baseline and LCA in terms of precision and recall. Query expansion is of great importance for short queries on the Web. Phrases can improve the performance of search engines.
24
Outline: Motivations; Central ideas; Establishing term correlations; Query expansion based on term correlations; Evaluations; Conclusions
25
We show how large the gap is between the query space and the document space. We propose a new log-based query expansion method that considers both sides of the problem. Experimental results show that our solution is effective for short queries in Web search. User log mining is a promising direction for future research.
26
Thanks!