Download presentation
Presentation is loading. Please wait.
Published byMegan Hilda Kennedy Modified over 9 years ago
1
Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China
2
Introduction Query suggestions are more useful for difficult topics, for which users have little knowledge to create meaningful queries A meaningful query must infer the user’s query intent & information needs & must help user find the relevant documents containing relevant information Existing web search engines rely on query logs to make query suggestions which are not available for desktop or enterprise search systems Solution: a document centric probabilistic mechanism to generate query suggestions w/o using query logs which utilizes the document corpus to extract phrases 2
3
Related Work Most of the previous works provide query expansion & refinement rather than query suggestions Comp(lete)Search Method: Provides real time auto-completion of the last query term typed by the user Requires user to type at least two characters of the last query term which is the most frequent term SimSearch Method: Phrase index is searched to find phrases that contain the user submitted partial query as a sub-phrase Selected phrases are presented to the user in order of their occurrence frequency 3
4
Proposed QS Approach Based on the document centric probabilistic mechanism Extracting phrases to create a database of phrases that can be used for completing partial user queries from document corpus Using N-grams of all order 1, 2, & 3, i.e., unigrams, bigrams, & trigrams from the document corpus Use idea similar to skip-grams rather than N-grams N-gram is the number of non stop-words 4
5
Query Suggestions At any given instant of time, after the user has entered k characters, denoted Q 1 k, which can be decomposed Q 1 K = Q c + Q t (1) where |Q c | 0, a set of words, & |Q t | {0, 1}, a (in)complete word Given a partial query Q 1 k & a phrase p i P = {p 1, p 2, …, p n }, what is the probability P(p i | Q 1 k ), i.e., the probability that the user will type p i after typing Q 1 k ? 5
6
Query Suggestions 6
7
The proposed query suggestion is defined as The probability of selecting a phrase given a partial word is The importance of phrases is determined by occurrence frequencies in the document corpus 7 a vocabulary word that start with Q t a phrase that contains the word c i
8
Estimating Phrase-Query Correlation The contextual relationship between a phrase p i & a user submitted query Q c using their joint occurrence p i is the 2 nd half of the complete query & Q c is the 1 st half Both P(Q c, p i ) & P(p i ) in the previous equation can be estimated using the corpus as follows: where D p i and D Q c represent the sets of documents that contain phrase p i and Q c, respectively 8
9
Experimental Results: Datasets Two datasets were used TREC Consists of more than 200K news articles published in Financial Times between years 1991–1994 Ubuntu: Consists of more than 100K discussion threads crawled from ubuntuforums.org, 25 queries, & relevance judgments 9
10
Baselines Methods The proposed methods was compared with the following two baseline methods Similarity based phrase search (SimSearch) Indexed phrases which contain user queries as sub-phrases are searched & ranked according to their occurrence frequencies CompleteSearch (CompSearch) Offers real-time auto-completion of the last query term being typed by the user Also use frequency as the ranking criterion 10
11
Test Queries Generated 40 partial test queries, created from 20 non- stop words, non-single keyword, randomly-chosen queries, for each dataset Type-A Queries Queries were generated by retaining only the 1 st keyword from each of the 20 original queries Type-B Queries Queries were generated by retaining the 1 st keyword of the query followed by the first randomly-chosen k characters (2 ≤ k ≤ length of the remaining query string) 11
12
Test Queries(cont’d) 12
13
Evaluation For each test query, the top 10 suggestions generated by SimSearch, CompSearch & the proposed Probabilistic method were collected & evaluated by 3 assessors Evaluation was performed w/ the help from 12 volunteers who were colleagues not associated with the project For each query suggestion, each assessor assigned one rating among the four (given below) & major-vote is used 13
14
Suggestions Created by Two Test Queries 14
15
Success Rate of Different Methods A query suggestion method is successful for a given partial query if it is able to generate at least one meaningful suggestion for the partial query 15
16
Quality of Suggestions 16
17
Precision Values Achieved by Different QS 17
18
Effectiveness of Suggested Queries Query clarity score is used to measure the retrieval performance of suggested queries Clarity score of a query increases if we add terms that reduce query ambiguity & it decreases on adding terms that make the query more ambiguous Clarity score for a query q with respect to a collection of documents C is computed using KL-Divergence where V is the vocabulary of the collection 18
19
Clarity Scores Achieved by Different QS 19
20
Conclusions and Future Works Meaningful query suggestions can be made in the absence of query logs with probabilistic approach using the occurrence of terms/phrases in a corpus of documents Future works A future goal is to ensure that the badly formed combination of phrases are eliminated from the suggestions Use of synonyms and synonymous phrases to enable the system to suggest alternatives also needs to be explored Systematic approach towards diversifying the suggested queries Apply to a relatively larger scale 20
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.