Published by Anne McGee. Modified over 9 years ago.
1
Generating Query Substitutions
Alicia Wood
2
What is the problem to be solved?
3
Problem
Imperfect description of need
Search engine not able to retrieve documents matching query
Need accurate and related query substitutions
4
Problem (cont.)
Given a query
Want to generate modified query (related)
  –Improvements (specification)
  –Neutral (spelling change, synonym)
  –Loss of original meaning (generalization)
5
Who cares about this problem and why?
6
Who cares?
User typing the query
Want correct results with imperfect query
7
What have others done to solve this problem and why is this inadequate?
8
Previous Work
Relevance/Pseudo relevance feedback
Query term deletion
Substituting query terms with related terms
Latent Semantic Indexing (LSI)
9
Relevance/Pseudo relevance feedback
Submit query for initial retrieval
Process resulting documents
Modify the query by expanding with additional terms from documents
Perform second retrieval with modified query
Can cause query drift
Computationally expensive
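The feedback loop on this slide can be sketched in a few lines. This is a toy illustration, not the method of any particular system: the term-count scoring in `retrieve`, the parameters `k` and `n_new_terms`, and all function names are assumptions made for the example.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def retrieve(query_terms, docs, k):
    """Initial retrieval: rank documents by raw query-term count (toy scoring)."""
    scored = []
    for idx, doc in enumerate(docs):
        tokens = tokenize(doc)
        score = sum(tokens.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, idx))
    # Highest score first; ties broken by document position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]

def expand_query(query, docs, k=2, n_new_terms=2):
    """Expand the query with the most frequent new terms from the top-k documents."""
    terms = tokenize(query)
    counts = Counter()
    for idx in retrieve(terms, docs, k):
        counts.update(t for t in tokenize(docs[idx]) if t not in terms)
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return terms + [term for term, _ in ranked[:n_new_terms]]
```

The expanded term list would then be submitted for the second retrieval. If the top documents for "jaguar" happen to be about cars, the expansion adds car terms even when the user meant the animal, which is the query drift the slide warns about.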
10
Query term deletion
Loss of specificity from original query
11
Substituting query terms
Relies on an initial retrieval
12
Latent Semantic Indexing (LSI)
Identify patterns in relationships between terms and concepts in unstructured collection of text
Computationally expensive
13
What is the proposed solution to the problem?
14
Solution
Query modification based on pre-computed query and phrase similarity
  –Ranking proposed queries
  –Similar queries/phrases derived from user query sessions
  –Learned models used to re-rank
    Based on similarity of new query to original query
15
Contributions
1. Identification of new source of data to identify similar queries and phrases
2. The definition of a scheme for scoring query suggestions
3. An algorithm to combine query and phrase suggestions
  –Finds highly and broadly relevant phrases
4. Identification of features that are predictive of highly relevant query suggestions
16
Classes of Suggestion Relevance
Precise rewriting
  –Match user’s intent, preserve core meaning
    automobile insurance -> automotive insurance
Approximate rewriting
  –Direct close relationship to topic, scope narrowed or broadened
    Apple music player -> ipod shuffle
Possible rewriting
  –Categorical relationship to initial query, complementary product but distinct
    Eye glasses -> contact lenses
Clear mismatch
  –No clear relationship
    Jaguar xj6 -> os x jaguar
17
Classes of Rewriting
Specific Rewriting (classes 1+2: precise, approximate)
  –closely related query
  –highly relevant
Broad Rewriting (classes 1+2+3: precise, approximate, possible)
  –query expansion
  –relevant to user interests
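The grouping of the four relevance classes into the two rewriting tasks can be written as a pair of binary labels. The numeric codes below are an assumption: they simply number the classes in the order the previous slide lists them.

```python
# Hypothetical numeric labels for the four relevance classes, in slide order.
PRECISE, APPROXIMATE, POSSIBLE, MISMATCH = 1, 2, 3, 4

def is_specific_rewriting(label):
    """Specific rewriting accepts only precise and approximate suggestions (1+2)."""
    return label in (PRECISE, APPROXIMATE)

def is_broad_rewriting(label):
    """Broad rewriting additionally accepts possible suggestions (1+2+3)."""
    return label in (PRECISE, APPROXIMATE, POSSIBLE)
```

Under this reading, "Eye glasses -> contact lenses" (a possible rewriting) counts as a positive example for the broad task but a negative one for the specific task.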
18
Substitutables
Initial query -> generate relevant queries
  –Replace query as whole or phrases
  –Segment query into phrases
  –Find query pairs where one segment has changed
    (britney spears) (mp3s) -> (britney spears) (lyrics)
Pair Independence Hypothesis Likelihood Ratio
  –High value = strong dependence between two terms
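A minimal sketch of the two steps on this slide: collecting phrase substitutions from query pairs that differ in exactly one segment, then scoring a substitution with a likelihood ratio. Several things here are assumptions: queries are taken to be already segmented into tuples of phrases, the function names are invented for the example, and the statistic is the standard log-likelihood ratio (G-test) over a 2x2 contingency table, which may differ in detail from the formula used in the talk.

```python
import math
from collections import Counter

def changed_segment(q1, q2):
    """Return (old, new) if the segmented queries differ in exactly one segment."""
    if len(q1) != len(q2):
        return None
    diffs = [(a, b) for a, b in zip(q1, q2) if a != b]
    return diffs[0] if len(diffs) == 1 else None

def substitution_counts(query_pairs):
    """Count how often each phrase-to-phrase substitution is observed."""
    counts = Counter()
    for q1, q2 in query_pairs:
        change = changed_segment(q1, q2)
        if change is not None:
            counts[change] += 1
    return counts

def contingency(counts, a, b):
    """2x2 table: (a->b, a->other, other->b, other->other)."""
    k11 = counts[(a, b)]
    k12 = sum(c for (x, y), c in counts.items() if x == a and y != b)
    k21 = sum(c for (x, y), c in counts.items() if x != a and y == b)
    k22 = sum(counts.values()) - k11 - k12 - k21
    return k11, k12, k21, k22

def log_likelihood_ratio(k11, k12, k21, k22):
    """G-test statistic: 2 * sum(observed * ln(observed / expected))."""
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing
                expected = rows[i] * cols[j] / total
                g += o * math.log(o / expected)
    return 2.0 * g

# Toy session data (invented for illustration).
pairs = [
    (("britney spears", "mp3s"), ("britney spears", "lyrics")),
    (("madonna", "mp3s"), ("madonna", "lyrics")),
    (("cheap", "flights"), ("cheap", "tickets")),
    (("britney spears", "mp3s"), ("christina aguilera", "mp3s")),
]
counts = substitution_counts(pairs)
table = contingency(counts, "mp3s", "lyrics")
score = log_likelihood_ratio(*table)
```

A score near zero means the substitution occurs about as often as chance would predict under independence. A high score means the two phrases are strongly dependent, matching the slide's reading of "high value".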
19
Validation
1000 initial queries
  –Generate single suggestion (q_j) for each
Evaluate accuracy of approaches
Train machine learned classifier
Evaluate ability to produce higher quality suggestions
  –Word distance, normalized edit distance, number of substitutions
Suggestions criteria:
  –Some words from initial query
  –Modifications shouldn’t be made at start of query
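Two of the features named on this slide can be sketched as follows. The exact definitions used in the talk are not given, so these are assumptions: edit distance is taken to be character-level Levenshtein distance normalized by the longer string's length, and word distance is taken to be Levenshtein distance over word tokens.

```python
def levenshtein(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def normalized_edit_distance(q1, q2):
    """Character edit distance divided by the longer query's length, in [0, 1]."""
    longest = max(len(q1), len(q2))
    return levenshtein(q1, q2) / longest if longest else 0.0

def word_distance(q1, q2):
    """Edit distance computed over whitespace-separated words."""
    return levenshtein(q1.split(), q2.split())
```

For the pair "automobile insurance" and "automotive insurance", `word_distance` is 1, since only one word changes. Features like these would be fed to the classifier alongside the number of substitutions.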
20
Future Work
Build semantic classifier
  –Predict semantic class of rewriting
Take inspiration from machine translation techniques
Introduce language model
  –Avoid producing nonsensical queries