Personalized Query Expansion for the Web
Paul-Alexandru Chirita, Claudiu S. Firan, Wolfgang Nejdl
Gabriel Barata
Motivation
What is query expansion? Add meaningful search terms to the query…
What is PIR based query expansion? Add meaningful search terms to the query… related to the user’s interests, taken from the user's Personal Information Repository (PIR).
Why PIR based query expansion? More personalization quality! More privacy!
Example Google search: “canon book”
Example Top 3 results: The Canon: A Whirligig Tour of the Beautiful Basics of Science Amazon Western Wikipedia Biblical Wikipedia
Example Expanded query: “canon book bible”
Example Top 3 results: Biblical Wikipedia Books of the Wikipedia The Canon of the catholicapologetics.org
Query Expansion using Desktop data
Algorithms Expanding with Local Desktop Analysis Expanding with Global Desktop Analysis
Expanding with Local Desktop Analysis Term and Document Frequency Lexical Compounds Sentence Selection
Term and Document Frequency
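As a rough illustration of this slide, the sketch below ranks candidate expansion terms either by how often they occur in the desktop documents that match the query (term frequency) or by how many of those documents contain them (document frequency). The sample documents, the stopword list, the tokenizer and the function names `rank_by_tf` and `rank_by_df` are assumptions made for this example; the slides do not give the exact weighting used in the talk.

```python
import re
from collections import Counter

# Hypothetical stand-in for text extracted from a user's desktop files.
DESKTOP_DOCS = [
    "The canon of the bible: each book of the bible and its history",
    "Bible study notes on the biblical canon, book by book",
    "Canon EOS camera manual",
]

STOPWORDS = {"the", "of", "a", "and", "in", "is", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop a few stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def matching_docs(query, docs):
    """Documents that contain every query term."""
    q = set(tokenize(query))
    return [d for d in docs if q <= set(tokenize(d))]


def rank_by_tf(query, docs, k=3):
    """Top-k non-query terms by total occurrences in matching documents."""
    q = set(tokenize(query))
    counts = Counter()
    for d in matching_docs(query, docs):
        counts.update(t for t in tokenize(d) if t not in q)
    return [t for t, _ in counts.most_common(k)]


def rank_by_df(query, docs, k=3):
    """Top-k non-query terms by number of matching documents containing them."""
    q = set(tokenize(query))
    counts = Counter()
    for d in matching_docs(query, docs):
        counts.update(set(tokenize(d)) - q)
    return [t for t, _ in counts.most_common(k)]
```

For the query "canon book", both rankings put "bible" first on this toy data, which mirrors the expanded query shown in the earlier example slide.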
Lexical Compounds { adjective? Noun+ }
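The pattern on this slide (an optional adjective followed by one or more nouns) can be sketched as a scan over part-of-speech tagged tokens. The example assumes the text has already been tagged by some external tagger and uses the hypothetical tag names `"ADJ"` and `"NOUN"`; the `min_words` parameter, which drops single-noun matches, is also an assumption and not something stated in the slides.

```python
def extract_compounds(tagged, min_words=2):
    """Return phrases matching { adjective? noun+ } from (word, tag) pairs.

    `tagged` is a list of (word, tag) tuples; tags "ADJ" and "NOUN" are
    assumed names. Matches shorter than `min_words` are discarded.
    """
    compounds = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        # Optional leading adjective.
        if tagged[i][1] == "ADJ":
            i += 1
        # One or more nouns.
        j = i
        while j < n and tagged[j][1] == "NOUN":
            j += 1
        if j > i:
            words = [w for w, _ in tagged[start:j]]
            if len(words) >= min_words:
                compounds.append(" ".join(words))
            i = j
        else:
            # No noun followed; restart the scan one token after `start`.
            i = start + 1
    return compounds


# Hand-tagged sample sentence (hypothetical data).
TAGGED = [
    ("the", "DET"), ("biblical", "ADJ"), ("canon", "NOUN"),
    ("lists", "VERB"), ("sacred", "ADJ"), ("scripture", "NOUN"),
    ("books", "NOUN"),
]
```

Calling `extract_compounds(TAGGED)` on the sample yields the two phrases "biblical canon" and "sacred scripture books"; the compounds found in documents relevant to the query could then be ranked and used as expansion candidates.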
Expanding with Global Desktop Analysis Term Co-occurrence Statistics Thesaurus based Expansion
Term Co-occurrence Statistics
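A minimal sketch of the idea, assuming document-level co-occurrence and a cosine-style coefficient, cs(a, b) = df(a, b) / sqrt(df(a) · df(b)), where df counts the documents containing a term or a pair of terms. This is one commonly used co-occurrence measure and is not necessarily the exact formula from the talk; the sample documents and the helper names `build_stats`, `cosine` and `expand` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical desktop collection, one string per document.
DOCS = [
    "canon bible book",
    "canon bible",
    "canon camera lens",
    "camera lens review",
]


def build_stats(docs):
    """Count document frequency for single terms and for term pairs."""
    df, pair_df = Counter(), Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        df.update(terms)
        # Sorted input means every pair is stored as (smaller, larger).
        pair_df.update(combinations(terms, 2))
    return df, pair_df


def cosine(a, b, df, pair_df):
    """Cosine-style co-occurrence coefficient between two terms."""
    if df[a] == 0 or df[b] == 0:
        return 0.0
    key = (a, b) if a < b else (b, a)
    return pair_df[key] / sqrt(df[a] * df[b])


def expand(term, df, pair_df, k=2):
    """Top-k terms that co-occur most strongly with `term`."""
    scored = [(cosine(term, t, df, pair_df), t) for t in df if t != term]
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(key=lambda st: (-st[0], st[1]))  # best score first, ties by name
    return [t for _, t in scored[:k]]


DF, PAIR_DF = build_stats(DOCS)
```

Because the statistics are computed once over the whole desktop collection rather than per query, this corresponds to the "global" analysis named in the outline; `expand("canon", DF, PAIR_DF)` then only needs a lookup.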
Experiments & Evaluation
Experiments 18 users. Indexed data: files within user-selected paths, emails and Web cache
Experiments Each user chose 4 queries:
– 1 from the top 2% log queries (avg. length = 2.0)
– 1 random log query (avg. length = 2.3)
– 1 self-selected specific query (avg. length = 2.9)
– 1 self-selected ambiguous query (avg. length = 1.8)
Evaluation
Evaluated algorithms:
– Google: Google query output
– TF, DF: Term and Document Frequency
– LC, LC[O]: Regular and Optimized Lexical Compounds
– TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics using Cosine Similarity, Mutual Information and Likelihood Ratio
– WN[SYN], WN[SUB], WN[SUP]: WordNet based expansion with synonyms, sub-concepts and super-concepts
Results Log queries:
Results Self-selected queries:
Introducing Adaptivity
Query Clarity
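The slide gives no formula, so the following is only a sketch of one simplified way to score clarity: the divergence between a language model of the query and a language model of the whole collection, summed over the query terms. Estimating the query model from the query words alone, the add-one smoothing, the sample collection and the function name `clarity` are all assumptions of this example.

```python
from collections import Counter
from math import log2

# Hypothetical collection in which "canon" is common and "bible" is rare.
COLLECTION = [
    "canon bible book",
    "canon camera",
    "canon lens",
    "canon printer",
]


def clarity(query, docs):
    """Simplified clarity: sum over query terms of P(w|Q) * log2(P(w|Q) / P_coll(w))."""
    coll = Counter(w for d in docs for w in d.lower().split())
    total = sum(coll.values())
    q_terms = query.lower().split()
    score = 0.0
    for w, c in Counter(q_terms).items():
        p_q = c / len(q_terms)
        # Add-one smoothing so unseen query terms do not divide by zero
        # (an assumption of this sketch).
        p_coll = (coll[w] + 1) / (total + len(coll) + 1)
        score += p_q * log2(p_q / p_coll)
    return score
```

On this toy collection the frequent, ambiguous term "canon" scores lower than the rarer term "bible", which is the behaviour a clarity measure is meant to capture: low scores flag queries that look much like the collection as a whole.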
Adaptive Expansion
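One way to picture adaptive expansion is to let the number of added terms depend on a clarity score for the query: ambiguous queries receive more expansion terms, clear ones fewer. The thresholds, the term counts and the function names below are hypothetical values chosen for illustration; they are not taken from the talk.

```python
def num_expansion_terms(clarity_score, low=1.0, high=3.0, max_terms=4):
    """Map a clarity score to a number of expansion terms.

    `low`, `high` and `max_terms` are hypothetical tuning values.
    """
    if clarity_score < low:       # very ambiguous query
        return max_terms
    if clarity_score < high:      # moderately clear query
        return max_terms // 2
    return 1                      # clear query: expand only slightly


def adaptive_expand(query, ranked_candidates, clarity_score):
    """Append the best-ranked candidate terms, as many as the clarity allows."""
    n = num_expansion_terms(clarity_score)
    present = set(query.split())
    extra = [t for t in ranked_candidates if t not in present][:n]
    return " ".join([query] + extra)


# Hypothetical candidate terms, already ranked by some expansion technique.
CANDIDATES = ["bible", "scripture", "testament", "church", "history"]
```

With a high clarity score, `adaptive_expand("canon book", CANDIDATES, 3.5)` adds a single term and produces "canon book bible"; with a low score such as 0.5 it appends the first four candidates instead.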
Experiments Same experimental setup as for the previous analysis.
Results Log queries:
Results Self-selected queries:
Results
Conclusions
– Five techniques for determining expansion terms from personal documents.
– Empirical analysis showed that these approaches perform very well.
– The expansion process adapts to query features.
– The adaptive expansion process proved to yield significant improvements over the static one.
End Any questions?