Published by Byron Dennis. Modified over 9 years ago.
1
Personalized Query Expansion for the Web
Paul-Alexandru Chirita, Claudiu S. Firan, Wolfgang Nejdl
Gabriel Barata
2
Motivation
3
What is query expansion? Add meaningful search terms to the query…
4
What is PIR (Personal Information Repository) based query expansion? Add meaningful search terms to the query… related to the user’s interests.
5
Why PIR-based query expansion? Better personalization quality! More privacy!
6
Example Google search: “canon book”
7
Example
Top 3 results:
– The Canon: A Whirligig Tour of the Beautiful Basics of Science (Hardcover) @ Amazon
– Western Canon @ Wikipedia
– Biblical Canon @ Wikipedia
9
Example Expanded query: “canon book bible”
10
Example
Top 3 results:
– Biblical Canon @ Wikipedia
– Books of the Bible @ Wikipedia
– The Canon of the Bible @ catholicapologetics.org
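The “canon book” example can be sketched in a few lines of Python. This is only an illustration of the idea of appending expansion terms: the function name `expand_query`, the `max_terms` parameter and the candidate list are hypothetical, and the candidates are assumed to be already ranked by relevance to the user's interests.

```python
def expand_query(query, candidate_terms, max_terms=1):
    """Append up to max_terms candidates that are not already in the query."""
    present = set(query.lower().split())
    added = []
    for term in candidate_terms:
        if len(added) == max_terms:
            break
        if term.lower() not in present:
            added.append(term)
            present.add(term.lower())
    return " ".join([query] + added)


# Candidate terms are made up for illustration; in the talk they would
# come from the user's desktop data.
print(expand_query("canon book", ["book", "bible", "scripture"]))
# prints: canon book bible
```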
11
Query Expansion using Desktop data
12
Algorithms
– Expanding with Local Desktop Analysis
– Expanding with Global Desktop Analysis
14
Expanding with Local Desktop Analysis
– Term and Document Frequency
– Lexical Compounds
– Sentence Selection
16
Term and Document Frequency
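The slide gives no formula, so the following is only a rough sketch of frequency-based term selection. It assumes that desktop documents sharing at least one term with the query are collected, and that the remaining terms are ranked by document frequency (DF) with term frequency (TF) as a tie-breaker. The tokenizer, the tiny stop-word list and the name `frequency_candidates` are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the slides).
STOPWORDS = {"the", "of", "and", "a"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def frequency_candidates(query, documents, top_k=3):
    """Rank non-query terms from query-matching documents by DF, then TF."""
    query_terms = set(tokenize(query))
    tf, df = Counter(), Counter()
    for doc in documents:
        tokens = tokenize(doc)
        if not query_terms & set(tokens):
            continue  # skip documents that do not match the query
        terms = [t for t in tokens
                 if t not in query_terms and t not in STOPWORDS]
        tf.update(terms)       # total occurrences
        df.update(set(terms))  # number of documents containing the term
    ranked = sorted(df, key=lambda t: (-df[t], -tf[t], t))
    return ranked[:top_k], tf, df


docs = [
    "the biblical canon lists bible books",
    "canon of the bible and the church",
    "camera lens review",
]
top, tf, df = frequency_candidates("canon book", docs, top_k=2)
print(top)
```

Ranking by DF first favors terms that are spread across many matching documents rather than repeated in a single one; swapping the sort key gives a TF-first variant.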
18
Lexical Compounds { adjective? Noun+ }
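The pattern { adjective? Noun+ } reads as “an optional adjective followed by one or more nouns”. A minimal sketch of matching it over part-of-speech-tagged text follows; the tags are supplied by hand here, and the tag names `ADJ` and `NOUN` as well as the function name `lexical_compounds` are assumptions for illustration (a real system would obtain tags from a POS tagger).

```python
def lexical_compounds(tagged_tokens):
    """Return phrases matching { adjective? noun+ } from (word, tag) pairs."""
    compounds = []
    i, n = 0, len(tagged_tokens)
    while i < n:
        start = i
        if tagged_tokens[i][1] == "ADJ":  # optional leading adjective
            i += 1
        j = i
        while j < n and tagged_tokens[j][1] == "NOUN":  # one or more nouns
            j += 1
        if j > i:
            compounds.append(" ".join(w for w, _ in tagged_tokens[start:j]))
            i = j
        else:
            i = start + 1  # no noun followed; retry from the next token
    return compounds


sentence = [
    ("the", "DET"), ("biblical", "ADJ"), ("canon", "NOUN"),
    ("contains", "VERB"), ("old", "ADJ"), ("new", "ADJ"),
    ("testament", "NOUN"), ("books", "NOUN"),
]
print(lexical_compounds(sentence))
# prints: ['biblical canon', 'new testament books']
```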
21
Expanding with Global Desktop Analysis
– Term Co-occurrence Statistics
– Thesaurus-based Expansion
23
Term Co-occurrence Statistics
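The slide does not show the formulas, so this is a sketch under assumptions: co-occurrence is counted at document level, cosine similarity is taken as DF(x,y) / sqrt(DF(x)·DF(y)), and mutual information as the pointwise form log(N·DF(x,y) / (DF(x)·DF(y))). The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_stats(documents):
    """Count per-term and per-pair document frequencies."""
    df, pair_df = Counter(), Counter()
    for doc in documents:
        terms = sorted(set(doc.lower().split()))
        df.update(terms)
        pair_df.update(combinations(terms, 2))  # pairs in sorted order
    return len(documents), df, pair_df


def cosine_similarity(x, y, df, pair_df):
    joint = pair_df[tuple(sorted((x, y)))]
    return joint / math.sqrt(df[x] * df[y]) if joint else 0.0


def mutual_information(x, y, n_docs, df, pair_df):
    joint = pair_df[tuple(sorted((x, y)))]
    if not joint:
        return float("-inf")  # the terms never co-occur
    return math.log(n_docs * joint / (df[x] * df[y]))


docs = ["canon bible", "canon bible church", "canon camera", "bible church"]
n, df, pair_df = cooccurrence_stats(docs)
print(cosine_similarity("canon", "bible", df, pair_df))
print(mutual_information("canon", "camera", n, df, pair_df))
```

Because the statistics depend only on the desktop collection and not on the query, they can be computed once offline and looked up at query time.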
26
Experiments & Evaluation
27
Experiments
– 18 users
– Indexed data: files within user-selected paths, emails and Web cache
28
Experiments
Each user chose 4 queries:
– 1 from the top 2% log queries (avg. length = 2.0)
– 1 random log query (avg. length = 2.3)
– 1 self-selected specific query (avg. length = 2.9)
– 1 self-selected ambiguous query (avg. length = 1.8)
29
Evaluation
30
Evaluated algorithms:
– Google: Google query output
– TF, DF: Term and Document Frequency
– LC, LC[O]: Regular and Optimized Lexical Compounds
– TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics using Cosine Similarity, Mutual Information and Likelihood Ratio
– WN[SYN], WN[SUB], WN[SUP]: WordNet-based expansion with synonyms, sub-concepts and super-concepts
31
Results Log queries:
32
Results Self-selected queries:
33
Introducing Adaptivity
34
Query Clarity
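The slide only names the measure. A commonly used definition of query clarity is the relative entropy between a query language model and the collection language model, sum over w of P(w|Q)·log2(P(w|Q) / P(w|C)). Whether the talk uses exactly this form is an assumption. The sketch below also simplifies the query model to a plain unigram model over documents that share a term with the query; `language_model` and `query_clarity` are hypothetical names.

```python
import math
from collections import Counter


def language_model(documents):
    """Maximum-likelihood unigram model over a list of documents."""
    counts = Counter(w for doc in documents for w in doc.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def query_clarity(query, collection):
    """KL divergence (in bits) between query model and collection model."""
    query_terms = set(query.lower().split())
    matching = [d for d in collection
                if query_terms & set(d.lower().split())]
    if not matching:
        return 0.0
    p_query = language_model(matching)
    p_coll = language_model(collection)
    return sum(p * math.log2(p / p_coll[w]) for w, p in p_query.items())


collection = ["canon bible", "canon camera", "jaguar car", "jaguar cat"]
print(query_clarity("bible", collection))  # more focused query
print(query_clarity("canon", collection))  # more ambiguous query
```

A low score means the documents matching the query look much like the collection as a whole, which is the signature of an ambiguous query.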
35
Adaptive Expansion
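One way to picture adaptive expansion is to let the number of added terms depend on a clarity score. The sketch below assumes that more ambiguous queries receive more expansion terms; the thresholds, the term counts and the function names are hypothetical values chosen for illustration, not figures from the talk.

```python
def expansion_size(clarity, ambiguous_below=1.0, clear_above=2.0):
    """Map a clarity score to a number of expansion terms (made-up values)."""
    if clarity < ambiguous_below:
        return 4  # ambiguous query: add several terms
    if clarity < clear_above:
        return 2  # semi-ambiguous query
    return 1      # clear query: add a single term


def adaptive_expand(query, ranked_terms, clarity):
    """Append as many ranked terms as the clarity score allows."""
    k = expansion_size(clarity)
    return " ".join([query] + ranked_terms[:k])


terms = ["bible", "testament", "scripture", "church", "council"]
print(adaptive_expand("canon book", terms, clarity=0.5))
print(adaptive_expand("canon book", terms, clarity=2.5))
```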
36
Experiments Same experimental setup as for the previous analysis.
37
Results Log queries:
38
Results Self-selected queries:
39
Results
40
Conclusions
41
– Five techniques for determining expansion terms from personal documents.
– Empirical analysis showed that these approaches perform very well.
– The expansion process can adapt to query features.
– The adaptive expansion process yielded significant improvements over the static one.
42
End Any questions?