Inferring People’s Site Preference in Web Search. Progress Report. Bin Tan
Problem Reiterated: People prefer some sites to others in web search. Preferences may change with topics. How can we remember a user’s site preference?
Examples: DAIS: ranked 14th in Google; hints from past queries: “dais”, “dais uiuc”. Paper search → CiteSeer or ACM Portal?
Basic Approach: Store a user’s search history in a log file as <Q, W> pairs, where Q is a query and W is the set of clicked websites. Given a new query, find similar past queries and use the associated clicked-website information to infer the user’s site preference.
Basic Approach (cont.): If a search result for the current query is from the most preferred site, move it to the top of the list. This follows the nearest-neighbor paradigm.
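The basic approach can be sketched roughly as follows. This is only an illustration under several assumptions that are not in the slides: query similarity is Jaccard overlap of keywords, the `k` most similar past queries vote for sites with their similarity as the vote weight, and the names `preferred_site`, `rerank`, and the site string `dais.cs.uiuc.edu` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap between two term collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def preferred_site(query, log, k=3):
    """Infer the most preferred site for `query` from a log of (Q, W) pairs.

    log: list of (past_query_string, list_of_clicked_sites).
    Returns a site name, or None if no past query overlaps with `query`.
    """
    q_terms = query.lower().split()
    scored = [(jaccard(q_terms, past_q.lower().split()), sites)
              for past_q, sites in log]
    # Keep the k nearest neighbors that share at least one term with the query.
    neighbors = sorted((s for s in scored if s[0] > 0),
                       key=lambda s: s[0], reverse=True)[:k]
    votes = Counter()
    for sim, sites in neighbors:
        for site in sites:
            votes[site] += sim  # similarity-weighted vote
    return votes.most_common(1)[0][0] if votes else None


def rerank(results, site):
    """Move results from `site` to the top, keeping the original order otherwise.

    results: list of (url, site) tuples in search-engine order.
    """
    if site is None:
        return list(results)
    # sorted() is stable, and False sorts before True.
    return sorted(results, key=lambda r: r[1] != site)
```

With a log containing the past queries “dais” and “dais uiuc”, both followed by a click on the same site, a new query “dais” would vote for that site and any result from it would be moved to the top.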
How to represent Q: The goal is to capture the information need of a past search. Candidate features: query keywords; frequent terms in the results (or only in the clicked ones?); other features, such as the number of results in PDF format.
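One possible way to combine the first two kinds of features is a weighted bag of terms compared by cosine similarity. The sketch below is an assumption, not the representation chosen in the slides: the function names, the `top_n` cutoff, and the `keyword_weight` boost are hypothetical parameters.

```python
import math
from collections import Counter


def represent_query(keywords, result_snippets, top_n=5, keyword_weight=2.0):
    """Build a term-weight vector for a past search.

    keywords: the query string.
    result_snippets: text of the results (or only the clicked ones).
    """
    vec = Counter()
    for term in keywords.lower().split():
        vec[term] += keyword_weight  # query keywords get an assumed boost
    term_counts = Counter(t for s in result_snippets for t in s.lower().split())
    for term, count in term_counts.most_common(top_n):
        vec[term] += count  # frequent result terms weighted by raw count
    return vec


def cosine(u, v):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0
```

Two searches with different keywords can still look similar under this representation if their results share frequent terms, which is the reason for looking beyond the keywords.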
What to include in W: Clicked result ≠ relevant result ≠ preferred site. Less time spent viewing a result → more likely the result is irrelevant. More clicks → less confidence in site preference.
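The two heuristics above could be turned into per-site weights along these lines. This is a minimal sketch under stated assumptions: the dwell-time threshold `min_dwell_seconds`, the equal split of confidence across clicks, and the function name `click_weights` are all hypothetical choices, not values from the slides.

```python
def click_weights(clicks, min_dwell_seconds=10.0):
    """Turn the clicks of one past query into site weights for W.

    clicks: list of (site, dwell_seconds) tuples.
    Returns a dict mapping site -> weight.
    """
    # Short views are treated as likely irrelevant and dropped (assumed cutoff).
    kept = [(site, dwell) for site, dwell in clicks if dwell >= min_dwell_seconds]
    if not kept:
        return {}
    # More clicks for one query means less confidence in each of them.
    share = 1.0 / len(clicks)
    weights = {}
    for site, _ in kept:
        weights[site] = weights.get(site, 0.0) + share
    return weights
```

A single long click yields weight 1.0 for its site, while one long click out of four yields only 0.25.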
Efficiency Issues: 30 searches a day → roughly 10,000 log entries a year (30 × 365 = 10,950). Build an inverted index on query terms and sites.
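An inverted index over the log might look like the sketch below, so that only entries sharing a term with the new query are scored instead of scanning the whole log. The class `SearchLog` and its method names are hypothetical, and the in-memory layout is an assumption.

```python
from collections import defaultdict


class SearchLog:
    """In-memory log of (query, clicked sites) with two inverted indexes."""

    def __init__(self):
        self.entries = []                # entry id -> (query, sites)
        self.by_term = defaultdict(set)  # query term -> entry ids
        self.by_site = defaultdict(set)  # site -> entry ids

    def add(self, query, sites):
        """Append a log entry and index it; returns the new entry id."""
        entry_id = len(self.entries)
        self.entries.append((query, tuple(sites)))
        for term in query.lower().split():
            self.by_term[term].add(entry_id)
        for site in sites:
            self.by_site[site].add(entry_id)
        return entry_id

    def candidates(self, query):
        """Ids of past entries sharing at least one term with `query`."""
        ids = set()
        for term in query.lower().split():
            ids |= self.by_term.get(term, set())
        return sorted(ids)

    def entries_for_site(self, site):
        """Ids of past entries in which `site` was clicked."""
        return sorted(self.by_site.get(site, set()))
```

At about 10,000 entries a year, both indexes stay small enough to keep in memory under this layout.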
Another method? Let the random variable L indicate that the user likes a site for a query. Find the site w that maximizes O(L=1|w,q) for a given q. P(q|w,L=1)?
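Assuming O denotes the odds (the slide does not define it), Bayes' rule shows where the term P(q|w,L=1) would come from. The factorization below is a sketch under that assumption, not a derivation given in the slides.

```latex
\begin{align*}
O(L=1 \mid w, q)
  &= \frac{P(L=1 \mid w, q)}{P(L=0 \mid w, q)} \\
  &= \frac{P(q \mid w, L=1)\, P(L=1 \mid w)}{P(q \mid w, L=0)\, P(L=0 \mid w)}
\end{align*}
```

The common factor P(q|w) cancels in the ratio. Under this reading, ranking sites by the odds requires estimating how likely the query q is given that the user likes site w, together with a query-independent prior odds of liking w.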