Personalizing Web Search using Long Term Browsing History. Nicolaas Matthijs, Cambridge; Filip Radlinski, Microsoft. In Proceedings of WSDM 2011.


1 Personalizing Web Search using Long Term Browsing History. Nicolaas Matthijs, Cambridge; Filip Radlinski, Microsoft. In Proceedings of WSDM 2011.

2 Motivating example. Query: "pia workshop"; the relevant result depends on who is searching.

3 Outline
- Approaches to personalization
- The proposed personalization strategy
- Evaluation metrics
- Results
- Conclusions and future work

4 Approaches to Personalization
- Observed user interactions
  - Short-term interests: Sriram et al. [24] and [6]; session data alone is too sparse to personalize
  - Longer-term interests: [23, 16] model users by classifying previously visited Web pages; Joachims [11] uses click-through data to learn a search function; PClink [7] promotes previously visited URLs; Teevan et al. [28]
  - Other related approaches: [20, 25, 26]
- Representing the user
  - Teevan et al. [28]: rich keyword-based representations, but no use of Web page characteristics
- Commercial personalization systems
  - Google; Yahoo! (rich user profile)

5 Personalization Strategy: user profile generation workflow
- Input (browsing history): visited URLs + number of visits; previous searches and click-through data
- Data extraction: title unigrams, metadata description unigrams, metadata keywords, full-text unigrams, extracted terms, noun phrases
- Filtering: no filtering, WordNet dictionary filtering, or Google N-Gram filtering
- Weighting: TF weighting, TF-IDF weighting, or BM25 weighting
- Output: user profile (terms and weights)
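The extraction-filtering-weighting workflow above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names and the plain-TF weighting are assumptions, and real filtering would consult WordNet or the Google N-Gram corpus rather than the toy dictionary argument used here.

```python
from collections import Counter

def extract_terms(pages):
    """pages: list of (page_text, visit_count) from the browsing history.
    Returns raw term counts, weighted by how often each page was visited."""
    counts = Counter()
    for text, visits in pages:
        for term in text.lower().split():
            counts[term] += visits
    return counts

def build_profile(pages, dictionary=None):
    """Data extraction -> optional dictionary filtering -> TF weighting."""
    counts = extract_terms(pages)
    if dictionary is not None:  # stand-in for WordNet / N-Gram filtering
        counts = Counter({t: c for t, c in counts.items() if t in dictionary})
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

profile = build_profile([("web search retrieval", 3), ("dog cat", 1)])
```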

6 Personalized Search
The browsing history is captured by a Firefox add-on, AlterEgo, as term counts, e.g.: dog 1, cat 10, india 2, mit 4, search 93, amherst 12, vegas 1.

7 Personalized Search: data extraction
From the browsing-history counts (dog 1, cat 10, india 2, mit 4, search 93, amherst 12, vegas 1), data extraction produces user-profile terms such as: dog cat monkey banana food; baby infant child boy girl; forest hiking walking gorp; csail mit artificial research robot; web search retrieval ir hunt.

8 Personalized Search: term weighting
Each extracted profile term (e.g. web, search, retrieval, ir, hunt) is assigned a weight (the slide shows example weights such as 1.6, 0.2, 6.0, 2.7, 1.3).

9 Term Weighting
- TF (term frequency): w_TF(t_i) = TF(t_i) / (total profile terms); e.g. "dog" appearing 2 times out of 100 gives w_TF = 2/100 = 0.02
- TF-IDF: w_TF-IDF(t_i) = w_TF(t_i) * log10(N / DF(t_i)); e.g. (2/100) * log10(10^7 / 10^3) = 0.08
(The slide illustrates document frequency by counting how many documents contain the term "cow".)
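The two weighting schemes with the slide's example numbers, as a hedged sketch (the base-10 logarithm and the corpus statistics, 10^7 documents with DF = 10^3, come from the slide; the function names are assumptions):

```python
import math

def w_tf(term_count, total_terms):
    # TF weight: the term's share of all profile term occurrences
    return term_count / total_terms

def w_tfidf(term_count, total_terms, n_docs, df):
    # TF-IDF: scale the TF weight by the log inverse document frequency
    return w_tf(term_count, total_terms) * math.log10(n_docs / df)

print(w_tf(2, 100))               # 0.02
print(w_tfidf(2, 100, 1e7, 1e3))  # 0.08
```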

10 Term Weighting: personalized BM25
w_pBM25(t_i) = log10( ((r_{t_i} + 0.5) * (N - n_{t_i} + 0.5)) / ((n_{t_i} + 0.5) * (R - r_{t_i} + 0.5)) )
where N is the number of documents in the world corpus, n_{t_i} the number of those containing t_i, R the number of documents in the user's browsing history, and r_{t_i} the number of those containing t_i.
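A direct transcription of the pBM25 weight, assuming the standard relevance-feedback reading of the variables (the helper name is an assumption):

```python
import math

def w_pbm25(N, n_i, R, r_i):
    """Personalized BM25 weight for term t_i.
    N:   documents in the world corpus
    n_i: world documents containing t_i
    R:   documents in the user's browsing history
    r_i: history documents containing t_i"""
    return math.log10((r_i + 0.5) * (N - n_i + 0.5) /
                      ((n_i + 0.5) * (R - r_i + 0.5)))
```

Terms over-represented in the browsing history relative to the world corpus receive a positive weight; globally common terms receive a negative one.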

11 Re-ranking
Use the user profile to re-rank the top results returned by a search engine.
- Candidate documents vs. snippets: snippets are more effective (Teevan et al. [28]) and allow a straightforward personalization implementation.
Scoring methods:
- Matching: for each term that occurs in both the snippet and the user profile, its weight is added to the snippet's score
- Unique matching: counts each unique term only once
- Language model: builds a language model for the user profile, using the term weights as frequency counts
- PClink (Dou et al. [7])
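The matching and unique-matching scorers, and re-ranking with them, can be sketched as follows. This is an illustrative reading of the slide, not the authors' code; whitespace tokenization and the function names are assumptions.

```python
def score_matching(snippet, profile):
    # Matching: add a term's profile weight once per occurrence
    return sum(profile.get(t, 0.0) for t in snippet.lower().split())

def score_unique_matching(snippet, profile):
    # Unique matching: count each distinct term only once
    return sum(profile.get(t, 0.0) for t in set(snippet.lower().split()))

def rerank(results, profile, score=score_unique_matching):
    # results: list of (url, snippet); highest profile score first
    return sorted(results, key=lambda r: score(r[1], profile), reverse=True)
```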

12 Evaluation Metrics
- Relevance judgements: NDCG@10 = (1/Z) * Sum_{i=1}^{10} (2^{rel_i} - 1) / log2(1 + i)
- Side-by-side: show two alternative rankings side by side and ask users to vote for the best
- Clickthrough-based: examine the query and click logs of a large search engine
- Interleaved: a new metric for personalized search; combines the results of two rankings, alternating between them and omitting duplicates
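NDCG@10 in code, with the normalizer Z computed from the ideal ordering of the same judgements (a standard reading of the formula; the names are assumptions):

```python
import math

def ndcg_at_10(rels, ideal_rels):
    """rels: graded relevance of the ranking's top 10, in rank order.
    ideal_rels: the same judgements sorted best-first (defines Z)."""
    def dcg(rs):
        return sum((2 ** r - 1) / math.log2(1 + i)
                   for i, r in enumerate(rs[:10], start=1))
    return dcg(rels) / dcg(ideal_rels)
```

A perfect ranking scores 1.0; pushing relevant results down the list lowers the score.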

13 Offline Evaluation
- 6 participants, each contributing 2 months of browsing history
- Participants judged the relevance of the top 50 pages returned by Google for 12 queries
- From a pool of 25 general queries (16 from the TREC 2009 Web search track), each participant judged 6
- From their 40 most recent search queries, each participant judged 5
- Each participant took about 2.5 hours to complete the judgements

14 Offline Evaluation
Personalization strategies (Rel: relative weighting; each strategy selects its own subset of profile fields: full text, title, meta keywords, meta description, extracted terms, noun phrases):
- MaxNDCG: yields the highest average NDCG (TF-IDF term weights, LM snippet scoring, 1/log Google-rank weighting, URLs visited v=10)
- MaxQuer: improves the most queries (TF, LM, 1/log, v=10)
- MaxNoRank: the highest-NDCG method that does not take the original Google ranking into account (TF, LM, v=10)
- MaxBestPar: obtained by greedily selecting each parameter sequentially (pBM25, LM, 1/log, v=10)

15 Offline Evaluation
Offline evaluation performance:

Method              Average NDCG    +/=/- queries
Google              0.502 ± 0.067   -
Teevan et al. [28]  0.518 ± 0.062   44/0/28
PClink              0.533 ± 0.057   13/58/1
MaxNDCG             0.573 ± 0.042   48/1/23
MaxQuer             0.567 ± 0.045   52/2/18
MaxNoRank           0.520 ± 0.060   13/52/7
MaxBestPar          0.566 ± 0.044   45/5/22

- MaxNDCG and MaxQuer are both significantly better than Google
- Interestingly, MaxNoRank is also significantly better than Google and Teevan et al. (possibly due to overfitting on the small offline data set)
- PClink improves the fewest queries, but beats Teevan et al. on average NDCG

16 Offline Evaluation
Distribution of relevance at rank for the Google and MaxNDCG rankings:
- 3600 relevance judgements were collected: 9% Very Relevant, 32% Relevant, 58% Non-Relevant
- Google places many Very Relevant results in the top 5
- MaxNDCG adds more Very Relevant results to the top 5, and also succeeds in adding Very Relevant results between ranks 5 and 10

17 Online Evaluation
- Large-scale interleaved evaluation, with users performing their day-to-day real searches
- The first 50 results are requested from Google; a personalization strategy is picked at random per query
- The Team-Draft interleaving algorithm [18] produces the combined ranking
- 41 users, 7997 queries, 6033 query impressions; 6534 queries and 5335 query impressions received a click
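Team-Draft interleaving [18] can be sketched as below. In the real algorithm the per-round coin flips are random; here they are passed in explicitly so the example is deterministic, and the function name and signature are assumptions.

```python
def team_draft_interleave(ranking_a, ranking_b, coin_flips):
    """Each round, a coin flip decides which ranking picks first; each side
    then appends its highest-ranked result not already in the combined list.
    Clicks on a result are later credited to the team that picked it."""
    combined, team = [], {}
    for a_first in coin_flips:
        for t in ("A", "B") if a_first else ("B", "A"):
            src = ranking_a if t == "A" else ranking_b
            pick = next((d for d in src if d not in team), None)
            if pick is not None:
                combined.append(pick)
                team[pick] = t
    return combined, team
```

The ranking whose team collects more clicks wins the query, which is how the Google Vote / Re-ranked Vote counts on the next slide are obtained.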

18 Online Evaluation
Results of the online interleaving test:

Method      Queries   Google Vote    Re-ranked Vote
MaxNDCG     2090      624 (39.5%)    955 (60.5%)
MaxQuer     2273      812 (47.3%)    905 (52.7%)
MaxBestPar  2171      734 (44.8%)    906 (55.2%)

Queries impacted by personalization:

Method      Unchanged      Improved      Deteriorated
MaxNDCG     1419 (67.9%)   500 (23.9%)   171 (8.2%)
MaxQuer     1639 (72.1%)   423 (18.6%)   211 (9.3%)
MaxBestPar  1485 (68.4%)   467 (21.5%)   219 (10.1%)

19 Online Evaluation
Rank differences for deteriorated (light) and improved (dark) queries for MaxNDCG; degree of personalization per rank.
- For a large majority of deteriorated queries, the clicked result loses only one rank
- The majority of clicked results that improved a query gain one rank
- The gains from personalization are, on average, more than double the losses
- MaxNDCG is the most effective personalization method

20 Conclusions
- First large-scale personalized-search study with online evaluation
- The proposed personalization techniques significantly outperform both default Google and the best previous approaches
- Key to modelling users: exploit the characteristics and structure of Web pages
- A long-term, rich user profile is beneficial

21 Future Exploration
- Parameter extension: learning parameter weights; using other fields (e.g., headings in HTML) and learning their weights
- Incorporating temporal information: how much browsing history to use; whether to decay the weights of older terms; how page-visit duration can be used
- Making use of more personal data
- Using the extracted profiles for other purposes

22 Thank you!

