1
Click Chain Model in Web Search
Fan Guo, Carnegie Mellon University
1/29/2014, WWW'09, Madrid, Spain
2
Joint Work With…
Chao Liu (MSR, ISRC-Redmond)
Yi-Min Wang (MSR, ISRC-Redmond)
Mike Taylor (MSR, Cambridge)
Anitha Kannan (MSR, Search Lab)
Tom Minka (MSR, Cambridge)
Christos Faloutsos (Carnegie Mellon University)
5
Click Logs
Automatically generated data that record key information about search activity.
Query: www 2009
Time: 21 Apr 2009, 9:01:02
Rank  URL of Document                           Click
1     www.metalwayfestival.com                  0
2     www.maquitec.com                          0
3     www.construmat.com                        0
4     www.hispack.com                           0
5     www.themarket.com                         0
6     www.cursabombers.com                      0
7     www.setegibernau.com                      0
8     www2009.org                               1
9     www.solardecathlon.upe.es                 0
10    www.nxtbook.com/nxtbooks/suny/2009spring  0
6
Problem Definition
Given a click log data set, compute the user-perceived relevance of each query-document pair.
Input, impression data and click data (Query: www 2009, Session Index: 103 …):
Rank  Document Idx  Click
1     1             0
2     8             0
3     3             0
4     7             0
5     5             0
6     12            0
7     2             0
8     5             1
9     42            0
10    20            0
Output:
Document Idx  Relevance
1             ?
2             ?
3             ?
4             ?
5             ?
6             ?
7             ?
8             ?
9             ?
…
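As a point of reference, the simplest relevance estimate for a query-document pair is its click-through rate (clicks divided by impressions). The Python sketch below aggregates a log into per-pair counts; the `(query, doc_ids, click_flags)` record layout and the toy data are hypothetical simplifications of the impression and click data above, not the format of the actual log.

```python
from collections import defaultdict


def naive_ctr(sessions):
    """Naive relevance estimate per (query, doc) pair: clicks / impressions.

    Each session is a hypothetical (query, doc_ids, click_flags) tuple:
    doc_ids lists document indices by rank, click_flags holds 0/1 clicks.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, doc_ids, click_flags in sessions:
        for doc, clicked in zip(doc_ids, click_flags):
            impressions[(query, doc)] += 1
            clicks[(query, doc)] += clicked
    return {pair: clicks[pair] / impressions[pair] for pair in impressions}


# Two toy sessions for one query (made-up data, top 3 results only).
log = [
    ("www 2009", [1, 8, 3], [0, 0, 1]),
    ("www 2009", [8, 1, 3], [1, 0, 0]),
]
ctr = naive_ctr(log)
print(ctr[("www 2009", 8)])  # 0.5
```

Documents 3 and 8 both get a CTR of 0.5, even though document 3 was clicked at rank 3, where it was less likely to be examined at all. This position bias is what a click model is meant to correct.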
7
Relevance Representation
[Figure: relevance on a 0 to 1 scale for the Click Chain Model and previous click models (e.g. 0.75), compared with human judge grades Excellent / Good / Fair / Bad; labeled "Integration"]
8
Applications
– Automated Ranking Alterations
– Search Engine Performance Metric
– Calibrate Human Judgment
– Related Application in Sponsored Search
9
Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion
11
Eye-Tracking User Study
[Figure: fixation heat map]
12
Overall: fixations are biased toward higher ranks, and so are clicks.
At each position: fixations and clicks are context dependent.
[Figure: normal impression vs. reversed impression]
13
Problem Definition (Recap)
Given a click log data set, compute the user-perceived relevance of each query-document pair. The solution should be:
– Aware of the position bias and context dependency
– Scalable to terabytes of data
– Incremental, to stay up to date
14
Examination Hypothesis
User behavior abstraction:
– Fixation: a binary examination variable
– Click: a binary click variable
A document must be examined before it can be clicked.
15
Examination Hypothesis
For each position:
P(Click = 1) = P(Examination = 1) × Relevance
Relevance = P(Click = 1 | Examination = 1)
The position bias is captured by how P(Examination = 1) is derived.
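If the examination probability of each position were known, the examination hypothesis would already give a position-corrected estimate: the expected number of clicks equals Relevance times the sum of P(Examination = 1) over the document's impressions. The Python sketch below assumes fixed, hypothetical per-position examination probabilities; this is a simplification, since the slides stress that examination is context dependent, which is what the cascade-based models address.

```python
def debiased_relevance(impressions, exam_prob):
    """Moment estimate of Relevance = P(Click=1 | Examination=1) for one document.

    impressions: (position, clicked) pairs for the document, positions 1-based.
    exam_prob: assumed-known P(Examination=1) per position (index 0 = position 1).
    Under the examination hypothesis, E[#clicks] = Relevance * sum of
    P(Examination=1) over the impressions, so the ratio estimates Relevance.
    """
    clicks = sum(clicked for _, clicked in impressions)
    expected_exams = sum(exam_prob[pos - 1] for pos, _ in impressions)
    if expected_exams == 0:
        raise ValueError("document was never expected to be examined")
    return clicks / expected_exams


# Hypothetical examination probabilities for positions 1-3.
exam_prob = [1.0, 0.5, 0.25]
# Shown 8 times at position 3 and clicked once: naive CTR is 0.125.
print(debiased_relevance([(3, 1)] + [(3, 0)] * 7, exam_prob))  # 0.5
```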
16
Cascade Hypothesis
The user scans through the documents and makes decisions in strict linear order. The decision process: E_1, C_1, E_2, C_2, …
Essential part of a click model:
– What is the probability of See Next Doc?
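The two hypotheses together describe a generative process that is easy to simulate. The Python sketch below is a generic cascade-style simulation, not CCM's exact parameterization: the user examines positions in order, clicks an examined document with probability equal to its relevance, and then continues with one of two hypothetical "See Next Doc" probabilities depending on whether a click occurred.

```python
import random


def simulate_session(relevance, p_next_no_click, p_next_after_click, rng):
    """Simulate one session under the examination and cascade hypotheses.

    relevance[i]: P(click | examined) at position i + 1.
    p_next_no_click / p_next_after_click: hypothetical probabilities of
    "See Next Doc" after a skipped / clicked document.
    Returns (examined, clicked) as lists of 0/1 flags.
    """
    n = len(relevance)
    examined, clicked = [0] * n, [0] * n
    for i in range(n):
        examined[i] = 1  # cascade: positions are reached in strict linear order
        clicked[i] = int(rng.random() < relevance[i])  # clicks only when examined
        p_next = p_next_after_click if clicked[i] else p_next_no_click
        if rng.random() >= p_next:  # the user is done
            break
    return examined, clicked


rng = random.Random(42)
print(simulate_session([0.9, 0.5, 0.3, 0.1], 0.8, 0.4, rng))
```

Whatever the parameters, the examined flags always form a prefix of the result list, and a click never appears at an unexamined position.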
17
Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion
18
The Context
– Top-10 organic search results only.
– Query sessions are independent.
– Semantic information is not used.
[Figure: other page elements not modeled: suggestions, ads, other elements]
19
User Behavior Description
Examine the document → Click?
– No → See next doc? Yes: examine the next document; No: done.
– Yes → See next doc? Yes: examine the next document; No: done.
20
Click Chain Model
[Figure: CCM as a chain of examination variables E1, E2, E3, E4, E5, …, with a click variable C_i and a relevance variable R_i at each position i]
21
Why Bayesian?
Modeling benefits:
– A principled way of smoothing the relevance estimates;
– More flexibility, such as computing P(R_i > R_j).
Computational benefit:
– Avoids the iterative optimization procedure used in maximum-likelihood estimation.
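To illustrate the flexibility point: once the posteriors of R_i and R_j are available on a grid, P(R_i > R_j) is a direct numerical integration, which a point estimate cannot provide. The Python sketch below assumes the two posteriors are independent and uses hypothetical Beta-shaped densities as stand-ins; it does not reproduce CCM's posterior derivation.

```python
def grid(n=1000):
    """Midpoints of n equal cells covering [0, 1]."""
    return [(k + 0.5) / n for k in range(n)]


def posterior(density, points):
    """Normalize an unnormalized posterior density evaluated on the grid."""
    weights = [density(r) for r in points]
    total = sum(weights)
    return [w / total for w in weights]


def prob_greater(post_i, post_j):
    """P(R_i > R_j) for independent posteriors on the same ascending grid.

    Pairs falling in the same grid cell are not counted, a small
    discretization error that shrinks as the grid gets finer.
    """
    below_j = 0.0  # posterior mass of R_j in cells strictly below the current one
    total = 0.0
    for p_i, p_j in zip(post_i, post_j):
        total += p_i * below_j
        below_j += p_j
    return total


points = grid()
# Hypothetical posteriors: document i was mostly clicked, document j mostly skipped.
post_i = posterior(lambda r: r**3 * (1 - r), points)
post_j = posterior(lambda r: r * (1 - r) ** 3, points)
print(round(prob_greater(post_i, post_j), 3))
```

These two densities are Beta(4, 2) and Beta(2, 4) up to normalization, for which the exact value is about 0.897, so the printed value should be close to that.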
22
Relevance Inference
Given a query and all of its click data, compute the posterior of the relevance R_j for each document j. Then focus on the click probability for a particular session and look at the different cases.
23
Click Chain Model
[Figure: the CCM graph (E_i, C_i, R_i for positions 1 to 5, …), annotated with the Examination Hypothesis and the Cascade Hypothesis]
24
[Figures, slides 24 to 28: the CCM graph (E_i, C_i, R_i for positions 1 to 5, …) with observed click values 0/1, stepping through the cases of a single session]
29
Putting them together
30
Summary of the Algorithm
– Initialize (2 × 10 + 2) counts for each query-document pair;
– Go through the click log once and update the counts;
– Compute the parameter values and get the β values;
– Output the results (using numerical integration if necessary).
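The four steps suggest a simple structure: one pass reduces the log to per-pair counts, and posteriors are computed from those counts afterwards. The Python sketch below shows only that structure, under clearly hypothetical assumptions: a uniform prior on relevance, three made-up factor types (`clicked`, `skipped`, `partial`) standing in for CCM's actual 2 × 10 + 2 count types, and an illustrative β of 0.5. It does not reproduce CCM's derivation. It only illustrates why counts suffice once each session's contribution factorizes.

```python
import math
from collections import Counter, defaultdict

# Hypothetical per-session likelihood factors for a document's relevance r.
# CCM's actual count types and beta values are NOT reproduced here.
FACTORS = {
    "clicked": lambda r: r,
    "skipped": lambda r: 1.0 - r,
    "partial": lambda r: 1.0 - 0.5 * r,  # illustrative beta = 0.5
}


def update_counts(counts, events):
    """One pass over (query, doc, factor_type) events; keeps only counts.

    Calling this again on new log data updates the counts incrementally.
    """
    for query, doc, factor_type in events:
        counts[(query, doc)][factor_type] += 1
    return counts


def posterior_summary(factor_counts, n=1000):
    """Posterior mean/std of r under a uniform prior, by grid integration."""
    points = [(k + 0.5) / n for k in range(n)]
    log_w = [
        sum(c * math.log(FACTORS[name](r)) for name, c in factor_counts.items())
        for r in points
    ]
    peak = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - peak) for lw in log_w]
    total = sum(w)
    mean = sum(wi * r for wi, r in zip(w, points)) / total
    var = sum(wi * (r - mean) ** 2 for wi, r in zip(w, points)) / total
    return mean, math.sqrt(var)


counts = defaultdict(Counter)
update_counts(counts, [("q", "d1", "clicked")] * 3 + [("q", "d1", "skipped")])
# Three clicks and one skip give a Beta(4, 2) shape: mean 2/3, std about 0.178.
print(posterior_summary(counts[("q", "d1")]))
```

Because the stored state is just counts, new log data can be folded in with another `update_counts` call, matching the "single pass" and "update counts" points of the sanity check.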
31
Sanity Check
The algorithm should be:
– Aware of the position bias and context dependency
– Scalable to terabytes of data: single pass, linear
– Incremental, to stay up to date: update the counts
32
Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion
33
Data Set
Collected over 2 weeks in July 2008.
Preprocessing:
– Discard no-click sessions, for a fair comparison.
– Remove the 178 most frequent queries.
Split into training/test sets according to timestamps.
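The preprocessing steps can be sketched as a small filter-and-split function. In the Python sketch below, the session dictionaries, the `split_time` threshold, and the choice to count query frequencies after dropping no-click sessions are all assumptions; the slides give neither the record format nor the split point. In the actual setup, `n_top_queries` would be 178.

```python
from collections import Counter


def preprocess(sessions, n_top_queries, split_time):
    """Filter and split query sessions following the preprocessing steps.

    Each session is a hypothetical dict with keys 'query', 'time' (any
    sortable timestamp) and 'clicks' (list of 0/1 flags).
    """
    # 1. Discard sessions without any click.
    with_clicks = [s for s in sessions if any(s["clicks"])]
    # 2. Remove the n_top_queries most frequent queries (counted after step 1;
    #    this ordering is an assumption).
    freq = Counter(s["query"] for s in with_clicks)
    top = {q for q, _ in freq.most_common(n_top_queries)}
    kept = [s for s in with_clicks if s["query"] not in top]
    # 3. Split by timestamp: earlier sessions for training, later for testing.
    train = [s for s in kept if s["time"] < split_time]
    test = [s for s in kept if s["time"] >= split_time]
    return train, test


sessions = [
    {"query": "a", "time": 1, "clicks": [1, 0]},
    {"query": "a", "time": 2, "clicks": [0, 1]},
    {"query": "b", "time": 3, "clicks": [0, 0]},
    {"query": "b", "time": 4, "clicks": [1, 0]},
    {"query": "c", "time": 5, "clicks": [0, 1]},
]
train, test = preprocess(sessions, n_top_queries=1, split_time=5)
print([s["time"] for s in train], [s["time"] for s in test])  # [4] [5]
```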
34
Data Set
After preprocessing:
– 110,630 distinct queries;
– 4.8M/4.0M query sessions in the training/test set.
35
Metric
Efficiency:
– Computation time
Effectiveness (we resort to an indirect measure):
– Given the known document identities in the test set,
– use the relevance and parameters learned on the training set
– to predict clicks.
36
Competitors
UBM: User Browsing Model (Dupret et al., SIGIR08)
– More parameters
– Iterative, more expensive algorithm
DCM: Dependent Click Model (WSDM09)
– Modeling 1+ clicks per session
37
Results – Time
Environment: Unix server, 2.8 GHz cores, MATLAB R2008b.
Model            CCM      UBM      DCM
Time             9.8 min  333 min  5.4 min
Relative to CCM  1.0      34       0.55
38
Results – Perplexity
Perplexity: the quality of click prediction, measured for each position individually.
Random guess (p_H = 0.5): 2.00
Best guess (p_H = 0.8): 1.65
Ground truth (cheating): 1.00
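The slide's formula is not in the text, so the Python sketch below assumes the definition commonly used for click models: at one position, perplexity is 2 raised to the negative average log2-likelihood of the observed clicks. Under that assumption the reference values above are reproduced: predicting 0.5 everywhere gives 2.00, perfect predictions give 1.00, and 1.65 matches the binary-entropy perplexity 2^H(p) for p = 0.8.

```python
import math


def position_perplexity(predicted, observed):
    """Perplexity at one position: 2 ** (-average log2-likelihood of the clicks).

    predicted: model click probabilities at this position, one per session.
    observed: the actual 0/1 clicks at this position, one per session.
    """
    total = 0.0
    for q, c in zip(predicted, observed):
        total += math.log2(q) if c else math.log2(1.0 - q)
    return 2.0 ** (-total / len(observed))


def bernoulli_perplexity(p):
    """Perplexity of always predicting p when clicks really occur at rate p."""
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return 2.0 ** entropy


print(position_perplexity([0.5] * 4, [1, 0, 0, 1]))  # 2.0 (random guess)
print(position_perplexity([1.0, 0.0], [1, 0]))       # 1.0 (ground truth)
print(round(bernoulli_perplexity(0.8), 2))           # 1.65
```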
39
Results – Perplexity
[Figure: perplexity of the compared models; lower is better]
40
Results – Perplexity
Average perplexity over the top 10 positions.
Model            CCM     UBM     DCM
Perplexity       1.1479  1.1577  1.1590
Equiv. p_H       0.0309  0.0334  0.0337
Improv. of CCM   -       7.5%    8.3%
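The "Equiv. p_H" row is consistent with reading each average perplexity as 2^H(p) for a binary entropy H and solving for p. The "Improv." row matches the relative reduction of that p with respect to UBM and DCM, e.g. (0.0334 − 0.0309) / 0.0334 ≈ 7.5%. The Python sketch below reproduces this reading. It is an interpretation inferred from the numbers, not a definition stated in the slides.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, for 0 < p < 1."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def equivalent_p(perplexity, tol=1e-12):
    """Find p in (0, 0.5] with 2 ** binary_entropy(p) == perplexity (bisection)."""
    target = math.log2(perplexity)
    lo, hi = 1e-12, 0.5  # binary_entropy increases on (0, 0.5]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def relative_improvement(p_model, p_baseline):
    """Relative reduction of the equivalent p, e.g. CCM versus UBM."""
    return (p_baseline - p_model) / p_baseline


for name, pp in [("CCM", 1.1479), ("UBM", 1.1577), ("DCM", 1.1590)]:
    print(name, round(equivalent_p(pp), 4))
```

Starting from the four-digit perplexities, the recovered values agree with the table to within rounding.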
41
Results – Log Likelihood
Log-likelihood: the log of the probability of recovering the entire click vector, out of 2^10 possibilities.
Model            CCM     UBM     DCM
LL               -1.171  -1.264  -1.302
Likelihood       0.3100  0.2826  0.2719
Improv. of CCM   -       9.7%    14%
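The table's numbers are consistent with a natural logarithm, since exp(−1.171) ≈ 0.310. They are also consistent with "Improv." being the ratio of per-session likelihoods minus one: exp(−1.171 + 1.264) − 1 ≈ 9.7% and exp(−1.171 + 1.302) − 1 ≈ 14%. The Python sketch below computes both quantities under that reading. Here `session_probs` stands for the probability a model assigns to each test session's full click vector; computing that probability requires the click model itself and is not shown.

```python
import math


def average_log_likelihood(session_probs):
    """Mean natural log of P(observed click vector) over test sessions.

    session_probs: the probability a model assigns to each session's entire
    click vector (one of 2 ** 10 outcomes for the top 10 positions).
    """
    return sum(math.log(p) for p in session_probs) / len(session_probs)


def likelihood_gain(ll_model, ll_baseline):
    """Relative gain in per-session (geometric-mean) likelihood."""
    return math.exp(ll_model - ll_baseline) - 1.0


print(math.exp(-1.171))                  # per-session likelihood of CCM
print(likelihood_gain(-1.171, -1.264))   # CCM vs. UBM
print(likelihood_gain(-1.171, -1.302))   # CCM vs. DCM
```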
42
Results – Log Likelihood
[Figure: log-likelihood of the compared models; higher is better]
43
Roadmap
– Motivation and Problem Definition
– Click Model Basics
– CCM and Algorithms
– Experimental Evaluation
– Related Work and Conclusion
44
Related Work
User behavior study and hypotheses
– Eye-tracking study (Joachims et al., KDD05, ACM TOIS)
– Examination hypothesis (Richardson et al., WWW07)
– Cascade hypothesis (Craswell et al., WSDM08)
Other click models
– Logistic regression (Dupret et al., SIGIR08)
– Dynamic Bayesian Network (Chapelle et al., WWW09)
– Bayesian Browsing Model (KDD09, to appear)
45
Conclusion
Click Chain Model
– A probabilistic approach to interpreting clicks.
– A Bayesian approach to modeling relevance.
– Both scalable and incremental.
Future directions
– Validation/bucket tests.
– Pairwise comparison.
– More on context dependency.
46
Thank you :-)
47
Abstract/Document Relevance
Relevance of the abstract:
– The conditional probability of a click, as defined by the examination hypothesis
Relevance of the document:
– Determines the probability of See Next Doc
– A binary random variable (integrated out under CCM)
48
Alt. User Behavior Description
[Flowchart: Examine the Document → Click? → Relevant? → See Next Doc?, with Yes/No branches]
49
Results – Perplexity (by Freq)
[Figure: perplexity broken down by query frequency; lower is better]
50
Examination/Click Distribution
51
Predicting First/Last Clicks
Root-mean-square error in predicting the first/last clicked position on the test data.
Two approaches (bias/variance tradeoff):
– EXPectation: use the expected value (bias)
– SIMulation: draw a sample from the model (variance)
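The two approaches differ only in how a single predicted position is taken from the model's distribution over positions. The Python sketch below assumes that distribution is already available for each test session (the `dists` values are hypothetical; in practice they would come from the click model). It then scores both approaches with RMSE. EXP is deterministic but can land between positions, while SIM always yields a valid position at the cost of sampling noise.

```python
import math
import random


def rmse(predictions, truths):
    """Root-mean-square error between predicted and observed positions."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, truths)) / len(truths)
    )


def predict_exp(dist):
    """EXPectation: the mean position under the model's distribution."""
    return sum(pos * prob for pos, prob in enumerate(dist, start=1))


def predict_sim(dist, rng):
    """SIMulation: one position sampled from the model's distribution."""
    return rng.choices(range(1, len(dist) + 1), weights=dist, k=1)[0]


# Hypothetical model distributions over the first clicked position (1-3)
# for two test sessions, and the observed first clicked positions.
dists = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
observed = [1, 3]
rng = random.Random(7)
print(rmse([predict_exp(d) for d in dists], observed))
print(rmse([predict_sim(d, rng) for d in dists], observed))
```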
52
First Clicked Position
53
Last Clicked Position
54
A Quick Example
Here we are interested in R_3.
55
A Quick Example
Here we are interested in R_3.
[Figure: a chain of click variables C1 to C4]
56
A Quick Example
Here we are interested in R_3.
[Figure: two chains of click variables C1 to C4]
57
A Quick Example
Here we are interested in R_3.
[Figure: three chains of click variables C1 to C4]
58
A Quick Example
Here we are interested in R_3.
Mean(R_3) = 0.52, Std(R_3) = 0.22