1 Click Chain Model in Web Search Fan Guo Carnegie Mellon University PPT Revised and Presented by Xin Xin
2 Outline Background and motivation Designing a click model Algorithms Experiments
3
4 How to utilize users’ feedback to improve search engine results?
5 Diverse User Feedback Click-through Browser action Dwell time Explicit judgment Other page elements
6 Web Search Click Log Auto-generated data keeping important information about search activity.
Query: cikm
Session ID: f851c5af178384d12f3d
Position  URL                                                        Click
1         cikm2008.org                                               1
2         www.cikm.org                                               0
3         www.cikm.org/
4         www.fc.ul.pt/cikm
5         www.comp.polyu.edu.hk/conference/cikm
6         cikmconference.org                                         0
7         Ir.iit.edu/cikm
8         www.informatik.uni-trier.de/~ley/db/conf/cikm/index.html   0
9         www.tzi.de/CIKM
10        www.cikm.com                                               0
7 A real world example
8 How large is the click log? –Search logs: 10+ TB/day –In existing publications: [Craswell+08]: 108k sessions; [Dupret+08]: 4.5M sessions (21 subsets * 216k sessions); [Guo+09a]: 8.8M sessions from 110k unique queries; [Guo+09b]: 8.8M sessions from 110k unique queries; [Chapelle+09]: 58M sessions from 682k unique queries; [Liu+09a]: 0.26PB data from 103M unique queries
9 Intuition to Utilize Clicks Adapt ranking to user clicks # of clicks received
10 Position Bias Problem # of clicks received
11 Problem Definition Given a click log data set, compute the user-perceived relevance of each query-document pair. The solution should be: –Aware of position bias and context dependency –Scalable to terabyte-scale data –Incremental, to stay up to date
12 Outline Background and motivation Designing a click model Algorithms Experiments
13 Examination Hypothesis A document must be examined before a click. The (conditional) probability of click upon examination depends on document relevance.
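The two statements on this slide can be written compactly. The notation below is an assumption for illustration: E_i and C_i are binary examination and click indicators at position i, and R_i in [0, 1] is the relevance of the document shown there. Setting the click probability equal to R_i is the simplest choice that satisfies "depends on document relevance"; the slide itself does not fix the functional form.

```latex
P(C_i = 1 \mid E_i = 0) = 0, \qquad
P(C_i = 1 \mid E_i = 1, R_i) = R_i
```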
14 Cascade Hypothesis The first document is always examined. First-order Markov property: –Examination at position (i+1) depends on examination and click at position i only Examination follows a strict linear order: Position i → Position (i+1)
15 User Behavior Description Examine the Document → Click? –No → See Next Doc? (Yes: examine the next document; No: Done) –Yes → See Next Doc? (Yes: examine the next document; No: Done)
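The user flow on this slide can be sketched as a small simulation. This is a minimal illustration, not the model's actual parameterization: the names `p_continue_after_click` and `p_continue_after_skip` and their default values are hypothetical, and the click probability is assumed to equal the document's relevance.

```python
import random


def simulate_session(relevances, p_continue_after_click=0.5,
                     p_continue_after_skip=0.8, rng=None):
    """Simulate one top-down pass over a result list; return 0/1 clicks.

    relevances: assumed click probability upon examination, per position.
    The two continuation probabilities are hypothetical placeholders.
    """
    rng = rng or random.Random()
    clicks = [0] * len(relevances)
    for i, relevance in enumerate(relevances):
        # Reaching this point means position i is examined
        # (the first position is always examined).
        clicked = rng.random() < relevance
        clicks[i] = int(clicked)
        # "See Next Doc?" depends only on what happened at position i.
        p_continue = p_continue_after_click if clicked else p_continue_after_skip
        if rng.random() >= p_continue:
            break  # Done: lower positions are never examined or clicked.
    return clicks


if __name__ == "__main__":
    print(simulate_session([0.7, 0.4, 0.2, 0.1], rng=random.Random(0)))
```

Because the loop stops at the first "No" to "See Next Doc?", documents at lower positions receive fewer clicks even when they are equally relevant, which is the position bias the model has to account for.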
16 Click Chain Model [Figure: graphical model over positions 1–5, …, with relevance nodes R1–R5, examination nodes E1–E5, and click nodes C1–C5; annotated with Examination Hypothesis and Cascade Hypothesis]
17 Outline Background and motivation Designing a click model Algorithms Experiments
18 A Coin-Toss Example for Bayesian Framework Prior → Posterior Density Function (not normalized): x^1(1-x)^0; x^2(1-x)^0; x^3(1-x)^0; x^3(1-x)^1; x^4(1-x)^1
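The sequence of densities on this slide is what a Bayesian update produces when each toss adds one to the matching exponent. The sketch below assumes a uniform prior on x (the probability of heads) and a toss sequence H, H, H, T, H inferred from the exponents; neither is stated explicitly on the slide.

```python
def posterior_exponents(tosses):
    """Return (heads, tails) after each toss.

    Under a uniform prior, the unnormalized posterior after each toss is
    x**heads * (1 - x)**tails.
    """
    heads = tails = 0
    history = []
    for toss in tosses:
        if toss == "H":
            heads += 1
        else:
            tails += 1
        history.append((heads, tails))
    return history


def posterior_mean(heads, tails):
    """Mean of the normalized posterior, a Beta(heads + 1, tails + 1)."""
    return (heads + 1) / (heads + tails + 2)


history = posterior_exponents("HHHTH")
for heads, tails in history:
    print(f"x^{heads} (1-x)^{tails}  mean = {posterior_mean(heads, tails):.3f}")
```

Only the two exponents need to be stored, so each new observation is a constant-time update.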
19 Click Data Example Prior → Density Function (not normalized):
x^1 (1-x)^0 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0 …
x^1 (1-x)^1 (1-0.6x)^0 (1+0.3x)^1 (1-0.5x)^0 (1-0.2x)^0 …
x^2 (1-x)^1 (1-0.6x)^0 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0 …
x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^0 (1-0.2x)^0 …
x^3 (1-x)^1 (1-0.6x)^1 (1+0.3x)^2 (1-0.5x)^1 (1-0.2x)^0 …
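As in the coin-toss case, the posterior here is fully described by one exponent per factor, so an update only increments counters. The sketch below evaluates such a density numerically, using the six factors that appear on this slide; the uniform prior, the midpoint-rule grid, and the function names are assumptions for illustration, and any factors hidden behind the "…" are ignored.

```python
def unnormalized_density(x, exponents):
    """Product of the slide's six factors raised to the given exponents."""
    factors = [x, 1 - x, 1 - 0.6 * x, 1 + 0.3 * x, 1 - 0.5 * x, 1 - 0.2 * x]
    value = 1.0
    for factor, exponent in zip(factors, exponents):
        value *= factor ** exponent
    return value


def posterior_mean(exponents, grid_size=1000):
    """Approximate the posterior mean on [0, 1] with a midpoint rule."""
    xs = [(k + 0.5) / grid_size for k in range(grid_size)]
    weights = [unnormalized_density(x, exponents) for x in xs]
    total = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / total


# Exponents of the last density shown on the slide.
final_exponents = [3, 1, 1, 2, 1, 0]
print(f"posterior mean relevance ~ {posterior_mean(final_exponents):.3f}")
```

Storing a short vector of exponents per query-document pair is what makes this kind of update incremental: a new session changes a few counters and the density never has to be refit from scratch.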
20 Estimating P(C|Ri)
21 [Figure: Click Chain Model graph with nodes R1–R5, E1–E5, C1–C5, …; observed values 0 1 0 1]
22 [Figure: Click Chain Model graph with nodes R1–R5, E1–E5, C1–C5, …; observed values 0 1 0 1]
23 [Figure: Click Chain Model graph with nodes R1–R5, E1–E5, C1–C5, …; observed values 0 1 0 1]
24 [Figure: Click Chain Model graph with nodes R1–R5, E1–E5, C1–C5, …; observed values 0 1 0 1]
25 [Figure: Click Chain Model graph with nodes R1–R5, E1–E5, C1–C5, …; observed values 0 1 0 1]
26 Putting them together
27 Alpha Estimation
28 Outline Background and motivation Designing a click model Algorithms Experiments
29 Data Set Collected over 2 weeks in July. Preprocessing: –Discard no-click sessions for fair comparison. –Remove the 178 most frequent queries. Split into training/test sets according to time stamps.
30 Data Set After preprocessing: –110,630 distinct queries; –4.8M/4.0M query sessions in the training/test set.
31 Metric Efficiency: –Computational time Effectiveness: –Perplexity –Log-likelihood –Click prediction
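The slide does not define the effectiveness metrics, so the sketch below uses definitions that are common for click models and should be read as assumptions: log-likelihood sums the log-probability the model assigned to each observed click or skip, and perplexity is 2 raised to the negative average log2-probability. The function names are hypothetical.

```python
import math


def log_likelihood(clicks, predicted_probs):
    """Sum of log-probabilities assigned to the observed binary outcomes."""
    return sum(
        math.log(q) if c else math.log(1 - q)
        for c, q in zip(clicks, predicted_probs)
    )


def perplexity(clicks, predicted_probs):
    """2 ** (-average log2-probability); 1.0 is perfect, lower is better."""
    avg_log2 = sum(
        math.log2(q) if c else math.log2(1 - q)
        for c, q in zip(clicks, predicted_probs)
    ) / len(clicks)
    return 2 ** (-avg_log2)


# Observed clicks at one position across four sessions, with the
# click probabilities a (hypothetical) model predicted for them.
observed = [1, 0, 1, 0]
coin_flip_model = [0.5, 0.5, 0.5, 0.5]
print(perplexity(observed, coin_flip_model))
```

A model that always predicts 0.5 scores a perplexity of 2, so values between 1 and 2 indicate predictions better than a coin flip. Predicted probabilities of exactly 0 or 1 would need clipping before taking logarithms.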
32 Competitors UBM: User Browsing Model (Dupret et al., SIGIR’08) DCM: Dependent Click Model (WSDM’09)
33 Results - Time Environment: Unix Server, 2.8GHz cores, MATLAB R2008b. CCM: 9.8 min; UBM: 333 min; DCM: 5.4 min
34 Results – Perplexity Worse Better
35 Results – Log Likelihood Better Worse
36 First Clicked Position
37 Last Clicked Position
38 The End