Slide 1: BBM: Bayesian Browsing Model from Petabyte-scale Data
Chao Liu, MSR-Redmond
Fan Guo, Carnegie Mellon University
Christos Faloutsos, Carnegie Mellon University
Slide 2: Massive Log Streams
Search log:
– 10+ terabytes each day (and it keeps increasing!)
– Involves billions of distinct (query, URL) pairs
Questions:
– Can we infer user-perceived relevance for each (query, URL) pair?
– How many passes over the data are needed? Is one enough?
– Can the inference be parallelized?
Our answer: yes, yes, and yes!
Slide 3: BBM: Bayesian Browsing Model
[Model diagram: a query returns the top URLs 1–4; each position i carries an examination variable E_i (did the user examine the snippet?), a relevance variable S_i, and a click-through variable C_i.]
Slide 4: Dependencies in BBM
[Diagram: for each position i, the click C_i depends on the examination E_i and the relevance S_i; whether position i is examined depends on the preceding click position before i.]
Slide 5: Road Map
– Exact Model Inference
– Algorithms through an Example
– Experiments
– Conclusions
Slide 6: Notations
For a given query:
– Top-M positions, usually M = 10
– Positional relevance: M(M+1)/2 combinations of (r, d) pairs
– n search instances
– N documents impressed in total
– Document relevance
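The M(M+1)/2 count can be checked directly: writing r for the preceding click position (r = 0 meaning no earlier click) and d for the distance from it, the valid pairs satisfy r ≥ 0, d ≥ 1, and r + d ≤ M. A quick sketch (function name is illustrative, not from the deck):

```python
# Enumerate the valid (r, d) pairs for a result list of M positions.
# r = position of the preceding click (0 = no earlier click),
# d = distance from that click, so the current position is r + d
# and must lie within the top M.
def positional_pairs(M):
    return [(r, d) for r in range(M) for d in range(1, M - r + 1)]

M = 10
print(len(positional_pairs(M)))   # M(M+1)/2 = 55 for M = 10
```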
Slide 7: Model Inference
– Ultimate goal
– Observation: conditional independence
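The two formulas on this slide were lost in transcription; a hedged sketch under assumed notation (S the relevance variables, C^{(t)} the clicks of the t-th search instance) is:

```latex
% Ultimate goal: the posterior over relevances given all n search instances
p\!\left(S \mid C^{(1)}, \dots, C^{(n)}\right)
\;\propto\;
p(S)\,\prod_{t=1}^{n} P\!\left(C^{(t)} \mid S\right)

% Observation: given S, the n search instances are conditionally
% independent, so the likelihood factorizes across instances.
```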
Slide 8: P(C|S) by Chain Rule
– Likelihood of a search instance
– From S to R:
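The chain-rule factorization can be sketched as follows (a reconstruction under assumed symbols: β_{r,d} is the examination probability of a position at distance d from a preceding click at position r, and position i displays document d_{j(i)}):

```latex
% Likelihood of one search instance, factored position by position:
P(C \mid S) \;=\; \prod_{i=1}^{M} P\!\left(C_i \mid C_{1:i-1}, S_i\right),
\qquad
P\!\left(C_i \mid C_{1:i-1}, S_i\right) \;=\;
\begin{cases}
\beta_{r_i, d_i}\, S_i, & C_i = 1,\\
1 - \beta_{r_i, d_i}\, S_i, & C_i = 0.
\end{cases}

% From S to R: the relevance at position i is the relevance of the
% document shown there, S_i = R_{j(i)}.
```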
Slide 9: Putting Things Together
– Posterior, re-organized by the R_j's
– How many times d_j was clicked
– How many times d_j was not clicked when it is at position (r + d) with the preceding click at position r
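Re-organizing the likelihood by the R_j's gives each document relevance a closed-form posterior built from exactly the two kinds of counts the slide describes. A hedged reconstruction (same assumed β_{r,d} notation as above):

```latex
% Posterior of document d_j's relevance, up to normalization:
p\!\left(R_j \mid C^{(1)}, \dots, C^{(n)}\right)
\;\propto\;
R_j^{\,N_j^{0}}
\prod_{0 \le r < r+d \le M}
\left(1 - \beta_{r,d}\, R_j\right)^{N_j^{r,d}}

% N_j^{0}  : how many times d_j was clicked
% N_j^{r,d}: how many times d_j was not clicked while shown at
%            position r + d with the preceding click at position r
```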
Slide 10: What the Posterior Tells Us
– Exact inference, with the joint posterior in closed form
– The joint posterior factorizes, hence the R_j's are mutually independent
– At most M(M+1)/2 + 1 numbers fully characterize each posterior
– Count vector
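Because each R_j's posterior is characterized by one click count plus at most M(M+1)/2 skip counts, quantities such as the posterior mean reduce to one-dimensional numerical integration over the count vector. A minimal sketch (the β values and counts below are made-up illustrative numbers, not from the deck):

```python
# Posterior of a single document relevance R in (0, 1), up to a constant:
#   p(R) ∝ R**clicks * Π_{(r,d)} (1 - beta[(r,d)] * R)**skips[(r,d)]
# i.e. fully characterized by the count vector (clicks, skips).
def posterior_mean(clicks, skips, beta, grid=10_000):
    num = den = 0.0
    for k in range(1, grid):            # simple Riemann sum over (0, 1)
        R = k / grid
        p = R ** clicks
        for rd, n in skips.items():
            p *= (1.0 - beta[rd] * R) ** n
        num += R * p
        den += p
    return num / den

beta = {(0, 1): 0.9, (0, 2): 0.5}       # illustrative examination probabilities
mean = posterior_mean(clicks=3, skips={(0, 1): 1, (0, 2): 2}, beta=beta)
```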
Slide 12: LearnBBM: One-Pass Counting
– Find R_j
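The one-pass counting idea can be sketched as: for each search instance, increment a click counter for every clicked (query, URL), and for every skipped one increment the counter indexed by (preceding click position r, distance d). A minimal sketch (the data layout and names are illustrative):

```python
from collections import defaultdict

def learn_bbm(instances):
    """One pass over search instances.
    Each instance is (query, urls, clicked): urls are the top-M results
    and clicked is a 0/1 list. Returns, per (query, url), the click
    count and the skip counts keyed by (r, d)."""
    clicks = defaultdict(int)
    skips = defaultdict(lambda: defaultdict(int))
    for query, urls, clicked in instances:
        r = 0                               # preceding click position (0 = none)
        for i, (url, c) in enumerate(zip(urls, clicked), start=1):
            qu = (query, url)
            if c:
                clicks[qu] += 1
                r = i
            else:
                skips[qu][(r, i - r)] += 1  # skipped at position r + d
    return clicks, skips

c, s = learn_bbm([("q", ["u1", "u2", "u3"], [1, 0, 1])])
print(c[("q", "u1")], dict(s[("q", "u2")]))   # 1 {(1, 1): 1}
```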
Slide 13: An Example
[Worked example: computing the count vector for R_4, i.e. the click count N_4 together with the skip counts N_{4,r,d}.]
Slide 14: LearnBBM on MapReduce
– Map: emit((q, u), idx)
– Reduce: construct the count vector
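The map/reduce split on the slide can be simulated in a few lines: each mapper emits, per impressed (query, URL), the index of the counter that observation contributes to, and the reducer sums the emissions into the count vector. A toy sketch (the index encoding is an assumption for illustration: 0 for a click, a positive index identifying the (r, d) skip slot):

```python
from collections import defaultdict

def mapper(instance, M=10):
    """Emit ((query, url), idx) pairs for one search instance."""
    query, urls, clicked = instance
    r = 0                                   # preceding click position
    for i, (url, c) in enumerate(zip(urls, clicked), start=1):
        if c:
            yield (query, url), 0           # idx 0 = a click
            r = i
        else:
            d = i - r
            # linear index of (r, d) among the M(M+1)/2 valid pairs
            idx = r * M - r * (r - 1) // 2 + d
            yield (query, url), idx

def reducer(key, idxs, M=10):
    """Build the length M(M+1)/2 + 1 count vector for one (query, url)."""
    vec = [0] * (M * (M + 1) // 2 + 1)
    for idx in idxs:
        vec[idx] += 1
    return key, vec

# Simulate one map + shuffle + reduce round on a toy instance.
shuffled = defaultdict(list)
for key, idx in mapper(("q", ["u1", "u2"], [1, 0])):
    shuffled[key].append(idx)
results = dict(reducer(k, v) for k, v in shuffled.items())
print(results[("q", "u1")][0])   # u1 was clicked once
```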
Slide 15: Example on MapReduce
[Diagram: three mappers emit (URL, idx) pairs, e.g. (U1, 0), (U2, 4), (U3, 0), (U4, 7); the reducer groups them by URL into count vectors such as (U1, 0, 1, 1), (U2, 4), (U4, 0, 7), and (U3, 0, 0, 0).]
Slide 17: Experiments
Compared against the User Browsing Model (Dupret and Piwowarski, SIGIR'08):
– The same dependence structure
– But point estimation of document relevance rather than Bayesian inference
– Approximate inference through iterations
Data:
– Collected from August and September 2008
– 10 algorithmic results only
– Split into training/test sets by time stamp for each query
– 51 million search instances of 1.15 million distinct queries, 10x larger than the SIGIR'08 study
Slide 18: Overall Comparison on Log-Likelihood
– Experiments in 20 batches
– LL Improvement Ratio =
Slide 19: Comparison w.r.t. Query Frequency
Intuition:
– Hard to predict clicks for infrequent queries
– Easy for frequent ones
Slide 20: Model Comparison on Efficiency
– BBM is 57 times faster
Slide 21: Petabyte-Scale Experiment
Setup:
– 8 weeks of data, 8 jobs
– Job k takes the first k weeks of data
Experiment platform:
– SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets [Chaiken et al., VLDB'08]
Slide 22: Scalability of BBM
– Increasing computation load: more queries, more URLs, more impressions
– Near-constant elapsed time on SCOPE: about 3 hours
– Scanned 265 terabytes of data
– Computed full posteriors for 1.15 billion (query, URL) pairs
Slide 24: Conclusions
Bayesian Browsing Model for search streams:
– Exact Bayesian inference
– Joint posterior in closed form
– A single pass suffices
– Map-reducible, hence parallelizable
– Amenable to incremental updates
– Perfect for mining click streams
Models for other stream data:
– Browsing, twittering, Web 2.0, etc.?
Slide 25: Thanks!