Published by Luke Elliott. Modified over 9 years ago.
2
Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time
Chao Liu, Ryen White, Susan Dumais
Microsoft Research at Redmond
3
Dwell Time as Implicit User Feedback
- The most significant indicator of document relevance besides clickthroughs [Kelly and Belkin, SIGIR’01, SIGIR’04]
- Leveraged in various applications
  - Learning to rank [Agichtein et al., SIGIR’06]
  - Query expansion [Buscher et al., SIGIR’09]
  - BrowseRank, which assumes an exponential distribution [Liu et al., SIGIR’08]
  - …
4
Questions Addressed in this Study
Questions
- How do we model the dwell time distribution Pr(t|d)?
- What does Pr(t|d) tell us about user browsing behaviors?
- How is the distribution related to page-level features, and can we predict it from those features?
Takeaways
- We propose to model Pr(t|d) using Weibull distributions
- The fitted Weibull distributions exhibit a strong negative aging effect, which indicates a “screen-and-glean” browsing behavior
- We can predict Pr(t|d) from page features, which extends the application of dwell time to scenarios where dwell time data is not available
5
Outline
- A Primer on Weibull Analysis
  - Weibull distribution and analysis
  - Hazard function and aging effects
- Weibull Analysis on Dwell Time
  - Goodness-of-Fit
  - Screen-and-glean browsing pattern
  - Screening by categories
- Predicting Dwell Time Distribution
  - Prediction performance
  - Feature importance
- Conclusions
6
Weibull Analysis
- Weibull analysis is a method for modeling positive-valued data, such as time-to-failure data
  - Predicting product life
  - Comparing the reliability of competing product designs
  - Establishing warranty policies or proactively managing spare-parts inventories
- Success beyond reliability engineering
  - Survival analysis, weather forecasting, fading channels in wireless communication, the length of labor strikes, AIDS mortality, earthquake probabilities, etc.
- Unfortunately, there is no prior Weibull analysis of Web data, although the Web abounds with temporal data
  - Page dwell time, session length, time-to-first-click, etc.
7
Weibull Distribution
- 2-parameter Weibull distribution
  - λ: scale parameter
  - k: shape parameter
- Reduces to the exponential distribution when k = 1
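For reference, the standard two-parameter Weibull density f and cumulative distribution function F are written below with the slide's symbols λ and k. This is the textbook parameterization; it is assumed, not confirmed, to be the one used in the talk.

```latex
f(x;\lambda,k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}},
\qquad
F(x;\lambda,k) = 1 - e^{-(x/\lambda)^{k}},
\qquad x \ge 0,\ \lambda > 0,\ k > 0.
```

With k = 1 the density becomes (1/λ) e^{-x/λ}, which is the exponential distribution with mean λ.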
8
Weibull Analysis
- Hazard function at time x
  - Instantaneous failure rate (or hazard rate) at time x
  - Amount of risk associated with an x-survivor at time x
- Hazard function for Weibull distributions
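As a reminder of the standard definitions (assumed to match the formulas shown on the slide), the hazard function is the density f divided by the survival probability 1 − F, and for a Weibull distribution with scale λ and shape k it has a simple closed form:

```latex
h(x) = \frac{f(x)}{1 - F(x)},
\qquad
h_{\text{Weibull}}(x)
  = \frac{\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}}{e^{-(x/\lambda)^{k}}}
  = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1},
\qquad x > 0.
```

The exponent k − 1 determines whether the hazard rises, falls, or stays constant over time.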
9
Aging Effects from Hazard Function
- k = 1: No aging
  - Constant failure rate
  - Exponential distribution
- 0 < k < 1: Negative aging
  - Decreasing failure rate
  - An initial screening has to be passed in order to survive longer
  - Smaller k means harsher screening
- k > 1: Positive aging
  - Increasing failure rate
  - Little to no screening at the beginning, but life becomes tougher as time goes by
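The three aging regimes can be checked numerically. The sketch below evaluates the standard Weibull hazard h(x) = (k/λ)(x/λ)^(k−1) at a few time points; the function names, the scale λ = 10, and the sample times are illustrative assumptions, not values from the talk.

```python
def weibull_hazard(x, lam, k):
    """Standard Weibull hazard rate h(x) = (k/lam) * (x/lam)**(k-1), for x > 0."""
    return (k / lam) * (x / lam) ** (k - 1)


def aging_effect(k):
    """Classify the aging regime implied by the shape parameter k."""
    if k < 1:
        return "negative aging (decreasing hazard)"
    if k > 1:
        return "positive aging (increasing hazard)"
    return "no aging (constant hazard)"


# Illustrative time points and scale; not taken from the study.
times = [1.0, 5.0, 20.0, 60.0]
for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, lam=10.0, k=k) for t in times]
    print(f"k={k}: {aging_effect(k)}", [round(r, 4) for r in rates])
```

For k = 0.5 the printed rates fall as time increases, which is the "screening" behavior: the risk of abandonment is highest at the start and drops for those who stay.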
10
Weibull Analysis on Dwell Time and Beyond
- The Web abounds with temporal data
  - Time to first click, session length, eye fixation, …
- Weibull analysis goes well beyond hazard functions
  - Failure forecasting, corrective actions, …

              Reliability Analysis   Dwell Time Analysis          Click Analysis        …
Data          Time-to-failure        Time-to-abandon              Time-to-first-click   …
Hazard        Failure rate           Abandon rate                 Click rate            …
E(t|t>t0)     Mean residual life     Mean residual time on page   How soon to click     …
…             …                      …                            …                     …
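To make the "mean residual time on page" row concrete, here is a minimal sketch that computes the expected remaining time E[T − t0 | T > t0] for a Weibull-distributed dwell time by numerical integration, so that E(t | t > t0) is t0 plus this value. The function name, the integration settings, and the example parameters are assumptions for illustration only.

```python
import math


def mean_residual_time(t0, lam, k, steps=20000, span=60.0):
    """Expected remaining time E[T - t0 | T > t0] for T ~ Weibull(lam, k).

    Uses the substitution u = (t/lam)**k, which gives
        (1/S(t0)) * integral_{t0}^{inf} S(t) dt
          = (lam/k) * integral_{u0}^{inf} u**(1/k - 1) * exp(-(u - u0)) du,
    evaluated with the midpoint rule on [u0, u0 + span].
    Accuracy is reduced for k > 1 at t0 = 0 (integrable singularity at u = 0).
    """
    u0 = (t0 / lam) ** k
    du = span / steps
    total = 0.0
    for i in range(steps):
        u = u0 + (i + 0.5) * du
        total += u ** (1.0 / k - 1.0) * math.exp(-(u - u0))
    return (lam / k) * total * du


# Illustrative parameters (not from the study): scale 1, shapes 0.5 and 1.
for t0 in (0.0, 1.0, 4.0):
    print(t0,
          round(mean_residual_time(t0, lam=1.0, k=0.5), 3),
          round(mean_residual_time(t0, lam=1.0, k=1.0), 3))
```

Under negative aging (k < 1) the expected remaining time grows with t0, whereas for k = 1 it stays constant at λ, which is the memoryless property of the exponential distribution.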
12
Goodness-of-Fit Comparison
- Dwell times were collected for 205,873 pages (URLs) in the English (US) market, each with at least 10k dwell time observations
- Comparison of Goodness-of-Fit (GoF)
  - Dwell times for each page are split into training (80%) and testing (20%) sets
  - Models are fitted on the training set and evaluated on the testing set
  - Metrics: log-likelihood and Kolmogorov–Smirnov distance
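The evaluation protocol can be sketched for a single page as follows. This is an illustration under stated assumptions, not the authors' code: the dwell times are synthetic (drawn from a Weibull with hypothetical parameters), the exponential distribution is assumed as the competing model, and the fitting routine is a plain maximum-likelihood bisection written for this example.

```python
import math
import random


def fit_weibull(xs, lo=0.05, hi=5.0, iters=40):
    """Maximum-likelihood Weibull fit; returns (lam, k). Bisection on the shape k."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def score(k):
        # The MLE of k is the root of this function, which is increasing in k.
        pw = [x ** k for x in xs]
        return sum(p * l for p, l in zip(pw, logs)) / sum(pw) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = (lo + hi) / 2.0
    lam = (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return lam, k


def weibull_loglik(xs, lam, k):
    return sum(math.log(k / lam) + (k - 1) * math.log(x / lam) - (x / lam) ** k
               for x in xs)


def exponential_loglik(xs, rate):
    return sum(math.log(rate) - rate * x for x in xs)


def ks_distance(xs, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of xs and cdf."""
    xs = sorted(xs)
    n = len(xs)
    return max(max(abs(cdf(x) - i / n), abs(cdf(x) - (i + 1) / n))
               for i, x in enumerate(xs))


random.seed(0)
# Synthetic dwell times in seconds (assumed scale 30, shape 0.6); not study data.
data = [x for x in (random.weibullvariate(30.0, 0.6) for _ in range(5000)) if x > 0]
split = int(0.8 * len(data))            # 80% training, 20% testing
train, test = data[:split], data[split:]

lam, k = fit_weibull(train)             # fit on training data only
rate = len(train) / sum(train)          # exponential MLE: 1 / mean

ll_weibull = weibull_loglik(test, lam, k)
ll_exponential = exponential_loglik(test, rate)
ks_weibull = ks_distance(test, lambda x: 1 - math.exp(-((x / lam) ** k)))
ks_exponential = ks_distance(test, lambda x: 1 - math.exp(-rate * x))

print(f"fitted lam={lam:.2f}, k={k:.3f}")
print(f"test log-likelihood: Weibull={ll_weibull:.1f}, exponential={ll_exponential:.1f}")
print(f"test KS distance:    Weibull={ks_weibull:.3f}, exponential={ks_exponential:.3f}")
```

A higher held-out log-likelihood and a smaller KS distance both indicate a better fit; in the study this comparison would be repeated for each page.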
13
Fitting λ and k
- Strong negative aging
- What’s the initial screening?
- Screen-and-glean browsing pattern?
14
P(k | Category): Aging Effect w.r.t. Categories
- Screening is harsher for less-entertaining topics
16
Dwell Time Prediction from Page Features
- Why predict dwell time?
  - Extend dwell time to pages with little or no dwell time data
  - Enable third parties to leverage dwell time even if they don’t have access to real dwell time data
  - Gain insights into which elements affect dwell time
- Why use only page-level features?
  - Users decide how long to stay on a page based on their experience and perception of it, rather than on PageRank, for example
  - Advanced features like PageRank and inlink counts may not be available to all parties
17
Experiment Setup
- 5000 randomly sampled pages, with the fitted λ and k as the target values
- Pages are crawled using a dynamic crawler, which parses the HTML, executes all dynamic components (e.g., redirections, Flash, JavaScript), and finally renders the page
  - “login” pages are removed, as they are likely due to time-out redirection
  - 4771 pages left
- Page-level features
  - HtmlTag: frequencies of 93 HTML tags
  - Content: frequencies of the top-1000 terms
  - Dynamic: statistics from dynamic crawling
- Regressor: Multiple Additive Regression Trees (MART)
  - Effectiveness and feature interpretability
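As an illustration of the HtmlTag feature group, the following sketch counts start-tag occurrences in a page with Python's standard `html.parser`. The tag list, the sample page, and the choice of raw counts (rather than normalized frequencies) are assumptions; the talk does not list the 93 tags or describe the exact extraction.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each HTML start tag occurs in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.counts[tag] += 1


def html_tag_features(html, tags):
    """Return one count per tag in `tags`, in order (0 if the tag is absent)."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return [parser.counts[t] for t in tags]


# Hypothetical stand-in for the 93 tags used in the study.
TAGS = ["a", "div", "img", "p", "script"]
page = ("<html><body><div><p>Hi</p><p>More <a href='#'>link</a></p>"
        "<img src='x.png'></div></body></html>")
features = html_tag_features(page, TAGS)
print(dict(zip(TAGS, features)))
```

A vector like this, concatenated with the Content and Dynamic features, would form the input to the regressor, with the fitted λ and k as targets.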
18
Prediction Results
- The baseline returns the mean λ and k
- Comparisons across various feature configurations
  - Prediction outperforms the baseline
  - HtmlTag and Dynamic are similarly effective on their own, and complementary when combined
  - Content > HtmlTag+Dynamic
  - Content+Dynamic performs best: Dynamic captures what users experience after clicking, whereas Content shows what users see in the end
19
Important Features
21
Conclusions
- The first Weibull analysis of Web dwell time
  - Draws an analogy between dwell time and lifetime
  - Opens the door to Weibull analysis of temporal implicit feedback
- Dwell time exhibits a strong negative aging effect, which hints at a prevalent “screen and glean” browsing pattern
  - Harsher screening for less-entertaining topics
- It is feasible to predict dwell time from page-level features
  - Extends applicability to less-visited pages and to parties without dwell time data
- Future work
  - Improving prediction accuracy through better feature engineering
  - Weibull analysis for IR
22
Acknowledgments
Yutaka Suzue, Krysta Svore, Qiang Wu, Wen-tau Yih, Xiaoxin Yin, Alice Zheng
23
Q&A Thank You!