Stochastic Models of User-Contributory Web Sites

Presentation on theme: "Stochastic Models of User-Contributory Web Sites"— Presentation transcript:

1 Stochastic Models of User-Contributory Web Sites
Tad Hogg (HP Labs) and Kristina Lerman (USC Information Sciences Institute)

2 The Social Web
Examples: Bugzilla, essembly, delicious. “Wisdom of crowds.”

3 Activities
View existing content. Rate existing content (simple: vote; complex: write a review). Add new content. Link to other users. Viewing and rating existing content are the focus of this presentation.

4 Aggregate group behavior
Determines the structure and usefulness of user-participatory sites. Models enable: predicting trends or behaviors, e.g., which newly contributed content will become popular; designing web sites, e.g., productive information displays; altering user incentives, e.g., to improve content quality or participation.

5 Stochastic Modeling summary
Start with individual user behavior: specify states and transitions between states. Determine collective behavior: choose the aggregate behavior of interest; individual user behaviors create transitions among aggregate states. Rate equations give the dynamics: how average collective behavior changes in time, and how collective behavior depends on user characteristics.

6 Illustration – Stochastic Model of Digg
Phenomenology of Digg: users submit and vote on news stories; Digg promotes popular stories to the front page; Digg allows social networking. Users can designate friends and view their friends’ activity on Digg. This is a directed social network: friends of user A are everyone A is watching; fans of A are all users who are watching A. (Diagram: Alice watches Bob, so Bob is Alice’s friend and Alice is Bob’s fan.)

7 Lifecycle of a story
A user submits a story to the Upcoming Stories queue. Others vote on (digg) the story. If the story accumulates enough votes in a short time, it is promoted to the Front page. The Friends Interface lets users see stories their friends submitted, stories their friends voted on, …

8 Model of Digg voting behavior
Stochastic model based on the Digg user interface: visibility and interestingness → votes. Extension to prior model [Lerman 2007]: the “law of surfing” for viewing web pages [Huberman et al. 1998] instead of a geometric distribution; incremental average growth in the number of voters’ fans, i.e., people who can see the story via the friends interface. Related work: phenomenological models of aggregate behavior for Digg, Wikipedia, YouTube, …, e.g., [Wu & Huberman 2007; Crane & Sornette 2008; Wilkinson 2008].

9 Voting on stories
Voting is a combination of visibility (does the user see the story? via the user interface: browse, recommended by friends, search) and interest (does the user like the story? novelty, …). [Diagram: user comes to Digg → see the story? → (yes) vote on the story?]

10 Story location
Digg shows stories as lists: most recent first; 15 stories per page; the user must click to view subsequent pages; visibility decreases with distance from the top of the list. A given story moves down the list as new stories are added, eventually moves to later pages, and switches from the upcoming list to the top of the front page if promoted.

11 User behavioral model
[State diagram with visibility and interest stages: states upcoming1 … upcomingq, front1 … frontp, friends, vote, and Ø; transition labels ν, c, ωS, and r.]
Story-specific parameters: r, ‘interestingness’ (probability the story will receive a vote if seen); S, number of the submitter’s fans. General parameters: ν, rate at which users visit Digg; c, fraction of users viewing upcoming pages; ω, rate at which fans visit Digg.

12 Dynamical model of aggregate behavior
How the number of votes N_vote(t) for a story changes: ν_f, rate at which users find the story on the front page queue; ν_u, rate at which users find the story on the upcoming stories queue; ν_friends, rate at which users find the story through the friends interface; r, fraction of users who see the story and choose to vote for it. The three rates together make up the story’s visibility.
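
A rate equation consistent with the quantities defined on this slide is sketched below. It is a reconstruction from those definitions, not a verbatim copy of the slide's formula, writing ν_f, ν_u, and ν_friends for the three rates at which users find the story and r for the fraction of viewers who vote:

```latex
\frac{dN_{\mathrm{vote}}(t)}{dt}
  = r \,\bigl( \nu_f(t) + \nu_u(t) + \nu_{\mathrm{friends}}(t) \bigr)
```

Read this way, visibility (the sum in parentheses) and interestingness (r) enter as separate factors, which is what allows r to be estimated from vote histories once visibility is modeled.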

13 Estimating model parameters
Need model parameters for story visibility and story interestingness. Estimate them from the behavior of a sample of users.

14 Digg data set
Stories from the front and upcoming pages: number of votes vs. time since submission, for several days in May 2006 (prior to the availability of the Digg API); sampled more extensively from front than from upcoming pages. Number of fans for active users. 2152 stories with at least 4 observations, submitted by 1212 distinct users; 510 of these stories were promoted to the front page.

15 Story visibility
User viewing behavior is not available: which stories users look at; how they find stories (front page, friends interface, …). Estimate it indirectly from models and data.

16 Modeling story visibility
Story location; navigating web sites; number of fans

17 Story location vs. time in each list
For the upcoming and front page lists, a story's location is its position on the page (1 to 15) and which page it is on (1st, 2nd, …). Distance from the top of the list increases linearly with time. Rate at which story position increases: front page ~0.2 pages/hr; upcoming ~4 pages/hr. These are 1/15th of the rates at which new stories are promoted to the front page (~3/hr) and submitted as new stories (~60/hr), since each page holds 15 stories. The rates average over hourly variation [Szabo & Huberman 2008]. [Plots: examples of front page p(t) and upcoming q(t).]
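
The arithmetic behind these rates can be checked with a short sketch. The function names are illustrative, and starting the story at the top of page 1 is an assumption made here, not something stated on the slide:

```python
STORIES_PER_PAGE = 15  # from the slide: each list page holds 15 stories


def page_rate(new_stories_per_hour: float) -> float:
    """Pages per hour that a story moves down a list gaining stories at the given rate."""
    return new_stories_per_hour / STORIES_PER_PAGE


def page_at(hours_on_list: float, new_stories_per_hour: float) -> float:
    """Approximate (fractional) page number after some hours on a list.

    Assumes the story enters at the top of page 1 and that the list
    grows at a constant average rate (hourly variation is ignored).
    """
    return 1.0 + page_rate(new_stories_per_hour) * hours_on_list


if __name__ == "__main__":
    print("front page rate (pages/hr):", page_rate(3.0))   # ~3 promotions per hour
    print("upcoming rate (pages/hr):", page_rate(60.0))    # ~60 submissions per hour
    print("upcoming page after 2 hours:", page_at(2.0, 60.0))
```

With the slide's approximate rates, a story leaves the first upcoming page within about 15 minutes but stays on the first front page for about 5 hours.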

18 Story location: promotion to front page
Digg’s promotion decision algorithm is not public; it is based on popularity expressed by user votes. Approximation from data: a story is promoted if it receives at least 40 votes within 24 hours of submission.
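
The threshold approximation can be written as a small function. This is a minimal sketch of the rule as stated (40 votes within 24 hours), not Digg's actual algorithm; the function name and the choice to count a vote at exactly 24 hours are assumptions:

```python
PROMOTION_VOTES = 40           # threshold approximated from the data
PROMOTION_WINDOW_HOURS = 24.0  # votes must arrive within this time after submission


def promotion_time(vote_times_hours):
    """Return the time (hours after submission) of the 40th vote if it falls
    within the 24-hour window, otherwise None (story not promoted)."""
    in_window = sorted(
        t for t in vote_times_hours if 0.0 <= t <= PROMOTION_WINDOW_HOURS
    )
    if len(in_window) >= PROMOTION_VOTES:
        return in_window[PROMOTION_VOTES - 1]
    return None


if __name__ == "__main__":
    fast_story = [0.5 * i for i in range(1, 41)]    # one vote every 30 minutes
    slow_story = [float(i) for i in range(1, 41)]   # one vote every hour
    print(promotion_time(fast_story))
    print(promotion_time(slow_story))
```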

19 Modeling story visibility
Story location; navigating web sites; number of fans

20 Navigating through a web site
Empirical model of users following links on a Web site: the “law of surfing” [Huberman et al. 1998]. Inverse Gaussian distribution of the number of pages viewed before leaving the web site; few users go beyond the 1st page. Parameters estimated from Digg data and the model.
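
To see how an inverse Gaussian makes visibility fall off with page number, the sketch below evaluates its standard cumulative distribution function using only the Python standard library. The parameter values are illustrative placeholders, not the values estimated from the Digg data, and treating the continuous distribution's tail as "fraction of users who view more than m pages" is a simplifying assumption:

```python
import math


def _std_normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_gaussian_cdf(x: float, mu: float, lam: float) -> float:
    """CDF of the inverse Gaussian distribution with mean mu and shape lam."""
    if x <= 0.0:
        return 0.0
    a = math.sqrt(lam / x)
    return (_std_normal_cdf(a * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _std_normal_cdf(-a * (x / mu + 1.0)))


def fraction_viewing_beyond(pages: float, mu: float, lam: float) -> float:
    """Fraction of users who view more than `pages` pages before leaving."""
    return 1.0 - inverse_gaussian_cdf(pages, mu, lam)


if __name__ == "__main__":
    MU, LAM = 0.6, 0.6  # hypothetical parameters chosen only for illustration
    for m in (1, 2, 3, 5):
        print(m, round(fraction_viewing_beyond(m, MU, LAM), 4))
```

For parameters of this size only a minority of users go past the first page, and the fraction drops quickly for later pages, in line with the qualitative statement on the slide.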

21 Modeling story visibility
Story location; navigating web sites; number of fans: visibility via friends interface

22 Story visibility via friends interface
Each voter enables their fans to see the story via the friends interface. Model of the number of fans who have not yet viewed the story, s(t), based on the number of votes on the story: the story is visible to the submitter’s fans at submission time, which sets s(0); s(t) decreases as fans of prior voters visit Digg and increases with new fans from new votes.
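
One way to read these bullets is as a pair of rate equations: fans who have not yet seen the story visit Digg at some rate and leave the pool, while each new vote adds that voter's fans to it. The sketch below integrates this for the friends channel alone and checks it against the closed-form solution. The equation form, the symbol `a` (average number of new fans added per vote), and the values of `omega` and `a` are assumptions made for illustration; only S = 50 and r = 0.15 come from the slides:

```python
import math


def simulate_friends_channel(S, r, omega, a, hours, dt=0.01):
    """Euler integration of the assumed friends-interface dynamics:

        dN/dt = r * omega * s               (fans visit, see the story, some vote)
        ds/dt = -omega * s + a * dN/dt      (visitors leave the pool, voters add fans)

    starting from s(0) = S (the submitter's fans) and N(0) = 0.
    """
    s, votes = float(S), 0.0
    for _ in range(int(round(hours / dt))):
        new_votes = r * omega * s * dt
        s += -omega * s * dt + a * new_votes
        votes += new_votes
    return s, votes


def closed_form(S, r, omega, a, hours):
    """Exact solution of the same equations; requires a * r != 1."""
    k = omega * (1.0 - a * r)
    s = S * math.exp(-k * hours)
    votes = r * omega * S * (1.0 - math.exp(-k * hours)) / k
    return s, votes


if __name__ == "__main__":
    params = dict(S=50, r=0.15, omega=0.1, a=5.0, hours=24.0)  # omega, a hypothetical
    print("numerical:", simulate_friends_channel(**params))
    print("exact:    ", closed_form(**params))
```

In this reduced form the pool of unseen fans shrinks only when a·r < 1; otherwise each vote recruits more prospective voters than it uses up.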

23 Story interestingness
Reasons users vote for a story are not available, e.g., topic; novelty [Wu & Huberman 2007]; popularity (determining interest, not just visibility), e.g., “cool” fashion or gadgets. One approach: web-based experiments, e.g., [Salganik et al. 2006]. Here: estimate from models and data, i.e., from vote history after accounting for visibility.

24 Model results

25 Solutions: votes vs. time
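Model vs. observations for 6 stories (S values are listed for only four of the six):
S = 5, r = 0.51, final votes = 2229
S not listed, r = 0.44, final votes = 1921
S = 40, r = 0.32, final votes = 1297
S not listed, r = 0.28, final votes = 1039
S = 160, r = 0.19, final votes = 740
S = 100, r = 0.13, final votes = 458
The model captures qualitative features: slow growth initially; influence of fans on promotion; rapid growth if the story is promoted (much more visible to users).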

26 Model: requirements for promotion
Values of S and r needed to get the story on the front page. [Plot: number of votes and number of fans not yet seeing the story vs. time, with the promotion time and the 40-vote promotion threshold marked; plot parameters are S=50, r=0.15.]

27 Promotion to front page: model prediction vs. data: 95% accurate
[Plot, logarithmic scale, showing the promotion threshold from the model.] Most stories are not promoted and come from people with no fans.

28 Additional model insights
Heterogeneity: user activity; content quality (“interestingness”). Predictability from early reactions to a new story.

29 Story interestingness
Long-tail distribution (lognormal): a few stories are much more interesting than average, after accounting for visibility via the user interface part of the model. Open question: why? A multiplicative process underlying user interests? [Plots: distribution of estimated interestingness values with lognormal fit; quantile-quantile plot shows good fit.] Good fit with the Kolmogorov-Smirnov test.
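
The kind of check described here can be sketched with the standard library alone: fit a lognormal by taking the mean and standard deviation of the log values, then compute the Kolmogorov-Smirnov distance between the sample and the fitted distribution. The data below are synthetic values drawn for illustration, not the estimated interestingness values, and the generating parameters are arbitrary:

```python
import math
import random


def fit_lognormal(values):
    """Maximum-likelihood lognormal fit: mean and std. dev. of log(values)."""
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution with log-mean mu and log-std sigma."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(values, mu, sigma):
    """Largest gap between the empirical CDF and the fitted lognormal CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = lognormal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    sample = [rng.lognormvariate(-1.5, 0.7) for _ in range(2000)]  # synthetic data
    mu_hat, sigma_hat = fit_lognormal(sample)
    print("fitted mu, sigma:", mu_hat, sigma_hat)
    print("KS statistic:", ks_statistic(sample, mu_hat, sigma_hat))
```

A small statistic indicates a close fit. Turning it into a p-value needs extra care when, as here, the parameters are estimated from the same sample.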

30 Predictions from early behavior
Estimate story interestingness from the full history, or using initial votes. Behavior is predictable from the early reaction to a story, also with YouTube, e.g., [Crane & Sornette 2008; Lerman & Galstyan 2008; Szabo & Huberman 2008]. Example: use the first 4 observations. The r estimates correlate 0.9 with those based on the full history; predictions of final votes account for 75% of the variance; rms prediction error: 244 votes.

31 Model based on votes only?
Estimate based on initial votes only, not including the visibility model, i.e., ignoring the effects of the ‘law of surfing’ and the social network. [Diagram: user comes to Digg → see the story? → (yes) vote on the story?]

32 Model based on votes only?
Variance accounted for: full model 75%, votes only 56%. Rms prediction error: full model 244, votes only 327. The full model is better than the one that omits visibility (differences significant, p-value < 10^-4).
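
For reference, the two comparison metrics can be computed as below. The slides do not say how "variance accounted for" was calculated; this sketch uses 1 − SS_res/SS_tot, one common definition, as an assumption. The observed values are the final vote counts listed for the six example stories, and the predictions are made-up numbers for illustration:

```python
import math


def rms_error(predicted, observed):
    """Root-mean-square difference between predictions and observations."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def variance_accounted_for(predicted, observed):
    """Fraction of the observed variance explained: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [2229, 1921, 1297, 1039, 740, 458]    # final votes of the six example stories
    predicted = [2000, 2100, 1200, 1150, 600, 500]   # hypothetical predictions
    print("rms error:", rms_error(predicted, observed))
    print("variance accounted for:", variance_accounted_for(predicted, observed))
```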

33 Future work on models of activities: new content & links
View existing content and rate existing content: the focus of this presentation. Add new content: what motivates high-quality contribution? Link to other users: how do users choose who to link to? What does a link signify? Common interests? Trust in recommendations?

34 Conclusion
Stochastic process approach: connects user and system behaviors. Applicability: users have limited information and actions; limited use of personalized history; e.g., user communities on the web, not face-to-face small-group interactions. Example: the news aggregator Digg; votes result from visibility + interestingness; the user model follows from the information and actions provided by the Digg UI.

