Stochastic Models of User-Contributory Web Sites

Slides:



Advertisements
Similar presentations
Learning more about Facebook and Twitter. Introduction  What we’ve covered in the Social Media webinar series so far  Agenda for this call Facebook.
Advertisements

Traffic-driven model of the World-Wide-Web Graph A. Barrat, LPT, Orsay, France M. Barthélemy, CEA, France A. Vespignani, LPT, Orsay, France.
Empirical Investigations of WWW Surfing Paths Jim Pitkow User Interface Research Xerox Palo Alto Research Center.
Masoud Valafar †, Reza Rejaie †, Walter Willinger ‡ † University of Oregon ‡ AT&T Labs-Research WOSN’09 Barcelona, Spain Beyond Friendship Graphs: A Study.
Tagging Systems Austin Wester. Tags A keywords linked to a resource (image, video, web page, blog, etc) by users without using a controlled vocabulary.
Tagging Systems Mustafa Kilavuz. Tags A tag is a keyword added to an internet resource (web page, image, video) by users without relying on a controlled.
CS246 Search Engine Bias. Junghoo "John" Cho (UCLA Computer Science)2 Motivation “If you are not indexed by Google, you do not exist on the Web” --- news.com.
Simulation Models as a Research Method Professor Alexander Settles.
The Social Web: A laboratory for studying s ocial networks, tagging and beyond Kristina Lerman USC Information Sciences Institute.
A Search-based Method for Forecasting Ad Impression in Contextual Advertising Defense.
Add image. 3 “ Content is NOT king ” today 3 40 analog cable digital cable Internet 100 infinite broadcast Time Number of TV channels.
Mining Cross-network Association for YouTube Video Promotion Ming Yan Institute of Automation, C hinese Academy of Sciences May 15, 2014.
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
Sharing Online Resources Social Bookmarking. Ambition in Action Facilitators Stephan Ridgway, Workforce Development.
The Collaborative Organization of Knowledge D. Spinellis and P. Louridas Strong Regularities in Online Peer Production D. Wilkinson Ziyad Aljarboua Monday,
Kristina Lerman Aram Galstyan USC Information Sciences Institute Analysis of Social Voting Patterns on Digg.
An Introduction to the Powerful Social Network and What it Means for Your Business.
Annealing Paths for the Evaluation of Topic Models James Foulds Padhraic Smyth Department of Computer Science University of California, Irvine* *James.
Understanding Cross-site Linking in Online Social Networks Yang Chen 1, Chenfan Zhuang 2, Qiang Cao 1, Pan Hui 3 1 Duke University 2 Tsinghua University.
1 Discovering Authorities in Question Answer Communities by Using Link Analysis Pawel Jurczyk, Eugene Agichtein (CIKM 2007)
Implicit User Feedback Hongning Wang Explicit relevance feedback 2 Updated query Feedback Judgments: d 1 + d 2 - d 3 + … d k -... Query User judgment.
Using a Model of Social Dynamics to Predict Popularity of News Kristina Lerman Tad Hogg USC Information Sciences Institute HP Labs WWW 2010.
The Tube Over Time: Characterizing Popularity Growth of YouTube Videos ` Abstract In this work, we characterize the growth patterns of video popularity.
Qi Guo Emory University Ryen White, Susan Dumais, Jue Wang, Blake Anderson Microsoft Presented by Tetsuya Sakai, Microsoft Research.
Dynamics of Collaborative Document Rating Systems (SNA-KDD’07) Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
A measurement-driven Analysis of Information Propagation in the Flickr Social Network Meeyoung Cha Alan Mislove Krisnna P. Gummadi.
ASSIST: Adaptive Social Support for Information Space Traversal Jill Freyne and Rosta Farzan.
Social Information Processing March 26-28, 2008 AAAI Spring Symposium Stanford University
Navigation Aided Retrieval Shashank Pandit & Christopher Olston Carnegie Mellon & Yahoo.
A Connectivity-Based Popularity Prediction Approach for Social Networks Huangmao Quan, Ana Milicic, Slobodan Vucetic, and Jie Wu Department of Computer.
Using a model of Social Dynamics to Predict Popularity of News Kristina Lerman, Tad Hogg USC Information Sciences Institute, Institute for Molecular Manufacturing.
Announcements Topics: -Introduction to (review of) Differential Equations (Chapter 6) -Euler’s Method for Solving DEs (introduced in 6.1) -Analysis of.
 DM-Group Meeting Liangzhe Chen, Oct Papers to be present  RSC: Mining and Modeling Temporal Activity in Social Media  KDD’15  A. F. Costa,
Cmpe 588- Modeling of Internet Emergence of Scale-Free Network with Chaotic Units Pulin Gong, Cees van Leeuwen by Oya Ünlü Instructor: Haluk Bingöl.
Characteristics of Information on the Web Dania Bilal IS 530 Spring 2005.
How to Maximize the Reaches with The Right Blog Topics.
Machine Learning: Ensemble Methods
Multiple Regression.
Learning Profiles from User Interactions
Random Walk for Similarity Testing in Complex Networks
Clickprints on the Web: Are there Signatures in Web Browsing Data?
Delicious Social Bookmarking
Manuel Gomez Rodriguez
Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers
The Power of Networks Six Principles That Connect Our Lives
Travel Time Perception and Learning in Traffic Networks
Science Behind Cross-device Conversion Tracking
What Stops Social Epidemics?
Discover How Your Business Can Benefit from a Facebook Fanpage
Discover How Your Business Can Benefit from a Facebook Fanpage
User Joining Behavior in Online Forums
Personalized Social Image Recommendation
Center for Complexity in Business, R. Smith School of Business
On the Scale and Performance of Cooperative Web Proxy Caching
The Knowledge Center.
Using Google Plus Skills: Use Google Plus
Effective Social Network Quarantine with Minimal Isolation Costs
Multiple Regression.
A New Product Growth Model for Consumer Durables
Do Social Games Help Us Get Socialized
Ensembles.
Model Combination.
Social Software: Wikis
Low-fi Prototyping & Pilot Usability Testing
Chapter 11 Variable Selection Procedures
Rachael Bedford Mplus: Longitudinal Analysis Workshop 23/06/2015
“The Spread of Physical Activity Through Social Networks”
Social Media Google+ Marketing.
GhostLink: Latent Network Inference for Influence-aware Recommendation
Using Instagram as a Marketing Tool
Presentation transcript:

Stochastic Models of User-Contributory Web Sites Tad Hogg HP Labs Kristina Lerman USC Information Sciences Institute

The Social Web Bugzilla essembly delicious “wisdom of crowds”

Activities View existing content Rate existing content Add new content simple: vote complex: write a review Add new content Link to other users focus of this presentation

Aggregate group behavior Determines structure and usefulness of user-participatory sites Models enable Predicting trends or behaviors E.g., which newly contributed content will become popular Designing web sites E.g., productive information displays Altering user incentives E.g., improve content quality or participation

Stochastic Modeling summary Start with individual user behavior Specify states and transitions between states Determine collective behavior Aggregate behavior of interest Individual user behaviors create transitions among aggregate states Rate equations give dynamics How average collective behavior changes in time How collective behavior depends on user characteristics

Illustration – Stochastic Model of Digg Phenomenology of Digg Users submit and vote on news stories Digg promotes popular stories to front page Digg allows social networking Users can designate Friends and view their friends’ activity on Digg Directed social network Friends of user A are everyone A is watching Fans of A are all users who are watching A Alice’s friend Alice Bob Bob’s fan

Lifecycle of a story User submits a story to the Upcoming Stories queue Others vote on (digg) the story If story accumulates enough votes in short time, it is promoted to the Front page The Friends Interface lets users see Stories friends submitted Stories friends voted on, …

Model of Digg voting behavior Stochastic model based on Digg user interface visibility and interestingness  votes Extension to prior model: [Lerman 2007] “law of surfing” for viewing web pages [Huberman et al, 1998] instead of geometric distribution incremental average growth in number of voters’ fans i.e., people who can see story via friends interface Related work: aggregate phenomenological models behavior for Digg, Wikipedia, YouTube, …. e.g., [Wu & Huberman 2007; Crane & Sornette 2008; Wilkinson 2008]

Voting on stories combination of visibility: does user see the story? user interface browse recommended by friends search interest: does user like the story? novelty, … user comes to Digg see the story? vote on the story? yes

Story location Digg shows stories as lists most recent first 15 stories per page user must click to view subsequent pages visibility decreases with distance from top of list A given story moves down the list as new stories added eventually moves to later pages switches from upcoming to top of front page if promoted

User behavioral model interest visibility upcoming1 upcomingq … r c n Ø front1 … frontp vote wS r friends Story specific parameters r ‘interestingness’ – prob. story will receive a vote if seen S number of submitter’s fans General parameters n rate users visit Digg c fraction of users viewing upcoming pages w rate fans visit Digg

Dynamical model of aggregate behavior How number of votes Nvote(t) for a story changes nf - rate users find story on the front page queue nu - rate users find story on the upcoming stories queue nfriends - rate users find story through the friends interface r – fraction of users who see the story choose to vote for it visibility

Estimating model parameters Need model parameters for Story visibility Story interestingness Estimate from behavior of sample of users

Digg data set Stories from front and upcoming pages number of votes vs. time since submission for several days in May 2006 prior to availability of Digg API sampled more extensively from front than upcoming pages Number of fans for active users 2152 stories with at least 4 observations submitted by 1212 distinct users 510 of these stories promoted to front page

Story visibility User viewing behavior not available: which stories users look at how they find stories front page, friends interface, … Estimate indirectly from models & data

Modeling story visibility Story location Navigating web sites Number of fans

Story location vs. time in each list For upcoming and front page lists: location on page (1 to 15), which page (1st, 2nd, …) distance from top of list increases linearly with time Rate story position increases: front page: ~0.2 pages/hr upcoming: ~4 pages/hr 1/15th the rates new stories are promoted to front page (~3/hr) submitted as new stories (~60/hr) since each page holds 15 stories Averages over hourly variation [Szabo & Huberman 2008] examples front page p(t) upcoming q(t)

Story location: promotion to front page Digg promotion decision algorithm not public based on popularity expressed by user votes Approximation from data: story promoted if at least 40 votes within 24 hours of submission

Modeling story visibility Story location Navigating web sites Number of fans

Navigating through a web site Empirical model of user following links on a Web site “law of surfing” [Huberman et al. 1998] Inverse Gaussian distribution of #pages viewed before leaving web site few users go beyond 1st page parameters estimated from Digg data & model

Modeling story visibility Story location Navigating web sites Number of fans: visibility via friends interface

Story visibility via friends interface Each voter enables their fans to see story via friends interface Model of number of fans not yet viewing story, s(t) based on number of votes on the story story visible to submitter’s fans at submission time: s(0) fans of prior voters visit Digg new fans from new votes

Story interestingness Reasons users vote for story not available, e.g., topic novelty [Wu & Huberman 2007] popularity (determining interest, not just visibility) e.g., “cool” fashion or gadgets … One approach: web-based experiments e.g., [Salganik et al. 2006] Estimate from models & data from vote history after accounting for visibility

Model results

Solutions: votes vs. time model vs. observations for 6 stories S r Final votes 5 0.51 2229 0.44 1921 40 0.32 1297 0.28 1039 160 0.19 740 100 0.13 458 model captures qualitative features slow growth initially influence of fans on promotion rapid growth if story promoted (much more visible to users)

Model: requirements for promotion Values of S and r to get the story on front page number of votes number of fans not yet seeing story promotion time 40-vote promotion threshold parameters for plot at right are S=50, r=0.15

Promotion to front page: model prediction vs. data: 95% accurate promotion threshold from model logarithmic scale most stories not promoted, and from people with no fans

Additional model insights Heterogeneity users activity content quality (“interestingness”) Predictability from early reactions to new story

Story interestingness Long-tail distribution (lognormal) a few stories much more interesting than average after accounting for visibility via user interface part of model Open question: why? A multiplicative process underlying user interests? lognormal fit quantile-quantile plot shows good fit good fit with Kolmogorov-Smirnov test distribution of estimated interestingness values

Predictions from early behavior Estimate story interestingness from full history, or using initial votes Behavior predictable from early reaction to story also with YouTube e.g., [Crane & Sornette 2008; Lerman & Galstyan 2008; Szabo & Huberman 2008] example: use first 4 observations r estimates correlate 0.9 with those based on full history prediction of final votes account for 75% of variance rms prediction error: 244 votes

Model based on votes only? Estimate based on initial votes only not including visibility model i.e., ignore effects of ‘law of surfing’ and social network user comes to Digg see the story? vote on the story? yes

Model based on votes only? Full model Votes only variance accounted for 75% 56% rms prediction error 244 327 full model is better than not including visibility (differences significant, p-value <10-4)

Future work on models of activities: new content & links View existing content Rate existing content Add new content What motivates high-quality contribution? Link to other users How do users chose who to link to? What does link signify? common interests? trust in recommendations? focus of this presentation

Conclusion Stochastic process approach Applicability: connect user and system behaviors Applicability: users have limited information and actions limited use of personalized history e.g., user communities on the web not face-to-face small group interactions Example: news aggregator Digg votes from visibility + interestingness user model from info and actions provided by Digg UI