1 Analyzing Patterns of User Content Generation in Online Social Networks Lei Guo, Yahoo! Enhua Tan, Ohio State University Songqing Chen, George Mason.

Slides:



Advertisements
Similar presentations
On-line media tools for strategic communications purposes When using media tools for communication we try to use the latest technologies such us blogging,
Advertisements

Our Digital World Second Edition
A Systematic Study of the Mobile App Ecosystem Thanasis Petsas, Antonis Papadogiannakis, Evangelos P. Markatos Michalis PolychronakisThomas Karagiannis.
WEB 2.0. What we are speaking about… Transformation of WEB, the WEB 2.0 –New generation of websites… –Importance of Open Data… –Importance of Users… –Web.
Telligent Social Analytics Research & Tools Marc A. Smith Chief Social Scientist Telligent Systems.
Toyota InfoTechnology Center U.S.A, Inc. 1 Mixture Models of End-host Network Traffic John Mark Agosta, Jaideep Chandrashekar, Mark Crovella, Nina Taft.
Designing Online Communities: If We Build it, Will They Come? Yvonne Clark Instructional Designer Penn State University.
Asking Questions on the Internet
Directional triadic closure and edge deletion mechanism induce asymmetry in directed edge properties.
GlobeTraff A traffic workload generator for the performance evaluation of ICN architectures K.V. Katsaros, G. Xylomenos, G.C. Polyzos A.U.E.B. (presented.
More data on this topic available from:: Web 2.0 & Paid Content: A 5-Step Tour Anne Holland, President MarketingSherpa.
Web 2.0 Is the Future of Education. We're about to have the biggest discussion about education in decades, maybe longer.
The War Between Mice and Elephants Presented By Eric Wang Liang Guo and Ibrahim Matta Boston University ICNP
UNDERSTANDING VISIBLE AND LATENT INTERACTIONS IN ONLINE SOCIAL NETWORK Presented by: Nisha Ranga Under guidance of : Prof. Augustin Chaintreau.
Web as Graph – Empirical Studies The Structure and Dynamics of Networks.
1 Web Performance Modeling Chapter New Phenomena in the Internet and WWW Self-similarity - a self-similar process looks bursty across several time.
Traffic Characteristics and Communication Patterns in Blogosphere A brilliant and insightful analysis of the access methods of the blogosphere community.
1 Web 2.0 Presenter: Katie Cunningham Date: February 11, 2009 National Aeronautics and Space Administration
A Hierarchical Characterization of a Live Streaming Media Workload IEEE/ACM Trans. Networking, Feb Eveline Veloso, Virg í lio Almeida, Wagner Meira,
Internet Cache Pollution Attacks and Countermeasures Yan Gao, Leiwen Deng, Aleksandar Kuzmanovic, and Yan Chen Electrical Engineering and Computer Science.
 2008 Pearson Education, Inc. All rights reserved What Is Web 2.0?  Web 1.0 focused on a relatively small number of companies and advertisers.
Investigating Forms of Simulating Web Traffic Yixin Hua Eswin Anzueto Computer Science Department Worcester Polytechnic Institute Worcester, MA.
The Social Web: A laboratory for studying s ocial networks, tagging and beyond Kristina Lerman USC Information Sciences Institute.
Consumers on the Web: Identification of usage patterns Consumers on the Web: Identification of usage patterns by Nina Koiso-Kanttila
Creating Online Class Communities Jennifer Dorman Discovery Education
Top Objectives: 1.Increase web traffic and exposure 2.Become definitive authority on Coffee 3.Increase sales to coffee centric Food Service Operators 4.Engage.
Social Networking – The Ways and Means Rosey Broderick May 2011.
IBM Research, Network Server Systems Software © 2006 IBM Corporation Characterizing Instant Messaging Traffic in an Enterprise Network Lei Guo, the Ohio.
Add image. 3 “ Content is NOT king ” today 3 40 analog cable digital cable Internet 100 infinite broadcast Time Number of TV channels.
1 The Stretched Exponential Distribution of Internet Media Access Patterns Lei Yahoo! Inc. Enhua Ohio State University Songqing George.
By Ravi Shankar Dubasi Sivani Kavuri A Popularity-Based Prediction Model for Web Prefetching.
1 Chapters 9 Self-SimilarTraffic. Chapter 9 – Self-Similar Traffic 2 Introduction- Motivation Validity of the queuing models we have studied depends on.
SEO RAISERS is a group of young, dynamic and like-minded individuals who have an immense passion for technology. In the rapidly changing world of Internet.
Tag-based Social Interest Discovery
«Tag-based Social Interest Discovery» Proceedings of the 17th International World Wide Web Conference (WWW2008) Xin Li, Lei Guo, Yihong Zhao Yahoo! Inc.,
Golder and Huberman, 2006 Journal of Information Science Usage Patterns of Collaborative Tagging System.
1 Measurements, Analysis, and Modeling of BitTorrent-like Systems Lei Guo 1, Songqing Chen 2, Zhen Xiao 3, Enhua Tan 1, Xiaoning Ding 1, and Xiaodong Zhang.
Free Powerpoint Templates Page 1 Free Powerpoint Templates Influence and Correlation in Social Networks Azad University KurdistanSocial Network.
P.1Service Control Technologies for Peer-to-peer Traffic in Next Generation Networks Part2: An Approach of Passive Peer based Caching to Mitigate P2P Inter-domain.
1 Computing with Social Networks on the Web (2008 slide deck) Jennifer Golbeck University of Maryland, College Park Jim Hendler Rensselaer Polytechnic.
Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, Erel Uziel SIGIR ’ 10.
Models and Algorithms for Complex Networks Power laws and generative processes.
Poking Facebook: Characterization of OSN Applications Minas Gjoka, Michael Sirivianos, Athina Markopoulou, Xiaowei Yang University of California, Irvine.
1 Discovering Authorities in Question Answer Communities by Using Link Analysis Pawel Jurczyk, Eugene Agichtein (CIKM 2007)
To Blog or Not to Blog: Characterizing and Predicting Retention in Community Blogs Imrul Kayes 1, Xiang Zuo 1, Da Wang 2, Jacob Chakareski 3 1 University.
FITT Fostering Interregional Exchange in ICT Technology Transfer Communication & Collaboration Tools.
Social software YEFI P. TELAUMBANUA What is Social Software? It is a kind of an interactive tools handle mediated interactions between a pair or.
Feedback Effects between Similarity and Social Influence in Online Communities David Crandall, Dan Cosley, Daniel Huttenlocher, Jon Kleinberg, Siddharth.
1 Insights into Access Patterns of Internet Media Systems Lei Guo.
BEHAVIORAL TARGETING IN ON-LINE ADVERTISING: AN EMPIRICAL STUDY AUTHORS: JOANNA JAWORSKA MARCIN SYDOW IN DEFENSE: XILING SUN & ARINDAM PAUL.
WEB 2.0 PATTERNS Carolina Marin. Content  Introduction  The Participation-Collaboration Pattern  The Collaborative Tagging Pattern.
1 Why is P2P the Most Effective Way to Deliver Internet Media Content Xiaodong Zhang Ohio State University In collaborations with Lei Guo, Yahoo! Songqing.
+ User-induced Links in Collaborative Tagging Systems Ching-man Au Yeung, Nicholas Gibbins, Nigel Shadbolt CIKM’09 Speaker: Nonhlanhla Shongwe 18 January.
Lecture 5 Web 2.0 Teaser Instructor: Jie Yang Department of Computer Science University of Massachusetts Lowell Exploring the Internet, Fall 2011.
1 Friends and Neighbors on the Web Presentation for Web Information Retrieval Bruno Lepri.
10/1/2008 Hessie Jones The Art of Conversation Social Media: The Key to Effectively Attracting and Retaining Customers for Life.
Quality Is in the Eye of the Beholder: Meeting Users ’ Requirements for Internet Quality of Service Anna Bouch, Allan Kuchinsky, Nina Bhatti HP Labs Technical.
LIBRARY 2.0 Cleveland State University Library July 10, 2008.
1 Patterns of Cascading Behavior in Large Blog Graphs Jure Leskoves, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst SDM 2007 Date:2008/8/21.
FELICIAN UNIVERSITY Creating a Learning Community Using Knowledge Management and Social Media Dr. John Zanetich, Associate Professor Felician University.
Social Media & Social Networking 101 Canadian Society of Safety Engineering (CSSE)
Measuring User Influence in Twitter: The Million Follower Fallacy Meeyoung Cha Hamed Haddadi Fabricio Benevenuto Krishna P. Gummadi.
Ó 1998 Menascé & Almeida. All Rights Reserved.1 Part VIII Web Performance Modeling (Book, Chapter 10)
Does Internet media traffic really follow the Zipf-like distribution? Lei Guo 1, Enhua Tan 1, Songqing Chen 2, Zhen Xiao 3, and Xiaodong Zhang 1 1 Ohio.
The simultaneous evolution of author and paper networks
Measurements, Analysis, and Modeling of BitTorrent-like Systems
An IP-based multimedia traffic generator
User Joining Behavior in Online Forums
The Use of Social Media in Nursing: Pitfalls and Opportunities
A long written work by an expert, giving a broad overview of a topic, aimed at students. Textbook.
Presentation transcript:

1 Analyzing Patterns of User Content Generation in Online Social Networks Lei Guo, Yahoo! Enhua Tan, Ohio State University Songqing Chen, George Mason University Xiaodong Zhang, Ohio State University Yihong (Eric) Zhao, Yahoo!

2 Online social networks: platforms for social connections and content sharing social connections common interest topics Networking oriented OSNs –Knowledge-sharing mainly among friends Knowledge-sharing oriented OSNs –Content sharing is among all users Content network User network

3 UGC content in online social networks User generated content (UGC) –Users are basic elements of OSNs –OSNs are driven by user contributions Understanding UGC content generation patterns is important –Business success: attract new users and clients –Identify and distinguish active users from spamming users –Predict hot spots and the trends of topics in user communities –Perform efficient resource management in the underlying supporting system Users create new contents Contents attract new users advertisement

4 Existing studies about user contributions in online social networks Wikipedia –Power law: core users contribute most articles (ISSI’05) Number of articles a user edited Number of co-authors of a Wiki article Heavy tailed, scale free: highly skewed towards top users –User contribution shifts from “elite” users to common users (CHI’07) Log analysis from 2001 to 2006 Power law or not: no conclusion Delicious social bookmark (CHI’07) –Similar shifts for user contribution as in Wikipedia –Power law or not: no conclusion log i log y slope: -  i y heavy tail i : contribution rank of a user y i : contribution of the user

5 Our study UGC content in three large online social networks –Blog, social bookmark, question answer User posting over time Distribution of user contributions Implications of UGC generation patterns Concluding remarks User posting over time

6 UGC creation traffic overview Weekly patterns –Blog (article/photo): weekday and weekend posts are similar—daily web journaling –Bookmark/Answer: weekend posts are smaller than weekdays Daily patterns –Peak times are all 11:00 PM local time –Bottom times are different for US and Asia: different cultures Blog article, Asia Bookmark, USAnswer, US Blog photo, Asia

7 Dynamics of user joining and posting in OSNs BlogBookmark User join rate (new users per day) –increases with time –bursty in large time scales User increase rate –decrease with time –bursty in large time scales Post increase rate –decrease with time –less bursy than user increase rate Implications –total user population and content do not increase exponentially –User join bursts: post inc rate < user inc rate –Bursts and dynamics need to be considered for data analysis

8 User activity over time User’s posting frequency over time –The age of the user in OSN when an UGC object is posted –Bookmark: almost uniform distribution –Blog: a little skewed towards small ages –Answer: more skewed towards small ages User’s lifetime (active duration) in OSNs –Assumed exponential distribution before –For user posting behavior Long lifetime users Short lifetime users Other users: a wide range of lifetime Author’s OSN age of posts Author’s OSN lifetime

9 Outline User generated content User posting over time Distribution of user contributions Implications of UGC generation patterns Concluding remarks

10 Original and non-original UGC content Three kinds of UGC objects –Original UGC objects –Cut-and-paste objects –Spam and advertisement Spam: filtered out with ML model Cut-and-paste objects in Blog –Posted by a small number of users –No clear posting peak time –Focused on recreation and social event categories Spam users and cut-and-paste users are removed in our analysis Cut-and-paste posts

11 Stretched exponential distribution User contribution in a social network follows the stretched exponential distribution log i ycyc b slope: -a i : rank of users ( N users) y : number of objects created by the user Rank order distribution: fat head and thin tail in log-log scale straight line in log x - y c scale (SE scale) log i log y fat head thin tail c : stretch factor

12 UGC creation patterns of Blog article photo log scale in x axis powered scale y c log scale thin tail c = R 2 = Y left: y^c scale Y right: log scale Parameters: maximum likelihood method R 2 : coefficient of determination (1 means a perfect fit) x : contribution rank of user y : number of original posts by the user fat head

13 UGC creation patterns of Bookmark Bookmark imports: bookmarks imported from user’s Web browser when joining the system Bookmark posts: bookmarks posted to the system by the bookmark plug-in of web browser Bookmark (imports)Bookmark (all posts) x : contribution rank of user y : number of bookmark posts by the user log scale in x axis powered scale y c log scale thin tail fat head

14 UGC creation patterns of Answer Answer (all posts)Answer (best) x : contribution rank of user y : number of answer posts by the user log scale in x axis powered scale y c log scale thin tail fat head Best answer: the asker can select a best answer from all received answers. Best answers are high quality UGC posts since they are judged by the askers themselves.

15 Model validation Chi-square test Validation on users joined the system simultaneously –Users join rate increases with time –Some users may become inactive Validation on different parts of workloads –follow SE distribution with the same c –parameter c is the shape factor, not change for different parts of a workload Data set k 22  2 ( ,k-c) Result Blog article pass Blog photo pass Bookmark (all posts) pass Bookmark (imports) pass Answer (all posts) pass Answer (best ans) pass Chi-square test results (  = 0.05) k : number of bins, O i : total observed posts, E i : expected number of posts

16 Outline User generated content User posting over time Distribution of user contributions Implications of UGC generation patterns Conclusion and future work

17 The “80-20” rule rule of power law distributions –Pareto principle: 20% people own 80% social wealth –Internet systems: 20% web pages account for 80% requests –…–… In social networks –Blog: 20% users for 80% posts –Bookmark: 17% users for 83% posts –Answer: 13% users for 87% posts What is the difference between user contribution distribution in online social networks and user income distribution in a real society? Roughly follows the rule User contribution is stretched exponential

18 Asymptotical properties of top users log i log y Power law log i log y Stretched exponential Highly skewed towards top users Less skewed towards top users Contributions of top users The cumulative contribution ratio of top-k users among all n users in an OSN A small number of top users cannot dominate the content in an OSN

19 The “core” users in social networks Looking for a threshold to identify most important users Power law distribution: hard threshold –By number of or fraction of users –By a predefined user contribution threshold Stretched exponential distribution: general threshold for all systems Let decrease rate of user contribution along rank increase rate of user contribution rank ykyk k/nk/ncumsumn Blog article %73.3%348 K Blog photo %64.0%269 K Bookmark %67.6%1.7 M Answer %63.7%10.3 M X 0 = log k, Y 0 = y k :

20 Creation patterns of different types of UGC typec all posts0.42 > 1 KB0.39 > 2 KB0.31 with tags0.30 Blog article typec all posts0.25 best ans0.19 typec imports0.33 all posts0.32 Blog photoBookmarkAnswer c 0.32 more effort, smaller c longer articles need more effort to compose, adding tags needs extra effort higher quality, smaller c (more effort to compose) no difference in effort more effort than short blog taking photo, transferring, editing, uploading, writing desc, … Power law ! higher quality and more effort than best answer would have much smaller c user participating effort is even smaller Our conjecture: larger c, flatter user contribution distribution

21 Discussion: UGC production vs. UGC consumption Internet media access patterns (PODC’08) –Number of requests to an media object is stretched exponential for different kinds of media systems Media request is content consumption –Stretch factor increases with file length (duration a user views) UGC creation is content production –Stretch factor decreases with the effort to create a UGC object UGC social networks rely on user contribution to attract traffic –Relationship between UGC creation and consumption –More general model for both UGC creation and consumption Understand the driving force of a social network Design effective participation mechanisms for social applications Provide efficient data management for underlying supporting systems

22 Outline User generated content User posting over time Distribution of user contributions Implications of UGC generation patterns Concluding remarks

23 Conclusion User activities and contributions are critical for knowledge-sharing social networks We have analyzed three large OSNs, and found –User lifetime in OSNs does not follow exponential distribution –User contribution distribution is stretched exponential –Different types of UGC content generation patterns can be modeled with different parameters in SE User contribution model: distribution of individual user behaviors –Building block to understand more complex social network phenomena –Foundation to guide design, modeling and simulation of OSNs

24 Thank you!