1 Analyzing Patterns of User Content Generation in Online Social Networks Lei Guo, Yahoo! Enhua Tan, Ohio State University Songqing Chen, George Mason.

1 Analyzing Patterns of User Content Generation in Online Social Networks
Lei Guo, Yahoo!
Enhua Tan, Ohio State University
Songqing Chen, George Mason University
Xiaodong Zhang, Ohio State University
Yihong (Eric) Zhao, Yahoo!

2 Online social networks: platforms for social connections and content sharing
Networking-oriented OSNs
–Knowledge sharing mainly among friends
Knowledge-sharing-oriented OSNs
–Content sharing is among all users
[Figure labels: social connections; common interest topics; content network; user network]

3 UGC content in online social networks
User generated content (UGC)
–Users are basic elements of OSNs
–OSNs are driven by user contributions
Understanding UGC generation patterns is important
–Business success: attract new users and clients
–Identify and distinguish active users from spamming users
–Predict hot spots and the trends of topics in user communities
–Perform efficient resource management in the underlying supporting system
[Figure: users create new content; content attracts new users; advertisement]

4 Existing studies about user contributions in online social networks
Wikipedia
–Power law: core users contribute most articles (ISSI’05)
  Number of articles a user edited
  Number of co-authors of a Wiki article
  Heavy tailed, scale free: highly skewed towards top users
–User contribution shifts from “elite” users to common users (CHI’07)
  Log analysis from 2001 to 2006
  Power law or not: no conclusion
Delicious social bookmark (CHI’07)
–Similar shifts for user contribution as in Wikipedia
–Power law or not: no conclusion
[Figure: power-law rank distribution in log-log scale (log i vs. log y), a straight line with slope -α and a heavy tail. i: contribution rank of a user; y_i: contribution of the user]

5 Our study
UGC content in three large online social networks
–Blog, social bookmark, question answer
User posting over time
Distribution of user contributions
Implications of UGC generation patterns
Concluding remarks

6 UGC creation traffic overview
Weekly patterns
–Blog (article/photo): weekday and weekend posts are similar, consistent with daily web journaling
–Bookmark/Answer: fewer posts on weekends than on weekdays
Daily patterns
–Peak times are all 11:00 PM local time
–Bottom times are different for US and Asia: different cultures
[Figure panels: Blog article, Asia; Blog photo, Asia; Bookmark, US; Answer, US]

7 Dynamics of user joining and posting in OSNs
[Figure panels: Blog; Bookmark]
User join rate (new users per day)
–Increases with time
–Bursty in large time scales
User increase rate
–Decreases with time
–Bursty in large time scales
Post increase rate
–Decreases with time
–Less bursty than user increase rate
Implications
–Total user population and content do not increase exponentially
–User join bursts: post increase rate < user increase rate
–Bursts and dynamics need to be considered for data analysis

8 User activity over time
User’s posting frequency over time
–Measured by the age of the user in the OSN when a UGC object is posted
–Bookmark: almost uniform distribution
–Blog: a little skewed towards small ages
–Answer: more skewed towards small ages
User’s lifetime (active duration) in OSNs
–Previously assumed to follow an exponential distribution
–For user posting behavior
  Long-lifetime users
  Short-lifetime users
  Other users: a wide range of lifetimes
[Figure panels: author’s OSN age of posts; author’s OSN lifetime]

9 Outline
User generated content
User posting over time
Distribution of user contributions
Implications of UGC generation patterns
Concluding remarks

10 Original and non-original UGC content
Three kinds of UGC objects
–Original UGC objects
–Cut-and-paste objects
–Spam and advertisement
Spam: filtered out with an ML model
Cut-and-paste objects in Blog
–Posted by a small number of users
–No clear posting peak time
–Focused on recreation and social event categories
Spam users and cut-and-paste users are removed in our analysis
[Figure: cut-and-paste posts]

11 Stretched exponential distribution
User contribution in a social network follows the stretched exponential (SE) distribution
  y_i^c = -a log i + b
  i: rank of users (N users)
  y: number of objects created by the user
  c: stretch factor
Rank order distribution
–Fat head and thin tail in log-log scale
–Straight line in log x - y^c scale (SE scale), with slope -a and intercept b
[Figure: the same rank distribution plotted in SE scale (log i vs. y^c) and in log-log scale (log i vs. log y)]
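
A minimal Python sketch of how such a fit could be checked, assuming the rank distribution has the form y_i^c = -a log i + b (a straight line with slope -a and intercept b in the SE scale). The slides state that the parameters were estimated by maximum likelihood; this sketch swaps in a simpler technique, a grid search over c with an ordinary least-squares line fit scored by R^2. The function names, the grid of c values, and the synthetic data are illustrative assumptions, not taken from the presentation.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

def fit_stretched_exponential(contributions, c_grid=None):
    """Grid-search the stretch factor c that makes y^c most linear in log(rank).

    Not the maximum likelihood method named in the slides; a least-squares stand-in.
    Uses the natural logarithm, so a and b are expressed in that base.
    """
    ranked = sorted(contributions, reverse=True)
    log_rank = [math.log(i) for i in range(1, len(ranked) + 1)]
    if c_grid is None:
        c_grid = [k / 100 for k in range(5, 101)]  # hypothetical grid: 0.05 .. 1.00
    best = None
    for c in c_grid:
        powered = [y ** c for y in ranked]
        slope, intercept, r2 = linear_fit(log_rank, powered)
        if best is None or r2 > best["r2"]:
            best = {"c": c, "a": -slope, "b": intercept, "r2": r2}
    return best

# Synthetic, noise-free data generated from known (hypothetical) parameters.
TRUE_C, TRUE_A, N_USERS = 0.4, 1.0, 2000
TRUE_B = 1.0 + TRUE_A * math.log(N_USERS)  # last-ranked user contributes about 1
synthetic = [(TRUE_B - TRUE_A * math.log(i)) ** (1 / TRUE_C) for i in range(1, N_USERS + 1)]

fit = fit_stretched_exponential(synthetic)
print(fit)
```

On real data the contribution counts are integers and noisy, so the recovered c depends on the grid resolution and on how the head and tail of the ranking are weighted; the sketch only illustrates the "straight line in SE scale" idea.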

12 UGC creation patterns of Blog
[Figure panels: article; photo]
x: contribution rank of user (log scale on the x axis)
y: number of original posts by the user
Left y axis: y^c (powered) scale; right y axis: log scale
Fit shown: c = 0.418, R^2 = 0.997
Parameters: maximum likelihood method
R^2: coefficient of determination (1 means a perfect fit)
In log scale the curve shows a fat head and a thin tail

13 UGC creation patterns of Bookmark
Bookmark imports: bookmarks imported from the user’s Web browser when joining the system
Bookmark posts: bookmarks posted to the system by the bookmark plug-in of the Web browser
[Figure panels: Bookmark (imports); Bookmark (all posts). x: contribution rank of user (log scale); y: number of bookmark posts by the user, in powered (y^c) scale and in log scale; fat head and thin tail in log scale]

14 UGC creation patterns of Answer
Best answer: the asker can select a best answer from all received answers. Best answers are high-quality UGC posts since they are judged by the askers themselves.
[Figure panels: Answer (all posts); Answer (best). x: contribution rank of user (log scale); y: number of answer posts by the user, in powered (y^c) scale and in log scale; fat head and thin tail in log scale]

15 Model validation
Chi-square test
Validation on users who joined the system simultaneously
–User join rate increases with time
–Some users may become inactive
Validation on different parts of workloads
–Follow the SE distribution with the same c
–Parameter c is the shape factor and does not change for different parts of a workload

Chi-square test results (α = 0.05)
Data set               k    χ²       χ²(α, k-c)   Result
Blog article           11   11.403   14.067       pass
Blog photo             12   14.072   15.507       pass
Bookmark (all posts)   10   11.486   12.592       pass
Bookmark (imports)     11   9.367    14.067       pass
Answer (all posts)     11   13.340   14.067       pass
Answer (best ans)      10   7.001    12.592       pass
k: number of bins, O_i: total observed posts, E_i: expected number of posts
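
The pass/fail logic of the chi-square test can be sketched in Python as follows. The statistic is the standard sum over bins of (O_i - E_i)^2 / E_i, and a data set passes when the statistic is below the critical value. The table rows are copied from the slide; the bin counts at the end are made-up numbers for illustration only, and the function names are assumptions. As a side observation, the critical values on the slide match the standard α = 0.05 chi-square critical values for k - 4 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def passes(statistic, critical_value):
    """The fit is not rejected when the statistic is below the critical value."""
    return statistic < critical_value

# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_005 = {6: 12.592, 7: 14.067, 8: 15.507}

# Rows from the slide: (data set, k bins, chi-square statistic, critical value).
TABLE = [
    ("Blog article", 11, 11.403, 14.067),
    ("Blog photo", 12, 14.072, 15.507),
    ("Bookmark (all posts)", 10, 11.486, 12.592),
    ("Bookmark (imports)", 11, 9.367, 14.067),
    ("Answer (all posts)", 11, 13.340, 14.067),
    ("Answer (best ans)", 10, 7.001, 12.592),
]

results = {name: passes(stat, crit) for name, k, stat, crit in TABLE}
for name, ok in results.items():
    print(f"{name}: {'pass' if ok else 'fail'}")

# Hypothetical bin counts, only to show the statistic being computed.
hypothetical_observed = [52, 48, 95, 105]
hypothetical_expected = [50, 50, 100, 100]
hypothetical_stat = chi_square_statistic(hypothetical_observed, hypothetical_expected)
print(hypothetical_stat)
```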

16 Outline
User generated content
User posting over time
Distribution of user contributions
Implications of UGC generation patterns
Conclusion and future work

17 The “80-20” rule
80-20 rule of power law distributions
–Pareto principle: 20% of people own 80% of social wealth
–Internet systems: 20% of web pages account for 80% of requests
–…
In social networks
–Blog: 20% of users account for 80% of posts
–Bookmark: 17% of users account for 83% of posts
–Answer: 13% of users account for 87% of posts
Roughly follows the 80-20 rule
What is the difference between the user contribution distribution in online social networks and the user income distribution in a real society?
User contribution is stretched exponential
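
The percentage pairs on this slide each sum to 100 (20/80, 17/83, 13/87), which suggests they report the point where the top p% of users account for (100 - p)% of posts. A small Python sketch under that assumed reading is shown here; the function names and the post counts are hypothetical.

```python
def top_share(contributions, fraction):
    """Share of all posts contributed by the top `fraction` of users."""
    ranked = sorted(contributions, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

def balance_point(contributions):
    """Smallest user fraction k/n whose post share is at least 1 - k/n.

    Returns (user_fraction, post_share).
    """
    ranked = sorted(contributions, reverse=True)
    n, total = len(ranked), sum(ranked)
    running = 0
    for k, y in enumerate(ranked, start=1):
        running += y
        if running / total >= 1 - k / n:
            return k / n, running / total

# Hypothetical post counts for ten users (100 posts in total).
posts = [40, 25, 15, 8, 5, 3, 2, 1, 1, 0]

share_top_20 = top_share(posts, 0.2)        # top 2 users: 65 of 100 posts
user_frac, post_share = balance_point(posts)  # top 3 users hold 80 of 100 posts
print(share_top_20, user_frac, post_share)
```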

18 Asymptotic properties of top users
[Figure panels, both in log-log scale (log i vs. log y)]
–Power law: highly skewed towards top users
–Stretched exponential: less skewed towards top users
Contributions of top users
–The cumulative contribution ratio of top-k users among all n users in an OSN
A small number of top users cannot dominate the content in an OSN
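
To make the contrast concrete, here is a hedged Python sketch that takes the cumulative contribution ratio to be the sum of the top-k contributions divided by the sum of all n contributions, as the slide describes it in words. It compares a synthetic power-law ranking with a synthetic stretched exponential ranking. The exponent, the SE parameters, and n are hypothetical values picked for illustration, so the printed numbers say nothing about the measured data sets.

```python
import math

def top_k_ratio(ranked, k):
    """Cumulative contribution ratio: sum of the top-k values over the sum of all."""
    return sum(ranked[:k]) / sum(ranked)

N = 100_000           # hypothetical number of users
ALPHA = 1.0           # hypothetical power-law exponent
C, A = 0.4, 1.0       # hypothetical SE stretch factor and slope
B = 1.0 + A * math.log(N)  # intercept chosen so the last-ranked user contributes about 1

# Both rankings are in descending order and end at a contribution of about 1.
power_law = [(N / i) ** ALPHA for i in range(1, N + 1)]
stretched = [(B - A * math.log(i)) ** (1 / C) for i in range(1, N + 1)]

for k in (10, 100, 1000):
    print(k, round(top_k_ratio(power_law, k), 4), round(top_k_ratio(stretched, k), 4))
```

With these particular parameters the top-k users hold a much smaller share under the stretched exponential ranking than under the power-law ranking, which is the qualitative point of the slide.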

19 The “core” users in social networks
Looking for a threshold to identify the most important users
Power law distribution: hard threshold
–By number or fraction of users
–By a predefined user contribution threshold
Stretched exponential distribution: general threshold for all systems
–Let the decrease rate of user contribution along rank equal the increase rate of user contribution rank, with X_0 = log k, Y_0 = y_k

Data set       y_k   k/n     cumsum   n
Blog article   47    14.8%   73.3%    348 K
Blog photo     209   7.7%    64.0%    269 K
Bookmark       248   8.3%    67.6%    1.7 M
Answer         287   4.7%    63.7%    10.3 M

20 Creation patterns of different types of UGC
Blog article
  type        c
  all posts   0.42
  > 1 KB      0.39
  > 2 KB      0.31
  with tags   0.30
–More effort, smaller c: longer articles need more effort to compose, adding tags needs extra effort
Blog photo
  c = 0.32
–More effort than a short blog: taking the photo, transferring, editing, uploading, writing a description, …
Bookmark
  type        c
  imports     0.33
  all posts   0.32
–No difference in effort
Answer
  type        c
  all posts   0.25
  best ans    0.19
–Higher quality, smaller c (more effort to compose)
Further annotations
–Power law! Higher quality and more effort than best answer would have much smaller c
–User participating effort is even smaller
Our conjecture: larger c, flatter user contribution distribution

21 Discussion: UGC production vs. UGC consumption
Internet media access patterns (PODC’08)
–Number of requests to a media object is stretched exponential for different kinds of media systems
Media request is content consumption
–Stretch factor increases with file length (duration a user views)
UGC creation is content production
–Stretch factor decreases with the effort to create a UGC object
UGC social networks rely on user contribution to attract traffic
–Relationship between UGC creation and consumption
–More general model for both UGC creation and consumption
  Understand the driving force of a social network
  Design effective participation mechanisms for social applications
  Provide efficient data management for underlying supporting systems

22 Outline
User generated content
User posting over time
Distribution of user contributions
Implications of UGC generation patterns
Concluding remarks

23 Conclusion
User activities and contributions are critical for knowledge-sharing social networks
We have analyzed three large OSNs, and found
–User lifetime in OSNs does not follow an exponential distribution
–User contribution distribution is stretched exponential
–Different types of UGC generation patterns can be modeled with different parameters in SE
User contribution model: distribution of individual user behaviors
–Building block to understand more complex social network phenomena
–Foundation to guide design, modeling and simulation of OSNs

24 Thank you!

