1
Flash Floods and Ripples: The Spread of Media Content through the Blogosphere
Meeyoung Cha, Juan A. Navarro, Hamed Haddadi
Max Planck Institute for Software Systems (MPI-SWS); TU Berlin Deutsche Telekom Lab
ICWSM Data Challenge 2009
2
Motivation
Blogs play a significant role in today’s Internet culture
Blogs are used to propagate information
- Discuss political issues
- Review new products and online content
- Form communities and special interest groups
Increasingly, media content is shared through blogs
- How does content spread in blogs?
- What kinds of content are shared?
3
Our goal
Characterize how the structure of the blogosphere influences the patterns of content spreading
1. Understand the structure of the blogosphere
- Is the structure ideal for content dissemination?
2. Understand the spreading patterns of content
- What types of content spread?
- How quickly does content spread?
4
Part 1. Measurement methodology
Part 2. Analysis of network properties
Part 3. Analysis of spreading patterns
5
Spinn3r dataset
- Extracted post URL, site, host, language, timestamps, etc.
- Step 1: Focus on the top 15 blog domains
- Step 2: Scrape content to find embedded HTML links
- Code available at [link missing]
Limitations
- Comments and blogrolls missing
- Some blogs only post summaries
- Only used the dataset with numbered ‘tiers’
6
Step 1: Top 15 blog sites
         # blogs      # posts      posts/blog   Language
         390,812      1,217,757    3.1          English
         321,730      1,161,103    3.6          Chinese
         254,225      1,666,165    6.6
         72,376       1,127,383    15.6         Japanese
         66,598       2,120,474    31.8
         …
Total    1,196,412    8,794,983    7.4          English
7
Step 2: Extracting HTML links
- Links to media content
- Links to other blogs
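The link extraction step can be sketched roughly as follows. This is a minimal illustration, not the authors' scraper: the tag list, the helper names (`LinkCollector`, `match_domain`, `classify_links`), and the example domain `blog-a.example` are all assumptions made for the sketch. It collects `href` and `src` attributes from a post body and sorts each URL into "media", "blog", or "other" by matching its host against known domain lists.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from link-like tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Assumed tag list; a real scraper may need more (e.g. <object>).
        if tag in ("a", "embed", "iframe", "img"):
            for name, value in attrs:
                if name in ("href", "src") and value:
                    self.links.append(value)


def match_domain(url, domains):
    """Return the entry of `domains` that the URL's host belongs to, or None."""
    host = (urlparse(url).hostname or "").lower()
    for domain in domains:
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def classify_links(html, blog_domains, media_domains):
    """Split the links found in `html` into media, blog, and other."""
    parser = LinkCollector()
    parser.feed(html)
    result = {"media": [], "blog": [], "other": []}
    for url in parser.links:
        if match_domain(url, media_domains):
            result["media"].append(url)
        elif match_domain(url, blog_domains):
            result["blog"].append(url)
        else:
            result["other"].append(url)
    return result


# Hypothetical post body and domain lists, for illustration only.
sample_html = (
    '<p><a href="http://alice.blog-a.example/post/1">a post</a>'
    '<embed src="http://www.youtube.com/v/abc"></embed>'
    '<a href="/relative">local</a></p>'
)
sample = classify_links(sample_html, {"blog-a.example"}, {"youtube.com"})
```

Matching on a host suffix rather than on the last two labels avoids misreading hosts such as `bbc.co.uk`.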
8
Part 1. Measurement methodology
Part 2. Analysis of network properties
Part 3. Analysis of spreading patterns
9
Network of blogs
Directed network of 85,013 nodes and 129,079 edges
[Figure: example link between blog A and blog B]
10
Network structure
- Average node degree: 1.5
- Power-law degree distribution
- 6% of links are reciprocal
- 35% of links cross blog domains
- 7% of links cross language boundaries
- 73% of blogs are in the largest connected component
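Statistics of this kind can be computed from a plain edge list. The sketch below is one possible way to do it, under assumptions not stated on the slide: average degree is taken as edges divided by nodes, reciprocity as the share of directed links whose reverse link also exists, and the function names and the `attribute` mapping (blog to language or domain) are hypothetical.

```python
def network_stats(edges):
    """Basic statistics for a directed graph given as (source, target) pairs."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop self-links and duplicates
    if not edge_set:
        raise ValueError("need at least one edge")
    nodes = {node for edge in edge_set for node in edge}
    reciprocal = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return {
        "nodes": len(nodes),
        "edges": len(edge_set),
        "avg_degree": len(edge_set) / len(nodes),    # assumed definition: edges per node
        "reciprocity": reciprocal / len(edge_set),   # share of links that are returned
    }


def cross_fraction(edges, attribute):
    """Share of links whose endpoints differ in `attribute` (e.g. language)."""
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        raise ValueError("need at least one edge")
    crossing = sum(1 for (u, v) in edge_set if attribute[u] != attribute[v])
    return crossing / len(edge_set)


# Toy graph, for illustration only.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "c")]
toy_lang = {"a": "en", "b": "en", "c": "ja", "d": "ja"}
stats = network_stats(toy_edges)
cross_lang = cross_fraction(toy_edges, toy_lang)
```

With the slide's totals, 129,079 edges over 85,013 nodes gives about 1.5 edges per node, which matches the reported average degree under this definition.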
11
Network structure is more sparse than social networks
Density = ratio of observed links to all possible links
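As a worked example of the density definition, the sketch below applies it to the node and edge counts reported earlier in the deck. It assumes a directed graph without self-links, so the number of possible links is n(n - 1); the slide does not spell out which convention was used, and the function name is hypothetical.

```python
def directed_density(num_nodes, num_edges):
    """Observed links divided by all possible directed links (no self-links)."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    possible_links = num_nodes * (num_nodes - 1)
    return num_edges / possible_links


# Counts taken from the "Network of blogs" slide.
blog_density = directed_density(85_013, 129_079)
print(f"density = {blog_density:.2e}")
```

The result is roughly 1.8e-5, that is, about 18 out of every million possible links are present.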
12
Insights for information propagation
Sparse structure & power-law degree distribution
- Bloggers show a clear preference for particular topics or sources
- Trend setters (high in-degree) and recommenders (high out-degree)
Potential factors that can limit spreading
- Blog domains had no visible effect on linking
- Language barriers inhibit the flow of information
13
Part 1. Measurement methodology
Part 2. Analysis of network properties
Part 3. Analysis of spreading patterns
14
Spreading of media content
- What types of content are shared?
- How quickly does information spread?
[Figure: media content]
15
Types of content shared
Popular sharing of user-generated content
Rank   Website           # posts
1      youtube.com       206,803
2      photobucket.com   140,194
3      flickr.com        135,327
4      imageshack.us     41,997
5      amazon.com        36,379
6      nytimes.com       33,801
7      twitter.com       30,572
8      technorati.com    27,583
9      tinypic.com       23,899
10     bbc.co.uk         20,893
16
Popularity of YouTube videos
Video popularity follows a power-law distribution:
- Very large diffusion processes exist
- Preferential attachment may drive linking
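One common way to quantify a power-law claim like this is to estimate the exponent from the per-video link counts. The slides do not say how the authors fitted the distribution, so the sketch below swaps in the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)); the function name and the choice of `xmin` are assumptions, and for integer counts the estimate is only approximate.

```python
import math


def powerlaw_alpha(values, xmin=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only values >= xmin are used; xmin must be chosen by the caller.
    """
    if xmin <= 0:
        raise ValueError("xmin must be positive")
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one value strictly above xmin")
    return 1.0 + len(tail) / log_sum


# Hypothetical link counts per video, for illustration only.
example_counts = [1, 1, 1, 2, 2, 3, 5, 8, 21, 79]
alpha_hat = powerlaw_alpha(example_counts, xmin=1.0)
```

A heavy tail shows up as a small exponent: the slower the counts fall off, the closer the estimate gets to 1.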
17
Popular video categories
We downloaded metadata of the top 10,000 videos
Category               % of links   % of videos
Music                  27.2         19.6
Taken down             19.9         22.3
News & Politics        18.4         23.5
Comedy                 9.7          8.9
Entertainment          8.0          8.8
Film & Animation       3.8          4.7
People & Blogs         2.5          2.9
Science & Technology   2.4          1.5
Pets & Animals         1.7          1.2
Education              1.4          0.9
Callouts on the slide: "Music most popular", "Still spread!", "Keen on politics"
18
Time lag in the spread of videos
[Figure: median video age, with values of 2 days, 72 days, 125 days, and 357 days; labels: "Flash floods" and "Ripples"]
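A median video age per category could be computed along the following lines. This is a sketch under an assumption the slide does not confirm: "video age" is taken to be the time between a video's upload and a blog post linking to it. The record layout, the function name, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_age_by_category(links):
    """Median age in days at link time, grouped by video category.

    `links` is an iterable of (category, uploaded_at, linked_at) tuples.
    """
    ages = defaultdict(list)
    for category, uploaded_at, linked_at in links:
        age_days = (linked_at - uploaded_at).total_seconds() / 86400
        if age_days >= 0:  # skip records with inconsistent timestamps
            ages[category].append(age_days)
    return {category: median(values) for category, values in ages.items()}


# Made-up records, for illustration only.
sample_links = [
    ("News", datetime(2008, 1, 1), datetime(2008, 1, 3)),
    ("News", datetime(2008, 1, 1), datetime(2008, 1, 2)),
    ("News", datetime(2008, 1, 1), datetime(2008, 1, 5)),
    ("Music", datetime(2007, 1, 1), datetime(2008, 1, 1)),
]
median_ages = median_age_by_category(sample_links)
```

The median is used rather than the mean so that a few links made years after upload do not dominate a category's lag.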
19
Example spreading pattern
Blogs linking the same video are connected = diffusion through the blogosphere
Example: McCain’s political campaign, linked by 79 blogs
[Figure: spreading graph for the example video; legend label: "Other"]
20
Insights from spreading patterns
Videos in different genres spread with very different patterns
- Flash floods: found quickly and spread rapidly
- Ripples: took longer to spread, re-discovered years after upload
Diffusion through links in the blogosphere
- 24% of videos had any spreading in the blog graph
- Other spreading factors: featuring and search
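Whether a video "spread in the blog graph" can be checked, for example, by looking for a blog that posted the video after a blog it links to had already posted it. The sketch below implements that test; the criterion itself, the function name, and the data layout are assumptions for illustration, not the authors' stated definition.

```python
def spread_via_graph(posts, edges):
    """Blogs that posted a video after a blog they link to had posted it.

    `posts` is a list of (blog, timestamp) pairs for one video.
    `edges` is an iterable of (source, target) pairs: source links to target.
    """
    first_post = {}
    for blog, timestamp in posts:
        if blog not in first_post or timestamp < first_post[blog]:
            first_post[blog] = timestamp
    followers = set()
    for source, target in edges:
        if source in first_post and target in first_post:
            if first_post[source] > first_post[target]:
                followers.add(source)
    return followers


# Toy data, for illustration only: timestamps are plain integers.
video_posts = [("a", 1), ("b", 2), ("c", 3), ("b", 5)]
blog_edges = {("b", "a"), ("c", "x"), ("a", "c")}
followers = spread_via_graph(video_posts, blog_edges)
had_graph_spread = bool(followers)
```

Here blog `b` counts as a follower because it links to `a` and posted later, while `c` posted the same video with no link to an earlier poster, which is consistent with other discovery routes such as featuring and search.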
21
Part 1. Measurement methodology
Part 2. Analysis of network properties
Part 3. Analysis of spreading patterns
22
Conclusion
- Identified spreading patterns and factors that limit spreading
- Blogs serve as a medium to filter and spread media content
- Potential implication: recommendation systems can take into account and exploit different spreading patterns
- Future work: spreading patterns of other types of content