A Measurement-driven Analysis of Information Propagation in the Flickr Social Network Offense April 29
Outline 1、Measurement Methodology 2、Network Topology and Picture Popularity 3、Topology Distribution of Picture Popularity 4、Temporal Evolution of Picture Popularity 5、Information Propagation via Social Links 6、Some other issues
Measurement Methodology Data Set 2.5 million users? One person can register two, three, or more accounts to get a larger upload quota (limited to 100 MB per month) and to show more pictures (a personal page displays only the latest 200 pictures). The paper didn't investigate data filtering before analyzing the data.
Measurement Methodology The paper starts from a randomly selected user and crawls with the "snowball" method, so the dataset is biased. It covers only 25% of the users, so the dataset is not enough to represent the whole Flickr network. The basis of the research, especially the seed users and the Flickr social network the paper captured, is biased data. So, can we be confident in the results?
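The coverage concern can be illustrated with a minimal sketch of a snowball crawl. It is not the paper's crawler: the toy graph, the user names, and the function `snowball_crawl` are hypothetical, and the crawl is assumed to follow only outgoing friend links from a single seed.

```python
from collections import deque


def snowball_crawl(graph, seed):
    """Breadth-first crawl that follows outgoing friend links from one seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen


# Hypothetical toy graph: user -> users they list as friends.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["d"],
    "d": [],
    "e": ["a"],  # e links to a, but nobody links to e
    "f": [],     # isolated user with no links at all
}

crawled = snowball_crawl(toy_graph, "a")
coverage = len(crawled) / len(toy_graph)
print(sorted(crawled), round(coverage, 2))
```

In this toy case the crawl never reaches `e` or `f`: users whom nobody in the crawled set links to are invisible to the crawl, whatever seed inside that set is chosen. Such users are likely to differ from the well-connected users that are crawled, which is one way to state the bias question.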
Measurement Methodology Flickr social network graph Does it correctly reflect the characteristics of the Flickr social network? For example, if B is a friend of A in the graph, the subsequent analysis assumes that any communication between B and A must happen through the social network. But in fact, B might have gotten there by other means, e.g. B found A via Google and then they became friends, not via the social network. Similarly, after they became friends, the same picture appearing on both A's and B's pages is not necessarily the result of propagation over the social network graph.
Network Topology and Picture Popularity The three figures are similar. Why does the paper give no insight into the differences between them and only present results (heavy-tailed distributions) that are easy to conclude?
Topology Distribution of Picture Popularity The value for fans 1 hop away decreases as the number of fans increases, but for fans 2 hops and 3 hops away it is the opposite. Why?
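To make the hop question concrete, here is a minimal sketch of how the share of fans at each hop distance from the uploader could be computed. It assumes the plotted value is a share of fans per hop distance and that the friend graph is treated as undirected; both are assumptions, and the graph, the names, and the helpers `hop_distances` and `fan_hop_shares` are hypothetical.

```python
from collections import Counter, deque


def hop_distances(graph, source):
    """Shortest hop count from source to every reachable user (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for friend in graph.get(user, ()):
            if friend not in dist:
                dist[friend] = dist[user] + 1
                queue.append(friend)
    return dist


def fan_hop_shares(graph, uploader, fans):
    """Share of fans at each hop distance; None marks unreachable fans."""
    dist = hop_distances(graph, uploader)
    counts = Counter(dist.get(fan) for fan in fans)
    return {hop: n / len(fans) for hop, n in counts.items()}


# Hypothetical undirected friend graph, stored as adjacency lists.
toy_graph = {
    "u": ["a", "b"],
    "a": ["u", "c"],
    "b": ["u"],
    "c": ["a", "d"],
    "d": ["c"],
}

shares = fan_hop_shares(toy_graph, "u", ["a", "b", "c", "d"])
print(shares)
```

Under that assumption the shares for one picture sum to 1, so a falling 1-hop share forces the combined share of the farther hops to rise. This may account for part of the opposite trends, but it does not explain why popular pictures reach farther in the first place.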
Temporal Evolution of Picture Popularity The paper defines three distinct growth patterns. Why doesn't it show the overall distribution of the data across these patterns, or something similar based on the definitions? How can the patterns be used to analyze further problems?
Information Propagation via Social Links Why is the percentage pointed out above 100%? Are no other mechanisms at play, especially at high popularity (many fans)? The paper itself tells us there are at least 5 kinds of mechanisms. Why does the percentage of "photos" in the "cascades from uploaders" column increase as popularity (fans) increases?
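One way such a percentage could be produced is sketched below. It assumes, without confirmation from the paper, that a fan is counted as reached via a social link whenever at least one of their friends became a fan earlier. The data, the names, and the function `cascade_share` are hypothetical.

```python
def cascade_share(fan_times, friends):
    """Fraction of fans who had at least one friend who was already a fan."""
    via_link = 0
    for user, t in fan_times.items():
        # A friend who never became a fan gets time +inf, so never counts.
        if any(fan_times.get(f, float("inf")) < t for f in friends.get(user, ())):
            via_link += 1
    return via_link / len(fan_times)


# Hypothetical data: the time each user became a fan, and each user's friends.
fan_times = {"a": 1, "b": 2, "c": 3, "d": 4}
friends = {
    "b": ["a"],       # a was a fan before b
    "c": ["x"],       # x never became a fan
    "d": ["c", "a"],  # both were fans before d
}

share = cascade_share(fan_times, friends)
print(share)
```

A check of this kind sees only timing and friendship. A fan who found the picture through search but happens to have an earlier-fan friend is still counted as a cascade, and the more fans a picture has, the more likely such a coincidence becomes. That may be one reason the percentage rises with popularity.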
Some other issues 1、Flickr cannot represent a general network, because information in it propagates locally, spreads slowly, and is exchanged with delay. How can the research be applied to general cases? Why are we studying it? 2、Can we do the same or a contrasting analysis for much larger networks?
Some other issues 3、Why does the paper repeat material from previous studies on Flickr, written by the same authors or by others? 4、How can we estimate the impact of studying only the friend-relationship network? That is, would the findings agree with the results of studying another relationship set in the same network?