Masters Thesis Defense. Amit Karandikar. Advisor: Dr. Anupam Joshi. Committee: Dr. Finin, Dr. Yesha, Dr. Oates. Date: 1st May 2007. Time: 9:30 am. Place: ITE.

Presentation transcript:

1 Masters Thesis Defense: Generative Model To Construct Blog and Post Networks In Blogosphere. Amit Karandikar. Advisor: Dr. Anupam Joshi. Committee: Dr. Finin, Dr. Yesha, Dr. Oates. Date: 1st May 2007. Time: 9:30 am. Place: ITE 325B.

2 Outline: Introduction; Motivation; Thesis Contribution; Interactions in Blogosphere; Proposed Model; Experiments and Results; Conclusion.

3 Introduction. Generative model: a model for randomly or systematically generating the observed data from some input parameters. Parameters can be latent or supplied as input to the model. Blogosphere: the collective term encompassing all blogs, linked together to form a community or social network. Blog network: the network formed by treating each blog as a single node. Post network: the network formed by treating each post as a node, ignoring its parent blog. Example blogs: finin.livejournal.com, joshi.blogspot.com, oates.myspace.com, yesha.blogspot.com.

4 Basics: Graphs are everywhere, and so are power laws! [Figures: Internet Mapping Project (lumeta.com); Friendship Network (Moody '01)] In simple words, a power law can be explained by the "rich get richer" phenomenon, or "20% of the population holds 80% of the wealth". Considering the web as a graph: it is a scale-free network, with structure and properties independent of network size and a few high-connectivity nodes (hubs). Properties of interest (graph theory): average degree of a node, degree distribution, degree correlation, distribution of strongly/weakly connected components, clustering coefficient, and reciprocity.
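
Two of the properties listed on this slide can be illustrated with a minimal Python sketch. It assumes a directed graph given as a list of (source, destination) pairs, and it assumes reciprocity means the fraction of links whose reverse link also exists; the thesis's exact definitions are not given in the transcript, and the function names are hypothetical.

```python
def reciprocity(edges):
    """Fraction of directed links whose reverse link is also present.

    Assumed definition; self-loops and duplicate links are ignored.
    """
    edge_set = {(s, d) for s, d in edges if s != d}
    if not edge_set:
        return 0.0
    mutual = sum(1 for s, d in edge_set if (d, s) in edge_set)
    return mutual / len(edge_set)


def average_degree(edges):
    """Number of distinct links divided by number of nodes that appear in them."""
    edge_set = set(edges)
    nodes = {n for edge in edge_set for n in edge}
    return len(edge_set) / len(nodes) if nodes else 0.0


# Toy blog graph: a and b link to each other, a also links to c.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c")]
print(reciprocity(toy_edges), average_degree(toy_edges))
```

In the toy graph two of the three links are reciprocated, so the reciprocity is 2/3.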

5 Motivation: Why simulate blog graphs? Reduce the time needed to generate data (crawling the blogosphere over a few weeks; sampling the right blogs to get a representative sample). Reduce the time spent on preprocessing and data cleaning (removing links pointing outside the dataset or outside the time frame; splog removal [1]). Generate graphs of different properties and sizes (average degree of node, degree distributions). Test new algorithms for blog graphs (e.g. spread of influence in the blogosphere [2], community detection [3]). Extrapolation (how will fast growth affect the blogosphere's properties? how does this affect the connected components?). References: [1] Kolari et al., "SVMs for the blogosphere: Blog identification and splog detection," in AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs. [2] Java et al., "Modeling the spread of influence on the blogosphere," tech. rep., University of Maryland, Baltimore County, March. [3] Lin et al., "Discovery of Blog Communities based on Mutual Awareness".

6 Thesis Contribution: 1. Propose a generative model for a blog-blog network using preferential attachment and uniform random attachment, by modeling the interactions among bloggers. 2. Generate the post-post network as part of the generative model for blog graphs. 3. Compare the properties of the simulated blog and post networks with the properties observed in the available real blog datasets. Datasets: Workshop on the Weblogging Ecosystem (WWE 2006); International Conference on Weblogs and Social Media (ICWSM 2007).

7 Why are existing models not enough? Existing models: Erdős-Rényi random model; Barabási-Albert preferential attachment web model. Preferential attachment: the likelihood of linking to a popular website is higher. Two-level network: blog and post level. Inlinks and outlinks to and from posts. NEED to model blogger interactions. References: [1] M. Newman, "The structure and function of complex networks," 2003. [3] R. Albert, Statistical mechanics of complex networks. PhD thesis, [7] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst, "Cascading behavior in large blog graphs," ICWSM, 2007. [32] X. Shi, B. Tseng, and L. Adamic, "Looking at the blogosphere topology through different lenses," ICWSM, 2007.
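
The preferential attachment rule named on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Barabási-Albert implementation or the author's code: the function name is hypothetical, and adding 1 to every indegree (so that nodes with no inlinks can still be chosen) is an assumption made for this sketch.

```python
import random


def pick_preferential(indegree, rng):
    """Pick a node with probability proportional to (indegree + 1).

    The +1 smoothing is an assumption of this sketch; without it a node
    that has no inlinks yet could never be selected.
    """
    nodes = list(indegree)
    weights = [indegree[n] + 1 for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


# A popular blog ("hub") with 98 inlinks and two blogs with none.
rng = random.Random(0)
indegree = {"hub": 98, "a": 0, "b": 0}
draws = [pick_preferential(indegree, rng) for _ in range(1000)]
print(draws.count("hub"), draws.count("a"), draws.count("b"))
```

The hub receives the large majority of the draws, which is the "rich get richer" effect; a uniform random model such as Erdős-Rényi would spread the draws evenly.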

8 Interactions in blogosphere: Interesting findings from the PEW Internet survey [1]: blog writers are enthusiastic blog readers; most bloggers post infrequently. Linking in the neighborhood: preferential or random? (friends' blogs, blogroll). Bloggers tend to link to some (how many?) of the posts that they read recently (often preferentially, sometimes randomly). Is popularity (inlinks) proportional to blogger activity (outlinks)? [NO] [2]. These observations inform the model parameters. References: [1] A. Lenhart and S. Fox, "Bloggers: A portrait of the internet's new storytellers." [2] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst, "Cascading behavior in large blog graphs," ICWSM 2007.

9 Model Parameters: 1. Probability of random reads (rR). 2. Probability of randomly selecting the writer (rW). 3. Probability that a new node does not link to the existing network (pD). 4. Growth exponent (g): how many links should be added at every step?

10 Proposed Model: Blog view. Steps: 1. Add a new blog node. 2. Select a writer. 3. Writers read blog posts and write posts. New node: Should I link to someone? If yes, to whom? Preferentially, based on the indegree of the node; or: I will not link to anyone! Writer selection: randomly, or preferentially based on outdegree? Reading: should I read randomly or preferentially? Strongly connected component: a subset of nodes having a directed path from every node to every other node. Weakly connected components. [Diagram labels: random writer, random destination, reciprocal links, information flow, Step=1, Step=2, michellemalkin, dailykos]
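
The three steps on this slide can be sketched as a small simulation. This is a simplified reading of the slide, not the thesis implementation, and several parts are assumptions: the function names are hypothetical, a fixed `links_per_step` stands in for the growth exponent g, "reading" is reduced to choosing a link destination (the neighborhood and the post level are not modeled), and preferential choices use degree + 1 as the weight.

```python
import random


def pick(weights, rng, uniform_prob):
    """Pick uniformly with probability uniform_prob, else proportionally to weight + 1."""
    nodes = list(weights)
    if rng.random() < uniform_prob:
        return rng.choice(nodes)
    return rng.choices(nodes, weights=[weights[n] + 1 for n in nodes], k=1)[0]


def simulate_blog_network(n_blogs, rR, rW, pD, links_per_step=2, seed=0):
    """Grow a directed blog graph; returns (edges, indegree, outdegree)."""
    rng = random.Random(seed)
    indeg, outdeg = {0: 0}, {0: 0}
    edges = set()

    def add_edge(src, dst):
        # Skip self-links and duplicate links so degree counts stay consistent.
        if src != dst and (src, dst) not in edges:
            edges.add((src, dst))
            outdeg[src] += 1
            indeg[dst] += 1

    for new in range(1, n_blogs):
        # Step 1: add a new blog; with probability pD it stays unlinked,
        # otherwise it links preferentially by indegree.
        target = None if rng.random() < pD else pick(indeg, rng, 0.0)
        indeg[new], outdeg[new] = 0, 0
        if target is not None:
            add_edge(new, target)
        for _ in range(links_per_step):
            # Step 2: select a writer, randomly with probability rW,
            # otherwise preferentially by outdegree.
            writer = pick(outdeg, rng, rW)
            # Step 3: the writer reads (randomly with probability rR,
            # otherwise preferentially by indegree) and links to what it read.
            destination = pick(indeg, rng, rR)
            add_edge(writer, destination)
    return edges, indeg, outdeg


edges, indeg, outdeg = simulate_blog_network(200, rR=0.1, rW=0.35, pD=0.2)
print(len(indeg), "blogs,", len(edges), "links")
```

The value rW = 0.35 is taken from the "Effect of parameters" slide; the values used here for rR and pD are arbitrary placeholders.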

11 Proposed Model: Post view. Number of links? [Diagram: Blogger A, Blogger B, Post 1, Post 2, Post 3]

12 Growth of blog graphs: Densification. Densification [1] has been observed in various real networks, including the blogosphere. The number of edges grows faster than the number of nodes: a super-linear growth function. Reciprocity and clustering coefficient increase with the growth exponent. Average degree increases with growth (evolution time). [1] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst, "Cascading behavior in large blog graphs," ICWSM 2007.
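
The link between super-linear edge growth and increasing average degree can be written out in one line. The symbols n(t) and e(t) for the number of nodes and edges at time t are introduced here for illustration, and treating the slide's growth exponent g as the exponent of this power law is an assumption.

```latex
% Densification: edges grow super-linearly in the number of nodes.
e(t) \propto n(t)^{g}, \qquad g > 1
% Dividing by n(t) gives the average degree, which therefore grows with n(t):
\bar{d}(t) = \frac{e(t)}{n(t)} \propto n(t)^{\,g-1}, \qquad g - 1 > 0
```

With g = 1 the average degree would stay constant, so an average degree that increases over time is the signature of g > 1.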

13 Properties of simulated blog network

14 Properties of simulated post network

15 Blogosphere: Blog inlinks distribution. The blogosphere follows a power law distribution for blog inlinks and outlinks, post inlinks and outlinks, component sizes, posts per blog, size of cascades, ... Power law distribution, Slope = . Very few blog nodes have very high inlinks; a large number of blog nodes have very few inlinks.

16 Simulation: Blog inlinks distribution. Similar curves are observed for the properties of the simulated blog and post networks. Power law distribution, slope = -1.72.
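
A slope such as the one quoted on this slide is commonly read off a log-log plot of the degree histogram. The sketch below estimates it with an ordinary least-squares line through the points (log10 degree, log10 count). The transcript does not say how the thesis fitted its slopes, so this method and the function name are assumptions; least squares on a log-log histogram is a simple estimate that can be biased on noisy tails.

```python
import math
from collections import Counter


def loglog_slope(degrees):
    """Least-squares slope of log10(count) against log10(degree).

    Degrees of zero are skipped because log10(0) is undefined.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c) for c in counts.values()]
    if len(xs) < 2:
        raise ValueError("need at least two distinct positive degrees")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic inlink counts where count(k) = 10000 / k^2, i.e. an exact slope of -2.
synthetic = [1] * 10000 + [2] * 2500 + [4] * 625 + [10] * 100
print(round(loglog_slope(synthetic), 3))
```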

17 Power law distributions for various network sizes. The degree distribution curves have a shape similar to those observed by Shi et al. [1] in the "real" blogosphere. [1] X. Shi, B. Tseng, and L. Adamic, "Looking at the blogosphere topology through different lenses," in ICWSM, 2007.

18 Hop plot: average neighborhood size vs. hop count. The hop plot shows the reachability of nodes in the network. After N hops, the hop plot becomes constant. Comparison of hop plots for ICWSM, WWE and Blogosphere (650K blog nodes, 1.4 million links). Reachability? pD = probability that a new node remains disconnected.
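
A hop plot of this kind can be computed with one breadth-first search per node. The sketch below assumes that "neighborhood size at h hops" means the number of other nodes reachable by following at most h directed links, averaged over all nodes; the thesis's exact definition is not given in the transcript, and the function name is hypothetical.

```python
from collections import deque


def hop_plot(edges, max_hops):
    """Return a list whose entry h is the average number of other nodes
    reachable within h directed hops."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])
    totals = [0] * (max_hops + 1)
    for start in adjacency:
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if distance[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        for hops in distance.values():
            if hops > 0:
                # A node first reached at `hops` counts for every larger hop limit too.
                for h in range(hops, max_hops + 1):
                    totals[h] += 1
    return [total / len(adjacency) for total in totals] if adjacency else totals


# Directed path a -> b -> c: the curve flattens once every reachable node is counted.
print(hop_plot([("a", "b"), ("b", "c")], max_hops=3))
```

On the three-node path the curve stops growing after two hops, which is the plateau the slide describes. One search per node is fine for small graphs; for hundreds of thousands of nodes a sampled set of start nodes would be a more practical choice.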

19 Simulation: Scatter plot and degree correlations. Correlation coefficients: ICWSM: WWE: 0.02 Simulation: 0.1. A correlation coefficient close to zero means there is NO definite relation between the indegree and outdegree of blog nodes. Random writers (rW) help to model the low correlation coefficient. Scatter plot regions: popular blogs (high inlinks); popular avid writers (high inlinks and outlinks); avid writers (high outlinks). BA model correlation coefficient = 1.
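
The degree correlation on this slide can be illustrated as a Pearson correlation between each node's indegree and outdegree. That this is the coefficient the thesis reports is an assumption, and the function name is hypothetical.

```python
import math


def degree_correlation(edges):
    """Pearson correlation between indegree and outdegree over all nodes."""
    indeg, outdeg = {}, {}
    for src, dst in edges:
        outdeg[src] = outdeg.get(src, 0) + 1
        indeg[dst] = indeg.get(dst, 0) + 1
    nodes = set(indeg) | set(outdeg)
    xs = [indeg.get(n, 0) for n in nodes]
    ys = [outdeg.get(n, 0) for n in nodes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all degrees are equal")
    return cov / math.sqrt(var_x * var_y)


# Every link is reciprocated, so indegree equals outdegree for each node.
reciprocal = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")]
# One popular blog that never links out, three writers with no inlinks.
popular_only = [("b", "a"), ("c", "a"), ("d", "a")]
print(degree_correlation(reciprocal), degree_correlation(popular_only))
```

The first toy graph gives a coefficient of 1 and the second gives -1; values near zero, as reported on the slide, mean indegree tells you little about outdegree.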

20 Distribution of SCC in blog and post network (WWE and Simulation). Community detection and influence modeling use connected components.

21 Distribution of WCC in post network (WWE and Simulation) Power law distribution in WCC for post network
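
The sizes of weakly connected components can be computed with a union-find structure that ignores link direction. This is a generic sketch, not the tooling used in the thesis, and the function name is hypothetical.

```python
def wcc_sizes(edges):
    """Sizes of weakly connected components, largest first.

    Link direction is ignored: two nodes share a component if any
    chain of links, in either direction, connects them.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


# Two separate groups of posts: {a, b, c} and {d, e}.
print(wcc_sizes([("a", "b"), ("b", "c"), ("d", "e")]))
```

Counting how often each size occurs in the returned list gives the component-size distribution plotted on the slide.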

22 Simulation: Posts per blog distribution. Posts per blog also follows a power law distribution [1]. Power law distribution, slope = -1.71. [1] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst, "Cascading behavior in large blog graphs," ICWSM 2007.

23 Effect of increase in blogs: Degree distributions remain almost the same. Reciprocity increases. Average degree increases. The clustering coefficient and reciprocity of the post network are much lower than those of the blog network.

24 Effect of parameters: random reads (rR), random writers (rW), disconnected nodes (pD). Increasing rR (random reads) decreases reciprocity because it reduces the likelihood of getting a reverse link. Empirically, rW = 0.35 (random writers) gives a low degree correlation and values for the other parameters similar to those of the blogosphere. Increasing pD reduces the size of the largest WCC.

25 Conclusion: 1. The simulation resembles the blogosphere in degree distributions, degree correlations, reciprocity, average degree, clustering coefficient, and component distribution for blog and post networks. 2. The simulated post network is sparse compared to the blog network, and posts per blog follows a power law distribution, as observed in the blogosphere. 3. The model is a useful tool for analysis of the blogosphere, testing new algorithms, and extrapolation (how will an increase in X affect some Y?).

26 Future work: Can we model buzz and popularity in the post network? What is the effect of buzz on the properties of the network? In-depth temporal analysis of evolving blog graphs. Can we enrich the model with topical information? How can we model the blogroll?

27 Questions? Thank you! Acknowledgements: advisor, committee members, coauthors, friends at UMBC. Data: BlogPulse, ICWSM, WWE.