Dynamic Network Analysis: Case study of PageRank-based Rewiring. Narjès Bellamine-BenSaoud, Galen Wilkerson. 2nd Annual French Complex Systems Summer School.


Dynamic Network Analysis: Case study of PageRank-based Rewiring
Narjès Bellamine-BenSaoud, Galen Wilkerson
2nd Annual French Complex Systems Summer School, Paris, 9 August

Motivations
Networks
- are becoming ubiquitous at different levels and scales, within natural and artificial systems, and are relevant to many different domains
- are abstract representations of systems
Networks may be
- homogeneous or heterogeneous
- static or dynamic
- finite or open
- …

Project’s Aim
- Explore dynamic network properties
- Build a dynamic process
- Implement a simulator
- Try to understand and explain the evolution of networks over time under the chosen process
Chosen dynamic process: rewiring based on the PageRank method

PageRank method
Intuitively, a page has a high rank if the sum of the ranks of its backlinks is high. This covers both the case when a page has many backlinks and the case when a page has a few highly ranked backlinks.
Mathematically, PageRank corresponds to the principal eigenvector of the normalized link matrix of the graph.
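The eigenvector statement can be written out as follows. This is a sketch of the simplified, no-random-jump form only, and it assumes every page has at least one out-link. The symbols $r_i$ (rank of page $i$), $B_i$ (set of pages linking to $i$), $d_j$ (number of out-links of page $j$) and $M$ (normalized link matrix) are notation introduced here, not taken from the slides.

```latex
r_i = \sum_{j \in B_i} \frac{r_j}{d_j}
\qquad\Longleftrightarrow\qquad
r = M r,
\quad
M_{ij} =
\begin{cases}
  1/d_j & \text{if page } j \text{ links to page } i,\\
  0     & \text{otherwise.}
\end{cases}
```

Each column of $M$ sums to 1, so $r$ is an eigenvector of $M$ with eigenvalue 1, the largest eigenvalue of a column-stochastic matrix.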

PageRank™
Given directed connections, if you randomly put people on each node, then let them walk the graph edges forever, where do they end up?
[Figure: nodes A and B]
- Alpha “ghost edge”: adds randomness, and stability in disconnected networks
- Alpha is the “democracy factor” (the person jumps to a random page)
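The random-walk picture can be turned into a small power-iteration sketch. It assumes that alpha is the probability of a random jump (the "ghost edge") and that a walker on a node with no out-links jumps uniformly at random. The slides state neither convention precisely, and some libraries use the symbol alpha for the opposite probability (following a link). The function name `pagerank` and the dictionary graph format are illustrative choices.

```python
def pagerank(out_links, alpha=0.15, tol=1e-12, max_iter=1000):
    """Stationary distribution of the random walker.

    out_links: dict mapping every node to the list of nodes it points to.
    alpha:     assumed here to be the probability of jumping to a random node.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Walkers stuck on nodes without out-links jump uniformly (assumption).
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: alpha / n + (1 - alpha) * dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = (1 - alpha) * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new[w] += share
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank


# A and B point at each other; C points at A but nobody points at C.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
print(ranks)
```

In this toy graph, C only ever receives walkers through the random jump, so it ends up with the lowest rank.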

Overview
- Description of PageRank method
- Model description
- Simulator presentation
- Virtual experiments
- First results
- Conclusions & lessons learnt

Model: PageRank-based Rewiring (PR2)
Step 1: Build an initialGraph, a random directed network (i.e. with N vertices, connecting each pair of vertices with probability p).
Step 2: Let g = initialGraph. Rewiring:
- Select one edge at random from g: old_E = (source_node, end_node);
- For a fixed probability alpha (the probability that an internet user may choose to visit a vertex), compute the PageRank vector PV of g: PV = (p1, p2, p3, p4, …, pn);
- Using PV, compute the list of cumulative sums L = [p1, p1+p2, p1+p2+p3, …, p1+p2+p3+…+pn];
- Select a random real value, then match it with the corresponding value in L to deduce its associated node. Check that this node is different from source_node and from end_node; otherwise repeat the selection. The result is the new target_node;
- Remove edge old_E from g;
- Add the new edge (source_node, target_node) to g.
Step 3: Repeat step 2 on the modified network g for TimeSteps steps.
Model parameters: N, P, Alpha, TimeSteps
Probabilities and randomness:
- Creation of the initial graph
- Selection of the edge to be rewired
- Target selection
- Alpha?
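The three steps can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' simulator: alpha is taken to be the random-jump probability, nodes without out-links spread their rank uniformly, and a new edge that duplicates an existing one is allowed. The helper names `random_digraph`, `pagerank` and `rewire_once` are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def random_digraph(n, p, rng):
    """Step 1: connect each ordered pair of distinct nodes with probability p."""
    return [(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < p]


def pagerank(n, edges, alpha, iters=100):
    """Power iteration; alpha is assumed to be the random-jump probability."""
    out_deg = [0] * n
    for u, _ in edges:
        out_deg[u] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly (assumption).
        dangling = sum(r for r, d in zip(rank, out_deg) if d == 0)
        new = [alpha / n + (1 - alpha) * dangling / n] * n
        for u, v in edges:
            new[v] += (1 - alpha) * rank[u] / out_deg[u]
        rank = new
    return rank


def rewire_once(n, edges, alpha, rng):
    """Step 2: move the head of one random edge to a PageRank-chosen node."""
    assert n >= 3 and 0 < alpha <= 1, "needs a third node with nonzero rank"
    i = rng.randrange(len(edges))
    source, end = edges[i]
    cumulative = list(accumulate(pagerank(n, edges, alpha)))  # the list L
    while True:
        x = rng.random() * cumulative[-1]
        target = min(bisect_right(cumulative, x), n - 1)
        if target != source and target != end:
            break
    # Assumption: an already existing (source, target) edge is not treated
    # specially, so parallel edges can appear.
    edges[i] = (source, target)


rng = random.Random(42)
g = random_digraph(20, 0.2, rng)
for _ in range(100):  # Step 3 with TimeSteps = 100
    rewire_once(20, g, alpha=0.15, rng=rng)
```

Because each rewiring keeps source_node, every node's out-degree stays fixed for the whole run. Only the in-degrees move, and they drift towards nodes with high PageRank.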

PR2 Characteristics
The networks evolving at each simulation run are:
- Finite: fixed total number of nodes and edges
- Directed
- Initially random: the initial graph is an Erdős–Rényi graph
- Unweighted: there are no weights on edges

Main research questions
- How does the structure of the networks evolve over time?
- Does the network degree distribution converge to a power law?
- How does PageRank change over time?
- Can we express the transition rate as a function of alpha?
- How does the degree distribution change over time, and at what rate as a function of alpha?
- Can critical values of p and alpha, attractors, etc. be identified?
- Does it converge towards a “stable” topology?
- How may highly ranked nodes evolve?
- How does the network “size” (# nodes, # edges) impact this dynamic?

Expected Statistical Characterizations
- In-degree distribution: expect this to converge to a power law
- PageRank distribution: not sure, possibly also a power law
- PageRank vector evolution: not sure what to expect, possibly continual change even for low alpha values
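To check the in-degree expectation, one would first need the empirical in-degree distribution of a graph. A minimal sketch follows, assuming the graph is stored as a list of (source, target) pairs over nodes 0..n-1. That representation and the function name are illustrative, not from the slides.

```python
from collections import Counter


def in_degree_distribution(n, edges):
    """Return {k: fraction of nodes whose in-degree is k}."""
    in_deg = [0] * n
    for _, target in edges:
        in_deg[target] += 1
    counts = Counter(in_deg)
    return {k: counts[k] / n for k in sorted(counts)}


print(in_degree_distribution(4, [(0, 1), (2, 1), (1, 0)]))
```

A power law would show up as a roughly straight line when these fractions are plotted against k on log-log axes. With only N = 50 nodes, such a plot covers a short range of degrees.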

Thoughts on Probabilities
Defs:
- P = probability of an edge between two nodes when creating the graph
- PRt = PageRank vector at time t
Prob(PRt) = Prob(PRt-1) / (# random numbers in space) * 1/(num edges) - double counting?
Prob(PR0) = (# graphs with PR = PR0(N,P)) / (# graphs with (N,P))
Goal: Prob(PRt) = f(alpha, p, N) - some formulation should be possible!

Phase Space
As a reminder, PR is a little bit like in-degree, but gives “deeper” information about the network.
My popularity = how much am I liked by popular nodes? (recursive)
[Figure: nodes A, B, C; vector with components 1...N; PR points towards “popular” nodes]

Phase Space
Want to understand relationships between:
- PR and time: how long does it take to settle to a certain behavior?
- PR and alpha: how does alpha affect PR dynamics?
- PR and p: how does p affect PR dynamics?
Problem: there are many ways an N-dimensional vector can change over time…
- Can we identify critical points, attractors, divergent areas of PR variance, and PR change over time as a function of p and alpha?

Phase Space
So, try plotting:
- PR distribution over time: does this converge?
- Variance of PR over time: does this converge?
- Variance of PR for different alpha values: critical values, attractors?
- Variance of PR for different p values
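The quantities to plot could be computed from a saved run roughly as follows. The sketch assumes a run is stored as a list of PageRank vectors, one per time step. Both function names are illustrative, and the L1 step-to-step change is an extra convergence indicator added here, not one named in the slides.

```python
from statistics import pvariance


def variance_series(history):
    """Variance of the PageRank values across nodes, one value per time step."""
    return [pvariance(pv) for pv in history]


def change_series(history):
    """L1 distance between consecutive PageRank vectors (extra indicator)."""
    return [sum(abs(a - b) for a, b in zip(prev, cur))
            for prev, cur in zip(history, history[1:])]


history = [[0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1]]
print(variance_series(history), change_series(history))
```

A uniform PageRank vector has zero variance, so a variance that rises over time would indicate rank concentrating on a few nodes.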

Simulator Development – Previous Work
- Python, using NumPy, SciPy, NetworkX
- Showed hub migration over time

Simulator Development (1)
Mathematica
1st: interactive simulator
- Run and “see”

Interactive simulator overview

Simulator Development (1)
2nd: “batch mode” simulator
- Run and save into files
Mathematica verrrrryyy sloooooooww

Experiments
N = 50
For each P in [0.01 … 1] -> 10 steps on a logarithmic scale
    Create initialGraph randomly
    Save initialGraph
    For each Alpha in [0...3], step = .01
        For each NumRuns in [1..20]
            For each TimeSteps = 500
                Save nested list of PageRank vectors -> [TimeSteps * NumRuns] (10000 PageRank vectors)
                Save nested list of InDegree -> [TimeSteps * NumRuns] (10000 degree vectors)
                Save finalGraph
-> 20 hours to run
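The P grid and the vector count in this design can be reproduced with a few lines. The sketch assumes that the "10 steps" include both endpoints 0.01 and 1, which the slide does not state. The helper `log_spaced` is a hypothetical name.

```python
def log_spaced(lo, hi, steps):
    """Values from lo to hi (inclusive) with a constant ratio between neighbours."""
    ratio = (hi / lo) ** (1 / (steps - 1))
    return [lo * ratio ** k for k in range(steps)]


p_values = log_spaced(0.01, 1.0, 10)

time_steps, num_runs = 500, 20
vectors_per_setting = time_steps * num_runs  # PageRank vectors per (P, Alpha) pair

print([round(p, 4) for p in p_values], vectors_per_setting)
```

The 10000 vectors quoted on the slide are therefore per (P, Alpha) combination, each vector holding N = 50 values.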

First Results
Use Matlab or C instead!
Slowness of Mathematica and problems with behavior consistency were not at all expected.

Conclusions (1/2)
Investigating dynamic processes on large-scale networks requires:
- Incremental modeling: models should be kept as simple as possible, then enriched little by little
- A “good”/adequate choice of programming tools
- Validation, which is a crucial issue: the model and the simulator MUST be validated; this needs data and other models
- Experiments & analysis, which are long-running activities that should also be conducted gradually

Conclusions (2/2)
Collaboration among various disciplines is necessary: NOT only computer science experts; physics and social science, among others, are also needed.

THANK YOU