Dynamic Network Analysis: Case Study of PageRank-based Rewiring
Narjès Bellamine-BenSaoud, Galen Wilkerson
Second Annual French Complex Systems Summer School, Paris, 9 August
Motivations
Networks
– are becoming ubiquitous at different levels and scales, within natural and artificial systems, and are related to many diverse domains
– are abstract representations of systems
Networks may be
– homogeneous or heterogeneous
– static or dynamic
– finite or open
– …
Project’s Aim
Explore dynamic network properties
Build a dynamic process
Implement a simulator
Try to understand and explain the evolution of networks over time by the chosen process
Chosen dynamic process: rewiring based on the PageRank method
PageRank method
Intuitively, a page has a high rank if the sum of the ranks of its backlinks is high. This covers both the case when a page has many backlinks and the case when a page has a few highly ranked backlinks.
Mathematically, PageRank corresponds to the principal eigenvector of the normalized link matrix of the graph.
PageRank™
Given directed connections, if you randomly put people on each node, then let them walk the graph edges forever, where do they end up?
Alpha adds a “ghost edge” for randomness and for stability in disconnected networks.
Alpha is the “democracy factor” (the probability that a walker jumps to a random page).
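The random-walk intuition above can be sketched as a power iteration on the normalized link matrix. This is a minimal illustration, not the simulator’s actual code; the function name and the convention that alpha is the teleport ("democracy") probability are assumptions:

```python
import numpy as np

def pagerank(adj, alpha=0.15, tol=1e-10, max_iter=200):
    """Power iteration on the normalized link matrix.

    adj[i][j] = 1 if there is a directed edge i -> j.
    alpha is assumed to be the probability that a walker jumps to a
    random node (the "ghost edge" / democracy factor from the slide).
    """
    a = np.array(adj, dtype=float)
    n = a.shape[0]
    # Dangling nodes (no out-links) jump uniformly at random.
    a[a.sum(axis=1) == 0] = 1.0 / n
    p = a / a.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    pr = np.full(n, 1.0 / n)               # start with walkers spread uniformly
    for _ in range(max_iter):
        new = alpha / n + (1 - alpha) * pr @ p
        if np.abs(new - pr).sum() < tol:
            break
        pr = new
    return pr
```

The vector `pr` then answers the slide’s question: it is the long-run fraction of walkers found on each node.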
Overview
Description of PageRank method
Model description
Simulator presentation
Virtual experiments
First results
Conclusions & lessons learnt
Model: PageRank-based Rewiring (PR2)
Step 1: Build an initialGraph, a random directed network (i.e. having N vertices, connect each pair (or not) with probability p);
Step 2: Let g = initialGraph. Rewiring:
– Select randomly one edge of g: old_E = (source_node, end_node);
– For a fixed probability alpha (the probability that an internet user chooses to visit a random vertex), compute the PageRank vector PV of g: PV = (p1, p2, p3, p4, …, pn);
– Using PV, compute a list of cumulative values L = [p1, p1+p2, p1+p2+p3, …, p1+p2+p3+…+pn];
– Select randomly a real value, then match it with the corresponding value in L to deduce its associated node. Check that this node is different from source_node and from end_node, otherwise repeat this selection (the result is the new target_node);
– Remove edge old_E from g;
– Add new edge (source_node, target_node) to g.
Step 3: Repeat Step 2 on the modified network g for TimeSteps steps.
Model parameters: N, P, Alpha, TimeSteps
Probabilities and randomness:
– creation of the initial graph
– selection of the edge to be rewired
– target selection
– alpha?
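One rewiring step (Step 2 above) might look like the following plain-Python sketch. The function name is hypothetical, the simulator itself was written in Mathematica, and the PageRank vector is assumed to have already been computed on the current graph; the graph is assumed to have at least three nodes so a valid target always exists:

```python
import bisect
import random

def rewire_step(edges, pagerank_vec, rng=random):
    """One PR2 rewiring step (sketch of Step 2 from the slide).

    edges: mutable list of directed edges (source_node, end_node).
    pagerank_vec: PageRank value per node, computed on the current graph.
    """
    # Select one edge uniformly at random.
    i = rng.randrange(len(edges))
    source, end = edges[i]
    # Cumulative list L = [p1, p1+p2, ...] for roulette-wheel selection.
    cum = []
    total = 0.0
    for p in pagerank_vec:
        total += p
        cum.append(total)
    # Draw a random real value and match it in L; repeat until the
    # associated node differs from source_node and end_node.
    while True:
        r = rng.random() * total
        target = bisect.bisect_left(cum, r)
        if target != source and target != end:
            break
    # Remove old_E and add the new PageRank-biased edge.
    edges[i] = (source, target)
    return edges
```

Because targets are drawn in proportion to PageRank, edges are preferentially rewired towards already-popular nodes, which is the feedback loop the model studies.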
PR2 Characteristics
The networks evolving at each simulation run are:
– finite: fixed total number of nodes and edges,
– directed,
– initially random (Erdős–Rényi),
– unweighted: there are no weights on edges.
Main research questions
How does the structure of the networks evolve over time?
Does the network degree distribution converge to a power law?
How does PageRank change over time?
Can we express the transition rate as a function of alpha?
How does the degree distribution change over time and with alpha?
Can critical values of p and alpha, attractors, etc. be identified?
Does the network converge towards a “stable” topology?
How do highly ranked nodes evolve?
How does the network “size” (#nodes, #edges) impact this dynamic?
Expected Statistical Characterizations
In-degree distribution
– Expect this to converge to a power law
PageRank distribution
– Not sure, possibly also a power law
PageRank vector evolution
– Not sure what to expect, possibly continual change even for low alpha values
Thoughts on Probabilities
Definitions:
P = probability of an edge between two nodes when creating the graph
PRt = PageRank vector at time t
Prob(PRt) = Prob(PRt-1) / (# random numbers in space) × 1/(# edges) – double counting?
Prob(PR0) = (# graphs with PR = PR0(N, P)) / (# graphs with (N, P))
Goal: Prob(PRt) = f(alpha, p, N) – some formulation should be possible!
Phase Space
Recall that PR is a little like in-degree, but gives “deeper” information about the network: my popularity = how much am I liked by popular nodes? – recursive
[Figure: PR points towards “popular” nodes]
Phase Space
Want to understand the relationships between:
– PR and time – how long does it take to settle to a certain behavior?
– PR and alpha – how does alpha affect PR dynamics?
– PR and p – how does p affect PR dynamics?
Problem: there are many ways an N-dimensional vector can change over time…
– Can we identify critical points, attractors, divergent areas of PR variance, and PR change over time as functions of p and alpha?
Phase Space
So, try plotting:
– PR distribution over time – does this converge?
– Variance of PR over time – does it converge?
– Variance of PR for different alpha values – critical values, attractors?
– Variance of PR for different p values
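The “variance of PR over time” curve can be computed directly from a run’s saved PageRank history; a minimal sketch (the function name is an assumption, not part of the original simulator):

```python
import statistics

def pr_variance_series(pr_history):
    """Population variance of the PageRank vector at each time step.

    pr_history: one PageRank vector per time step, as saved by a run.
    If the series flattens out, the PR dynamics may be converging;
    if it keeps changing, the network may never settle.
    """
    return [statistics.pvariance(pr) for pr in pr_history]
```

A perfectly uniform PR vector gives variance 0; a vector concentrated on a few hubs gives a large variance, so this one number tracks how “hub-dominated” the network is over time.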
Simulator Development – Previous Work
Python, using NumPy, SciPy, NetworkX
Showed hub migration over time
Simulator Development (1)
Mathematica
1st interactive simulator – run and “see”
Interactive simulator overview
Simulator Development (2)
2nd “batch mode” simulator – run and save into files
Mathematica verrrrryyy sloooooooww
Experiments
N = 50
For each P in [0.01 … 1], 10 steps on a logarithmic scale:
– Create initialGraph randomly
– Save initialGraph
– For each Alpha in [0 … 3], step = 0.01:
– For each run in [1 … 20], with TimeSteps = 500:
– Save nested list of PageRank vectors [TimeSteps × NumRuns] (10000 PageRank vectors)
– Save nested list of in-degrees [TimeSteps × NumRuns] (10000 degree vectors)
– Save finalGraph
20 hours to run
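The sweep on this slide corresponds to a parameter grid like the one below. This is a sketch: the exact log spacing of P and the inclusive alpha endpoint are assumptions read off the slide, and the function name is hypothetical:

```python
def parameter_grid(n_p=10, alpha_step=0.01, alpha_max=3.0):
    """Parameter grid from the experiment slide.

    P: n_p logarithmically spaced values in [0.01, 1].
    Alpha: 0 to alpha_max inclusive, in steps of alpha_step.
    """
    # n_p points from 10^-2 to 10^0 on a log scale.
    ps = [10 ** (-2 + 2 * i / (n_p - 1)) for i in range(n_p)]
    n_alpha = int(round(alpha_max / alpha_step)) + 1
    alphas = [round(i * alpha_step, 10) for i in range(n_alpha)]
    return ps, alphas
```

With 10 values of P, 301 values of alpha, 20 runs each, and 500 time steps per run, the grid explains why a single sweep took on the order of 20 hours in Mathematica.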
First Results
Use Matlab or C instead!
The slowness of Mathematica and problems with behavior consistency were not at all expected.
Conclusions (1/2)
Investigating dynamic processes on large-scale networks requires:
– Incremental modeling: models should be kept as simple as possible, then enriched little by little
– A “good”/adequate choice of programming tools
– Validation is a crucial issue: the model and simulator MUST be validated; we need data and other models
– Experiments & analysis are long-running activities which should also be conducted gradually
Conclusions (2/2)
Collaboration among various disciplines is necessary: NOT only computer science experts – we also need physics and social science, among others
THANK YOU