1 “Expansion” in Power Law and Scale Free Graphs Milena Mihail Georgia Tech with Christos Gkantsidis, Christos Papadimitriou and Amin Saberi
2 Graphs with Skewed Degree Sequences. Communication Networks. Metabolic Networks. This Talk: Algorithmic Issues. “Expansion” and the spectral gap determine the performance of key algorithms.
3 How does Congestion Scale? [Figures: Sprint and AT&T topologies] Demand: n^2, uniform. What is the load of the most congested link in an optimal routing? ISPs: K Routers: K WWW: 500K-3B P2P: hundred Ks
4 Real Internet Topologies (CAIDA). Degrees not concentrated around the mean: E[degree] ~ 3. Not Erdos-Renyi.
5 Degree-Frequency Power Law [Plot: frequency vs. degree] E[d] = const., but no sharp concentration. Erdos-Renyi: sharp concentration. Models by Kumar et al 00, Bollobas et al 01, Fabrikant et al 02
6 Power Laws Degree-Frequency Rank-Degree Eigenvalues (Adjacency Matrix) [WWW: Kumar et al 99, Barabasi-Albert 99] [Interdomain Routing: Faloutsos et al 99]
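The first two statistics named on this slide can be read directly off a degree list. The sketch below is only an illustration: the function names and the sample degree list are hypothetical and do not come from the talk.

```python
from collections import Counter

def degree_frequency(degrees):
    """Return sorted (degree, number of vertices with that degree) pairs."""
    return sorted(Counter(degrees).items())

def rank_degree(degrees):
    """Return (rank, degree) pairs, rank 1 being the largest degree."""
    return list(enumerate(sorted(degrees, reverse=True), start=1))

# Hypothetical toy degree list, not real measurement data.
sample = [5, 3, 3, 1, 1, 1]
print(degree_frequency(sample))
print(rank_degree(sample))
```

A power law shows up as an approximately straight line when either list is plotted on log-log axes.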
7 Models for Power Law Graphs EVOLUTIONARY Macroscopic : Growth & Preferential Attachment Simon 55, Barabasi-Albert 99, Kumar et al 00, Bollobas-Riordan 01. Microscopic : Growth & Multiobjective Optimization, QoS vs Cost Fabrikant-Koutsoupias-Papadimitriou 02. STRUCTURAL (aka CONFIGURATIONAL) “Random” graph with “power law” degree sequence.
8 Structural Random Graph Model. Given a degree sequence d_1 ≥ d_2 ≥ … ≥ d_n, choose a random perfect matching over the d_1 + d_2 + … + d_n minivertices. Molloy & Reed 95-98, Aiello, Chung, Lu 00, Tangmunarunkit et al 02
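A minimal sketch of this construction, assuming the usual configuration-model reading: vertex i contributes d_i minivertices, and a uniformly random perfect matching on the minivertices is obtained by shuffling them and pairing consecutive entries. The function name and the sample degree sequence are illustrative assumptions. Self-loops and parallel edges may occur.

```python
import random

def configuration_model(degrees, seed=None):
    """Random multigraph from a uniform perfect matching on minivertices."""
    if sum(degrees) % 2 != 0:
        raise ValueError("sum of degrees must be even")
    rng = random.Random(seed)
    # Vertex v contributes degrees[v] minivertices, each labelled v.
    minivertices = [v for v, d in enumerate(degrees) for _ in range(d)]
    # Shuffle, then pair consecutive minivertices: a uniform perfect matching.
    rng.shuffle(minivertices)
    return [(minivertices[k], minivertices[k + 1])
            for k in range(0, len(minivertices), 2)]

# Hypothetical degree sequence with minimum degree 3.
edges = configuration_model([3, 3, 3, 3, 4, 4], seed=0)
print(edges)
```

Each matched pair of minivertices becomes one edge, so every vertex ends up with exactly its prescribed degree (a self-loop counts twice).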
9 Congestion in the “Core” Theorem [Gkantsidis, MM, Saberi 02]: For a random graph arising from degree sequence O(n^{1/2}) ≥ d_1 ≥ d_2 ≥ … ≥ d_n ≥ 3 there is a flow that routes demand d_i * d_j between all vertices i and j with max link congestion O(n log^2 n) almost surely.
10 Proof : Step 1 : Approximation algorithms for multicommodity flow reduce congestion to conductance (special case of sparsest cut). Step 2 : Bound conductance - MAIN LEMMA.
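To make the quantity in Step 2 concrete, here is a brute-force sketch of conductance on a small graph. It assumes the standard definition (minimum, over vertex sets S holding at most half the total degree, of the number of edges leaving S divided by the total degree inside S); the slide's own formula did not survive, so that definition and the function name are assumptions. The enumeration is exponential and only suitable for toy graphs.

```python
from itertools import combinations

def conductance(n, edges):
    """Brute-force conductance of a multigraph on vertices 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total = sum(deg)
    best = float("inf")
    for size in range(1, n):
        for subset in combinations(range(n), size):
            s = set(subset)
            vol = sum(deg[v] for v in s)
            # Only consider the side holding at most half the total degree.
            if vol == 0 or 2 * vol > total:
                continue
            cut = sum(1 for u, v in edges if (u in s) != (v in s))
            best = min(best, cut / vol)
    return best

# Two triangles joined by a single edge: the bottleneck is that edge.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(conductance(6, two_triangles))
```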
11 Proof, Step 1: Reduce to Conductance. By maximum multicommodity flow [Leighton & Rao 95].
12 Proof, Step 2 : Main Lemma [Gkantsidis,MM, Saberi 02]:
13 Proof of MAIN LEMMA:
14 Proof of MAIN LEMMA:
15 Proof of MAIN LEMMA: Stirling
16 Proof of MAIN LEMMA: ignore Stirling BIG SMALL
17 In an Evolutionary Model ? Growth with Pref. Attachment One vertex at a time New vertex attaches to d existing vertices
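The growth process can be sketched as follows. This is a simplified variant written for illustration, not necessarily the exact model analyzed in the talk: the seed (vertex 0 with d self-loops), sampling with replacement, and the function name are all assumptions.

```python
import random

def preferential_attachment(n, d, seed=None):
    """Grow a multigraph: each new vertex attaches to d existing vertices,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed seed graph: vertex 0 with d self-loops (degree 2d).
    edges = [(0, 0)] * d
    endpoints = [0] * (2 * d)  # vertex v appears deg(v) times in this list
    for new in range(1, n):
        # A uniform draw from the endpoint list is a degree-proportional draw.
        targets = [rng.choice(endpoints) for _ in range(d)]
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment(n=10, d=3, seed=0)
print(len(edges))
```

Targets are drawn before the new vertex's own endpoints are added, so a new vertex never attaches to itself in this variant.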
18 Reduction to Random Matching [Bollobas & Riordan 01] [Figure: vertices arriving at t=1, t=2, t=3, t=4, t=5]
19 Reduction to Random Matching [Bollobas & Riordan 01] [Figure: vertices arriving at t=1, t=2, t=3, t=4, t=5]
20 Reduction to Random Matching [Bollobas & Riordan 01]
21 In an Evolutionary Model ? Growth with Pref. Attachment Theorem [MM, Saberi 02]: For a graph grown with preferential attachment with d ≥ 3 there is a flow that routes demand d_i * d_j between all vertices i and j with max link congestion O(n log n) almost surely. Main Lemma: almost surely. Open Question: Analyze a graph grown one vertex or edge at a time, where with some probability a new vertex arrives and attaches preferentially, and with some probability a new edge grows preferentially between existing vertices.
22 Spectral Implication Theorem: Eigenvalue separation for stochastic normalization of adjacency matrix [Alon 85, Jerrum&Sinclair 88]
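A small numerical sketch of the quantity involved, assuming "stochastic normalization" means the random-walk matrix P = D^{-1} A. The slide's inequality was lost in extraction; the standard bound of this type is λ_2 ≤ 1 − Φ²/2, where Φ is the conductance, and that form is an assumption here. The function name is hypothetical and the graph is assumed to have no isolated vertices.

```python
import numpy as np

def second_eigenvalue(adj):
    """Second largest eigenvalue of the random-walk matrix D^{-1} A."""
    A = np.asarray(adj, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    # D^{-1/2} A D^{-1/2} is symmetric and similar to D^{-1} A,
    # so both matrices have the same eigenvalues.
    S = A * inv_sqrt[:, None] * inv_sqrt[None, :]
    eigenvalues = np.linalg.eigvalsh(S)  # returned in ascending order
    return eigenvalues[-2]

# Complete graph on 4 vertices.
K4 = np.ones((4, 4)) - np.eye(4)
print(second_eigenvalue(K4))
```

The eigenvalue separation is 1 minus the returned value; a large separation corresponds to good expansion.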
23 Spectra of “Real” Internet
24 Spectral Implications Theorem: Eigenvalue separation for stochastic normalization of adjacency matrix [Alon 85, Jerrum&Sinclair 88]. On the eigenvalue Power Law [M.M. & Papadimitriou 02]: using matrix perturbation [Courant-Fischer theorem] in a sparse random graph model. Rank-Degree Eigenvalues (Adjacency Matrix)
25 Theorem : For large enough , with probability at least [M.M. & Papadimitriou 02]
26 Proof : Step 1. Decomposition. Parts: Vertex Disjoint Stars, LR-extra, RR, LL, where Vertex Disjoint Stars = LR - LR-extra.
27 Proof: Step 2: Vertex Disjoint Stars. The degree of each vertex disjoint star is sharply concentrated around its mean d_i. Hence its principal eigenvalue is sharply concentrated around
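For reference, a star with d leaves has principal adjacency eigenvalue exactly sqrt(d), and the spectrum of a disjoint union of graphs is the union of their spectra. The sketch below checks this numerically; the helper names and the degrees 16, 9, 4 are illustrative assumptions, not values from the talk.

```python
import numpy as np

def disjoint_stars_adjacency(degrees):
    """Adjacency matrix of vertex disjoint stars, one with d leaves per entry."""
    n = sum(d + 1 for d in degrees)
    A = np.zeros((n, n))
    start = 0
    for d in degrees:
        center = start
        leaves = slice(start + 1, start + d + 1)
        A[center, leaves] = 1.0
        A[leaves, center] = 1.0
        start += d + 1
    return A

def top_eigenvalues(A, k):
    """The k largest eigenvalues of a symmetric matrix, in descending order."""
    return np.linalg.eigvalsh(A)[::-1][:k]

# Stars with 16, 9 and 4 leaves: expect largest eigenvalues near 4, 3, 2.
A = disjoint_stars_adjacency([16, 9, 4])
print(top_eigenvalues(A, 3))
```

This is why, once the remaining parts are shown to have small eigenvalues, the largest eigenvalues track the square roots of the largest degrees.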
28 Proof: Step 3: LL, RR, LR-extra LR-extra has max degree LL has edges RR has max degree
29 Proof: Step 3: LL, RR, LR-extra LR-extra has max degree RR has max degree LL has edges
30 Proof: Step 4: Matrix Perturbation Theory Vertex Disjoint Stars have principal eigenvalues All other parts have max eigenvalue QED
31 Implication for Info Retrieval Term-Norm Distribution Problem : Spectral filtering, without preprocessing, reveals only the large degrees.
32 Implication for Info Retrieval Term-Norm Distribution Problem : Spectral filtering, without preprocessing, reveals only the large degrees. Local information. No “latent semantics”.
33 Implication for Information Retrieval Term-Norm Distribution Problem : Application specific preprocessing (normalization of degrees) reveals clusters. WWW: related to searching, Kleinberg 97. IR, collaborative filtering, … Internet: related to congestion, Gkantsidis et al 02. Open : Formalize “preprocessing”.
34 Further Directions: Generalize theory of Regular Expanders. (Experimental work: Gkantsidis, MM, Zegura 02.) Routing: Integral paths? Short paths? Reliability? Cover time? Related to Crawling. Hitting time? Related to Searching. Planted model? Related to Information Retrieval. Peleg&Upfal’88 … Broder,Frieze&Upfal’01 Kleinberg&Rubinfeld’97
35 Metabolic Networks Statistics of fixed size subgraphs? Related to “motifs” in metabolic networks. Model (explain) heavy tailed statistics in noncoding part of DNA? Related to stages of species evolution.
36 Evaluation of Synthetic Topology Generators [Plots: Core of the Network; Entire Topology]