1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006 http://www.ee.technion.ac.il/courses/049011
2 Web Structure I: Power Laws and Small World Phenomenon
3 Outline
- Power laws
- The preferential attachment model
- Small-world networks
- The Watts-Strogatz model
4 Observed Phenomena
- Few multi-billionaires, but many with modest income [Pareto, 1896]
- Few frequent words, but many infrequent words [Zipf, 1932]
- Few “mega-cities” but many small towns [Zipf, 1949]
- Few web pages with high degree, but many with low degree [Kumar et al. 99] [Barabási & Albert 99]
- All the above obey power laws.
5 Power Law (Pareto) Distribution
- \( \Pr[X \ge x] = (k/x)^{\alpha} \) for \( x \ge k \)
- \( \alpha > 0 \): shape parameter (“slope”)
- \( k > 0 \): location parameter
- Ex: (k = $1000, \( \alpha = 2 \))
  - 1/100 earn ≥ $10,000
  - 1/10,000 earn ≥ $100,000
  - 1/1,000,000 earn ≥ $1,000,000
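The income example can be checked numerically. The following is a minimal sketch, assuming the tail formula \( \Pr[X \ge x] = (k/x)^{\alpha} \) for \( x \ge k \); the function name `pareto_tail` is illustrative and not part of the lecture.

```python
def pareto_tail(x: float, k: float, alpha: float) -> float:
    """Return Pr[X >= x] for a Pareto distribution with location k and shape alpha."""
    if k <= 0 or alpha <= 0:
        raise ValueError("k and alpha must be positive")
    if x <= k:
        return 1.0  # all of the probability mass lies at or above k
    return (k / x) ** alpha


# Reproduce the slide's example: k = $1000, alpha = 2.
for income in (10_000, 100_000, 1_000_000):
    print(income, pareto_tail(income, k=1_000, alpha=2))
```

Each tenfold increase in income divides the tail probability by \( 10^{\alpha} = 100 \), which is the pattern in the example.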
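6 Power Law Properties
- PDF: \( f(x) = \alpha k^{\alpha} x^{-(\alpha+1)} \) for \( x \ge k \)
- Infinite mean for \( \alpha \le 1 \)
- Infinite variance for \( \alpha \le 2 \)
- When X is discrete, \( \Pr[X = x] \approx \alpha k^{\alpha} x^{-(\alpha+1)} \)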
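The two thresholds follow from integrating against the density. A short derivation, assuming the continuous Pareto density \( f(x) = \alpha k^{\alpha} x^{-(\alpha+1)} \) on \( x \ge k \):

```latex
\begin{align*}
E[X]   &= \int_k^\infty x \cdot \alpha k^{\alpha} x^{-(\alpha+1)}\,dx
        = \alpha k^{\alpha} \int_k^\infty x^{-\alpha}\,dx
        = \frac{\alpha k}{\alpha - 1} \quad (\alpha > 1), \\
E[X^2] &= \int_k^\infty x^2 \cdot \alpha k^{\alpha} x^{-(\alpha+1)}\,dx
        = \alpha k^{\alpha} \int_k^\infty x^{1-\alpha}\,dx
        = \frac{\alpha k^2}{\alpha - 2} \quad (\alpha > 2).
\end{align*}
```

The first integral diverges when \( \alpha \le 1 \) and the second when \( \alpha \le 2 \), so for \( 1 < \alpha \le 2 \) the mean is finite but the variance is not.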
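7 Power Law Graphs
[Figure: the same power law shown on a linear-scale plot and on a log-log plot; on the log-log plot it is a straight line with slope = \( -\alpha \)]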
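The straight line on the log-log plot suggests a simple way to estimate the exponent from data: fit a line to the empirical tail \( \Pr[X \ge x] \) in log-log coordinates. The sketch below is one possible illustration using only the standard library; the helper names, the sample size, and the use of ordinary least squares are assumptions of this example, and least squares on a log-log plot is known to be a crude estimator.

```python
import math
import random


def sample_pareto(n, k, alpha, seed=0):
    """Draw n Pareto(k, alpha) samples by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power below is always defined.
    return [k * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def loglog_ccdf_slope(samples):
    """Least-squares slope of log(empirical Pr[X >= x]) against log(x)."""
    xs = sorted(samples)
    n = len(xs)
    # At the i-th smallest sample, a fraction (n - i) / n of samples is >= it.
    pts = [(math.log(x), math.log((n - i) / n)) for i, x in enumerate(xs)]
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    return sxy / sxx
```

For samples drawn with \( \alpha = 2 \), the fitted slope should come out close to -2.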
8 Scale-Free Distributions
- Power laws are invariant to scale: \( \Pr[X \ge ck] = c^{-\alpha} \), whatever the value of k
- Ex: (k = arbitrary, \( \alpha = 2 \))
  - 1/100 earn ≥ 10k
  - 1/10,000 earn ≥ 100k
  - 1/1,000,000 earn ≥ 1000k
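9 Heavy Tailed Distributions
- In many “classical” distributions, \( \Pr[X \ge x] \) decays at least exponentially fast in x (“light tail”)
  - Ex: normal, exponential
- In power law distributions, \( \Pr[X \ge x] \) decays only polynomially in x (“heavy tail”)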
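To get a feel for the difference between the two kinds of tail, the sketch below compares an exponential tail \( e^{-\lambda x} \) with a power-law tail \( (k/x)^{\alpha} \) at a few points. The parameter values (rate 1, k = 1, \( \alpha = 2 \)) are arbitrary choices for illustration.

```python
import math


def exponential_tail(x, rate):
    """Pr[X >= x] for an exponential distribution with the given rate (x >= 0)."""
    return math.exp(-rate * x)


def power_law_tail(x, k, alpha):
    """Pr[X >= x] for a Pareto distribution with location k and shape alpha."""
    return 1.0 if x <= k else (k / x) ** alpha


# Print both tails side by side; the exponential one vanishes far faster.
for x in (10, 100, 1000):
    print(x, exponential_tail(x, rate=1.0), power_law_tail(x, k=1.0, alpha=2.0))
```

At x = 100 the power-law tail is still 1 in 10,000, while the exponential tail is already smaller than \( 10^{-40} \).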
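10 Zipf’s Law
- Size of the r-th largest city is proportional to \( r^{-\beta} \)
- Equivalent to a power law:
  - X = size of a city
  - Change variables: a city of size \( x \propto r^{-\beta} \) has rank \( r \propto x^{-1/\beta} \), so among n cities \( \Pr[X \ge x] = r/n \propto x^{-1/\beta} \)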
11 Power Laws and the Internet
- Web graph
  - In- and out-degrees (in slope: ~2.1, out slope: ~2.7) [Kumar et al. 99, Barabási & Albert 99, Broder et al. 00]
  - Sizes of connected components [Broder et al. 00]
  - Website sizes [Huberman & Adamic 99]
- Internet graph
  - Degrees [Faloutsos, Faloutsos & Faloutsos 99]
  - Eigenvalues [Mihail & Papadimitriou 02]
- Traffic
  - Number of visits to websites
12 Power Laws and Graphs
- If X is a random web page, then \( \Pr[\text{in-degree}(X) = k] \propto k^{-\alpha} \), with \( \alpha \approx 2.1 \)
- What random graph model explains this phenomenon?
13 Erdős-Rényi Random Graphs
- \( G_{n,p} \)
  - n: size of the graph (fixed)
  - p: edge existence probability (fixed): every pair u,v is connected by an edge with probability p
- Theorem [Erdős & Rényi, 60]: for any node x in \( G_{n,p} \), \( \Pr[\deg(x) = k] = \binom{n-1}{k} p^{k} (1-p)^{n-1-k} \)
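A small simulation makes the contrast with power laws concrete: in \( G_{n,p} \) each node's degree is a sum of n-1 independent coin flips, so it is binomially distributed and concentrated around (n-1)p, with no heavy tail. The sketch below is illustrative only; the function names and the parameter values used to exercise it are assumptions of this example.

```python
import random
from math import comb


def gnp_degrees(n, p, seed=0):
    """Sample G(n, p) and return the list of node degrees."""
    rng = random.Random(seed)
    deg = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            # Each unordered pair gets an edge independently with probability p.
            if rng.random() < p:
                deg[u] += 1
                deg[v] += 1
    return deg


def binomial_pmf(k, trials, p):
    """Pr[Binomial(trials, p) = k], the degree distribution of one node when trials = n - 1."""
    return comb(trials, k) * p**k * (1 - p) ** (trials - k)
```

For n = 400 and p = 0.05 the degrees cluster around 20, and the binomial probability of a degree as high as 60 is vanishingly small, unlike the in-degree distribution observed on the Web.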
14 Preferential Attachment [Barabási & Albert 99]
- A novel random graph model
- Initialization: graph starts with a single node with two self loops
- Growth: at every step a new node v is added to the graph; v has a self loop and connects to one neighbor
- Preferential attachment: v connects to u with probability \( \mathrm{indeg}(u) / \sum_{w} \mathrm{indeg}(w) \)
- The rich get richer / The winner takes it all
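The growth process is easy to simulate. The sketch below is a minimal illustration, assuming that the attachment probability is proportional to indegree and that the new node chooses its neighbor among the nodes already present before it joins; both are assumptions of this example, as is the function name. It keeps a list with one entry per edge (the node the edge points to), so a uniform draw from that list selects a node with probability proportional to its indegree.

```python
import random
from collections import Counter


def preferential_attachment_indegrees(steps, seed=0):
    """Simulate the process for `steps` nodes and return a Counter node -> indegree."""
    rng = random.Random(seed)
    # heads[i] is the node that edge i points to.
    heads = [0, 0]  # initialization: node 0 with two self loops
    for v in range(1, steps):
        u = rng.choice(heads)  # assumed: chosen among nodes present before v
        heads.append(v)  # v's self loop gives v indegree 1
        heads.append(u)  # the edge v -> u raises u's indegree by 1
    return Counter(heads)
```

Counting how many nodes end up with each indegree, for example `Counter(indeg.values())` on the returned mapping, gives an empirical degree distribution that can be compared against a power law.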
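15 Why Does it Work?
- \( n_k(t) \): # of nodes whose indegree = k after t steps
- Each step adds two edges, so the total indegree after t steps is 2t, and a node of indegree k receives the new edge with probability \( k/(2t) \)
- Expected growth:
  - k > 1: \( E[n_k(t+1) - n_k(t)] = \frac{(k-1)\,n_{k-1}(t)}{2t} - \frac{k\,n_k(t)}{2t} \)
  - k = 1: \( E[n_1(t+1) - n_1(t)] = 1 - \frac{n_1(t)}{2t} \)
16 Why Does it Work? (2)
- Fact: after sufficiently many steps, \( n_k(t)/t \) reaches a “steady state”
- \( c_k \) = value of \( n_k(t)/t \) at the steady state
- Since \( n_k(t) = c_k t \) at steady state, \( n_k(t+1) - n_k(t) = c_k \)
- Hence, for k > 1: \( c_k = \frac{(k-1)\,c_{k-1}}{2} - \frac{k\,c_k}{2} \)
- Therefore: \( c_k = \frac{k-1}{k+2}\,c_{k-1} \) for k > 1, and \( c_1 = 1 - c_1/2 \), i.e. \( c_1 = 2/3 \)
17 Why Does it Work? (3)
- Then: \( c_k = c_1 \prod_{j=2}^{k} \frac{j-1}{j+2} \)
- And: \( \prod_{j=2}^{k} \frac{j-1}{j+2} = \frac{6}{k(k+1)(k+2)} \)
- Therefore: \( c_k = \frac{4}{k(k+1)(k+2)} \approx 4k^{-3} \), a power law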
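The steady-state argument can be checked with exact arithmetic. The sketch below assumes the standard mean-field form of the expected-growth equations, in which a node of indegree k gains the new edge with probability \( k/(2t) \); under that assumption the steady-state fractions satisfy \( c_1 = 2/3 \) and \( c_k = c_{k-1}(k-1)/(k+2) \). The function name is illustrative.

```python
from fractions import Fraction


def steady_state_fractions(k_max):
    """Return {k: c_k} for k = 1..k_max, iterating the assumed steady-state recurrence."""
    c = {1: Fraction(2, 3)}  # solves c_1 = 1 - c_1 / 2
    for k in range(2, k_max + 1):
        # From c_k = (k - 1) c_{k-1} / 2 - k c_k / 2, solved for c_k.
        c[k] = c[k - 1] * Fraction(k - 1, k + 2)
    return c
```

Iterating the recurrence reproduces the closed form \( 4/(k(k+1)(k+2)) \) exactly, so the fraction of nodes with indegree k falls off like \( k^{-3} \), and the fractions sum to 1 as k grows.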
18 Six Degrees of Separation [Stanley Milgram, 67]
- “Random starters” in Nebraska, Kansas, etc.
- Destinations: in Boston
- Intermediaries send postcards to Milgram
- Findings: average of 6 postcards
- “Conclusion”: every two people in the US are connected by a path of length ~6
19 Small-World Networks
- Average diameter: length of the shortest path from u to v, averaged over all pairs u,v
- Clustering coefficient: fraction of pairs of neighbors of v that are themselves neighbors of each other, averaged over all v
- Small-world network: a sparse graph with average diameter O(log n) and a constant clustering coefficient
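Both quantities are straightforward to compute on a small undirected graph. The sketch below is one possible implementation, assuming the graph is given as a dictionary mapping each vertex to the set of its neighbors; the function names, the treatment of unreachable pairs (skipped), and the convention that a vertex with fewer than two neighbors contributes 0 are choices of this example.

```python
from collections import deque
from itertools import combinations


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs of distinct, mutually reachable vertices."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average over vertices of the fraction of neighbor pairs that are adjacent."""
    scores = []
    for v, neighbors in adj.items():
        nbrs = [u for u in neighbors if u != v]  # ignore self loops
        if len(nbrs) < 2:
            scores.append(0.0)  # convention: no neighbor pairs to test
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        scores.append(linked / (len(nbrs) * (len(nbrs) - 1) / 2))
    return sum(scores) / len(scores)
```

On a triangle both values are 1; on a three-vertex path the clustering coefficient is 0 and the average path length is 4/3.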
20 The Web as a Small World Network
- Low diameter
  - Study of a synthetic web graph model [Albert, Jeong, Barabási 99]
    - Average diameter of the Web is ~19
    - Grows logarithmically with the size of the Web
  - Study of a large crawl [Broder et al. 00]
    - Average diameter of the SCC is ~16
    - Maximum diameter of the SCC is ≥ 28
  - Diameter of host graph [Adamic 99]
    - Average diameter of SCC: ~4
- High clustering coefficient
  - Clustering coefficient of host graph [Adamic 99]
    - Clustering coefficient: ~0.08 (compared to 0.001 in a comparable random graph)
21 Model for Small-World Networks [Watts & Strogatz 98]
- One extreme: random networks
  - Low diameter
  - Low clustering coefficient
- Other extreme: “regular” networks (e.g., a lattice)
  - High clustering coefficient
  - High diameter
- Small-world: interpolation between the two
  - Low diameter
  - High clustering coefficient
- Regularity: social networking
- Randomness: individual interests
22 Random Network
- The model:
  - n vertices
  - Every pair u,v is connected by an edge with probability p = d/n
- Properties:
  - Expected number of edges: ~dn/2
  - Graph is connected w.h.p. (for d sufficiently larger than ln n)
  - Diameter: O(log n) w.h.p.
  - Clustering coefficient: ~p = d/n = o(1)
23 Ring Lattice
- The model:
  - n vertices on a circle
  - Every vertex has d neighbors: the d/2 vertices to its right and the d/2 vertices to its left
- Properties:
  - Number of edges: dn/2
  - Graph is connected
  - Diameter: O(n/d)
  - Clustering coefficient: \( \frac{3(d-2)}{4(d-1)} \), a constant (close to 3/4 for large d)
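The lattice is simple enough to build and inspect directly. The sketch below is a minimal illustration, assuming d is even and smaller than n, with the graph stored as a dictionary from vertex to neighbor set; the function names are illustrative. It also lets one compare a vertex's clustering against the closed form \( 3(d-2)/(4(d-1)) \), which holds when n is large relative to d.

```python
from itertools import combinations


def ring_lattice(n, d):
    """Ring of n vertices, each joined to its d/2 nearest neighbors on either side."""
    if d % 2 or not 0 < d < n:
        raise ValueError("d must be even with 0 < d < n")
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for i in range(1, d // 2 + 1):
            w = (v + i) % n  # i-th clockwise neighbor of v
            adj[v].add(w)
            adj[w].add(v)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are adjacent (v needs >= 2 neighbors)."""
    nbrs = sorted(adj[v])
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (len(nbrs) * (len(nbrs) - 1) / 2)
```

For example, with n = 100 and d = 6 every vertex has 6 neighbors, the graph has 300 edges, and each vertex's clustering is 9/15 = 0.6, matching the closed form.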
24 Random Rewiring
- Start from a ring lattice
- for i = 1 to d/2 do
    for v = 1 to n do
      Pick i-th clockwise nearest neighbor of v
      With probability p, replace this neighbor by a random vertex
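The rewiring loop above can be written out as runnable code. The sketch below is one possible reading of the procedure, assuming an undirected graph stored as a dictionary from vertex to neighbor set, d even and smaller than n, and a replacement vertex chosen uniformly among vertices that would create neither a self loop nor a duplicate edge; that last restriction and the function name are assumptions of this example.

```python
import random


def watts_strogatz(n, d, p, seed=0):
    """Ring lattice on n vertices with degree d, rewired with probability p per edge."""
    if d % 2 or not 0 < d < n:
        raise ValueError("d must be even with 0 < d < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):  # start from a ring lattice
        for i in range(1, d // 2 + 1):
            adj[v].add((v + i) % n)
            adj[(v + i) % n].add(v)
    for i in range(1, d // 2 + 1):
        for v in range(n):
            old = (v + i) % n  # i-th clockwise nearest neighbor of v
            if rng.random() >= p:
                continue  # keep this edge
            # Assumed: avoid self loops and duplicate edges when rewiring.
            candidates = [w for w in range(n) if w != v and w not in adj[v]]
            if not candidates:
                continue
            new = rng.choice(candidates)
            adj[v].remove(old)
            adj[old].remove(v)
            adj[v].add(new)
            adj[new].add(v)
    return adj
```

With p = 0 the function returns the ring lattice unchanged, and for any p the number of edges stays at dn/2, since every rewiring step removes one edge and adds one.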
25 Analysis
- If p = 0, ring lattice
  - High clustering coefficient
  - High diameter
- If p = 1, random network
  - Logarithmic diameter
  - Low clustering coefficient
- However,
  - Diameter goes down rapidly as p grows
  - Clustering coefficient goes down slowly as p grows
- Therefore, for small p, we get a small-world network
  - Logarithmic diameter
  - High clustering coefficient
26 End of Lecture 7