Topics In Social Computing (67810)

Module 1 (Structure) Random Graph Models

Random Graph Models

Why Study Random Graph Models?
Simplified “toy” models: clearly not exact models for social networks.
They teach us about different properties of graphs.
They serve as a benchmark to compare against actual social network graphs.

G(n,p)
A random graph model due to Erdős & Rényi.
Take $n$ vertices.
Place an edge between each pair of vertices $u, v$ independently with probability $p$.
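The definition translates almost directly into code. The following is a minimal sketch, not an optimized sampler; the function name `sample_gnp` and the parameter values are illustrative assumptions.

```python
import itertools
import random


def sample_gnp(n, p, seed=None):
    """Return the edge list of one G(n, p) sample on vertices 0..n-1."""
    rng = random.Random(seed)
    # Each of the n*(n-1)/2 unordered pairs gets an independent coin flip.
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


edges = sample_gnp(n=200, p=0.05, seed=0)
expected = 0.05 * 200 * 199 / 2  # expected number of edges: p * C(n, 2)
print(len(edges), "edges sampled; expectation is", expected)
```

This takes time quadratic in $n$, which is fine for small experiments.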

G(n,p)
Most often explored in the limit $n \to \infty$.
$p$ is often scaled as a function of $n$, for example $p = \frac{d}{n}$ or $p = \lambda \frac{\ln n}{n}$ (for some constants $d, \lambda$).

Phase transitions
As $n \to \infty$ there are critical values of $p$ at which the graph changes drastically (with probability $\to 1$).
These changes are called phase transitions.
Example:
$np < 1$: only small components, of size $O(\log n)$.
$np > 1$: one giant component of size $\Omega(n)$; all other components have size $O(\log n)$.

A simulation adding edges one at a time. From: (Link)
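A simulation of this kind can be sketched with a union-find structure. This is an illustrative sketch, not the linked simulation: it adds uniformly random pairs one at a time (so repeated edges are possible, an assumption made for simplicity) and records the size of the largest component. After $m$ edges the average degree is about $2m/n$, so the transition is expected near $m = n/2$.

```python
import random


def largest_component_trace(n, num_edges, seed=None):
    """Add random edges one at a time; return largest-component size after each."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest, trace = 1, []
    for _ in range(num_edges):
        u, v = rng.sample(range(n), 2)  # a random pair of distinct vertices
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru              # merge the smaller tree into the larger
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        trace.append(largest)
    return trace


n = 2000
trace = largest_component_trace(n, 2 * n, seed=1)
for m in (n // 4, n // 2, n, 2 * n):
    print(f"edges={m:5d}  avg degree={2 * m / n:.1f}  largest component={trace[m - 1]}")
```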

Giant Component -- Intuition
Start at some vertex in the graph.
Imagine a BFS sweep of the graph.
As we reach a new vertex, we check which of its incident edges exist; each one exists with probability $p = \frac{d}{n}$.

Suppose $d > 1$. It is still possible for the BFS process to end quickly if we are “unlucky”: even the first vertex may have no children. These “unlucky” cases are the small components.

If $d > 1$ and some layer has “enough” vertices, the law of large numbers predicts that the next layer will have very close to $d$ times as many vertices.
The layers grow.
When does growth stop? Once we start running out of vertices.
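The BFS intuition can be tried numerically. The sketch below explores $G(n, d/n)$ from a single vertex and reveals edges lazily: each vertex on the current layer adopts each still-unvisited vertex as a child with probability $p$. The function name and parameter values are illustrative assumptions.

```python
import random


def bfs_layer_sizes(n, d, seed=None):
    """Explore G(n, d/n) from one vertex, revealing edges lazily; return layer sizes."""
    rng = random.Random(seed)
    p = d / n
    unvisited = n - 1          # vertices not yet reached
    layers = [1]               # layer 0 is the start vertex
    while layers[-1] > 0 and unvisited > 0:
        new = 0
        for _ in range(layers[-1]):
            # Each still-unvisited vertex becomes a child with probability p.
            children = sum(1 for _ in range(unvisited) if rng.random() < p)
            unvisited -= children
            new += children
        layers.append(new)
    return layers


print(bfs_layer_sizes(1000, 3.0, seed=1))
```

When the start vertex lands in the giant component, the early layers grow by a factor of roughly $d$ and then shrink as the pool of unvisited vertices is used up. Runs that stop after a layer or two correspond to the small components.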

G(n,p) connectivity
If we set $p = \lambda \frac{\ln n}{n}$ then:
$\lambda < 1$: the graph is disconnected w.h.p.
$\lambda > 1$: the graph is connected w.h.p.
(Proof of simpler statement: see lecture notes)
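The threshold can be observed empirically, with the caveat that the statement is asymptotic and finite $n$ blurs the transition near $\lambda = 1$. A minimal sketch follows; the function names, $n$, and the trial counts are assumptions chosen to keep the run short.

```python
import math
import random


def is_connected_gnp(n, lam, rng):
    """Sample G(n, p) with p = lam * ln(n) / n and report whether it is connected."""
    p = lam * math.log(n) / n
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    # Depth-first search from vertex 0; connected iff every vertex is reached.
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n


def connected_fraction(n, lam, trials=20, seed=0):
    """Fraction of sampled graphs that came out connected."""
    rng = random.Random(seed)
    return sum(is_connected_gnp(n, lam, rng) for _ in range(trials)) / trials


for lam in (0.5, 1.5):
    print(lam, connected_fraction(200, lam, trials=10))
```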

Clustering Coefficient of G(n,p)
One reason that G(n,p) is not a good model for social networks is that its expected clustering coefficient is very low.
In fact, it is exactly the expected edge density in the graph: $p$.
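A quick numerical check, as a sketch: sample one G(n,p) graph and compute the average local clustering coefficient (for each vertex, the fraction of pairs of its neighbours that are themselves adjacent). The helper names and parameter values are illustrative assumptions; vertices of degree below 2 are counted as 0 here.

```python
import itertools
import random


def gnp_adjacency(n, p, seed=None):
    """Adjacency sets of one G(n, p) sample."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean over vertices of (edges among neighbours) / C(deg, 2); 0 if deg < 2."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in itertools.combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


p = 0.1
print("p =", p, " average clustering =", round(average_clustering(gnp_adjacency(300, p, seed=0)), 3))
```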

Small World Networks
Watts & Strogatz suggest a random network model (in a Nature paper).
It interpolates between a highly clustered graph and a random graph.
Consider $n$ nodes in a circle. Connect each node to its $k$ nearest neighbors on the left and its $k$ nearest neighbors on the right (mod $n$).

Small World Networks
Each edge is rewired with some probability $p$.
Rewiring simply attaches it to a random destination.
As $p$ grows, the network becomes closer to a completely random graph.
Watts, Duncan J., and Steven H. Strogatz. "Collective dynamics of ‘small-world’ networks." Nature (1998).

Small World Networks
Watts & Strogatz show via simulation that for some range of the parameter $p$, clustering can be high and average path length can be low.
Watts, Duncan J., and Steven H. Strogatz. "Collective dynamics of ‘small-world’ networks." Nature (1998).
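A small experiment in the same spirit can be sketched as follows. This is not the authors' code: the function names are hypothetical, and the rewiring rule (keep one endpoint, pick a new destination uniformly, avoid self-loops and duplicate edges) is one common reading of "a random destination". It assumes $n > 2k$.

```python
import random
from collections import deque


def watts_strogatz(n, k, p, seed=None):
    """Ring of n nodes, each joined to its k nearest neighbours on each side;
    every edge is then rewired with probability p."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            if v in adj[u] and rng.random() < p:
                # Candidate destinations: no self-loop, no duplicate edge.
                choices = [w for w in range(n) if w != u and w not in adj[u]]
                if choices:
                    w = rng.choice(choices)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient (vertices of degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj:
        nb = list(nbrs)
        d = len(nb)
        if d < 2:
            continue
        links = sum(1 for i in range(d) for j in range(i + 1, d) if nb[j] in adj[nb[i]])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean BFS distance over all reachable ordered pairs of distinct vertices."""
    total = pairs = 0
    for s in range(len(adj)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


for p in (0.0, 0.01, 0.1, 1.0):
    g = watts_strogatz(n=200, k=3, p=p, seed=0)
    print(p, round(average_clustering(g), 3), round(average_path_length(g), 2))
```

For intermediate values of $p$ one would expect the path length to drop well before the clustering does.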

Small World Networks
Kleinberg’s 2D grid with additional random links is a 2D version of a small world network.
(He describes links from each node to its neighbors within radius $r$, plus random links.)
High clustering + short path lengths.

Models Based on Underlying Community Structure
People in the same community / group / club are more likely to be friends.
The probability of friendship decreases with the size of the group.
Example: two people who attend the same advanced course (with few participants) vs. two people who attend the same university.

Models Based on Underlying Community Structure
Example model: assume subsets $S_i \subset V$ that represent communities.
Add an edge randomly between two people with probability $\max_i \frac{\alpha}{|S_i|}$, where the maximum is taken over the communities $S_i$ that contain both people.
Set systems like these can even allow for greedy routing algorithms if large sets are partitioned into a “well structured” hierarchy of sets, e.g., this class ⊂ CS&E ⊂ HUJI.
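A minimal sketch of such a model, under stated assumptions: the probability is capped at 1, pairs with no shared community get probability 0, and the community sizes and $\alpha$ below are made-up illustrative values for a nested course ⊂ department ⊂ university hierarchy.

```python
import itertools
import random


def edge_probability(u, v, communities, alpha):
    """max over communities containing both u and v of alpha / |S_i| (0 if none)."""
    shared = [len(s) for s in communities if u in s and v in s]
    if not shared:
        return 0.0
    return min(1.0, alpha / min(shared))  # the smallest shared community dominates


def sample_community_graph(vertices, communities, alpha, seed=None):
    """Flip one independent coin per pair using the community-based probability."""
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(vertices, 2)
            if rng.random() < edge_probability(u, v, communities, alpha)]


# Hypothetical nested communities.
course, department, university = set(range(5)), set(range(30)), set(range(200))
communities = [course, department, university]
print(edge_probability(0, 1, communities, 2.0))    # same course
print(edge_probability(0, 20, communities, 2.0))   # same department only
print(edge_probability(0, 150, communities, 2.0))  # same university only
print(len(sample_community_graph(range(200), communities, 2.0, seed=0)), "edges")
```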

Degree Distribution
In $G(n,p)$ with $p = \frac{d}{n}$ we expect each vertex to have about $d$ connections. But how many vertices have more or fewer?
$\Pr[\deg(v) = k] = \binom{n-1}{k} (1-p)^{n-k-1} p^{k}$
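The formula is a Binomial$(n-1, p)$ probability and can be evaluated directly. A small sketch, with illustrative values of $n$ and $d$:

```python
import math


def degree_pmf(n, p, k):
    """Pr[deg(v) = k] in G(n, p): a Binomial(n - 1, p) probability."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


n, d = 1000, 5
p = d / n
for k in (0, 5, 10, 20, 40):
    print(k, degree_pmf(n, p, k))
```

The printed values fall off very quickly once $k$ is a few multiples of $d$.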

G(n,p) Degree Distribution
$\Pr[\deg(v) = k] = \binom{n-1}{k} (1-p)^{n-k-1} p^{k}$
[Plot annotation: “Not likely at all”]

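From some point on, the probability decays exponentially fast.
Assume $k > \alpha \cdot np > np$. Then:
$$\frac{\Pr[\deg(v) = k-1]}{\Pr[\deg(v) = k]} = \frac{\binom{n-1}{k-1} (1-p)^{n-k} p^{k-1}}{\binom{n-1}{k} (1-p)^{n-k-1} p^{k}} = \frac{\frac{(n-1)!}{(k-1)!\,(n-k)!}}{\frac{(n-1)!}{k!\,(n-k-1)!}} \cdot \frac{1-p}{p} = \frac{k(1-p)}{(n-k)\,p} > \frac{k(1-p)}{(n-np)\,p} = \frac{k}{np} > \alpha > 1$$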
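The chain of (in)equalities can be checked numerically. The sketch below compares the exact ratio of consecutive probabilities with the closed form $\frac{k(1-p)}{(n-k)p}$ and with the lower bound $\frac{k}{np}$; the values of $n$, $p$ and $\alpha$ are arbitrary illustrative choices.

```python
import math


def degree_pmf(n, p, k):
    """Pr[deg(v) = k] in G(n, p): a Binomial(n - 1, p) probability."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


n, p, alpha = 1000, 0.005, 2.0      # illustrative values; np is about 5
k0 = math.ceil(alpha * n * p)       # every k > k0 satisfies k > alpha * n * p
for k in range(k0 + 1, k0 + 6):
    ratio = degree_pmf(n, p, k - 1) / degree_pmf(n, p, k)
    closed_form = k * (1 - p) / ((n - k) * p)
    print(k, round(ratio, 4), round(closed_form, 4), round(k / (n * p), 4))
```

Since each step up in $k$ divides the probability by more than $\alpha$, going $j$ steps past $k_0$ costs a factor of more than $\alpha^{j}$.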

Degree Distribution
Small world networks also have very concentrated degree distributions.
So how about networks found in the wild?

How does this compare with empirical evidence?
Surprisingly, most real world social networks have a qualitatively different degree distribution: $\Pr[\deg(v) = k] \propto k^{-\gamma}$.
These are called Scale-Free Networks. Typically $\gamma$ is between 2 and 3.

Scale Free Networks
High degree nodes are still rare, but can be found.
For example: with $\gamma = 2$, nodes of degree $2k$ are only 4 times rarer than nodes of degree $k$.
(This will make a big difference later when we discuss things like epidemic models.)
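The factor of 4 follows directly from the power law. A short worked ratio, with $P(k)$ as shorthand for $\Pr[\deg(v) = k]$ and the $\gamma = 3$ case added only for comparison:

```latex
\frac{P(2k)}{P(k)} = \frac{(2k)^{-\gamma}}{k^{-\gamma}} = 2^{-\gamma},
\qquad \gamma = 2 \;\Rightarrow\; \frac{P(2k)}{P(k)} = \frac{1}{4},
\qquad \gamma = 3 \;\Rightarrow\; \frac{P(2k)}{P(k)} = \frac{1}{8}.
```

The ratio does not depend on $k$, which is the sense in which the distribution has no characteristic scale.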

How can we identify scale free distributions?
Expect to see a straight line on a log-log plot: $P(k) \propto k^{-\gamma}$ implies $\log P(k) = \beta - \gamma \log k$.
[Figure: webpage in-links]
Broder, Andrei, et al. "Graph structure in the web." Computer Networks 33.1 (2000).
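As a rough diagnostic, the slope of the log-log plot can be estimated with an ordinary least-squares line fit. The sketch below does this on a degree histogram given as a dictionary `{k: count}`; the function name and the synthetic data are assumptions for illustration. On real data the sparse tail of the histogram is noisy, so a fit like this is a first look rather than a careful estimate.

```python
import math


def fit_power_law_exponent(counts):
    """Least-squares line through (log k, log P(k)); returns (gamma, beta)."""
    total = sum(counts.values())
    pts = [(math.log(k), math.log(c / total))
           for k, c in counts.items() if k > 0 and c > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return -slope, my - slope * mx   # log P(k) = beta - gamma * log k


# Synthetic, noise-free histogram with exponent 2.5.
counts = {k: 1e6 * k ** -2.5 for k in range(1, 101)}
gamma, beta = fit_power_law_exponent(counts)
print(round(gamma, 3))
```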

Paper Citations
[Figure: number of papers vs. number of citations; slope -3]
Redner, Sidney. "How popular is your paper? An empirical study of the citation distribution." Eur Phys B-Cond Matter and Complex Systems 4.2 (1998).

To smooth out the data, use a Zipfian plot of the citation data: plot the citation numbers for each rank (rather than a histogram of citation numbers). The slope is related to that of the histogram.
The distribution decays when the number of citations approaches the number of papers in the data set.
[Figure: citations vs. rank of paper]
Redner, Sidney. "How popular is your paper? An empirical study of the citation distribution." Eur Phys B-Cond Matter and Complex Systems 4.2 (1998).
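One way to see how the two slopes are related, assuming the histogram follows $P(k) \propto k^{-\gamma}$ with $\gamma > 1$ over the whole range (an idealization): the rank $r$ of a paper with $k$ citations is roughly the number of papers with at least $k$ citations. Here $N$ denotes the number of papers, a symbol introduced for this sketch.

```latex
r(k) \;\approx\; N \int_{k}^{\infty} P(x)\, dx \;\propto\; k^{-(\gamma - 1)}
\quad\Longrightarrow\quad
k(r) \;\propto\; r^{-1/(\gamma - 1)} .
% Example: a histogram slope of -3 (\gamma = 3) corresponds to a Zipf-plot slope of -1/2.
```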

Zipf’s Law
An empirical law: the frequency of a word is inversely proportional to its rank in the frequency table.
[Figure: word frequencies on Wikipedia]
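A rank-frequency table is easy to build for any text. The sketch below uses a toy sentence as stand-in data (an assumption; it is far too short to exhibit the law). On a large corpus, Zipf's law predicts that rank times count stays roughly constant.

```python
from collections import Counter


def rank_frequency(words):
    """Return [(rank, word, count)] sorted by decreasing count (rank starts at 1)."""
    counts = Counter(words)
    # Ties are broken alphabetically so the output is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


text = "the cat and the dog and the bird"
for rank, word, count in rank_frequency(text.split()):
    print(rank, word, count, rank * count)
```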

Preferential Attachment (The Barabasi-Albert Model)
A random model used to generate a scale free network.
Add vertices one at a time.
With probability $p$, connect the new vertex to $d$ uniformly selected nodes.
With probability $1-p$, connect the new vertex to $d$ nodes selected with probability proportional to their degree: $\Pr[\text{link to } v] = \frac{d(v)}{\sum_u d(u)}$.
This generates power law distributions (with an exponent that depends on $p$).
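The growth process can be sketched in a few lines. Several details are not specified above and are assumptions of this sketch: the graph starts from a clique on $d + 1$ vertices, one coin is flipped per new vertex, and the $d$ targets are forced to be distinct by redrawing. Degree-proportional sampling uses the standard trick of drawing uniformly from a list that contains every edge endpoint.

```python
import random
from collections import Counter


def preferential_attachment(n, d, p, seed=None):
    """Grow a graph to n vertices; each new vertex makes d links.
    With probability p the targets are uniform, otherwise degree-proportional."""
    rng = random.Random(seed)
    # Assumption: start from a clique on d + 1 vertices so d distinct targets exist.
    edges = [(u, v) for u in range(d + 1) for v in range(u)]
    # Every edge contributes both endpoints, so a uniform draw from this list
    # picks vertex v with probability d(v) / sum_u d(u).
    endpoints = [x for e in edges for x in e]
    for new in range(d + 1, n):
        uniform = rng.random() < p          # one coin per new vertex
        targets = set()
        while len(targets) < d:
            if uniform:
                targets.add(rng.randrange(new))
            else:
                targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degrees(edges):
    """Degree of every vertex that appears in the edge list."""
    return Counter(x for e in edges for x in e)


deg = degrees(preferential_attachment(2000, 2, 0.0, seed=1))
print("max degree:", max(deg.values()), " median-ish degree:", sorted(deg.values())[len(deg) // 2])
```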

Preferential Attachment (The Barabasi-Albert Model)
Suppose links are just added uniformly at random to existing nodes. A node that was added earlier will have a higher expected degree.
Degrees are even further inflated when we add links weighted by degree.
“The Rich get Richer” effect.

Example: d=1, nodes attached with prob. weighted by degree.
Play with a nice simulation here:

Degree Distribution for a Preferential Attachment Model.
Nodes connect with probability ∝ degree

The Rich get Richer
This intuitively happens in the real world too:
Popular web pages are easier to find, so they attract more links.
Popular individuals similarly attract even more friends.

The Role of Search Engines.
Relevant results appear, but popular results appear higher, making them even more popular.
Search engines do not just reflect the web, they also shape it.

The Copying Model
Add nodes one by one, each with out-degree $d \geq 1$.
For each new node:
Pick an existing “prototype” node uniformly.
Copy each link of the prototype with probability $p$ (independently).
If a link is not copied (probability $1-p$), link to a uniformly selected target instead.
This also generates a power law distribution.
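The steps above can be sketched as follows. The initial condition is not specified, so this sketch assumes a single seed node whose $d$ links point to itself; repeated links to the same target are allowed. The function name and parameter values are illustrative.

```python
import random
from collections import Counter


def copying_model(n, d, p, seed=None):
    """Directed graph: out[v] is the list of d targets chosen by node v."""
    rng = random.Random(seed)
    # Assumption: seed the process with node 0 pointing at itself d times.
    out = [[0] * d]
    for new in range(1, n):
        prototype = rng.randrange(new)            # uniform over existing nodes
        links = []
        for target in out[prototype]:
            if rng.random() < p:
                links.append(target)              # copy the prototype's link
            else:
                links.append(rng.randrange(new))  # uniform existing target
        out.append(links)
    return out


out = copying_model(2000, 3, 0.8, seed=0)
in_degree = Counter(t for links in out for t in links)
print("largest in-degrees:", sorted(in_degree.values(), reverse=True)[:5])
```

Copying favours targets that already have many in-links, because such targets appear in many prototypes' link lists.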
