Analysis of Social Media MLD, LTI William Cohen
Recap: What are we trying to do? Like the normal curve: Fit real-world data. Find an underlying process that “explains” the data. Enable mathematical understanding (closed-form?). Model some small but interesting part of the data.
Graphs Some common properties of graphs: – Distribution of node degrees – Distribution of cliques (e.g., triangles) – Distribution of paths Diameter (max shortest path) Effective diameter (90th percentile) Connected components – … Some types of graphs to consider: – Real graphs (social & otherwise) – Generated graphs: Erdos-Renyi “Bernoulli” or “Poisson” Watts-Strogatz “small world” graphs Barabasi-Albert “preferential attachment” …
Graphs Some types of graphs to consider: – Real graphs (social & otherwise) – Generated graphs: Erdos-Renyi “Bernoulli” or “Poisson” Watts-Strogatz “small world” graphs Barabasi-Albert “preferential attachment” … Erdos-Renyi: each pair of nodes is connected independently with probability p
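A minimal sketch of the Erdos-Renyi construction described on this slide, using only the Python standard library. The function names `erdos_renyi` and `mean_degree`, the adjacency-set representation, and the parameter values in the demo are assumptions made for illustration, not part of the original slides.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Return an undirected G(n, p) graph as {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Each pair of nodes is linked independently with probability p.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def mean_degree(adj):
    """Average number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


if __name__ == "__main__":
    g = erdos_renyi(n=500, p=0.02, seed=0)
    # The mean degree should land near p * (n - 1).
    print(mean_degree(g), 0.02 * 499)
```

Because every pair is an independent coin flip, a node's degree is binomial (hence “Bernoulli”) and approaches a Poisson distribution when n is large and p is small.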
Graphs Some types of graphs to consider: – Real graphs (social & otherwise) – Generated graphs: Erdos-Renyi “Bernoulli” or “Poisson” Watts-Strogatz “small world” graphs Barabasi-Albert “preferential attachment” … Watts-Strogatz: a regular, high-homophily lattice plus random “shortcut” links
Graphs Some types of graphs to consider: – Real graphs (social & otherwise) – Generated graphs: Erdos-Renyi “Bernoulli” or “Poisson” Watts-Strogatz “small world” graphs Barabasi-Albert “preferential attachment” … Barabasi-Albert: each new node gets m neighbors; high-degree nodes are preferred (“rich get richer”)
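A simplified sketch of preferential attachment as summarized on this slide: each new node picks m existing neighbors, and high-degree nodes are more likely to be picked. The fully connected starting core of m + 1 nodes, the function name `preferential_attachment`, and the demo parameters are assumptions for illustration.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph {node: set(neighbors)}; requires n > m >= 1."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Assumed starting point: a fully connected core of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears in 'endpoints' once per incident edge, so a uniform
    # draw from this list picks a node with probability proportional to degree.
    endpoints = [u for u in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:  # keep drawing until m distinct neighbors
            chosen.add(rng.choice(endpoints))
        for old in chosen:
            adj[new].add(old)
            adj[old].add(new)
            endpoints.extend([new, old])
    return adj


if __name__ == "__main__":
    g = preferential_attachment(n=2000, m=3, seed=0)
    degrees = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
    # A few early nodes typically end up with far more than m neighbors.
    print(degrees[:5], degrees[-5:])
```

The repeated-endpoints list is one common way to get degree-proportional sampling without recomputing degrees at every step.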
Graphs Some common properties of graphs: – Distribution of node degrees – Distribution of cliques (e.g., triangles) – Distribution of paths Diameter (max shortest path) Effective diameter (90th percentile) Connected components – … Clustering (closed triangles): in a big Erdos-Renyi graph this is very small (about 1/n); in social graphs, not so much. More later…
Graphs Some common properties of graphs: – Distribution of node degrees – Distribution of cliques (e.g., triangles) – Distribution of paths Diameter (max shortest path) Effective diameter (90th percentile) Mean diameter Connected components – … Mean diameter: in a big Erdos-Renyi graph this is small (about log n / log z, with z the mean degree); in social graphs, it is also small (“6 degrees”)
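A rough check of the log n / log z estimate, assuming z denotes the mean degree. The sketch measures the mean shortest-path length by breadth-first search from a sample of source nodes; the helper names (`random_graph`, `bfs_distances`, `mean_path_length`) and the demo sizes are assumptions for illustration.

```python
import math
import random
from collections import deque
from itertools import combinations


def random_graph(n, z, seed=None):
    """Erdos-Renyi graph with expected mean degree z, as {node: set(neighbors)}."""
    rng = random.Random(seed)
    p = z / (n - 1)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_path_length(adj, sources):
    """Mean distance from the given sources to all nodes they can reach."""
    total = count = 0
    for s in sources:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                count += 1
    return total / count


if __name__ == "__main__":
    n, z = 1000, 8
    g = random_graph(n, z, seed=0)
    sample = random.Random(1).sample(range(n), 30)
    print(mean_path_length(g, sample), math.log(n) / math.log(z))
```

Sampling sources keeps the cost low; using every node as a source would give the exact mean over reachable pairs.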
Graphs Some common properties of graphs: – Distribution of node degrees – Distribution of cliques (e.g., triangles) – Distribution of paths Diameter (max shortest path) Effective diameter (90th percentile) Mean diameter Connected components – … Connected components: in a big Erdos-Renyi graph there is one “giant connected component”, because two giant connected components cannot co-exist for long.
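A small sketch for looking at component sizes in a random graph. It assumes a mean degree above 1 (the regime where a giant component is expected); the helper names and the demo parameters are assumptions for illustration.

```python
import random
from collections import deque
from itertools import combinations


def random_graph(n, z, seed=None):
    """Erdos-Renyi graph with expected mean degree z, as {node: set(neighbors)}."""
    rng = random.Random(seed)
    p = z / (n - 1)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def component_sizes(adj):
    """Sizes of all connected components, largest first."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    sizes = component_sizes(random_graph(n=2000, z=3, seed=0))
    # Typically one component holds most nodes and the rest are tiny.
    print(sizes[:5])
```

One way to read the "cannot co-exist" remark: with two large components, some random edge is very likely to join a node in one to a node in the other, which merges them.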
More terms Centrality and betweenness: how does your position in a network affect what you do and how you do it? – And how can we define these precisely? High centrality: ringleaders? High betweenness: go-between, conduit between different groups? – “Structural hole” Group cohesiveness: number of edges within a (sub)group
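One possible way to make two of these terms precise, offered as a sketch rather than the definitions used in the lecture: betweenness as the number of shortest paths passing through a node (computed with Brandes' algorithm for unweighted graphs), and group cohesiveness as the count of edges inside a group. The function names and the toy graph are assumptions for illustration.

```python
from collections import deque
from itertools import combinations


def betweenness(adj):
    """Shortest-path betweenness for an undirected, unweighted graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                            # nodes in order of discovery
        preds = {v: [] for v in adj}          # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)         # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):             # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def edges_within(adj, group):
    """Group cohesiveness as the number of edges inside the group."""
    return sum(1 for u, v in combinations(sorted(group), 2) if v in adj[u])


# Toy graph: two triangles joined through a single go-between node "x".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "x"),
         ("x", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
toy = {}
for u, v in edges:
    toy.setdefault(u, set()).add(v)
    toy.setdefault(v, set()).add(u)

if __name__ == "__main__":
    print(betweenness(toy))
    print(edges_within(toy, {"a", "b", "c"}))
```

In the toy graph, "x" has low degree but sits on every path between the two triangles, which is the go-between role the slide describes.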
More terms
Association network: bipartite network where nodes are people or organizations
A larger association network
Triads and clustering coefficients In a random Erdos-Renyi graph: the chance that two of your friends are themselves friends is very small (about 1/n). In natural graphs two of your friends might well be friends with each other: like you, they are both in the same class (club, field of CS, …); you introduced them.
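A sketch of one common clustering measure, the average local clustering coefficient: for each node, the fraction of pairs of its neighbors that are themselves linked. Nodes with fewer than two neighbors are counted as 0 here, which is a convention chosen for this example; the function names and demo sizes are also assumptions.

```python
import random
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are linked to each other."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed for this sketch
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def erdos_renyi(n, p, seed=None):
    """G(n, p) random graph as {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


if __name__ == "__main__":
    n, z = 1000, 10
    g = erdos_renyi(n, z / n, seed=0)
    # In G(n, p) two neighbors of a node are linked with probability p = z/n,
    # so the printed value should be close to z/n.
    print(average_clustering(g), z / n)
```

With the mean degree z held fixed, p = z/n shrinks as the graph grows, which matches the 1/n scaling for random graphs.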
Watts-Strogatz model Start with a ring. Connect each node to its k nearest neighbors (homophily). Add some random shortcuts from one point to another (small diameter). Degree distribution is not scale-free. Generalizes to d dimensions.
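A sketch of the construction as listed on this slide: a ring lattice with k nearest neighbors per node, plus shortcuts that are added (not rewired). The assumptions here are that k is even (k/2 neighbors on each side), that n is much larger than k, and that the number of shortcuts is given directly; the function names are hypothetical.

```python
import random
from collections import deque


def small_world(n, k, n_shortcuts, seed=None):
    """Ring of n nodes, each linked to its k nearest neighbors, plus shortcuts."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):   # k/2 neighbors on each side
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    added = 0
    while added < n_shortcuts:              # assumes the graph is far from complete
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj


def diameter(adj):
    """Largest shortest-path distance; assumes the graph is connected."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    ring = small_world(n=200, k=4, n_shortcuts=0, seed=0)
    shortcut = small_world(n=200, k=4, n_shortcuts=20, seed=0)
    # A handful of shortcuts usually shrinks the diameter a lot.
    print(diameter(ring), diameter(shortcut))
```

Every node keeps its k lattice neighbors and only a few gain a shortcut, so degrees stay close to k, consistent with the degree distribution not being scale-free.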
Even more terms Homophily: tendency for connected nodes to have similar properties. Social contagion: connected nodes become similar over time. Associative sorting: similar nodes tend to connect. Disassociative sorting: vice-versa. Association network: bipartite network where nodes are people or organizations.
A big question Homophily: similar nodes ~= connected nodes Which is cause and which is effect? – Do birds of a feather flock together? – Do you change your behavior based on the behavior of your peers? – Do both happen in different graphs? Can there be a combination of associative sorting and social contagion in the same graph?
A big question about homophily Which is cause and which is effect? – Do birds of a feather flock together? – Do you change your behavior based on the behavior of your peers? How can you tell? – Look at when links are added and see what patterns emerge (triadic closure): Pr(new link between u and v | #common friends)
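A sketch of how Pr(new link between u and v | #common friends) could be estimated from two snapshots of the same network. The two-snapshot setup, the function name `closure_rates`, and the toy data are assumptions for illustration; real data would need many more pairs per value of k.

```python
from collections import defaultdict
from itertools import combinations


def closure_rates(before, after):
    """Estimate Pr(link forms | k common friends) from two graph snapshots.

    Both graphs are {node: set(neighbors)}. Only pairs that are not yet
    linked in 'before' are considered.
    """
    formed = defaultdict(int)
    total = defaultdict(int)
    for u, v in combinations(sorted(before), 2):
        if v in before[u]:
            continue                          # already linked, nothing to form
        k = len(before[u] & before[v])        # number of common friends
        total[k] += 1
        if v in after.get(u, set()):
            formed[k] += 1
    return {k: formed[k] / total[k] for k in total}


# Toy snapshots: a path 1-2-3-4, then the link 1-3 appears.
before = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
after = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

if __name__ == "__main__":
    print(closure_rates(before, after))
```

If the estimated rate rises with k, that is the triadic-closure pattern: pairs with more common friends are more likely to become linked.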
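Triadic closure: baseline curves for the probability T(k) that a link forms between two people with k common friends: T(k) = 1 - (1-p)^k, and the same curve shifted by one friend, T(k) = 1 - (1-p)^(k-1)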
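One way to read the first formula, stated as an assumption: each of the k common friends independently causes the link to form with probability p, so the link fails to form with probability (1-p)^k. The sketch below tabulates both curves; the function name `baseline_closure` and the value p = 0.1 are illustrative only.

```python
def baseline_closure(k, p, shift=0):
    """T(k) = 1 - (1 - p)^(k - shift); shift=1 gives the second curve."""
    return 1 - (1 - p) ** (k - shift)


if __name__ == "__main__":
    p = 0.1  # illustrative value, not taken from the slides
    for k in range(1, 6):
        print(k, baseline_closure(k, p), baseline_closure(k, p, shift=1))
```

Comparing an empirical curve against these baselines shows whether extra common friends help more or less than independent chances would predict.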
Changing behavior
Final example: spatial segregation How picky do people have to be about their neighbors for homophily to arise? Imagine a grid world where – Agents are red or blue – Agents move to a random location if they are unhappy – Agents are happy unless fewer than k neighbors are the same color they are (k= ), i.e., they prefer not to be in a small minority – What’s the result over time? helling/ helling/
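A minimal simulation sketch of the grid world described on this slide. Several details are assumptions, since the slide does not fix them: the grid wraps around at the edges, each cell has 8 neighbors, some cells start empty so agents have somewhere to move, and the threshold k is a free parameter (the demo uses k = 3 purely as an illustration).

```python
import random


def make_grid(size, empty_frac, rng):
    """Square grid (size >= 3) of 'R', 'B', or None (empty), filled at random."""
    cells = [None if rng.random() < empty_frac else rng.choice("RB")
             for _ in range(size * size)]
    return [cells[i * size:(i + 1) * size] for i in range(size)]


def neighbors(grid, r, c):
    """Contents of the 8 surrounding cells, wrapping around the edges."""
    n = len(grid)
    return [grid[(r + dr) % n][(c + dc) % n]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def step(grid, k, rng):
    """Move each currently unhappy agent to a random empty cell; return moves."""
    n = len(grid)
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Happiness is evaluated once, on the grid as it stands at the start.
    unhappy = [(r, c) for r, c in cells if grid[r][c] is not None
               and sum(x == grid[r][c] for x in neighbors(grid, r, c)) < k]
    empty = [(r, c) for r, c in cells if grid[r][c] is None]
    rng.shuffle(unhappy)
    moved = 0
    for r, c in unhappy:
        if not empty:
            break
        i = rng.randrange(len(empty))
        er, ec = empty[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empty[i] = (r, c)                     # the vacated cell is now empty
        moved += 1
    return moved


def similarity(grid):
    """Mean share of same-color agents among each agent's occupied neighbors."""
    n = len(grid)
    shares = []
    for r in range(n):
        for c in range(n):
            if grid[r][c] is None:
                continue
            occupied = [x for x in neighbors(grid, r, c) if x is not None]
            if occupied:
                shares.append(sum(x == grid[r][c] for x in occupied) / len(occupied))
    return sum(shares) / len(shares) if shares else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    grid = make_grid(size=30, empty_frac=0.2, rng=rng)
    print("similarity before:", similarity(grid))
    for _ in range(50):
        if step(grid, k=3, rng=rng) == 0:
            break
    print("similarity after:", similarity(grid))
```

Comparing the similarity score before and after is one simple way to see how much segregation a given threshold k produces.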