CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information
Christopher Manning Lecture 19: (Social) Networks: Fat Tails, Small Worlds, and Weak Ties Slides from Dan Jurafsky who mainly got them from Lada Adanic; some also from James Moody, Bing Liu, James Moody

Networks We’ve looked a lot at information in language
We’ve seen that you can also get information from the patterns of pages across the Web … PageRank But there are also patterns of people who interact Maybe it’s also useful to study and understand that? Many goals: Sociology … Better ad targeting Slide from Chris Manning

Social network analysis
Social network analysis is the study of entities (people in an organization), and their interactions and relationships. The interactions and relationships can be represented with a network or graph, each vertex (or node) represents an actor and each link represents a relationship. May be directed or not. CS583, Bing Liu, UIC Nov 5, 2009

Centrality Important or prominent actors are those that are involved with other actors extensively. A person with extensive contacts or communications with many other people in the organization is considered more important The contacts can also be called links or ties. A central actor is one involved in many ties. Degree centrality: The number of direct connections a node has What really matters is where those connections lead to and how they connect the otherwise unconnected. CS583, Bing Liu, UIC

Prestige Prestige is a more refined measure of prominence of an actor than centrality. Distinguish: ties sent (out-links) and ties received (in-links). A prestigious actor is one who is the object of extensive ties as a recipient. To compute the prestige: we use only in-links. PageRank is based on prestige CS583, Bing Liu, UIC

Social Network Analysis
2. Betweenness Centrality: A node with high betweenness has great influence over what flows in the network indicating important links and single point of failure. 3. Closeness Centrality: The measure of nodes which are close to everyone else. The pattern of direct and indirect ties allows the nodes to get to any other node in the network more quickly than anyone else. They have the shortest paths to all others. from Amit Sharma UT Austin

Outline Sketch of some real networks Fat Tails Small Worlds Weak Ties

High school dating Peter S. Bearman, James Moody and Katherine Stovel
Chains of affection: The structure of adolescent romantic and sexual networks American Journal of Sociology 110 44-91 (2004) Image drawn by Mark Newman Slide from Drago Radev

More networks

Europe/Middle East/Central Asia/Africa – Green
5 million edges Graph Colors: Asia Pacific – Red Europe/Middle East/Central Asia/Africa – Green North America – Blue Latin American and Caribbean – Yellow RFC1918 IP Addresses – Cyan Unknown - White Slide from Drago Radev

The Harlem Shake: Anatomy of a Viral Meme by Gilad Lotan (SocialFlow)

Slide from Chris Manning

Degree of nodes Many nodes on the internet have low degree
One or two connections A few (hubs) have very high degree The number P(k) of nodes with degree k follows a power law: Where alpha for the internet is about 2.1

What is a heavy tailed-distribution?
Unlike a normal distribution, power-law distributions have no “scale” [I’ll explain that later…] Right skew normal distribution (not heavy tailed) Zipf’s or power-law distribution (heavy tailed) High ratio of max to min human heights vs. city sizes Slide from Lada Adamic

Normal (Gaussian) distribution of human heights
average value close to most typical distribution close to symmetric around average value Slide from Lada Adamic

Power-law distribution
linear scale log-log scale high skew (asymmetry) straight line on a log-log plot Slide from Lada Adamic

Logarithmic axes 20=1, 21=2, 22=4, 23=8, 24=16, 25=32, 26=64,….
powers of a number will be uniformly spaced 1 2 3 10 20 30 100 200 20=1, 21=2, 22=4, 23=8, 24=16, 25=32, 26=64,…. Slide from Lada Adamic

Power laws are seemingly everywhere note: these are cumulative distributions
Moby Dick scientific papers AOL users visiting sites ‘97 Slide from Lada Adamic bestsellers AT&T customers on 1 day California

Yet more power laws wars (1816-1980) Moon Solar flares
Slide from Lada Adamic richest individuals 2003 US family names 1990 US cities 2003

Power law distribution
Straight line on a log-log plot Exponentiate both sides to get that p(x), the probability of observing an item of size ‘x’ is given by normalization constant (probabilities over all x must sum to 1) power law exponent a Slide from Lada Adamic

What does it mean to be scale free?
A power law looks the same no mater what scale we look at it on (2 to 50 or 200 to 5000) Only true of a power-law distribution! p(bx) = g(b) p(x) – shape of the distribution is unchanged except for a multiplicative constant p(bx) = (bx)-a = b-a x-a log(p(x)) x →b*x Slide from Lada Adamic log(x)

Many real world networks are power law
exponent a (in/out degree) film actors co-appearance 2.3 telephone call graph 2.1 networks 1.5/2.0 sexual contacts 3.2 WWW 2.3/2.7 internet 2.5 peer-to-peer metabolic network 2.2 protein interactions 2.4 Slide from Lada Adamic

Power laws and “fat tails”

“Fat tails” in the news

Hey, not everything is a power law
number of sightings of 591 bird species in the North American Bird survey in 2003. cumulative distribution another examples: size of wildfires (in acres) Slide from Lada Adamic

Not every network is power law distributed
address books power grid Roget’s thesaurus company directors… Slide from Lada Adamic

Zipf’s law is a power-law
George Kingsley Zipf how frequent is the 3rd or 8th or 100th most common word? Intuition: small number of very frequent words (“the”, “of”) lots and lots of rare words (“expressive”, “Jurafsky”) Zipf's law: the frequency of the r'th most frequent word is inversely proportional to its rank: y ~ r -b , with b close to unity.

Pareto’s law and power-laws
The Italian economist Vilfredo Pareto was interested in the distribution of income. Pareto’s law is expressed in terms of the cumulative distribution (the probability that a person earns X or more). P[X > x] ~ x-k Here we recognize k as just a -1, where a is the power-law exponent Slide from Lada Adamic

80/20 rule The fraction W of the wealth in the hands of the richest P of the population is given by W = P(a-2)/(a-1) Example: US wealth: a = 2.1 richest 20% of the population holds 86% of the wealth Slide from Lada Adamic

Generative processes for power-laws
Many different processes can lead to power laws There is no one unique mechanism that explains it all Slide from Lada Adamic

First considered by [Price 65] as a model for citation networks
One mechanism for generating a power-law distribution: Preferential attachment First considered by [Price 65] as a model for citation networks each new paper is generated with m citations (mean) new papers cite previous papers with probability proportional to their in-degree (citations) what about papers without any citations? each paper is considered to have a “default” citation probability of citing a paper with degree k, proportional to k+1 Power law with exponent α = 2+1/m Slide from Lada Adamic

Small worlds Slide from Lada Adamic

Small world experiments
NE MA Stanley Milgram’s experiment (1967): Given a target individual and a particular property, pass the message to a person you correspond with who is “closest” to the target. Slide from Lada Adamic

Milgram’s small world experiment
Target person worked in Boston as a stockbroker. 296 senders from Boston and Omaha. 20% of senders reached target. average chain length = 6.5. “Six degrees of separation” Slide from Lada Adamic

Milgram’s Findings: Distance to target person, by sending group.
Slide from James Moody

Duncan Watts: Networks, Dynamics and
the Small-World Phenomenon Asks why we see the small world pattern and what implications it has for the dynamical properties of social systems. His contribution is to show that globally significant changes can result from locally insignificant network change. Slide from James Moody

Duncan Watts: Networks, Dynamics and the Small-World Phenomenon
Watts says there are 4 conditions that make the small world phenomenon interesting: 1) The network is large - O(Billions) 2) The network is sparse - people are connected to a small fraction of the total network 3) The network is decentralized -- no single (or small #) of stars 4) The network is highly clustered -- most friendship circles are overlapping Slide from James Moody

Duncan Watts: Networks, Dynamics and the Small-World Phenomenon
Formally, we can characterize a graph through 2 statistics. 1) The characteristic path length, L (the diameter) The average length of the shortest paths connecting any two actors. 2) The clustering coefficient, C Is the average local density. That is, Cv = ego-network density, and C = Cv /n A small world graph is any graph with a relatively small L and a relatively large C. Slide from James Moody

Local clustering coefficient (Watts&Strogatz 1998)
For a vertex i The fraction pairs of neighbors of the node that are themselves connected Let ni be the number of neighbors of vertex i number of connections between i’s neighbors maximum number of possible connections between i’s neighbors # directed connections between i’s neighbors ni * (ni -1) # undirected connections between i’s neighbors ni * (ni -1)/2 Ci = Ci directed = Ci undirected = Slide from Lada Adamic

Local clustering coefficient (Watts&Strogatz 1998)
Average over all n vertices ni = 4 max number of connections: 4*3/2 = 6 3 connections present Ci = 3/6 = 0.5 i link present link absent Slide from Lada Adamic

The most clustered graph is Watt’s “Caveman” graph:
Slide from Lada Adamic

Why does this work? Key is fraction of shortcuts in the network
In a highly clustered, ordered network, a single random connection will create a shortcut that lowers L dramatically Watts demonstrates that Small world graphs occur in graphs with a small number of shortcuts Slide from Lada Adamic

Watts and Strogatz model [WS98]
Start with a ring, where every node is connected to the next z nodes ( a regular lattice) With probability p, rewire every edge (or, add a shortcut) to a uniformly chosen destination. order p = 0 0 < p < 1 p = 1 Slide from Lada Adamic Small world randomness

Clustering and Path Length
Random Graphs have a low clustering coefficient but a low diameter Regular Graphs have a high clustering coefficient but also a high diameter Slide from Lada Adamic

Weak links Mark Granovetter (1960s) studied how people find jobs. He found out that most job referrals were through personal contacts But more by acquaintances and not close friends. Accepted by the American Journal of Sociology after 4 years of unsuccessful attempts elsewhere. One of the most cited papers in sociology. Slide from Drago Radev

Mark Granovetter: The strength of weak ties
Strength of ties amount of time spent together emotional intensity intimacy (mutual confiding) reciprocal services Many strong ties are transitive we meet our friends through other friends if we spend a lot of time with our strong ties, they will tend to overlap balance theory – if two of my friends do not like each other – we will all be unhappy. Triad closure is the happy solution = Homophily Slide from James Moody

Strength of weak ties Weak ties can occur between cohesive groups
old college friend former colleague from work weak ties will tend to have low transitivity Slide from James Moody

Strength of weak ties Evidence from small world experiments
Small world experiment at Columbia: acquaintanceship ties more effective than family, close friends Slide from James Moody

Strength of weak ties – how to get a job
Granovetter: How often did you see the contact that helped you find the job prior to the job search 16.7% often (at least once a week) 55.6% occasionally (more than once a year but less than twice a week) 27.8% rarely – once a year or less Weak ties will tend to have different information than we and our close contacts do Long paths rare 39.1 % info came directly from employer 45.3 % one intermediary 3.1 % > 2 (more frequent with younger, inexperienced job seekers) Compatible with Watts/Strogatz small world model: short average shortest paths thanks to ‘shortcuts’ that are non- transitive Slide from James Moody

Finding paths Watts model shows how these short paths can exist
small world networks But how do people find the paths? People seem to be successful by making greedy local decisions The existence of findable short paths depends on further elucidating the structure of the network Slide from Lada Adamic

Spatial search nodes are placed on a lattice and
Kleinberg, ‘The Small World Phenomenon, An Algorithmic Perspective’ Proc. 32nd ACM Symposium on Theory of Computing, 2000. (Nature 2000) “The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain” S.Milgram ‘The small world problem’, Psychology Today 1,61,1967 nodes are placed on a lattice and connect to nearest neighbors additional links placed with puv~ Slide from Lada Adamic

Increasing r favors near nodes
d -r(u,v) =1 , Uniform Distribution r=0, Link to each other node equally likely r=1, inverse of distance If a node is twice as far away, 1/2 as likely r=2, inverse squared If a node is twice as far away, 1/4 as likely Slide from Lada Adamic

Kleinberg’s SW network is Greedy Routable iff r=2
Greedy routing algorithm using local information only, find a short path from s to t v u s When u is the current node, choose next v: the closest to t (use lattice distance) with (u,v) a local or random edge. (i.e. the view at this node). Slide from Lada Adamic

A greedy routing algorithm using local information only, find a short path from s to t v u s The number of hops is the the ‘delivery time’ This greedy routing achieves expected ‘delivery time’ of O(log2n), i.e. the st paths have expected length O(log2n). (i.e. the view at this node). Slide from Lada Adamic

A greedy routing algorithm using local information only, find a short path from s to t v u s This greedy routing achieves expected `delivery time’ of O(log2n), i.e. the st paths have expected length O(log2n). This does not work unless r=2 : for r2, >0 such that the expected delivery time of any decentralized algorithm is (n). (i.e. the view at this node). Slide from Lada Adamic

Overly localized links on a lattice
When r>2 expected search time ~ N(r-2)/(r-1) Slide from Lada Adamic

no locality When r=0, links are randomly distributed, ASP ~ log(n), n size of grid When r=0, any decentralized algorithm is at least a0n2/3 When r<2, expected time at least arn(2-r)/3 Good paths exist, but are not greedily findable Slide from Lada Adamic

Links balanced between long and short range
When r=2, expected time of a DA is at most C (log N)2 Slide from Lada Adamic

Outline Quick sketch of some networks Fat Tails Small Worlds Weak Ties

CS 124/LINGUIST 180 From Languages to Information

Similar presentations

Presentation on theme: "CS 124/LINGUIST 180 From Languages to Information"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CS 124/LINGUIST 180 From Languages to Information

Similar presentations

Presentation on theme: "CS 124/LINGUIST 180 From Languages to Information"— Presentation transcript:

Similar presentations

About project

Feedback