EK Ch 17: Power laws and rich-get-richer phenomena (with an application to Web spam detection: "Spam, Damn Spam and Statistics")
Numbers: your grades so far in this class; the weight of an apple; the temperature in Chicago on July 4th; the height of a Dutch man; the speed of a car on I-90. Most instances are typical. Seeing a rare number is very surprising. These numbers are well characterized by the average and the standard deviation.
City populations: 1. New York 8,310, Los Angeles 3,834, Chicago 2,836, Cambridge, MA 101, Gainesville, FL 95, McKinney, TX 54,369. A few cities with high population. Many cities with low population.
City populations
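Power Law: the fraction $f(k)$ of items with popularity $k$ is proportional to $k^{-c}$, that is, $f(k) = a\,k^{-c}$ for some constant $a > 0$. Taking logarithms gives $\log f(k) = \log a - c \log k$, so on a log-log plot a power law appears as a straight line with slope $-c$.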
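The straight-line relationship above suggests a quick way to estimate the exponent c from data: compute f(k) for each observed k and fit a line to the points (log k, log f(k)). The following is a minimal sketch under that assumption; the function name `estimate_exponent` and the toy sample are illustrative, not from the slides, and an ordinary least-squares fit on a log-log plot is only a rough estimator for real, noisy data.

```python
import math
from collections import Counter


def estimate_exponent(popularities):
    """Estimate c in f(k) ~ k^(-c) by a least-squares line on the log-log plot.

    `popularities` is a list of positive integers, one per item.
    """
    counts = Counter(popularities)
    n = len(popularities)
    # One point per distinct popularity value k: (log k, log f(k)).
    xs = [math.log(k) for k in counts]
    ys = [math.log(counts[k] / n) for k in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope  # the slope of the line is -c


# Toy sample (hypothetical): item counts fall off exactly as k^(-2).
sample = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
c_hat = estimate_exponent(sample)
print(round(c_hat, 6))
```

On the toy sample the counts are 64, 16, 4, 1 at k = 1, 2, 4, 8, so the points lie exactly on a line and the fit recovers c = 2.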
City populations
Number of Web page in-links (Broder+)
Other examples
Length of the URL’s host
Number of host name resolutions to a single IP
Web page out-degrees
Web page in-degrees
Word count variance
Content evolution
Cluster size
… because they care to know ;-)
Why does data exhibit power laws? Imitation → Power law
Constructing the web
1. Pages are created in order, named 1, 2, …, N.
2. When created, page j links to a page by:
   a) With probability p, picking a page i uniformly at random from 1, …, j-1 and linking to i.
   b) With probability (1-p), picking a page i uniformly at random from 1, …, j-1 and linking to the page that i links to (imitation).
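The construction above can be simulated in a few lines. This is a sketch, not the book's code: the names `build_web` and `in_degrees` are made up for illustration, and it assumes that page 1 has no outgoing link, so a copy step that lands on page 1 falls back to linking to page 1 directly.

```python
import random
from collections import Counter


def build_web(n_pages, p, seed=0):
    """Simulate the copying model; returns {page: page it links to}."""
    rng = random.Random(seed)
    target = {1: None}  # assumption: the first page has nothing to link to
    for j in range(2, n_pages + 1):
        i = rng.randint(1, j - 1)  # uniform over the earlier pages 1..j-1
        if rng.random() < p or target[i] is None:
            target[j] = i          # step 2a: link to i itself
        else:
            target[j] = target[i]  # step 2b: imitate i's link
    return target


def in_degrees(target):
    """Count how many links point at each page."""
    return Counter(t for t in target.values() if t is not None)


links = build_web(10_000, p=0.25, seed=42)
deg = in_degrees(links)
print(deg.most_common(5))  # a handful of pages collect most of the links
```

Lowering p makes imitation more frequent, which should concentrate links on fewer pages; raising p toward 1 approaches uniformly random linking.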
The rich get richer 2 b) With prob. (1-p), pick page i uniformly at random and link to the page that i links to. [Figure: example with probabilities 1/4 and 3/4]
The rich get richer 2 b) With prob. (1-p), pick page i uniformly at random and link to the page that i links to. Equivalently, 2 b) With prob. (1-p), pick a page with probability proportional to its in-degree and link to it.
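The equivalence can be checked exactly on a small example. The sketch below uses a hypothetical five-page web (`toy`) and exact fractions; it assumes the uniformly chosen page i is drawn from the pages that already have an outgoing link, and the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def copy_step_distribution(target):
    """P(new link goes to page x) when copying a uniformly chosen page's link."""
    linkers = [i for i, t in target.items() if t is not None]
    dist = Counter()
    for i in linkers:
        dist[target[i]] += Fraction(1, len(linkers))
    return dict(dist)


def in_degree_distribution(target):
    """P(page x) when choosing pages proportionally to their in-degree."""
    indeg = Counter(t for t in target.values() if t is not None)
    total = sum(indeg.values())
    return {page: Fraction(d, total) for page, d in indeg.items()}


# Hypothetical web: pages 2, 3, 4 link to page 1; page 5 links to page 2.
toy = {1: None, 2: 1, 3: 1, 4: 1, 5: 2}
by_copying = copy_step_distribution(toy)
by_degree = in_degree_distribution(toy)
print(by_copying == by_degree)  # True
```

Here page 1 has in-degree 3 and page 2 has in-degree 1, so both procedures pick page 1 with probability 3/4 and page 2 with probability 1/4: each existing link is one "vote" for its target, and copying a random page's link samples one vote uniformly.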
Food for thought Why is Harry Potter popular? If we could re-play history, would we still read Harry Potter, or would it be some other book?
Information cascades and the rich: an information cascade means some people get a little bit richer by chance; rich-get-richer dynamics then make those randomly rich people a lot richer very fast.
Music download site – 8 worlds
1. "Let's go driving," Barzin
2. "Silence is sexy," Einstürzende Neubauten
3. "Go it alone," Noonday Underground
10. "Picadilly Lilly," Tiger Lillies