Long Tails and Navigation Networked Life CIS 112 Spring 2010 Prof. Michael Kearns.

One More (Structural) Property…
A properly tuned α-model can simultaneously explain
  – small diameter
  – high clustering coefficient
  – other models can, too (e.g. cycle + random rewirings)
But what about connectors and heavy-tailed degree distributions?
  – α-model and simple variants will not explain this
  – intuitively, no “bias” towards large degree evolves
  – “all vertices are created equal”
As usual, we want a “natural” model to explain this

Quantifying Connectors: Heavy-Tailed Distributions

Heavy-tailed Distributions
Pareto or power law distributions:
  – for random variables assuming integer values > 0
  – probability of value x ~ 1/x^α
  – typically 0 < α < 2; smaller α gives heavier tail
  – here are some examples
  – sometimes also referred to as being scale-free
For binomial, normal, and Poisson distributions the tail probabilities approach 0 exponentially fast
Inverse polynomial decay vs. inverse exponential decay
What kind of phenomena does this distribution model?
What kind of process would generate it?

Distributions vs. Data
All these distributions are idealized models
In practice, we do not see distributions, but data
Thus, there will be some largest value we observe
Also, it can be difficult to “eyeball” data and choose a model
So how do we distinguish between Poisson, power law, etc.?
Typical procedure:
  – might restrict our attention to a range of values of interest
  – accumulate counts of observed data into equal-sized bins
  – look at counts on a log-log plot
  – note how each candidate distribution looks on such a plot:
Power law:
  – log(Pr[value = x]) = log(1/x^α) = -α log(x)
  – linear in log(x), slope -α
Normal/Gaussian:
  – log(Pr[value = x]) = log(a exp(-x^2/b)) = log(a) - x^2/b
  – non-linear, concave near mean
Poisson:
  – log(Pr[value = x]) = log(exp(-λ) λ^x/x!)
  – also non-linear
Let’s look at the paper on dollar bill migration
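The log-log test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`loglog_slope`, `power_law_counts`, `poisson_counts`), the parameter names `alpha` and `lam`, and the idea of comparing the fitted slope on the lower and upper halves of the range are choices made here, not something taken from the slides. The counts are computed from the idealized formulas rather than from real data.

```python
import math

def loglog_slope(values, counts):
    """Least-squares slope of log(count) against log(value)."""
    pts = [(math.log(v), math.log(c))
           for v, c in zip(values, counts) if v > 0 and c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

def power_law_counts(alpha, max_value, scale=1e6):
    """Idealized counts proportional to 1/x^alpha for x = 1..max_value."""
    return [scale / x ** alpha for x in range(1, max_value + 1)]

def poisson_counts(lam, max_value, scale=1e6):
    """Idealized counts proportional to exp(-lam) lam^x / x! for x = 1..max_value."""
    return [scale * math.exp(-lam) * lam ** x / math.factorial(x)
            for x in range(1, max_value + 1)]

def half_slopes(values, counts):
    """Slopes fitted separately on the lower and upper half of the range."""
    mid = len(values) // 2
    return (loglog_slope(values[:mid], counts[:mid]),
            loglog_slope(values[mid:], counts[mid:]))

if __name__ == "__main__":
    xs = list(range(1, 31))
    print("power law halves:", half_slopes(xs, power_law_counts(2.0, 30)))
    print("Poisson halves:  ", half_slopes(xs, poisson_counts(5, 30)))
```

For the power law the two half-range slopes agree (both are -alpha), which is the "roughly linear" signature. For the Poisson counts the slope gets much steeper in the upper half, which is the curvature the slide refers to. With real, noisy data the fit would of course be less clean.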

Heavy Tails Recap
We plot the distribution or histogram of some “resource”
  – on the x-axis, we put the amount or quantity of this resource (e.g. degrees)
  – on the y-axis, we put the number or fraction of the population with the corresponding amount
By “heavy-tailed”, we broadly mean that the rate of decay is “slow” as we move to the right on the x-axis
In mathland, we can write explicit equations for a slow rate of decay
  – e.g. Pareto or power-law distributions
When confronted with data, the empirical test for heavy tails is:
  – plot the histogram in log-log form (e.g. x = log(degree), y = log(# with that degree))
  – log-log plot should look roughly linear, especially towards the right
A related concept: ranking by some quantity
  – e.g. look at most popular iPhone app, second most popular, third most popular
  – x-axis: rank by popularity
  – y-axis: how popular is it (# downloads)
Again, heavy-tailed means rate of decay is slow as we move to the right
  – and again, signature is linearity of log(rank) vs. log(popularity)
Claim: many interesting quantities have heavy-tailed distributions/rankings

Long Tail of iPhone App Popularity

Zipf’s Law
Look at the frequency of English words:
  – “the” is the most common, followed by “of”, “to”, etc.
  – claim: frequency of the n-th most common word ~ 1/n (power law, α ~ 1)
General theme:
  – rank events by their frequency of occurrence
  – resulting distribution often is a power law!
Other examples:
  – North American city sizes
  – personal income
  – file sizes
  – genus sizes (number of species)
  – the “long tail of search” (on which more later…)
  – let’s look at log-log plots of these
People seem to dither over the exact form of these distributions
  – e.g. value of α
  – but not over heavy tails
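A minimal sketch of the "rank events by their frequency" step, assuming the input is a plain English string. The helper names `rank_frequency` and `zipf_ratios` and the simple tokenizing regular expression are illustrative choices, not part of the slides. If frequency is roughly proportional to 1/rank, then count × rank should stay roughly constant across ranks on a large corpus; the tiny string used here only shows the mechanics.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]

def zipf_ratios(ranked):
    """count * rank for each entry; roughly constant if count ~ 1/rank."""
    return [count * rank for rank, _, count in ranked]

if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for rank, word, count in rank_frequency(sample):
        print(rank, word, count)
```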

Generating Heavy-Tailed Degrees: (Just) One Model

Preferential Attachment
Let’s warm up with a little Matlab demo…
Start with (say) two vertices connected by an edge
At each step, add one new vertex v with one edge back to previous vertices
Probability a previously added vertex u receives the new edge from v is proportional to the (current) degree of u
  – more precisely, probability u gets the edge = (current degree of u)/(sum of all current degrees)
Vertices with high degree are likely to get even more links!
  – …just like the crowded nightclub
“Rich Get Richer” or “Matthew Effect”
Natural model for many processes:
  – hyperlinks on the web
  – new business and social contacts
  – technology adoption (e.g. online social networks)
Generates a power law distribution of degrees
Let’s look at the NetLogo simulation
Variation: each new vertex initially gets k edges

Two Out of Three Isn’t Bad…
Preferential attachment explains
  – heavy-tailed degree distributions
  – small diameter (~log(N), via “hubs”)
Will not generate very high clustering coefficients
  – no bias towards local connectivity, but towards hubs
Can we simultaneously capture all three properties?
  – probably, but we’ll stop here
  – soon there will be a fourth property anyway…

Navigation (Search) Revisited

Finding the Short Paths
Milgram’s experiment, Columbia Small Worlds, E-R, α-model, P.A.…
  – all emphasize existence of short paths between pairs (small diameter)
How do individuals find short paths?
  – in an incremental, next-step fashion
  – using purely local information about the network and location of target
  – note: shortest path might require taking steps “away” from the target!
This is not (only) a structural question, but an algorithmic one
  – statics vs. dynamics
Navigability may impose additional restrictions on formation model!
Briefly investigate two alternatives:
  – a local/long-distance mixture model [Kleinberg]
  – a “social identity” model [Watts, Dodds, Newman]

Kleinberg’s Model
Start with an n by n grid of vertices (so N = n^2)
  – add some long-distance connections to each vertex:
      k additional connections
      probability of connection to grid distance d: ~ (1/d)^r
  – cf. dollar bill migration paper, r ~ 1.6
  – so full model given by choice of k and r
  – large r: heavy bias towards “more local” long-distance connections
  – small r: approach uniformly random
Kleinberg’s question:
  – what value of r permits effective navigation?
  – # hops << N, e.g. log(N)
Assume parties know only:
  – grid address of target
  – addresses of their own immediate neighbors
Algorithm: pass message to the neighbor closest to the target in the grid
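The model and the greedy forwarding rule can be sketched as follows. This is a small illustration under stated assumptions, not Kleinberg's own code: the function names (`grid_distance`, `build_long_links`, `greedy_route`), the use of Manhattan distance as the grid distance, and drawing the k long-distance links independently (so repeats are possible) are all choices made here. Building the links this way takes time proportional to N^2, so it is only suitable for small grids.

```python
import random

def grid_distance(a, b):
    """Manhattan distance between two grid addresses (row, column)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def build_long_links(n, r, k=1, seed=0):
    """Give every vertex k long-distance links, each drawn with weight (1/d)^r."""
    rng = random.Random(seed)
    nodes = [(i, j) for i in range(n) for j in range(n)]
    links = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [grid_distance(u, v) ** -r for v in others]
        links[u] = rng.choices(others, weights=weights, k=k)
    return links

def greedy_route(source, target, n, links):
    """Forward to the neighbor closest to the target; return the hop count."""
    current, hops = source, 0
    while current != target:
        i, j = current
        neighbors = [(i + di, j + dj)
                     for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= i + di < n and 0 <= j + dj < n]
        neighbors += links[current]   # long-distance contacts of this vertex
        current = min(neighbors, key=lambda v: grid_distance(v, target))
        hops += 1
    return hops

if __name__ == "__main__":
    n = 20
    for r in (0, 2, 4):
        links = build_long_links(n, r, k=1, seed=1)
        print("r =", r, "hops:", greedy_route((0, 0), (n - 1, n - 1), n, links))
```

Some grid neighbor is always strictly closer to the target, so the loop terminates in at most the initial grid distance. A single run on a small grid like this says little about the asymptotic behavior for different r; averaging over many source/target pairs and larger n would be needed to see the trend.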

Kleinberg’s Result
Intuition:
  – if r is too large (strong local bias), then “long-distance” connections never help much; short paths may not even exist (remember, grid has large diameter, ~ sqrt(N))
  – if r is too small (no local bias), we may quickly get close to the target; but then we’ll have to use grid links to finish
      think of a transport system with only long-haul jets or donkey carts
  – effective search requires a delicate mixture of link distances
The result (informally):
  – r = 2 is the only value that permits rapid navigation (~log(N) steps; Kleinberg’s proven bound is O(log^2 N))
  – any other value of r will result in time ~ N^c for 0 < c <= 1
      N^c >> log(N) for large N
  – a critical value phenomenon or “knife’s edge”; very sensitive
  – contrast with 1/d^(1.59) from dollar bill migration paper
Note: locality of information is crucial to this argument
  – centralized, “bird’s-eye” algorithm can still compute short paths at r < 2!
  – can recognize when “backwards” steps are beneficial
Later in the course: What happens when distance-d edges cost d^r?

Navigation via Identity
Watts et al.:
  – we don’t navigate social networks by purely “geographic” information
  – we don’t use any single criterion; recall Dodds et al. on Columbia SW
  – different criteria used at different points in the chain
Represent individuals by a vector of attributes
  – profession, religion, hobbies, education, background, etc…
  – attribute values have distances between them (tree-structured)
  – distance between individuals: minimum distance in any attribute
  – only need one thing in common to be close!
Algorithm:
  – given attribute vector of target
  – forward message to neighbor closest to target
Let’s look a bit at the paper
Permits fast navigation under broad conditions
  – not as sensitive as Kleinberg’s model
[Figure: example attribute tree for profession. Root: all jobs; children: scientists (chemistry, CS) and athletes (baseball, tennis)]
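The distance and forwarding rule above can be sketched as follows, assuming each attribute value is written as a path from the root of its tree (for example `("scientists", "CS")` in the profession tree) and that all paths within one attribute have the same depth. The function names, the people `alice`, `bob` and `carol`, and the hobby tree are hypothetical and only serve the illustration. The tree distance used here, the number of levels to climb before two values share an ancestor, is one simple choice of tree-structured distance.

```python
def tree_distance(path_a, path_b):
    """Levels to climb from the leaves before the two paths share an ancestor."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return len(path_a) - common

def social_distance(person_a, person_b):
    """Minimum tree distance over all attributes: one thing in common is enough."""
    return min(tree_distance(person_a[attr], person_b[attr]) for attr in person_a)

def next_hop(neighbors, target):
    """Name of the neighbor whose attribute vector is closest to the target's."""
    return min(neighbors, key=lambda name: social_distance(neighbors[name], target))

if __name__ == "__main__":
    # Hypothetical individuals; the profession tree follows the slide's example.
    alice = {"profession": ("scientists", "CS"), "hobby": ("sports", "tennis")}
    bob = {"profession": ("athletes", "baseball"), "hobby": ("arts", "music")}
    carol = {"profession": ("scientists", "chemistry"), "hobby": ("arts", "film")}
    print(social_distance(alice, bob), social_distance(alice, carol))
    print(next_hop({"bob": bob, "carol": carol}, alice))
```

Because the distance is a minimum over attributes, two people who differ completely in profession can still be at distance 0 if they share a hobby, which is the "only need one thing in common" point.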