Models and Algorithms for Complex Networks: Power laws and generative processes

Power Laws - Recap
 A (continuous) random variable X follows a power-law distribution if it has density function p(x) = C x^-α, for x ≥ x_min
 A (continuous) random variable X follows a Pareto distribution if it has cumulative function P[X ≥ x] = (x/x_min)^-β
 A (discrete) random variable X follows Zipf’s law if the frequency of the r-th largest value satisfies f(r) ∝ r^-γ
 A Pareto distribution with exponent β corresponds to a power-law with α = 1+β; Zipf’s law with exponent γ corresponds to a power-law with α = 1+1/γ
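The Pareto form lends itself to inverse-transform sampling, which gives a quick numeric check of these definitions. A minimal sketch (the choices β = 1.5 and x_min = 1 are arbitrary):

```python
import random

def sample_pareto(beta, x_min=1.0, n=1, rng=random):
    # Inverse transform: P[X >= x] = (x / x_min)^(-beta), so for U uniform
    # on (0, 1), X = x_min * U^(-1 / beta) has exactly this distribution.
    return [x_min * rng.random() ** (-1.0 / beta) for _ in range(n)]

random.seed(0)
xs = sorted(sample_pareto(beta=1.5, n=100_000))
# Sanity check: the analytic median of Pareto(beta, x_min) is x_min * 2^(1/beta).
median = xs[len(xs) // 2]
```

The empirical median should land close to 2^(1/1.5) ≈ 1.59.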

Power laws are ubiquitous

But not everything is a power law

Measuring power laws
 A naive straight-line fit on the log-log plot of the histogram gives a poor (noisy, biased) estimate of the exponent

Logarithmic binning
 Bin the observations in bins of exponentially growing width, and normalize each count by the width of its bin
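A sketch of logarithmic binning (base-2 bins and x_min = 1 are assumptions, not from the slides):

```python
import math, random
from collections import Counter

def log_bin(xs, base=2.0, x_min=1.0):
    """Histogram with bins [x_min * base^k, x_min * base^(k+1)).

    Dividing each count by its bin width (and the sample size) turns the
    histogram into a density estimate that stays readable far into the tail.
    """
    counts = Counter()
    for x in xs:
        if x >= x_min:
            counts[int(math.log(x / x_min, base))] += 1
    result = []
    for k in sorted(counts):
        lo, hi = x_min * base ** k, x_min * base ** (k + 1)
        result.append((lo, counts[k] / (len(xs) * (hi - lo))))
    return result

random.seed(1)
samples = [random.random() ** (-1 / 1.5) for _ in range(100_000)]  # power law, alpha = 2.5
bins = log_bin(samples)
```

On a log-log plot the binned points fall on a line of slope -α; in particular successive densities drop by a factor of about base^α (here 2^2.5 ≈ 5.7).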

Cumulative distribution
 Fit a line on the log-log plot of the cumulative distribution P[X ≥ x]
 it also follows a power-law, with exponent α-1
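A sketch of the CCDF-based fit with plain least squares on the log-log points (for power-law data with density exponent α the fitted slope comes out near -(α-1)):

```python
import math, random

def ccdf_slope(xs):
    """Least-squares slope of log P[X >= x] versus log x over the sorted sample."""
    xs = sorted(xs)
    n = len(xs)
    pts = [(math.log(x), math.log((n - i) / n)) for i, x in enumerate(xs)]
    mx = sum(px for px, _ in pts) / n
    my = sum(py for _, py in pts) / n
    num = sum((px - mx) * (py - my) for px, py in pts)
    den = sum((px - mx) ** 2 for px, _ in pts)
    return num / den

random.seed(2)
data = [random.random() ** (-1 / 1.5) for _ in range(20_000)]  # density exponent alpha = 2.5
slope = ccdf_slope(data)
```

Here the slope should be near -1.5 = -(α - 1).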

Maximum likelihood estimation
 Assume that the data are produced by a power-law distribution with some exponent α
 Find the exponent that maximizes the likelihood of the data, P(x|α)
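For a continuous power law with known x_min the maximizing exponent has a closed form, α̂ = 1 + n / Σ ln(x_i / x_min) (the standard result, stated e.g. in Newman's review cited at the end). A sketch:

```python
import math, random

def mle_alpha(xs, x_min=1.0):
    """Closed-form MLE of the exponent of a continuous power law:
    alpha_hat = 1 + n / sum(ln(x_i / x_min)) over all x_i >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

random.seed(3)
data = [random.random() ** (-1 / 1.5) for _ in range(50_000)]  # true alpha = 2.5
alpha_hat = mle_alpha(data)
```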

Divergent(?) mean
 For a power law with exponent α ≤ 2 the mean diverges; for 2 < α ≤ 3 the mean is finite but the variance diverges

The 80/20 rule
 The cumulative distribution is top-heavy: a small fraction of the largest values accounts for most of the total (e.g., 20% of the people hold 80% of the wealth)

Power Laws – Generative processes
 We have seen that power-laws appear in various natural or man-made systems
 What are the processes that generate power-laws?
 Is there a “universal” mechanism?

Preferential attachment
 The main idea is that “the rich get richer”
 first studied by Yule for the sizes of biological genera
 revisited by Simon
 reinvented multiple times
 Also known as
 Gibrat’s principle
 cumulative advantage
 the Matthew effect

The Yule process
 The setting:
 a set of species defines a genus
 the number of species per genus follows a power-law
 The Yule process:
 at the n-th step of the process we have n genera
 m new species are added to the existing genera through speciation events: an existing species splits into two
 the generation of the (m+1)-th species causes the creation of the (n+1)-th genus, containing 1 species
 The sizes of genera follow a power law with exponent α = 2 + 1/m
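The process can be simulated directly: picking a uniformly random existing species is the same as picking a genus with probability proportional to its size. A sketch (the parameters are illustrative):

```python
import random

def yule(steps, m, rng):
    """Yule process: each step adds m species to existing genera by
    preferential attachment, then creates a new genus with one species."""
    genus_of = [0]     # one entry per species: the genus it belongs to
    sizes = [1]        # current size of each genus
    for _ in range(steps):
        for _ in range(m):
            g = genus_of[rng.randrange(len(genus_of))]  # size-proportional pick
            sizes[g] += 1
            genus_of.append(g)
        sizes.append(1)
        genus_of.append(len(sizes) - 1)
    return sizes

random.seed(4)
sizes = yule(20_000, m=2, rng=random)
```

The distribution of `sizes` should approach a power law with exponent 2 + 1/m.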

Critical phenomena
 When the characteristic scale of a system diverges, we have a phase transition
 Critical phenomena happen in the vicinity of the phase transition; there, power-law distributions appear
 Phase transitions are also referred to as threshold phenomena

Percolation on a square lattice
 Each cell is occupied with probability p
 What is the mean cluster size?

Critical phenomena and power laws
 For p < p_c the mean cluster size is independent of the lattice size
 For p > p_c the mean cluster size diverges (proportional to the lattice size - percolation)
 For p = p_c we obtain a power law distribution of the cluster sizes
 For site percolation on the square lattice, p_c ≈ 0.5927
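Cluster sizes on a finite lattice can be measured with a BFS flood fill. A sketch (the 200 × 200 lattice and the value of p near criticality are illustrative):

```python
import random
from collections import deque

def cluster_sizes(n, p, rng):
    """Occupy each cell of an n x n lattice with probability p, then measure
    connected-cluster sizes (4-neighbour connectivity) with a BFS flood fill."""
    grid = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    seen = [[False] * n for _ in range(n)]
    out = []
    for i in range(n):
        for j in range(n):
            if grid[i][j] and not seen[i][j]:
                seen[i][j] = True
                size, q = 0, deque([(i, j)])
                while q:
                    x, y = q.popleft()
                    size += 1
                    for u, v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                        if 0 <= u < n and 0 <= v < n and grid[u][v] and not seen[u][v]:
                            seen[u][v] = True
                            q.append((u, v))
                out.append(size)
    return out

random.seed(5)
csizes = cluster_sizes(200, p=0.5927, rng=random)
```

Near p_c the histogram of `csizes` is heavy-tailed: a few very large clusters coexist with many small ones.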

Self Organized Criticality
 Consider a dynamical system (a forest-fire model) where trees appear at random locations at a constant rate, and lightning strikes cells at random, burning entire tree clusters
 The system eventually stabilizes at the critical point, resulting in a power-law distribution of cluster (and fire) sizes
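A compact sketch of a Drossel–Schwabl-style forest-fire model (the grid size and the growth and lightning probabilities are illustrative assumptions):

```python
import random
from collections import deque

def forest_fire(n, p_grow, p_light, steps, rng):
    """Each step: every empty cell grows a tree with probability p_grow, then
    every tree is struck by lightning with probability p_light, instantly
    burning its whole connected cluster. Returns the sizes of all fires."""
    tree = [[False] * n for _ in range(n)]
    fires = []
    for _ in range(steps):
        for i in range(n):
            for j in range(n):
                if not tree[i][j] and rng.random() < p_grow:
                    tree[i][j] = True
        for i in range(n):
            for j in range(n):
                if tree[i][j] and rng.random() < p_light:
                    size, q = 0, deque([(i, j)])
                    tree[i][j] = False
                    while q:
                        x, y = q.popleft()
                        size += 1
                        for u, v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                            if 0 <= u < n and 0 <= v < n and tree[u][v]:
                                tree[u][v] = False
                                q.append((u, v))
                    fires.append(size)
    return fires

random.seed(6)
fires = forest_fire(60, p_grow=0.05, p_light=0.0005, steps=400, rng=random)
```

The key design knob is the ratio p_grow / p_light: when it is large, the forest self-organizes near the percolation threshold and fire sizes become broadly distributed.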

The idea behind self-organized criticality (more or less)
 There are two competing processes
 e.g., a planting process and a fire process
 For some choice of parameters the system stabilizes to a state in which neither process is a clear winner
 this results in power-law distributions
 The parameters may be tunable so as to improve the chances of survival
 e.g., customers’ buying propensity, and product quality

Combination of exponentials
 If variable Y is exponentially distributed, p(y) ∝ e^ay
 If variable X is exponentially related to Y, x = e^by
 Then X follows a power law: p(x) ∝ x^(a/b - 1), i.e., a power law with exponent α = 1 - a/b
 Model for the population of organisms
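A quick numerical check of this claim (the choices a = -1, i.e. the exponential density e^-y, and b = 0.5 are arbitrary, giving α = 1 - a/b = 3):

```python
import math, random

random.seed(7)
b = 0.5
# Y ~ Exp(1), i.e. density e^{-y} (so a = -1); X = e^{bY}.
xs = [math.exp(b * random.expovariate(1.0)) for _ in range(100_000)]

# Estimate the power-law exponent with the closed-form MLE (x_min = 1 here).
alpha_hat = 1.0 + len(xs) / sum(math.log(x) for x in xs)
```

The estimate should land close to the predicted exponent 3.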

Monkeys typing randomly
 Consider the following generative model for a language [Miller 57]
 the space appears with probability q_s
 the remaining m letters appear with equal probability (1-q_s)/m
 The frequency of words follows a power law!
 Real language is not random: not all letter combinations are equally probable, and very long words are rare
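Miller's setup is easy to simulate (m = 4 letters and q_s = 0.2 are arbitrary choices):

```python
import random
from collections import Counter

def monkey_words(n_chars, m=4, q_space=0.2, rng=random):
    """Type n_chars random characters: a space with probability q_space,
    otherwise one of m letters uniformly; return word counts sorted by rank."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:m]
    text = "".join(" " if rng.random() < q_space else rng.choice(letters)
                   for _ in range(n_chars))
    return [c for _, c in Counter(text.split()).most_common()]

random.seed(8)
freqs = monkey_words(500_000)
```

All m^L words of length L are equally likely, each with probability proportional to ((1-q_s)/m)^L, so frequency falls off as a power of rank (in steps, one step per word length).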

Least effort principle
 Let C_j be the cost of transmitting the j-th most frequent word (e.g., C_j ∝ ln j, roughly the word’s length)
 The average cost is C = Σ_j p_j C_j
 The average information content is H = -Σ_j p_j ln p_j
 Minimizing the cost per unit of information, C/H, yields a power-law (Zipf-like) rank-frequency distribution p_j ∝ j^-α

The log-normal distribution
 X is log-normally distributed if the variable Y = log X follows a normal distribution
 on a log-log plot its density is a parabola, which over a limited range can be mistaken for a straight line (a power law)

Lognormal distribution
 Generative model: multiplicative process, X = F_1 F_2 … F_n with i.i.d. positive factors F_i, so ln X = Σ_i ln F_i
 Central Limit Theorem: if X_1, X_2, …, X_n are i.i.d. variables with mean m and finite variance s^2, and S_n = X_1 + X_2 + … + X_n, then (S_n - nm)/(s √n) converges to the standard normal distribution
 Hence ln X is approximately normal, and X is approximately log-normal
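A sketch of the multiplicative process (uniform factors on [0.5, 1.5] and 100 factors per sample are arbitrary choices):

```python
import math, random

random.seed(9)
n_factors = 100
xs = [math.prod(random.uniform(0.5, 1.5) for _ in range(n_factors))
      for _ in range(20_000)]

# ln X is a sum of 100 i.i.d. terms, so by the CLT it is roughly normal:
logs = sorted(math.log(x) for x in xs)
mean_log = sum(logs) / len(logs)
median_log = logs[len(logs) // 2]
```

For a roughly normal ln X the mean and median of the logs nearly coincide, while X itself is strongly right-skewed (its mean far exceeds its median).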

Example – Income distribution
 Start with some income X_0
 At each time step, with probability 1/3 double the income, with probability 2/3 cut the income in half
 The probability of having income x after n steps follows a log-normal distribution
 BUT… if we add a reflective boundary
 when at income X_0, with probability 2/3 maintain the same income (instead of halving)
 then the distribution follows a power-law!
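This reflected walk is simple to simulate: on a log2 scale it is a random walk (+1 with probability 1/3, -1 with probability 2/3) reflected at 0, whose stationary distribution is geometric in the level, i.e. a power law in the income. A sketch with X_0 = 1:

```python
import random

def income_after(steps, rng):
    """Multiplicative walk with a reflecting lower boundary at x0 = 1."""
    x = 1.0
    for _ in range(steps):
        if rng.random() < 1 / 3:
            x *= 2.0
        elif x > 1.0:        # at the boundary, keep the same income instead
            x /= 2.0
    return x

random.seed(10)
incomes = [income_after(1_000, random) for _ in range(5_000)]
at_floor = sum(1 for x in incomes if x == 1.0)
```

In the stationary distribution P[income = 2^k] = 2^-(k+1): about half the population sits at the floor, and P[income ≥ x] falls off as 1/x.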

Double Pareto distribution
 Double Pareto: a combination of two Pareto distributions - one power-law exponent for the body of the distribution and a different one for the tail

Double Pareto distribution
 Run the multiplicative process for T steps, where T is an exponentially distributed random variable
 the resulting mixture of log-normals (one for each value of T) has power-law tails: a double Pareto distribution
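A sketch of this construction (geometric stopping as the discrete analogue of an exponential T; the Gaussian step distribution and the stopping rate are arbitrary choices):

```python
import math, random

def stopped_product(rng, stop=0.01):
    """Multiplicative process stopped after a geometric number of steps
    (mean 1/stop). Mixing zero-drift log-normals over a random T makes
    ln X Laplace-distributed, i.e. X double Pareto."""
    x = 1.0
    while rng.random() > stop:
        x *= math.exp(rng.gauss(0.0, 0.3))
    return x

random.seed(11)
xs = [stopped_product(random) for _ in range(20_000)]
```

Unlike a single log-normal of matched median, the sample shows power-law behavior on both the high and the low side.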

References
 M. E. J. Newman, Power laws, Pareto distributions and Zipf's law, Contemporary Physics, 2005
 M. Mitzenmacher, A Brief History of Generative Models for Power Law and Lognormal Distributions, Internet Mathematics, 2004
 Lada Adamic, Zipf, power-laws and Pareto -- a ranking tutorial. ing/ranking.html