1 New Directions for Power Law Research Michael Mitzenmacher Harvard University (With thanks to David Parkes!)

2 Warning This talk does not have specific results. Meant to be provocative and inspire future research directions.

3 Warning This talk does not have specific results. Meant to be provocative and inspire future research directions. (And you have to be careful when Harvard people preface a talk this way.)

4 Motivation: General Power laws (and/or scale-free networks) are now everywhere. –See the popular texts Linked by Barabasi or Six Degrees by Watts. –In computer science: file sizes, download times, Internet topology, Web graph, etc. –Other sciences: Economics, physics, ecology, linguistics, etc. What has been and what should be the research agenda?

5 My (Biased) View There are 5 stages of power law research. 1)Observe: Gather data to demonstrate power law behavior in a system. 2)Signify: Explain the import of this observation in the system context. 3)Model: Propose an underlying model for the observed behavior of the system. 4)Validate: Find data to validate (and if necessary specialize or modify) the model. 5)Control: Design ways to control and modify the underlying behavior of the system based on the model.

6 My (Biased) View In networks, we have spent a lot of time observing and signifying power laws. We are currently in the modeling stage. –Many, many possible models. –I’ll talk about some of my favorites later on. We need to now put much more focus on validation and control. –And these are specific areas where computer science has much to contribute!

7 History In the 1990s, the abundance of observed power laws in networks surprised the community. –Perhaps it shouldn’t have… power laws appear frequently throughout the sciences. Pareto: income distribution, 1897. Zipf-Auerbach: city sizes, 1913/1940s. Zipf-Estoup: word frequency, 1916/1940s. Lotka: bibliometrics, 1926. Yule: species and genera. Mandelbrot: economics/information theory, 1950s+. Observation/signification were/are key to initial understanding. My claim: by now the mere existence of power laws should not be surprising, or necessarily even noteworthy. My (biased) opinion: The bar should now be very high for observation/signification.

8 Models After observation, the natural step is to explain/model the behavior. Outcome: lots of modeling papers. –And many models rediscovered. Big survey article: Lots of history…

9 Preferential Attachment Consider a dynamic Web graph. –Pages join one at a time. –Each page has one outlink. Let $X_j(t)$ be the number of pages of degree j at time t. New page links: –With probability $\alpha$, link to a random page. –With probability $(1-\alpha)$, link to a page chosen proportionally to indegree. (Copy a link.)
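A minimal simulation sketch of the process described on this slide. The function name, the seeding choice (page 0 links to itself), and the parameter values are illustrative assumptions, not part of the talk. "Copy a link" is implemented by picking a uniformly random existing page and reusing its outlink target, which selects a target with probability proportional to indegree because every page has exactly one outlink.

```python
import random
from collections import Counter


def simulate_copy_model(n_pages, alpha, seed=0):
    """Grow a graph in which each new page adds exactly one outlink.

    With probability alpha the link goes to a uniformly random existing page;
    otherwise the new page copies the outlink of a uniformly random existing
    page (equivalent to choosing a target proportionally to indegree).
    Returns the list of indegrees, indexed by page.
    """
    rng = random.Random(seed)
    target = [0]      # assumption: page 0 starts with a self-link
    indegree = [1]
    for page in range(1, n_pages):
        other = rng.randrange(page)  # uniformly random existing page
        dest = other if rng.random() < alpha else target[other]
        target.append(dest)
        indegree.append(0)
        indegree[dest] += 1
    return indegree


indeg = simulate_copy_model(50_000, alpha=0.5)
hist = Counter(indeg)               # hist[j] = number of pages with indegree j
frac_zero = hist[0] / len(indeg)    # fraction of pages nobody links to
```

Plotting `hist` on log-log axes is the usual way to eyeball the tail; the sketch stops short of plotting to stay dependency-free.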

10 Simple Analysis Assume limiting distribution where
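A sketch of the standard rate-equation argument for the model on the previous slide, under stated assumptions: $\alpha$ denotes the probability of linking to a random page, "degree" is read as indegree, and at time $t$ there are $t$ pages. This is a reconstruction of the textbook derivation, not necessarily the exact equations shown in the talk.

```latex
\begin{align*}
\frac{dX_0}{dt} &= 1 - \frac{\alpha X_0}{t}, \\
\frac{dX_j}{dt} &= \frac{\alpha X_{j-1} + (1-\alpha)(j-1)X_{j-1}
                   - \alpha X_j - (1-\alpha)\,j\,X_j}{t}, \qquad j \ge 1.
\end{align*}
% Assume a limiting distribution with X_j(t) = c_j t. Substituting:
\begin{align*}
c_0 &= \frac{1}{1+\alpha}, \\
c_j\bigl(1 + \alpha + j(1-\alpha)\bigr) &= c_{j-1}\bigl(\alpha + (j-1)(1-\alpha)\bigr), \\
\frac{c_j}{c_{j-1}} &= 1 - \frac{2-\alpha}{1+\alpha+j(1-\alpha)}
  \;\sim\; 1 - \frac{2-\alpha}{1-\alpha}\cdot\frac{1}{j}, \\
c_j &\sim c\, j^{-(2-\alpha)/(1-\alpha)} .
\end{align*}
```

The last line is a power law in $j$ whose exponent depends only on $\alpha$.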

11 Preferential Attachment History The previous analysis was derived in the 1950’s by Herbert Simon. –… who won a Nobel Prize in economics for entirely different work. –His analysis was not for Web graphs, but for other preferential attachment problems.

12 Optimization Model: Power Law Mandelbrot experiment: design a language over a d-ary alphabet to optimize information per character. –Probability of jth most frequently used word is $p_j$. –Length of jth most frequently used word is $c_j$. Average information per word: $H = -\sum_j p_j \log_2 p_j$. Average characters per word: $C = \sum_j p_j c_j$.

13 Optimization Model: Power Law Optimize ratio A = C/H.
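A hedged sketch of how the optimization leads to a power law, taking $H = -\sum_j p_j \log_2 p_j$ (average information per word) and $C = \sum_j p_j c_j$ (average characters per word). It follows the usual heuristic treatment, which differentiates in each $p_j$ without enforcing normalization and assumes the $j$th most frequent word is assigned the $j$th shortest string, so $c_j \approx \log_d j$.

```latex
\begin{align*}
\frac{\partial A}{\partial p_j}
  &= \frac{c_j H + C \log_2 (e\, p_j)}{H^2} = 0
  \quad\Longrightarrow\quad
  p_j = \frac{2^{-H c_j / C}}{e}, \\
c_j \approx \log_d j
  &\quad\Longrightarrow\quad
  p_j \approx \frac{1}{e}\, j^{-(H/C)\log_d 2}.
\end{align*}
```

So the word frequencies decay polynomially in the rank $j$.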

14 Monkeys Typing Randomly Miller (psychologist, 1957) suggests the following: monkeys type randomly at a keyboard. –Hit each of n characters with probability p. –Hit space bar with probability 1 - np > 0. –A word is a sequence of characters separated by a space. Resulting distribution of word frequencies follows a power law. Conclusion: Mandelbrot’s “optimization” is not required for languages to have a power law.

15 Miller’s Argument All words with k letters appear with prob. $p^k(1-np)$. There are $n^k$ words of length k. –Words of length k have frequency ranks following those of all shorter words, i.e. ranks on the order of $n^k$. Manipulation yields power law behavior. Recently extended by Conrad, Mitzenmacher to the case of unequal letter probabilities. –Non-trivial: utilizes complex analysis.
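The "manipulation" can be sketched as follows, under the assumption that only nonempty words are ranked (counting the empty word shifts the ranks by one and does not change the exponent). Here $q(r)$ is introduced as notation for the frequency of the word with rank $r$.

```latex
\begin{align*}
\text{ranks of length-}k\text{ words:}\quad
  & \frac{n^k - n}{n-1} \;<\; r \;\le\; \frac{n^{k+1} - n}{n-1},
  \qquad\text{so } k = \log_n r + O(1), \\
\text{frequency at rank } r:\quad
  & q(r) = p^k (1 - np) = \Theta\!\left(p^{\log_n r}\right)
         = \Theta\!\left(r^{\log_n p}\right).
\end{align*}
```

Since $np < 1$ implies $\log_n p < -1$, the frequency falls off as a power of the rank with exponent steeper than $-1$.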

16 Multiplicative Models Start with an organism of size $X_0$. At each time step, size changes by a random multiplicative factor: $X_t = F_t X_{t-1}$. If the $F_t$ are independent, identically distributed then (by the CLT applied to $\log X_t = \log X_0 + \sum_{s \le t} \log F_s$) $X_t$ converges to a lognormal distribution. BUT! if there exists a lower bound $\epsilon$, so that $X_t = \max(\epsilon, F_t X_{t-1})$, then $X_t$ converges to a power law distribution. (Champernowne, 1953)
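A small simulation sketch contrasting the two cases on this slide. The lognormal choice for the multiplicative factor, the parameter values, and the function name are illustrative assumptions. A negative drift in log-space is used so that the bounded process settles into a stationary distribution.

```python
import math
import random


def simulate_multiplicative(n_paths, n_steps, mu, sigma, lower_bound=None, seed=0):
    """Return final sizes for n_paths independent runs starting from size 1.

    Each step multiplies the size by F = exp(Z), Z ~ Normal(mu, sigma^2).
    If lower_bound is given, the size is reset to max(lower_bound, F * size).
    The walk is tracked in log-space for numerical stability.
    """
    rng = random.Random(seed)
    log_floor = math.log(lower_bound) if lower_bound is not None else None
    finals = []
    for _ in range(n_paths):
        log_x = 0.0
        for _ in range(n_steps):
            log_x += rng.gauss(mu, sigma)
            if log_floor is not None and log_x < log_floor:
                log_x = log_floor  # reflect at the lower bound
        finals.append(math.exp(log_x))
    return finals


free = simulate_multiplicative(2000, 100, mu=-0.1, sigma=0.5)
bounded = simulate_multiplicative(2000, 100, mu=-0.1, sigma=0.5, lower_bound=1.0)
mean_log_free = sum(math.log(x) for x in free) / len(free)
```

Without the bound, `log(x)` is a sum of independent normal steps, so the sizes are lognormal. With the bound, the log-size is a reflected random walk; for Gaussian steps its stationary tail decays roughly like $x^{-2|\mu|/\sigma^2}$, a heuristic that can be checked by plotting the empirical ccdf of `bounded` on log-log axes.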

17 Double Pareto Distributions Consider continuous version of multiplicative generative model. –At time t, $\log X_t$ is normal with mean $\mu t$ and variance $\sigma^2 t$. Suppose observation time is randomly distributed. –Income model: observation time depends on age, generations in the country, etc.

18 Double Pareto Distributions Reed (2000, 2001) analyzes the case where the observation time is exponentially distributed. –Also Adamic, Huberman (1999). Simplest case:
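A worked instance under assumed parameters: drift $\mu = 0$, variance rate $\sigma^2 = 1$, and observation time $T$ exponentially distributed with rate $\lambda$ (the symbol $\lambda$ and these values are assumptions chosen for illustration). Mixing the lognormal density over $T$ gives a closed form.

```latex
\begin{align*}
f(x) &= \int_0^\infty \lambda e^{-\lambda t}\,
        \frac{1}{x\sqrt{2\pi t}}\, e^{-(\ln x)^2 / 2t}\, dt \\
     &= \begin{cases}
          \sqrt{\lambda/2}\; x^{-1-\sqrt{2\lambda}}, & x \ge 1, \\[4pt]
          \sqrt{\lambda/2}\; x^{-1+\sqrt{2\lambda}}, & x \le 1.
        \end{cases}
\end{align*}
```

The density is a power law on each side of $x = 1$, with different exponents, which is the "double Pareto" shape.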

19 Double Pareto Behavior Double Pareto behavior, density –On log-log plot, density is two straight lines –Between lognormal (curved) and power law (one line) Can have lognormal shaped body, Pareto tail. –The ccdf has Pareto tail; linear on log-log plots. –But cdf is also linear on log-log plots. Mitzenmacher gives a natural generative model that yields double Pareto distributions based on recursive trees.
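A sampling sketch that checks the two-sided power law numerically, assuming zero drift, unit variance rate, and an exponentially distributed observation time with rate `lam` (all illustrative assumptions). Under those assumptions the ccdf is $\tfrac12 x^{-\sqrt{2\lambda}}$ for $x \ge 1$ and the cdf is $\tfrac12 x^{\sqrt{2\lambda}}$ for $x \le 1$.

```python
import math
import random


def sample_double_pareto(n, lam, seed=0):
    """Sample X = exp(B_T): Brownian log-size observed at time T ~ Exp(lam)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        t = rng.expovariate(lam)                # random observation time
        log_x = rng.gauss(0.0, math.sqrt(t))    # log-size is Normal(0, t)
        samples.append(math.exp(log_x))
    return samples


def empirical_ccdf(samples, x):
    """Fraction of samples strictly greater than x."""
    return sum(1 for s in samples if s > x) / len(samples)


def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


xs = sample_double_pareto(100_000, lam=2.0)
upper = empirical_ccdf(xs, 2.0)   # theory: 0.5 * 2**-2 = 0.125
lower = empirical_cdf(xs, 0.5)    # theory: 0.5 * 0.5**2 = 0.125
```

Both tails being straight on a log-log plot is what distinguishes this shape from a lognormal (curved) and from a single Pareto (one line).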

20 And So Many More… New variations coming up all of the time. Question : What makes a new power law model sufficiently interesting to merit attention and/or publication? –Strong connection to an observed process. Many models claim this, but few demonstrate it convincingly. (Willinger’s talk.) –Theory perspective: new mathematical insight or sophistication. My (biased) opinion: the bar should start being raised on model papers.

21 Validation: The Current Stage We now have so many models. It may be important to know the right model, to extrapolate and control future behavior. Given a proposed underlying model, we need tools to help us validate it. We appear to be entering the validation stage of research…. BUT the first steps have focused on invalidation rather than validation.

22 Examples: Invalidation Lakhina, Byers, Crovella, Xie –Show that the observed power law of Internet topology might be due to biases in traceroute sampling. Chen, Chang, Govindan, Jamin, Shenker, Willinger –Show that Internet topology has characteristics that do not match preferential-attachment graphs. –Suggest an alternative mechanism. But does this alternative match all characteristics, or are we still missing some?

23 My (Biased) View Invalidation is an important part of the process! BUT it is inherently different from validating a model. Validating seems much harder. Indeed, it is arguable what constitutes a validation. Question: what should it mean to say “This model is consistent with observed data”?

24 Trace Analysis Many models posit some sort of actions. –New pages linking to pages in the Web. –New routers joining the network. –New files appearing in a file system. A validation approach: gather traces and see if the traces suitably match the model. –Trace gathering can be a challenging systems problem. –Checking the match to the model requires appropriate statistical techniques and tests. –May lead to new, improved, better justified models.
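One concrete example of an "appropriate statistical technique", offered as a sketch rather than as the method the talk has in mind: fit a continuous power-law exponent by maximum likelihood and measure goodness of fit with a Kolmogorov-Smirnov distance. The synthetic data stands in for a real trace; the function names, `x_min`, and the true exponent are illustrative assumptions.

```python
import math
import random


def sample_pareto(n, alpha, x_min, seed=0):
    """Draw n samples with density proportional to x**(-alpha) for x >= x_min."""
    rng = random.Random(seed)
    # Inverse-transform sampling; 1 - random() lies in (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_exponent_mle(samples, x_min):
    """Maximum-likelihood exponent for a continuous power law above x_min."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


def ks_distance(samples, alpha, x_min):
    """Largest gap between the empirical cdf and the fitted power-law cdf."""
    tail = sorted(x for x in samples if x >= x_min)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst


data = sample_pareto(50_000, alpha=2.5, x_min=1.0)
alpha_hat = fit_exponent_mle(data, x_min=1.0)
distance = ks_distance(data, alpha_hat, x_min=1.0)
```

A small KS distance only says the data are not inconsistent with a power law; comparing against alternatives such as a lognormal fit is a separate step, which is part of why validation is harder than invalidation.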

25 Sampling and Trace Analysis Often, we cannot record all actions. –Internet is too big! Sampling –Global: snapshots of entire system at various times. –Local: record actions of sample agents in a system. Examples: –Snapshots of file systems: full systems vs. actions of individual users. –Router topology: Internet maps vs. changes at a subset of routers. Question: how much/what kind of sampling is sufficient to validate a model appropriately? –Does this differ among models?

26 To Control In many systems, intervention can impact the outcome. –Maybe not for earthquakes, but for computer networks! –Typical setting: individual agents acting in their own best interest, giving a global power law. Agents can be given incentives to change behavior. General problem: given a good model, determine how to change system behavior to optimize a global performance function. –Distributed algorithmic mechanism design. –Mix of economics/game theory and computer science.

27 Possible Control Approaches Adding constraints: local or global –Example: total space in a file system. –Example: preferential attachment but links limited by an underlying metric. Add incentives or costs –Example: charges for exceeding soft disk quotas. –Example: payments for certain AS level connections. Limiting information –Impact decisions by not letting everyone have true view of the system.

28 Conclusion : My (Biased) View There are 5 stages of power law research. 1)Observe: Gather data to demonstrate power law behavior in a system. 2)Signify: Explain the import of this observation in the system context. 3)Model: Propose an underlying model for the observed behavior of the system. 4)Validate: Find data to validate (and if necessary specialize or modify) the model. 5)Control: Design ways to control and modify the underlying behavior of the system based on the model. We need to focus on validation and control. –Lots of open research problems.

29 A Chance for Collaboration The observe/signify stages of research are dominated by systems; modeling dominated by theory. Validation and control require a strong theoretical foundation. –Need universal ideas and methods that span different types of systems. –Need understanding of underlying mathematical models. But also a large systems buy-in. –Getting/analyzing/understanding data. –Find avenues for real impact. Good area for future systems/theory collaboration and interaction.