CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #13: Power laws Potential causes and explanations C. Faloutsos.

Slides:



Advertisements
Similar presentations
Numbers Treasure Hunt Following each question, click on the answer. If correct, the next page will load with a graphic first – these can be used to check.
Advertisements

EcoTherm Plus WGB-K 20 E 4,5 – 20 kW.
1 A B C
AGVISE Laboratories %Zone or Grid Samples – Northwood laboratory
Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.
5.1 Rules for Exponents Review of Bases and Exponents Zero Exponents
Variations of the Turing Machine
AP STUDY SESSION 2.
1
Slide 1Fig 26-CO, p.795. Slide 2Fig 26-1, p.796 Slide 3Fig 26-2, p.797.
Slide 1Fig 25-CO, p.762. Slide 2Fig 25-1, p.765 Slide 3Fig 25-2, p.765.
STATISTICS INTERVAL ESTIMATION Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
David Burdett May 11, 2004 Package Binding for WS CDL.
NTDB ® Annual Report 2009 © American College of Surgeons All Rights Reserved Worldwide Percent of Hospitals Submitting Data to NTDB by State and.
Create an Application Title 1Y - Youth Chapter 5.
Add Governors Discretionary (1G) Grants Chapter 6.
CALENDAR.
CHAPTER 18 The Ankle and Lower Leg
The 5S numbers game..
A Fractional Order (Proportional and Derivative) Motion Controller Design for A Class of Second-order Systems Center for Self-Organizing Intelligent.
Media-Monitoring Final Report April - May 2010 News.
Break Time Remaining 10:00.
The basics for simulations
Factoring Quadratics — ax² + bx + c Topic
EE, NCKU Tien-Hao Chang (Darby Chang)
A sample problem. The cash in bank account for J. B. Lindsay Co. at May 31 of the current year indicated a balance of $14, after both the cash receipts.
Turing Machines.
PP Test Review Sections 6-1 to 6-6
Regression with Panel Data
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
Biology 2 Plant Kingdom Identification Test Review.
Chapter 1: Expressions, Equations, & Inequalities
Adding Up In Chunks.
FAFSA on the Web Preview Presentation December 2013.
MaK_Full ahead loaded 1 Alarm Page Directory (F11)
1 Termination and shape-shifting heaps Byron Cook Microsoft Research, Cambridge Joint work with Josh Berdine, Dino Distefano, and.
Artificial Intelligence
When you see… Find the zeros You think….
Before Between After.
Slide R - 1 Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Prentice Hall Active Learning Lecture Slides For use with Classroom Response.
12 October, 2014 St Joseph's College ADVANCED HIGHER REVISION 1 ADVANCED HIGHER MATHS REVISION AND FORMULAE UNIT 2.
Subtraction: Adding UP
: 3 00.
5 minutes.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
1 hi at no doifpi me be go we of at be do go hi if me no of pi we Inorder Traversal Inorder traversal. n Visit the left subtree. n Visit the node. n Visit.
1 Let’s Recapitulate. 2 Regular Languages DFAs NFAs Regular Expressions Regular Grammars.
Static Equilibrium; Elasticity and Fracture
FIGURE 12-1 Op-amp symbols and packages.
Converting a Fraction to %
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
CSE20 Lecture 15 Karnaugh Maps Professor CK Cheng CSE Dept. UC San Diego 1.
Clock will move after 1 minute
famous photographer Ara Guler famous photographer ARA GULER.
Copyright © 2013 Pearson Education, Inc. All rights reserved Chapter 11 Simple Linear Regression.
Lial/Hungerford/Holcomb/Mullins: Mathematics with Applications 11e Finite Mathematics with Applications 11e Copyright ©2015 Pearson Education, Inc. All.
Physics for Scientists & Engineers, 3rd Edition
Select a time to count down from the clock above
Copyright Tim Morris/St Stephen's School
1.step PMIT start + initial project data input Concept Concept.
9. Two Functions of Two Random Variables
A Data Warehouse Mining Tool Stephen Turner Chris Frala
1 Dr. Scott Schaefer Least Squares Curves, Rational Representations, Splines and Continuity.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
CMU SCS Carnegie Mellon Univ. School of Computer Science /615 - DB Applications C. Faloutsos & A. Pavlo Lecture #4: Relational Algebra.
15-826: Multimedia Databases and Data Mining
Presentation transcript:

CMU SCS : Multimedia Databases and Data Mining Lecture #13: Power laws Potential causes and explanations C. Faloutsos

CMU SCS Copyright: C. Faloutsos (2012)2 Must-read Material Mark E.J. Newman: Power laws, Pareto distributions and Zipfs law, Contemporary Physics 46, (2005), or

CMU SCS Copyright: C. Faloutsos (2012)3 Optional Material (optional, but very useful: Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991) – ch. 15.

CMU SCS Copyright: C. Faloutsos (2012)4 Outline Goal: Find similar / interesting things Intro to DB Indexing - similarity search Data Mining

CMU SCS Copyright: C. Faloutsos (2012)5 Indexing - Detailed outline primary key indexing secondary key / multi-key indexing spatial access methods –z-ordering –R-trees –misc fractals –intro –applications text...

CMU SCS Copyright: C. Faloutsos (2012)6 Indexing - Detailed outline fractals –intro –applications disk accesses for R-trees (range queries) … dim. curse revisited … –Why so many power laws?

CMU SCS Copyright: C. Faloutsos (2012)7 This presentation Definitions Clarification: 3 forms of P.L. Examples and counter-examples Generative mechanisms

CMU SCS Copyright: C. Faloutsos (2012)8 Definition p(x) = C x ^ (-a) (x >= xmin) Eg., prob( city pop. between x + dx) log (x) log(p(x)) log(xmin)

CMU SCS Copyright: C. Faloutsos (2012)9 For discrete variables Or, the Yule distribution:

CMU SCS Copyright: C. Faloutsos (2012)10 [Newman, 2005]

CMU SCS Copyright: C. Faloutsos (2012)11 Estimation for a

CMU SCS Copyright: C. Faloutsos (2012)12 This presentation Definitions Clarification: 3 forms of P.L. Examples and counter-examples Generative mechanisms

CMU SCS Jumping to the conclusion: Copyright: C. Faloutsos (2011)13

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)14 NCDF = CCDF x Prob( area >= x ) -a PDF = frequency-count plot x Prob( area = x ) -a-1 Zipf plot = Rank-frequency -1/a area rank IF ONE PLOT IS P.L., SO ARE THE OTHER TWO

CMU SCS Details, and proof sketches: Copyright: C. Faloutsos (2011)15

CMU SCS Copyright: C. Faloutsos (2011)16 More power laws: areas – Korcaks law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area) Reminder Vaenern

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)17 NCDF = CCDF x Prob( area >= x ) -a

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)18 NCDF = CCDF x Prob( area >= x ) -a PDF x Prob( area = x )

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)19 NCDF = CCDF x Prob( area >= x ) -a PDF x Prob( area = x ) -a-1

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)20 NCDF = CCDF x Prob( area >= x ) -a PDF x Prob( area = x ) -a-1

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)21 NCDF = CCDF x Prob( area >= x ) -a PDF x Prob( area = x ) -a-1 Zipf plot = Rank-frequency

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)22 NCDF = CCDF x Prob( area >= x ) -a PDF x Prob( area = x ) -a-1 Zipf plot = Rank-frequency -1/a area rank

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)23 NCDF = CCDF x Prob( area >= x ) -a PDF x Prob( area = x ) -a-1 Zipf plot = Rank-frequency -1/a area rank

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)24 NCDF = CCDF x Prob( area >= x ) -a PDF x Prob( area = x ) -a-1 Zipf plot = Rank-frequency -1/a area rank

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)25 NCDF = CCDF x Prob( area >= x ) -a PDF x Prob( area = x ) -a-1 Zipf plot = Rank-frequency -1/a frequency rank

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)26 NCDF = CCDF x Prob( area >= x ) -a PDF = frequency-count plot x Prob( area = x ) -a-1 Zipf plot = Rank-frequency -1/a area rank

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)27 NCDF = CCDF x Prob( area >= x ) -a PDF = frequency-count plot frequency count -a-1 Zipf plot = Rank-frequency -1/a frequency rank the

CMU SCS Sanity check: Zipf showed that if –Slope of rank-frequency is -1 –Then slope of freq-count is -2 Check it! Copyright: C. Faloutsos (2011)28

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)29 NCDF = CCDF x Prob( area >= x ) -a PDF = frequency-count plot frequency count -a-1 Zipf plot = Rank-frequency -1/a frequency rank the slope = -1 slope = -2

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)30 NCDF = CCDF x Prob( area >= x ) -a PDF = frequency-count plot frequency count -a-1 Zipf plot = Rank-frequency -1/a frequency rank the slope = -1 slope = -2

CMU SCS 3 versions of P.L Copyright: C. Faloutsos (2011)31 NCDF = CCDF x Prob( area >= x ) -a PDF = frequency-count plot x Prob( area = x ) -a-1 Zipf plot = Rank-frequency -1/a area rank IF ONE PLOT IS P.L., SO ARE THE OTHER TWO

CMU SCS Copyright: C. Faloutsos (2012)32 This presentation Definitions Clarification: 3 forms of P.L. Examples and counter-examples Generative mechanisms

CMU SCS Copyright: C. Faloutsos (2012)33 Examples Word frequencies Citations of scientific papers Web hits Copies of books sold Magnitude of earthquakes Diameter of moon craters …

CMU SCS Copyright: C. Faloutsos (2012)34 [Newman 2005] Rank-frequency plots Or Cumulative D.F.

CMU SCS Copyright: C. Faloutsos (2012)35 NOT following P.L. Cumul. D.F. Size of forest fires Number of addresses abundance of species

CMU SCS Copyright: C. Faloutsos (2012)36 This presentation Definitions - clarification Examples and counter-examples Generative mechanisms –Combination of exponentials –Inverse –Random walk –Yule distribution = CRP –Percolation –Self-organized criticality –Other

CMU SCS Copyright: C. Faloutsos (2012)37 Combination of exponentials Let p(y) = e ay eg., radioactive decay, with half-life –a (= collection of people, playing russian roulette) Let x ~ e by (every time a person survives, we double his capital) p(x) = p(y)*dy/dx = 1/b x (-1+a/b) Ie, the final capital of each person follows P.L.

CMU SCS Copyright: C. Faloutsos (2012)38 Combination of exponentials Monkey on a typewriter: m=26 letters equiprobable; space bar has prob. q s THEN: Freq( x-th most frequent word) = x (-a) see Eq. 47 of [Newman]: a = [2 ln(m) – ln (1 –q s )] / [ln m – ln (1 – q s )]

CMU SCS Copyright: C. Faloutsos (2012)39 This presentation Definitions Clarification Examples and counter-examples Generative mechanisms –Combination of exponentials –Inverse –Random walk –Yule distribution = CRP –Percolation –Self-organized criticality –Other

CMU SCS Copyright: C. Faloutsos (2012)40 Inverses of quantities y follows p(y) and goes through zero x = 1/y Then p(x) = … = - p(y) / x 2 For y~0, x has power law tail. y-> speed x-> travel time

CMU SCS Copyright: C. Faloutsos (2012)41 This presentation Definitions Clarification Examples and counter-examples Generative mechanisms –Combination of exponentials –Inverse –Random walk –Yule distribution = CRP –Percolation –Self-organized criticality –Other

CMU SCS Copyright: C. Faloutsos (2012)42 Random walks Inter-arrival times PDF: p(t) ~ t -3/2 ??

CMU SCS Copyright: C. Faloutsos (2012)43 Random walks Inter-arrival times PDF: p(t) ~ t -3/2

CMU SCS Copyright: C. Faloutsos (2012)44 Random walks J. G. Oliveira & A.-L. Barabási Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005). [PDF]PDF

CMU SCS Copyright: C. Faloutsos (2012)45 This presentation Definitions - clarification Examples and counter-examples Generative mechanisms –Combination of exponentials –Inverse –Random walk –Yule distribution = CRP –Percolation –Self-organized criticality –Other

CMU SCS Copyright: C. Faloutsos (2012)46 Yule distribution and CRP Chinese Restaurant Process (CRP): Newcomer to a restaurant Joins an existing table (preferring large groups Or starts a new table/group of its own, with prob 1/m a.k.a.: rich get richer; Yule process

CMU SCS Copyright: C. Faloutsos (2012)47 Yule distribution and CRP Then: Prob( k people in a group) = p k = (1 + 1/m) B( k, 2+1/m) ~ k -(2+1/m) (since B(a,b) ~ a ** (-b) : power law tail)

CMU SCS Copyright: C. Faloutsos (2012)48 Yule distribution and CRP Yule process Gibrat principle Matthew effect Cumulative advantage Preferential attachement rich get richer

CMU SCS Copyright: C. Faloutsos (2012)49 This presentation Definitions - clarification Examples and counter-examples Generative mechanisms –Combination of exponentials –Inverse –Random walk –Yule distribution = CRP –Percolation –Self-organized criticality –Other

CMU SCS Copyright: C. Faloutsos (2012)50 Percolation and forest fires A burning tree will cause its neighbors to burn next. Which tree density p will cause the fire to last longest?

CMU SCS Copyright: C. Faloutsos (2012)51 Percolation and forest fires density 0 1 Burning time N N ?

CMU SCS Copyright: C. Faloutsos (2012)52 Percolation and forest fires density 0 1 Burning time N N

CMU SCS Copyright: C. Faloutsos (2012)53 Percolation and forest fires density 0 1 Burning time N N Percolation threshold, pc ~ 0.593

CMU SCS Copyright: C. Faloutsos (2012)54 Percolation and forest fires At pc ~ 0.593: No characteristic scale; patches of all sizes; Korcak-like law.

CMU SCS Copyright: C. Faloutsos (2012)55 This presentation Definitions - clarification Examples and counter-examples Generative mechanisms –Combination of exponentials –Inverse –Random walk –Yule distribution = CRP –Percolation –Self-organized criticality –Other

CMU SCS Copyright: C. Faloutsos (2012)56 Self-organized criticality Trees appear at random (eg., seeds, by the wind) Fires start at random (eg., lightning) Q1: What is the distribution of size of forest fires?

CMU SCS Copyright: C. Faloutsos (2012)57 Self-organized criticality A1: Power law-like Area of cluster s CCDF

CMU SCS Copyright: C. Faloutsos (2012)58 Self-organized criticality Trees appear at random (eg., seeds, by the wind) Fires start at random (eg., lightning) Q2: what is the average density?

CMU SCS Copyright: C. Faloutsos (2012)59 Self-organized criticality A2: the critical density pc ~ 0.593

CMU SCS Copyright: C. Faloutsos (2012)60 Self-organized criticality [Bak]: size of avalanches ~ power law: Drop a grain randomly on a grid It causes an avalanche if height(x,y) is >1 higher than its four neighbors [Per Bak: How Nature works, 1996]

CMU SCS Copyright: C. Faloutsos (2012)61 This presentation Definitions - clarification Examples and counter-examples Generative mechanisms –Combination of exponentials –Inverse –Random walk –Yule distribution = CRP –Percolation –Self-organized criticality –Other

CMU SCS Copyright: C. Faloutsos (2012)62 Other Random multiplication Fragmentation -> lead to lognormals (~ look like power laws)

CMU SCS Copyright: C. Faloutsos (2012)63 Others Random multiplication: Start with C dollars; put in bank Random interest rate s(t) each year t Each year t: C(t) = C(t-1) * (1+ s(t)) Log(C(t)) = log( C ) + log(..) + log(..) … -> Gaussian

CMU SCS Copyright: C. Faloutsos (2012)64 Others Random multiplication: Log(C(t)) = log( C ) + log(..) + log(..) … -> Gaussian Thus C(t) = exp( Gaussian ) By definition, this is Lognormal

CMU SCS Copyright: C. Faloutsos (2012)65 Others Lognormal: $ = e h pdf 0 h = body height

CMU SCS Copyright: C. Faloutsos (2012)66 Others Lognormal: log ($) log(pdf) parabola

CMU SCS Copyright: C. Faloutsos (2012)67 Others Lognormal: log ($) log(pdf) parabola 1c

CMU SCS Copyright: C. Faloutsos (2012)68 Other Random multiplication Fragmentation -> lead to lognormals (~ look like power laws)

CMU SCS Copyright: C. Faloutsos (2012)69 Other Stick of length 1 Break it at a random point x (0<x<1) Break each of the pieces at random Resulting distribution: lognormal (why?)

CMU SCS Fragmentation -> lognormal Copyright: C. Faloutsos (2012)70 … p11-p1 p1 * … …

CMU SCS Copyright: C. Faloutsos (2012)71 Conclusions Power laws and power-law like distributions appear often (fractals/self similarity -> power laws) Exponentiation/inversion Yule process / CRP / rich get richer Criticality/percolation/phase transitions Fragmentation -> lognormal ~ P.L.

CMU SCS Copyright: C. Faloutsos (2011)72 References Zipf, Power-laws, and Pareto - a ranking tutorial, Lada A. Adamicwww.hpl.hp.com/research/idl/paper s/ranking/ranking.htmlwww.hpl.hp.com/research/idl/paper s/ranking/ranking.html L.A. Adamic and B.A. Huberman, 'Zipfs law and the Internet', Glottometrics 3, 2002, 'Zipfs law and the Internet' Human Behavior and Principle of Least Effort, G.K. Zipf, Addison Wesley (1949)