Kolmogorov complexity and its applications Paul Vitanyi CWI & University of Amsterdam Microsoft Intractability Workshop, 5-7 July 2010

1. Intuition & history
What is the information content of an individual string?
111…1 (n 1s)
π = 3.1415926…
Champernowne's number 0.1234567891011121314… is normal in base 10 (every block of digits of a given length has the same limiting frequency).
All these numbers share one commonality: there are small programs to generate them. Shannon's information theory does not help here.
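
To make the point concrete, here is a minimal Python sketch (my own illustration, not from the slides) that prints the first n digits of Champernowne's number; the program stays a few lines long no matter how large n is, which is exactly why the string it produces carries little information.

```python
def champernowne_digits(n):
    """Return the first n decimal digits of Champernowne's number
    0.123456789101112... obtained by concatenating the positive integers."""
    digits = []
    k = 1
    while len(digits) < n:
        digits.extend(str(k))
        k += 1
    return "".join(digits[:n])

print(champernowne_digits(30))  # 123456789101112131415161718192
```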

Andrey Nikolaevich Kolmogorov (born 1903, Tambov, Russia; died 1987, Moscow). Contributions: measure theory, probability, analysis, intuitionistic logic, cohomology, dynamical systems, hydrodynamics, Kolmogorov complexity.

Example: Randomness. Bob proposes to flip a coin with Alice: Alice wins a dollar on Heads; Bob wins a dollar on Tails. Result: TTTTTT…, 100 Tails in a row. Alice lost $100. She feels cheated.

Alice goes to court. Alice complains: T^100 is not random. Bob asks Alice to produce a random coin-flip sequence. Alice flips her coin and gets THTTHHTHTHHHTTTTH… But Bob claims Alice's sequence has probability 2^-100, and so does his. How do we define randomness?

2. Roots of Kolmogorov complexity and preliminaries
(1) Foundations of probability.
P. Laplace: a sequence is extraordinary (nonrandom) because it contains regularity, which is rare.
von Mises' notion of a random sequence S: lim_{n→∞} #(1s in the n-prefix of S)/n = p, with 0 < p < 1, and the same holds for any subsequence of S selected by an admissible selection rule.
If an 'admissible rule' may be any partial function, then there are no random sequences.
A. Wald: restrict to countably many admissible selection rules; then random sequences exist.
A. Church: use recursive selection functions.
J. Ville: a von Mises-Wald-Church random sequence does not satisfy all laws of randomness.

Roots … (2) Information theory. Shannon's theory is about an ensemble. But what is the information in an individual object? (3) Inductive inference. The Bayesian approach using a universal prior distribution. (4) Shannon's state x symbol (Turing machine) complexity.

Preliminaries and notation. Strings: x, y, z, usually binary. x = x_1 x_2 … denotes an infinite binary sequence; x_{i:j} = x_i x_{i+1} … x_j; |x| is the number of bits in x. Sets: A, B, C, …; |A| is the number of elements in set A. I assume you know Turing machines, universal TMs, and the basic facts about them.

3. Mathematical theory. Solomonoff (1960), Kolmogorov (1965), Chaitin (1969): the amount of information in a string is the size of the smallest program for an optimal universal TM U that generates the string:
C_U(x) = min_p { |p| : U(p) = x }.
Invariance Theorem: it does not matter which optimal universal Turing machine U we choose, i.e. all universal encoding methods are fine.

Proof of the Invariance Theorem. Fix an effective enumeration of all Turing machines (TMs): T_1, T_2, …
Define C_T(x) = min_p { |p| : T(p) = x }.
Let U be an optimal universal TM with U(1^n 0 p) = T_n(p) for all n and p.
Then for all x: C_U(x) ≤ C_{T_n}(x) + n + 1, and for any two optimal universal TMs U and U', |C_U(x) − C_{U'}(x)| ≤ c. Fixing U, we write C(x) instead of C_U(x). QED
Formal statement of the Invariance Theorem: there exists a computable function S_0 such that for all computable functions S there is a constant c_S such that for all strings x ∈ {0,1}*, C_{S_0}(x) ≤ C_S(x) + c_S.
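
The self-delimiting encoding 1^n 0 p used in the proof can be illustrated directly. The sketch below (hypothetical helper names, not from the slides) packs a machine index n and a program p into one string and unpacks it again, which is all the universal machine U needs in order to dispatch p to T_n.

```python
def encode(n, p):
    """Encode machine index n and program bits p as 1^n 0 p."""
    return "1" * n + "0" + p

def decode(s):
    """Recover (n, p) from 1^n 0 p: count the leading 1s up to the first 0."""
    n = 0
    while s[n] == "1":
        n += 1
    return n, s[n + 1:]

assert decode(encode(5, "10110")) == (5, "10110")
# The overhead is n + 1 bits, matching C_U(x) <= C_{T_n}(x) + n + 1.
```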

It has many applications.
Mathematics: probability theory, logic, statistics.
Physics: chaos, thermodynamics.
Computer science: average-case analysis, inductive inference and learning, shared information between documents, data mining and clustering, the incompressibility method. Examples: the prime number theorem, Gödel's incompleteness, Shellsort average case, Heapsort average case, circuit complexity, lower bounds on combinatorics, graphs, Turing machine computations, formal languages, communication complexity, routing.
Philosophy, biology, cognition, etc.: randomness, inference, learning, complex systems, sequence similarity.
Information theory: information in individual objects, information distance. Classifying objects: documents, genomes. Query-answering systems.

Mathematical theory, cont. Intuitively: C(x) = length of the shortest effective description of x. Define conditional Kolmogorov complexity similarly: C(x|y) = length of the shortest description of x given y.
Examples:
C(xx) = C(x) + O(1)
C(xy) ≤ C(x) + C(y) + O(log min{C(x), C(y)})
C(1^n) ≤ O(log n)
C(π_{1:n}) ≤ O(log n); C(π_{1:n}|n) = O(1)
For all x, C(x) ≤ |x| + O(1)
C(x|x) = O(1)
C(x|ε) = C(x); C(ε|x) = O(1)
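
C itself is not computable, but a real compressor gives a crude, computable upper-bound proxy. The following sketch (my own illustration, using zlib, not anything from the slides) shows the flavour of C(xx) = C(x) + O(1) and C(1^n) ≤ O(log n).

```python
import os
import zlib

def z(s: bytes) -> int:
    """Length of the zlib-compressed string: a rough upper-bound proxy for C."""
    return len(zlib.compress(s, 9))

x = os.urandom(1000)        # incompressible-looking data
print(z(x), z(x + x))       # z(x + x) is close to z(x), not 2 * z(x)
print(z(b"1" * 100000))     # a highly regular string compresses to almost nothing
```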

3.1 Basics. Incompressibility: for a constant c > 0, a string x ∈ {0,1}* is c-incompressible if C(x) ≥ |x| − c. For a fixed constant c, we often simply say that x is incompressible. (We will call incompressible strings random strings.)
Lemma. There are at least 2^n − 2^{n−c} + 1 c-incompressible strings of length n.
Proof. There are only Σ_{k=0}^{n−c−1} 2^k = 2^{n−c} − 1 programs of length less than n − c. Hence at most that many strings (out of the 2^n strings of length n) can have a program (description) shorter than n − c. QED
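
The counting in the lemma can be checked mechanically; a small sketch (purely illustrative, with arbitrary values of n and c) counts the programs shorter than n − c and confirms the bound.

```python
n, c = 12, 3
programs_shorter = sum(2 ** k for k in range(n - c))   # = 2^(n-c) - 1
strings_of_length_n = 2 ** n
incompressible_lower_bound = strings_of_length_n - programs_shorter
assert programs_shorter == 2 ** (n - c) - 1
assert incompressible_lower_bound == 2 ** n - 2 ** (n - c) + 1
print(incompressible_lower_bound, "of", strings_of_length_n,
      "strings of length", n, "are at least", c, "-incompressible")
```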

Facts.
If x = uvw is incompressible, then C(v) ≥ |v| − O(log |x|).
Proof. C(uvw) ≤ |uw| + C(v) + O(log |u|) + O(log C(v)), while C(uvw) ≥ |uvw| − O(1) by incompressibility; hence C(v) ≥ |v| − O(log |x|).
If p is a shortest program for x, then C(p) ≥ |p| − O(1) and C(x|p) = O(1), but C(p|x) ≤ C(|p|) + O(1) (and this bound is essentially optimal, because of the Halting Problem!).
If a subset A of {0,1}* is recursively enumerable (r.e.) (the elements of A can be listed by a Turing machine), and A is sparse (|A^{=n}| ≤ p(n) for some polynomial p), then for all x in A with |x| = n, C(x) ≤ O(log p(n)) + O(C(n)) + O(1) = O(log n).

3.2 Asymptotics. Enumerate the binary strings 0, 1, 00, 01, 10, …, mapping them to the natural numbers 0, 1, 2, 3, … Then C(x) → ∞ as x → ∞. Define m(x) to be the monotonic lower bound of the C(x) curve (viewing x as a natural number). Then m(x) → ∞ as x → ∞, and m(x) < Q(x) for every unbounded computable Q (m grows more slowly than any such Q). Nonmonotonicity: x = yz does not imply C(y) ≤ C(x) + O(1).

Graph of C(x) for integer x. Function m(x) is greatest monotonic non-decreasing lower bound.

Graph of C(x|l(x)). Function m(x) is greatest monotonic non-decreasing lower bound.

The incompressibility method: Shellsort. Shellsort uses p increments h_1, …, h_p, with h_p = 1. At the k-th pass, the array is divided into h_k separate sublists, each of length n/h_k (taking every h_k-th element), and each sublist is sorted by insertion/bubble sort. Application: sorting networks with n log^2 n comparators; easy to program; competitive for medium-size lists to be sorted.
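
For concreteness, here is a plain Python Shellsort along the lines just described (a sketch of the classical algorithm, not code from the talk): each pass with increment h does an h-gapped insertion sort, which is exactly sorting the h interleaved sublists.

```python
def shellsort(a, increments):
    """Sort list a in place using the given decreasing increments (last must be 1)."""
    for h in increments:
        # h-gapped insertion sort == insertion sort on each of the h sublists
        for i in range(h, len(a)):
            key = a[i]
            j = i
            while j >= h and a[j - h] > key:
                a[j] = a[j - h]
                j -= h
            a[j] = key
    return a

print(shellsort([5, 2, 9, 1, 7, 3, 8, 6, 4, 0], [4, 2, 1]))
```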

Shellsort history.
Invented by D.L. Shell [1959], using h_k = n/2^k for pass k; this gives a Θ(n^2) time algorithm.
Papernov and Stasevich [1965]: O(n^{3/2}) time by breaking the regularity of Shell's geometric sequence.
Pratt [1972]: all quasi-geometric sequences use O(n^{3/2}) time; Θ(n log^2 n) time for p = (log n)^2 passes with increments of the form 2^i 3^j.
Incerpi-Sedgewick, Chazelle, Plaxton, Poonen, Suel (1980s): best worst case, roughly Θ(n log^2 n / (log log n)^2).
Average case: Knuth [1970s]: Θ(n^{5/3}) for p = 2. Yao [1980]: a characterization for p = 3, no running time. Janson-Knuth [1997]: O(n^{23/15}) for p = 3. Jiang-Li-Vitanyi [J. ACM, 2000]: Ω(p n^{1+1/p}) for every p.

Shellsort average-case lower bound.
Theorem. A p-pass Shellsort has average-case running time T(n) ≥ p n^{1+1/p} (up to a constant factor).
Proof. Fix a random permutation Π with Kolmogorov complexity C(Π) ≥ n log n, and use Π as input. (We ignore the self-delimiting coding of the subparts below; the real proof uses better coding.) For pass i, let m_{i,k} be the number of steps the k-th element moves. Then T(n) = Σ_{i,k} m_{i,k}. From these m_{i,k}'s one can reconstruct the input Π, hence Σ_{i,k} log m_{i,k} ≥ C(Π) ≥ n log n. For a fixed sum, the left-hand side is maximized when all m_{i,k} are equal; call the common value m, so Σ_{i,k} m_{i,k} = pnm. Then pn log m ≥ Σ_{i,k} log m_{i,k} ≥ n log n, hence m ≥ n^{1/p}. So T(n) = pnm ≥ p n^{1+1/p}.
Corollary: p = 1 gives the Ω(n^2) average-case lower bound for Bubblesort; p = 2 gives an n^{3/2} lower bound; p = 3 gives an n^{4/3} lower bound (4/3 = 20/15, to compare with the 23/15 upper bound); and only p = Θ(log n) can give average time O(n log n).
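
The bound can also be probed empirically; the sketch below (my own experiment, with an arbitrarily chosen 3-pass geometric increment sequence) counts the element moves of a p-pass Shellsort on random permutations and compares the average with p·n^{1+1/p}.

```python
import random

def shellsort_moves(a, increments):
    """Return the total number of element moves Shellsort makes on list a."""
    moves = 0
    for h in increments:
        for i in range(h, len(a)):
            key, j = a[i], i
            while j >= h and a[j - h] > key:
                a[j] = a[j - h]
                j -= h
                moves += 1
            a[j] = key
    return moves

n, p, trials = 4096, 3, 20
increments = [n // 16, n // 256, 1][:p]   # an arbitrary geometric-style sequence
avg = sum(shellsort_moves(random.sample(range(n), n), increments)
          for _ in range(trials)) / trials
print("average moves:", avg, " lower-bound scale p*n^(1+1/p):", p * n ** (1 + 1 / p))
```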

Similarity of strings. The problem: given literal objects (binary files), determine a similarity distance matrix (distances between every pair). Applications: clustering, classification, evolutionary trees of Internet documents, computer programs, chain letters, genomes, languages, texts, music pieces, OCR, …

Normalized information distance. Definition. We define the normalized information distance
d(x,y) = max{C(x|y), C(y|x)} / max{C(x), C(y)}.
The new measure has the following properties: it satisfies the triangle inequality; it is symmetric; d(x,y) > 0 for x ≠ y and d(x,x) = 0. Hence it is a metric! But it is not computable.

Practical concerns. d(x,y) is not computable, hence we replace C(x) by Z(x), the length of the compressed version of x under a compressor Z (gzip, bzip2, PPMZ). The equivalent formula becomes
d(x,y) = (Z(xy) − min{Z(x), Z(y)}) / max{Z(x), Z(y)}.
This is a parameter-free, feature-free, alignment-free similarity method, usable for data mining, phylogeny when features are unknown, and so on.
Note: max{C(x|y), C(y|x)} = max{C(xy) − C(y), C(xy) − C(x)} = C(xy) − min{C(x), C(y)}.
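
A minimal implementation of this compression distance, assuming zlib as the compressor Z (gzip, bzip2 or PPMZ could be substituted; the example data is my own), might look as follows.

```python
import zlib

def Z(data: bytes) -> int:
    """Compressed length of data, used as a computable stand-in for C."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (Z(xy) - min(Z(x),Z(y))) / max(Z(x),Z(y))."""
    zx, zy, zxy = Z(x), Z(y), Z(x + y)
    return (zxy - min(zx, zy)) / max(zx, zy)

a = b"the quick brown fox jumps over the lazy dog " * 50
b = b"the quick brown fox jumps over the lazy cat " * 50
c = bytes(range(256)) * 10
print(ncd(a, b))   # small: a and b share almost all their structure
print(ncd(a, c))   # larger: a and c are unrelated
```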

Example: Eutherian orders. It has been a disputed issue which two of the three groups of placental mammals are closer to each other: Primates, Ferungulates, Rodents. In mtDNA, 6 proteins say primates are closer to ferungulates; 6 proteins say primates are closer to rodents. Hasegawa's group concatenated 12 mtDNA proteins from: rat, house mouse, grey seal, harbor seal, cat, white rhino, horse, finback whale, blue whale, cow, gibbon, gorilla, human, chimpanzee, pygmy chimpanzee, orangutan, Sumatran orangutan, with opossum, wallaroo and platypus as outgroup (1998), using the maximum-likelihood method in MOLPHY.

Evolutionary Tree of Mammals:

3.3 Properties. Theorem (Kolmogorov). (i) C(x) is not partially recursive; that is, there is no Turing machine M such that M accepts (x,k) if C(x) ≥ k and is undefined otherwise. (ii) However, there is a total recursive H(t,x) such that H(t+1,x) ≤ H(t,x) and lim_{t→∞} H(t,x) = C(x).
Proof. (i) If such an M exists, design M' as follows: M' simulates M on inputs (x,n) for all x with |x| = n in parallel (one step each), and outputs the first x for which M says 'yes'. Choose n >> |M'|. This gives a contradiction: C(x) ≥ n because M accepted (x,n), yet M' outputs x given only n, so C(x) ≤ |M'| + O(log n) << n. (ii) Define H(t,x) as the length of the shortest program that outputs x within t steps on the universal TM. QED

3.4 Gödel's Theorem. Theorem. The statement "x is random" (= incompressible) is undecidable for all but finitely many x.
Proof idea: suppose it were provable for infinitely many x in a sound formal system of complexity C. Search through all proofs for the first proven statement "x is random" with (1) C(x) > C + O(log |x|), and output that (first) random x. This search is a program of length C + O(log |x|), so (2) C(x) ≤ C + O(log |x|), contradicting (1). QED

3.5 Barzdin's Lemma. The characteristic sequence of a set A is the infinite binary sequence χ = χ_1 χ_2 … with χ_i = 1 iff i ∈ A.
Theorem. (i) The characteristic sequence χ of an r.e. set A satisfies C(χ_{1:n}|n) ≤ log n + c_A for all n. (ii) There is an r.e. set such that C(χ_{1:n}) ≥ log n for all n.
Proof. (i) Use the number m of 1s in the prefix χ_{1:n} as the termination condition when enumerating A [C(m) ≤ log n + O(1)].
(ii) By diagonalization. Let U be the universal TM. Define χ = χ_1 χ_2 … by χ_i = 1 if U(i,i) outputs 0, and χ_i = 0 otherwise; this χ is the characteristic sequence of an r.e. set. Suppose, for some n, C(χ_{1:n}) < log n. Then there is a program p with |p| < log n, hence p < n as a number, such that U(p,i) = χ_i for all i ≤ n. But U(p,p) ≠ χ_p by the definition of χ. QED

Selected bibliography
T. Jiang, J. Seiferas, and P.M.B. Vitanyi. Two heads are better than two tapes. J. Assoc. Comput. Mach., 44:2 (1997).
M. Li, X. Chen, X. Li, B. Ma, and P.M.B. Vitanyi. The similarity metric. IEEE Trans. Inform. Theory, 50:12 (2004).
R. Cilibrasi, P.M.B. Vitanyi, and R. de Wolf. Algorithmic clustering of music based on string compression. Computer Music Journal, 28:4 (2004).
R. Cilibrasi and P.M.B. Vitanyi. Clustering by compression. IEEE Trans. Inform. Theory, 51:4 (2005).
R. Cilibrasi and P.M.B. Vitanyi. The Google similarity distance. IEEE Trans. Knowledge and Data Engineering, 19:3 (2007).
E. Keogh, S. Lonardi, and C.A. Ratanamahatana. Toward parameter-free data mining. In: Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 2004.
M. Li, J.H. Badger, X. Chen, S. Kwong, P. Kearney, and H. Zhang. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 17:2 (2001).
M. Li and P.M.B. Vitanyi. Reversibility and adiabatic computation: trading time and space for energy. Proc. Royal Society of London, Series A, 452 (1996).
M. Li and P.M.B. Vitanyi. Algorithmic complexity. In: International Encyclopedia of the Social & Behavioral Sciences, N.J. Smelser and P.B. Baltes, Eds., Pergamon, Oxford, 2001/2002.
A. Londei, V. Loreto, and M.O. Belardinelli. Music style and authorship categorization by informative compressors. In: Proc. 5th Triannual Conference of the European Society for the Cognitive Sciences of Music (ESCOM), September 8-13, 2003, Hannover, Germany.
S. Wehner. Analyzing network traffic and worms using compression. Manuscript, CWI.
M. Li and P.M.B. Vitanyi. An Introduction to Kolmogorov Complexity and its Applications. Springer-Verlag, New York, 3rd Edition, 2008.

The End Thank You