Kolmogorov complexity and its applications. Paul Vitanyi, Computer Science, University of Amsterdam. Spring 2009.

We live in an information society. Information science is our profession. But do you know what "information" is, mathematically, and how to use it to prove theorems? You will, by the end of the term.

Examples. Average-case analysis of Shellsort: open since 1959. What is the distance between two information-carrying entities? For example, the distance from an internet query to an answer.

Lecture 1. History and Definition. Topics: history, intuition and ideas in the past, the inventors, the basic mathematical theory, and the organization of the course. Textbook: Li-Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, preferably the third edition! Homework every week about the last lecture; a mid-term and a final exam (or possibly an individual project and presentation).

1. Intuition & history. What is the information content of an individual string? Consider 111…1 (n 1's); π = 3.1415926…; Champernowne's number 0.123456789101112…, which is normal in scale 10 (every block of digits occurs with the same limiting frequency). All these numbers share one commonality: there are "small" programs to generate them. Shannon's information theory does not help here.
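
To make the "small programs" point concrete, here is a short Python sketch (mine, not from the slides) that generates as many digits of Champernowne's number as desired; the program stays a few lines long no matter how much of the string it produces:

def champernowne_digits(n_digits):
    """Return the first n_digits decimal digits of Champernowne's number
    after the decimal point, by concatenating 1, 2, 3, ..."""
    digits, i = [], 1
    while len(digits) < n_digits:
        digits.extend(str(i))   # append the decimal expansion of i
        i += 1
    return "".join(digits[:n_digits])

print(champernowne_digits(30))  # 123456789101112131415161718192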

1903: An interesting year This and the next two pages were stolen from Lance Fortnow

1903: An interesting year. Kolmogorov, Church, von Neumann.

Andrey Nikolaevich Kolmogorov (Tambov, Russia, 1903 - Moscow, 1987). Measure theory, probability, analysis, intuitionistic logic, cohomology, dynamical systems, hydrodynamics, Kolmogorov complexity.

When there were no digital cameras (1987).

A story of Dr. Samuel Johnson. … Dr. Beattie observed, as something remarkable which had happened to him, that he had chanced to see both No. 1 and No. 1000 hackney-coaches. "Why, sir," said Johnson, "there is an equal chance for one's seeing those two numbers as any other two." (Boswell's Life of Johnson)

The case of the cheating casino. Bob proposes to flip a coin with Alice: Alice wins a dollar on heads, Bob wins a dollar on tails. Result: TTTTTT…, 100 tails in a row. Alice lost $100. She feels she has been cheated.

Alice goes to court. Alice complains: T^100 is not random. Bob asks Alice to produce a random coin-flip sequence. Alice flips her coin and gets THTTHHTHTHHHTTTTH… But Bob claims that Alice's sequence has probability 2^(-100), and so does his. How do we define randomness?

2. Roots of Kolmogorov complexity and preliminaries. (1) Foundations of probability. P. Laplace: a sequence is extraordinary (nonrandom) because it contains regularity, and regularity is rare. Von Mises' notion of a random sequence S: lim_{n→∞} #{1's in the n-prefix of S}/n = p, with 0 < p < 1, and the same limit holds for any subsequence of S selected by an "admissible" selection rule. If an admissible rule may be an arbitrary partial function, then there are no random sequences. A. Wald: admit only countably many admissible selection rules; then random sequences exist. A. Church: take the recursive selection functions. J. Ville: von Mises-Wald-Church random sequences do not satisfy all laws of randomness.
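
To illustrate what an admissible selection rule is (my example, not the slide's): the rule below selects exactly the bits that follow a 1, so the decision to select position i depends only on the bits before i. For fair coin flips the selected subsequence keeps limiting frequency 1/2; for the periodic sequence 0101… the same rule selects only 0's, exposing the regularity:

import random

def frequency(bits):
    """Relative frequency of 1's in a bit list."""
    return sum(bits) / len(bits)

def select_after_ones(bits):
    """Toy admissible selection rule: select each bit whose predecessor
    is 1. Whether position i is selected depends only on bits before i."""
    return [bits[i] for i in range(1, len(bits)) if bits[i - 1] == 1]

random.seed(0)
coin = [random.randint(0, 1) for _ in range(100_000)]
print(frequency(coin))                      # close to 1/2
print(frequency(select_after_ones(coin)))   # still close to 1/2

periodic = [0, 1] * 50_000                  # 0101...: frequency of 1's is 1/2,
print(frequency(select_after_ones(periodic)))  # but the rule selects only 0's: 0.0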

Roots … (2) Information theory. Shannon's theory is about ensembles; but what is the information in an individual object? (3) Inductive inference. The Bayesian approach using a universal prior distribution. (4) Shannon's state × symbol (Turing machine) complexity.

Preliminaries and notations. Strings: x, y, z; usually binary. x = x_1 x_2 … (possibly an infinite binary sequence); x_{i:j} = x_i x_{i+1} … x_j. |x| is the number of bits in x (the textbook uses l(x)). Sets: A, B, C, …; |A| is the number of elements of the set A (the textbook uses d(A)). K-complexity vs. C-complexity, names, etc. I assume you know Turing machines, universal TMs, and the basic facts.

3. Mathematical theory. Solomonoff (1960), Kolmogorov (1965), Chaitin (1969): the amount of information in a string is the size of the smallest program for an optimal universal TM U that generates the string: C_U(x) = min { |p| : U(p) = x }.
Invariance Theorem: it does not matter which optimal universal Turing machine U we choose, i.e., all "universal encoding methods" are OK.
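
C(x) itself is uncomputable (Section 3.3 below), but any real compressor gives a computable upper bound on it: a fixed decompressor plus the compressed archive is a valid description of x. A minimal sketch using Python's zlib; the choice of compressor is arbitrary:

import os
import zlib

def C_upper(x: bytes) -> int:
    """An upper bound on C(x), up to an additive constant: the length in
    bits of a zlib-compressed copy of x. The true C(x) is uncomputable;
    a fixed decompressor plus this archive is one valid description of x."""
    return 8 * len(zlib.compress(x, 9))

print(C_upper(b"1" * 1000))       # highly regular: far fewer than 8000 bits
print(C_upper(os.urandom(1000)))  # incompressible: around 8000 bits or more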

Proof of the Invariance Theorem. Fix an effective enumeration of all Turing machines (TMs): T_1, T_2, … Define C_T(x) = min { |p| : T(p) = x }. Let U be an optimal universal TM such that U(1^n 0 p) = T_n(p) for every program p. Then for all x: C_U(x) ≤ C_{T_n}(x) + n + 1, and |C_U(x) − C_{U'}(x)| ≤ c for any two such machines U, U'. Fixing U, we write C(x) instead of C_U(x). QED
Formal statement of the Invariance Theorem: there exists a computable function S_0 such that for all computable functions S there is a constant c_S such that for all strings x ∈ {0,1}*, C_{S_0}(x) ≤ C_S(x) + c_S.

It has many applications.
Mathematics: probability theory, logic, statistics.
Physics: chaos, thermodynamics.
Computer science: average-case analysis, inductive inference and learning, shared information between documents, data mining and clustering, and the incompressibility method, with examples including the prime number theorem, Goedel's incompleteness, Shellsort average case, Heapsort average case, circuit complexity, and lower bounds on combinatorics, graphs, Turing machine computations, formal languages, communication complexity, and routing.
Philosophy, biology, cognition, etc.: randomness, inference, learning, complex systems, sequence similarity.
Information theory: information in individual objects, information distance.
Classifying objects: documents, genomes. Query answering systems.
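
One application from this list, shared information between documents and clustering, can be sketched concretely. The normalized compression distance (NCD) of Cilibrasi and Vitanyi approximates the uncomputable normalized information distance by substituting a real compressor for C; below is a minimal sketch, with zlib again standing in for an ideal compressor (the sample strings are mine):

import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: a practical stand-in for the
    (uncomputable) normalized information distance, with zlib playing
    the role of Kolmogorov complexity."""
    cx, cy, cxy = (len(zlib.compress(s, 9)) for s in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
c = bytes(range(256)) * 4
print(ncd(a, b))   # noticeably small: the two texts share most information
print(ncd(a, c))   # close to 1: essentially no shared information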

Mathematical theory, continued. Intuitively: C(x) = length of a shortest description of x. Define conditional Kolmogorov complexity similarly: C(x|y) = length of a shortest description of x given y. Examples:
C(xx) = C(x) + O(1)
C(xy) ≤ C(x) + C(y) + O(log min{C(x), C(y)})
C(1^n) ≤ O(log n)
C(π_{1:n}) ≤ O(log n); C(π_{1:n}|n) ≤ O(1)
For all x: C(x) ≤ |x| + O(1)
C(x|x) = O(1)
C(x|ε) = C(x); C(ε|x) = O(1)
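
The bound C(1^n) ≤ O(log n) in this list can be made tangible: a description of 1^n needs only n written in binary, plus a fixed-size decoder. A toy sketch (the encode/decode split is my illustration, not a formal description system):

def encode(n: int) -> str:
    """A description of the string 1^n: just n written in binary."""
    return bin(n)[2:]

def decode(desc: str) -> str:
    """Fixed-size decoder: the O(1) part of the description."""
    return "1" * int(desc, 2)

n = 1_000_000
desc = encode(n)
assert decode(desc) == "1" * n
print(len(desc), "bits describe a string of", n, "bits")  # 20 vs 1000000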

3.1 Basics. Incompressibility: for a constant c > 0, a string x ∈ {0,1}* is c-incompressible if C(x) ≥ |x| − c. For a fixed constant c we often simply say that x is incompressible. (We will call incompressible strings random strings.)
Lemma. There are at least 2^n − 2^(n−c) + 1 c-incompressible strings of length n.
Proof. There are only ∑_{k=0}^{n−c−1} 2^k = 2^(n−c) − 1 programs of length less than n − c. Hence at most that many strings (out of the 2^n strings of length n) can have programs (descriptions) shorter than n − c. QED
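
The counting in this proof can be checked numerically; this small tabulation (mine, not the slides') shows how quickly the guaranteed fraction of c-incompressible strings approaches 1:

# There are at most 2^(n-c) - 1 programs shorter than n - c bits, so at
# least 2^n - 2^(n-c) + 1 strings of length n are c-incompressible.
n = 20
for c in range(1, 6):
    programs = sum(2**k for k in range(n - c))   # k = 0, ..., n-c-1
    assert programs == 2**(n - c) - 1
    incompressible = 2**n - programs
    print(f"c={c}: at least {incompressible}/{2**n} strings "
          f"({incompressible / 2**n:.1%}) are c-incompressible")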

Facts. If x = uvw is incompressible, then C(v) ≥ |v| − O(log |x|). Proof: |uvw| − O(1) ≤ C(uvw) ≤ |uw| + C(v) + O(log |u|) + O(log C(v)), so C(v) ≥ |v| − O(log |x|). If p is a shortest program for x, then C(p) ≥ |p| − O(1) and C(x|p) = O(1), but only C(p|x) ≤ C(|p|) + O(1); this is optimal, because of the halting problem! If a subset A of {0,1}* is recursively enumerable (r.e.: the elements of A can be listed by a Turing machine) and A is sparse (|A_{=n}| ≤ p(n) for some polynomial p), then for all x in A with |x| = n: C(x) ≤ O(log p(n)) + O(C(n)) = O(log n).

3.2 Asymptotics. Enumerate the binary strings as ε, 0, 1, 00, 01, 10, 11, …, mapping them to the natural numbers 0, 1, 2, 3, … Then C(x) → ∞ as x → ∞. Define m(x) to be the monotonic lower bound of the C(x) curve (as the natural number x → ∞). Then m(x) → ∞ as x → ∞, yet m(x) grows more slowly than any unbounded computable function Q. Nonmonotonicity: x = yz does not imply C(y) ≤ C(x) + O(1).
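
The enumeration above is the standard bijection between binary strings and the natural numbers: write i + 1 in binary and drop the leading 1. A minimal sketch:

def nat_to_string(i: int) -> str:
    """Standard bijection: 0 <-> '', 1 <-> '0', 2 <-> '1', 3 <-> '00', ..."""
    return bin(i + 1)[3:]          # binary of i+1 without the leading 1

def string_to_nat(x: str) -> int:
    return int("1" + x, 2) - 1     # inverse: prepend the leading 1 again

print([nat_to_string(i) for i in range(7)])  # ['', '0', '1', '00', '01', '10', '11']
assert all(string_to_nat(nat_to_string(i)) == i for i in range(1000))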

Graph of C(x) for integer x. Function m(x) is greatest monotonic non-decreasing lower bound.

Length-conditional complexity. Let x = x_1 … x_n. Self-delimiting codes for x are x' = 1^n 0 x, with |x'| = 2n + 1, and x'' = 1^{|n|} 0 n x, with |x''| = n + 2|n| + 1, where |n| (≈ log n) is the length of n written in binary. n-strings are the strings x of the form x = n'00…0 with n = |x|, where n' = 1^{|n|} 0 n is the self-delimiting code of the number n; note that |n'| = 2 log n + 1. So, for every n, C(x|n) = O(1) for the n-strings x.
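
Both codes are easy to implement, and the stated lengths easy to verify; in the sketch below the function names are mine:

def code1(x: str) -> str:
    """x' = 1^n 0 x, with |x'| = 2n + 1 for n = |x|."""
    return "1" * len(x) + "0" + x

def code2(x: str) -> str:
    """x'' = 1^|n| 0 n x, where n = |x| is written in binary,
    so |x''| = n + 2|n| + 1."""
    n = bin(len(x))[2:]
    return "1" * len(n) + "0" + n + x

x = "10110"                                   # n = 5, |n| = 3
assert len(code1(x)) == 2 * len(x) + 1        # 11 bits
assert len(code2(x)) == len(x) + 2 * len(bin(len(x))[2:]) + 1   # 12 bits
print(code1(x), code2(x))                     # 11111010110 111010110110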

Graph of C(x|l(x)). Function m(x) is greatest monotonic non-decreasing lower bound.

3.3 Properties. Theorem (Kolmogorov). (i) C(x) is not partially recursive; that is, there is no Turing machine M such that M accepts (x,k) if C(x) ≥ k and is undefined otherwise. (ii) However, there is a total recursive H(t,x), non-increasing in t (H(t+1,x) ≤ H(t,x)), with lim_{t→∞} H(t,x) = C(x).
Proof. (i) If such an M exists, design M' as follows: M' simulates M on inputs (x,n) for all x with |x| = n "in parallel" (one step each) and outputs the first x for which M says "yes". Choose n >> |M'|. This is a contradiction: M certifies C(x) ≥ n, but M' outputs x, so n >> |M'| ≥ C(x) ≥ n. (ii) Define H(t,x) as the length of the shortest program of length at most |x| + c that outputs x within t steps on the reference TM (for t large enough such a program exists, since x can be printed literally). H is total recursive, non-increasing in t, and converges to C(x). QED
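
Part (ii) can be demonstrated on a toy machine. The sketch below is entirely my construction: a Brainfuck-style language stands in for the universal TM, and program length is measured in instructions rather than bits. H(t, x) is the length of the shortest program that prints x within t steps; it is total, non-increasing in t, and converges to this toy machine's version of C(x):

from itertools import product

OPS = "+-<>.[]"   # 7 instructions; a write-only Brainfuck dialect

def run(prog, steps):
    """Run prog for at most `steps` steps on a 64-cell tape.
    Return the output bytes if it halts in time, else None."""
    stack, match = [], {}
    for i, op in enumerate(prog):       # precompute bracket matching
        if op == "[":
            stack.append(i)
        elif op == "]":
            if not stack:
                return None             # reject unbalanced programs
            j = stack.pop()
            match[i], match[j] = j, i
    if stack:
        return None
    tape, ptr, pc, out, n = [0] * 64, 0, 0, [], 0
    while pc < len(prog) and n < steps:
        op, n = prog[pc], n + 1
        if op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ">":
            ptr = (ptr + 1) % 64
        elif op == "<":
            ptr = (ptr - 1) % 64
        elif op == ".":
            out.append(tape[ptr])
        elif op == "[" and tape[ptr] == 0:
            pc = match[pc]
        elif op == "]" and tape[ptr] != 0:
            pc = match[pc]
        pc += 1
    return bytes(out) if pc >= len(prog) else None  # None: no halt within t

def H(t, x, max_len=5):
    """Length of the shortest program (up to max_len) printing x in t steps."""
    for L in range(1, max_len + 1):
        for prog in product(OPS, repeat=L):
            if run("".join(prog), t) == x:
                return L
    return float("inf")

x = bytes([3])              # target: the single byte 0x03
for t in (2, 4, 8):         # H(t, x) never increases as t grows
    print(t, H(t, x))       # inf, then 4 ("+++."), then 4 again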

3.4 Goedel's Theorem. Theorem. In a fixed sound formal system, the statement "x is random (= incompressible)" is undecidable for all but finitely many x.
Proof. Suppose the contrary, and let the system have a description of length C. Design a program P, of length C + O(1), that searches through all proofs of the system and outputs the first string x found with a proof of (1) "x is random", i.e., C(x) > C + O(log |x|). If infinitely many strings were provably random, P would halt and output such an x, giving (2) C(x) ≤ C + O(log |x|), which contradicts (1). QED

3.5 Barzdin's Lemma. The characteristic sequence of a set A of natural numbers is the infinite binary sequence χ = χ_1 χ_2 …, with χ_i = 1 iff i ∈ A.
Theorem. (i) The characteristic sequence χ of any r.e. set A satisfies C(χ_{1:n}|n) ≤ log n + c_A for all n. (ii) There is an r.e. set whose characteristic sequence satisfies C(χ_{1:n}) ≥ log n for all n.
Proof. (i) Given n, use the number m of 1's in the prefix χ_{1:n} as a termination condition for an enumeration of A [and C(m) ≤ log n + O(1)]. (ii) By diagonalization. Let U be the universal TM. Define χ = χ_1 χ_2 … by: χ_i = 1 if U(i,i) halts with output 0, and χ_i = 0 otherwise. Then χ is the characteristic sequence of an r.e. set. Suppose C(χ_{1:n}) < log n for some n. Then there is a program p with |p| < log n, hence p < n, such that U(p,i) = χ_i for all i ≤ n. But U(p,p) ≠ χ_p by construction. QED
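
The termination-condition idea of part (i) can be run as stated: knowing n and the count m of 1's in χ_{1:n}, enumerate A until all m of its elements ≤ n have appeared. A toy sketch (the set of squares stands in for an arbitrary r.e. set A):

def prefix_from_count(enum_A, n, m):
    """Reconstruct chi_1 ... chi_n from n and m = #{i <= n : i in A}:
    enumerate A and halt once all m of its members <= n have appeared.
    Besides n (given as the condition), the description is just m <= n,
    i.e. about log n bits."""
    seen, it = set(), iter(enum_A)
    while len(seen) < m:            # termination condition: all m found
        a = next(it)
        if 1 <= a <= n:
            seen.add(a)
    return "".join("1" if i in seen else "0" for i in range(1, n + 1))

def squares():                      # a computable enumeration, so A is r.e.
    i = 1
    while True:
        yield i * i
        i += 1

print(prefix_from_count(squares(), 20, 4))  # squares <= 20 are 1, 4, 9, 16:
                                            # prints 10010000100000010000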