
Kolmogorov complexity and its applications Ming Li School of Computer Science University of Waterloo CS860, Winter, 2010

We live in an information society. Information science is our profession. But do you know what "information" is, mathematically, and how to use it to prove theorems?

Examples
Average-case analysis of Shellsort.
Lovász Local Lemma.
What is the distance between two pieces of information-carrying entities? For example, the distance from an internet query to an answer.

Lecture 1. History and Definitions
History: intuition and ideas in the past; the inventors.
Basic mathematical theory.
Textbook: Li-Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications. You may use any edition (1st, 2nd, 3rd), but the page numbers refer to the 2nd edition.

1. Intuition & history
What is the information content of an individual string?
111…1 (n 1's)
π = 3.14159265…
Champernowne's number 0.1234567891011… is normal in base 10 (every block of digits occurs with the same limiting frequency).
All these numbers share one commonality: there are "small" programs to generate them. Shannon's information theory does not help here.

1903: An interesting year This and the next two pages were taken from Lance Fortnow

1903: An interesting year
Kolmogorov, Church, von Neumann (all born in 1903).

Andrey Nikolaevich Kolmogorov (1903–1987, born in Tambov, Russia)
Measure theory, probability, analysis, intuitionistic logic, cohomology, dynamical systems, hydrodynamics, Kolmogorov complexity.

Ray Solomonoff

When there were no digital cameras (1987).

A case of Dr. Samuel Johnson (1709–1784)
… Dr. Beattie observed, as something remarkable which had happened to him, that he chanced to see both No. 1 and No. 1000 of the hackney-coaches. "Why, sir," said Johnson, "there is an equal chance for one's seeing those two numbers as any other two."
(Boswell's Life of Johnson)

The case of the cheating casino
Bob proposes to flip a coin with Alice: Alice wins a dollar on Heads; Bob wins a dollar on Tails.
Result: TTTT…T, 100 Tails in a row.
Alice lost $100. She feels cheated.

Alice goes to court
Alice complains: T^100 is not random.
Bob asks Alice to produce a random coin-flip sequence.
Alice flips her coin 100 times and gets THTTHHTHTHHHTTTTH…
But Bob claims that Alice's sequence has probability 2^{-100}, and so does his.
How do we define randomness?

2. Roots of Kolmogorov complexity and preliminaries
(1) Foundations of probability.
P. Laplace: a sequence is extraordinary (nonrandom) because it contains a rare regularity.
von Mises' notion of a random sequence S: lim_{n→∞} #{1's in the n-prefix of S}/n = p, 0 < p < 1, and the same holds for every subsequence of S selected by an "admissible" selection function. But if arbitrary partial functions are allowed as selectors, then there is no random sequence à la von Mises.
A. Wald: restrict to countably many selection functions; then random sequences exist.
A. Church: use recursive selection functions.
J. Ville: von Mises-Wald-Church random sequences do not satisfy all laws of randomness.

Roots …
(2) Information theory. Shannon-Weaver theory concerns an ensemble. But what is the information in an individual object?
(3) Inductive inference. The Bayesian approach using a universal prior distribution.
(4) Shannon's state × symbol (Turing machine) complexity.

Preliminaries and notation
Strings: x, y, z. Usually binary. x = x_1 x_2 … denotes an infinite binary sequence; x_{i:j} = x_i x_{i+1} … x_j.
|x| is the number of bits in x (the textbook uses l(x)).
Sets: A, B, C, … |A| is the number of elements in set A (the textbook uses d(A)).
K-complexity vs. C-complexity, names, etc.
I assume you know Turing machines, universal TMs, and basic facts from CS360.

3. Mathematical theory
Solomonoff (1960), Kolmogorov (1963), Chaitin (1965): the amount of information in a string is the size of the smallest program generating that string:
C_U(x) = min { |p| : U(p) = x }
Invariance Theorem: it does not matter which universal Turing machine U we choose. I.e., all "encoding methods" are OK.

Proof of the Invariance Theorem
Fix an effective enumeration of all Turing machines (TMs): T_1, T_2, …
Let U be a universal TM such that U(0^n 1 p) = T_n(p) (where p produces x).
Then for all x: C_U(x) ≤ C_{T_n}(x) + O(1), where the O(1) term depends on n but not on x.
Fixing U, we write C(x) instead of C_U(x). QED
Formal statement of the Invariance Theorem: there exists a computable function S_0 such that for all computable functions S, there is a constant c_S such that for all strings x ∈ {0,1}*:
C_{S_0}(x) ≤ C_S(x) + c_S
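The self-delimiting prefix trick in this proof can be sketched in a few lines. In this toy dispatcher the "machines" T_n are stood in for by ordinary Python functions (an illustrative assumption, not part of the lecture); it shows why the prefix 0^n 1 adds overhead that depends on n but not on the input:

```python
# Toy version of the universal machine U in the proof. U reads the
# self-delimiting prefix 0^n 1 and runs T_n on the remainder p, so the
# encoding overhead is n+1 symbols: it depends on n but not on p.

MACHINES = [
    lambda p: p,         # T_0: identity
    lambda p: p[::-1],   # T_1: reversal
    lambda p: p * 2,     # T_2: doubling
]

def U(program: str) -> str:
    """Decode 0^n 1 p and return T_n(p)."""
    n = 0
    while program[n] == "0":  # count leading zeros; assumes a '1' follows
        n += 1
    return MACHINES[n](program[n + 1:])

assert U("1abc") == "abc"     # 0^0 1 p  ->  T_0(p)
assert U("01abc") == "cba"    # 0^1 1 p  ->  T_1(p)
assert U("001ab") == "abab"   # 0^2 1 p  ->  T_2(p)
```

Since the prefix 0^n 1 has fixed length n+1 for machine T_n, any program p for T_n yields a program of length |p| + n + 1 for U, which is exactly the C_U(x) ≤ C_{T_n}(x) + O(1) bound in the proof.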

It has many applications
Mathematics: probability theory, logic.
Physics: chaos, thermodynamics.
Computer science: average-case analysis, inductive inference and learning, shared information between documents, data mining and clustering, and the incompressibility method, with examples:
Shellsort average case
Heapsort average case
Circuit complexity
Lower bounds on Turing machines and formal languages
Combinatorics: Lovász Local Lemma and related proofs
Philosophy, biology, etc.: randomness, inference, complex systems, sequence similarity.
Information theory: information in individual objects, information distance.
Classifying objects: documents, genomes.
Query-answering systems.

Mathematical theory (cont.)
Intuitively: C(x) = length of the shortest description of x.
Define conditional Kolmogorov complexity similarly: C(x|y) = length of the shortest description of x given y.
Examples:
C(xx) = C(x) + O(1)
C(xy) ≤ C(x) + C(y) + O(log(min{C(x), C(y)}))
C(1^n) ≤ O(log n)
C(π_{1:n}) ≤ O(log n)
For all x, C(x) ≤ |x| + O(1)
C(x|x) = O(1)
C(x|ε) = C(x)
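C(x) itself is uncomputable, but any real compressor gives an upper bound on it, up to the O(1) cost of fixing the decompressor. A small sketch (zlib is my choice of compressor here, not something the lecture prescribes) contrasting a regular string, cf. C(1^n) ≤ O(log n), with a random one, which is incompressible with overwhelming probability:

```python
# Upper-bounding C(x) with a real compressor. Any compressed length is only
# an upper bound on C(x), up to the O(1) cost of fixing zlib as the
# reference decompressor.
import os
import zlib

def compressed_len(x: bytes) -> int:
    """Length of a zlib description of x: an upper bound on C(x) + O(1)."""
    return len(zlib.compress(x, 9))

regular = b"1" * 100_000        # like 1^n: describable in O(log n) bits
random_ = os.urandom(100_000)   # incompressible with overwhelming probability

assert compressed_len(regular) < 1_000   # far below |x|
assert compressed_len(random_) > 99_000  # essentially |x|
```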

3.1 Basics
Incompressibility: for a constant c > 0, a string x ∈ {0,1}* is c-incompressible if C(x) ≥ |x| - c. For a constant c, we often simply say that x is incompressible. (We will call incompressible strings random strings.)
Lemma. There are at least 2^n - 2^{n-c} + 1 c-incompressible strings of length n.
Proof. There are only ∑_{k=0,…,n-c-1} 2^k = 2^{n-c} - 1 programs of length less than n - c. Hence only that many strings (out of the 2^n strings of length n in total) can have programs (descriptions) shorter than n - c. QED
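The counting in the lemma can be checked numerically; a minimal sketch (function names are illustrative, not from the text):

```python
# Numerical check of the counting argument: there are 2^(n-c) - 1 binary
# strings (candidate programs) of length < n - c, so at least
# 2^n - 2^(n-c) + 1 strings of length n are c-incompressible.

def num_short_programs(n: int, c: int) -> int:
    """Number of binary strings of length strictly less than n - c."""
    return sum(2 ** k for k in range(n - c))  # geometric sum = 2^(n-c) - 1

def min_incompressible(n: int, c: int) -> int:
    """Lower bound on the number of c-incompressible strings of length n."""
    return 2 ** n - num_short_programs(n, c)

assert num_short_programs(10, 3) == 2 ** 7 - 1
assert min_incompressible(10, 3) == 2 ** 10 - 2 ** 7 + 1
# Even c = 1 already makes over half of all length-n strings incompressible:
assert min_incompressible(10, 1) > 2 ** 9
```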

Facts
If x = uvw is incompressible, then C(v) ≥ |v| - O(log |x|).
If p is the shortest program for x, then C(p) ≥ |p| - O(1) and C(x|p) = O(1).
If a subset A of {0,1}* is recursively enumerable (r.e.) (the elements of A can be listed by a Turing machine), and A is sparse (|A^{=n}| ≤ p(n) for some polynomial p), then for all x in A with |x| = n, C(x) ≤ O(log p(n)) + O(C(n)) = O(log n).
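The sparse-set fact can be made concrete with a toy family (the names below are my own): the strings 1^k 0^{n-k} form a set with only n+1 members of length n, so each member is determined by n plus an index of about log n bits.

```python
# Concrete instance of the sparse-set fact: a decidable set with exactly
# n+1 members of length n, so describing a member takes about log2(n+1)
# bits for its index, plus O(C(n)) bits for n itself.
import math

def sparse_family(n: int) -> list[str]:
    """All strings 1^k 0^(n-k) of length n: n+1 of them."""
    return ["1" * k + "0" * (n - k) for k in range(n + 1)]

def index_description_bits(x: str, n: int) -> int:
    """Bits needed for x's index in the enumeration (plus O(C(n)) for n)."""
    assert x in sparse_family(n)
    return math.ceil(math.log2(n + 1))

assert len(sparse_family(10)) == 11
assert index_description_bits("111" + "0" * 7, 10) == 4  # ceil(log2 11)
```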

3.2 Asymptotics
Enumerate the binary strings as 0, 1, 00, 01, 10, 11, …, mapping them to the natural numbers 0, 1, 2, 3, …
C(x) → ∞ as x → ∞.
Define m(x) to be the monotonic lower bound of the C(x) curve (as the natural number x → ∞). Then:
m(x) → ∞ as x → ∞;
m(x) < Q(x) for every unbounded computable Q.
Nonmonotonicity: for x = yz, it does not follow that C(y) ≤ C(x) + O(1).

m(x) graph

3.3 Properties
Theorem (Kolmogorov). C(x) is not partially recursive. That is, there is no Turing machine M such that M accepts (x, k) if C(x) ≥ k and is undefined otherwise. However, there is a function H(t, x) such that lim_{t→∞} H(t, x) = C(x), where H(t, x), for each fixed t, is total recursive.
Proof. If such an M exists, then design M' as follows. Choose n >> |M'|. M' simulates M on inputs (x, n) for all |x| = n in "parallel" (one step each), and outputs the first x for which M says yes; such an x exists, since not all strings of length n are compressible below n. This gives a contradiction: M's acceptance certifies C(x) ≥ n, but M' outputs x, so C(x) ≤ |M'| + O(1) << n. QED
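The approximation H(t, x) converges to C(x) from above as resources grow. As a loose, finite analogy (not the construction in the theorem: the true H(t, x) enumerates programs, and its limit C(x) is uncomputable), one can take the minimum over a growing pool of compressors, which is nonincreasing in t by construction:

```python
# Crude stand-in for the upper approximation H(t, x): the minimum
# description length found using the first t "methods". Monotonically
# nonincreasing in t because the minimum is over a growing set.
import bz2
import lzma
import zlib

COMPRESSORS = [zlib.compress, bz2.compress, lzma.compress]

def H(t: int, x: bytes) -> int:
    """Best upper bound on the description length of x after t 'stages'."""
    pool = COMPRESSORS[: min(t, len(COMPRESSORS))]
    return min((len(c(x)) for c in pool), default=len(x) + 1)

x = b"abab" * 5_000
assert H(3, x) <= H(2, x) <= H(1, x) <= H(0, x)  # approximation from above
```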

3.4 Gödel's Theorem
Theorem. There is a constant C (depending on the proof system) such that the statement "x is random" is not provable for any x with C(x) > C.
Proof (G. Chaitin). Let F be an axiomatic theory with C(F) = C. If the theorem were false, some statement "x is random" with |x| >> C would be provable in F. Then we could enumerate all proofs in F to find a proof of "x is random" with |x| >> C, and output the (first) such x. This description of x has length about C << |x|, so C(x) << |x|, contradicting the provable claim that x is random. QED

3.5 Barzdin's Lemma
The characteristic sequence of a set A is the infinite binary sequence χ = χ_1 χ_2 …, where χ_i = 1 iff i ∈ A.
Theorem. (i) The characteristic sequence χ of any r.e. set A satisfies C(χ_{1:n}|n) ≤ log n + c_A for all n. (ii) There is an r.e. set such that C(χ_{1:n}|n) ≥ log n for all n.
Proof. (i) Given n and the number of 1's in the prefix χ_{1:n} (a number at most n, hence about log n bits), enumerate A until that many of its elements ≤ n have appeared; this determines χ_{1:n}.
(ii) By diagonalization. Let U be the universal TM. Define χ = χ_1 χ_2 … by χ_i = 1 if U(i-th program, i) = 0, and χ_i = 0 otherwise. χ defines an r.e. set. And for each n, we have C(χ_{1:n}|n) ≥ log n, since χ_{1:n} differs by construction from the output of each of the first n programs (i.e., every program of length < log n). QED
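Part (i) can be illustrated with a decidable (hence r.e.) example set; the function names below are my own, and the primes merely stand in for an arbitrary enumerable set:

```python
# Characteristic sequence prefix chi_1..chi_n of the set {i : member(i)}.
def characteristic_prefix(member, n: int) -> str:
    return "".join("1" if member(i) else "0" for i in range(1, n + 1))

def is_prime(i: int) -> bool:
    return i >= 2 and all(i % d for d in range(2, int(i ** 0.5) + 1))

chi = characteristic_prefix(is_prime, 20)
assert chi == "01101010001010001010"

# Part (i) of the lemma: for an r.e. set, n together with the number of 1's
# in chi_1..chi_n (a number <= n, i.e. about log n bits) suffices to
# reconstruct the prefix: enumerate the set until that many elements <= n
# have appeared.
ones = chi.count("1")
assert ones == 8  # primes up to 20: 2, 3, 5, 7, 11, 13, 17, 19
```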