
Kolmogorov complexity and its applications
Ming Li, School of Computer Science, University of Waterloo
CS 898, Spring 2014

We live in an information society. Information science is our profession. But do you know what "information" is, mathematically, and how to use it to prove theorems or to do your research? You will, by the end of the term.

Examples
- Average-case analysis of algorithms (Shellsort)
- Lovász Local Lemma
- What is the distance between two information-carrying entities? For example, a theory for big data? Their semantics?

Course outline
- Kolmogorov complexity (1/4 of the course)
- Its applications (3/4)
Is this a theory course? I wish to focus on applications; really, we are more interested in many different things, such as data mining and natural language processing.
The course:
- Three homework assignments (20% each)
- One project with presentation (35%)
- Class participation (5%)

Lecture 1. History and Definitions
- History
- Intuition and ideas
- Inventors
- Basic mathematical theory
Textbook: Li-Vitányi, An Introduction to Kolmogorov Complexity and Its Applications. You may use any edition (1st, 2nd, 3rd), but the page numbers cited are from the 2nd edition.

1. Intuition & history What is the information content of an individual string? 111 …. 1 (n 1’s) π = … n = Champernowne’s number: … is normal in scale 10 (every block of size k has same frequency) All these numbers share one commonality: there are “small” programs to generate them. Shannon’s information theory does not help here. Youtube video:

1903: An interesting year
(This and the next two slides were taken from Lance Fortnow.)

1903: An interesting year
Kolmogorov, Church, and von Neumann were all born in 1903.

Andrey Nikolaevich Kolmogorov (1903-1987, born in Tambov, Russia)
- Measure theory
- Probability
- Analysis
- Intuitionistic logic
- Cohomology
- Dynamical systems
- Hydrodynamics
- Kolmogorov complexity

Ray Solomonoff (1926-2009)

When there were no digital cameras (1987).

A case of Dr. Samuel Johnson (1709-1784)
… Dr. Beattie observed, as something remarkable which had happened to him, that he chanced to see both No. 1 and No. 1000 hackney-coaches. "Why, sir," said Johnson, "there is an equal chance for one's seeing those two numbers as any other two."
(Boswell's Life of Johnson)

The case of the cheating casino
Bob proposes to flip a coin with Alice: Alice wins a dollar on Heads; Bob wins a dollar on Tails.
Result: TTTTTT…, 100 Tails in a row. Alice lost $100. She feels she has been cheated.

Alice goes to court
Alice complains: T^100 is not random. Bob asks Alice to produce a random coin-flip sequence. Alice flips her coin 100 times and gets THTTHHTHTHHHTTTTH…
But Bob claims that Alice's sequence has probability 2^(-100), and so does his. How do we define randomness?

2. Roots of Kolmogorov complexity and preliminaries
(1) Foundations of probability
- P. Laplace: … a sequence is extraordinary (nonrandom) because it contains a rare regularity.
- von Mises' notion of a random sequence S: lim_{n→∞} #{1's in the n-prefix of S}/n = p, 0 < p < 1, and the same holds for any subsequence of S selected by an "admissible" function.
- But if you allow arbitrary partial functions as selection rules, then there is no random sequence à la von Mises (see the sketch after this list).
- A. Wald: restrict to countably many selection functions; then "random" sequences exist.
- A. Church: take the recursive selection functions.
- J. Ville: von Mises-Wald-Church random sequences do not satisfy all laws of randomness.
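To see why arbitrary selection rules are fatal, note that a rule that already "knows" the sequence can select exactly the positions of the 1's, destroying the limiting frequency. A minimal illustration (my own sketch, not from the slides):

```python
import random

# A long "random-looking" binary sequence.
s = [random.randint(0, 1) for _ in range(10_000)]

# An admissible rule must decide whether to select position i using only
# the bits seen so far; this cheating rule peeks at s[i] itself and
# selects exactly the positions holding a 1.
selected = [b for b in s if b == 1]

# The selected subsequence has relative frequency 1, not p = 1/2, so if
# every selection rule is allowed, no sequence is random à la von Mises.
print(sum(selected) / len(selected))  # 1.0
```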

Roots …
(2) Information theory. Shannon-Weaver theory is about an ensemble, but what is the information in an individual object?
(3) Inductive inference. The Bayesian approach using a universal prior distribution.
(4) Shannon's state × symbol (Turing machine) complexity.

Preliminaries and notations
- Strings x, y, z; usually binary. x = x_1 x_2 … denotes a finite or infinite binary sequence, and x_{i:j} = x_i x_{i+1} … x_j.
- |x| is the number of bits in x (the textbook uses l(x)).
- Sets A, B, C, …; |A| is the number of elements in set A (the textbook uses d(A)).
- K-complexity vs. C-complexity, names, etc.
- I assume you know Turing machines, recursive functions, and universal TMs, i.e., the basic facts from CS360.

3. Mathematical Theory
Definition (Kolmogorov complexity; Solomonoff (1960), Kolmogorov (1963), Chaitin (1965)). The Kolmogorov complexity of a binary string x with respect to a universal Turing machine U is
  C_U(x) = min{ |p| : U(p) = x }.
Invariance Theorem: It does not matter which universal Turing machine U we choose; i.e., all "encoding methods" are OK.
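Although C(x) itself will turn out to be uncomputable (Section 3.3), any fixed compressor is one particular description method, so its output length gives a computable upper bound on C(x) up to an additive constant. A rough sketch, using zlib purely as a stand-in compressor:

```python
import os
import zlib

def C_upper(x: bytes) -> int:
    """A computable upper bound on C(x), in bits: the length of a zlib
    description of x. The fixed zlib decompressor is one particular
    machine T, so by the Invariance Theorem C(x) <= this bound + O(1)."""
    return 8 * len(zlib.compress(x, 9))

print(C_upper(b"1" * 1000))       # tiny: the string is highly regular
print(C_upper(os.urandom(1000)))  # about 8000 bits: incompressible
```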

Proof of the Invariance Theorem
Fix an effective enumeration of all Turing machines (TMs): T_1, T_2, … Let U be a universal TM such that U(0^n 1 p) = T_n(p). Then for all x: C_U(x) ≤ C_{T_n}(x) + O(1), where the O(1) depends on n but not on x. QED
Fixing U, we write C(x) instead of C_U(x).
Formal statement of the Invariance Theorem: There exists a computable function S_0 such that for all computable functions S, there is a constant c_S such that for all strings x ∈ {0,1}*, C_{S_0}(x) ≤ C_S(x) + c_S.
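The construction U(0^n 1 p) = T_n(p) is just prefix dispatch: read a unary machine index, then hand the rest of the input to that machine. A toy sketch (mine), with ordinary Python functions standing in for T_1, T_2, …:

```python
# Toy version of U(0^n 1 p) = T_n(p): a unary machine index 0^n,
# a separator 1, then the program p for that machine.

machines = {
    1: lambda p: p,        # T_1: identity
    2: lambda p: p[::-1],  # T_2: reverse
    3: lambda p: p * 2,    # T_3: doubling
}

def U(program: str) -> str:
    n = 0
    while program[n] == "0":  # read the unary index 0^n
        n += 1
    p = program[n + 1:]       # skip the separator '1'
    return machines[n](p)

print(U("00" + "1" + "10110"))  # runs T_2 on "10110" -> "01101"
```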

It has many applications
- Mathematics: probability theory, logic.
- Physics: chaos, thermodynamics.
- Computer science: average-case analysis, inductive inference and learning, shared information between documents (sketched below), data mining and clustering, and the incompressibility method, with examples including Shellsort average case, Heapsort average case, circuit complexity, and lower bounds on Turing machines and formal languages.
- Combinatorics: Lovász Local Lemma and related proofs.
- Philosophy, biology, etc.: randomness, inference, complex systems, sequence similarity.
- Information theory: information in individual objects, information distance.
- Classifying objects: documents, genomes.
- Approximating semantics.
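The "shared information between documents" application works by replacing C with a real compressor, giving the normalized compression distance NCD(x,y) = (C(xy) − min{C(x),C(y)}) / max{C(x),C(y)} of Li, Vitányi and coauthors. A minimal sketch with zlib as the stand-in compressor:

```python
import os
import zlib

def Cz(x: bytes) -> int:
    """zlib compressed length, standing in for the uncomputable C."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance under the zlib proxy."""
    cx, cy, cxy = Cz(x), Cz(y), Cz(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
r = os.urandom(len(a))
print(ncd(a, b))  # small: a and b share most of their information
print(ncd(a, r))  # near 1: almost no shared information
```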

Mathematical theory (cont.)
Intuitively: C(x) = the length of a shortest description of x. Conditional Kolmogorov complexity is defined similarly: C(x|y) = the length of a shortest description of x given y.
Examples
- C(xx) = C(x) + O(1) (illustrated below)
- C(xy) ≤ C(x) + C(y) + O(log min{C(x), C(y)})
- C(1^n) ≤ O(log n)
- C(π_{1:n}) ≤ O(log n)
- For all x, C(x) ≤ |x| + O(1)
- C(x|x) = O(1)
- C(x|ε) = C(x)
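The first example is easy to observe with a compressor in place of C (an analogy only; zlib is not C): the second copy of x costs almost nothing to describe, since "print the description of x twice" describes xx.

```python
import os
import zlib

x = os.urandom(2000)  # a typical, incompressible string
cx = len(zlib.compress(x, 9))
cxx = len(zlib.compress(x + x, 9))
# C(xx) = C(x) + O(1): zlib's LZ77 back-references encode the second
# copy of x almost for free, so cxx exceeds cx by only a few bytes.
print(cx, cxx)
```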

3.1 Basics
Incompressibility: for a constant c > 0, a string x ∈ {0,1}* is c-incompressible if C(x) ≥ |x| − c. For constant c, we often simply say that x is incompressible. (We will call incompressible strings random strings.)
Lemma. There are at least 2^n − 2^(n−c) + 1 c-incompressible strings of length n.
Proof. There are only ∑_{k=0}^{n−c−1} 2^k = 2^(n−c) − 1 programs of length less than n − c. Hence only that many strings (out of the 2^n strings of length n) can have programs (descriptions) shorter than n − c. QED
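Both halves of this slide can be sanity-checked mechanically; the snippet below (my own illustration) verifies the counting identity and shows empirically that random strings essentially never compress, with zlib standing in, loosely, for "short programs":

```python
import os
import zlib

n_bits, c = 64, 8
# The counting step of the proof: sum_{k=0}^{n-c-1} 2^k = 2^(n-c) - 1.
assert sum(2**k for k in range(n_bits - c)) == 2**(n_bits - c) - 1

# Empirically, random strings essentially never compress; zlib's
# header overhead even makes short random inputs grow.
trials = 10_000
shrunk = sum(len(zlib.compress(os.urandom(64), 9)) < 64
             for _ in range(trials))
print(shrunk, "of", trials, "random 64-byte strings compressed")  # 0
```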

Facts
- If x = uvw is incompressible, then C(v) ≥ |v| − O(log |x|).
- If p is a shortest program for x, then C(p) ≥ |p| − O(1), and C(x|p) = O(1).
- If a subset A of {0,1}* is recursively enumerable (r.e., i.e., the elements of A can be listed by a Turing machine) and A is sparse (|A_{=n}| ≤ p(n) for some polynomial p), then for all x in A with |x| = n, C(x) ≤ O(log p(n)) + O(C(n)) = O(log n): x is determined by n together with its index, at most p(n), in the enumeration of the length-n elements of A.

3.2 Asymptotics
- Enumerate the binary strings 0, 1, 00, 01, 10, …, mapping them to the natural numbers 0, 1, 2, 3, …
- C(x) → ∞ as x → ∞.
- Define m(x) to be the monotonic lower bound of the C(x) curve (as the natural number x → ∞). Then m(x) → ∞ as x → ∞, yet m(x) < Q(x) for every unbounded computable Q.
- Nonmonotonicity: for x = yz, it does not follow that C(y) ≤ C(x) + O(1).

(Figure: the graph of m(x), the monotonic lower bound of C(x).)

3.3 Properties
Theorem (Kolmogorov). C(x) is not partially recursive. That is, there is no Turing machine M that accepts (x,k) if C(x) ≥ k and is undefined otherwise. However, there is a function H(t,x) such that lim_{t→∞} H(t,x) = C(x), where H(t,x), for each fixed t, is total recursive.
Proof. Suppose such an M exists; design M' as follows. Choose n ≫ |M'|. M' simulates M on input (x,n) for all |x| = n in "parallel" (one step each) and outputs the first x for which M says yes. This gives a contradiction: M says yes, so C(x) ≥ n; but M' outputs x, so C(x) ≤ |M'| + O(1) ≪ n. QED

3.4 Gödel's Theorem
Theorem. The statement "x is random" is not provable (for all but finitely many x).
Proof (G. Chaitin). Let F be an axiomatic theory, and let c = C(F). If the theorem were false, so that "x is random" were provable in F for arbitrarily long x, we could enumerate all proofs in F, find an x with |x| ≫ c together with a proof of "x is random", and output the first such x. We have used only c + O(1) bits to generate x, so C(x) ≤ c + O(1). But the proof of "x is random" implies, by definition, C(x) ≥ |x| ≫ c. Contradiction. QED

3.5 Barzdin's Lemma
The characteristic sequence of a set A is the infinite binary sequence χ = χ_1 χ_2 …, where χ_i = 1 iff i ∈ A.
Theorem. (i) The characteristic sequence χ of any r.e. set A satisfies C(χ_{1:n} | n) ≤ log n + c_A for all n. (ii) There is an r.e. set such that C(χ_{1:n} | n) ≥ log n for all n.
Proof. (i) Given n, the number of 1's in the prefix χ_{1:n} can be encoded in log n bits; enumerate A until that many elements ≤ n have appeared, using the count as the termination condition. This determines χ_{1:n}.
(ii) By diagonalization. Let U be the universal TM. Define χ = χ_1 χ_2 … by χ_i = 0 if U(i-th program, on input i) = 1, and χ_i = 1 otherwise. χ defines an r.e. set. Thus, for each n, we have C(χ_{1:n} | n) ≥ log n, since χ_{1:n} differs by construction from the output of each of the first n programs (i.e., every program of length < log n). QED
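The log n bits in part (i) are exactly the count of 1's in the prefix. A toy reconstruction of the decoding step (my own sketch, with a computable enumerator standing in for the r.e. set):

```python
def reconstruct_prefix(n, num_ones, enumerate_A):
    """Recover chi_1..chi_n of an r.e. set A from n and the number of 1's
    in the prefix: run the enumeration of A until that many elements <= n
    have appeared; the prefix is then fully determined."""
    seen = set()
    gen = enumerate_A()
    while len(seen) < num_ones:
        i = next(gen)
        if i <= n:
            seen.add(i)
    return [1 if i in seen else 0 for i in range(1, n + 1)]

def enum_squares():
    # A computable stand-in for the enumeration of an r.e. set A.
    yield from (k * k for k in range(1, 1000))

# The squares <= 20 are 1, 4, 9, 16, so the count of 1's is 4.
print(reconstruct_prefix(20, 4, enum_squares))
```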