Lecture 6. Prefix Complexity K

Lecture 6. Prefix Complexity K

The plain Kolmogorov complexity C(x) has a number of "minor" but bothersome problems:
- Not subadditive: C(x,y) ≤ C(x) + C(y) holds only up to an additive O(log n) term. There exist x, y such that C(x,y) > C(x) + C(y) + log n − c. (This is because there are (n+1)2^n pairs (x,y) with |x| + |y| = n, so some pair in this set has complexity at least n + log n; the counting is spelled out below.)
- Not monotonic over prefixes.
- Problems when defining random infinite sequences in connection with Martin-Löf theory, where we wish to identify infinite random sequences with those whose finite initial segments are all incompressible (Lecture 2).
- Problems with Solomonoff's initial universal distribution P(x) = 2^{-C(x)}: the sum ∑_x P(x) diverges.
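
The counting step behind the non-subadditivity claim can be written out explicitly (a standard argument; constants are absorbed into the O(1) terms):

```latex
\begin{align*}
\#\{(x,y) : |x|+|y| = n\} &= \sum_{k=0}^{n} 2^{k}\,2^{n-k} \;=\; (n+1)\,2^{n},\\
\text{so for some pair:}\quad C(x,y) &\ge \log\bigl((n+1)2^{n}\bigr) - O(1) \;=\; n + \log(n+1) - O(1),\\
\text{while for every pair:}\quad C(x) + C(y) &\le |x| + |y| + O(1) \;=\; n + O(1).
\end{align*}
```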

In order to fix the problems …

Let x = x_0 x_1 … x_n. Define x̄ = x_0 0 x_1 0 x_2 0 … x_n 1 (each bit followed by 0, except the last, which is followed by 1), and x' = bar(|x|) x, i.e., the bar-encoding of |x| written in binary, followed by x itself. Thus x' is a prefix code with |x'| ≤ |x| + 2 log|x|; x' is a self-delimiting version of x.

Let the reference TMs have only the binary alphabet {0,1}, with no blank symbol B. The programs p must form an effective prefix code: for all p, p', p is not a proper prefix of p'. The resulting self-delimiting Kolmogorov complexity is due to Levin (1974) and Chaitin (1975). We use K for prefix Kolmogorov complexity, to distinguish it from C, the plain Kolmogorov complexity.
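
A minimal Python sketch of this self-delimiting encoding (the function names and the treatment of the empty string are my own choices, not fixed by the lecture):

```python
def bar(x: str) -> str:
    """Encode bit string x so its end is detectable: follow every bit by 0,
    except the last bit, which is followed by 1."""
    if not x:
        return ""  # degenerate case; the lecture does not fix a convention here
    return "".join(b + "0" for b in x[:-1]) + x[-1] + "1"

def self_delimit(x: str) -> str:
    """x' = bar(|x|) x : prefix the bar-encoding of the length of x (in binary)."""
    return bar(bin(len(x))[2:]) + x

def decode(stream: str) -> tuple[str, str]:
    """Read one self-delimited string from the front of `stream`; return (x, rest)."""
    # Read bar(|x|): bits come in (bit, flag) pairs; flag 1 ends the length field.
    i, length_bits = 0, ""
    while True:
        bit, flag = stream[i], stream[i + 1]
        length_bits += bit
        i += 2
        if flag == "1":
            break
    n = int(length_bits, 2)
    return stream[i:i + n], stream[i + n:]

# Example: two codewords concatenated can still be split unambiguously.
w = self_delimit("10110") + self_delimit("01")
x1, rest = decode(w)
x2, _ = decode(rest)
assert (x1, x2) == ("10110", "01")
print(w, x1, x2)
```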

Properties

- By Kraft's inequality (proof: look at the binary tree): ∑_x 2^{-K(x)} ≤ 1.
- Naturally subadditive.
- Monotonic over prefixes.
- C(x) ≤ K(x) ≤ C(x) + log C(x).
- K(x|n) ≤ K(x) + O(1) ≤ C(x|n) + K(n) + O(1) ≤ C(x|n) + log* n + log n + log log n + … + O(1).
- An infinite binary sequence ω is (Martin-Löf) random iff there is a constant c such that for all n, K(ω_{1:n}) ≥ n − c. (Note: compare with the C-measure characterization in Lecture 2.)
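
As a small sanity check of the Kraft inequality for a concrete prefix code, the following snippet sums 2^{-|x'|} over all short strings, for the self-delimiting code x' of the previous slide (an illustration only; it says nothing about K itself, which is uncomputable):

```python
from itertools import product

def codeword_length(x: str) -> int:
    """Length of the self-delimiting codeword x' = bar(|x|) x."""
    return 2 * len(bin(len(x))[2:]) + len(x)

# Sum 2^{-|x'|} over all binary strings of length 1..12.
kraft_sum = sum(
    2.0 ** -codeword_length("".join(bits))
    for n in range(1, 13)
    for bits in product("01", repeat=n)
)
print(kraft_sum)  # stays below 1, as Kraft's inequality guarantees for any prefix code
assert kraft_sum <= 1.0
```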

Alice's revenge

Remember Bob at the cheating casino who flipped 100 heads in a row. Now Alice can have a winning strategy. She proposes the following: she pays $1 to Bob, and she receives $2^{100−K(x)} in return, for the flip sequence x of length 100. Note that this is a fair proposal, since ∑_{|x|=100} 2^{−100} · 2^{100−K(x)} ≤ 1 dollar. But if Bob cheats with 1^{100}, then Alice gets about $2^{100−log 100}.
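
The fairness claim is a one-line consequence of Kraft's inequality; writing it out (c below is the machine-dependent constant that the slide suppresses):

```latex
\begin{align*}
\mathbb{E}[\text{payout}]
  &= \sum_{|x|=100} 2^{-100}\, 2^{100-K(x)}
   = \sum_{|x|=100} 2^{-K(x)} \;\le\; \sum_{x} 2^{-K(x)} \;\le\; 1,\\
K(1^{100}) &\le \log 100 + c
  \;\Longrightarrow\;
  \text{payout on } 1^{100} \;\ge\; 2^{100-\log 100 - c}.
\end{align*}
```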

Chaitin's mystery number Ω

Define Ω = ∑_{P halts on ε} 2^{−|P|} < 1, by Kraft's inequality.

Theorem 1. Let X_i = 1 iff the i-th program halts. Then Ω_{1:n} encodes X_{1:2^n}; i.e., from Ω_{1:n} we can compute X_{1:2^n}.

Proof. (1) Ω_{1:n} ≤ Ω < Ω_{1:n} + 2^{−n}. (2) By dovetailing, simulate all programs, letting Ω' be the sum of 2^{−|P|} over the programs P that have halted so far, until Ω' > Ω_{1:n}. If some P with |P| ≤ n has not halted by then, it never will, since otherwise Ω ≥ Ω' + 2^{−n} > Ω_{1:n} + 2^{−n} > Ω, a contradiction. QED

Bennett: Ω_{1:10,000} yields all interesting mathematics.

Theorem 2. K(Ω_{1:n}) ≥ n − c. Remark: Ω is a particular random sequence!

Proof. By Theorem 1, given Ω_{1:n} we can obtain all halting programs of length ≤ n. For any x that is not an output of these programs, we have K(x) > n. Since from Ω_{1:n} we can compute such an x, it must be that K(Ω_{1:n}) ≥ n − c. QED
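
A toy Python sketch of the dovetailing in the proof of Theorem 1 (my own illustration: the "programs" are a hypothetical finite, prefix-free table with made-up halting times, so Ω is computable here only because the toy is finite; the loop exits once Ω' reaches Ω_{1:n}, which suffices for the same contradiction argument):

```python
from fractions import Fraction

# Hypothetical toy "machine": a prefix-free set of programs, each mapped to its
# halting time in steps, or None if it never halts. (Stand-in for a real
# universal prefix machine, whose halting times are of course unknowable.)
HALT_TIME = {"0": 3, "10": None, "110": 7, "1110": 2, "1111": None}

# Omega for the toy machine (exactly computable only because the toy is finite).
OMEGA = sum(Fraction(1, 2 ** len(p)) for p, t in HALT_TIME.items() if t is not None)

def halting_from_omega_prefix(n):
    """Decide halting for every program of length <= n from Omega's first n bits."""
    omega_1n = Fraction(sum((int(OMEGA * 2 ** (i + 1)) % 2) * 2 ** (n - 1 - i)
                            for i in range(n)), 2 ** n)
    halted, omega_prime, t = set(), Fraction(0), 0
    while omega_prime < omega_1n:            # dovetail until Omega' reaches Omega_{1:n}
        t += 1
        for p, ht in HALT_TIME.items():
            if p not in halted and ht is not None and ht <= t:
                halted.add(p)
                omega_prime += Fraction(1, 2 ** len(p))
    # Any program of length <= n that has not halted by now never halts.
    return {p: (p in halted) for p in HALT_TIME if len(p) <= n}

print(halting_from_omega_prefix(4))
# -> {'0': True, '10': False, '110': True, '1110': True, '1111': False}
```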

Universal distribution

A (discrete) semi-measure is a function P that satisfies ∑_{x∈N} P(x) ≤ 1. An enumerable semi-measure P_0 is universal (maximal) if for every enumerable semi-measure P there is a constant c_P such that c_P P_0(x) ≥ P(x) for all x∈N. We say that P_0 dominates each P.

Theorem. There is a universal enumerable semi-measure, m.

Theorem. −log m(x) = K(x) + O(1). (Proof omitted.)

Remark. The universal distribution m is one of the foremost notions in KC theory. Used as the prior probability in Bayes' rule, it maximizes ignorance by assigning maximal probability to all objects, since it dominates every other enumerable distribution up to a multiplicative constant.
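
A standard construction of such a universal m (from the general literature; the slide does not spell it out) mixes an effective enumeration P_1, P_2, … of all enumerable semi-measures:

```latex
m(x) \;=\; \sum_{j\ge 1} 2^{-j}\, P_j(x),
\qquad
\sum_{x} m(x) \;=\; \sum_{j\ge 1} 2^{-j} \sum_{x} P_j(x) \;\le\; \sum_{j\ge 1} 2^{-j} \;\le\; 1,
\qquad
m(x) \;\ge\; 2^{-j} P_j(x)\ \text{for every } j.
```

So m is itself an enumerable semi-measure, and it dominates each P_j with constant c_{P_j} = 2^j.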

Average-case complexity under m

Theorem [Li-Vitányi]. If the input to an algorithm A is distributed according to m, then the average-case time complexity of A equals A's worst-case time complexity, up to a constant factor.

Proof. Let T(n) be the worst-case time complexity of A on inputs of length n, and let t(x) be A's running time on input x. Define P(x) as follows: let a_n = ∑_{|x|=n} m(x); if |x| = n and x is the first string with t(x) = T(n), then P(x) := a_n, else P(x) := 0. P(x) is enumerable, hence c_P m(x) ≥ P(x). Then the average time complexity of A under m is

T(n|m) = ∑_{|x|=n} m(x) t(x) / ∑_{|x|=n} m(x)
       ≥ (1/c_P) ∑_{|x|=n} P(x) T(n) / ∑_{|x|=n} m(x)
       = (1/c_P) ∑_{|x|=n} [P(x) / ∑_{|x|=n} P(x)] T(n)
       = (1/c_P) T(n). QED

Intuition: the x with the worst running time has low Kolmogorov complexity, hence large m(x).

Example: Quicksort. Easy (highly compressible) inputs are exactly the ones most likely to incur the worst case.
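
To make the Quicksort example concrete, here is a small experiment of my own (not from the lecture): first-element-pivot Quicksort counts its comparisons on an already-sorted input (trivially compressible, hence m-probable) versus a random one.

```python
import random

def quicksort(a, counter):
    """First-element-pivot Quicksort; counter[0] counts comparisons with the pivot."""
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    counter[0] += len(rest)                 # each element of rest is compared to the pivot
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, counter) + [pivot] + quicksort(right, counter)

n = 500
for name, data in [("sorted (low complexity)", list(range(n))),
                   ("random (high complexity)", random.sample(range(10**6), n))]:
    c = [0]
    assert quicksort(data, c) == sorted(data)
    print(f"{name:>25}: {c[0]} comparisons")
# The sorted input costs about n^2/2 comparisons, the random one about n log n:
# the simple, highly compressible input is precisely the worst case.
```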

Inductive Inference

Solomonoff (1960, 1964): given a sequence of observations S = 010011100010101110…, predict the next bit of S.

Using Bayes' rule:
P(S1|S) = P(S1) P(S|S1) / P(S) = P(S1) / [P(S0) + P(S1)],
where P(S1) is the prior probability, and we know P(S|S0) = P(S|S1) = 1.

Choose the universal prior probability P(S) = 2^{−K(S)}, denoted m(S).

This topic continues in Lecture 7.
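
Solomonoff's predictor is uncomputable, but the rule above can be mimicked by substituting a real compressor for K. The sketch below is my own illustration, not part of the lecture; zlib is a very crude stand-in for Kolmogorov complexity, especially on short strings. It scores the two possible continuations of S and normalizes.

```python
import zlib

def approx_K(bits: str) -> int:
    """Crude stand-in for K: length in bits of the zlib-compressed string."""
    return 8 * len(zlib.compress(bits.encode(), 9))

def predict_next_bit(S: str) -> float:
    """Approximate P(next bit = 1 | S) = 2^-K(S1) / (2^-K(S0) + 2^-K(S1))."""
    w0, w1 = 2.0 ** -approx_K(S + "0"), 2.0 ** -approx_K(S + "1")
    return w1 / (w0 + w1)

S = "01" * 200  # a highly regular sequence; compare the two possible continuations
print(approx_K(S + "0"), approx_K(S + "1"), predict_next_bit(S))
```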