Kolmogorov Complexity and Universal Distribution Presented by Min Zhou Nov. 18, 2002
Content: Kolmogorov complexity, Universal Distribution, Inductive Learning
Principle of Indifference (Epicurus) Keep all hypotheses that are consistent with the facts
Occam’s Razor: Among all hypotheses consistent with the facts, choose the simplest. This is Newton’s rule #1 for doing natural philosophy – “We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances.”
Question What does “simplest” mean? How to define simplicity? Can a thing be simple under one definition and not under another?
Bayes’ Rule: P(H|D) = P(D|H) * P(H) / P(D) – P(H) is often interpreted as the initial degree of belief in H. In essence, Bayes’ rule is a mapping from the prior probability P(H) to the posterior probability P(H|D), determined by the data D.
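A minimal numeric sketch of the rule in Python (the two hypotheses and all probabilities are invented for illustration):

# Bayes' rule: P(H|D) = P(D|H) * P(H) / P(D), where
# P(D) = sum over all hypotheses H of P(D|H) * P(H).
prior = {"H1": 0.7, "H2": 0.3}        # P(H): initial degrees of belief
likelihood = {"H1": 0.1, "H2": 0.8}   # P(D|H): how well each H explains D

evidence = sum(likelihood[h] * prior[h] for h in prior)             # P(D) = 0.31
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
print(posterior)   # {'H1': 0.2258..., 'H2': 0.7741...}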
How to get P(H)? By the law of large numbers we can estimate P(H|D) if we have many examples, but we want to extract as much information as possible from only a limited number of data. P(H) may be unknown, uncomputable, or may not even exist. Can we find a single probability distribution to use as the prior in every case, with approximately the same result as if we had used the real distribution?
Hume on Induction: Induction is impossible, because we can only reach a conclusion by using known data and methods; the conclusion is therefore logically already contained in the starting configuration.
Solomonoff’s Theory of Induction: Maintain all hypotheses consistent with the data, incorporate Occam’s Razor by assigning the simplest hypotheses the highest probability, and combine them using Bayes’ rule.
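A toy sketch of the scheme under a big simplification: the hypothesis class here is just repeating binary patterns, standing in for programs (real Solomonoff induction enumerates all programs and is uncomputable). Every pattern consistent with the data is kept, weighted by 2^(-length), and the weights are combined to predict the next bit:

# Toy Solomonoff-style induction: hypotheses are binary patterns
# repeated forever, with prior weight 2^(-pattern length).
from itertools import product

def hypotheses(max_len):
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def repeats_to(pattern, length):
    # The infinite repetition of `pattern`, truncated to `length` bits.
    return (pattern * (length // len(pattern) + 1))[:length]

def predict_next(data, max_len=6):
    # Keep every hypothesis consistent with the data (Epicurus),
    # weight it by 2^(-|pattern|) (Occam), and combine (Bayes).
    weights = {"0": 0.0, "1": 0.0}
    for p in hypotheses(max_len):
        if repeats_to(p, len(data)) == data:
            weights[p[len(data) % len(p)]] += 2.0 ** (-len(p))
    total = weights["0"] + weights["1"]
    return {b: w / total for b, w in weights.items()}

print(predict_next("010101"))   # {'0': 1.0, '1': 0.0}: every consistent pattern predicts '0'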
Kolmogorov Complexity: K(s) is the length of the shortest program which, on no input, prints out s. K(s) <= |s| + c, since some program simply contains s verbatim. For every n there is an n-bit string s with K(s) >= n, by counting: there are 2^n strings of length n but fewer than 2^n programs shorter than n bits. K(s) is objective (programming-language independent) by the Invariance Theorem.
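K(s) cannot be computed exactly, but any compressor gives a computable upper bound on it. A small demonstration using zlib (the choice of compressor is arbitrary; it bounds K(s) only up to the constant overhead of the decompressor and the zlib format):

# Any compressor gives a computable upper bound on description length:
# K(s) <= (compressed size) + c, for the fixed decompressor.
import zlib, os

patterned = b"01" * 500        # a highly regular 1000-byte string
random_s = os.urandom(1000)    # incompressible with high probability

for name, s in [("patterned", patterned), ("random", random_s)]:
    bound = len(zlib.compress(s, 9))
    print(name, "-", len(s), "bytes; compressed upper bound:", bound, "bytes")
# The patterned string compresses to a handful of bytes; the random one
# compresses to (slightly more than) its own length.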
Universal Distribution: P(s) = 2^(-K(s)). We use K(s) to describe the complexity of an object; by Occam’s Razor, the simplest object should have the highest probability.
Problem: P(s) sums to more than 1. For every n there exists an n-bit string s (e.g., ‘1’ repeated n times) with K(s) ≈ log n, so P(s) ≈ 2^(-log n) = 1/n. Summing over n gives the harmonic series 1/2 + 1/3 + ..., which diverges, so Σ_s P(s) > 1.
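A quick numeric check of the divergence (partial sums of 1/2 + 1/3 + ...):

# Partial sums of 1/2 + 1/3 + 1/4 + ... pass 1 almost immediately and
# keep growing without bound, so 2^(-K(s)) cannot be a probability
# distribution as it stands.
total, n = 0.0, 1
while total <= 1:
    n += 1
    total += 1.0 / n
print("partial sum first exceeds 1 at n =", n)   # n = 4: 1/2 + 1/3 + 1/4 > 1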
Levin’s improvement: use prefix-free programs – a set of programs, none of which is a prefix of any other. Kraft’s inequality – let L1, L2, ... be a sequence of natural numbers; there is a prefix code with this sequence as the lengths of its binary code words iff Σ_n 2^(-Ln) <= 1. Restricted to prefix-free programs, the weights 2^(-K(s)) therefore sum to at most 1.
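A small check of both conditions in Python (the code words are illustrative):

# Check that a code is prefix-free and satisfies Kraft's inequality:
# the sum of 2^(-length) over all code words is at most 1.
def is_prefix_free(code):
    return not any(a != b and b.startswith(a) for a in code for b in code)

def kraft_sum(code):
    return sum(2.0 ** -len(w) for w in code)

code = ["0", "10", "110", "111"]   # an illustrative prefix code
print(is_prefix_free(code))        # True
print(kraft_sum(code))             # 0.5 + 0.25 + 0.125 + 0.125 = 1.0

bad = ["0", "01", "11"]            # "0" is a prefix of "01"
print(is_prefix_free(bad))         # False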
Multiplicative domination: Levin proved that for the universal distribution p and any computable distribution p’ there exists a constant c such that c * p(s) >= p’(s) for all s, where c depends on p’ but not on s. So if the true prior distribution is computable, then using the single fixed universal distribution p is almost as good as using the actual true distribution itself.
Turing’s thesis: the universal Turing machine can compute all intuitively computable functions. Kolmogorov’s thesis: Kolmogorov complexity gives the shortest description length among all description lengths that can be effectively approximated, according to intuition. Levin’s thesis: the universal distribution gives the largest probability among all distributions that can be effectively approximated, according to intuition.
Universal Bet: street gambler Bob tosses a coin and offers: – next toss is heads (“1”): Bob gives Alice $2 – next toss is tails (“0”): Alice pays Bob $1. Is Bob honest? – Side bet: flip the coin 1000 times and record the results as a string s – Alice pays $1; Bob pays Alice 2^(1000 - K(s)) dollars.
Good offer for Bob to accept: the expected payout over fair coin flips is Σ_s 2^(-|s|) * 2^(|s| - K(s)) = Σ_s 2^(-K(s)) <= 1, so on average Bob returns at most the $1 Alice paid. If Bob is honest, K(s) is close to |s| and Alice increases her money at most polynomially; if Bob cheats, K(s) is far below |s| and Alice increases her money exponentially.
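A simulation sketch of the side bet, with zlib’s compressed length standing in for the uncomputable K(s) (the payoff is therefore only approximate, and log2 of it is reported to keep the numbers readable):

# Simulate the side bet, with compressed length as a computable
# stand-in for the uncomputable K(s): Alice pays $1 and receives
# 2^(n - K(s)) dollars for the n-bit record s.
import zlib, random

def approx_K_bits(bits):
    # Compressed size in bits: an upper bound on K(s) (plus overhead).
    return 8 * len(zlib.compress(bits.encode(), 9))

n = 1000
honest = "".join(random.choice("01") for _ in range(n))   # fair coin
cheat = "01" * (n // 2)                                   # a rigged sequence

for name, s in [("honest Bob", honest), ("cheating Bob", cheat)]:
    print(name, "-> log2(payoff) approx.", n - approx_K_bits(s))
# Honest tosses don't compress, so log2(payoff) stays small (even negative
# here, because the compressor adds format overhead); the rigged sequence
# compresses well, so Alice's payoff explodes exponentially.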
Notice: the Kolmogorov complexity of a string is not computable; it can only be approximated from above (e.g., by compression, as in the sketches above).
Conclusion Kolmogorov complexity – optimal effective descriptions of objects Universal Distribution – optimal effective probability of objects Both are objective and absolute
Reference: Ming Li and Paul Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition, Springer-Verlag, 1997.