Published by Mavis Annabel Hart. Modified over 9 years ago.
1
Kolmogorov Complexity and Universal Distribution
Presented by Min Zhou, Nov. 18, 2002
2
Contents
- Kolmogorov complexity
- Universal distribution
- Inductive learning
3
Principle of Indifference (Epicurus)
Keep all hypotheses that are consistent with the facts.
4
Occam’s Razor
Among all hypotheses consistent with the facts, choose the simplest.
Newton’s Rule I for doing natural philosophy:
– “We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances.”
5
Question
- What does “simplest” mean? How do we define simplicity?
- Can a thing be simple under one definition and not under another?
6
Bayes’ Rule
$P(H \mid D) = P(D \mid H)\,P(H) / P(D)$
- P(H) is often interpreted as the initial degree of belief in H (the prior).
- In essence, Bayes’ rule maps the prior probability P(H) to the posterior probability P(H|D), as determined by the data D.
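A minimal sketch of the mapping from prior to posterior for a finite set of hypotheses. The two coin hypotheses ("fair" with P(heads) = 0.5, "biased" with P(heads) = 0.9), the equal prior, and the data (three heads in a row) are hypothetical choices for illustration; P(D) is obtained by the law of total probability.

```python
def bayes_posterior(prior, likelihood):
    """Map a prior P(H) to the posterior P(H|D) via Bayes' rule.

    prior: dict hypothesis -> P(H)
    likelihood: dict hypothesis -> P(D|H) for the observed data D
    """
    # P(D) = sum over hypotheses of P(D|H) * P(H)
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


# Hypothetical example: is the coin fair or biased toward heads?
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5 ** 3, "biased": 0.9 ** 3}  # D = three heads in a row
print(bayes_posterior(prior, likelihood))
```

The data shift belief toward "biased" (posterior about 0.85), but the result still depends on the prior we started with, which is exactly the open question of where P(H) comes from.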
7
How to get P(H)
- By the law of large numbers, with many examples the data dominate and we can estimate P(H|D) well.
- But we want to extract as much information as possible from only a limited amount of data.
- P(H) may be unknown, uncomputable, or may not even exist.
- Can we find a single probability distribution to use as the prior in every case, giving approximately the same result as if we had used the real distribution?
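One reading of the first point, as a sketch: with many observations, posteriors computed from very different priors become nearly identical, provided neither prior rules out the true value. The grid of coin biases, the two priors, and the counts (700 heads, 300 tails) are hypothetical; the computation is done in log space because the raw likelihoods underflow.

```python
import math


def log_posterior_grid(prior, heads, tails):
    """Posterior over coin biases theta, given counts of heads and tails."""
    logs = {
        t: math.log(p) + heads * math.log(t) + tails * math.log(1 - t)
        for t, p in prior.items()
        if p > 0
    }
    m = max(logs.values())
    z = sum(math.exp(v - m) for v in logs.values())  # log-sum-exp normalizer
    return {t: math.exp(v - m) / z for t, v in logs.items()}


thetas = [i / 10 for i in range(1, 10)]  # candidate biases 0.1 .. 0.9
uniform = {t: 1 / len(thetas) for t in thetas}
# A prior that strongly (and wrongly) believes theta = 0.1
skewed = {t: (0.99 if t == 0.1 else 0.01 / 8) for t in thetas}

for name, prior in [("uniform", uniform), ("skewed", skewed)]:
    post = log_posterior_grid(prior, heads=700, tails=300)
    print(name, post[0.7])
```

With this much data both priors put almost all posterior mass on 0.7. With only a handful of flips they would disagree, and that limited-data case is the one the slide is concerned with.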
8
Hume on Induction
Induction is impossible, because we can only reach conclusions by using known data and methods; the conclusion is therefore logically already contained in the starting configuration.
9
Solomonoff’s Theory of Induction
- Maintain all hypotheses consistent with the data
- Incorporate Occam’s Razor: assign the highest probability to the simplest hypotheses
- Use Bayes’ rule
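A toy illustration of these three ingredients, under heavy simplifying assumptions. Hypotheses are repeating binary patterns, a hypothetical class standing in for "all programs". Each pattern p gets prior weight $2^{-|p|}$ as a stand-in for $2^{-k}$. Patterns inconsistent with the data are dropped, and the survivors are renormalized as in Bayes’ rule with 0/1 likelihoods. The names `generate`, `posterior` and `max_len` are illustrative, not taken from the slides.

```python
from itertools import product


def patterns(max_len):
    """All nonempty binary patterns up to max_len (toy hypothesis class)."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def generate(pattern, length):
    """The string of the given length produced by repeating the pattern."""
    return (pattern * (length // len(pattern) + 1))[:length]


def posterior(data, max_len=6):
    """Keep every pattern consistent with the data (Epicurus), weight it by
    2^-len (Occam), then normalize (Bayes' rule with 0/1 likelihoods)."""
    weights = {
        p: 2.0 ** -len(p)
        for p in patterns(max_len)
        if generate(p, len(data)) == data
    }
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def prob_next_is_one(data, max_len=6):
    """Posterior predictive probability that the next bit is '1'."""
    post = posterior(data, max_len)
    return sum(w for p, w in post.items() if generate(p, len(data) + 1)[-1] == "1")


print(posterior("010101"))
print(prob_next_is_one("0110"))
```

For "010101" the surviving hypotheses are "01", "0101" and "010101", and the shortest one gets most of the weight (16/21). For "0110" the consistent patterns disagree about the next bit, and the prediction is a weighted vote among them.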
10
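Kolmogorov Complexity
- k(s) is the length of the shortest program which, on no input, prints out s
- $k(s) \le |s| + c$ for a constant c independent of s (a program can simply print s literally)
- For every n there is a string s of length n with $k(s) \ge n$ (an incompressible string)
- k(s) is objective: by the Invariance Theorem it is independent of the programming language up to an additive constant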
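The existence of incompressible strings follows from a counting argument, sketched here for binary programs. There are fewer programs shorter than n bits than there are strings of exactly n bits:

```latex
\[
\#\{\,p : |p| < n\,\} \;=\; \sum_{i=0}^{n-1} 2^{i} \;=\; 2^{n}-1 \;<\; 2^{n} \;=\; \#\{\,s : |s| = n\,\}
\]
```

Each program prints at most one string. So at least one string s of length n is not printed by any program shorter than n bits, which means $k(s) \ge n$.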
11
Universal Distribution
$P(s) = 2^{-k(s)}$
We use k(s) to describe the complexity of an object. By Occam’s Razor, the simplest objects should have the highest probability.
12
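Problem: $\sum_s P(s) > 1$
For every n there is an n-bit string s with $k(s) \approx \log n$ (for example, a string of n zeros), so $P(s) = 2^{-\log n} = 1/n$. Summing over one such string for each length gives $\frac{1}{2} + \frac{1}{3} + \cdots > 1$. In fact this harmonic sum diverges, so P is not a probability distribution.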
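The divergence can be checked with exact arithmetic. This sketch simply adds the naive mass 1/n for one low-complexity string of each length n ≥ 2, following the slide’s approximation $k(s) \approx \log n$. The function names are illustrative.

```python
from fractions import Fraction


def naive_total_mass(n_max):
    """Sum of P(s) = 1/n over one low-complexity string of each length n = 2..n_max."""
    return sum(Fraction(1, n) for n in range(2, n_max + 1))


def first_length_exceeding(bound):
    """Smallest n_max at which the naive total mass first exceeds bound."""
    total, n = Fraction(0), 1
    while total <= bound:
        n += 1
        total += Fraction(1, n)
    return n


print(naive_total_mass(4))  # 13/12
print(first_length_exceeding(1), first_length_exceeding(2))  # 4 11
```

The total already exceeds 1 after three terms (1/2 + 1/3 + 1/4). Because the series keeps growing without bound, no rescaling by a constant can turn these values into a probability distribution.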
13
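Levin’s improvement
Use prefix-free programs:
– A set of programs, no one of which is a prefix of any other
Kraft’s inequality:
– Let $l_1, l_2, \ldots$ be a sequence of natural numbers. There is a binary prefix code with this sequence as the lengths of its code words iff $\sum_n 2^{-l_n} \le 1$
Hence, if k(s) is measured with prefix-free programs, the shortest programs for distinct strings form a prefix-free set, so $\sum_s 2^{-k(s)} \le 1$.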
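A minimal sketch of the "if" direction of Kraft’s inequality. Given lengths that satisfy it, the standard canonical construction (process lengths in increasing order and count upward in binary) yields a prefix code with exactly those lengths. The function names are illustrative, not from the slides.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of sum_n 2^(-l_n)."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def is_prefix_free(words):
    """True if no code word is a prefix of a different entry (duplicates fail too)."""
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )


def prefix_code_from_lengths(lengths):
    """Canonical binary prefix code with the given code word lengths."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate Kraft's inequality; no prefix code exists")
    code = [""] * len(lengths)
    value, prev_len = 0, 0
    for i in sorted(range(len(lengths)), key=lambda j: lengths[j]):
        value <<= lengths[i] - prev_len  # extend the counter to the new length
        code[i] = format(value, "b").zfill(lengths[i]) if lengths[i] else ""
        value += 1  # next word of this length; no earlier word is its prefix
        prev_len = lengths[i]
    return code


print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
print(kraft_sum([1, 1, 2]))  # 5/4, so no prefix code has these lengths
```

The lengths [1, 2, 3, 3] meet the bound with equality, so the resulting code leaves no room for another word. This "no room left" property is what keeps $\sum_s 2^{-k(s)}$ at or below 1.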
14
Multiplicative domination
Levin proved that for every computable distribution p’ there is a constant c such that $c \cdot p(s) \ge p’(s)$ for all s, where p is the universal distribution. Here c depends on p’ but not on s.
Hence, if the true prior distribution is computable, using the single fixed universal distribution p is almost as good as using the true distribution itself.
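A toy analogue of domination (not Levin’s proof). A weighted mixture $\xi = \sum_i w_i p_i$ dominates each of its components: $\xi(s) \ge w_j\, p_j(s)$ for every s. The constant $c = 1/w_j$ therefore depends on the component $p_j$ but not on s. The universal distribution can be viewed, up to a constant, as such a mixture over all effectively approximable distributions. The three Bernoulli components and their weights below are arbitrary choices for illustration.

```python
from itertools import product


def bernoulli(theta):
    """i.i.d. coin model: probability of a bit string under bias theta."""
    def p(s):
        ones = s.count("1")
        return theta ** ones * (1 - theta) ** (len(s) - ones)
    return p


def mixture(components, weights):
    """Weighted mixture xi(s) = sum_i w_i * p_i(s)."""
    def xi(s):
        return sum(w * p(s) for p, w in zip(components, weights))
    return xi


components = [bernoulli(0.5), bernoulli(0.9), bernoulli(0.1)]
weights = [0.5, 0.25, 0.25]
xi = mixture(components, weights)
strings = ["".join(b) for b in product("01", repeat=8)]

# For each component, the worst-case ratio p_j(s) / xi(s) stays below 1 / w_j.
for p, w in zip(components, weights):
    print(max(p(s) / xi(s) for s in strings), "<=", 1 / w)
```

Because every term in the sum is nonnegative, the bound holds for all strings at once. That uniformity over s is what "multiplicative domination" means.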
15
- Turing’s thesis: a universal Turing machine can compute all intuitively computable functions.
- Kolmogorov’s thesis: Kolmogorov complexity gives the shortest description length, up to an additive constant, among all description lengths that can be effectively approximated according to intuition.
- Levin’s thesis: the universal distribution gives the largest distribution, up to a multiplicative constant, among all distributions that can be effectively approximated according to intuition.
16
Universal Bet
Street gambler Bob tosses a coin and offers:
– If the next toss is heads (“1”), Bob gives Alice 2 dollars
– If the next toss is tails (“0”), Alice pays Bob 1 dollar
Is Bob honest?
– Side bet: flip the coin 1000 times and record the results as a binary string s
– Alice pays 1 dollar; Bob pays Alice $2^{1000-k(s)}$ dollars
17
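Good offer:
– If the coin is fair, each s with |s| = 1000 has probability $2^{-1000}$, so Bob’s expected payout is $\sum_{|s|=1000} 2^{-1000}\, 2^{1000-k(s)} = \sum_{|s|=1000} 2^{-k(s)} \le 1$ dollar (by Kraft’s inequality, with k measured by prefix-free programs)
– If Bob is honest, Alice’s expected payout is at most her 1-dollar stake, so the side bet costs Bob nothing on average
– If Bob cheats, s is far from random, k(s) is much smaller than 1000, and Alice’s winnings grow exponentially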
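k(s) cannot be computed, but a compressor gives a computable upper bound on it, up to an additive constant. This minimal sketch assumes zlib’s compressed size in bits as that stand-in. It contrasts the side bet’s log2 payout for a fair-coin sequence with a rigged all-tails sequence. The proxy, the random seed, and the function name `proxy_log2_payout` are illustrative assumptions, not part of the original bet.

```python
import random
import zlib


def proxy_log2_payout(bits):
    """log2 of Bob's payout 2^(n - k(s)), with k(s) replaced by a zlib-based
    upper bound (8 bits per compressed byte). A stand-in only: k is not computable."""
    n = len(bits)
    # Pack the bit list into bytes so the compressor sees the raw bits.
    packed = int("".join(map(str, bits)), 2).to_bytes((n + 7) // 8, "big")
    return n - 8 * len(zlib.compress(packed, 9))


rng = random.Random(0)
honest = [rng.randint(0, 1) for _ in range(1000)]  # fair coin
cheating = [0] * 1000  # Bob's coin always lands tails

print(proxy_log2_payout(honest), proxy_log2_payout(cheating))
```

For the fair sequence the compressor cannot shrink the data, so the exponent is negative and Alice effectively loses her stake. For the rigged sequence the exponent is in the hundreds. Because the proxy overestimates k(s), the true payout would be at least as large, up to the additive constant.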
18
Note
The Kolmogorov complexity of a string is not computable.
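A standard proof sketch in the style of the Berry paradox. Suppose some program computed k(s) exactly. For each n, let $P_n$ be the program that enumerates binary strings in length-increasing order and prints the first string $s_n$ with $k(s_n) \ge n$. Such a string exists, because some strings of every length are incompressible. $P_n$ only needs to encode n plus a fixed search routine, so for some constant c:

```latex
\[
n \;\le\; k(s_n) \;\le\; |P_n| \;\le\; 2\log_2 n + c
\]
```

This inequality fails for all sufficiently large n, so no program can compute k. Upper bounds, such as the length of any particular compressed description, remain computable.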
19
Conclusion
- Kolmogorov complexity: optimal effective descriptions of objects
- Universal distribution: optimal effective probability of objects
- Both are objective and absolute, up to constants that do not depend on the object