Kolmogorov Complexity and Universal Distribution Presented by Min Zhou Nov. 18, 2002

Content Kolmogorov complexity Universal Distribution Inductive Learning

Principle of Indifference (Epicurus) Keep all hypotheses that are consistent with the facts

Occam’s Razor Among all hypotheses consistent with the facts, choose the simplest Newton’s Rule #1 for doing natural philosophy –“We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances”

Question What does “simplest” mean? How to define simplicity? Can a thing be simple under one definition and not under another?

Bayes’ Rule P(H|D) = P(D|H) * P(H) / P(D) –P(H) is often regarded as the initial degree of belief in H In essence, Bayes’ rule is a mapping from the prior probability P(H) to the posterior probability P(H|D), determined by the data D
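To make the rule concrete, here is a small Python sketch with made-up numbers (the hypothesis, priors, and data below are purely illustrative):

```python
# Hypothetical example: H = "the coin is biased towards heads (P(heads) = 0.8)",
# the alternative is "the coin is fair", and D = "we observed 3 heads in a row".
prior_biased = 0.1            # P(H): initial degree of belief in H
prior_fair = 0.9              # P(not H)

likelihood_biased = 0.8 ** 3  # P(D | H)
likelihood_fair = 0.5 ** 3    # P(D | not H)

# P(D) by the law of total probability
p_data = likelihood_biased * prior_biased + likelihood_fair * prior_fair

# Bayes' rule: P(H | D) = P(D | H) * P(H) / P(D)
posterior_biased = likelihood_biased * prior_biased / p_data
print(f"P(H | D) = {posterior_biased:.3f}")  # about 0.31: the data raised our belief in H
```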

How to get P(H)? By the law of large numbers, we can estimate P(H|D) if we have many examples But we want to extract as much information as possible from only a limited number of data P(H) may be unknown, uncomputable, or may not even exist Can we find a single probability distribution to use as the prior in every case, with approximately the same result as if we had used the real distribution?

Hume on Induction Induction is impossible because we can only reach a conclusion by using known data and methods, so the conclusion is logically already contained in the starting configuration

Solomonoff’s Theory of Induction Maintain all hypotheses consistent with the data Incorporate Occam’s Razor: assign the simplest hypotheses the highest probability Combine the two using Bayes’ rule

Kolmogorov Complexity K(s) is the length of the shortest program which, on no input, prints out s K(s) <= |s| + c for some constant c (a program can simply print s literally) For every n, there is a string s of length n with K(s) >= n K(s) is objective (independent of the programming language) by the Invariance Theorem
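The second and third claims rest on a simple counting argument; the Python sketch below only illustrates that counting (for an arbitrary n), leaving the additive constant c implicit:

```python
# There are 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 binary programs shorter than n bits,
# but 2^n binary strings of length n. Each program prints at most one string,
# so at least one n-bit string s must have K(s) >= n: it is "incompressible".
def count_programs_shorter_than(n: int) -> int:
    return sum(2 ** k for k in range(n))  # equals 2**n - 1

n = 16
print(f"strings of length {n}:          {2 ** n}")
print(f"programs shorter than {n} bits: {count_programs_shorter_than(n)}")
assert count_programs_shorter_than(n) < 2 ** n  # pigeonhole: some string is left over
```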

Universal Distribution P(s) = 2^(-K(s)) We use K(s) to describe the complexity of an object. By Occam’s Razor, the simplest should have the highest probability.
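K(s) itself cannot be computed (see the Notice slide below), but a general-purpose compressor gives an upper bound on it, which is enough to illustrate how the weight 2^(-K(s)) favours simple strings. The sketch below uses zlib purely as a crude stand-in for K; the numbers are qualitative, not the actual universal distribution:

```python
import random
import zlib

def complexity_upper_bound_bits(s: str) -> int:
    # Compressed length in bits: an upper bound on K(s), inflated by compressor
    # overhead and by the one-character-per-bit encoding used here.
    return 8 * len(zlib.compress(s.encode()))

simple = "01" * 500                                              # a very regular 1000-bit string
random_bits = "".join(random.choice("01") for _ in range(1000))  # a typical, incompressible one

for name, s in [("simple", simple), ("random", random_bits)]:
    k_hat = complexity_upper_bound_bits(s)
    print(f"{name}: approx. K = {k_hat} bits, weight = 2^(-{k_hat})")
```

The regular string gets a far smaller exponent, and therefore an enormously larger weight, than the random one.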

Problem: ∑_s P(s) > 1 For every n, there exists an n-bit string s with K(s) about log n (for example s = 0^n, which can be described by giving n alone), so P(s) = 2^(-log n) = 1/n Summing just these strings gives 1/2 + 1/3 + … > 1, so P is not a probability distribution
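A quick numeric check of the divergence claimed above; even keeping only one 1/n term per length n, the sum passes 1 after a few terms:

```python
# Sum one weight 1/n per length n, as on this slide; this underestimate of the
# full sum over all strings already exceeds 1 almost immediately.
total = 0.0
n = 2
while total <= 1.0:
    total += 1.0 / n
    n += 1
print(f"1/2 + ... + 1/{n - 1} = {total:.3f} > 1")
```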

Levin’s improvement Use prefix-free programs –A set of programs, none of which is a prefix of any other Kraft’s inequality –Let l_1, l_2, … be a sequence of natural numbers. There is a prefix code with this sequence as the lengths of its binary code words iff ∑_n 2^(-l_n) <= 1
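A small sketch that checks both notions for a concrete code (the codewords below are made up for illustration): prefix-freeness of the code, and Kraft’s inequality for its lengths:

```python
from fractions import Fraction

def is_prefix_free(codewords):
    # No codeword may be a prefix of a different codeword.
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

def kraft_sum(lengths):
    # Kraft's inequality: a prefix code with these lengths exists iff this sum <= 1.
    return sum(Fraction(1, 2 ** l) for l in lengths)

code = ["0", "10", "110", "111"]           # a hypothetical prefix-free code
print(is_prefix_free(code))                # True
print(kraft_sum([len(w) for w in code]))   # 1: lengths 1, 2, 3, 3 exactly fill the budget
```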

Multiplicative domination Levin proved that for every computable distribution p’ there exists a constant c such that c * p(s) >= p’(s) for all s, where p is the universal distribution and c depends on p’ but not on s So if the true prior distribution is computable, using the single fixed universal distribution p is almost as good as using the actual true distribution itself

Turing’s thesis: the universal Turing machine can compute all intuitively computable functions Kolmogorov’s thesis: Kolmogorov complexity gives the shortest description length among all description lengths that can be effectively approximated, according to intuition Levin’s thesis: the universal distribution gives the largest probability among all distributions that can be effectively approximated, according to intuition

Universal Bet Street gambler Bob tosses a coin and offers: –Next flip is heads “1”: Bob gives Alice $2 –Next flip is tails “0”: Alice pays Bob $1 Is Bob honest? –Side bet: flip the coin 1000 times and record the result as a string s –Alice pays $1, and Bob pays Alice 2^(1000-K(s)) dollars

Good offer for Bob: his expected payout is ∑_{|s|=1000} 2^(-1000) * 2^(1000-K(s)) = ∑_{|s|=1000} 2^(-K(s)) <= 1, by Kraft’s inequality If Bob is honest, Alice’s money grows at most polynomially If Bob cheats, Alice’s money grows exponentially
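A rough simulation of the side bet, again using zlib as a crude stand-in for K(s) (it only gives an upper bound, and the text encoding below inflates it further, so the honest case looks even worse for Alice than the theory says); the point is only the qualitative gap between an honest and a cheating Bob:

```python
import random
import zlib

def k_upper_bound_bits(s: str) -> int:
    # Compressed length in bits: only an upper bound on K(s).
    return 8 * len(zlib.compress(s.encode()))

n = 1000
honest = "".join(random.choice("01") for _ in range(n))  # genuinely random flips
cheating = "01" * (n // 2)                               # a very regular "cheating" sequence

for name, s in [("honest", honest), ("cheating", cheating)]:
    exponent = n - k_upper_bound_bits(s)                 # payoff is 2**exponent dollars
    print(f"{name} Bob: Alice's payoff ~ 2^({exponent}) dollars")
```

With the honest coin the exponent comes out negative, so Alice essentially loses her dollar; with the regular sequence it is in the hundreds, an astronomically large payout.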

Notice The Kolmogorov complexity of a string is not computable

Conclusion Kolmogorov complexity – optimal effective descriptions of objects Universal Distribution – optimal effective probability of objects Both are objective and absolute

Reference Ming Li and Paul Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition, Springer-Verlag, 1997