
1 Positive and Negative Randomness Paul Vitanyi CWI, University of Amsterdam Joint work with Kolya Vereshchagin

2 Non-Probabilistic Statistics

3 Classic Statistics--Recalled

4 Probabilistic Sufficient Statistic

5 Kolmogorov complexity
- K(x) = length of a shortest description of x.
- K(x|y) = length of a shortest description of x given y.
- A string x is random if K(x) ≥ |x|.
- K(x) - K(x|y) is the information y knows about x.
- Theorem (Mutual Information). K(x) - K(x|y) = K(y) - K(y|x), up to an additive logarithmic term.
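Kolmogorov complexity is uncomputable, but any real compressor gives an upper bound, which makes the quantities above easy to experiment with. Here is a minimal Python sketch (not part of the original slides): zlib-compressed length serves as a crude proxy C(x) for K(x), and C(y+x) - C(y) as a proxy for K(x|y); the symmetry of information then shows up approximately.

```python
import zlib

# Crude stand-in for K(x): length in bytes of the zlib-compressed string.
# Real Kolmogorov complexity is uncomputable; compressors give upper bounds.
def C(s: bytes) -> int:
    return len(zlib.compress(s, 9))

# Proxy for K(x|y): the extra bytes needed to compress x once y is known.
def C_cond(x: bytes, y: bytes) -> int:
    return C(y + x) - C(y)

x = b"abcabcabc" * 40
y = x + b"some extra tail"

# Symmetry of information, approximately: C(x) - C(x|y) should be close
# to C(y) - C(y|x); both measure the information shared by x and y.
print("C(x) =", C(x), " C(x|y) ~", C_cond(x, y))
print("I(y:x) ~", C(x) - C_cond(x, y), " I(x:y) ~", C(y) - C_cond(y, x))
```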

6 Randomness Deficiency

7 Algorithmic Sufficient Statistic where model is a set

8 Algorithmic sufficient statistic where model is a total computable function
- Data is a binary string x; the model is a total computable function p; the prefix complexity K(p) is the size of the smallest TM computing p.
- Data-to-model code length: l_x(p) = min_d { |d| : p(d) = x }.
- x is typical for p if δ(x|p) = l_x(p) - K(x|p) is small.
- p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p).
- Theorem: If p is a sufficient statistic for x, then x is typical for p.
- p is a minimal sufficient statistic for x (its complexity is the sophistication of x) if K(p) is minimal among sufficient statistics. A toy illustration follows below.
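As a toy illustration of these definitions (my own example, not from the talk), take two total computable models for a highly regular string: the identity function and a fixed repetition function. For such injective models the data-to-model code length l_x(p) can be read off directly, and hand-assigned stand-ins for K(p) (illustrative proxies, not real prefix complexities) show which model comes close to a sufficient statistic.

```python
from math import log2

x = b"ab" * 400                     # highly regular data, |x| = 800 bytes

# name: (model p, a shortest d with p(d) == x, assumed K(p) in bits)
models = {
    "identity":  (lambda d: d,       x,     2),
    "repeat400": (lambda d: d * 400, b"ab", 2 + log2(400)),
}

for name, (p, d, K_p) in models.items():
    assert p(d) == x                # d is a valid description of x under p
    l_x = 8 * len(d)                # data-to-model code length, in bits
    print(f"{name:10s} K(p) + l_x(p) ~ {K_p + l_x:7.1f} bits")
# The repeat model's two-part cost (~27 bits) is far below the identity's
# (~6402 bits): for this x, 'repeat400' is close to a sufficient statistic,
# while the identity model wastes all |x| bits.
```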

9 Graph of the structure function h_x(α)
[Figure: h_x(α) plotted against model complexity α; vertical axis: log |S|; the lower bound shown is the line h_x(α) = K(x) - α.]

10 Minimum Description Length estimator; relations between estimators
- Structure function: h_x(α) = min_S { log |S| : x in S and K(S) ≤ α }.
- MDL estimator: λ_x(α) = min_S { log |S| + K(S) : x in S and K(S) ≤ α }.
- Best-fit estimator: β_x(α) = min_S { δ(x|S) : x in S and K(S) ≤ α }.
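A minimal sketch of the MDL estimator over one concrete model family (my own toy setup, with parameter-counting proxies for K(S)): sets of n-bit strings indexed by their number of ones, compared with the trivial model {0,1}^n and the singleton {x}.

```python
from math import comb, log2

def two_part_costs(x: str) -> dict:
    """Two-part code lengths log2|S| + K(S) for three set models of x,
    with K(S) replaced by crude parameter-counting proxies (the length n
    is assumed known to the decoder)."""
    n, k = len(x), x.count("1")
    return {
        # trivial model {0,1}^n: K(S) ~ 0, log2|S| = n
        "all strings":   0 + n,
        # S_k = strings with exactly k ones: K(S) ~ log2(n+1) bits for k
        "k-ones set":    log2(n + 1) + log2(comb(n, k)),
        # singleton {x}: K(S) ~ n bits in the worst case, log2|S| = 0
        "singleton {x}": n + 0,
    }

x = "0" * 900 + "1" * 100            # structured: only 10% ones
costs = two_part_costs(x)
for name, bits in costs.items():
    print(f"{name:13s} {bits:8.1f} bits")
print("MDL picks:", min(costs, key=costs.get))   # the k-ones set, ~475 bits < 1000
```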

11 Individual characteristics: more detail, especially for meaningful (nonrandom) data
We flip the graph so that log |S| is on the x-axis and K(S) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.

12 Primogeniture of ML/MDL estimators
ML/MDL estimators can be approximated from above; the best-fit estimator cannot be approximated, from above or from below, to any precision. But the approximable ML/MDL estimators nevertheless yield the best-fitting models, even though we do not know the achieved quantity of goodness-of-fit. Hence ML/MDL estimators implicitly optimize goodness-of-fit.

13 Positive- and Negative Randomness, and Probabilistic Models

14 Precision of following a given function h(α)
[Figure: a target function h(α) and the realized structure function h_x(α) tracking it within distance d; horizontal axis: model cost α; vertical axis: data-to-model cost log |S|.]

15 Logarithmic precision is sharp
Lemma: Most strings of length n have structure functions close to the diagonal h_x(α) = n - α. Those are the strings of high complexity, K(x) > n.
For strings of low complexity, say K(x) < n/2, the number of admissible function shapes is much greater than the number of strings, so there cannot be a string for every such function. But we show that there is a string for every approximate shape of function.

16 All degrees of negative randomness
Theorem: For every length n there are strings x whose minimal sufficient statistic has complexity anywhere between 0 and n (up to a logarithmic term).
Proof idea: Every shape of the structure function is realized by some string (up to the precision of the previous slide), as long as the shape is monotonically nonincreasing, starts at a value at most n, stays above the lower bound K(x) - α, and reaches 0 at α = k, where k = K(x) ≤ n.

17 Are there natural examples of negative randomness?
Question: Are there natural examples of strings with large negative randomness? Kolmogorov did not think they exist, but we know they are abundant. Maybe the information distance between strings x and y yields large negative randomness.

18 Information Distance
Information distance (Li, Vitanyi 1996; Bennett, Gacs, Li, Vitanyi, Zurek 1998):
D(x,y) = min { |p| : p(x) = y and p(y) = x },
where p is a binary program for a universal computer (Lisp, Java, C, a universal Turing machine).
Theorem.
(i) D(x,y) = max { K(x|y), K(y|x) }, up to an additive logarithmic term; here K(x|y), the Kolmogorov complexity of x given y, is the length of a shortest binary program that outputs x on input y.
(ii) D(x,y) ≤ D'(x,y) for every computable distance D' satisfying Σ_y 2^(-D'(x,y)) ≤ 1 for every x.
(iii) D(x,y) is a metric.
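In practice the (uncomputable) information distance is approximated by compression; the normalized compression distance of the later Cilibrasi-Vitanyi work on clustering by compression is the standard proxy. A minimal sketch, with zlib standing in for K:

```python
import os
import zlib

# Normalized compression distance: a compression-based proxy for the
# normalized information distance max{K(x|y),K(y|x)} / max{K(x),K(y)}.
def ncd(x: bytes, y: bytes) -> float:
    cx, cy = len(zlib.compress(x, 9)), len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
r = os.urandom(len(a))
print(ncd(a, b))   # small: the two texts share most of their information
print(ncd(a, r))   # near 1: random bytes share almost nothing with a
```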

19 Not between random strings
The information distance between random strings x and y of length n does not produce negative randomness. Suppose K(x|y) and K(y|x) are close to n (the maximum). Then p = x XOR y, where XOR is the bitwise exclusive-or, serves as a program translating x to y and y to x. But if x and y are positively random, it appears that p is positively random too.
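A minimal sketch of this slide's construction (assuming equal-length byte strings): p = x XOR y is a single object that translates x to y and y back to x, and for independently random x, y it is itself uniformly random, so it exhibits no negative randomness.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    # bitwise exclusive-or of two equal-length byte strings
    return bytes(u ^ v for u, v in zip(a, b))

n = 32
x, y = os.urandom(n), os.urandom(n)

p = xor(x, y)                 # the "translation program" of the slide
assert xor(x, p) == y         # p takes x to y ...
assert xor(y, p) == x         # ... and y back to x
# For independent random x and y, p = x XOR y is itself uniformly random
# (the one-time-pad argument), so this route yields no negative randomness.
```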

20 Selected Bibliography
- N.K. Vereshchagin, P.M.B. Vitanyi, A theory of lossy compression of individual data, http://arxiv.org/abs/cs.IT/0411014, submitted.
- P.D. Grunwald, P.M.B. Vitanyi, Shannon information and Kolmogorov complexity, IEEE Trans. Inform. Theory, submitted.
- N.K. Vereshchagin, P.M.B. Vitanyi, Kolmogorov's structure functions and model selection, IEEE Trans. Inform. Theory, 50:12(2004), 3265-3290.
- P. Gacs, J. Tromp, P.M.B. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001), 2443-2463.
- Q. Gao, M. Li, P.M.B. Vitanyi, Applying MDL to learning best model granularity, Artificial Intelligence, 121:1-2(2000), 1-29.
- P.M.B. Vitanyi, M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, 46:2(2000), 446-464.

