1
Positive and Negative Randomness
Paul Vitanyi, CWI, University of Amsterdam
Joint work with Kolya Vereshchagin
2
Non-Probabilistic Statistics
3
Classic Statistics--Recalled
4
Probabilistic Sufficient Statistic
5
Kolmogorov complexity
K(x) = length of the shortest description of x.
K(x|y) = length of the shortest description of x given y.
A string x is random if K(x) ≥ |x|.
K(x) − K(x|y) is the information y knows about x.
Theorem (Mutual Information). K(x) − K(x|y) = K(y) − K(y|x) (up to a logarithmic additive term).
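The quantities on this slide are uncomputable, but a real-world compressor gives a computable upper bound that is often used as a stand-in. A minimal Python sketch, assuming zlib as the compressor and the crude proxy K(x|y) ≈ K(yx) − K(y); both choices are illustrative and not part of the slides:

```python
import os
import zlib

def K_approx(x: bytes) -> int:
    """Crude upper bound on K(x): bit length of the zlib-compressed string.
    K itself is uncomputable; a compressor only ever gives an upper bound."""
    return 8 * len(zlib.compress(x, 9))

def K_cond_approx(x: bytes, y: bytes) -> int:
    """Heuristic proxy for K(x|y): extra cost of compressing x after y.
    Purely illustrative of the definition, not a tight estimate."""
    return max(K_approx(y + x) - K_approx(y), 0)

x = b"ab" * 500        # highly regular, so the proxy for K(x) is small
r = os.urandom(1000)   # almost surely incompressible, proxy close to 8 * 1000

print(K_approx(x), K_approx(r))   # small value vs. roughly 8000
print(K_cond_approx(x, x))        # near 0: given x, describing x costs (almost) nothing
```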
6
Randomness Deficiency
7
Algorithmic Sufficient Statistic where model is a set
8
Algorithmic sufficient statistic where the model is a total computable function
Data is a binary string x; the model is a total computable function p; prefix complexity is K(p) (size of the smallest TM computing p); data-to-model code length l_x(p) = min_d {|d| : p(d) = x}.
x is typical for p if δ(x|p) = l_x(p) − K(x|p) is small.
p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p).
Theorem: If p is a sufficient statistic for x, then x is typical for p.
p is a minimal sufficient statistic (sophistication) for x if K(p) is minimal among sufficient statistics.
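None of these quantities are computable exactly, but the definition of l_x(p) can be evaluated directly for a toy total model on small data. A minimal sketch; the model p, the data x, and the assumed value K_p = 30 are illustrative inventions, not from the slides:

```python
from itertools import product

def l_x(p, x, max_bits=20):
    """Data-to-model code length l_x(p) = min_d {|d| : p(d) = x}, found by brute force
    over binary arguments d in length-increasing order (assumes p is total and fast)."""
    for n in range(max_bits + 1):
        for bits in product("01", repeat=n):
            d = "".join(bits)
            if p(d) == x:
                return n, d
    raise ValueError("no argument of length <= max_bits maps to x")

# Toy model: p interprets d as an index into the strings of the same length as x.
def make_index_model(length):
    def p(d):
        i = int(d, 2) if d else 0
        return format(i, "b").zfill(length)[:length]
    return p

x = "0000000011"                  # the data
p = make_index_model(len(x))      # one candidate model
K_p = 30                          # assumed description length of p, for illustration only
n, d = l_x(p, x)
print(n, d)                            # shortest argument and its length
print("two-part code length:", K_p + n)  # compare against an estimate of K(x)
```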
9
Graph of the structure function h_x(α): model cost α on the horizontal axis, log |S| on the vertical axis. Lower-bound line: h_x(α) = K(x) − α.
10
Minimum Description Length estimator; relations between estimators
Structure function: h_x(α) = min_S {log |S| : x in S and K(S) ≤ α}.
MDL estimator: λ_x(α) = min_S {log |S| + K(S) : x in S and K(S) ≤ α}.
Best-fit estimator: β_x(α) = min_S {δ(x|S) : x in S and K(S) ≤ α}.
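For a small, hand-picked family of candidate sets with assumed complexities, the minimizations defining h_x(α) and λ_x(α) can be carried out literally (β_x(α) additionally needs K(x|S), which can only be upper-bounded). A sketch; the data x, the candidate sets, and the K(S) values are all invented for illustration:

```python
import math

# Candidate finite models S containing x, each with an assumed complexity K(S).
# In reality K(S) is uncomputable and the minimisation ranges over *all* finite sets.
x = "0101010111001010"   # 16 bits, eight of them 1s
candidates = [
    {"name": "all strings of length 16",            "log_size": 16,                          "K": 10},
    {"name": "length-16 strings with eight 1s",     "log_size": math.log2(math.comb(16, 8)), "K": 14},
    {"name": "the singleton {x}",                   "log_size": 0,                           "K": 40},
]

def h(alpha):    # structure function: smallest log |S| among models of complexity <= alpha
    return min((c["log_size"] for c in candidates if c["K"] <= alpha), default=None)

def mdl(alpha):  # MDL estimator: smallest two-part code length log |S| + K(S)
    return min((c["log_size"] + c["K"] for c in candidates if c["K"] <= alpha), default=None)

for alpha in (10, 14, 40):
    print(alpha, h(alpha), mdl(alpha))
```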
11
Individual characteristics: more detail, especially for meaningful (non-random) data
We flip the graph so that log |S| is on the x-axis and K(·) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.
12
Primogeniture of ML/MDL estimators
ML/MDL estimators can be approximated from above; the best-fit estimator cannot be approximated, either from above or from below, to any precision. But the approximable ML/MDL estimators yield the best-fitting models, even though we do not know the quantity of goodness-of-fit: ML/MDL estimators implicitly optimize goodness-of-fit.
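The claim here is about semicomputability. As a toy illustration of what "approximable from above" means, the sketch below (function name and the model stream are invented for illustration) emits a non-increasing sequence of upper bounds on λ_x(α) as candidate models are discovered over time:

```python
def mdl_upper_bounds(alpha, discovered_models):
    """Illustration of 'approximable from above': as candidate models (log|S|, K(S)) are
    discovered, the running minimum of log|S| + K(S) only decreases and, in the limit,
    reaches lambda_x(alpha). The stream here is a plain iterable standing in for a
    dovetailed enumeration of all set programs."""
    best = float("inf")
    for log_size, K_S, contains_x in discovered_models:
        if contains_x and K_S <= alpha:
            best = min(best, log_size + K_S)
        yield best   # current upper bound; we never know when it has converged

# Hypothetical discovery order: (log |S|, K(S), does S contain x?)
stream = [(16, 10, True), (20, 5, True), (13.6, 14, True), (0, 40, True)]
for bound in mdl_upper_bounds(alpha=30, discovered_models=stream):
    print(bound)   # 26, 25, 25, 25 -- a non-increasing sequence of upper bounds
```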
13
Positive- and Negative Randomness, and Probabilistic Models
14
Figure: precision of following a given function h(α). The realized structure function h_x(α) stays within distance d of the given h(α). Axes: model cost α, data-to-model cost log |S|.
15
Logarithmic precision is sharp
Lemma. Most strings of length n have structure functions close to the diagonal (the line from (0, n) to (n, 0)); those are the strings of high complexity K(x) > n. For strings of low complexity, say K(x) < n/2, the number of candidate shapes of the structure function is much greater than the number of such strings, hence there cannot be a string for every such shape. But we show that there is a string for every approximate shape.
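A quick numerical illustration of the counting gap; the exact objects counted in the proof differ, this only shows the orders of magnitude, with monotone non-increasing integer functions standing in for "shapes":

```python
from math import comb

# For length n, compare the number of strings of complexity below n/2 (at most 2**(n/2))
# with the number of monotone non-increasing functions {0,...,n} -> {0,...,n}.
n = 64
low_complexity_strings = 2 ** (n // 2)
monotone_shapes = comb(2 * n + 1, n + 1)   # non-increasing sequences of length n+1 over {0,...,n}
print(low_complexity_strings)   # about 4.3e9
print(monotone_shapes)          # roughly 5e37: vastly more shapes than strings
```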
16
All degrees of negative randomness
Theorem: For every length n there are strings x whose minimal sufficient statistic has complexity anywhere between 0 and n (up to a logarithmic term).
Proof. All shapes of the structure function are possible, as long as the function starts from n − k, decreases monotonically, and is 0 at k, for some k ≤ n (up to the precision of the previous slide).
17
Are there natural examples of negative randomness?
Question: Are there natural examples of strings with large negative randomness? Kolmogorov did not think they exist, but we know they are abundant. Maybe the information distance between strings x and y yields large negative randomness.
18
Information Distance
Information distance (Li, Vitanyi, 96; Bennett, Gacs, Li, Vitanyi, Zurek, 98): D(x,y) = min {|p| : p(x) = y and p(y) = x}, where p is a binary program for a universal computer (Lisp, Java, C, universal Turing machine).
Theorem.
(i) D(x,y) = max {K(x|y), K(y|x)}, where K(x|y) is the Kolmogorov complexity of x given y, defined as the length of the shortest binary program that outputs x on input y.
(ii) D(x,y) ≤ D'(x,y) for any computable distance D' satisfying ∑_y 2^{−D'(x,y)} ≤ 1 for every x.
(iii) D(x,y) is a metric.
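Information distance itself is uncomputable; in applied work of Li and Vitanyi it is replaced by the normalized compression distance, which swaps K for the length given by a real compressor. That formula is not on this slide, but it is the standard practical stand-in; a minimal sketch using zlib:

```python
import os
import zlib

def C(s: bytes) -> int:
    """Compressed length in bytes: a computable upper-bound proxy for K(s)."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, the practical stand-in for the (normalized)
    information distance max{K(x|y), K(y|x)} / max{K(x), K(y)}.
    Near 0: very similar; near 1: unrelated (up to compressor imperfections)."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = a.replace(b"fox", b"cat")      # small edit: should stay close to a
c = os.urandom(len(a))             # unrelated random data

print(round(ncd(a, b), 3))         # small
print(round(ncd(a, c), 3))         # close to 1
```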
19
Not between random strings
The information distance between random strings x and y of length n does not yield negative randomness. If x, y satisfy K(x|y), K(y|x) > n, then p = x XOR y (bitwise exclusive-or) serves as a program to translate x to y and y to x. But if x and y are positively random, it appears that p is so too.
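The XOR trick is easy to demonstrate: a single n-bit string p converts x into y and y into x. The point of the slide is that this p inherits the randomness of x and y, so it does not witness negative randomness. A toy demonstration with pseudorandom stand-ins for the random strings:

```python
import secrets

def xor_bits(u: str, v: str) -> str:
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if a != b else "0" for a, b in zip(u, v))

n = 32
x = format(secrets.randbits(n), "b").zfill(n)   # stand-in for a random n-bit string
y = format(secrets.randbits(n), "b").zfill(n)

p = xor_bits(x, y)           # the single 'program' from the slide
assert xor_bits(x, p) == y   # p translates x to y ...
assert xor_bits(y, p) == x   # ... and y to x, so D(x, y) <= n + O(1)
print(p)
```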
20
Selected Bibliography
N.K. Vereshchagin, P.M.B. Vitanyi, A theory of lossy compression of individual data, http://arxiv.org/abs/cs.IT/0411014, submitted.
P.D. Grunwald, P.M.B. Vitanyi, Shannon information and Kolmogorov complexity, IEEE Trans. Inform. Theory, submitted.
N.K. Vereshchagin, P.M.B. Vitanyi, Kolmogorov's structure functions and model selection, IEEE Trans. Inform. Theory, 50:12(2004), 3265-3290.
P. Gacs, J. Tromp, P.M.B. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001), 2443-2463.
Q. Gao, M. Li, P.M.B. Vitanyi, Applying MDL to learning best model granularity, Artificial Intelligence, 121:1-2(2000), 1-29.
P.M.B. Vitanyi, M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, 46:2(2000), 446-464.