Positive and Negative Randomness
Paul Vitanyi, CWI and University of Amsterdam
Joint work with Kolya Vereshchagin
Non-Probabilistic Statistics
Classic Statistics--Recalled
Probabilistic Sufficient Statistic
Kolmogorov complexity. K(x) = length of the shortest description of x; K(x|y) = length of the shortest description of x given y. A string x is random if K(x) ≥ |x|. K(x) − K(x|y) is the information y knows about x. Theorem (Mutual Information): K(x) − K(x|y) = K(y) − K(y|x), up to a logarithmic additive term.
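A hedged illustration (not from the talk): K(x) is uncomputable, but any real compressor gives an upper bound, so the quantities above can be approximated in practice. The helper names below are mine.

```python
# Minimal sketch: approximate K(x) by compressed length, K(x|y) by a
# concatenation trick.  These are upper bounds, not the true complexities.
import zlib

def C(x: bytes) -> int:
    """Approximate K(x) by the length in bits of the zlib-compressed string."""
    return 8 * len(zlib.compress(x, 9))

def C_cond(x: bytes, y: bytes) -> int:
    """Crude approximation of K(x|y): compress y followed by x, subtract C(y)."""
    return max(C(y + x) - C(y), 0)

x = b"the quick brown fox jumps over the lazy dog " * 20
y = b"the quick brown fox jumps over the lazy cat " * 20

# Information y "knows" about x, and the approximate symmetry of information:
print(C(x) - C_cond(x, y))   # ~ K(x) - K(x|y)
print(C(y) - C_cond(y, x))   # ~ K(y) - K(y|x); close to the line above
```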
Randomness Deficiency
Algorithmic Sufficient Statistic where model is a set
Algorithmic sufficient statistic where the model is a total computable function. Data is a binary string x; the model is a total computable function p; prefix complexity K(p) is the size of the smallest TM computing p; data-to-model code length l_x(p) = min_d { |d| : p(d) = x }. x is typical for p if δ(x|p) = l_x(p) − K(x|p) is small. p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p). Theorem: if p is a sufficient statistic for x, then x is typical for p. p is a minimal sufficient statistic (sophistication) for x if K(p) is minimal among sufficient statistics.
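A small sketch of the data-to-model code length for a toy total computable model; the model p and the helper names here are illustrative, not from the talk.

```python
# Data-to-model code length l_x(p) = min{ |d| : p(d) = x } for a toy total
# computable model p, found by brute-force search over short binary inputs d.
from itertools import product

def p(d: str) -> str:
    """Toy total computable model: repeat the input until length 32."""
    return (d * 32)[:32] if d else ""

def data_to_model_code_length(x: str, model, max_len: int = 12) -> int:
    """Return l_x(model): length of the shortest d with model(d) == x."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            if model("".join(bits)) == x:
                return n
    raise ValueError("no input of length <= max_len maps to x")

x = "01" * 16                              # highly regular data of length 32
print(data_to_model_code_length(x, p))     # 2: the input "01" already suffices
# The two-part code length is then roughly K(p) + l_x(p); p is a sufficient
# statistic for x when this matches K(x) up to a constant.
```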
Graph of the structure function h_x(α) (figure): model cost α on the x-axis, data-to-model cost log|S| on the y-axis; lower-bound line h_x(α) = K(x) − α.
Minimum Description Length estimator; relations between the estimators. Structure function h_x(α) = min_S { log |S| : x in S and K(S) ≤ α }. MDL estimator λ_x(α) = min_S { log |S| + K(S) : x in S and K(S) ≤ α }. Best-fit estimator β_x(α) = min_S { δ(x|S) : x in S and K(S) ≤ α }.
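A toy illustration of these definitions, under the simplifying assumption that K(S) is replaced by an explicit model cost over a fixed model class; the true estimators are not computable, and this class and the function names are mine.

```python
# Model class: S_k = { strings of length n agreeing with x on the first k bits },
# with stand-in model cost(S_k) ~ k and log|S_k| = n - k.  This shows how the
# structure function and the MDL estimator are defined, not how they are
# computed in general.
n = 64

def model_cost(k: int) -> int:       # stand-in for K(S_k)
    return k

def log_size(k: int) -> int:         # log |S_k|
    return n - k

def h(alpha: int) -> int:
    """Structure function: min log|S| over models with cost <= alpha."""
    return min(log_size(k) for k in range(n + 1) if model_cost(k) <= alpha)

def mdl(alpha: int) -> int:
    """MDL estimator: min two-part code length K(S) + log|S| with cost <= alpha."""
    return min(model_cost(k) + log_size(k) for k in range(n + 1) if model_cost(k) <= alpha)

for alpha in (0, 16, 32, 64):
    print(alpha, h(alpha), mdl(alpha))
# For this toy class h(alpha) = n - alpha and mdl(alpha) = n for every alpha:
# the behaviour expected of a random string of length n.
```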
Individual characteristics: more detail, especially for meaningful (nonrandom) data. We flip the graph so that log|·| is on the x-axis and K(·) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.
Primogeniture of ML/MDL estimators. ML/MDL estimators can be approximated from above; the best-fit estimator cannot be approximated, either from above or from below, up to any precision. But the approximable ML/MDL estimators yield the best-fitting models, even though we do not know the quantity of goodness-of-fit: ML/MDL estimators implicitly optimize goodness-of-fit.
Positive and Negative Randomness, and Probabilistic Models
Precision of following a given function h(α) (figure): the structure function h_x(α) stays within d of h(α); model cost α on the x-axis, data-to-model cost log|S| on the y-axis.
Logarithmic precision is sharp. Lemma: most strings of length n have a structure function close to the diagonal h_x(α) ≈ n − α; those are the strings of high complexity K(x) > n. For strings of low complexity, say K(x) < n/2, the number of candidate shapes of the structure function is much greater than the number of such strings (fewer than 2^{n/2}), so there cannot be a string for every such function. But we show that there is a string for every approximate shape of function.
All degrees of negative randomness. Theorem: for every length n there are strings x with minimal sufficient statistic of every complexity between 0 and n (up to a logarithmic term). Proof: every shape of the structure function is possible, as long as it starts from n − k, decreases monotonically, and is 0 at k, for some k ≤ n (up to the precision of the previous slide).
Are there natural examples of negative randomness? Question: are there natural examples of strings with large negative randomness? Kolmogorov did not think they exist, but we know they are abundant. Maybe the information distance between strings x and y yields large negative randomness.
Information Distance (Li, Vitanyi 1996; Bennett, Gacs, Li, Vitanyi, Zurek 1998). D(x,y) = min { |p| : p(x) = y and p(y) = x }, where p is a binary program for a universal computer (Lisp, Java, C, universal Turing machine). Theorem: (i) D(x,y) = max { K(x|y), K(y|x) }, where K(x|y), the Kolmogorov complexity of x given y, is the length of the shortest binary program that outputs x on input y. (ii) D(x,y) ≤ D'(x,y) for every computable distance D' satisfying ∑_y 2^{−D'(x,y)} ≤ 1 for every x. (iii) D(x,y) is a metric.
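A hedged sketch: D(x,y) is uncomputable, but compression gives the usual upper-bound approximation (the idea behind the normalized compression distance); the function names below are mine.

```python
# Approximate max{K(x|y), K(y|x)} with a real compressor via C(xy) - min{C(x), C(y)}.
import zlib

def C(s: bytes) -> int:
    """Compressed length in bytes, a computable upper bound on K(s) (up to scale)."""
    return len(zlib.compress(s, 9))

def approx_information_distance(x: bytes, y: bytes) -> int:
    """max{K(x|y), K(y|x)} approximated as C(xy) - min{C(x), C(y)}."""
    return C(x + y) - min(C(x), C(y))

a = b"GATTACA" * 100
b = b"GATTACA" * 98 + b"GATCACA" * 2
c = bytes(range(256)) * 3
print(approx_information_distance(a, b))   # small: the strings share structure
print(approx_information_distance(a, c))   # larger: little shared information
```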
Not between random strings. The information distance between random strings x and y of length n does not produce negative randomness. If x, y satisfy K(x|y), K(y|x) > n, then p = x XOR y, where XOR means bitwise exclusive-or, serves as a program to translate x to y and y to x. But if x and y are positively random, it appears that p is so too.
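A small demonstration of the XOR argument above; using os.urandom as a stand-in for "random string" is my own illustrative choice.

```python
# For x, y of the same length, p = x XOR y translates x into y and y into x,
# so |p| = n upper-bounds the information distance between them.
import os

n = 32
x = os.urandom(n)                               # stand-in for a random string
y = os.urandom(n)
p = bytes(a ^ b for a, b in zip(x, y))          # one-time-pad style "program"

assert bytes(a ^ b for a, b in zip(x, p)) == y  # p maps x to y
assert bytes(a ^ b for a, b in zip(y, p)) == x  # and y back to x
# If x and y are independent random strings, p is (with high probability) just
# as random, so this construction yields no negative randomness.
```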
Selected Bibliography
N.K. Vereshchagin and P.M.B. Vitanyi, A theory of lossy compression of individual data, submitted.
P.D. Grunwald and P.M.B. Vitanyi, Shannon information and Kolmogorov complexity, IEEE Trans. Inform. Theory, submitted.
N.K. Vereshchagin and P.M.B. Vitanyi, Kolmogorov's structure functions and model selection, IEEE Trans. Inform. Theory, 50:12(2004).
P. Gacs, J. Tromp and P.M.B. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001).
Q. Gao, M. Li and P.M.B. Vitanyi, Applying MDL to learning best model granularity, Artificial Intelligence, 121:1-2(2000).
P.M.B. Vitanyi and M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, IT-46:2(2000).