Positive and Negative Randomness Paul Vitanyi CWI, University of Amsterdam Joint work with Kolya Vereshchagin.

Non-Probabilistic Statistics

Classic Statistics--Recalled

Probabilistic Sufficient Statistic

Kolmogorov complexity
- K(x) = length of the shortest description of x.
- K(x|y) = length of the shortest description of x given y.
- A string x is random if K(x) ≥ |x|.
- K(x) - K(x|y) is the information y knows about x.
- Theorem (Mutual Information): K(x) - K(x|y) = K(y) - K(y|x), up to an additive logarithmic term.
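K(x) itself is uncomputable, but any real compressor yields an upper bound, which is how these quantities are estimated in practice. Below is a minimal Python sketch of that idea, using zlib and the common proxy K(x|y) ≈ K(yx) - K(y); the function names and example strings are ours, not from the talk.

```python
import zlib

def K_approx(s: bytes) -> int:
    """Upper bound on K(s) in bits, using zlib as a stand-in compressor.
    A real compressor can only over-estimate Kolmogorov complexity."""
    return 8 * len(zlib.compress(s, 9))

def K_cond_approx(x: bytes, y: bytes) -> int:
    """Rough proxy for K(x|y): the extra cost of compressing x when y is
    already available, via K(yx) - K(y)."""
    return max(K_approx(y + x) - K_approx(y), 0)

x = b"abracadabra" * 50
y = b"abracadabra" * 49 + b"abracadabrA"
print(K_approx(x))          # far below 8*len(x): x is highly non-random
print(K_cond_approx(x, y))  # small: the nearly identical y tells us most of x
```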

Randomness Deficiency

Algorithmic Sufficient Statistic where the model is a set

Algorithmic sufficient statistic where the model is a total computable function
- Data is a binary string x; the model is a total computable function p.
- Prefix complexity K(p) = size of the smallest TM computing p.
- Data-to-model code length: l_x(p) = min_d { |d| : p(d) = x }.
- x is typical for p if δ(x|p) = l_x(p) - K(x|p) is small.
- p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p).
- Theorem: If p is a sufficient statistic for x, then x is typical for p.
- p is a minimal sufficient statistic for x (its K(p) is the sophistication of x) if K(p) is minimal among sufficient statistics.
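To make l_x(p) concrete, here is a toy sketch of our own: p is a small total computable function (a crude repetition decoder), and l_x(p) is found by brute-force search over descriptions d, which is only feasible for very short x.

```python
from itertools import product

def p(d: str) -> str:
    """Toy total computable model: d = k leading zeros followed by a payload;
    output = payload repeated k+1 times.  (Our example, not from the talk.)"""
    k = len(d) - len(d.lstrip("0"))
    payload = d[k:] or "0"
    return payload * (k + 1)

def l_x(x: str, max_len: int = 16) -> int:
    """Data-to-model code length l_x(p) = min{|d| : p(d) = x}, by brute force.
    Exponential search; this only illustrates the definition."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            if p("".join(bits)) == x:
                return n
    raise ValueError("no description found within max_len")

print(l_x("10" * 6))   # regular 12-bit string: a 6-bit description suffices
print(l_x("110100"))   # irregular string: needs |d| = |x| = 6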

[Figure: graph of the structure function h_x(α); x-axis: model cost α, y-axis: log |S|; lower bound h_x(α) = K(x) - α.]

Minimum Description Length estimator; relations between estimators
- Structure function: h_x(α) = min_S { log |S| : x ∈ S and K(S) ≤ α }.
- MDL estimator: λ_x(α) = min_S { log |S| + K(S) : x ∈ S and K(S) ≤ α }.
- Best-fit estimator: β_x(α) = min_S { δ(x|S) : x ∈ S and K(S) ≤ α }.
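As an illustration only, the sketch below evaluates h_x(α) and λ_x(α) over a single hand-picked family of set models (the strings sharing a prefix with x), with K(.) replaced by the zlib upper bound from the earlier sketch. The true estimators minimize over all finite sets, which no program can do exactly; everything here is our construction.

```python
import zlib

def K_approx(s: bytes) -> int:
    # Compressor-based upper bound on K(.), in bits (see earlier sketch).
    return 8 * len(zlib.compress(s, 9))

def toy_estimators(x: str, alpha: int):
    """Evaluate h_x(alpha) and lambda_x(alpha) over one toy model family:
    S_i = all strings of len(x) bits that share x's first i bits.
    Then log|S_i| = len(x) - i, and K(S_i) is approximated by compressing
    the defining prefix."""
    n = len(x)
    h, mdl = None, None
    for i in range(n + 1):
        k_S = K_approx(x[:i].encode())           # cost of describing the model
        if k_S > alpha:
            continue                             # model too complex for the budget
        log_S = n - i                            # data-to-model cost, log|S_i|
        h = log_S if h is None else min(h, log_S)
        mdl = log_S + k_S if mdl is None else min(mdl, log_S + k_S)
    return h, mdl

x = "0" * 200 + "1" * 56                         # a highly regular string
print(toy_estimators(x, alpha=100))
```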

Individual characteristics: more detail, especially for meaningful (nonrandom) data. We flip the graph so that log |.| is on the x-axis and K(.) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.

Primogeniture of ML/MDL estimators
- ML/MDL estimators can be approximated from above; the best-fit estimator cannot be approximated, from above or below, to any precision.
- Yet the approximable ML/MDL estimators yield the best-fitting models, even though we do not know the resulting goodness of fit.
- Hence ML/MDL estimators implicitly optimize goodness of fit.
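The following schematic (same toy model family and compressor proxy as above, our construction) shows what "approximable from above" means operationally: the recorded two-part cost never increases as the search budget grows, but no stage certifies how far it still is from the optimum.

```python
import zlib

def K_approx(s: bytes) -> int:
    return 8 * len(zlib.compress(s, 9))

def mdl_upper_bounds(x: str, alpha: int, budgets=(4, 16, 64, 256)):
    """Schematic of upper semi-computation: as the search budget grows, the
    best two-part cost log|S| + K(S) found so far can only decrease.  The
    real lambda_x(alpha) is the limit of such a process over all finite set
    models; here we only search the toy prefix-model family."""
    n, best, trace = len(x), float("inf"), []
    for t in budgets:
        for i in range(min(t, n) + 1):           # search a larger slice each round
            k_S = K_approx(x[:i].encode())
            if k_S <= alpha:
                best = min(best, (n - i) + k_S)
        trace.append(best)
    return trace                                  # non-increasing by construction

print(mdl_upper_bounds("01" * 300, alpha=200))
```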

Positive and Negative Randomness, and Probabilistic Models

[Figure: precision of following a given function h(α): the realized structure function h_x(α) stays within distance d of h(α); x-axis: model cost α, y-axis: data-to-model cost log |S|.]

Logarithmic precision is sharp
Lemma. Most strings of length n have structure functions close to the diagonal h(α) = n - α; those are the strings of high complexity, K(x) > n. For strings of low complexity, say K(x) < n/2, the number of admissible functions is much greater than the number of strings, so there cannot be a string for every such function. But we show that there is a string for every approximate shape of function.

All degrees of negative randomness
Theorem: For every length n there are strings x whose minimal sufficient statistic has any prescribed complexity between 0 and n (up to a logarithmic term).
Proof. All shapes of the structure function are possible, as long as it starts from n-k, decreases monotonically, and is 0 at k, for some k ≤ n (up to the precision of the previous slide).

Are there natural examples of negative randomness?
Question: Are there natural examples of strings with large negative randomness? Kolmogorov did not think they exist, but we know they are abundant. Maybe the information distance between strings x and y yields large negative randomness.

Information Distance (Li, Vitanyi 1996; Bennett, Gacs, Li, Vitanyi, Zurek 1998)
D(x,y) = min { |p| : p(x) = y and p(y) = x }, where p is a binary program for a universal computer (Lisp, Java, C, a universal Turing machine).
Theorem:
(i) D(x,y) = max { K(x|y), K(y|x) }, where K(x|y), the Kolmogorov complexity of x given y, is the length of the shortest binary program that outputs x on input y;
(ii) D(x,y) ≤ D'(x,y) for any computable distance D' satisfying ∑_y 2^(-D'(x,y)) ≤ 1 for every x;
(iii) D(x,y) is a metric.
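In practice D(x,y) is estimated with a compressor, in the spirit of the compression distances built on this theorem. A hedged sketch, again with zlib as the stand-in for K and our own proxy for the conditional complexities:

```python
import os
import zlib

def K_approx(s: bytes) -> int:
    # Compressor upper bound on K(s), in bits (same proxy as earlier sketches).
    return 8 * len(zlib.compress(s, 9))

def info_distance_approx(x: bytes, y: bytes) -> int:
    """Rough upper bound on max{K(x|y), K(y|x)} via the compressor proxy
    K(x|y) ~ K(yx) - K(y); this is the idea behind compression distances,
    not an exact evaluation of D(x,y)."""
    kxy = max(K_approx(y + x) - K_approx(y), 0)
    kyx = max(K_approx(x + y) - K_approx(x), 0)
    return max(kxy, kyx)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = a.replace(b"fox", b"cat")
r = os.urandom(len(a))
print(info_distance_approx(a, b))  # small: a and b differ by a tiny edit
print(info_distance_approx(a, r))  # large: a says little about the random r
```

Dividing by max{K_approx(x), K_approx(y)} gives roughly the normalized compression distance used in clustering applications.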

Not between random strings
The information distance between random strings x and y of length n does not work. If x, y satisfy K(x|y), K(y|x) > n, then p = x XOR y, where XOR means bitwise exclusive-or, serves as a program to translate x to y and y to x. But if x and y are positively random, it appears that p is so too.
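The XOR construction is easy to check concretely; a minimal sketch (variable names ours):

```python
import os

n = 32                                    # length in bytes
x = os.urandom(n)                         # two independent "random" strings
y = os.urandom(n)

p = bytes(a ^ b for a, b in zip(x, y))    # candidate program: bitwise XOR of x and y

assert bytes(a ^ b for a, b in zip(x, p)) == y   # p translates x to y
assert bytes(a ^ b for a, b in zip(y, p)) == x   # and y back to x
print(p.hex())                            # p itself looks just as random as x and y
```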

Selected Bibliography
- N.K. Vereshchagin, P.M.B. Vitanyi, A theory of lossy compression of individual data. Submitted.
- P.D. Grunwald, P.M.B. Vitanyi, Shannon information and Kolmogorov complexity. IEEE Trans. Information Theory, submitted.
- N.K. Vereshchagin, P.M.B. Vitanyi, Kolmogorov's structure functions and model selection. IEEE Trans. Inform. Theory, 50:12(2004).
- P. Gacs, J. Tromp, P. Vitanyi, Algorithmic statistics. IEEE Trans. Inform. Theory, 47:6(2001).
- Q. Gao, M. Li, P.M.B. Vitanyi, Applying MDL to learning best model granularity. Artificial Intelligence, 121:1-2(2000).
- P.M.B. Vitanyi, M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity. IEEE Trans. Inform. Theory, IT-46:2(2000).