Bregman Information Bottleneck. NIPS’03, Whistler, December 2003. Koby Crammer (Hebrew University of Jerusalem), Noam Slonim (Princeton University)
Motivation: Extend the IB to a broad family of representations. Relation to the exponential family. Multinomial distribution. Vectors.
Outline: Rate-Distortion Formulation; Bregman Divergences; Bregman IB; Statistical Interpretation; Summary
Information Bottleneck: variables X, T, Y. X is represented by [p(y=1|X) … p(y=n|X)]; T is represented by [p(y=1|T) … p(y=n|T)]
Rate-Distortion Formulation: Input; Variables; Distortion
Self-Consistent Equations: Boltzmann Distribution; Markov + Bayes; Marginal
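The equations themselves did not survive on this slide. As a reminder, and assuming the slide follows the standard Information Bottleneck formulation, the three self-consistent equations are usually written as below. The symbols $\beta$ (trade-off parameter) and $Z(x,\beta)$ (normalization) are the conventional names, not taken from the slide text.

```latex
\begin{align}
p(t \mid x) &= \frac{p(t)}{Z(x,\beta)}
  \exp\!\Bigl(-\beta\, D_{\mathrm{KL}}\bigl[\,p(y \mid x)\,\big\|\,p(y \mid t)\,\bigr]\Bigr)
  && \text{(Boltzmann distribution)} \\
p(y \mid t) &= \frac{1}{p(t)} \sum_{x} p(x)\, p(t \mid x)\, p(y \mid x)
  && \text{(Markov + Bayes)} \\
p(t) &= \sum_{x} p(x)\, p(t \mid x)
  && \text{(marginal)}
\end{align}
```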
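Bregman Divergences: $f : S \to \mathbb{R}$. Points on the plot: $(u, f(u))$, $(v, f(v))$, and the tangent at $u$ evaluated at $v$, $(v,\; f(u) + f'(u)(v-u))$. Divergence: $B_f(v \| u) = f(v) - \bigl(f(u) + f'(u)(v-u)\bigr)$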
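The definition on this slide can be sketched in a few lines for the scalar case. The function names below are illustrative, and the two example choices of f are picked only because they reappear later in the talk.

```python
import math


def bregman_divergence(f, f_prime, v, u):
    """B_f(v||u) = f(v) - (f(u) + f'(u) * (v - u)) for scalars v and u.

    This is the gap at v between f and the tangent to f taken at u.
    """
    return f(v) - (f(u) + f_prime(u) * (v - u))


# Illustrative choice 1: f(x) = x^2 / 2, with f'(x) = x.
def half_square(x):
    return 0.5 * x * x


def half_square_prime(x):
    return x


# Illustrative choice 2: f(x) = x log(x) - x, with f'(x) = log(x), for x > 0.
def x_log_x_minus_x(x):
    return x * math.log(x) - x


def x_log_x_minus_x_prime(x):
    return math.log(x)


print(bregman_divergence(half_square, half_square_prime, 3.0, 1.0))  # 2.0
```

For a convex f the result is non-negative and vanishes when v equals u; it is generally not symmetric in its two arguments.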
Bregman IB: Rate-Distortion Formulation: Functional; Bregman Function; Input; Variables; Distortion
Self-Consistent Equations: Boltzmann Distribution; Prototypes: convex combination of input vectors; Marginal
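One way to read the three equations is as an alternating update, sketched below. This is only an illustration under stated assumptions: the talk lists algorithms as current work, so this is not the authors' procedure, and the function names, the toy data, and the choice of half the squared Euclidean distance as divergence are all hypothetical.

```python
import math


def half_sq_euclidean(x, v):
    """Bregman divergence of f(x) = 0.5 * ||x||^2 (an illustrative choice)."""
    return 0.5 * sum((xi - vi) ** 2 for xi, vi in zip(x, v))


def bregman_ib_step(xs, p_x, p_t, prototypes, beta, divergence):
    """One pass over the three self-consistent updates."""
    n, k, dim = len(xs), len(prototypes), len(xs[0])

    # Boltzmann distribution: p(t|x) proportional to p(t) exp(-beta * B_f(x || v_t)).
    p_t_given_x = []
    for x in xs:
        d = [divergence(x, v) for v in prototypes]
        d_min = min(d)  # shift the exponent for numerical stability
        w = [p_t[t] * math.exp(-beta * (d[t] - d_min)) for t in range(k)]
        z = sum(w)
        p_t_given_x.append([wt / z for wt in w])

    # Marginal: p(t) = sum_x p(x) p(t|x).
    new_p_t = [sum(p_x[i] * p_t_given_x[i][t] for i in range(n)) for t in range(k)]

    # Prototypes: convex combination of the input vectors with weights p(x|t).
    new_prototypes = []
    for t in range(k):
        if new_p_t[t] <= 0.0:
            new_prototypes.append(list(prototypes[t]))  # empty cluster: keep as is
            continue
        new_prototypes.append([
            sum(p_x[i] * p_t_given_x[i][t] * xs[i][j] for i in range(n)) / new_p_t[t]
            for j in range(dim)
        ])
    return p_t_given_x, new_p_t, new_prototypes


# Toy data: two well-separated groups on the real line.
xs = [[0.0], [0.2], [4.0], [4.2]]
p_x = [0.25, 0.25, 0.25, 0.25]
p_t = [0.5, 0.5]
prototypes = [[0.0], [4.0]]
for _ in range(20):
    p_t_given_x, p_t, prototypes = bregman_ib_step(
        xs, p_x, p_t, prototypes, beta=5.0, divergence=half_sq_euclidean
    )
print(prototypes)
```

Because each prototype is a weighted mean with non-negative weights summing to one, it always stays inside the convex hull of the inputs, which is the property the slide highlights.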
Special Cases. Information Bottleneck: Bregman function: $f(x) = x \log x - x$; Domain: Simplex; Divergence: Kullback-Leibler. Soft K-means: Bregman function: $f(x) = \tfrac{1}{2} x^2$; Domain: $\mathbb{R}^n$; Divergence: squared Euclidean distance [Still, Bialek, Bottou, NIPS 2003]
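Both special cases can be checked numerically. The sketch below assumes f is applied coordinate-wise and summed, which is how the scalar functions on the slide extend to vectors; the helper names and test vectors are illustrative.

```python
import math


def separable_bregman(f, f_prime, v, u):
    """Sum over coordinates of f(v_i) - f(u_i) - f'(u_i) * (v_i - u_i)."""
    return sum(f(vi) - f(ui) - f_prime(ui) * (vi - ui) for vi, ui in zip(v, u))


def kl_divergence(p, q):
    """Kullback-Leibler divergence between strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Information Bottleneck case: f(x) = x log(x) - x on the simplex.
p = [0.8, 0.2]
q = [0.5, 0.5]
b_kl = separable_bregman(lambda x: x * math.log(x) - x, math.log, p, q)
print(b_kl, kl_divergence(p, q))  # the two values agree on the simplex

# Soft k-means case: f(x) = x^2 / 2 on R^n.
v = [1.0, 2.0]
u = [0.0, 0.0]
b_sq = separable_bregman(lambda x: 0.5 * x * x, lambda x: x, v, u)
half_sq_dist = 0.5 * sum((vi - ui) ** 2 for vi, ui in zip(v, u))
print(b_sq, half_sq_dist)  # the two values agree
```

The agreement with KL relies on both vectors summing to one: the leftover terms, sum(u) - sum(v), cancel only on the simplex.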
Bregman IB: Information Bottleneck; Bregman Clustering; Rate-Distortion; Exponential Family
Expectation parameters. Examples (single dimension): Normal, Poisson
Exponential Family and Bregman Divergences: Expectation parameters; Properties
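The formulas for this slide are missing from the transcript. The standard correspondence between exponential families and Bregman divergences, which is presumably what the slide states, can be written as follows. The symbols $\theta$, $\psi$, $p_0$ and $\mu$ are conventional names and are assumptions here, and $f$ denotes the convex conjugate of the log-partition function $\psi$.

```latex
\begin{align}
p(x \mid \theta) &= p_0(x)\, \exp\bigl(\langle \theta, x \rangle - \psi(\theta)\bigr),
  \qquad \mu = \mathbb{E}_{\theta}[x] = \nabla \psi(\theta) \\
f(\mu) &= \sup_{\theta}\, \bigl\{ \langle \theta, \mu \rangle - \psi(\theta) \bigr\} \\
\log p(x \mid \theta) &= -B_f(x \,\|\, \mu) + f(x) + \log p_0(x)
\end{align}
```

Read this way, maximizing the likelihood over the expectation parameter $\mu$ is the same as minimizing the Bregman divergence $B_f(x \,\|\, \mu)$, since the remaining terms do not depend on $\mu$.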
Illustration
Exponential Family and Bregman Divergences: Expectation parameters; Properties
Back to Distributional Clustering: Distortion. Data vectors and prototypes: expectation parameters. Question: For which exponential-family distribution do we have $f(x) = x \log x - x$? Answer: Poisson
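The Poisson answer can be verified numerically. The sketch below checks, for a Poisson with mean mu, that the log-probability equals minus the Bregman divergence of f(x) = x log(x) - x plus a term that does not depend on mu. The function names are illustrative, and the convention 0 log 0 = 0 is assumed.

```python
import math


def f(x):
    """f(x) = x log(x) - x, with the convention 0 log 0 = 0."""
    return x * math.log(x) - x if x > 0 else 0.0


def bregman(x, mu):
    """B_f(x||mu) for f above; f'(mu) = log(mu)."""
    return f(x) - f(mu) - math.log(mu) * (x - mu)


def poisson_log_pmf(x, mu):
    """log of mu^x exp(-mu) / x!"""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def log_base_measure(x):
    """Term independent of mu: f(x) - log(x!)."""
    return f(x) - math.lgamma(x + 1)


for mu in (0.5, 3.0):
    for x in range(0, 6):
        lhs = poisson_log_pmf(x, mu)
        rhs = -bregman(x, mu) + log_base_measure(x)
        print(mu, x, round(lhs, 6), round(rhs, 6))
```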
Product of Poisson Distributions. Illustration: the sequence a a b a a a b a a a gives Pr(a) = .8, Pr(b) = .2. Multinomial Distribution
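The link between the two distributions named on this slide is presumably the standard fact that independent Poisson counts, conditioned on their total, follow a multinomial distribution with probabilities proportional to the rates. The sketch below checks this on the slide's counts (8 a's and 2 b's); the rates 4.0 and 1.0 are hypothetical and were chosen only so that their ratio matches .8 and .2.

```python
import math


def poisson_pmf(k, lam):
    """Pr[K = k] for K ~ Poisson(lam)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def multinomial_pmf(counts, probs):
    """Pr[counts] for n = sum(counts) draws with the given probabilities."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return math.exp(log_p)


def product_of_poissons_given_total(counts, rates):
    """Pr[counts | total] for independent Poisson counts.

    Uses the fact that the total is Poisson with rate sum(rates).
    """
    joint = 1.0
    for c, lam in zip(counts, rates):
        joint *= poisson_pmf(c, lam)
    return joint / poisson_pmf(sum(counts), sum(rates))


counts = [8, 2]      # 8 a's and 2 b's, as in the illustration
rates = [4.0, 1.0]   # hypothetical rates with ratio .8 : .2
probs = [r / sum(rates) for r in rates]
print(product_of_poissons_given_total(counts, rates), multinomial_pmf(counts, probs))
```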
Back to Distributional Clustering. Information Bottleneck: distributional clustering of Poisson distributions. (Soft) k-means: (soft) clustering of Normal distributions
Maximum Likelihood Perspective: Distortion; Input: Observations; Output: Parameters of Distribution; IB functional: EM [Elidan & Friedman, before]
Back to Self-Consistent Equations: Posterior; Partition Function: weighted β-norm of the likelihood. β → ∞: most likely cluster governs. β → 0: clusters collapse into a single prototype
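The two limits can be illustrated with a small sketch. It assumes the posterior has the form p(t|x) proportional to p(t) L_t^β, where L_t is the likelihood of x under cluster t, and that the β-norm is Z^(1/β) with Z = Σ_t p(t) L_t^β. Both forms are assumptions consistent with the slide's wording rather than formulas recovered from it, and the numbers are made up.

```python
import math


def posterior(prior, likelihoods, beta):
    """p(t|x) proportional to p(t) * L_t^beta, computed in log space."""
    log_w = [math.log(p) + beta * math.log(l) for p, l in zip(prior, likelihoods)]
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def weighted_beta_norm(prior, likelihoods, beta):
    """(sum_t p(t) * L_t^beta)^(1/beta), computed in log space."""
    log_w = [math.log(p) + beta * math.log(l) for p, l in zip(prior, likelihoods)]
    m = max(log_w)
    log_z = m + math.log(sum(math.exp(v - m) for v in log_w))
    return math.exp(log_z / beta)


prior = [0.3, 0.7]          # hypothetical cluster marginals p(t)
likelihoods = [0.6, 0.2]    # hypothetical likelihoods of one observation

print(posterior(prior, likelihoods, beta=0.0))    # equals the prior
print(posterior(prior, likelihoods, beta=50.0))   # concentrates on cluster 0
print(weighted_beta_norm(prior, likelihoods, beta=200.0))  # close to max likelihood
```

At β = 0 the posterior no longer depends on the observation, so every prototype becomes the same global average, which matches the collapse described on the slide.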
Summary. Bregman Information Bottleneck: clustering/compression for many representations and divergences. Statistical Interpretation: clustering of distributions from the exponential family; EM-like formulation. Current Work: algorithms; characterize distortion measures which also yield Boltzmann distributions; general distortion measures