1 Multivariate Information Bottleneck
Noam Slonim, Princeton University, Lewis-Sigler Institute for Integrative Genomics
Nir Friedman and Naftali Tishby, Hebrew University, School of Computer Science and Engineering
2 Multivariate Information Bottleneck - Preview
- A general framework for specifying a new family of clustering problems
- Almost all of these problems are not treated by standard clustering approaches
- Insights into, and demonstrations of, why these problems are important
- A general optimal solution for all these problems, based on a single information-theoretic principle
- Applications to text analysis, gene expression data and more...
3 Multivariate IB – introduction
- Second half starts here.
- A temporary summary: so far we have a well-defined method (formulated as a variational principle) and 3 different algorithmic approaches. However, it was limited to one specific optimization problem, and we could think of other problems (e.g. symmetric compression). In the following we describe a generalization of the first half that deals with a much richer family of problems. This is still work in progress.
- Original IB: compressing one variable while preserving the information about some other single variable
4 Multivariate IB – introduction (cont.)
- However, we could think of other problems, e.g. symmetric compression.
- Question: how can we formulate and solve all such problems under one unifying principle?
5 (A few words about...) Bayesian networks
- A Bayes net over (X_1,…,X_n) is a DAG G in which vertices correspond to the random variables
- P(X_1,…,X_n) is consistent with G iff each X_i is independent of all the other (non-descendant) variables, given its parents Pa_i
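6 Multi-information and Bayes nets
- The information that (X_1,…,X_n) contain about each other is captured by the multi-information: [equation missing from the extracted slide]
- If P(X_1,…,X_n) is consistent with G, then the multi-information decomposes over the graph: [equation missing from the extracted slide]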
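The two missing formulas are standard, so they can be checked numerically. The sketch below assumes the usual definitions: the multi-information is I(X_1,…,X_n) = D_KL[P(X_1,…,X_n) || P(X_1)…P(X_n)], and for P consistent with G it equals the sum over i of I(X_i; Pa_i). The three-variable chain X_1 → X_2 → X_3 and all of its probabilities are made-up illustrative values, not data from the talk.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a (possibly multi-dimensional) distribution."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def multi_information(joint):
    """I(X_1,...,X_n) = sum_i H(X_i) - H(X_1,...,X_n).

    This equals D_KL[P(X_1,...,X_n) || prod_i P(X_i)]; for a 2-D joint it
    reduces to the ordinary mutual information.
    """
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    marginals = [
        joint.sum(axis=tuple(a for a in range(n) if a != i)) for i in range(n)
    ]
    return sum(entropy(m) for m in marginals) - entropy(joint)

# Hypothetical chain X_1 -> X_2 -> X_3, so Pa_2 = {X_1} and Pa_3 = {X_2}.
p1 = np.array([0.3, 0.7])
p2_given_1 = np.array([[0.9, 0.1], [0.2, 0.8]])
p3_given_2 = np.array([[0.6, 0.4], [0.1, 0.9]])
joint = p1[:, None, None] * p2_given_1[:, :, None] * p3_given_2[None, :, :]

total = multi_information(joint)
# Sum of I(X_i; Pa_i) over the graph: I(X_2; X_1) + I(X_3; X_2).
decomposed = (
    multi_information(joint.sum(axis=2))    # P(X_1, X_2)
    + multi_information(joint.sum(axis=0))  # P(X_2, X_3)
)
print(total, decomposed)
```

The two printed values agree because the chain distribution is consistent with its own graph; for a distribution that violates the graph's independencies the decomposed sum would fall short of the true multi-information.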
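7 Original IB through Bayes net formulation
- G_in: what compresses what
- G_out: what predicts what
- New generalized formulation: [equation missing from the extracted slide]
- Which in this case means: [equation missing from the extracted slide; one term is annotated as constant]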
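The equations on this slide did not survive extraction. A plausible reconstruction, following the published multivariate IB formulation, is given below. It assumes that G_in contains the edges X → Y and X → T, that G_out contains the edge T → Y, and that β is the usual trade-off parameter; the symbols 𝓛^(1), 𝓘^{G_in} and 𝓘^{G_out} are notation chosen here, not recovered from the slide.

```latex
\begin{align}
\mathcal{L}^{(1)} &= \mathcal{I}^{G_{in}} - \beta\, \mathcal{I}^{G_{out}},
  \qquad \mathcal{I}^{G} = \sum_i I(X_i ; \mathrm{Pa}_i^{G}) \\
\mathcal{I}^{G_{in}} &= I(T;X) + I(X;Y), \qquad \mathcal{I}^{G_{out}} = I(T;Y) \\
\mathcal{L}^{(1)} &= I(T;X) - \beta\, I(T;Y) + \underbrace{I(X;Y)}_{\text{constant}}
\end{align}
```

Since I(X;Y) does not depend on the choice of P(T|X), minimizing this functional is equivalent to minimizing the original IB functional I(T;X) − β I(T;Y).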
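8 Alternative formulation: preliminaries
- For a given DAG G, define: [equation missing from the extracted slide]
- For P which is consistent with G_in: [equation missing from the extracted slide; its two terms are annotated "Real multi-info in P(X,T)" and "Multi-info as though P(X,T) is consistent with G_out"]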
9 Alternative formulation for original IB
- Alternative formulation: minimize violations of the desired independencies
- Which in this case means: [equation missing from the extracted slide; annotated terms: actual distribution, desired independencies, constant]
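The equations of the alternative formulation were also lost. The reconstruction below follows the published multivariate IB formulation and is offered as a sketch under stated assumptions: G_in has the edges X → Y and X → T, the alternative G_out has the edges T → X and T → Y, and γ is a trade-off parameter. The notation is chosen here rather than recovered from the slides.

```latex
\begin{align}
D_{KL}[P \,\|\, G] &= \min_{Q \text{ consistent with } G} D_{KL}[P \,\|\, Q] \\
D_{KL}[P \,\|\, G_{out}] &= \mathcal{I}^{G_{in}} - \mathcal{I}^{G_{out}}
  \qquad \text{for } P \text{ consistent with } G_{in} \\
\mathcal{L}^{(2)} &= \mathcal{I}^{G_{in}} + \gamma\, D_{KL}[P \,\|\, G_{out}] \\
\mathcal{I}^{G_{out}} &= I(T;X) + I(T;Y)
  \;\Rightarrow\; D_{KL}[P \,\|\, G_{out}] = I(X;Y) - I(T;Y) \\
\mathcal{L}^{(2)} &= I(T;X) - \gamma\, I(T;Y) + \underbrace{(1+\gamma)\, I(X;Y)}_{\text{constant}}
\end{align}
```

Under these assumptions both principles reduce to the same trade-off between I(T;X) and I(T;Y) for the original IB problem; they differ once G_in and G_out are more complex.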
10 Comparing the two principles
- Given G_in, different choices of G_out will yield different optimization problems
- Given G_in and G_out, each principle will yield a different optimization problem
- [figure label: Original IB problem]
11 Comparing the two principles (cont.)
12 Beyond the original IB [Slonim, Friedman, Tishby]
- Input variables
- Compression (bottleneck) variables
- Parameters
- G_in dependencies (minimize)
- G_out dependencies (maximize)
13 A simple example: Symmetric IB
[figure labels: What compresses what; What predicts what]
14 A multivariate formal optimal solution
- [equation missing from the extracted slide], where now d(Pa_j, t_j) is a generalized (KL) distortion measure
- For example, in symmetric IB: [equation missing from the extracted slide]
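Because the slide's equation is missing, the following is only a minimal sketch of the original IB special case, not the multivariate solution itself. There the formal solution has the form p(t|x) ∝ p(t) exp(−β d(x,t)) with d(x,t) = D_KL[p(y|x) || p(y|t)]; the multivariate version would replace x by the parents Pa_j and use the generalized distortion. The function name `ib_update`, the smoothing constant `eps`, and the toy distribution are illustrative assumptions.

```python
import numpy as np

def ib_update(p_xy, p_t_given_x, beta, eps=1e-12):
    """One self-consistent update of p(t|x) for the original IB.

    p_xy        : array (n_x, n_y), joint distribution with p(x) > 0 for all x
    p_t_given_x : array (n_x, n_t), current soft assignment p(t|x)
    beta        : trade-off parameter
    """
    p_x = p_xy.sum(axis=1)                     # p(x)
    p_y_given_x = p_xy / p_x[:, None]          # p(y|x)
    p_t = p_x @ p_t_given_x                    # p(t) = sum_x p(x) p(t|x)
    # p(y|t) = sum_x p(x) p(t|x) p(y|x) / p(t)
    p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
    p_y_given_t = p_y_given_t / (p_t[:, None] + eps)
    # d(x, t) = KL[p(y|x) || p(y|t)], computed for every (x, t) pair
    log_ratio = (np.log(p_y_given_x[:, None, :] + eps)
                 - np.log(p_y_given_t[None, :, :] + eps))
    d = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
    # p(t|x) proportional to p(t) exp(-beta d(x, t)), normalized per x
    unnormalized = p_t[None, :] * np.exp(-beta * d)
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

# Made-up toy joint over 3 values of X and 2 values of Y.
p_xy = np.array([[0.20, 0.10],
                 [0.10, 0.20],
                 [0.05, 0.35]])
q0 = np.array([[0.7, 0.3],
               [0.4, 0.6],
               [0.5, 0.5]])
q_beta0 = ib_update(p_xy, q0, beta=0.0)   # no pressure to preserve information
q_beta5 = ib_update(p_xy, q0, beta=5.0)
```

With β = 0 the distortion is ignored and every row collapses to the cluster marginal p(t); larger β sharpens the assignments toward the cluster whose p(y|t) is closest to p(y|x). Iterating the update to convergence at a fixed β gives the iterative IB algorithm for this special case.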
15 Multivariate IB algorithms – example for aIB [Slonim, Friedman, Tishby, 2002]
[figure: agglomerative merging, starting from singleton clusters W_1, W_2, W_3, W_4, W_5, ..., W_N, then merging a pair (e.g. W_3, W_4), and ending with a single cluster W_1, W_2, ..., W_N]
- Which pair to merge?
- [equation missing from the extracted slide], where the merge cost now uses a generalized (JS) distortion measure
- For example, in symmetric aIB: [equation missing from the extracted slide]
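The "which pair to merge?" question can be illustrated with the original (single-variable) aIB criterion, where the cost of merging clusters t_i and t_j is (p(t_i) + p(t_j)) times the weighted Jensen-Shannon divergence between p(y|t_i) and p(y|t_j). This is a sketch of that special case only; the generalized multivariate distortion from the slide is not recoverable. All function names and the toy numbers are illustrative assumptions.

```python
import numpy as np

def kl(p, q):
    """KL divergence in bits; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def merge_cost(p_t, p_y_given_t, i, j):
    """(p(t_i) + p(t_j)) * JS divergence with weights p(t_i)/w, p(t_j)/w."""
    w = p_t[i] + p_t[j]
    pi_i, pi_j = p_t[i] / w, p_t[j] / w
    mix = pi_i * p_y_given_t[i] + pi_j * p_y_given_t[j]
    js = pi_i * kl(p_y_given_t[i], mix) + pi_j * kl(p_y_given_t[j], mix)
    return w * js

def best_merge(p_t, p_y_given_t):
    """Greedy choice: the pair whose merge loses the least information."""
    n = len(p_t)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda ij: merge_cost(p_t, p_y_given_t, *ij))

def merge(p_t, p_y_given_t, i, j):
    """Return the cluster prior and conditionals after merging i and j."""
    w = p_t[i] + p_t[j]
    mix = (p_t[i] * p_y_given_t[i] + p_t[j] * p_y_given_t[j]) / w
    keep = [k for k in range(len(p_t)) if k not in (i, j)]
    return np.append(p_t[keep], w), np.vstack([p_y_given_t[keep], mix])

def mutual_information(p_t, p_y_given_t):
    """I(T;Y) in bits from p(t) and p(y|t)."""
    joint = p_t[:, None] * p_y_given_t
    indep = p_t[:, None] * joint.sum(axis=0)[None, :]
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

# Made-up example: clusters 0 and 1 have nearly identical p(y|t).
p_t = np.array([0.2, 0.3, 0.5])
p_y_given_t = np.array([[0.90, 0.10],
                        [0.88, 0.12],
                        [0.10, 0.90]])
pair = best_merge(p_t, p_y_given_t)
cost = merge_cost(p_t, p_y_given_t, *pair)
new_p_t, new_p_y_given_t = merge(p_t, p_y_given_t, *pair)
info_lost = (mutual_information(p_t, p_y_given_t)
             - mutual_information(new_p_t, new_p_y_given_t))
```

In this special case the merge cost equals the drop in I(T;Y) caused by the merge, which is why picking the cheapest pair at each step is a natural greedy rule.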
16 Symmetric aIB compression: documents, words
- Accuracy of symmetric aIB vs. original aIB over 3 small datasets: [results figure not recovered]
- Word clusters provide a more robust representation
17 Symmetric IB through Deterministic Annealing
- Data: 20,000 messages from 20 different discussion groups [Lang, 95]
- W: a word in the corpus
- C: the class (newsgroup) of the message
- P(W='bible', C='alt.atheism'): the probability that a randomly chosen position in the corpus holds the word 'bible' in a message of the newsgroup (class) 'alt.atheism'
- [figure axes: Words, Classes]
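The definition of P(W, C) above amounts to normalizing a word-by-class count matrix by the total number of word positions. A minimal sketch follows; the three-word vocabulary, the two classes, and every count are invented for illustration and are not taken from the 20 newsgroups data.

```python
import numpy as np

# Hypothetical vocabulary, classes and occurrence counts.
words = ["bible", "car", "window"]
classes = ["alt.atheism", "rec.autos"]
counts = np.array([[30.0, 1.0],    # occurrences of "bible" per class
                   [2.0, 40.0],    # occurrences of "car" per class
                   [5.0, 22.0]])   # occurrences of "window" per class

def joint_from_counts(counts):
    """P(W, C): the chance that a random corpus position holds word w in class c."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

p_wc = joint_from_counts(counts)
p_bible_atheism = p_wc[words.index("bible"), classes.index("alt.atheism")]
print(p_bible_atheism)
```

Marginalizing this matrix over rows or columns gives P(C) and P(W), which is all the symmetric IB needs as input.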
18 Symmetric IB through Deterministic Annealing
[figure axes: Newsgroup, Word]
19 Symmetric IB through Deterministic Annealing
[figure: P(T_C, T_W); axes: Newsgroup, Word]
- Newsgroups: alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*, comp.*, misc.forsale, sci.crypt, sci.electronics
- Words: car, turkish, game, team, jesus, gun, hockey, …; x, file, image, encryption, window, dos, mac, …
20 Symmetric IB through Deterministic Annealing
[figure: P(T_C, T_W); axes: Newsgroup, Word]
- Newsgroups: comp.graphics, comp.os.ms-windows.misc, comp.windows.x, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, misc.forsale, sci.crypt, sci.electronics
- Words: windows, image, window, jpeg, graphics, …; encryption, db, ide, escrow, monitor, …
21 Symmetric IB through Deterministic Annealing
[figure: P(T_C, T_W); axes: Newsgroup, Word]
22 Symmetric IB through Deterministic Annealing
[figure: P(T_C, T_W); axes: Newsgroup, Word]
- Newsgroups: alt.atheism, rec.sport.baseball, rec.sport.hockey, soc.religion.christian, talk.politics.mideast, talk.religion.misc, rec.autos, rec.motorcycles, sci.med, sci.space, talk.politics.guns, talk.politics.misc
- Words: armenian, turkish, jesus, hockey, israeli, armenians, …; car, q, gun, bike, fbi, health, …
23 Symmetric IB through Deterministic Annealing
[figure: P(T_C, T_W); axes: Newsgroup, Word]
24 Symmetric IB through Deterministic Annealing
[figure: P(T_C, T_W); axes: Newsgroup, Word]
25 Symmetric IB through Deterministic Annealing
[figure: P(T_C, T_W); axes: Newsgroup, Word]
- Newsgroups: alt.atheism, soc.religion.christian, talk.religion.misc
- Words: atheists, christianity, jesus, bible, sin, faith, …
26 Symmetric aIB compression: genes, samples
- Data: gene expression of 500 "informative" genes vs. 72 leukemia samples (Golub et al., 1999)
- [figure axes: Genes, Samples]
27 Symmetric aIB compression: genes, samples
- Data after symmetric aIB compression: 10 gene clusters, 8 sample clusters
- Sample cluster labels: ALL B-cell hosp1 ALL B-cell hosp1 ALL T-cell hosp1 Male BM B-cell BM B-cell AML hosp2 AML hosp3
- Gene probes: X00437_s_at, M12886_at, X76223_s_at, M59807_at, U23852_s_at, D00749_s_at, U89922_s_at, X03934_at, U50743_at, M21624_at, M28826_at, M37271_s_at, X59871_at, X14975_at, M16336_s_at, L05148_at, M28825_at
28 Another example: parallel IB
- Consider a document collection with different topics and different writing styles
- [figure labels: topic1, topic2, topic3, topic4, Science]
29 Another example: parallel IB (cont.)
- One possible "legitimate" partition is by the topic
- [figure: documents grouped into Topic1, Topic2, Topic3, Topic4]
30 Another example: parallel IB (cont.)
- And another possible "legitimate" partition is by the writing style
- [figure: documents of mixed topics grouped into Style1, Style2, Style3]
- There might be more than one "legitimate" partition…
31 Parallel IB: solution
- G_in: minimize dependencies
- G_out: maximize dependencies
- Effective distortion: [equation missing from the extracted slide]
32 Parallel sIB: Text analysis results
- Data: ~1,500 "documents" taken from E. R. Burroughs (The Beasts of Tarzan, The Gods of Mars) and R. Kipling (The Jungle Book, Rewards and Fairies)
- X_1 corresponds to "documents", X_2 corresponds to words
- Results (table flattened in extraction, cell values kept as extracted):
  32542 1254 4061 2315 | T_2,b T_2,a | Burroughs, Kipling
  3670 Rewards and Fairies
  2550 The Jungle Book
  0407 The Gods of Mars
  2315 The Beasts of Tarzan
  T_1,b T_1,a
33 Parallel sIB: Gene expression data results
- Data: gene expression of 500 "informative" genes vs. 72 leukemia samples (Golub et al., 1999)
- X_1 corresponds to samples, X_2 corresponds to genes
- Results (table flattened in extraction, cell values kept as extracted):
  .72 .64 | 90 T-cell | 380 B-cell | 470 ALL | 223 AML | T_1,b T_1,a
  .66 .71 | 90 137 1037 1114 | T_2,b T_2,a
  .76 .53 | 63 326 389 1312 | T_3,b T_3,a
  .69 .70 | 72 1820 2522 1213 | T_4,b T_4,a
34 Another example: Triplet IB
- Consider the following sequence data: s(1) s(2) s(3) … s(t-1) s(t) s(t+1) …
- Can we extract features such that their combination is informative about a symbol between them?
- [figure variables: X_p, X_m, X_n, T_p, T_n]
35 Triplet IB: solution
- G_in: minimize dependencies
- G_out: maximize dependencies
36 Triplet IB data (E. R. Burroughs, "Tarzan the Terrible")
- Example text: "… As Tarzan ascended the platform his eyes narrowed angrily at the sight which met them… 'What means this?' he cried angrily…"
- X_p: 1st word in triplet; X_m: 2nd word in triplet; X_n: 3rd word in triplet
- X_m = {apemans, apes, eyes, girl, great, jungle, tarzan, time, two, way}
- Data: Tarzan and the Jewels of Opar, Tarzan of the Apes, Tarzan the Terrible, Tarzan the Untamed, The Beasts of Tarzan, The Jungle Tales of Tarzan, The Return of Tarzan
- Joint distribution P(X_p, X_m, X_n) of dimension 90 x 10 x 233
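The triplet construction above can be sketched as a sliding window over the token sequence that keeps only windows whose middle word belongs to the chosen X_m set. The whitespace tokenization, lower-casing, and function names are assumptions made for illustration; the slides do not say how the text was preprocessed or how the 90 and 233 values of X_p and X_n were selected.

```python
from collections import Counter

def triplet_counts(tokens, middle_words):
    """Count (previous, middle, next) word triplets whose middle word is in middle_words."""
    counts = Counter()
    for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if mid in middle_words:
            counts[(prev, mid, nxt)] += 1
    return counts

def normalize(counts):
    """Turn triplet counts into an empirical joint distribution P(x_p, x_m, x_n)."""
    total = sum(counts.values())
    return {triplet: c / total for triplet, c in counts.items()}

# Fragment of the example sentence, lower-cased and split on whitespace.
tokens = "as tarzan ascended the platform his eyes narrowed angrily".split()
middle_words = {"tarzan", "eyes"}   # a subset of the X_m set, for illustration
counts = triplet_counts(tokens, middle_words)
p_triplets = normalize(counts)
print(p_triplets)
```

Storing the distribution as a dictionary keeps the sketch short; a dense 90 x 10 x 233 array indexed by word ids would be the natural representation for the clustering algorithms.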
37 Triplet sIB: Text analysis results
- Given X_p and X_n, two schemes to predict the middle word: X_m = argmax P(x_m' | t_p, t_n) and X_m = argmax P(x_m' | x_p, x_n)
- Test on a NEW sequence, "The Son of Tarzan"
- Results (table flattened in extraction; values and column labels kept in extracted order):
  Column groups: Precision (%), Recall (%)
  X_p, X_n | T_p, T_n | X_p, X_n | T_p, T_n | X_m
  22% | 28% | 55% | 53% | Average
  21% | 28% | 81% | 60% | Way (101)
  8% | 11% | 92% | 41% | Two (148)
  26% | 48% | 82% | 70% | Time (145)
  25% | 40% | 67% | 41% | Tarzan (48)
  24% | 27% | 54% | 49% | Jungle (241)
  48% | 50% | 92% | Great (219)
  1% | 5% | 30% | 43% | Girl (240)
  28% | 32% | 81% | 83% | Eyes (177)
  14% | 17% | 26% | 43% | Apes (78)
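The two prediction schemes differ only in what indexes the context: raw words (x_p, x_n) or their clusters (t_p, t_n). A minimal sketch of the argmax rule and of per-word precision and recall follows. The array layout (context, middle word, context), the function names, and all numbers are illustrative assumptions, and how the original evaluation handled unseen contexts is not stated in the slides.

```python
import numpy as np

def predict_middle(joint, p, n):
    """argmax over x_m of P(x_m | p, n).

    joint has shape (|P|, |M|, |N|) and holds P(p, x_m, n), where p and n index
    either raw words or clusters. Dividing by P(p, n) does not change the argmax,
    so the joint slice is used directly. An all-zero slice (unseen context)
    falls back to index 0 here, which is an arbitrary choice.
    """
    return int(np.argmax(joint[p, :, n]))

def precision_recall(true_labels, predicted, label):
    """Precision and recall for one middle-word label."""
    true_labels = np.asarray(true_labels)
    predicted = np.asarray(predicted)
    true_pos = int(np.sum((predicted == label) & (true_labels == label)))
    n_predicted = int(np.sum(predicted == label))
    n_true = int(np.sum(true_labels == label))
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_true if n_true else 0.0
    return precision, recall

# Made-up joint over 2 previous contexts, 3 middle words, 2 next contexts.
joint = np.zeros((2, 3, 2))
joint[0, :, 0] = [0.10, 0.30, 0.05]
joint[1, :, 1] = [0.25, 0.05, 0.25]
guess = predict_middle(joint, 0, 0)

# Made-up test labels and predictions for four test triplets.
true_labels = [0, 1, 1, 2]
predicted = [0, 1, 0, 2]
pr_label1 = precision_recall(true_labels, predicted, 1)
pr_label0 = precision_recall(true_labels, predicted, 0)
```

Conditioning on clusters pools the statistics of many word contexts, so fewer contexts are unseen at test time than with raw words.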
38 Summary
- The IB method is a principled framework for extracting "informative" structure out of a joint distribution P(X_1, X_2).
- The multivariate IB extends this framework to extract "informative" structure from more complex joint distributions, P(X_1,…,X_n), in various ways.
- This enables us to define and solve a new family of optimization problems under a single unifying information-theoretic principle.
- References: www.cs.huji.ac.il/~noamm
- "Clustering" conceals a family of distinct problems which deserve special consideration. The multivariate IB framework makes it possible to define these sub-problems, solve them, and demonstrate their importance.