An Application of Divergence Estimation to Projection Retrieval for Semi-supervised Classification and Clustering
Madalina Fiterau, Artur Dubrawski
ICML WDDL, June 20, 2013
Informative Projection Retrieval
We consider applications where a human operator supervises and validates the classification or clustering procedure. Thus, the process must be transparent and comprehensible.
Projection Retrieval for a Learning Task: the problem of finding subspaces
- which give operators confidence in the resolution of the task for a given query
- on which the task is well handled overall, according to some performance metric (such as expected risk)
Models Designed to Aid Human Users
Example applications: border control, stock market, vehicle checks, diagnostics, cell analysis, drug evaluation. In each case the operator either accepts the model's outcome or investigates further.
Framework
[Diagram] A query X arrives in a given context; a selector g(X) assigns it to one of the candidate projections π1(X), π2(X), π3(X), ..., and the solver associated with the chosen projection handles the query.
The 2 Stages of the Framework
1. Assign data to projections
2. Train solvers given the assignments
Starting point: the loss matrix
Rows correspond to samples and columns to projections; each entry is a divergence-based estimate of the loss incurred when that sample is handled on that projection (entries range from low to moderate to high loss). A small sketch of assembling such a matrix follows.
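As an illustration only (the names build_loss_matrix and pointwise_loss are hypothetical, not from the original implementation), the matrix could be assembled as follows, given any per-sample, per-projection loss estimator:

```python
# Hypothetical sketch: build the n x m matrix of loss estimates.
# `pointwise_loss(x, y, proj)` stands in for a divergence-based loss estimator
# evaluated for sample (x, y) on the feature subset `proj`.
import numpy as np

def build_loss_matrix(X, y, projections, pointwise_loss):
    n, m = len(X), len(projections)
    L = np.empty((n, m))
    for j, proj in enumerate(projections):
        for i in range(n):
            L[i, j] = pointwise_loss(X[i], y[i], proj)
    return L
```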
RIPR Model for a learning task T
Given a dataset, the target model consists of: a small set of projections, a selection function, and the corresponding solvers.
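A minimal sketch of how such a model could be represented in code, with hypothetical names (RIPRModel, predict) that are not from the original implementation:

```python
# Hypothetical container for a RIPR-style model: a few projections, one solver
# per projection, and a selector that routes each query to a projection.
from dataclasses import dataclass
from typing import Callable, List, Sequence
import numpy as np

@dataclass
class RIPRModel:
    projections: List[Sequence[int]]                    # feature indices per projection
    solvers: List[Callable[[np.ndarray], np.ndarray]]   # one task solver per projection
    selector: Callable[[np.ndarray], int]                # query -> projection index

    def predict(self, x: np.ndarray) -> np.ndarray:
        j = self.selector(x)                             # projection assigned to this query
        return self.solvers[j](x[list(self.projections[j])])
```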
Objective Function for RIPR
The objective is the expected loss of the task solver trained on the projection assigned to each point.
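The formula itself appears only as an image in the slide; a plausible rendering, writing π_j for the projections, g for the selection function, and τ_j for the solver trained on projection j, is:

\min_{g,\,\{\tau_j\}} \; \mathbb{E}_{(x,y)}\!\left[ \ell\big( \tau_{g(x)}\big(\pi_{g(x)}(x)\big),\, y \big) \right]

i.e., each point is charged the loss of the solver operating on the projection the selector assigns to it.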
Loss/Risk for Common Learning Tasks
- Classification *
- Semi-supervised classification
- Clustering
- Regression **
* The object of prior work: "Projection Retrieval for Classification", NIPS 2012
** Beyond the scope of this talk, since it does not use entropy estimators
Local Entropy Estimators
Neighbor-based estimator for conditional entropy, based on the divergence estimator of Poczos and Schneider, "On the Estimation of alpha-Divergences" (AISTATS 2011).
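The estimator itself is shown only graphically in the slide. For reference, the standard Kozachenko–Leonenko k-nearest-neighbor entropy estimator, which neighbor-based estimators of this kind build on, has the form (up to the usual n versus n-1 convention; this generic formula is not necessarily the exact one used in RIPR):

\hat H(X) \;=\; \psi(n) - \psi(k) + \log c_d + \frac{d}{n}\sum_{i=1}^{n} \log \rho_k(x_i), \qquad c_d = \frac{\pi^{d/2}}{\Gamma(d/2+1)},

where ρ_k(x_i) is the distance from x_i to its k-th nearest neighbor, d is the dimension of the projection, and ψ is the digamma function. Conditional entropies such as H(Y|X) can then be assembled from such estimates, e.g. via H(Y|X) = H(X|Y) + H(Y) - H(X), with H(X|Y) computed from per-class samples.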
Entropy Estimators for Semi-supervised Classification
- For labeled samples: same as for classification
- For unlabeled samples:
  - consider all possible label assignments
  - assume the most 'confident' label (the one with the smallest loss)
- Equivalent to penalizing unlabeled samples in proportion to how ambivalent they are about the assigned label (see the sketch below)
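An illustrative sketch of this rule (not the authors' code; pointwise_loss is a hypothetical stand-in for the divergence-based loss estimator on a projection):

```python
# Per-sample loss for semi-supervised classification: labeled points use their
# label, unlabeled points take the smallest loss over all candidate labels.
from typing import Callable, Optional, Sequence
import numpy as np

def semi_supervised_loss(
    x: np.ndarray,
    y: Optional[int],
    proj: Sequence[int],
    labels: Sequence[int],
    pointwise_loss: Callable[[np.ndarray, int, Sequence[int]], float],
) -> float:
    if y is not None:
        # Labeled sample: same estimator as in the fully supervised case.
        return pointwise_loss(x, y, proj)
    # Unlabeled sample: assume the most 'confident' (smallest-loss) label, so
    # points that are ambivalent about their label incur a higher loss.
    return min(pointwise_loss(x, label, proj) for label in labels)
```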
Entropy Estimators for Clustering
- Point-wise estimators are problematic for clustering
- An ensemble view of the data is typically required
- The issue is that we don't know which data to assign to which projection prior to clustering
- We focus on density-based clustering
- The loss is lower for densely packed regions
- We eliminate dimensionality issues by considering the negative KL divergence to the uniform distribution on the same space* (one plausible point-wise form is given below)
* some scaling issues remain
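The slide gives the construction only in words; one plausible point-wise form, using a k-nearest-neighbor density estimate on projection π_j (an assumption on our part, not necessarily the exact estimator used), is:

\hat\ell_{\text{clust}}(x_i \mid \pi_j) \;=\; -\Big( \log \hat p_{\pi_j}(x_i) - \log u_{\pi_j} \Big), \qquad \hat p_{\pi_j}(x_i) = \frac{k}{n\, c_d\, \rho_k(x_i)^d}, \quad u_{\pi_j} = \frac{1}{\operatorname{vol}\big(\pi_j(\mathcal{X})\big)},

where ρ_k(x_i) is the k-th nearest-neighbor distance within the projected data and u_{π_j} is the uniform density on the projected domain. Averaging these terms over the samples estimates the negative KL divergence to the uniform distribution, so densely packed regions receive low loss.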
The Optimization Procedure
The matrix of loss estimators L has data points as rows and projections as columns; for each point, some projections are optimal and others nearly optimal. We introduce a penalty over the number of columns used in order to limit the number of projections in the model.
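In hypothetical notation (the slides present this construction only pictorially), the induced selection problem can be written with a binary assignment matrix B as:

\min_{B \in \{0,1\}^{n \times m}} \;\; \sum_{i=1}^{n}\sum_{j=1}^{m} L_{ij} B_{ij} \;+\; \lambda \sum_{j=1}^{m} \mathbf{1}\!\Big[\textstyle\sum_i B_{ij} > 0\Big] \qquad \text{s.t.} \quad \sum_{j} B_{ij} = 1 \;\; \forall i,

where B_{ij} = 1 means point i is handled on projection j and the second term charges for every projection that is used at all. RIPR relaxes this combinatorial problem through the regression-based procedure described on the following slides.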
The Optimization Procedure (continued)
Because of the column penalty, suboptimal projections will be used for some of the points.
Regression for Informative Projection Recovery (RIPR)
- RIPR biases the projection selection toward 'popular' projections through a multiplier δ
- Iterative procedure:
  - get an estimate of the selection matrix B
  - compute the multiplier δ inversely proportional to projection popularity
  - obtain a new selection matrix B using the penalty |Bδ|_1
The RECIP Algorithm
1. Estimate the selection matrix B
2. Compute the multiplier δ inversely proportional to projection popularity
3. Obtain a new selection matrix B by penalizing Bδ
Iterate until convergence (a simplified sketch follows).
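A simplified, illustrative sketch of the iteration (assumed details: the actual algorithm solves a regression with the |Bδ|_1 penalty, whereas this sketch substitutes a greedy reweighted selection to convey the idea):

```python
# Iterative reweighting: projections that many points already use become
# 'cheaper', biasing the selection toward a small set of popular projections.
import numpy as np

def recip_select(L: np.ndarray, lam: float = 1.0, n_iter: int = 20) -> np.ndarray:
    """L: (n_points x n_projections) matrix of loss estimates.
    Returns a binary selection matrix B with one chosen projection per point."""
    n, m = L.shape
    delta = np.ones(m)                      # per-projection multiplier
    B = np.zeros((n, m))
    for _ in range(n_iter):
        # 1. Re-estimate the selection: each point picks the projection with
        #    the smallest penalized, reweighted loss.
        choice = np.argmin(L + lam * delta, axis=1)
        B_new = np.zeros((n, m))
        B_new[np.arange(n), choice] = 1.0
        # 2. Update the multiplier inversely proportional to popularity.
        popularity = B_new.sum(axis=0)
        delta = 1.0 / (popularity + 1e-6)
        # 3. Stop once the selection no longer changes.
        if np.array_equal(B_new, B):
            break
        B = B_new
    return B
```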
Assigning a Projection to a Query
For semi-supervised classification, consider distances to labeled samples only. For clustering, consider the cluster assignments determined during learning.
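An illustrative sketch of query-time selection for the semi-supervised case (assumed details, hypothetical names): pick the projection whose assigned labeled training points lie closest to the query within that subspace.

```python
# Route a query to the projection whose assigned labeled points are nearest
# to it (distances computed within the projected subspace).
import numpy as np

def select_projection(x, X_labeled, assignment, projections):
    """assignment[i] = index of the projection labeled point i was assigned to."""
    assignment = np.asarray(assignment)
    best_j, best_dist = 0, np.inf
    for j, feats in enumerate(projections):
        idx = np.where(assignment == j)[0]
        if idx.size == 0:
            continue                                   # projection unused during training
        diffs = X_labeled[np.ix_(idx, list(feats))] - x[list(feats)]
        d = np.linalg.norm(diffs, axis=1).min()        # nearest assigned labeled point
        if d < best_dist:
            best_j, best_dist = j, d
    return best_j
```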
Experiments: Semi-supervised Classification on Artificial Data
[Plot: accuracy versus number of noisy samples, comparing the RIPR model to a classifier of the same class trained on all features.] Every u-th sample is unlabeled. The dataset contains 3 informative projections and 3000 labeled points. RIPR correctly recovers the projections for all setups tested here.
Experiments: Clustering on UCI Data

UCI Dataset   Dist (RIPR)   Dist (RIPR0)   Log Vol (RIPR)   Log Vol (Kmeans)
Seeds         16            17             47.68            9.70
Libras        96            20             -5.80            7.26
MiniBOONE     125           2,180,019      240.00           248.15
Cell          40,877        18,881,664     54.69            67.68
Concrete      1,370         68,865         49.24            52.75

Sum of mean distances to cluster centers and log cluster volume; lower is better. RIPR models always have a smaller total volume.
Summary
- Informative Projection Retrieval is relevant to many applications requiring the intervention of human users
- We solved IPR through a regression-based optimization over a task-specific loss matrix
- The loss is expressed through divergence estimators
- Semi-supervised models: penalize unlabeled data that cannot be confidently assigned to a class
- Clustering models: reward densely packed regions via the negative KL divergence to the uniform distribution
- RIPR models are compact and perform well
  - informative projections perfectly recovered (artificial data)
  - often more accurate than classifiers trained on all features