1
Hierarchical Sampling for Active Learning
Sanjoy Dasgupta and Daniel Hsu, University of California, San Diego
Session: Active Learning and Experimental Design, ICML 2008
Presented by Yongjin Kwon, 2010-08-27
2
Active Learning
Nowadays huge amounts of data are cheaply available and are used to find patterns or extract information. Many supervised learning systems require a large amount of labeled training data to perform well (e.g., email spam filtering, handwritten digit recognition, movie rating prediction). In many applications, however, labeled training data are very difficult, time-consuming, or expensive to obtain (e.g., speech recognition, text classification, biological research).
Problem: Given a pool of unlabeled data and an oracle, how can the machine learn an accurate model while requesting as few labels as possible?
3
Active Learning (Cont'd)
If the machine actively tries to learn from "curious" or "informative" data points, it will perform better with less training!
(a) Passive learning: one-way teaching; everything should be prepared in advance.
(b) Active learning: query only the "curious" points and receive their answers.
4
Active Learning (Cont'd)
What are "curious" or "informative" points? If the learner is already sure about the label of a point, then that point is less curious; points whose labels the learner is unsure about are more curious.
5
Typical Active Learning Approach
Binary classification: start by querying the labels of a few randomly chosen points, then repeat the following process:
- Determine the decision boundary on the current set of labeled points.
- Choose the unlabeled point closest to the current decision boundary (i.e., the most "uncertain" or "informative" point).
- Query that point and obtain its label.
A minimal sketch of this uncertainty-sampling loop is given below.
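To make the loop concrete, here is a hedged sketch of margin-based uncertainty sampling with scikit-learn. The synthetic pool, the oracle, the label budget, and all variable names are assumptions for illustration, not part of the original slides.

```python
# Hypothetical sketch of the margin-based query loop described above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                       # unlabeled pool
true_labels = (X[:, 0] + X[:, 1] > 0).astype(int)   # stands in for the oracle

# Start with one queried point from each class so the classifier can be fit.
labeled = [int(np.argmax(true_labels)), int(np.argmin(true_labels))]
y = {i: true_labels[i] for i in labeled}

clf = LogisticRegression()
for _ in range(20):                                  # illustrative label budget
    clf.fit(X[labeled], [y[i] for i in labeled])     # decision boundary from current labels
    margins = np.abs(clf.decision_function(X))       # distance to the decision boundary
    margins[labeled] = np.inf                        # skip already-labeled points
    i = int(np.argmin(margins))                      # most "uncertain" point
    y[i] = true_labels[i]                            # query the oracle for its label
    labeled.append(i)
```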
6
Typical Active Learning Approach (Cont'd)
Missed Cluster Effect: the initial random sampling may fail to explore some regions of the data, called missed clusters. After the initial sampling and learning, only points in the center group are queried due to this sampling bias, so the learner is NOT consistent even with infinitely many labels!
(Figure: data split into groups of 45%, 5%, 45%, and 5%; one group is the missed cluster, and only the points in the center are ever selected.)
7
Cluster-Adaptive Sampling
How can adaptive sampling really be helpful?
- Exploiting (cluster) structure in the data
- Efficient search through the hypothesis space
(Figure: probably just four labels are needed!)
8
Cluster-Adaptive Sampling (Cont'd)
Query points in order to recognize the label of each cluster, rather than the label of each individual point; every point will be assigned the label of the corresponding cluster.
How to obtain the label of a cluster? Sample some points from the cluster and find the majority label, as in the sketch below.
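As a concrete illustration of the majority-vote step, here is a hedged sketch; the function name, the `oracle` callback, and the sample size are assumptions for this example, not taken from the paper.

```python
# Hypothetical sketch: estimate a cluster's label by querying a few random
# points and taking the majority vote. `oracle(point)` is an assumed callback
# that returns the true label of a point.
import random
from collections import Counter

def estimate_cluster_label(cluster_points, oracle, num_queries=5):
    sample = random.sample(list(cluster_points), min(num_queries, len(cluster_points)))
    labels = [oracle(p) for p in sample]          # ask the oracle for each sampled point
    return Counter(labels).most_common(1)[0][0]   # the majority label wins
```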
9
Cluster-Adaptive Active Learning Algorithm
Unsupervised hierarchical clustering of the unlabeled data.
Cluster-adaptive sampling:
– Idea: find a pruning of the tree (a set of nodes in the tree that forms a partition of the set of leaves) that consists of "pure" nodes (or clusters).
– First choose a node v in the current pruning with probability prob[v], then choose a random point p within the node v, and query p's label.
– Assess the "purity" of each node and change the pruning to a better one.
– After finding the best pruning, assign each point the label of the corresponding cluster.
Supervised learning on the resulting fully labeled data.
A hedged sketch of this sampling loop follows below.
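The following is a simplified, hypothetical sketch of the sampling loop outlined above, built on SciPy's hierarchical clustering. The proportional node-selection rule, the purity threshold, the label budget, and all names (`cluster_adaptive_labels`, `oracle`, and so on) are assumptions for illustration; the actual algorithm of Dasgupta and Hsu uses more careful statistical estimates when choosing nodes and refining the pruning.

```python
# Simplified, hypothetical sketch of cluster-adaptive sampling on a
# hierarchical clustering tree. Not the exact procedure from the paper.
import numpy as np
from collections import Counter
from scipy.cluster.hierarchy import linkage, to_tree

def cluster_adaptive_labels(X, oracle, budget=60, impurity_threshold=0.2, seed=0):
    rng = np.random.default_rng(seed)
    root = to_tree(linkage(X, method="ward"))      # hierarchical clustering of the pool
    pruning = [root]                               # current partition of the leaves
    seen = {}                                      # node id -> labels queried inside it

    for _ in range(budget):
        # Choose a node v in the pruning with probability proportional to its size.
        sizes = np.array([len(v.pre_order()) for v in pruning], dtype=float)
        v = pruning[rng.choice(len(pruning), p=sizes / sizes.sum())]
        leaf = int(rng.choice(v.pre_order()))      # random point within the node
        seen.setdefault(v.id, []).append(oracle(leaf))  # query its label

        # Assess purity; if the node looks impure, refine the pruning at v.
        counts = Counter(seen[v.id])
        impurity = 1.0 - counts.most_common(1)[0][1] / len(seen[v.id])
        if impurity > impurity_threshold and not v.is_leaf():
            pruning.remove(v)                      # replace v by its two children
            pruning.extend([v.left, v.right])      # (queried labels under v are discarded here)

    # Assign every leaf the majority label of its node in the final pruning.
    labels = {}
    for v in pruning:
        votes = seen.get(v.id) or [oracle(int(rng.choice(v.pre_order())))]
        majority = Counter(votes).most_common(1)[0][0]
        for leaf in v.pre_order():
            labels[leaf] = majority
    return labels                                  # leaf index -> inferred label
```

The resulting fully labeled pool can then be handed to any supervised learner, as in the last step of the algorithm.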
10
Cluster-Adaptive Active Learning (Cont'd)
(Figure: a hierarchical clustering tree whose leaves 1–9 are the data points.)
11
Cluster-Adaptive Active Learning (Cont'd)
(Figure: cluster-adaptive sampling on the same tree; a highlighted set of nodes forms the current pruning over the leaves.)
12
Cluster-Adaptive Active Learning (Cont'd)
(Figure: labeling; each leaf (point) receives the label of its node in the final pruning.)
13
Cluster-Adaptive Active Learning (Cont'd)
Comparison between sampling methods:
Random sampling – if there is a pruning with m nodes and error ε, then only O(m/ε) labels are needed before finding a pruning with error O(ε).
Active sampling – never worse than random sampling by more than a constant factor (so it remains consistent), and it converges more quickly than random sampling.
14
Experiments
Hierarchical clustering: Ward's average linkage clustering.
Baseline algorithms: random sampling (passive learning) and margin-based sampling (query the points closest to the current classifier).
Model: regularized logistic regression, choosing the trade-off parameter with 10-fold cross-validation.
Classification tasks: OCR digit images (multi-class classification) and newsgroup text (binary classification).
A hypothetical sketch of the model-fitting step is given below.
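For concreteness, here is a hedged sketch of the supervised model described on this slide: regularized logistic regression with its trade-off parameter chosen by 10-fold cross-validation, using scikit-learn. The synthetic features and labels are stand-ins, not the paper's datasets.

```python
# Hypothetical sketch of the final supervised step: regularized logistic
# regression with the regularization strength selected by 10-fold CV.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))        # stands in for image/text features
y = (X[:, 0] > 0).astype(int)         # stands in for the labels inferred from the pruning

clf = LogisticRegressionCV(cv=10)     # 10-fold CV over the trade-off parameter
clf.fit(X, y)
print(clf.C_)                         # selected inverse-regularization strength
```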
15
Experiments (Cont'd)
OCR digit images.
(Figures: errors of the best prunings in the OCR digits tree; test error curves on the classification task.)
16
Experiments (Cont'd)
Newsgroup text.
(Figure: test error curves on the classification task.)
17
Conclusions
Cluster-adaptive active learning:
- Exploits cluster structure inherent in the data.
- Manages sampling bias so that the learner remains consistent.
- Empirically outperforms random sampling and is competitive with inconsistent active learning methods.
18
Discussion
Cluster-adaptive active learning introduces another approach to active learning: it looks at the structure of a large amount of data. However, the choice of hierarchical clustering method may affect the total cost, and it is unclear whether the method can cope with very large pools of unlabeled data.