Agnostic Active Learning
Maria-Florina Balcan*, Alina Beygelzimer**, John Langford***
*: Carnegie Mellon University, **: IBM T.J. Watson Research Center, ***: Yahoo! Research
Journal of Computer and System Sciences, 2009
Presented by Yongjin Kwon, 2010-10-08
Introduction
Nowadays a plentiful amount of data is cheaply available and is used to find useful patterns or concepts.
Traditional machine learning has concentrated on problems that require labeled data only.
However, labeling is expensive! (e.g., speech recognition, document classification)
How can we reduce the number of labeled examples required? Exploit the abundance of unlabeled data!
Introduction (Cont'd)
Semi-supervised Learning: use a set of unlabeled data under additional assumptions.
Active Learning: ask for the labels of "informative" data.
[Figure: supervised learning vs. semi-supervised and active learning; points range from less informative to more informative.]
Active Learning
If the machine actively tries to learn from "informative" data, it will perform better with less training!
[Figure: (a) Passive learning is one-way teaching; everything must be prepared in advance. (b) Active learning queries "informative" points only and receives their labels as answers.]
Active Learning (Cont'd)
What are "informative" points? If the learner is already certain about the label of a point, then that point is less informative; points whose labels the learner is uncertain about are more informative.
[Figure: points near the decision boundary are more informative; points far from it are less informative.]
Typical Active Learning Approach
Binary classification: start by querying the labels of a few randomly chosen points. Then repeat the following process:
1. Determine the decision boundary on the current set of labeled points.
2. Choose the unlabeled point closest to the current decision boundary (i.e., the most "uncertain" or "informative" point).
3. Query that point and obtain its label.
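A minimal sketch of this loop for a linear classifier, assuming a scikit-learn-style model and a hypothetical `oracle` labeling function (the model choice and pool are illustrative, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_pool, oracle, n_seed=4, n_queries=20, rng=None):
    """Pool-based active learning: repeatedly query the unlabeled point
    closest to the current decision boundary."""
    rng = np.random.default_rng(rng)
    # Start by querying a few randomly chosen points
    # (assumes the seed happens to contain both classes).
    labeled = list(rng.choice(len(X_pool), size=n_seed, replace=False))
    y = {i: oracle(X_pool[i]) for i in labeled}
    model = LogisticRegression()
    for _ in range(n_queries):
        model.fit(X_pool[labeled], [y[i] for i in labeled])
        unlabeled = [i for i in range(len(X_pool)) if i not in y]
        if not unlabeled:
            break
        # |decision_function| grows with distance from the boundary;
        # the smallest value marks the most "uncertain" point.
        margins = np.abs(model.decision_function(X_pool[unlabeled]))
        i = unlabeled[int(np.argmin(margins))]
        y[i] = oracle(X_pool[i])   # query that point's label
        labeled.append(i)
    return model, y
```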
Improvement in Label Complexity
1-D binary classification in the noise-free setting: find the optimal threshold (or classifier).
In order to achieve misclassification error ≤ ε:
- Supervised learning: O(1/ε) labeled examples are needed.
- Active learning: O(log 1/ε) labeled examples are needed (binary search)!
An exponential improvement in label complexity (the number of label requests needed to achieve a given accuracy)! How general is this phenomenon?
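The 1-D case can be made concrete: with noise-free labels (−1 below the threshold, +1 above), binary search over the sorted pool halves the candidate interval with every label request. A sketch, with `oracle` a hypothetical perfect labeler:

```python
def learn_threshold(points, oracle):
    """Noise-free 1-D active learning: binary-search the sorted pool.
    Uses O(log n) label requests, versus the O(n) a passive learner
    would need to localize the threshold to a single gap."""
    pts = sorted(points)
    lo, hi = 0, len(pts) - 1
    if oracle(pts[lo]) == +1:          # whole pool is positive
        return pts[lo]
    if oracle(pts[hi]) == -1:          # whole pool is negative
        return pts[hi]
    # Invariant: label(pts[lo]) = -1 and label(pts[hi]) = +1.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if oracle(pts[mid]) == -1:
            lo = mid
        else:
            hi = mid
    return (pts[lo] + pts[hi]) / 2     # any threshold in this gap is consistent
```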
CAL Active Learning
A general-purpose learning strategy (in the noise-free setting). CAL (Cohn, Atlas, and Ladner) queries a point only if it lies in the region of uncertainty, i.e., the classifiers still consistent with all labels seen so far disagree on it; otherwise the label is already implied and no query is needed.
[Figure: binary classification with a rectangular classifier; a point falls inside the region of uncertainty, so CAL asks for its label.]
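A minimal sketch of the CAL rule over an explicit finite hypothesis class (enumerating hypotheses is for illustration only; practical implementations test disagreement without enumeration):

```python
def cal(stream, hypotheses, oracle):
    """CAL in the noise-free setting: query a point only if the
    hypotheses consistent with all labels seen so far disagree on it,
    i.e., only if it falls in the region of uncertainty."""
    version_space = list(hypotheses)   # all h consistent with labels so far
    labeled = []
    for x in stream:
        predictions = {h(x) for h in version_space}
        if len(predictions) == 1:
            continue                   # label is implied; no query needed
        y = oracle(x)                  # x is in the region of uncertainty
        labeled.append((x, y))
        version_space = [h for h in version_space if h(x) == y]
    return version_space, labeled
```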
Label Complexity of CAL
In the realizable (or noise-free) case, the label complexity for misclassification error ≤ ε is:
- Supervised learning: O(1/ε) labeled examples.
- Active learning: O(log 1/ε) labeled examples.
In the unrealizable (or agnostic) case, there is no perfect classifier of any form! A small amount of adversarial noise can make CAL fail to find the (ε-)optimal classifier. A noise-robust algorithm is needed...
[Figure: binary classification with a threshold; the optimal classifier under noise.]
A² Algorithm
A general-purpose learning strategy (in the agnostic setting):
- Do NOT trust answers from the oracle completely.
- Compare error bounds between classifiers.
[Figure: binary classification with a linear classifier. (a) Realizable case: after a few labels, the remaining points "must be RED" and the best classifier is determined. (b) Unrealizable case: a blue point among red ones leaves the learner still uncertain which classifier is best.]
A² Algorithm (Cont'd)
A general-purpose learning strategy (in the agnostic setting): do NOT trust answers from the oracle completely; compare error bounds between classifiers.
[Figure: the size of the region of uncertainty, with upper and lower bounds on the error. Presenter's note: in my opinion, the paper is wrong at these points.]
A² Algorithm (Cont'd)
Binary classification with a threshold: repeatedly sample and label points from the region of uncertainty, and estimate the error rates of the remaining classifiers over the domain.
For each classifier, maintain an upper bound and a lower bound on its true error. Remove every classifier whose lower bound exceeds the minimum upper bound: such a classifier cannot be the (ε-)optimal one.
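A schematic of this pruning step, with `bound(n, delta)` standing in for a generic uniform deviation term (a Hoeffding-style placeholder; the exact bound used in the paper differs):

```python
import math

def bound(n, delta):
    # Hoeffding-style deviation term: a placeholder for the paper's bound.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def eliminate(hypotheses, sample, delta):
    """One A^2-style pruning pass: estimate each classifier's error on a
    labeled sample, then discard any classifier whose lower bound exceeds
    the smallest upper bound among all classifiers."""
    n = len(sample)
    err = {h: sum(h(x) != y for x, y in sample) / n for h in hypotheses}
    ub = {h: err[h] + bound(n, delta) for h in hypotheses}
    lb = {h: max(0.0, err[h] - bound(n, delta)) for h in hypotheses}
    min_ub = min(ub.values())
    # Keep h only if it could still be the best: LB(h) <= min_h' UB(h').
    return [h for h in hypotheses if lb[h] <= min_ub]
```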
A² Algorithm (Cont'd)
Correctness: it returns an ε-optimal classifier with high probability.
Fallback analysis: it is never much worse than a standard batch, bound-based algorithm in terms of label complexity.
Improvement in label complexity: it achieves a great improvement over passive learning in some special cases (thresholds, and homogeneous linear separators under a uniform distribution).
Conclusions
The A² Algorithm is the first active learning algorithm that finds an (ε-)optimal classifier in the unrealizable (or agnostic) case.
It achieves a (near-)exponential improvement in label complexity in several unrealizable settings.
It never requires substantially more label requests than passive learning.
Discussions
This paper takes a theoretical approach to active learning, especially in the unrealizable (or agnostic) case.
It does NOT guarantee an improvement in label complexity for every hypothesis class.
The A² Algorithm is intended to theoretically extend the power of active learning to the unrealizable case. How can we apply it for practical purposes?