Active Learning. Maria-Florina Balcan. Lecture 26.

Active Learning
Data Source → Learning Algorithm → Expert / Oracle: the data source supplies unlabeled examples; the algorithm sends a request for the label of an example; the expert/oracle returns a label for that example; finally the algorithm outputs a classifier.
The learner can choose specific examples to be labeled. It works harder, in order to use fewer labeled examples.
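To make the interaction concrete, here is a minimal Python sketch of this protocol; select_query, oracle, and fit are illustrative stand-ins, not part of any particular algorithm from the talk:

    def active_learning_loop(unlabeled_pool, oracle, select_query, fit, budget):
        # The learner sees unlabeled data and pays only for the labels it requests.
        labeled = []                         # (example, label) pairs gathered so far
        pool = list(unlabeled_pool)
        for _ in range(budget):              # each pass is one label request
            x = select_query(pool, labeled)  # learner picks an informative example
            pool.remove(x)
            y = oracle(x)                    # expert/oracle answers the request
            labeled.append((x, y))
        return fit(labeled)                  # algorithm outputs a classifier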

What Makes a Good Algorithm?
- It chooses its label requests carefully, to get informative labels.
- It is guaranteed to output a relatively good classifier for most learning problems.
- It doesn't make too many label requests.

Can It Really Do Better Than Passive?
YES! (sometimes) We often need far fewer labels for active learning than for passive. This is predicted by theory and has been observed in practice.

When Does it Work? And Why?
The algorithms currently used in practice are not well understood theoretically. We don't know if/when they output a good classifier, nor can we say how many labels they will need. So we seek algorithms that we can understand and state formal guarantees for. Rest of this talk: a survey of recent theoretical results.

Active Learning
The learner has the ability to choose specific examples to be labeled:
- the learner works harder, in order to use fewer labeled examples.
We get to see unlabeled data first, and there is a charge for every label. How many labels can we save by querying adaptively?

Can adaptive querying help? [CAL92, Dasgupta04]
Threshold functions on the real line: h_w(x) = 1(x ≥ w), C = {h_w : w ∈ R}.
Passive supervised learning: Ω(1/ε) labels to find an ε-accurate threshold.
Active algorithm: sample O(1/ε) unlabeled examples, then binary search for the boundary between the - region and the + region; this needs just O(log(1/ε)) labels, an exponential improvement. Other interesting results as well.
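A small Python sketch of this binary-search strategy (assuming the realizable case, so labels along the sorted sample switch from - to + exactly once; the names are illustrative):

    def learn_threshold(unlabeled_xs, oracle):
        # Binary search for the -/+ boundary: O(log n) label requests on
        # n unlabeled points, versus the Omega(n) a passive learner uses.
        xs = sorted(unlabeled_xs)
        lo, hi = 0, len(xs)                # invariant: first + index lies in [lo, hi]
        while lo < hi:
            mid = (lo + hi) // 2
            if oracle(xs[mid]) == 1:       # labeled +: boundary is at or before mid
                hi = mid
            else:                          # labeled -: boundary is after mid
                lo = mid + 1
        # every point before index lo is -, the rest are +
        return xs[lo] if lo < len(xs) else float("inf")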

Active Learning might not help [Dasgupta04]
In general, the number of queries needed depends on C and also on D.
C = {linear separators in R^1}: active learning reduces sample complexity substantially.
C = {linear separators in R^2}: there are some target hypotheses for which no improvement can be achieved, no matter how benign the input distribution; in this case, learning to accuracy ε requires Ω(1/ε) labels. (Figure: hypotheses h_0, h_1, h_2, h_3 in the lower-bound construction.)

Examples where Active Learning helps
In general, the number of queries needed depends on C and also on D.
C = {linear separators in R^1}: active learning reduces sample complexity substantially, no matter what the input distribution is.
C = homogeneous linear separators in R^d, D = uniform distribution over the unit sphere: only O(d log(1/ε)) labels are needed to find a hypothesis with error rate < ε. [Freund et al. '97; Dasgupta, Kalai, Monteleoni, COLT 2005; Balcan-Broder-Zhang, COLT 2007]

Region of uncertainty [CAL92]
Current version space: the part of C consistent with the labels seen so far.
"Region of uncertainty" = the part of the data space about which there is still some uncertainty, i.e., some disagreement within the version space.
Example: data lies on a circle in R^2 and the hypotheses are homogeneous linear separators. (Figure: the current version space and the corresponding region of uncertainty in data space.)

Region of uncertainty [CAL92]
Algorithm: pick a few points at random from the current region of uncertainty and query their labels.
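A rough Python sketch of this disagreement-based idea for a finite hypothesis class, with the version space kept explicitly (real implementations avoid this; all names here are illustrative):

    import random

    def region_of_uncertainty_learner(hypotheses, pool, oracle, rounds, k):
        version_space = list(hypotheses)   # hypotheses consistent with labels so far
        pool = list(pool)
        for _ in range(rounds):
            # region of uncertainty: points the version space still disagrees on
            uncertain = [x for x in pool
                         if len({h(x) for h in version_space}) > 1]
            if not uncertain:
                break                      # every remaining label is already determined
            for x in random.sample(uncertain, min(k, len(uncertain))):
                y = oracle(x)              # query the label
                version_space = [h for h in version_space if h(x) == y]
        return version_space[0]            # realizable case: the target always survives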

Once those labels arrive, only the hypotheses consistent with them survive, so the version space shrinks, and with it the region of uncertainty in data space. (Figure: the new version space and the new, smaller region of uncertainty.)

Region of uncertainty [CAL92]: Guarantees
Algorithm: pick a few points at random from the current region of uncertainty and query their labels.
C = homogeneous linear separators in R^d, D = uniform distribution over the unit sphere:
- realizable case: O(d^{3/2} log(1/ε)) labels to find a hypothesis with error rate < ε;
- low noise: O(d^2 log(1/ε)) labels;
- supervised: O(d/ε) labels.
[Balcan, Beygelzimer, Langford, ICML'06] analyze a version of this algorithm which is robust to noise; for C = linear separators on the line with low noise, it gives an exponential improvement.

Margin Based Active-Learning Algorithm [Balcan-Broder-Zhang, COLT 07]
Use O(d) examples to find w_1 of error ≤ 1/8.
Iterate k = 2, …, log(1/ε):
- rejection sample m_k samples x from D satisfying |w_{k-1}^T · x| ≤ γ_k;
- label them; find w_k ∈ B(w_{k-1}, 1/2^k) consistent with all these examples.
End iterate.
(Figure: w_k, w_{k+1}, the band of width γ_k, and the target w*.)
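A loose Python sketch of this loop, assuming sample_from_D returns unit vectors in R^d and oracle returns ±1 labels; the m_k and γ_k schedules and the least-squares updates are placeholders, not the paper's exact constants or consistent-learner step:

    import numpy as np

    def margin_based_active_learner(sample_from_D, oracle, d, eps):
        # Round 1: O(d) labeled examples give w_1 of small constant error.
        X = np.array([sample_from_D() for _ in range(10 * d)])
        y = np.array([oracle(x) for x in X])
        w = np.linalg.lstsq(X, y, rcond=None)[0]   # stand-in for a consistent learner
        w /= np.linalg.norm(w)
        for k in range(2, int(np.ceil(np.log2(1 / eps))) + 1):
            gamma = 2.0 ** (-k)                    # band width (placeholder schedule)
            m = 10 * d                             # labels per round (placeholder)
            band = []
            while len(band) < m:                   # rejection sample near the separator
                x = sample_from_D()
                if abs(w @ x) <= gamma:
                    band.append((x, oracle(x)))    # a label is requested only here
            Xb = np.array([x for x, _ in band])
            yb = np.array([lab for _, lab in band])
            w_new = np.linalg.lstsq(Xb, yb, rcond=None)[0]  # should also be constrained
            w = w_new / np.linalg.norm(w_new)               # to the ball B(w, 1/2^k)
        return w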

Margin Based Active-Learning, Realizable Case
Theorem: if P_X is uniform over S^d, then after s = O(log(1/ε)) iterations, w_s has error ≤ ε.
(The supporting Facts 1-3, shown as figures, relate the angle θ(u, v) between unit vectors u and v to their disagreement probability under the uniform distribution.)

Margin Based Active-Learning [BBZ'07]
(Figure: the region of uncertainty in data space for the current hypothesis w_k.)

BBZ'07, Proof Idea
Recall round k: rejection sample m_k samples x from D satisfying |w_{k-1}^T · x| ≤ γ_k; ask for labels and find w_k ∈ B(w_{k-1}, 1/2^k) consistent with all these examples.
Assume w_k has error ≤ α. We are done if ∃ γ_k s.t. w_{k+1} has error ≤ α/2 and we only need O(d log(1/ε)) labels in round k.

BBZ'07, Proof Idea (continued)
Key Point: under the uniform distribution assumption, for a suitable choice of γ_k, the disagreement of w_{k+1} with the target w* outside the band |w_k^T · x| ≤ γ_k is ≤ α/4. So, it's enough to ensure that w_{k+1} also errs with probability ≤ α/4 inside the band. We can do so by only using O(d log(1/ε)) labels in round k.
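The error decomposition behind this argument, reconstructed in LaTeX (the slide's formulas were images, so the exact form of γ_k and the constants are assumptions): split the error of w_{k+1} by whether a point falls inside the sampling band,

    \mathrm{err}(w_{k+1}) =
        \Pr_{x \sim D}\bigl[\, w_{k+1}(x) \neq w^*(x),\ |w_k \cdot x| \le \gamma_k \,\bigr]
      + \Pr_{x \sim D}\bigl[\, w_{k+1}(x) \neq w^*(x),\ |w_k \cdot x| > \gamma_k \,\bigr].

The uniform-distribution geometry bounds the second term by α/4 for a suitable γ_k; the O(d log(1/ε)) labels requested inside the band drive the first term below α/4, giving err(w_{k+1}) ≤ α/2 as required.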