Margin-Based Active Learning
Maria-Florina Balcan, Carnegie Mellon University
Joint work with Andrei Broder & Tong Zhang, Yahoo! Research

Incorporating Unlabeled Data in the Learning Process
Unlabeled data is cheap and easy to obtain; labeled data is much more expensive.
Examples: web page and document classification, OCR, image classification, and essentially all the classification problems at Yahoo! Research.

Semi-Supervised Passive Learning
Several SSL methods have been developed to use unlabeled data to improve performance, e.g.: Transductive SVM [Joachims '98], Co-training [Blum & Mitchell '98], graph-based methods [Blum & Chawla '01]. See Avrim's talk at the "Open Problems" session.
Unlabeled data allows the learner to focus on a priori reasonable classifiers.

Active Learning
The learner can choose specific examples to be labeled, working harder in order to use fewer labeled examples.
Setting: a distribution P over X × Y and a hypothesis class C. The learner gets a set of unlabeled examples drawn from P_X and may interactively request the labels of any of these examples.
Goal: find h with small error over P while minimizing the number of label requests.
This talk: linear separators.
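To make the protocol concrete, here is a minimal pool-based sketch in Python; the class name and interface are illustrative, not from the talk. The oracle hides the target separator and counts label requests, the quantity active learning tries to minimize.

import numpy as np

class LabelOracle:
    """Hides the target separator w* and counts label requests."""
    def __init__(self, w_star):
        self.w_star = w_star
        self.requests = 0
    def label(self, x):
        # Each call is one label request; active learners try to
        # minimize how often this gets invoked.
        self.requests += 1
        return np.sign(self.w_star @ x)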

Can Adaptive Querying Help? [CAL '92, Dasgupta '04]
C = {linear separators in R^1} (thresholds), realizable case: passive learning to accuracy ε requires Ω(1/ε) labels, while in the active setting O(log 1/ε) labels suffice to find an ε-accurate threshold: an exponential improvement in sample complexity.
In general, the number of queries needed depends on C and P. For C = {linear separators in R^2}, there are target hypotheses for which no improvement can be achieved.
(Figure: candidate separators h_0, h_1, h_2, h_3 in R^2.)
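The R^1 case is just binary search. A minimal sketch (function names are mine, not from the talk): with a pool of n unlabeled points straddling the threshold, roughly log_2 n label requests suffice, versus the Ω(1/ε) labeled examples passive learning needs.

import numpy as np

def learn_threshold_actively(xs, oracle):
    """Binary search over a sorted unlabeled pool; realizable 1-D case.
    oracle(x) returns sign(x - t*) for an unknown threshold t*.
    Assumes the pool straddles t*."""
    lo, hi = 0, len(xs) - 1
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) > 0:
            hi = mid    # threshold lies to the left of xs[mid]
        else:
            lo = mid    # threshold lies to the right
    return (xs[lo] + xs[hi]) / 2, queries

rng = np.random.default_rng(0)
pool = np.sort(rng.uniform(size=1_000_000))
t_hat, n = learn_threshold_actively(pool, lambda x: np.sign(x - 0.37))
print(t_hat, n)  # recovers t* ~ 0.37 with only ~20 label requests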

When Active Learning Helps
In general, the number of queries needed depends on C and P. For C = homogeneous linear separators in R^d and P_X the uniform distribution over the unit sphere:
Realizable case: O(d log 1/ε) labels to find a hypothesis with error ε [Freund et al. '97; Dasgupta, Kalai, Monteleoni '05].
Agnostic case: the A^2 algorithm [Balcan, Beygelzimer, Langford '06]; under low noise, O(d^2 log 1/ε) labels to find a hypothesis with error ε [Hanneke '07].

An Overview of Our Results
We analyze a class of margin-based active learning algorithms for learning linear separators.
For C = homogeneous linear separators in R^d and P_X the uniform distribution over the unit sphere, we get an exponential improvement in the realizable case.
The analysis extends naturally to the bounded-noise setting.
We obtain dimension-independent bounds when we have a good margin distribution.

Margin-Based Active Learning, Realizable Case
Algorithm:
Draw m_1 unlabeled examples, label them, and add them to W(1).
Iterate k = 2, ..., s:
  find a hypothesis w_{k-1} consistent with W(k-1);
  set W(k) = W(k-1);
  sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1}; label them and add them to W(k).

(Figure: the successive hypotheses w_1, w_2, w_3 and the shrinking margin bands of widths γ_1, γ_2 around each decision boundary.)
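Below is a runnable sketch of the loop above, reusing the LabelOracle from earlier and assuming the uniform-over-the-sphere setting. The schedule γ_k ∝ 2^{-k}/√d matches the analysis sketched later; the fixed number of labels per round and the fit_consistent helper (a hard-margin-style homogeneous SVM) are my simplifications, not the paper's exact prescriptions.

import numpy as np
from sklearn.svm import LinearSVC

def fit_consistent(X, y):
    """Stand-in for 'find a consistent hypothesis': in the realizable
    case a large-C homogeneous linear SVM returns a consistent w."""
    clf = LinearSVC(fit_intercept=False, C=1e6).fit(X, y)
    w = clf.coef_.ravel()
    return w / np.linalg.norm(w)

def margin_based_active_learning(X, oracle, s, m, d):
    """Sketch of the margin-based loop (realizable case).
    X: pool of unlabeled unit vectors in R^d; oracle labels on request."""
    idx = np.random.choice(len(X), m, replace=False)
    W_X = X[idx]
    W_y = np.array([oracle.label(x) for x in W_X])
    w = fit_consistent(W_X, W_y)
    for k in range(2, s + 1):
        gamma = 2.0 ** (-(k - 1)) / np.sqrt(d)   # shrink the band each round
        cand = X[np.abs(X @ w) <= gamma]          # points near the boundary
        if len(cand) == 0:
            break
        idx = np.random.choice(len(cand), min(m, len(cand)), replace=False)
        new_X = cand[idx]
        new_y = np.array([oracle.label(x) for x in new_X])
        W_X = np.vstack([W_X, new_X])
        W_y = np.concatenate([W_y, new_y])
        w = fit_consistent(W_X, W_y)
    return w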

Margin-Based Active Learning, Realizable Case
Theorem: Assume P_X is uniform over S^d. If the margins are set so that γ_{k-1} = Θ(2^{-k}/√d) and each round uses m_k = Õ(d) labels, then after s = ⌈log_2(1/ε)⌉ iterations, w_s has error at most ε.
Fact 1: for any u, v on the unit sphere, Pr_x[sign(u · x) ≠ sign(v · x)] = θ(u, v)/π, where θ(u, v) is the angle between u and v.
Fact 2: under the uniform distribution over S^d, the band {x : |u · x| ≤ γ} has probability mass Θ(γ√d) (for γ = O(1/√d)).
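Both facts are standard and easy to sanity-check numerically; a quick Monte Carlo check (all values here are illustrative choices, not from the talk):

import numpy as np

rng = np.random.default_rng(1)
d = 10
u = np.zeros(d); u[0] = 1.0
theta = 0.5                                    # angle in radians
v = np.zeros(d); v[0] = np.cos(theta); v[1] = np.sin(theta)
X = rng.normal(size=(200_000, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # uniform on the sphere
disagree = np.mean(np.sign(X @ u) != np.sign(X @ v))
print(disagree, theta / np.pi)                 # both ~0.159 (Fact 1)
gamma = 0.1
band = np.mean(np.abs(X @ u) <= gamma)
print(band, gamma * np.sqrt(d))                # same order of magnitude (Fact 2)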

Proof Idea
Induction: all w consistent with W(k) have error at most 1/2^k; so, in particular, w_k has error at most 1/2^k.
For the inductive step, split the error of a hypothesis w consistent with W(k) into the part outside and the part inside the band {x : |w_{k-1} · x| ≤ γ_{k-1}}, and bound each part by 1/2^{k+1}.

Proof Idea (continued)
Outside the band: under the uniform distribution, both w and w* are within a small angle of w_{k-1}, so by the choice of γ_{k-1} the probability that w and w* disagree outside the band is at most 1/2^{k+1}.

Proof Idea (continued)
Inside the band: it is enough to ensure that the disagreement of w with w* inside the band is at most 1/2^{k+1}. Since the band has probability mass only Θ(2^{-k}) (Fact 2 and the choice of γ_{k-1}), a small constant conditional error inside the band suffices, and this can be achieved with only m_k = Õ(d) labels.
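Putting the two parts together gives the induction step; a worked version under the assumptions above (my expansion of the slide's argument, not a verbatim proof from the paper):

\begin{align*}
\operatorname{err}(w_k)
 &= \Pr_x\!\bigl[w_k \ne w^*,\; |w_{k-1}\cdot x| > \gamma_{k-1}\bigr]
  + \Pr_x\!\bigl[|w_{k-1}\cdot x| \le \gamma_{k-1}\bigr]\,
    \Pr_x\!\bigl[w_k \ne w^* \,\bigm|\, |w_{k-1}\cdot x| \le \gamma_{k-1}\bigr] \\
 &\le \frac{1}{2^{k+1}} + \frac{1}{2^{k+1}} \;=\; \frac{1}{2^k},
\end{align*}

where the first 1/2^{k+1} uses Fact 1 and the choice of γ_{k-1}, and the second follows from Fact 2 (band mass Θ(2^{-k})) together with the m_k labels drawn inside the band driving the conditional error below a suitable constant.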

Realizable Case, a Suboptimal Alternative
One could instead make the band wide enough that every hypothesis consistent with W(k-1) agrees with w* outside it, so the error outside the band is zero. This needs γ_{k-1} = Θ(2^{-k}), a factor √d wider than above; the band then has probability mass Θ(2^{-k}√d), so roughly a factor of √d more labels are needed per round to find a hypothesis with error ε. This is similar in spirit to [CAL '92, BBL '06, H '07], and is suboptimal.

Margin-Based Active Learning, Non-realizable Case
Guarantee: Assume P_X is uniform over S^d, w* is the Bayes classifier, and the noise is bounded: |P(Y=1|x) − P(Y=−1|x)| ≥ λ for all x. Then the previous algorithm and proof extend naturally, and we again get an exponential improvement.
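The bounded-noise condition above is easy to simulate; this drop-in variant of the earlier LabelOracle (illustrative, not from the paper) flips each label with probability η = (1 − λ)/2, the largest flip rate the condition allows:

import numpy as np

class BoundedNoiseOracle:
    """Returns sign(w* . x) flipped with probability eta = (1 - lam)/2,
    so that |P(Y=1|x) - P(Y=-1|x)| = 1 - 2*eta = lam for all x."""
    def __init__(self, w_star, lam, seed=0):
        self.w_star = w_star
        self.eta = (1.0 - lam) / 2.0
        self.rng = np.random.default_rng(seed)
        self.requests = 0
    def label(self, x):
        self.requests += 1
        y = np.sign(self.w_star @ x)
        return -y if self.rng.uniform() < self.eta else y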

Summary
We analyzed a class of margin-based active learning algorithms for learning linear separators.
Open Problems
Characterize the right sample-complexity terms for the active learning setting.
Analyze a wider class of distributions, e.g., log-concave distributions.

Also, special thanks to: Alina Beygelzimer, Sanjoy Dasgupta, and John Langford for useful discussions.