Margin-Based Active Learning. Maria-Florina Balcan, Carnegie Mellon University. Joint work with Andrei Broder & Tong Zhang, Yahoo! Research.




1 Margin-Based Active Learning. Maria-Florina Balcan, Carnegie Mellon University. Joint work with Andrei Broder & Tong Zhang, Yahoo! Research.

2 Incorporating Unlabeled Data in the Learning Process. Unlabeled data is cheap and easy to obtain: web page and document classification, OCR, image classification, and all the classification problems at Yahoo! Research. Labeled data is much more expensive.

3 Semi-Supervised Passive Learning. Several SSL methods have been developed to use unlabeled data to improve performance, e.g. Transductive SVM [Joachims '98], Co-training [Blum & Mitchell '98], and graph-based methods [Blum & Chawla '01]; see Avrim's talk at the "Open Problems" session. Unlabeled data allows us to focus on a priori reasonable classifiers.

4 Active Learning. Setting: P is a distribution over X × Y; hypothesis class C. The learner gets a set of unlabeled examples drawn from P_X and can interactively request labels for any of these examples; it works harder in order to use fewer labeled examples. Goal: find h with small error over P while minimizing the number of label requests. This talk: linear separators.

5 Can Adaptive Querying Help? [CAL '92, Dasgupta '04] C = {linear separators in R^1} (thresholds), realizable case: learning to accuracy ε passively requires about 1/ε labels, but in the active setting O(log 1/ε) labels suffice to find an ε-accurate threshold via binary search, an exponential improvement in sample complexity. In general, the number of queries needed depends on C and P: for C = {linear separators in R^2}, for some target hypotheses no improvement can be achieved. (Figure: thresholds h_0, h_1, h_2, h_3 on the line.)
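The binary-search argument for thresholds is easy to make concrete. The sketch below is illustrative (the function name and interface are mine, not from the talk): it brackets the threshold with one label query per halving, so roughly log2(n) queries for n unlabeled points, versus the ~1/ε labels a passive learner would need.

```python
import numpy as np

def active_learn_threshold(xs, label):
    """Active learning of a 1-D threshold by binary search.
    `xs` are unlabeled points in [0, 1]; `label` is the oracle
    (-1 below the threshold, +1 above).  Uses O(log n) label
    queries.  (Illustrative sketch, not from the paper.)"""
    xs = np.sort(xs)
    if label(xs[0]) == 1:          # threshold lies below every point
        return xs[0], 1
    lo, hi = 0, len(xs) - 1
    queries = 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if label(xs[mid]) == -1:   # threshold is to the right of xs[mid]
            lo = mid
        else:                      # threshold is to the left of xs[mid]
            hi = mid
    # return the midpoint of the bracketing gap and the query count
    return (xs[lo] + xs[hi]) / 2, queries
```

With 10,000 unlabeled points this issues about 15 label queries, while a passive learner would need on the order of 1/ε labels to reach the same accuracy.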

6 When Active Learning Helps. In general, the number of queries needed depends on C and P. For C = homogeneous linear separators in R^d and P_X uniform over the unit sphere: Realizable case [Freund et al. '97; Dasgupta, Kalai, Monteleoni '05]: O(d log 1/ε) labels to find a hypothesis with error ε. Agnostic case: the A^2 algorithm [Balcan, Beygelzimer, Langford '06]; with low noise, O(d^2 log 1/ε) labels to find a hypothesis with error ε [Hanneke '07].

7 An Overview of Our Results. We analyze a class of margin-based active learning algorithms for learning linear separators. For C = homogeneous linear separators in R^d and P_X uniform over the unit sphere, we get an exponential improvement in the realizable case. The analysis extends naturally to the bounded-noise setting, and we obtain dimension-independent bounds when we have a good margin distribution.

8 Margin-Based Active Learning, Realizable Case. Algorithm: Draw m_1 unlabeled examples, label them, and add them to W(1). Iterate k = 2, ..., s: find a hypothesis w_{k-1} consistent with W(k-1); set W(k) = W(k-1); sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1}; label them and add them to W(k).

9 Margin-Based Active Learning, Realizable Case (illustrated). The same algorithm, shown pictorially: successive hypotheses w_1, w_2, w_3, each with a shrinking margin band of width γ_1, γ_2 around its decision boundary from which the next round's labels are requested.
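The algorithm above can be sketched in code. This is an illustrative implementation under my own choices (a perceptron as the consistent learner, rejection sampling for the margin band, and a hand-picked margin schedule); it does not reproduce the paper's exact parameter settings.

```python
import numpy as np

def perceptron_consistent(X, y, max_epochs=1000):
    """Find a homogeneous linear separator (nearly) consistent with
    (X, y) by running the perceptron until it makes no mistakes,
    capped at max_epochs passes.  Assumes the data are separable."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
                mistakes += 1
        if mistakes == 0:
            break
    return w / np.linalg.norm(w)

def margin_based_active(sample_x, label, s, m, gammas):
    """Margin-based active learning, realizable case, following the
    slide's pseudocode.  sample_x(n) draws n unlabeled points and
    label(x) is the oracle; at round k we only request labels inside
    the band |w_{k-1} . x| <= gamma_{k-1}.  The schedule `gammas`
    is an input here; the paper shrinks it geometrically."""
    X = sample_x(m)                          # W(1): m_1 labeled points
    y = np.array([label(x) for x in X])
    W_X, W_y = X, y
    w = perceptron_consistent(W_X, W_y)
    for k in range(2, s + 1):
        band = []
        while len(band) < m:                 # rejection-sample the band
            for x in sample_x(m):
                if abs(w @ x) <= gammas[k - 2]:
                    band.append(x)
        band = np.array(band[:m])
        W_X = np.vstack([W_X, band])         # W(k) = W(k-1) + new labels
        W_y = np.concatenate([W_y, [label(x) for x in band]])
        w = perceptron_consistent(W_X, W_y)
    return w
```

Rejection sampling is the simplest way to draw from the band; because the band shrinks each round, the labels spent there are concentrated where the current hypothesis is still uncertain.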

10 Margin-Based Active Learning, Realizable Case. Theorem: if P_X is uniform over S^d, then for suitable choices of the sample sizes m_k and margins γ_k, after s iterations w_s has error ≤ ε. The proof uses two facts about unit vectors u, v under the uniform distribution: Fact 1, the error of v with respect to target u equals θ(u, v)/π, where θ(u, v) is the angle between them; Fact 2, a bound on the probability mass of a margin band |v · x| ≤ γ. (Figure: unit vectors u, v at angle θ(u, v).)
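Fact 1, that the disagreement between homogeneous separators u and v under the uniform distribution on the sphere is θ(u, v)/π, is easy to check numerically (an illustrative sketch, not from the talk):

```python
import numpy as np

# Fact 1: for unit vectors u, v and x uniform on the sphere,
# Pr[sign(u . x) != sign(v . x)] = theta(u, v) / pi.
rng = np.random.default_rng(0)
d, theta = 5, 0.5
u = np.zeros(d); u[0] = 1.0
v = np.zeros(d); v[0], v[1] = np.cos(theta), np.sin(theta)

X = rng.standard_normal((200_000, d))        # uniform on the sphere
X /= np.linalg.norm(X, axis=1, keepdims=True)
disagree = np.mean(np.sign(X @ u) != np.sign(X @ v))
print(disagree, theta / np.pi)               # both approximately 0.159
```

The fact holds in any dimension: disagreement happens exactly in the wedge between the two hyperplanes, whose measure is θ/π.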

11 Margin-Based Active Learning, Realizable Case: Proof Idea. Induction: all separators w consistent with W(k) have error ≤ 1/2^k; in particular, w_k has error ≤ 1/2^k. For the inductive step, the margin γ_{k-1} is chosen so that every w consistent with W(k-1) disagrees with w* outside the band |w_{k-1} · x| ≤ γ_{k-1} with probability at most 1/2^{k+1}. (Figure: w*, w, and w_{k-1} with its margin band γ_{k-1}.)

12 Proof Idea (continued). Under the uniform distribution, this 1/2^{k+1} bound on the disagreement outside the band holds for a margin γ_{k-1} on the order of 2^{-k}/√d.

13 Proof Idea (concluded). It is then enough to ensure that w_k has small constant error on the conditional distribution inside the band, so that its contribution to the total error is at most 1/2^{k+1}; this can be done with a number of labels per round that does not grow as the target error shrinks.
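The induction can be summarized by an error decomposition (reconstructed from the slides; the splitting is standard, but the constants here are schematic rather than the paper's exact ones):

```latex
\begin{aligned}
\operatorname{err}(w_k)
  &= \Pr\bigl[w_k \neq w^*,\ |w_{k-1}\cdot x| \le \gamma_{k-1}\bigr]
   + \Pr\bigl[w_k \neq w^*,\ |w_{k-1}\cdot x| > \gamma_{k-1}\bigr] \\
  &\le \Pr\bigl[\,|w_{k-1}\cdot x| \le \gamma_{k-1}\bigr]\,
       \Pr\bigl[w_k \neq w^* \,\bigm|\, |w_{k-1}\cdot x| \le \gamma_{k-1}\bigr]
   + \tfrac{1}{2^{k+1}}.
\end{aligned}
```

Making each term at most 2^{-(k+1)} (the second by the choice of γ_{k-1}, the first by requesting enough labels inside the band) gives err(w_k) ≤ 2^{-k}, which closes the induction.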

14 Realizable Case, a Suboptimal Alternative. One could instead choose the margin so that the probability of disagreement outside the band is zero, i.e. every hypothesis consistent with W(k-1) agrees with w* outside the band. This forces a wider band and hence more labels per round to find a hypothesis with error ε, similar to the analyses of [CAL '92, BBL '06, H '07], and is suboptimal compared with the 1/2^{k+1} choice above.

15 Margin-Based Active Learning, Non-realizable Case. Guarantee: assume P_X is uniform over S^d, w* is the Bayes classifier, and |P(Y=1|x) - P(Y=-1|x)| ≥ β for all x (bounded noise). Then the previous algorithm and proof extend naturally, and we again get an exponential improvement.
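A labeling oracle satisfying this bounded-noise condition can be simulated as follows (an illustrative sketch; the function and parameter names are mine, not the paper's):

```python
import numpy as np

def noisy_label(x, w_star, beta, rng):
    """Draw Y with P(Y = sign(w_star . x) | x) = (1 + beta) / 2,
    so |P(Y=1|x) - P(Y=-1|x)| = beta for every x: the slide's
    bounded-noise condition, with the gap tight everywhere."""
    clean = 1 if x @ w_star > 0 else -1          # Bayes-optimal label
    return clean if rng.random() < (1 + beta) / 2 else -clean
```

Here the Bayes classifier is exactly w_star, and β controls how far the label distribution stays from a fair coin at every point.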


17 Summary. We analyzed a class of margin-based active learning algorithms for learning linear separators. Open problems: characterize the right sample-complexity terms for the active learning setting; analyze a wider class of distributions, e.g. log-concave.


19 Also, special thanks to: Alina Beygelzimer, Sanjoy Dasgupta, and John Langford for useful discussions.

