A Theoretical Model for Learning from Labeled and Unlabeled Data. Maria-Florina Balcan & Avrim Blum, Carnegie Mellon University, Computer Science Department.

What is Machine Learning? The design of programs that adapt from experience and identify patterns in data. Used to: recognize speech, faces, …; categorize documents, information retrieval, … Goals of ML theory: develop models and analyze the algorithmic and statistical issues involved.

Outline of the talk: Brief Overview of Supervised Learning (the PAC model); Semi-Supervised Learning (an augmented PAC-style model).

Usual Supervised Learning Problem. Decide which messages are spam and which are important. We might represent each message by n features (e.g., keywords, spelling, etc.). Take a sample S of data, labeled according to whether each message was or wasn't spam. The goal of the algorithm is to use the data seen so far to produce a good prediction rule h (a "hypothesis") for future data.

The Concept Learning Setting. E.g., given data (a table of examples and their labels), some reasonable rules might be: Predict SPAM if unknown AND (money OR pills); Predict SPAM if (money + pills − known) > 0; …

Supervised Learning, Big Questions. Algorithm design: how do we optimize? How might we automatically generate rules that do well on observed data? Sample complexity / confidence bounds: the real goal is to do well on new data. What confidence do we have that rules that do well on the sample will do well in the future? For a given learning algorithm, how much data do we need?

Supervised Learning: Formalization (PAC). The PAC model is the standard model for learning from labeled data. We have a sample S = {(x, l)} drawn from some distribution D over examples x ∈ X, labeled by some target function c*. The algorithm does optimization over S to produce some hypothesis h ∈ C (e.g., C = linear separators). The goal is for h to be close to c* over D: err(h) = Pr_{x ∼ D}[h(x) ≠ c*(x)]. We allow failure with small probability δ (to allow for the chance that S is not representative).
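The true error err(h) above is defined over the distribution D; on a finite sample it is estimated by the empirical error. A minimal sketch (the hypotheses h and c_star below are illustrative stand-ins, not from the talk):

```python
def empirical_error(h, c_star, sample):
    """Fraction of sample points where hypothesis h disagrees with target c_star:
    the empirical counterpart of err(h) = Pr_{x ~ D}[h(x) != c_star(x)]."""
    return sum(1 for x in sample if h(x) != c_star(x)) / len(sample)

# Hypothetical example: h and c_star disagree only at x = 0.
h = lambda x: x >= 0
c_star = lambda x: x > 0
rate = empirical_error(h, c_star, [-1.0, 0.0, 1.0])  # disagrees on 1 of 3 points
```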

The Issue of Sample Complexity. We want to do well on D, but all we have is S. Are we in trouble? How big does S have to be so that low error on S implies low error on D? Luckily, we have sample-complexity bounds. Algorithm: pick a concept that agrees with S. Sample Complexity Statement: if |S| ≥ (1/ε)[ln|C| + ln(1/δ)], then with probability at least 1 − δ, all h ∈ C that agree with the sample S have true error ≤ ε.
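The bound above is easy to evaluate numerically; a small sketch (the class size, ε, and δ below are made-up numbers for illustration):

```python
import math

def pac_sample_size(class_size, epsilon, delta):
    """Labeled examples sufficient so that, with probability >= 1 - delta,
    every h in C consistent with the sample has true error <= epsilon:
    m >= (1/epsilon) * (ln|C| + ln(1/delta))."""
    return math.ceil((1.0 / epsilon) * (math.log(class_size) + math.log(1.0 / delta)))

# E.g., a finite class of 2**20 hypotheses, epsilon = 0.1, delta = 0.05:
m = pac_sample_size(2**20, 0.1, 0.05)
```

Note the logarithmic dependence on |C|: even squaring the class size only doubles the ln|C| term.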

Outline of the talk: Brief Overview of Supervised Learning (the PAC model); Semi-Supervised Learning (an augmented PAC-style model).

Combining Labeled and Unlabeled Data. A hot topic in recent years in machine learning. Many applications have lots of unlabeled data, but labeled data is rare or expensive: web page and document classification; OCR; image classification. Several methods have been developed to try to use unlabeled data to improve performance, e.g.: transductive SVM, co-training, and graph-based methods.

An Augmented PAC-style Model for Semi-Supervised Learning. Extends PAC naturally to the case of learning from both labeled and unlabeled data. Unlabeled data is useful if we have beliefs not only about the form of the target, but also about its relationship with the underlying distribution. Different algorithms are based on different assumptions about how the data should behave. Question: how can we capture many of the assumptions typically used?

Example of a "typical" assumption: the separator goes through low-density regions of the space (large margin). Assume we are looking for a linear separator; the belief is that there should exist one with large separation. [Figure: three panels, "Labeled data only", "Transductive SVM", and "SVM", each showing + and − points with a separator.]

Another Example: agreement between two parts (co-training). Examples contain two sufficient sets of features, i.e., an example is x = ⟨x1, x2⟩. Belief: the two parts of the example are consistent, i.e., ∃ c1, c2 such that c1(x1) = c2(x2) = c*(x). For example, if we want to classify web pages: x1 is the link info, x2 is the text info, and x = ⟨x1, x2⟩. [Figure: a "My Advisor / Prof. Avrim Blum" web page shown as its link view x1 and its text view x2.]

Co-Training works by using unlabeled data to propagate learned information. [Figure: an example with link view X1 and text view X2 ("My Advisor"), labeled +.]
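The propagation idea can be sketched in code. This is a toy version under strong simplifying assumptions (each view of an example is a single number, the label is its sign, and "confidence" is magnitude); it is not the exact Blum-Mitchell procedure, just an illustration of each view labeling confident examples for the shared pool:

```python
# Toy co-training sketch (illustrative assumptions, see lead-in above).

def train_threshold(data):
    """Fit h(v) = +1 if v >= t, else -1, on labeled (value, label) pairs."""
    pos = [v for v, y in data if y == +1]
    neg = [v for v, y in data if y == -1]
    t = (min(pos) + max(neg)) / 2 if pos and neg else 0.0
    return lambda v, t=t: +1 if v >= t else -1

def cotrain(labeled, unlabeled, rounds=5, k=2):
    """labeled: list of ((x1, x2), y); unlabeled: list of (x1, x2)."""
    L, U = list(labeled), list(unlabeled)
    for _ in range(rounds):
        h1 = train_threshold([(x1, y) for (x1, x2), y in L])
        h2 = train_threshold([(x2, y) for (x1, x2), y in L])
        # Each view labels the k unlabeled examples it is most confident
        # about and adds them to the shared labeled pool.
        U.sort(key=lambda x: -abs(x[0]))
        L += [(x, h1(x[0])) for x in U[:k]]
        U = U[k:]
        U.sort(key=lambda x: -abs(x[1]))
        L += [(x, h2(x[1])) for x in U[:k]]
        U = U[k:]
        if not U:
            break
    h1 = train_threshold([(x1, y) for (x1, x2), y in L])
    h2 = train_threshold([(x2, y) for (x1, x2), y in L])
    return h1, h2
```

On two-view data where each view alone determines the label, a couple of labeled examples plus an unlabeled pool suffice for both views to converge on good thresholds.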

Semi-Supervised Learning Formalization: Main Idea. Augment the notion of a concept class C with a notion of compatibility χ between a concept and the data distribution (χ(h, D) ∈ [0, 1]). "Learn C" becomes "learn (C, χ)" (i.e., learn class C under compatibility notion χ). Compatibility expresses the relationships that one hopes the target function and underlying distribution will possess. Use unlabeled data and the belief that the target is compatible to reduce C down to just the highly compatible functions in C.

Semi-Supervised Learning Formalization: Main Idea, cont. Use unlabeled data and our belief to reduce size(C) down to size(highly compatible functions in C) in our sample complexity bounds. We need to analyze how much unlabeled data is needed to uniformly estimate compatibilities well, and we require that the degree of compatibility be something that can be estimated from a finite sample.

Margins and Compatibility. Margins: the belief is that there should exist a separator with margin γ. Define χ(h, D) = 1 − (the probability mass within distance γ of h); this can be estimated from a finite sample. [Figure: a separator labeled "Highly compatible", with the + and − points lying well outside its margin.]
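This margin notion of compatibility is exactly the kind of quantity one can estimate from unlabeled data alone. A small sketch for a linear separator in the plane (the separator, margin γ, and sample points below are illustrative, not from the talk):

```python
import math

def margin_compat(w, b, gamma, unlabeled):
    """Estimate chi(h, D) for h(x) = sign(w.x + b):
    1 - (fraction of unlabeled points within distance gamma of the separator)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    close = sum(1 for x in unlabeled
                if abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm < gamma)
    return 1.0 - close / len(unlabeled)

# Vertical separator x1 = 0 in the plane, margin 0.5:
chi_hat = margin_compat((1.0, 0.0), 0.0, 0.5, [(1, 0), (0.2, 0), (-2, 1), (-0.1, 3)])
```

A transductive-SVM-style search would then prefer separators whose estimated compatibility is close to 1.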

Types of Results in Our Model. As in the usual PAC model, we can discuss algorithmic and sample complexity issues. We can analyze how much unlabeled data we need to see: this depends both on the complexity of C and on the complexity of our notion of compatibility. We can also analyze the ability of a finite unlabeled sample to reduce our dependence on labeled examples, as a function of the compatibility of the target function and various measures of the helpfulness of the distribution.

Examples of Results in Our Model. Algorithm: pick a compatible concept that agrees with the labeled sample. Sample Complexity Statement (analogous to the earlier supervised bound, with C replaced by its highly compatible subset): if the labeled sample S satisfies |S| ≥ (1/ε)[ln|C_{D,χ}(ε)| + ln(1/δ)], where C_{D,χ}(ε) denotes the set of highly compatible functions in C, then with probability at least 1 − δ, all compatible h ∈ C that agree with S have true error ≤ ε. [Figure: a highly compatible separator consistent with the labeled + and − points.]
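Plugging numbers into this statement shows where the savings come from; a hedged illustration (the class sizes below are invented, and the reduction from |C| to the compatible subset is exactly what the unlabeled data buys):

```python
import math

# The supervised bound pays ln|C|; the semi-supervised bound replaces C with
# the subset of highly compatible functions singled out by unlabeled data.

def labeled_size(class_size, epsilon, delta):
    return math.ceil((1.0 / epsilon) * (math.log(class_size) + math.log(1.0 / delta)))

full = labeled_size(2**30, 0.1, 0.05)     # all of C
reduced = labeled_size(2**8, 0.1, 0.05)   # highly compatible subset only
```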

Summary. We provided a PAC-style model for semi-supervised learning. It captures many of the ways in which unlabeled data is typically used, and gives a unified framework for analyzing when and why unlabeled data can help. In this model one can get much better bounds in terms of labeled examples.
