Computational Learning Theory


In the Name of God
Machine Learning: Computational Learning Theory
Mohammad Ali Keyvanrad
Thanks to: M. Soleymani (Sharif University of Technology) and Tom Mitchell (Carnegie Mellon University)
Fall 1392

Outline
- Computational Learning Theory
- PAC learning theorem
- VC dimension

Computational Learning Theory
We want a theory that relates:
- the number of training examples
- the complexity of the hypothesis space
- the accuracy to which the target function is approximated
- the probability that the learner outputs a successful hypothesis

Learning scenarios
- Learner proposes instances as queries to the teacher: the learner proposes x, the teacher provides c(x).
- Teacher (who knows c(x)) proposes training examples: the teacher proposes a sequence {(x_1, c(x_1)), ..., (x_n, c(x_n))}.
- Instances are drawn at random according to P(x), and the teacher provides the noise-free label c(x).

Sample Complexity
- How good is the classifier, really?
- How much data do I need to make it "good enough"?

Problem settings
- Set of all instances X
- Set of hypotheses H
- Set of possible target functions C = {c : X → {0,1}}
- Sequence of m training instances D = {(x_i, c(x_i))}_{i=1}^m, where each x is drawn at random from an unknown distribution P(x) and the teacher provides the noise-free label c(x) for it
- The learner observes the set of training examples D for target function c and outputs a hypothesis h ∈ H estimating c
- Goal: with high probability ("probably"), the selected hypothesis has low generalization error ("approximately correct")
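To make the setting concrete, here is a minimal, purely illustrative sketch (not from the slides): a finite hypothesis space of 1-D threshold classifiers, instances drawn i.i.d. from P(x), noise-free labels c(x), and a learner that outputs any hypothesis consistent with D. All names here (the threshold grid, predict, consistent_learner) are assumptions made for this example.

import random

random.seed(0)

# Finite hypothesis space H: threshold classifiers h_t(x) = 1 if x >= t.
H = [t / 100 for t in range(101)]

def predict(t, x):
    return 1 if x >= t else 0

# Unknown target concept c (itself a threshold here, so a consistent h exists).
c = lambda x: predict(0.37, x)

# Draw m i.i.d. instances from P(x) = Uniform[0, 1]; the teacher labels them with c.
m = 50
D = [(x, c(x)) for x in (random.random() for _ in range(m))]

# Consistent learner: return any h in H with zero training error.
def consistent_learner(D):
    for t in H:
        if all(predict(t, x) == y for x, y in D):
            return t
    return None

print("learned threshold:", consistent_learner(D))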

True error of a hypothesis
True error of h: the probability that h misclassifies an example drawn at random from P(x):
error_true(h) = Pr_{x ~ P(x)}[c(x) ≠ h(x)]

Two notions of error
- Training error of h: the fraction of the training examples in D that h misclassifies.
- True error of h: the probability that h misclassifies an example drawn at random from P(x).

Overfitting
Consider a hypothesis h and its
- error rate over the training data: error_train(h)
- true error rate over all data: error_true(h)
We say h overfits the training data if error_true(h) > error_train(h).
Amount of overfitting: error_true(h) − error_train(h).
Can we bound error_true(h) in terms of error_train(h)?
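A small, purely illustrative estimate of the two error notions above (the target, distribution, and interval hypothesis are all made up for this sketch): error_train(h) is measured exactly on D, error_true(h) is approximated by Monte Carlo sampling from P(x), and their difference is the amount of overfitting.

import random

random.seed(1)

c = lambda x: 1 if x >= 0.37 else 0   # unknown target concept
P = lambda: random.random()           # P(x) = Uniform[0, 1]

D = [(x, c(x)) for x in (P() for _ in range(20))]

# An overly specific hypothesis: predict 1 only on the tightest interval
# covering the positive training points (a caricature of overfitting).
pos = [x for x, y in D if y == 1]
lo, hi = min(pos), max(pos)
h = lambda x: 1 if lo <= x <= hi else 0

error_train = sum(h(x) != y for x, y in D) / len(D)
error_true = sum(h(x) != c(x) for x in (P() for _ in range(100000))) / 100000

print("error_train(h) =", error_train)          # 0.0 by construction
print("error_true(h) ~", round(error_true, 3))  # > 0: h overfits
print("overfitting   ~", round(error_true - error_train, 3))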

Problem setting
- Classification
- D: m i.i.d. labeled data points
- A finite number of possible hypotheses (e.g., decision trees of depth d_0)
- The learner finds a hypothesis h that is consistent with the training data: error_train(h) = 0
- What is the probability that the true error of h is more than ε, i.e., error_true(h) ≥ ε?

How likely is a learner to pick a bad hypothesis?
Bound on the probability that any consistent learner will output h with error_true(h) > ε.
Theorem [Haussler, 1988]: For target concept c and any 0 ≤ ε ≤ 1, if H is finite and D contains m ≥ 1 independent random examples, then
Pr[∃h ∈ H : error_train(h) = 0 and error_true(h) > ε] ≤ |H| e^(−εm)
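A hedged numerical companion to the theorem above: the probability that some hypothesis with zero training error has true error above ε is at most |H|·e^(−εm). The example values of |H|, m, and ε are arbitrary.

import math

def haussler_bound(H_size, m, epsilon):
    """Upper bound on P[some consistent h has error_true(h) > epsilon]."""
    return H_size * math.exp(-epsilon * m)

# e.g., |H| = 3**10, m = 500 examples, epsilon = 0.05:
print(haussler_bound(3**10, m=500, epsilon=0.05))   # about 8e-07
# Note the bound can exceed 1 (and is then vacuous) when m is small.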

Proof
Consider a single "bad" hypothesis h with error_true(h) > ε. The probability that h correctly classifies one random example is at most 1 − ε, so the probability that h is consistent with all m independent examples is at most (1 − ε)^m.

Proof (Cont'd)
There are at most |H| such bad hypotheses, so by the union bound the probability that some bad hypothesis is consistent with D is at most |H|(1 − ε)^m. Using 1 − ε ≤ e^(−ε), this is at most |H| e^(−εm). ∎

PAC Bound
Theorem [Haussler, 1988]: Consider a finite hypothesis space H, a training set D with m i.i.d. samples, and 0 < ε < 1. For any learned hypothesis h ∈ H that is consistent on the training set D, i.e., error_train(h) = 0, with probability at least 1 − δ:
error_true(h) ≤ (1/m) (ln|H| + ln(1/δ))
PAC: Probably Approximately Correct

PAC Bound and Sample Complexity
How many training examples suffice?
- Given ε and δ, the sample complexity is m ≥ (1/ε) (ln|H| + ln(1/δ)).
- Given m and δ, the error bound is error_true(h) ≤ (1/m) (ln|H| + ln(1/δ)).
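A small helper for the bounds above (illustrative settings only): given ε and δ it returns a sufficient number of examples m, and given m and δ it returns the guaranteed error bound, both for consistent learners over a finite H.

import math

def pac_sample_complexity(H_size, epsilon, delta):
    """m >= (1/epsilon) * (ln|H| + ln(1/delta)), rounded up."""
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / epsilon)

def pac_error_bound(H_size, m, delta):
    """epsilon = (1/m) * (ln|H| + ln(1/delta)) for consistent hypotheses."""
    return (math.log(H_size) + math.log(1.0 / delta)) / m

print(pac_sample_complexity(H_size=2**20, epsilon=0.05, delta=0.01))  # 370
print(pac_error_bound(H_size=2**20, m=1000, delta=0.01))              # about 0.018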

Example: Conjunctions of up to d Boolean literals
Each of the d variables can appear positively, appear negated, or be absent from the conjunction, so |H| = 3^d. Plugging into the sample-complexity bound gives m ≥ (1/ε) (d ln 3 + ln(1/δ)), i.e., the number of required examples grows only linearly in d.
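Assuming the |H| = 3^d count above, this sketch computes the resulting sample complexity for a few values of d; the ε and δ settings are arbitrary.

import math

def conjunction_sample_complexity(d, epsilon, delta):
    """m >= (1/epsilon) * (d*ln(3) + ln(1/delta)), rounded up."""
    return math.ceil((d * math.log(3) + math.log(1.0 / delta)) / epsilon)

for d in (10, 20, 50):
    print(d, conjunction_sample_complexity(d, epsilon=0.1, delta=0.05))
# prints 140, 250, 580: the requirement grows only linearly in d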

Agnostic learning
What if the target concept c is not contained in H, or the labels are noisy? Then no hypothesis with zero training error may exist. In the agnostic setting the learner simply returns the hypothesis with the smallest training error, and we instead bound how far error_true(h) can exceed error_train(h).

Hoeffding bounds: Agnostic learning
For a single fixed hypothesis h, the Hoeffding bound gives
Pr[error_true(h) > error_train(h) + ε] ≤ e^(−2mε²)

PAC bound: Agnostic learning
Applying the union bound over all h ∈ H: with probability at least 1 − δ, every h ∈ H satisfies
error_true(h) ≤ error_train(h) + sqrt((ln|H| + ln(1/δ)) / (2m))
Equivalently, the sample complexity is m ≥ (1/(2ε²)) (ln|H| + ln(1/δ)).
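A hedged helper for the agnostic bounds above (example numbers are arbitrary): the generalization gap guaranteed by the Hoeffding/union-bound argument, and the corresponding sample complexity.

import math

def agnostic_gap(H_size, m, delta):
    """With prob. >= 1-delta, error_true(h) <= error_train(h) + this gap, for all h in H."""
    return math.sqrt((math.log(H_size) + math.log(1.0 / delta)) / (2 * m))

def agnostic_sample_complexity(H_size, epsilon, delta):
    """m >= (1/(2*epsilon**2)) * (ln|H| + ln(1/delta)), rounded up."""
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / (2 * epsilon ** 2))

print(agnostic_gap(H_size=2**20, m=10000, delta=0.01))              # about 0.03
print(agnostic_sample_complexity(2**20, epsilon=0.05, delta=0.01))  # 3694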

Limitation of the bounds
The bounds above depend on ln|H| and therefore apply only to finite hypothesis spaces; for infinite spaces (e.g., linear classifiers with real-valued weights) we need a different measure of the complexity of H. This is what the VC dimension provides.

Shattering a set of instances
A set of instances S is shattered by hypothesis space H if and only if for every possible dichotomy (labeling) of S there exists some hypothesis in H consistent with that dichotomy, i.e., H realizes all 2^|S| labelings of S.
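An illustrative brute-force check of the definition (the hypothesis class, grid, and points are assumptions for this example): 1-D threshold classifiers shatter any single point but cannot shatter two points, since the labeling "left point positive, right point negative" is unrealizable.

def realizable_labelings(points, hypotheses, predict):
    return {tuple(predict(h, x) for x in points) for h in hypotheses}

def is_shattered(points, hypotheses, predict):
    return len(realizable_labelings(points, hypotheses, predict)) == 2 ** len(points)

# 1-D thresholds h_t(x) = 1 if x >= t, with t on a grid covering [0, 1].
thresholds = [t / 100 for t in range(-10, 111)]
predict = lambda t, x: 1 if x >= t else 0

print(is_shattered([0.5], thresholds, predict))       # True: a single point is shattered
print(is_shattered([0.3, 0.7], thresholds, predict))  # False: labeling (1, 0) is impossible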

Vapnik-Chervonenkis (VC) dimension
The VC dimension of hypothesis space H defined over instance space X, VC(H), is the size of the largest finite subset of X shattered by H. If arbitrarily large finite subsets of X can be shattered, then VC(H) = ∞.

PAC bound using VC
For infinite hypothesis spaces, ln|H| is replaced by a term depending on the VC dimension. A sufficient sample complexity for PAC learning is
m ≥ (1/ε) (4 log2(2/δ) + 8 VC(H) log2(13/ε))
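A hedged helper for the VC-based sample-complexity bound quoted above; the ε, δ, and VC-dimension values below are arbitrary examples.

import math

def vc_sample_complexity(vc_dim, epsilon, delta):
    """m >= (1/epsilon) * (4*log2(2/delta) + 8*VC(H)*log2(13/epsilon)), rounded up."""
    return math.ceil((4 * math.log2(2 / delta)
                      + 8 * vc_dim * math.log2(13 / epsilon)) / epsilon)

# e.g., linear classifiers in 2-D have VC dimension 3 (see the next slides):
print(vc_sample_complexity(vc_dim=3, epsilon=0.1, delta=0.05))  # 1899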

VC dimension: linear classifier in a 2-D space
A linear classifier (a line) in 2-D can shatter some set of 3 points in general position, but no set of 4 points can be shattered (e.g., the XOR-style labeling of 4 points is not linearly separable). Hence the VC dimension of linear classifiers in 2-D is 3.

VC dimension: linear classifier
More generally, the VC dimension of linear classifiers (with a bias term) in d dimensions is d + 1.
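An illustrative brute-force confirmation of the 2-D case (the points, weight grid, and helper names are assumptions for this sketch): every labeling of three points in general position is realized by some linear classifier sign(w1*x + w2*y + b), so the three points are shattered.

from itertools import product

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
grid = [v / 2 for v in range(-6, 7)]  # candidate weights and bias in [-3, 3]

def separable(labeling):
    for w1, w2, b in product(grid, repeat=3):
        if all((1 if w1 * x + w2 * y + b >= 0 else 0) == lab
               for (x, y), lab in zip(points, labeling)):
            return True
    return False

print(all(separable(lab) for lab in product([0, 1], repeat=3)))  # True: shattered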

Summary of PAC bounds
- Consistent learner, finite H: error_true(h) ≤ (1/m) (ln|H| + ln(1/δ)) with probability at least 1 − δ.
- Agnostic learner, finite H: error_true(h) ≤ error_train(h) + sqrt((ln|H| + ln(1/δ)) / (2m)).
- Infinite H: replace ln|H| with a term depending on the VC dimension VC(H).