CS344: Introduction to Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 36-37: Foundation of Machine Learning.


An attempt at formalizing Machine Learning (landmark paper: L. G. Valiant, "A Theory of the Learnable", Communications of the ACM, 1984).

Learning has two phases: Training (loading) and Testing (generalization).

Training, in turn, involves Internalization and Hypothesis Production.

Hypothesis Production is governed by the Inductive Bias: in what form is the hypothesis to be produced?

Universe U; target concept c; hypothesis h. The error region is c Δ h, the set of points on which c and h disagree. We require P(c Δ h) ≤ ε, where ε is the accuracy parameter and P is the probability distribution over U.

P(x) = probability that x is generated by the teacher (the "oracle") and labelled: '+' for a positive example, '-' for a negative example.

Learning means the following should happen: Pr(P(c Δ h) ≤ ε) ≥ 1 - δ. This is the PAC model of learning: Probably Approximately Correct.
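Stated a little more fully (a standard restatement of Valiant's criterion, not verbatim from the slide): an algorithm PAC-learns the class C if, for every target concept c in C, every distribution P, and every ε, δ in (0, 1), after seeing a number of labelled examples polynomial in 1/ε and 1/δ it outputs a hypothesis h satisfying

```latex
% outer probability is over the random draw of the training examples from P
\Pr\big[\, P(c \,\Delta\, h) \le \epsilon \,\big] \;\ge\; 1 - \delta .
```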

An example. Universe: the 2-dimensional plane. The target concept c is an axis-parallel rectangle ABCD.

Key insights from 40 years of machine learning research: 1) What is being learnt, and in what form the hypothesis should be produced, must be specified in advance; this is called the Inductive Bias. 2) "Learning in a vacuum" is not possible: a learner already has crucial pieces of knowledge at its disposal.

(Figure: the target rectangle ABCD in the x-y plane.)

Algorithm: 1. Ignore the negative examples. 2. Output the closest-fitting axis-parallel rectangle around the positive examples.
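As a concrete illustration, here is a minimal Python sketch of this algorithm (the function name and data format are our own, not from the lecture): it discards the negative examples and returns the tightest axis-parallel rectangle around the positives.

```python
def tightest_rectangle(examples):
    """Learn an axis-parallel rectangle from labelled points.

    examples: list of ((x, y), label) pairs, with label '+' or '-'.
    Returns (xmin, xmax, ymin, ymax) of the closest-fitting rectangle
    around the positive points, or None if there are no positives.
    """
    positives = [p for p, label in examples if label == '+']   # step 1: ignore -ve examples
    if not positives:
        return None
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return (min(xs), max(xs), min(ys), max(ys))                # step 2: closest fit

# Hypothetical usage:
h = tightest_rectangle([((1, 1), '+'), ((2, 3), '+'), ((5, 5), '-')])
# h == (1, 2, 1, 3)
```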

Case 1: If P(□ABCD) < ε, the algorithm is already PAC: the learned rectangle h lies inside the true concept c = □ABCD, so the error region c Δ h = c \ h has probability at most P(□ABCD) < ε, and Pr(P(c Δ h) ≤ ε) ≥ 1 - δ holds trivially.

Case 2: P(□ABCD) > ε. Mark off four strips inside the rectangle, Top, Bottom, Left and Right, each extending inward from the corresponding side just far enough that P(Top) = P(Bottom) = P(Left) = P(Right) = ε/4.

Let the number of examples be m. Probability that a point comes from the Top strip = ε/4. Probability that none of the m examples comes from the Top strip = (1 - ε/4)^m.

By the union bound, the probability that some one of Top/Bottom/Left/Right receives none of the m examples is at most 4(1 - ε/4)^m. Hence the probability that at least one example falls in each of the 4 strips is at least 1 - 4(1 - ε/4)^m.

If every strip contains a (necessarily positive) example, the learned rectangle h reaches into all four strips, so the error region c \ h is contained in their union and has probability at most 4 · ε/4 = ε. This event must therefore have probability at least 1 - δ: 1 - 4(1 - ε/4)^m ≥ 1 - δ, i.e., 4(1 - ε/4)^m ≤ δ.

(Figure: the learned rectangle h inside the target rectangle ABCD, with positive examples in the strips.)

Since (1 - ε/4)^m ≤ e^(-εm/4), it suffices to have 4e^(-εm/4) ≤ δ, i.e., m ≥ (4/ε) ln(4/δ).
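Putting the last few slides together (a restatement in standard notation, using the inequality 1 - z ≤ e^(-z)):

```latex
4\left(1-\tfrac{\epsilon}{4}\right)^{m}
\;\le\; 4\,e^{-\epsilon m/4}
\;\le\; \delta
\qquad\text{whenever}\qquad
m \;\ge\; \frac{4}{\epsilon}\,\ln\frac{4}{\delta}.
```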

Let's say we want 10% error with 90% confidence (ε = 0.1, δ = 0.1): m ≥ (4/0.1) ln(4/0.1) = 40 ln 40, which is approximately 148 examples.
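The same arithmetic as a throwaway Python helper (the function name is ours):

```python
import math

def pac_sample_size(epsilon, delta):
    """Examples needed for the rectangle learner: m >= (4/eps) * ln(4/delta)."""
    return math.ceil((4 / epsilon) * math.log(4 / delta))

print(pac_sample_size(0.1, 0.1))   # 10% error, 90% confidence -> 148
```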

Criticisms of PAC learning: 1. The model produces too many negative results. 2. The constraint that learning must work for an arbitrary probability distribution is too restrictive.

In spite of these negative results, a great deal of learning takes place around us.

The VC-dimension gives a necessary and sufficient condition for PAC learnability.

Definition: Let C be a concept class, i.e., a family with members c1, c2, c3, … as concepts in it.

Let S be a subset of the universe U. If every subset of S can be produced by intersecting S with some concept ci in C, then we say C shatters S.

The highest-cardinality set S that can be shattered gives the VC-dimension of C: VC-dim(C) = |S|. (VC-dim: Vapnik-Chervonenkis dimension.)
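To make the definition concrete, here is a small brute-force shattering check for the axis-parallel-rectangle class from the earlier example (our own illustration; the slides that follow use half-planes instead). A subset T of S is realizable by a rectangle exactly when the bounding box of T contains no point of S \ T.

```python
from itertools import combinations

def in_box(p, pts):
    """True if point p lies in the axis-parallel bounding box of pts."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)

def shattered_by_rectangles(S):
    """Check whether the point set S is shattered by axis-parallel rectangles."""
    for r in range(1, len(S) + 1):          # the empty subset is always realizable
        for T in combinations(S, r):
            rest = [p for p in S if p not in T]
            if any(in_box(p, T) for p in rest):   # a point we must exclude is trapped
                return False
    return True                                   # every subset of S is realizable

# Four points in 'diamond' position can be shattered (VC-dim of rectangles is 4):
print(shattered_by_rectangles([(0, 1), (1, 0), (2, 1), (1, 2)]))            # True
# Adding a fifth point in the middle breaks shattering:
print(shattered_by_rectangles([(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]))    # False
```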

The 2-dimensional plane; concept class C = {half-planes}.

|S| = 1 can be shattered: S1 = {a}; realizable subsets: {a}, Ø.

|S| = 2 can be shattered: S2 = {a, b}; realizable subsets: {a, b}, {a}, {b}, Ø.

|S| = 3 can be shattered: S3 = {a, b, c} (all 8 subsets are realizable, provided the three points are not collinear).

|S| = 4 cannot be shattered: for S4 = {a, b, c, d}, some labelling cannot be realized by a half-plane. For example, with the four points at the corners of a rectangle ABCD, the labelling that marks one diagonal pair positive and the other negative is not achievable; in fact no set of 4 points in the plane can be shattered by half-planes, so VC-dim(half-planes) = 3.
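A brute-force check of the diagonal labelling for one concrete configuration (our own sketch, using scipy's linear-programming routine; the general claim that no 4-point set works follows from Radon's theorem). A labelling is realizable by a half-plane iff the feasibility LP below has a solution.

```python
import numpy as np
from scipy.optimize import linprog

def linearly_separable(points, labels):
    """Feasibility LP: find w1, w2, b with y * (w.x + b) >= 1 for every point."""
    # Constraint per point: -y * (w1*x1 + w2*x2 + b) <= -1.
    A_ub = np.array([[-y * x[0], -y * x[1], -y] for x, y in zip(points, labels)])
    b_ub = np.full(len(points), -1.0)
    res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3)
    return res.success            # infeasible LP -> labelling not realizable

square = [(0, 0), (1, 1), (0, 1), (1, 0)]
print(linearly_separable(square, [+1, +1, -1, -1]))   # False: the diagonal labelling fails
print(linearly_separable(square, [+1, -1, +1, -1]))   # True: separable by the line x = 0.5
```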

Fundamental Theorem of PAC learning (Ehrenfeucht et al., 1989): A concept class C is learnable for all probability distributions and all concepts in C if and only if the VC dimension of C is finite. If the VC dimension of C is d, then (next slide):

Fundamental theorem (contd.):
(a) For 0 < ε < 1 and sample size at least max[(4/ε) log(2/δ), (8d/ε) log(13/ε)], any consistent function A: Sc → C is a learning function for C.
(b) For 0 < ε < 1/2 and sample size less than max[((1 - ε)/ε) ln(1/δ), d(1 - 2(ε(1 - δ) + δ))], no function A: Sc → H, for any hypothesis space H, is a learning function for C.
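To get a feel for the magnitudes in part (a), a quick calculation sketch (our own; we take log here to be the natural logarithm, which only affects constants), using the rectangle class with VC dimension d = 4 and ε = δ = 0.1:

```python
import math

def fundamental_theorem_bound(epsilon, delta, d):
    """Sample size from part (a): max[(4/eps) log(2/delta), (8d/eps) log(13/eps)]."""
    return math.ceil(max((4 / epsilon) * math.log(2 / delta),
                         (8 * d / epsilon) * math.log(13 / epsilon)))

print(fundamental_theorem_bound(0.1, 0.1, d=4))   # roughly 1558 examples
```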

Book:
1. M. H. G. Anthony and N. Biggs, Computational Learning Theory, Cambridge Tracts in Theoretical Computer Science.

Papers:
1. L. G. Valiant, "A Theory of the Learnable", Communications of the ACM, 27(11), 1984.
2. A. Blumer, A. Ehrenfeucht, D. Haussler and M. Warmuth, "Learnability and the Vapnik-Chervonenkis Dimension", Journal of the ACM, 1989.