CS344 : Introduction to Artificial Intelligence


CS344: Introduction to Artificial Intelligence. Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Lecture 27: Theoretical Aspect of Learning

Relation between Computational Complexity and Learning

Learning consists of Training (Loading) and Testing (Generalization).

Training consists of Internalization and Hypothesis Production.

Hypothesis Production: in what form is the hypothesis produced? This choice is the Inductive Bias.

The universe U contains the target concept C and the learnt hypothesis h. Their symmetric difference C Δ h is the error region. Learning requires P(C Δ h) <= ε, where ε is the accuracy parameter and P is the probability distribution over U.

P(x) = probability that x is generated by the teacher (the "oracle") and labeled. <x, +>: positive example. <x, ->: negative example.

Learning means the following should happen: Pr(P(C Δ h) <= ε) >= 1 - δ. This is the PAC (Probably Approximately Correct) model of learning; ε is the accuracy parameter and δ is the confidence parameter.
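As a compact restatement of this condition (a minimal LaTeX rendering; the notation h_S for the hypothesis produced from a sample S of m i.i.d. examples is an added convention, not from the slide):

% "Approximately correct": the error P(C \Delta h_S) is at most \epsilon.
% "Probably": this holds with probability at least 1 - \delta over the draw of the sample S.
\Pr_{S \sim P^m}\left[ P(C \,\Delta\, h_S) \le \epsilon \right] \ge 1 - \delta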

Example: the universe is the 2-dimensional plane and the inductive bias is "axis-parallel rectangles". (Figure: positive and negative points in the plane with an axis-parallel rectangle ABCD.)

Key insight from 40 years of machine learning research: what is being learnt, and in what form the hypothesis should be produced, must be fixed in advance. This is a "MUST" and is called the Inductive Bias. "Learning in a vacuum" is not possible; a learner already has crucial pieces of knowledge at its disposal.

(Figure: training points labelled + and - on the x-y plane, together with the rectangle ABCD.)

Algorithm: 1. Ignore the negative examples. 2. Output the tightest-fitting axis-parallel rectangle around the positive examples.
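A minimal Python sketch of this learner, assuming the training data arrives as ((x, y), label) pairs; the function names and data format are illustrative, not from the slides.

def learn_axis_parallel_rectangle(examples):
    """examples: iterable of ((x, y), label) pairs with label '+' or '-'.
    Returns the tightest axis-parallel rectangle (xmin, xmax, ymin, ymax)
    around the positive examples, or None if there are no positives."""
    positives = [point for point, label in examples if label == '+']
    if not positives:
        return None                      # no positives: classify everything as negative
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rectangle, point):
    """Label a point '+' iff it lies inside the learnt rectangle."""
    if rectangle is None:
        return '-'
    xmin, xmax, ymin, ymax = rectangle
    x, y = point
    return '+' if (xmin <= x <= xmax and ymin <= y <= ymax) else '-'

Because the learnt rectangle is the tightest one around the positive points, it always lies inside the target rectangle ABCD, so the hypothesis can only err on points of ABCD that fall outside it; the two cases below analyse exactly that region.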

Case 1: If P(□ABCD) < ε, the algorithm is PAC: the error region C Δ h lies entirely inside the rectangle ABCD, so P(C Δ h) <= ε holds for every sample and Pr(P(C Δ h) <= ε) >= 1 - δ follows. (Figure: learnt rectangle h inside the target rectangle ABCD, with the training points.)

Case 2: P(□ABCD) > ε. Mark four strips, Top, Bottom, Left and Right, just inside the sides of ABCD, each carrying probability mass ε/4: P(Top) = P(Bottom) = P(Right) = P(Left) = ε/4. (Figure: rectangle ABCD with the four boundary strips labelled.)

Let the number of examples be m. The probability that a given example falls in Top is ε/4, so the probability that none of the m examples falls in Top is (1 - ε/4)^m.

By the union bound, the probability that at least one of Top/Bottom/Left/Right receives none of the m examples is at most 4(1 - ε/4)^m. Hence the probability that every one of the four strips receives at least one example is at least 1 - 4(1 - ε/4)^m.

If every strip receives an example, the learnt rectangle h misses at most the four strips of ABCD, so the error is at most 4 · (ε/4) = ε. This must happen with probability at least 1 - δ, i.e., we need 4(1 - ε/4)^m < δ.

(Figure: a sample with positive examples inside ABCD, one near each side, so the learnt rectangle nearly coincides with ABCD.)

Using (1 - ε/4)^m <= e^(-εm/4), it suffices to have 4e^(-εm/4) < δ, i.e., m > (4/ε) ln(4/δ).

Say we want 10% error with 90% confidence, i.e., ε = 0.1 and δ = 0.1. Then m > (4/0.1) ln(4/0.1) = 40 ln 40, which is approximately 148, so roughly 150 examples suffice.
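A quick numerical check of this bound (a small Python sketch; the function name is illustrative):

import math

def pac_sample_bound(epsilon, delta):
    """Smallest integer m with m > (4/epsilon) * ln(4/delta), the sample-size
    bound derived above for the axis-parallel rectangle learner."""
    return math.floor((4.0 / epsilon) * math.log(4.0 / delta)) + 1

print(pac_sample_bound(0.1, 0.1))   # 10% error with 90% confidence -> 148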

Criticisms of PAC learning: 1. The model produces too many negative results. 2. The constraint of allowing an arbitrary probability distribution is too restrictive.

In spite of these negative results, a great deal of learning takes place around us.

The VC-dimension gives a necessary and sufficient condition for PAC learnability.

Definition: Let C be a concept class, i.e., it has members c1, c2, c3, ... as concepts. (Figure: the class C containing concepts c1, c2, c3.)

Let S be a subset of U (the universe). If every subset of S can be produced by intersecting S with some ci, then we say that C shatters S.

The highest-cardinality set S that can be shattered gives the VC-dimension of C: VC-dim(C) = |S|. (VC-dim: Vapnik-Chervonenkis dimension.)
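For a finite universe and a finite concept class, shattering and the VC-dimension can be checked by brute force. A minimal Python sketch (the function names and the small interval example are illustrative, not from the slides):

from itertools import combinations

def shatters(concepts, S):
    """True iff every subset of S equals S ∩ c for some concept c.
    concepts: iterable of sets; S: iterable of universe elements."""
    S = frozenset(S)
    realised = {S & frozenset(c) for c in concepts}
    return len(realised) == 2 ** len(S)      # all 2^|S| subsets must appear

def vc_dimension(concepts, universe):
    """Largest cardinality of a subset of the finite universe shattered by the class."""
    items = list(universe)
    for k in range(len(items), 0, -1):
        if any(shatters(concepts, S) for S in combinations(items, k)):
            return k
    return 0                                 # the empty set is always shattered

# Example: intervals over the universe {1, 2, 3}; intervals have VC-dimension 2.
universe = {1, 2, 3}
intervals = [set(range(a, b + 1)) for a in universe for b in universe if a <= b] + [set()]
print(vc_dimension(intervals, universe))     # prints 2

Half-planes over the real plane form an infinite class, so this brute force does not apply directly, but the same definition is used in the slides below to show that their VC-dimension is 3.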

Example: the universe is the 2-dimensional plane and the concept class is C = {half-planes}.

S1 = {a}: both subsets {a} and Ø can be obtained by intersecting with a half-plane, so a set with |S| = 1 can be shattered.

S2 = {a, b}: all four subsets {a, b}, {a}, {b}, Ø can be obtained, so a set with |S| = 2 can be shattered.

S3 = {a, b, c}: all eight subsets can be obtained (for three points not on a line), so a set with |S| = 3 can be shattered.


S4 = {a, b, c, d}: no set with |S| = 4 can be shattered; for example, for four points in convex position no half-plane contains the two diagonally opposite points while excluding the other two. Hence the VC-dimension of half-planes in the plane is 3. (Figure: four points A, B, C, D at the corners of a rectangle.)
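A small numerical illustration of the four-point case, testing strict linear separability with scipy's linear-programming routine (a sketch; the square corner coordinates are an arbitrary convex-position example, not from the slide):

from itertools import product
import numpy as np
from scipy.optimize import linprog

def half_plane_separable(points, labels):
    """True iff some half-plane contains exactly the '+' points, tested as
    LP feasibility of w1*x + w2*y + b >= 1 on '+' and <= -1 on '-'."""
    A_ub, b_ub = [], []
    for (x, y), lab in zip(points, labels):
        if lab == '+':
            A_ub.append([-x, -y, -1.0])      # -(w1*x + w2*y + b) <= -1
        else:
            A_ub.append([x, y, 1.0])         #   w1*x + w2*y + b  <= -1
        b_ub.append(-1.0)
    res = linprog(c=[0.0, 0.0, 0.0], A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * 3, method="highs")
    return res.status == 0                   # 0 = feasible, 2 = infeasible

points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]    # corners of a square
for labels in product('+-', repeat=4):
    if not half_plane_separable(points, labels):
        print("Not realisable by any half-plane:", labels)
# Prints the two "XOR-like" labelings, so these four points are not shattered.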

Fundamental Theorem of PAC Learning (Ehrenfeucht et al., 1989): a concept class C is learnable for all probability distributions and all concepts in C if and only if the VC-dimension of C is finite. If the VC-dimension of C is d, then the following bounds hold (next slide).

Fundamental theorem (contd.):
(a) For 0 < ε < 1 and sample size at least max[(4/ε) log(2/δ), (8d/ε) log(13/ε)], any consistent function A: S_C → C is a learning function for C.
(b) For 0 < ε < 1/2 and sample size less than max[((1 - ε)/ε) ln(1/δ), d(1 - 2(ε(1 - δ) + δ))], no function A: S_C → H, for any hypothesis space H, is a learning function for C.
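To get a feel for part (a), here is a small Python evaluation of the sufficient sample size; the logarithm base is taken as natural, an assumption since the slide does not specify it:

import math

def sufficient_sample_size(epsilon, delta, d):
    """Part (a) bound: max[(4/eps) log(2/delta), (8d/eps) log(13/eps)],
    with log taken as the natural logarithm (an assumption)."""
    return math.ceil(max((4.0 / epsilon) * math.log(2.0 / delta),
                         (8.0 * d / epsilon) * math.log(13.0 / epsilon)))

# Axis-parallel rectangles in the plane have VC-dimension d = 4.
print(sufficient_sample_size(0.1, 0.1, d=4))   # prints 1558

This is much looser than the roughly 148-example bound derived earlier for the specific rectangle learner, as expected for a bound that holds for every consistent learner.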

References
Book: M. H. G. Anthony and N. Biggs, Computational Learning Theory, Cambridge Tracts in Theoretical Computer Science, 1997.
Papers:
1. L. G. Valiant, A theory of the learnable, Communications of the ACM 27(11):1134-1142, 1984.
2. A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, Learnability and the Vapnik-Chervonenkis dimension, Journal of the ACM, 1989.