1 Theory of Inductive Learning
- Suppose our examples are drawn with a probability distribution Pr(x), and that we learned a hypothesis f to describe a concept C.
- We can define Error(f) = sum over x in D of Pr(x),
- where D is the set of all examples on which f and C disagree.
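A minimal sketch of this definition in Python, assuming we can sample examples from Pr and query both the learned hypothesis f and the true concept C; the helper names are illustrative, not from the slides:

```python
import random

def estimate_error(f, C, sample_from_Pr, n=10_000):
    """Monte Carlo estimate of Error(f) = sum of Pr(x) over the
    disagreement set D = {x : f(x) != C(x)}."""
    disagreements = 0
    for _ in range(n):
        x = sample_from_Pr()          # draw x with probability Pr(x)
        if f(x) != C(x):              # x lies in the disagreement set D
            disagreements += 1
    return disagreements / n          # approaches Error(f) as n grows

# Tiny usage example: learned "x >= 5", but the true concept is "x >= 6".
f = lambda x: x >= 5
C = lambda x: x >= 6
print(estimate_error(f, C, lambda: random.randint(0, 9)))  # roughly 0.1
```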

2 PAC Learning
- We're not perfect (in more than one way). So why should our programs be perfect?
- What we want is: Error(f) < ε for some chosen ε.
- But sometimes we're completely clueless (hopefully, with low probability). What we really want is: Prob(Error(f) > ε) < δ.
- As the number of examples grows, ε and δ should decrease.
- We call this Probably Approximately Correct (PAC) learning.

3 Definition of PAC Learnability
- Let C be a class of concepts.
- We say that C is PAC learnable by a hypothesis space H if:
  - there is a polynomial-time algorithm A,
  - and a polynomial function p,
  - such that for every concept c in C, every probability distribution Pr, and every ε and δ,
  - if A is given at least p(1/ε, 1/δ) examples,
  - then A returns, with probability at least 1 - δ, a hypothesis whose error is less than ε.
- k-DNF and k-CNF are PAC learnable.
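To make the polynomial p(1/ε, 1/δ) concrete, here is a small sketch using the standard sample-complexity bound for a finite hypothesis space with a consistent learner, m ≥ (1/ε)(ln|H| + ln(1/δ)); this bound is not stated on the slide and is included only as an illustration:

```python
import math

def pac_sample_size(h_size, epsilon, delta):
    """Number of examples sufficient so that, with probability at least
    1 - delta, any hypothesis in H that is consistent with the sample
    has error below epsilon (finite |H|, consistent-learner bound)."""
    return math.ceil((1.0 / epsilon) * (math.log(h_size) + math.log(1.0 / delta)))

# k-CNF and k-DNF over n variables have |H| = 2^O(n^k), so for fixed k the
# bound stays polynomial in n. Example with |H| = 2**30, epsilon = delta = 0.05:
print(pac_sample_size(2**30, 0.05, 0.05))  # a few hundred examples
```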

4 Version Spaces: A Learning Algorithm
- Key idea:
  - Maintain the most specific and most general hypotheses at every point. Update them as examples come in.
- We describe objects in the space by attributes:
  - faculty, staff, student
  - 20's, 30's, 40's
  - male, female
- Concepts: boolean combinations of attribute values:
  - faculty, 30's, male
  - female, 20's

5 Generalization and Specialization
- A concept C1 is more general than C2 if it describes a superset of the objects:
  - C1 = {20's, faculty} is more general than C2 = {20's, faculty, female}.
  - C2 is a specialization of C1.
- Immediate specializations (generalizations).
- The version space algorithm maintains the most specific and most general boundaries at every point of the learning.

6 Example
[Diagram: the generalization lattice for these attributes. T (no constraints) at the top; below it the single-attribute concepts (male, female, faculty, student, 20's, 30's); below those two-attribute concepts (male+fac, male+stud, female+fac, female+stud, fac+20's, fac+30's); at the bottom three-attribute concepts such as male+fac+20's, male+fac+30's, fem+fac+20's, male+stud+30's.]

7 With a Positive Example
- Eliminate all concepts in the general boundary that are not consistent with the example.
- Minimally generalize all concepts in the specific boundary until they cover the example.
- Eliminate from the specific boundary if:
  - not a specialization of some concept in the general boundary, or
  - a generalization of some other concept in the specific boundary.

8 With a Negative Example
- Eliminate all concepts in the specific boundary that are consistent with the example.
- Minimally specialize all concepts in the general boundary until they don't cover the example.
- Eliminate from the general boundary if:
  - not a generalization of some concept in the specific boundary, or
  - a specialization of some other concept in the general boundary.
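The updates of slides 7 and 8 can be sketched for purely conjunctive concepts over the attributes of slide 4. This is a simplified illustration (a single maximally specific hypothesis S, a small list G, and data assumed consistent with a conjunctive target), not the full boundary-set bookkeeping:

```python
def covers(h, x):
    """A conjunctive hypothesis h (dict of required attribute values)
    covers example x iff every constraint in h is satisfied by x."""
    return all(x[a] == v for a, v in h.items())

def update(S, G, x, positive):
    """One step of the boundary update from slides 7 and 8."""
    if positive:
        # Drop general-boundary concepts that are not consistent with x.
        G = [g for g in G if covers(g, x)]
        # Minimally generalize S: keep only the constraints x satisfies.
        S = {a: v for a, v in S.items() if x[a] == v}
    else:
        new_G = []
        for g in G:
            if not covers(g, x):
                new_G.append(g)          # already excludes the negative example
                continue
            # Minimally specialize g: add one constraint that disagrees
            # with x but agrees with S (so g stays more general than S).
            for a, v in S.items():
                if a not in g and x[a] != v:
                    new_G.append({**g, a: v})
        G = new_G
    return S, G

# Usage with the attributes of slide 4 (role, age, sex):
S = {"role": "faculty", "age": "30s", "sex": "male"}     # first positive example
G = [{}]                                                  # {} plays the role of T
S, G = update(S, G, {"role": "faculty", "age": "20s", "sex": "male"}, positive=True)
S, G = update(S, G, {"role": "student", "age": "20s", "sex": "male"}, positive=False)
print(S)   # {'role': 'faculty', 'sex': 'male'}
print(G)   # [{'role': 'faculty'}]
```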

9 Example
[Diagram: the lattice from slide 6 with a positive example (+) marked.]

10 Example
[Diagram: the lattice from slide 6 after a positive example (+) and a negative example (-).]

11 Example
[Diagram: the lattice from slide 6 with the specific and general boundaries updated after further examples.]

12 Example
[Diagram: the lattice from slide 6 with the final boundaries after all examples have been processed.]

13 The Restaurant Domain
Will they wait, or not?

14 Decision Trees
[Diagram: the hand-built decision tree for the restaurant domain. The root tests Patrons? (none -> No, some -> Yes, full -> WaitEst?); deeper nodes test WaitEst?, Alternate?, Hungry?, Reservation?, Fri/Sat?, Bar?, and Raining?, ending in Yes/No leaves.]

15 Inducing Decision Trees
- Start at the root with all examples.
- If there are both positive and negative examples, choose an attribute to split them.
- If all remaining examples are positive (or negative), label the leaf Yes (or No).
- If no examples remain, determine the label according to the majority in the parent.
- If no attributes are left, but you still have both positive and negative examples, you have a problem...
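A minimal recursive sketch of this procedure (names and the trivial attribute choice are illustrative assumptions; the information-content heuristic for choosing the attribute is sketched after slide 19):

```python
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list of labels."""
    return Counter(labels).most_common(1)[0][0]

def induce_tree(examples, attributes, parent_examples):
    """examples: list of (attribute_dict, label) pairs.
    Returns a leaf label or {"attr": a, "branches": {value: subtree}}."""
    if not examples:                          # no examples: majority of the parent
        return majority([y for _, y in parent_examples])
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:                 # all positive or all negative
        return labels[0]
    if not attributes:                        # mixed labels, nothing left to split on
        return majority(labels)
    a = attributes[0]                         # placeholder choice; see slide 19
    branches = {}
    for v in {x[a] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[a] == v]
        rest = [b for b in attributes if b != a]
        branches[v] = induce_tree(subset, rest, examples)
    return {"attr": a, "branches": branches}

# Tiny usage example (a hypothetical mini-dataset, not the 12 restaurant examples):
data = [({"patrons": "some", "hungry": "yes"}, "yes"),
        ({"patrons": "none", "hungry": "no"},  "no"),
        ({"patrons": "full", "hungry": "yes"}, "yes"),
        ({"patrons": "full", "hungry": "no"},  "no")]
print(induce_tree(data, ["patrons", "hungry"], data))
```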

16 Inducing Decision Trees
[Diagram: the twelve examples at the root (+: X1, X3, X4, X6, X8, X12; -: X2, X5, X7, X9, X10, X11), split two candidate ways.
Splitting on Patrons?: none -> {-: X7, X11}; some -> {+: X1, X3, X6, X8}; full -> {+: X4, X12; -: X2, X5, X9, X10}.
Splitting on Type?: French -> {+: X1; -: X5}; Italian -> {+: X6; -: X10}; Thai -> {+: X4, X8; -: X2, X11}; Burger -> {+: X3, X12; -: X7, X9}.]

17 Continuing Induction
[Diagram: after splitting on Patrons? (none -> {-: X7, X11}; some -> {+: X1, X3, X6, X8}; full -> {+: X4, X12; -: X2, X5, X9, X10}), the full branch is split on Hungry?: Yes -> {+: X4, X12; -: X2, X10}; No -> {-: X5, X9}.]

18 Final Decision Tree
[Diagram: the induced tree. Patrons? (none -> No, some -> Yes, full -> Hungry?); Hungry? (no -> No, yes -> Type?); Type? (French -> Yes, Italian -> No, Thai -> Fri/Sat?, Burger -> Yes); Fri/Sat? (no -> No, yes -> Yes).]

19 Decision Trees: Summary
- Finding the optimal decision tree is computationally intractable.
- We use heuristics:
  - Choosing the right attribute is the key. The choice is based on the information content the attribute provides.
- Decision trees represent DNF boolean formulas.
- They work well in practice.
- Open issues: What to do with noise? Continuous attributes? Attributes with large domains?
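The information-content heuristic mentioned above can be made concrete with entropy and information gain. This is the standard formulation, not spelled out on the slides; with the split counts from slide 16 it shows why Patrons? is preferred over Type?:

```python
import math

def entropy(pos, neg):
    """Entropy (in bits) of a set with pos positive and neg negative examples."""
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(parent, children):
    """parent: (pos, neg) at the node; children: list of (pos, neg) per value."""
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - remainder

# Split counts from slide 16 (6 positive and 6 negative examples overall):
print(information_gain((6, 6), [(0, 2), (4, 0), (2, 4)]))          # Patrons? ~= 0.54 bits
print(information_gain((6, 6), [(1, 1), (1, 1), (2, 2), (2, 2)]))  # Type? = 0.0 bits
```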