Theory of Inductive Learning (presentation transcript)

1 Theory of Inductive Learning
- Suppose our examples are drawn with a probability distribution Pr(x), and that we have learned a hypothesis f to describe a concept C.
- We can define Error(f) to be: Error(f) = Σ_{x ∈ D} Pr(x),
- where D is the set of all examples on which f and C disagree.
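As a minimal sketch (not from the slide; the names and the dict-based distribution are illustrative), the definition sums the probability mass of the disagreement set D:

```python
# Error(f): probability mass, under Pr(x), of the examples where f and C disagree.
def error(f, concept, distribution):
    """distribution: dict mapping example x -> Pr(x); f, concept: predicates on x."""
    return sum(p for x, p in distribution.items() if f(x) != concept(x))
```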

2 PAC Learning
- We're not perfect (in more than one way). So why should our programs be perfect?
- What we want is: Error(f) < ε for some chosen ε.
- But sometimes we're completely clueless (hopefully with low probability). What we really want is: Prob(Error(f) > ε) < δ. As the number of examples grows, ε and δ should decrease.
- We call this Probably Approximately Correct (PAC).

3 Definition of PAC Learnability
- Let C be a class of concepts.
- We say that C is PAC learnable by a hypothesis space H if:
  - there is a polynomial-time algorithm A and a polynomial function p,
  - such that for every concept c in C, every probability distribution Pr, and every ε and δ,
  - if A is given at least p(1/ε, 1/δ) examples,
  - then A returns, with probability 1-δ, a hypothesis whose error is less than ε.
- k-DNF and k-CNF are PAC learnable.
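The slide leaves the polynomial p unspecified. As a point of reference (an assumption here, not stated in the slides): for a finite hypothesis space H and a learner that returns a hypothesis consistent with the data, a standard bound is that m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice, which is indeed polynomial in 1/ε and 1/δ.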

4 Version Spaces: A Learning Algorithm
- Key idea:
  - Maintain the most specific and most general hypotheses at every point. Update them as examples come in.
- We describe objects in the space by attributes:
  - faculty, staff, student
  - 20's, 30's, 40's
  - male, female
- Concepts: boolean combinations of attribute-values:
  - faculty, 30's, male
  - female, 20's

5 Generalization and Specialization
- A concept C1 is more general than C2 if it describes a superset of the objects:
  - C1 = {20's, faculty} is more general than C2 = {20's, faculty, female}.
  - C2 is a specialization of C1.
- We can also talk about immediate specializations (and generalizations).
- The version space algorithm maintains the most specific and most general boundaries at every point of the learning.

6 Example
[Diagram: the generalization lattice for these attributes, from T (the most general concept) through single values (male, female, faculty, student, 20's, 30's) and pairs (male+fac, male+stud, female+fac, female+stud, fac+20's, fac+30's) down to triples such as male+fac+20's, male+fac+30's, fem+fac+20's, male+stud+30's.]

7 With a Positive Example
- Eliminate all concepts in the general boundary that are not consistent with the example.
- Minimally generalize all concepts in the specific boundary until they cover the example.
- Eliminate a concept from the specific boundary if it:
  - is not a specialization of some concept in the general boundary, or
  - is a generalization of some other concept in the specific boundary.

8 With a Negative Example
- Eliminate all concepts in the specific boundary that are consistent with the example.
- Minimally specialize all concepts in the general boundary until they don't cover the example.
- Eliminate a concept from the general boundary if it:
  - is not a generalization of some concept in the specific boundary, or
  - is a specialization of some other concept in the general boundary.
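The updates on slides 7 and 8 amount to the candidate elimination algorithm. A minimal sketch, assuming conjunctive concepts over attributes like those of slide 4 (the attribute names, domains, and data representation below are illustrative, not from the slides):

```python
# Boundary updates from slides 7-8 (candidate elimination), sketched.
# A concept is a frozenset of required attribute values; requiring fewer values
# makes it more general. An example is a dict mapping attribute -> value.

DOMAINS = {  # hypothetical attribute domains, following slide 4
    "role": {"faculty", "staff", "student"},
    "age": {"20's", "30's", "40's"},
    "sex": {"male", "female"},
}

def values(example):
    return frozenset(example.values())

def more_general(c1, c2):
    return c1 <= c2  # fewer requirements: describes a superset of objects

def covers(concept, example):
    return concept <= values(example)

def positive_update(S, G, ex):
    G = {g for g in G if covers(g, ex)}                        # drop inconsistent g
    S = {s & values(ex) for s in S}                            # minimally generalize
    S = {s for s in S if any(more_general(g, s) for g in G)}   # keep s below some g
    S = {s for s in S if not any(t != s and more_general(s, t) for t in S)}
    return S, G

def negative_update(S, G, ex):
    S = {s for s in S if not covers(s, ex)}                    # drop inconsistent s
    new_G = set()
    for g in G:
        if not covers(g, ex):
            new_G.add(g)
            continue
        for attr, dom in DOMAINS.items():                      # minimally specialize:
            if dom & g:                                        # attribute already fixed
                continue
            for v in dom - {ex[attr]}:                         # require a value that the
                new_G.add(g | {v})                             # negative example lacks
    new_G = {g for g in new_G if any(more_general(g, s) for s in S)}
    new_G = {g for g in new_G if not any(h != g and more_general(h, g) for h in new_G)}
    return S, new_G
```

Starting from S = {the value set of the first positive example} and G = {frozenset()} (the concept T), each incoming example is fed to the matching update.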

9 Example
[Same lattice as slide 6, now annotated with the first (positive) training example.]

10 Example
[Same lattice after a positive and a negative example (+, -).]

11 Example
[Same lattice after processing +, -, +.]

12 Example
[Same lattice after processing +, -, +, -.]

13 The Restaurant Domain: will they wait, or not?

14 Decision Trees
[Diagram: the restaurant decision tree. The root tests Patrons? (none / some / full); deeper nodes test WaitEst? (>60, 30-60, 10-30, 0-10), Alternate?, Hungry?, Reservation?, Fri/Sat?, Bar?, and Raining?, ending in Yes/No leaves.]

15 Inducing Decision Trees
- Start at the root with all examples.
- If there are both positive and negative examples, choose an attribute to split on.
- If all remaining examples are positive (or negative), label the node Yes (or No).
- If no examples remain, label according to the majority in the parent node.
- If no attributes are left but you still have both positive and negative examples, you have a problem...
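A minimal sketch of this procedure in code (assumed, not from the slides; representing examples as dicts with a boolean "label" is illustrative, and choose_attribute is a placeholder that slide 19 suggests replacing with an information-content heuristic):

```python
from collections import Counter

def choose_attribute(examples, attributes):
    # placeholder choice; see the information-gain sketch after slide 19
    return attributes[0]

def induce(examples, attributes, parent_majority=None):
    labels = [e["label"] for e in examples]
    if not examples:                        # no examples: use the parent's majority
        return parent_majority
    if all(labels) or not any(labels):      # all positive, or all negative
        return labels[0]
    if not attributes:                      # both classes left, nothing to split on
        return Counter(labels).most_common(1)[0][0]
    attr = choose_attribute(examples, attributes)
    majority = Counter(labels).most_common(1)[0][0]
    branches = {}
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        rest = [a for a in attributes if a != attr]
        branches[value] = induce(subset, rest, majority)
    return {"split": attr, "branches": branches}
```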

16 Inducing decision trees
[Diagram: the twelve restaurant examples (+: X1, X3, X4, X6, X8, X12; -: X2, X5, X7, X9, X10, X11) split two ways. Splitting on Patrons? gives none = {- X7, X11}, some = {+ X1, X3, X6, X8}, full = {+ X4, X12; - X2, X5, X9, X10}. Splitting on Type? (French, Italian, Thai, Burger) leaves every branch with a mix of positive and negative examples.]

17 Continuing Induction
[Diagram: after splitting on Patrons?, the none branch becomes No and the some branch becomes Yes; the full branch (+ X4, X12; - X2, X5, X9, X10) is split on Hungry?, giving {+ X4, X12; - X2, X10} and {- X5, X9}.]

18 Final Decision Tree
[Diagram: the induced tree tests Patrons? (none / some / full), then Hungry?, then Type? (French, Italian, Thai, Burger), with a Fri/Sat? test on one branch and Yes/No leaves.]

19 Decision Trees: Summary
- Finding the optimal decision tree is computationally intractable.
- We use heuristics:
  - Choosing the right attribute is the key. The choice is based on the information content that the attribute provides.
- Decision trees represent DNF boolean formulas.
- They work well in practice.
- What to do with noise? Continuous attributes? Attributes with large domains?
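One standard way to measure the "information content" of an attribute is information gain; a minimal sketch (an assumption here, since the slide does not name a specific formula) that can serve as a drop-in replacement for the placeholder choose_attribute used after slide 15:

```python
from math import log2

def entropy(labels):
    # entropy of a boolean-labelled sample, in bits
    total = len(labels)
    if total == 0:
        return 0.0
    p = sum(labels) / total               # fraction of positive examples
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def information_gain(examples, attr):
    base = entropy([e["label"] for e in examples])
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e["label"] for e in examples if e[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

def choose_attribute(examples, attributes):
    return max(attributes, key=lambda a: information_gain(examples, a))
```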

