Artificial Intelligence
University Politehnica of Bucharest
Adina Magda Florea

Course No. 10, 11
Machine learning
- Types of learning
- Learning by decision trees
- Learning disjunctive concepts
- Learning in version space

1. Types of learning
Specific inferences
- Inductive inference
- Abductive inference
- Analogical inference
Wet(grass)
(∀x) (RainsOn(x) → Wet(x))

General structure of a learning system
[Figure: block diagram of a learning system. Labels: Learning Process; Problem Solving; K & B; Inferences; Strategy; Performance Evaluation; Learning results; Results; Environment; Teacher; Feed-back; Data.]

Types of learning
- Learning through memorization
- Learning through instruction / operationalization
- Learning through induction (from examples)
- Learning through analogy

2. Decision trees. ID3 algorithm
- Inductive learning
- Learns concept descriptions from examples
- Examples (instances of concepts) are defined by attributes and assigned to classes
- Concepts are represented as a decision tree in which each internal node tests an attribute and each branch corresponds to one value of that attribute
- The leaves are labeled with concepts

Building and using the decision tree
- First build the decision tree from the examples
- Label the leaves with YES or NO (one class) or with the class Ci (several classes)
- Unknown instances are then classified by following a path in the decision tree according to the values of their attributes

Another example: Credit evaluation

No.  Risk (Classification)  Credit History  Debt  Collateral  Income
1    High                   Bad             High  None        $0 to $15k
2    High                   Unknown         High  None        $15k to $35k
3    Moderate               Unknown         Low   None        $15k to $35k
4    High                   Unknown         Low   None        $0 to $15k
5    Low                    Unknown         Low   None        Over $35k
6    Low                    Unknown         Low   Adequate    Over $35k
7    High                   Bad             Low   None        $0 to $15k
8    Moderate               Bad             Low   Adequate    Over $35k
9    Low                    Good            Low   None        Over $35k
10   Low                    Good            High  Adequate    Over $35k
11   High                   Good            High  None        $0 to $15k
12   Moderate               Good            High  None        $15k to $35k
13   Low                    Good            High  None        Over $35k
14   High                   Bad             High  None        $15k to $35k

Algorithm for building the decision tree
func tree (ex_set, attributes, default)
1. if ex_set = empty then return a leaf labeled with default
2. if all examples in ex_set are in the same class then return a leaf labeled with that class
3. if attributes = empty then return a leaf labeled with the disjunction of the classes in ex_set
4. Select an attribute A, create a node for A and label the node with A
   - remove A from attributes -> attributes'
   - m = majority (ex_set)
   - for each value V of A repeat
       - let partitionV be the set of examples from ex_set with value V for A
       - create nodeV = tree (partitionV, attributes', m)
       - create a link from node A to nodeV and label the link with V
end

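The four steps above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the names build_tree, classify and domains are assumptions, the toy data is hypothetical, and step 4 simply takes the first remaining attribute, whereas ID3 chooses the attribute by information gain.

```python
from collections import Counter

def majority(examples):
    """Most frequent class among (attribute_values, class) pairs."""
    return Counter(cls for _, cls in examples).most_common(1)[0][0]

def build_tree(examples, attributes, default, domains):
    if not examples:                       # step 1: empty example set
        return default
    classes = sorted({cls for _, cls in examples})
    if len(classes) == 1:                  # step 2: a single class is left
        return classes[0]
    if not attributes:                     # step 3: disjunction of classes
        return " or ".join(classes)
    # Step 4. Assumption: take the first remaining attribute;
    # ID3 would pick the one with the highest information gain.
    a, rest = attributes[0], attributes[1:]
    m = majority(examples)
    branches = {}
    for v in domains[a]:
        partition = [(x, cls) for x, cls in examples if x[a] == v]
        branches[v] = build_tree(partition, rest, m, domains)
    return (a, branches)                   # internal node: (attribute, links)

def classify(tree, instance):
    """Follow the path selected by the instance's attribute values to a leaf."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree

# Hypothetical toy data, not taken from the course.
domains = {"income": ["low", "high"], "debt": ["low", "high"]}
examples = [
    ({"income": "low", "debt": "high"}, "High"),
    ({"income": "low", "debt": "low"}, "High"),
    ({"income": "high", "debt": "high"}, "Moderate"),
    ({"income": "high", "debt": "low"}, "Low"),
]
tree = build_tree(examples, ["income", "debt"], "High", domains)
print(classify(tree, {"income": "high", "debt": "low"}))  # Low
```

Leaves are plain strings and internal nodes are (attribute, branches) tuples, so classification is just a walk down the tree.
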
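Remarks
- Different decision trees can be built from the same examples, depending on the order in which the attributes are selected
- The depth of different DTs is different
- Occam's razor: build the simplest tree
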
Information theory
Given a universe of messages M = {m_1, m_2, ..., m_n} and a probability p(m_i) of occurrence for every message in M, the information content of M can be defined as:
$$I(M) = -\sum_{i=1}^{n} p(m_i) \log_2 p(m_i)$$

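Information content I(T)
p(risk is high) = 6/14
p(risk is moderate) = 3/14
p(risk is low) = 5/14
The information content of the decision tree, written I(Arb), is:
$$I(\mathit{Arb}) = -\frac{6}{14}\log_2\frac{6}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{5}{14}\log_2\frac{5}{14} = 1.531 \text{ bits}$$
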
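The value for the credit example can be checked numerically. The helper below is a small sketch; the function name information_content is an assumption, and the base-2 logarithm gives the result in bits.

```python
from math import log2

def information_content(probabilities):
    """I(M) in bits; messages with probability 0 contribute nothing."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

risk = information_content([6 / 14, 3 / 14, 5 / 14])
print(round(risk, 3))                    # 1.531
print(information_content([0.5, 0.5]))   # 1.0 (one fair coin flip)
print(information_content([1.0]))        # 0.0 (a certain message)
```
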
Information gain G(A)
For an attribute A, the information gain obtained by selecting A as the root of the tree equals the total information content of the tree minus the information content still needed to finish the classification (building the tree) after selecting A as root:
G(A) = I(Arb) - E(A)

Computing E(A)
- Set of learning examples C
- Attribute A with n values in the root -> C is divided into {C_1, C_2, ..., C_n}
$$E(A) = \sum_{i=1}^{n} \frac{|C_i|}{|C|} I(C_i)$$

Example
"Income" as root:
C_1 = {1, 4, 7, 11}
C_2 = {2, 3, 12, 14}
C_3 = {5, 6, 8, 9, 10, 13}
G(income) = I(Arb) - E(Income) = 1.531 - 0.564 = 0.967 bits
G(credit history) = 0.266 bits
G(debt) = 0.063 bits
G(collateral) = 0.206 bits
Income has the largest gain, so it is selected as the root.

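The gain of every attribute can be recomputed from the 14 rows of the credit table. The sketch below assumes a tuple encoding of the rows and the helper names info, expected_info and gain, none of which come from the course. The gains it prints are computed from the rows as transcribed in the code.

```python
from collections import Counter, defaultdict
from math import log2

ATTRIBUTES = ("credit_history", "debt", "collateral", "income")
# One tuple per table row 1..14: (risk, credit history, debt, collateral, income)
ROWS = [
    ("high", "bad", "high", "none", "0-15k"),
    ("high", "unknown", "high", "none", "15-35k"),
    ("moderate", "unknown", "low", "none", "15-35k"),
    ("high", "unknown", "low", "none", "0-15k"),
    ("low", "unknown", "low", "none", "35k+"),
    ("low", "unknown", "low", "adequate", "35k+"),
    ("high", "bad", "low", "none", "0-15k"),
    ("moderate", "bad", "low", "adequate", "35k+"),
    ("low", "good", "low", "none", "35k+"),
    ("low", "good", "high", "adequate", "35k+"),
    ("high", "good", "high", "none", "0-15k"),
    ("moderate", "good", "high", "none", "15-35k"),
    ("low", "good", "high", "none", "35k+"),
    ("high", "bad", "high", "none", "15-35k"),
]

def info(labels):
    """Information content (bits) of a list of class labels."""
    n = len(labels)
    return sum(-c / n * log2(c / n) for c in Counter(labels).values())

def expected_info(rows, index):
    """E(A): weighted information content of the partitions induced by column index."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[index]].append(row[0])
    return sum(len(p) / len(rows) * info(p) for p in parts.values())

def gain(rows, index):
    """G(A) = I(Arb) - E(A)."""
    return info([row[0] for row in rows]) - expected_info(rows, index)

gains = {name: gain(ROWS, i + 1) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, g in gains.items():
    print(f"G({name}) = {g:.3f} bits")
print("root:", best)  # root: income
```

Subtracting unrounded values gives about 0.966 bits for income; the 0.967 quoted on the slide comes from subtracting the rounded terms 1.531 and 0.564.
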
Learning performance
- Let S be the set of examples
- Divide S into a learning set (LS) and a test set (TS)
- Apply ID3 to LS
- How many examples from TS are correctly classified?
- Repeat the steps above for different LS and TS
- Obtain a prediction of the learning performance
- Graph: X = size of LS, Y = percentage of correctly classified examples
- "Happy graphs"

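The evaluation procedure can be sketched as a function that produces the points of such a graph. Everything named here is an assumption made for illustration: learning_curve, the learn callback (any learner that returns a classifier), and the trivial majority_learner that stands in for ID3 to keep the example short.

```python
import random
from collections import Counter

def learning_curve(examples, learn, sizes, trials=20, seed=0):
    """For each LS size, average the percentage of TS examples classified correctly.

    Each size must satisfy 1 <= size < len(examples) so that LS and TS are non-empty.
    """
    rng = random.Random(seed)
    curve = []
    for size in sizes:
        scores = []
        for _ in range(trials):
            shuffled = examples[:]
            rng.shuffle(shuffled)                  # a different LS/TS split each trial
            ls, ts = shuffled[:size], shuffled[size:]
            predict = learn(ls)                    # train on the learning set only
            correct = sum(predict(x) == cls for x, cls in ts)
            scores.append(correct / len(ts))
        curve.append((size, 100 * sum(scores) / trials))
    return curve

def majority_learner(ls):
    """Stand-in learner: always predicts the most frequent class of LS."""
    cls = Counter(c for _, c in ls).most_common(1)[0][0]
    return lambda x: cls

# Hypothetical data: 8 examples of class "yes" and 2 of class "no".
data = [((i,), "yes") for i in range(8)] + [((i,), "no") for i in range(2)]
for size, percent in learning_curve(data, majority_learner, [2, 4, 6, 8]):
    print(size, round(percent, 1))
```

Plotting the returned (size, percentage) pairs gives the graph described on the slide.
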
Remarks
- Lack of data
- Attributes with many values and high information gain
- Attributes with numerical values
- Decision rules

3. Learning by clustering
Generalization and specialization
Learning examples
1. (yellow brick nice big +)
2. (blue ball nice small +)
3. (yellow brick dull small +)
4. (green ball dull big +)
5. (yellow cube nice big +)
6. (blue cube nice small -)
7. (blue brick nice big -)

Learning by clustering
concept name: NAME
  positive part
    cluster: description: (yellow brick nice big)   ex: 1
  negative part
    ex:
concept name: NAME
  positive part
    cluster: description: ( _ _ nice _)   ex: 1, 2
  negative part
    ex:

Learning by clustering
concept name: NAME
  positive part
    cluster: description: ( _ _ _ _)   ex: 1, 2, 3, 4, 5
  negative part
    ex: 6, 7
Overgeneralization: the description ( _ _ _ _) also covers the negative examples 6 and 7.

Learning by clustering
concept name: NAME
  positive part
    cluster: description: (yellow brick nice big)   ex: 1
    cluster: description: (blue ball nice small)   ex: 2
  negative part
    ex: 6, 7

Learning by clustering
concept name: NAME
  positive part
    cluster: description: (yellow brick _ _)   ex: 1, 3
    cluster: description: ( _ ball _ _)   ex: 2, 4
  negative part
    ex: 6, 7

Learning by clustering
concept name: NAME
  positive part
    cluster: description: (yellow _ _ _)   ex: 1, 3, 5
    cluster: description: ( _ ball _ _)   ex: 2, 4
  negative part
    ex: 6, 7
Result: A if yellow or ball

Learning by clustering algorithm
1. Let S be the set of examples
2. Create PP (positive part) and NP (negative part)
3. Add all ex- from S to NP and remove the ex- from S
4. Create a cluster in PP and add the first ex+ to it
5. S = S - ex+
6. for every ex+ e_i in S repeat
   6.1 for every cluster C_j repeat
       - Create the description e_i + C_j (the generalization of C_j that also covers e_i)
       - if the description covers no ex- then add e_i to C_j
   6.2 if e_i has not been added to any cluster then create a new cluster with e_i
end

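The algorithm can be tried on the seven learning examples with a short Python sketch. The names cluster, generalize and covers are assumptions, "_" stands for an attribute that has been generalized away, and the sketch adds each positive example to the first cluster that accepts it, which is one possible reading of step 6.1.

```python
WILDCARD = "_"

def generalize(description, example):
    """Keep the attributes that agree, replace the others with a wildcard."""
    return tuple(d if d == e else WILDCARD for d, e in zip(description, example))

def covers(description, example):
    return all(d == WILDCARD or d == e for d, e in zip(description, example))

def cluster(examples):
    numbered = list(enumerate(examples, start=1))
    negatives = [x for _, (x, sign) in numbered if sign == "-"]   # NP
    clusters = []                                                 # PP
    for number, (x, sign) in numbered:
        if sign != "+":
            continue
        for c in clusters:
            candidate = generalize(c["description"], x)
            if not any(covers(candidate, n) for n in negatives):
                c["description"] = candidate
                c["ex"].append(number)
                break          # assumption: stop at the first accepting cluster
        else:
            clusters.append({"description": x, "ex": [number]})
    return clusters

EXAMPLES = [
    (("yellow", "brick", "nice", "big"), "+"),
    (("blue", "ball", "nice", "small"), "+"),
    (("yellow", "brick", "dull", "small"), "+"),
    (("green", "ball", "dull", "big"), "+"),
    (("yellow", "cube", "nice", "big"), "+"),
    (("blue", "cube", "nice", "small"), "-"),
    (("blue", "brick", "nice", "big"), "-"),
]
for c in cluster(EXAMPLES):
    print(c["description"], c["ex"])
# ('yellow', '_', '_', '_') [1, 3, 5]
# ('_', 'ball', '_', '_') [2, 4]
```
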
4. Learning in version space
Generalization operators in version space
- Replace constants with variables
  color(ball, red) → color(X, red)
- Remove literals from conjunctions
  shape(X, round) ∧ size(X, small) ∧ color(X, red) → shape(X, round) ∧ color(X, red)
- Add disjunctions
  shape(X, round) ∧ size(X, small) ∧ color(X, red) → shape(X, round) ∧ size(X, small) ∧ (color(X, red) ∨ color(X, blue))
- Replace a class with its superclass in is-a relations
  is-a(tom, cat) → is-a(tom, animal)

Candidate elimination algorithm
Version space
- Version space = the set of concept descriptions which are consistent with the learning examples
- Idea: reduce the version space based on the learning examples
- One algorithm: search from specific to general
- One algorithm: search from general to specific
- One algorithm: bidirectional search = the candidate elimination algorithm

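Candidate elimination algorithm
[Figure: part of the concept space, listed from the most general description to the most specific ones]
obj(X, Y, Z)
obj(X, Y, ball)   obj(X, red, Z)   obj(small, Y, Z)
obj(X, red, ball)   obj(small, Y, ball)   obj(small, red, Z)
obj(small, red, ball)   obj(small, orange, ball)
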
Generalization and specialization
- P and Q: the sets of sentences which unify with p and q in FOPL
- p is more general than q if and only if P ⊇ Q
  Example: color(X, red) is more general than color(ball, red)
- "p is more general than q" is written p ≥ q
- ∀x p(x) → positive(x)
  ∀x q(x) → positive(x)
- p covers q if and only if q(x) → positive(x) is a logical consequence of p(x) → positive(x)
- Concept space: obj(X, Y, Z)

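For descriptions of the form obj(X, Y, Z), the covering test reduces to a position-by-position comparison. The sketch below assumes a tuple encoding in which None plays the role of a variable; the names covers and more_general are illustrative.

```python
VAR = None  # assumption: None stands for a variable such as X, Y or Z

def covers(p, q):
    """p covers q: every position of p is a variable or equals q's value there."""
    return all(a is VAR or a == b for a, b in zip(p, q))

def more_general(p, q):
    """Strict ordering: p covers q but q does not cover p."""
    return covers(p, q) and not covers(q, p)

any_red = (VAR, "red", VAR)                 # obj(X, red, Z)
small_red_ball = ("small", "red", "ball")   # obj(small, red, ball)
print(more_general(any_red, small_red_ball))  # True
print(more_general(small_red_ball, any_red))  # False
print(more_general(any_red, any_red))         # False
```
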
Generalization and specialization
- A concept c is maximally specific if it covers all ex+, does not cover any ex-, and for any c' which covers all ex+, c ≤ c'. (S)
- A concept c is maximally general if it does not cover any ex- and for any c' which does not cover any ex-, c ≥ c'. (G)
- S: set of hypotheses (candidate concepts) = maximally specific generalizations
- G: set of hypotheses (candidate concepts) = maximally general specializations

Algorithm for searching from specific to general
1. Initialize S with the first ex+
2. Initialize N with the empty set
3. for every learning example repeat
   3.1 if it is an ex+, p, then
       for each s ∈ S repeat
       - if s does not cover p then replace s with the most specific generalization which covers p
       - Remove from S all hypotheses more general than other hypotheses in S
       - Remove from S all hypotheses which cover an ex- from N
   3.2 if it is an ex-, n, then
       - Remove from S all hypotheses which cover n
       - Add n to N (to check for overgeneralization)
end

Algorithm for searching from specific to general
S: { }
Positive: obj(small, red, ball)
S: { obj(small, red, ball) }
Positive: obj(small, white, ball)
S: { obj(small, Y, ball) }
Positive: obj(large, blue, ball)
S: { obj(X, Y, ball) }

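The specific-to-general search and this trace can be reproduced with a short Python sketch. It assumes the tuple encoding with None as a variable; the function names are illustrative and do not come from the course.

```python
VAR = None  # assumption: None stands for a variable

def covers(h, x):
    return all(a is VAR or a == b for a, b in zip(h, x))

def more_general(p, q):
    return covers(p, q) and not covers(q, p)

def generalize(s, p):
    """Most specific generalization of s that covers p (s itself if it already does)."""
    return tuple(a if a == b else VAR for a, b in zip(s, p))

def specific_to_general(examples):
    S, N = None, []
    for sign, x in examples:
        if sign == "+":
            # The first ex+ initializes S; later ones generalize every s in S.
            S = [x] if S is None else [generalize(s, x) for s in S]
            S = [s for s in S if not any(more_general(s, t) for t in S)]
            S = [s for s in S if not any(covers(s, n) for n in N)]
        else:
            if S is not None:
                S = [s for s in S if not covers(s, x)]
            N.append(x)   # remembered to detect overgeneralization later
    return S, N

trace = [("+", ("small", "red", "ball")),
         ("+", ("small", "white", "ball")),
         ("+", ("large", "blue", "ball"))]
S, N = specific_to_general(trace)
print(S)  # [(None, None, 'ball')], i.e. obj(X, Y, ball)
```
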
Algorithm for searching from general to specific
1. Initialize G with the most general description
2. Initialize P with the empty set
3. for every learning example repeat
   3.1 if it is an ex-, n, then
       for each g ∈ G repeat
       - if g covers n then replace g with the most general specializations which do not cover n
       - Remove from G all hypotheses more specific than other hypotheses in G
       - Remove from G all hypotheses which do not cover the positive examples from P
   3.2 if it is an ex+, p, then
       - Remove from G all hypotheses that do not cover p
       - Add p to P (to check for overspecialization)
end

Algorithm for searching from general to specific
G: { obj(X, Y, Z) }
Negative: obj(small, red, brick)
G: { obj(large, Y, Z), obj(X, white, Z), obj(X, blue, Z), obj(X, Y, ball), obj(X, Y, cube) }
Positive: obj(large, white, ball)
G: { obj(large, Y, Z), obj(X, white, Z), obj(X, Y, ball) }
Negative: obj(large, blue, cube)
G: { obj(X, white, Z), obj(X, Y, ball) }
Positive: obj(small, blue, ball)
G: { obj(X, Y, ball) }

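A Python sketch of the general-to-specific search reproduces this trace. Specializing a description needs the set of values each attribute can take; the DOMAINS below are an assumption read off the examples in the trace, and the function names are illustrative.

```python
VAR = None  # assumption: None stands for a variable
# Assumed attribute domains (size, color, shape), inferred from the trace.
DOMAINS = (("small", "large"), ("red", "white", "blue"), ("ball", "brick", "cube"))

def covers(h, x):
    return all(a is VAR or a == b for a, b in zip(h, x))

def more_general(p, q):
    return covers(p, q) and not covers(q, p)

def specialize(g, n, domains):
    """Most general specializations of g that exclude n: bind one variable to a value n lacks."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, values in enumerate(domains) if g[i] is VAR
            for v in values if v != n[i]]

def general_to_specific(examples, domains):
    G, P = [tuple(VAR for _ in domains)], []
    for sign, x in examples:
        if sign == "-":
            new_G = []
            for g in G:
                new_G.extend(specialize(g, x, domains) if covers(g, x) else [g])
            new_G = list(dict.fromkeys(new_G))   # drop duplicates, keep order
            G = [g for g in new_G if not any(more_general(h, g) for h in new_G)]
            G = [g for g in G if all(covers(g, p) for p in P)]
        else:
            G = [g for g in G if covers(g, x)]
            P.append(x)   # remembered to detect overspecialization later
    return G

trace = [("-", ("small", "red", "brick")),
         ("+", ("large", "white", "ball")),
         ("-", ("large", "blue", "cube")),
         ("+", ("small", "blue", "ball"))]
print(general_to_specific(trace, DOMAINS))  # [(None, None, 'ball')], i.e. obj(X, Y, ball)
```
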
Algorithm for searching in version space
1. Initialize G with the most general description
2. Initialize S with the first ex+
3. for every learning example repeat
   3.1 if it is an ex+, p, then
       Remove from G all the elements that do not cover p
       for each s ∈ S repeat
       - if s does not cover p then replace s with the most specific generalization which covers p
       - Remove from S all hypotheses more general than other hypotheses in S
       - Remove from S all hypotheses more general than some hypothesis in G

Algorithm for searching in version space (cont.)
   3.2 if it is an ex-, n, then
       Remove from S all the hypotheses that cover n
       for each g ∈ G repeat
       - if g covers n then replace g with the most general specializations which do not cover n
       - Remove from G all hypotheses more specific than other hypotheses in G
       - Remove from G all hypotheses more specific than some hypothesis in S
4. if G = S and card(S) = 1 then a concept is found
5. if G = S = { } then there is no concept consistent with all the examples
end

Algorithm for searching in version space
G: { obj(X, Y, Z) }   S: { }
Positive: obj(small, red, ball)
G: { obj(X, Y, Z) }   S: { obj(small, red, ball) }
Negative: obj(small, blue, ball)
G: { obj(X, red, Z) }   S: { obj(small, red, ball) }
Positive: obj(large, red, ball)
G: { obj(X, red, Z) }   S: { obj(X, red, ball) }
Negative: obj(large, red, cube)
G: { obj(X, red, ball) }   S: { obj(X, red, ball) }

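The bidirectional search can be sketched in Python and run on this trace. Two things are assumptions rather than the course's formulation: the attribute DOMAINS, and the cross-pruning between the two sets, which here keeps an s only if some g covers it and keeps a g only if it covers some s (a common formulation of candidate elimination).

```python
VAR = None  # assumption: None stands for a variable
# Assumed attribute domains (size, color, shape).
DOMAINS = (("small", "large"), ("red", "white", "blue"), ("ball", "brick", "cube"))

def covers(h, x):
    return all(a is VAR or a == b for a, b in zip(h, x))

def more_general(p, q):
    return covers(p, q) and not covers(q, p)

def generalize(s, p):
    return tuple(a if a == b else VAR for a, b in zip(s, p))

def specialize(g, n, domains):
    return [g[:i] + (v,) + g[i + 1:]
            for i, values in enumerate(domains) if g[i] is VAR
            for v in values if v != n[i]]

def candidate_elimination(examples, domains):
    G, S = [tuple(VAR for _ in domains)], None
    for sign, x in examples:
        if sign == "+":
            G = [g for g in G if covers(g, x)]
            S = [x] if S is None else [generalize(s, x) for s in S]
            S = [s for s in S if not any(more_general(s, t) for t in S)]
            S = [s for s in S if any(covers(g, s) for g in G)]      # assumed pruning rule
        else:
            if S is not None:
                S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                new_G.extend(specialize(g, x, domains) if covers(g, x) else [g])
            G = [g for g in new_G if not any(more_general(h, g) for h in new_G)]
            if S is not None:
                G = [g for g in G if any(covers(g, s) for s in S)]  # assumed pruning rule
    return G, S

trace = [("+", ("small", "red", "ball")),
         ("-", ("small", "blue", "ball")),
         ("+", ("large", "red", "ball")),
         ("-", ("large", "red", "cube"))]
G, S = candidate_elimination(trace, DOMAINS)
print(G, S)                     # both [(None, 'red', 'ball')], i.e. obj(X, red, ball)
print(G == S and len(S) == 1)   # True: a single concept has been found
```
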
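Implementation of the algorithm specific to general

% exemple = examples, acopera = covers, maigeneral = more general
exemple([pos([large,white,ball]), neg([small,red,brick]),
         pos([small,blue,ball]), neg([large,blue,cube])]).

% acopera(H, X): description H covers description X
acopera([], []).
acopera([H1|T1], [H2|T2]) :- var(H1), var(H2), acopera(T1, T2).
acopera([H1|T1], [H2|T2]) :- var(H1), atom(H2), acopera(T1, T2).
acopera([H1|T1], [H2|T2]) :- atom(H1), atom(H2), H1 = H2, acopera(T1, T2).

% maigeneral(X, Y): X is strictly more general than Y
maigeneral(X, Y) :- not(acopera(Y, X)), acopera(X, Y).

% generaliz(Hyp, Inst, Gen): keep the attributes that agree,
% replace the others with a fresh variable
generaliz([], [], []).
generaliz([Atrib|Rest], [Inst|RestInst], [Atrib|RestGen]) :-
    Atrib == Inst, generaliz(Rest, RestInst, RestGen).
% \== instead of \= so that an attribute that is already a variable stays generalized
generaliz([Atrib|Rest], [Inst|RestInst], [_|RestGen]) :-
    Atrib \== Inst, generaliz(Rest, RestInst, RestGen).
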
% specgen: run the specific-to-general search on the examples
specgen :- exemple([pos(H)|Rest]), speclagen([H], [], Rest).

speclagen(H, N, []) :- print('H='), print(H), nl, print('N='), print(N), nl.
speclagen(H, N, [Ex|RestEx]) :- process(Ex, H, N, H1, N1), speclagen(H1, N1, RestEx).

process(pos(Ex), H, N, H1, N) :-
    generalizset(H, HGen, Ex),
    elim(X, HGen, (member(Y, HGen), maigeneral(X, Y)), H2),
    elim(X, H2, (member(Y, N), acopera(X, Y)), H1).
process(neg(Ex), H, N, H1, [Ex|N]) :- elim(X, H, acopera(X, Ex), H1).

% elim(X, L, Goal, L1): L1 holds the elements X of L for which Goal fails
elim(X, L, Goal, L1) :- (bagof(X, (member(X, L), not(Goal)), L1) ; L1 = []).

% generalizset(Hyps, NewHyps, Ex): generalize every hypothesis (Ipot) that does not cover Ex
generalizset([], [], _).
generalizset([Ipot|Rest], IpotNoua, Ex) :-
    not(acopera(Ipot, Ex)),
    (bagof(X, generaliz(Ipot, Ex, X), ListIpot) ; ListIpot = []),
    generalizset(Rest, RestNou, Ex),
    append(ListIpot, RestNou, IpotNoua).
generalizset([Ipot|Rest], [Ipot|RestNou], Ex) :-
    acopera(Ipot, Ex),
    generalizset(Rest, RestNou, Ex).

?- specgen.
H=[[_G390, _G393, ball]]
N=[[large, blue, cube], [small, red, brick]]