Artificial Intelligence University Politehnica of Bucharest 2008-2009 Adina Magda Florea

Course No. 10, 11: Machine learning
Types of learning
Learning by decision trees
Learning disjunctive concepts
Learning in version space

1. Types of learning
Specific inferences: inductive inference, abductive inference, analogical inference.
Example (first-order logic): given the fact Uda(iarba), i.e. Wet(grass), and the rule (∀x) (PlouaPeste(x) → Uda(x)), i.e. (∀x) (RainsOn(x) → Wet(x)), abduction would hypothesize PlouaPeste(iarba), that it rained on the grass.

[Figure: General structure of a learning system. Block labels: Environment, Data, Teacher, Feedback, Learning Process, Knowledge & Beliefs (K & B), Inferences, Strategy, Problem Solving, Performance Evaluation, Learning results, Results.]

Types of learning
Learning through memorization
Learning through instruction / operationalization
Learning through induction (from examples)
Learning through analogy

2. Decision trees. The ID3 algorithm
Inductive learning: learns concept descriptions from examples.
Examples (instances of concepts) are described by attributes and classified into classes.
Concepts are represented as a decision tree in which every level of the tree is associated with an attribute.
The leaves are labeled with concepts (classes).

Building and using the decision tree
First build the decision tree from the examples.
Label the leaves with YES or NO (for a single target class) or with the class label Ci.
Unknown instances are then classified by following a path in the decision tree according to the values of their attributes.
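
A minimal sketch of the classification step in Python, assuming a hypothetical tree representation made of ("node", attribute, children-by-value) and ("leaf", label) tuples and instances stored as dictionaries of attribute values; the same representation is reused in the builder sketch after the ID3 algorithm below.

def classify(tree_node, instance):
    # Follow the path dictated by the instance's attribute values until a leaf is reached.
    if tree_node[0] == "leaf":
        return tree_node[1]
    _, attribute, children = tree_node
    return classify(children[instance[attribute]], instance)

# Hand-built toy tree, only for demonstrating the path-following step.
toy_tree = ("node", "income", {
    "$0 to $15k": ("leaf", "High"),
    "$15 to $35k": ("leaf", "Moderate"),
    "over $35k": ("leaf", "Low"),
})
print(classify(toy_tree, {"income": "over $35k"}))   # prints: Low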

Example

Example

Another example: Credit evaluation

No.  Risk (classification)  Credit history  Debt  Collateral  Income
1    High                   Bad             High  None        $0 to $15k
2    High                   Unknown         High  None        $15 to $35k
3    Moderate               Unknown         Low   None        $15 to $35k
4    High                   Unknown         Low   None        $0 to $15k
5    Low                    Unknown         Low   None        Over $35k
6    Low                    Unknown         Low   Adequate    Over $35k
7    High                   Bad             Low   None        $0 to $15k
8    Moderate               Bad             Low   Adequate    Over $35k
9    Low                    Good            Low   None        Over $35k
10   Low                    Good            High  Adequate    Over $35k
11   High                   Good            High  None        $0 to $15k
12   Moderate               Good            High  None        $15 to $35k
13   Low                    Good            High  None        Over $35k
14   High                   Bad             High  None        $15 to $35k

Algorithm for building the decision tree

func tree(ex_set, attributes, default)
1. if ex_set = empty then return a leaf labeled with default
2. if all examples in ex_set are in the same class then return a leaf labeled with that class
3. if attributes = empty then return a leaf labeled with the disjunction of the classes in ex_set
4. Select an attribute A, create a node for A and label the node with A
   - remove A from attributes -> attributes'
   - m = majority(ex_set)
   - for each value V of A repeat
     - let partitionV be the set of examples from ex_set with value V for A
     - create nodeV = tree(partitionV, attributes', m)
     - create a link from node A to nodeV and label the link with V
end
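
A minimal Python sketch of this recursive procedure (an illustrative sketch, not the original course code), assuming examples are dictionaries with a "class" key plus one key per attribute; the attribute-selection criterion is passed in as a function, so the information-gain measure introduced below can be plugged in. The tuple-based tree it returns is the representation assumed by the classify sketch above.

from collections import Counter

def majority(ex_set):
    # Most frequent class among the examples.
    return Counter(e["class"] for e in ex_set).most_common(1)[0][0]

def tree(ex_set, attributes, default, select_attribute):
    # 1. No examples: return a leaf labeled with the default class.
    if not ex_set:
        return ("leaf", default)
    classes = {e["class"] for e in ex_set}
    # 2. All examples in the same class: return a leaf labeled with that class.
    if len(classes) == 1:
        return ("leaf", classes.pop())
    # 3. No attributes left: return a leaf labeled with the remaining classes.
    if not attributes:
        return ("leaf", classes)
    # 4. Select an attribute A, make it the current node and split on its values.
    A = select_attribute(ex_set, attributes)
    remaining = [a for a in attributes if a != A]
    m = majority(ex_set)
    children = {}
    for v in {e[A] for e in ex_set}:
        partition_v = [e for e in ex_set if e[A] == v]
        children[v] = tree(partition_v, remaining, m, select_attribute)
    return ("node", A, children)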

Remarks
Different choices of attributes yield different decision trees, and the depth of these trees differs.
Occam's razor: build the simplest tree consistent with the examples.

Information theory
Given a universe of messages M = {m1, m2, ..., mn} and a probability p(mi) of occurrence of every message in M, the information content of M can be defined as:
I(M) = -Σ(i=1..n) p(mi) log2 p(mi)

Information content I(T)
For the credit-risk examples:
p(risk is high) = 6/14, p(risk is moderate) = 3/14, p(risk is low) = 5/14
The information content of the decision tree (Arb) is:
I(Arb) = -6/14 log2(6/14) - 3/14 log2(3/14) - 5/14 log2(5/14) ≈ 1.531 bits
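
A small Python check of this value (an illustrative sketch; the helper name information_content is not from the course material):

import math

def information_content(probabilities):
    # I(M) = -sum p(m) * log2 p(m); terms with probability 0 contribute nothing.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(information_content([6/14, 3/14, 5/14]))   # prints approximately 1.531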

Information gain G(A)
For an attribute A, the information gain obtained by selecting this attribute as the root of the tree equals the total information content of the tree minus the information content still needed to finish the classification (to finish building the tree) after selecting A as root:
G(A) = I(Arb) - E(A)

Computing E(A)
Let C be the set of learning examples and A an attribute with n values placed in the root; C is divided into the partitions {C1, C2, ..., Cn}.
E(A) = Σ(i=1..n) (|Ci| / |C|) * I(Ci)

Example
With "Income" as root, the examples are partitioned into:
C1 = {1, 4, 7, 11} ($0 to $15k)
C2 = {2, 3, 12, 14} ($15 to $35k)
C3 = {5, 6, 8, 9, 10, 13} (over $35k)
G(income) = I(Arb) - E(income) = 1.531 - 0.564 = 0.967 bits
G(credit history) = 0.266 bits
G(debt) = 0.063 bits
G(collateral) = 0.206 bits
Income has the highest information gain, so it is selected as the root of the tree.
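
A Python sketch that recomputes these gains from the credit table (an illustrative sketch; the tuple encoding of the examples and the function names are not from the course material):

import math
from collections import Counter

# (Risk, Credit history, Debt, Collateral, Income) for the 14 examples in the table above.
EXAMPLES = [
    ("High", "Bad", "High", "None", "0-15k"),
    ("High", "Unknown", "High", "None", "15-35k"),
    ("Moderate", "Unknown", "Low", "None", "15-35k"),
    ("High", "Unknown", "Low", "None", "0-15k"),
    ("Low", "Unknown", "Low", "None", "over 35k"),
    ("Low", "Unknown", "Low", "Adequate", "over 35k"),
    ("High", "Bad", "Low", "None", "0-15k"),
    ("Moderate", "Bad", "Low", "Adequate", "over 35k"),
    ("Low", "Good", "Low", "None", "over 35k"),
    ("Low", "Good", "High", "Adequate", "over 35k"),
    ("High", "Good", "High", "None", "0-15k"),
    ("Moderate", "Good", "High", "None", "15-35k"),
    ("Low", "Good", "High", "None", "over 35k"),
    ("High", "Bad", "High", "None", "15-35k"),
]

def info(examples):
    # I(C): information content of the class (Risk) distribution.
    counts = Counter(e[0] for e in examples)
    n = len(examples)
    return sum(-c / n * math.log2(c / n) for c in counts.values())

def gain(examples, attr):
    # G(A) = I(C) - E(A), where E(A) weights each partition by its relative size.
    n = len(examples)
    e_a = 0.0
    for value in {e[attr] for e in examples}:
        part = [e for e in examples if e[attr] == value]
        e_a += len(part) / n * info(part)
    return info(examples) - e_a

for name, attr in [("credit history", 1), ("debt", 2), ("collateral", 3), ("income", 4)]:
    print(name, round(gain(EXAMPLES, attr), 3))
# Prints approximately: credit history 0.266, debt 0.063, collateral 0.206, income 0.966
# (0.966 rather than 0.967 only because the slide rounds 1.531 - 0.564 before subtracting).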

Learning performance
Let S be the set of learning examples.
Divide S into a training set and a test set.
Apply ID3 on the training set.
How many examples from the test set are correctly classified?
Repeat the steps above for different training/test splits.
Obtain a prediction of the learning performance.
Plot a graph: X = size of the training set, Y = percentage of correctly classified test examples ("happy graphs").
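
A sketch of this evaluation loop in Python (illustrative only; learn and classify stand for any learner and classifier, for instance the ID3 and classify sketches above, and examples are assumed to be dictionaries with a "class" key):

import random

def evaluate(examples, learn, classify, train_fraction=0.7, trials=20):
    # Repeatedly split the examples into a training set and a test set,
    # learn on the training part and measure accuracy on the held-out test part.
    accuracies = []
    for _ in range(trials):
        shuffled = random.sample(examples, len(examples))
        cut = int(train_fraction * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        hypothesis = learn(train)
        correct = sum(classify(hypothesis, e) == e["class"] for e in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Plotting the average accuracy for increasing training-set sizes gives the
# "happy graph" mentioned above.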

Remarks: issues in decision tree learning
Lack of data
Attributes with many values (which tend to get artificially high information gain)
Attributes with numerical values
Decision rules

3. Learning by clustering
Generalization and specialization
Learning examples:
1. (yellow brick nice big +)
2. (blue ball nice small +)
3. (yellow brick dull small +)
4. (green ball dull big +)
5. (yellow cube nice big +)
6. (blue cube nice small -)
7. (blue brick nice big -)

Learning by clustering
After example 1:
  concept name: NAME
  positive part: cluster description: (yellow brick nice big), ex: 1
  negative part: ex:
After example 2:
  concept name: NAME
  positive part: cluster description: (_ _ nice _), ex: 1, 2
  negative part: ex:

Learning by clustering
concept name: NAME
positive part: cluster description: (_ _ _ _), ex: 1, 2, 3, 4, 5
negative part: ex: 6, 7
=> over-generalization: the cluster description also covers the negative examples

Learning by clustering
concept name: NAME
positive part:
  cluster description: (yellow brick nice big), ex: 1
  cluster description: (blue ball nice small), ex: 2
negative part: ex: 6, 7

Learning by clustering
concept name: NAME
positive part:
  cluster description: (yellow brick _ _), ex: 1, 3
  cluster description: (_ ball _ _), ex: 2, 4
negative part: ex: 6, 7

Learning by clustering
concept name: NAME
positive part:
  cluster description: (yellow _ _ _), ex: 1, 3, 5
  cluster description: (_ ball _ _), ex: 2, 4
negative part: ex: 6, 7
Learned concept: A if yellow or ball

Learning by clustering: algorithm
1. Let S be the set of examples
2. Create PP (positive part) and NP (negative part)
3. Add all negative examples (ex-) from S to NP and remove them from S
4. Create a cluster in PP and add the first positive example (ex+)
5. S = S - ex+
6. for every positive example ei in S repeat
   6.1 for every cluster Ci repeat
       - create the description of Ci generalized to also cover ei
       - if this description covers no ex- then add ei to Ci (and keep the new description)
   6.2 if ei has not been added to any cluster then create a new cluster with ei
end
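
A Python sketch of this covering procedure applied to the seven examples above (illustrative; the tuple encoding with "_" as a wildcard mirrors the slides' notation and is an assumption). With this example ordering it ends with the two cluster descriptions (yellow _ _ _) and (_ ball _ _), i.e. "yellow or ball", as in the trace above.

def generalize(description, example):
    # Minimal generalization: keep matching attribute values, replace mismatches with "_".
    return tuple(d if d == e else "_" for d, e in zip(description, example))

def covers(description, example):
    return all(d == "_" or d == e for d, e in zip(description, example))

def cluster_learning(positives, negatives):
    # Each cluster is (description, list of covered positive examples).
    clusters = []
    for ex in positives:
        placed = False
        for i, (desc, members) in enumerate(clusters):
            new_desc = generalize(desc, ex)
            # Accept the generalization only if it still excludes every negative example.
            if not any(covers(new_desc, n) for n in negatives):
                clusters[i] = (new_desc, members + [ex])
                placed = True
                break
        if not placed:
            clusters.append((ex, [ex]))
    return clusters

POS = [("yellow", "brick", "nice", "big"),
       ("blue", "ball", "nice", "small"),
       ("yellow", "brick", "dull", "small"),
       ("green", "ball", "dull", "big"),
       ("yellow", "cube", "nice", "big")]
NEG = [("blue", "cube", "nice", "small"),
       ("blue", "brick", "nice", "big")]

print(cluster_learning(POS, NEG))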

4. Learning in version space
Generalization operators in version space:
Replace constants with variables: color(ball, red) → color(X, red)
Remove literals from conjunctions: shape(X, round) ∧ size(X, small) ∧ color(X, red) → shape(X, round) ∧ color(X, red)
Add disjunctions: shape(X, round) ∧ size(X, small) ∧ color(X, red) → shape(X, round) ∧ size(X, small) ∧ (color(X, red) ∨ color(X, blue))
Replace a class with its superclass in is-a relations: is-a(tom, cat) → is-a(tom, animal)

Candidate elimination algorithm: version space
Version space = the set of concept descriptions that are consistent with the learning examples.
The idea: reduce the version space based on the learning examples.
One algorithm searches from specific to general.
One algorithm searches from general to specific.
One algorithm performs a bidirectional search: the candidate elimination algorithm.

Candidate elimination algorithm
[Figure: the generalization lattice over the concept space, from the most general description obj(X, Y, Z), through obj(X, Y, ball), obj(X, red, Z), obj(small, Y, Z), then obj(X, red, ball), obj(small, Y, ball), obj(small, red, Z), down to specific descriptions such as obj(small, red, ball) and obj(small, orange, ball).]

Generalization and specialization
Let P and Q be the sets of instances that unify with p and q in FOPL.
p is more general than q if and only if P ⊇ Q, e.g., color(X, red) is more general than color(ball, red).
p more general than q (p ≥ q): (∀x) p(x) → positive(x) entails (∀x) q(x) → positive(x).
p covers q if and only if q(x) → positive(x) is a logical consequence of p(x) → positive(x).
Concept space: obj(X, Y, Z) (the most general description) and its specializations.

Generalization and specialization
A concept c is maximally specific if it covers all positive examples, covers no negative example, and for any concept c' that covers all positive examples, c ≤ c'. These concepts form the set S.
A concept c is maximally general if it covers no negative example and, for any concept c' that covers no negative example, c ≥ c'. These concepts form the set G.
S = set of hypotheses (candidate concepts) = the maximally specific generalizations
G = set of hypotheses (candidate concepts) = the maximally general specializations

Algorithm for searching from specific to general
1. Initialize S with the first positive example
2. Initialize N with the empty set
3. for every learning example repeat
   3.1 if the example is positive, p, then for each s ∈ S repeat
       - if s does not cover p then replace s with the most specific generalization that covers p
       - remove from S all hypotheses more general than some other hypothesis in S
       - remove from S all hypotheses that cover a negative example from N
   3.2 if the example is negative, n, then
       - remove from S all hypotheses that cover n
       - add n to N (used to check for over-generalization)
end

Algorithm for searching from specific to general: example
S: { }
Positive: obj(small, red, ball)   =>  S: { obj(small, red, ball) }
Positive: obj(small, white, ball) =>  S: { obj(small, Y, ball) }
Positive: obj(large, blue, ball)  =>  S: { obj(X, Y, ball) }
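
A Python sketch that reproduces this trace (illustrative; hypotheses are attribute triples with "?" playing the role of a variable, and the duplicate-elimination step of the algorithm is omitted because S holds a single hypothesis in this representation):

def covers(h, x):
    # A hypothesis covers an instance if every fixed position matches; "?" matches anything.
    return all(a == "?" or a == b for a, b in zip(h, x))

def min_generalization(s, p):
    # Most specific generalization of s that also covers p.
    return tuple(a if a == b else "?" for a, b in zip(s, p))

def specific_to_general(examples):
    S, N = [], []                       # S: candidate hypotheses, N: negative examples seen
    for label, x in examples:
        if label == "+":
            if not S:
                S = [x]                 # initialize S with the first positive example
            S = [s if covers(s, x) else min_generalization(s, x) for s in S]
            S = [s for s in S if not any(covers(s, n) for n in N)]
        else:
            N.append(x)
            S = [s for s in S if not covers(s, x)]
    return S

EXS = [("+", ("small", "red", "ball")),
       ("+", ("small", "white", "ball")),
       ("+", ("large", "blue", "ball"))]
print(specific_to_general(EXS))   # [('?', '?', 'ball')], i.e. obj(X, Y, ball)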

Algorithm for searching from general to specific
1. Initialize G with the most general description
2. Initialize P with the empty set
3. for every learning example repeat
   3.1 if the example is negative, n, then for each g ∈ G repeat
       - if g covers n then replace g with its most general specializations that do not cover n
       - remove from G all hypotheses more specific than some other hypothesis in G
       - remove from G all hypotheses that do not cover the positive examples from P
   3.2 if the example is positive, p, then
       - remove from G all hypotheses that do not cover p
       - add p to P (used to check for over-specialization)
end

Algorithm for searching from general to specific: example
G: { obj(X, Y, Z) }
Negative: obj(small, red, brick)  =>  G: { obj(large, Y, Z), obj(X, white, Z), obj(X, blue, Z), obj(X, Y, ball), obj(X, Y, cube) }
Positive: obj(large, white, ball) =>  G: { obj(large, Y, Z), obj(X, white, Z), obj(X, Y, ball) }
Negative: obj(large, blue, cube)  =>  G: { obj(X, white, Z), obj(X, Y, ball) }
Positive: obj(small, blue, ball)  =>  G: { obj(X, Y, ball) }
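
A small Python sketch of the specialization step that produces the five hypotheses in the first line of this trace (illustrative; the attribute domains are read off the examples used in these slides):

# Attribute domains assumed for the obj(Size, Color, Shape) examples in these slides.
DOMAINS = (("small", "large"),
           ("red", "white", "blue"),
           ("ball", "brick", "cube"))

def min_specializations(g, negative):
    # Most general specializations of g ("?" = variable) that exclude the negative example:
    # fix one variable position at a time to any value different from the negative's value.
    result = []
    for i, value in enumerate(g):
        if value == "?":
            for v in DOMAINS[i]:
                if v != negative[i]:
                    result.append(g[:i] + (v,) + g[i + 1:])
    return result

print(min_specializations(("?", "?", "?"), ("small", "red", "brick")))
# [('large', '?', '?'), ('?', 'white', '?'), ('?', 'blue', '?'), ('?', '?', 'ball'), ('?', '?', 'cube')]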

Algorithm for searching in version space
1. Initialize G with the most general description
2. Initialize S with the first positive example
3. for every learning example repeat
   3.1 if the example is positive, p, then
       - remove from G all the elements that do not cover p
       - for each s ∈ S repeat
         - if s does not cover p then replace s with the most specific generalization that covers p
         - remove from S all hypotheses more general than some other hypothesis in S
         - remove from S all hypotheses more general than some hypothesis in G

Algorithm for searching in version space (cont.)
   3.2 if the example is negative, n, then
       - remove from S all the hypotheses that cover n
       - for each g ∈ G repeat
         - if g covers n then replace g with its most general specializations that do not cover n
         - remove from G all hypotheses more specific than some other hypothesis in G
         - remove from G all hypotheses more specific than some hypothesis in S
4. if G = S and card(S) = 1 then a concept has been found
5. if G = S = { } then there is no concept consistent with all the examples
end

Algorithm for searching in version space: example
G: { obj(X, Y, Z) }, S: { }
Positive: obj(small, red, ball)  =>  G: { obj(X, Y, Z) }, S: { obj(small, red, ball) }
Negative: obj(small, blue, ball) =>  G: { obj(X, red, Z) }, S: { obj(small, red, ball) }
   (specializations such as obj(large, Y, Z) or obj(X, Y, cube) are discarded because they do not cover the positive example retained in S)
Positive: obj(large, red, ball)  =>  G: { obj(X, red, Z) }, S: { obj(X, red, ball) }
Negative: obj(large, red, cube)  =>  G: { obj(X, red, ball) }, S: { obj(X, red, ball) }
G = S = { obj(X, red, ball) }, so the learned concept is obj(X, red, ball).
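
A Python sketch of the bidirectional search that reproduces this trace (illustrative; hypotheses are triples with "?" as a variable, the attribute domains are assumed from the examples, and the boundary sets are pruned so that every member of G stays more general than some member of S and vice versa, which is what the trace above requires):

# Hypotheses over obj(Size, Color, Shape); "?" marks a variable position.
DOMAINS = (("small", "large"),
           ("red", "white", "blue"),
           ("ball", "brick", "cube"))

def covers(h, x):
    return all(a == "?" or a == b for a, b in zip(h, x))

def more_general_or_equal(h1, h2):
    # h1 covers at least everything h2 covers.
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def min_generalization(s, p):
    # Most specific generalization of s that also covers p.
    return tuple(a if a == b else "?" for a, b in zip(s, p))

def min_specializations(g, n):
    # Most general specializations of g that exclude the negative example n.
    out = []
    for i, a in enumerate(g):
        if a == "?":
            for v in DOMAINS[i]:
                if v != n[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples):
    G = [("?", "?", "?")]
    S = []
    for label, x in examples:
        if label == "+":
            if not S:
                S = [x]                       # initialize S with the first positive example
            G = [g for g in G if covers(g, x)]
            S = [s if covers(s, x) else min_generalization(s, x) for s in S]
            S = [s for s in S
                 if not any(s != t and more_general_or_equal(s, t) for t in S)
                 and any(more_general_or_equal(g, s) for g in G)]
        else:
            S = [s for s in S if not covers(s, x)]
            G = [h for g in G
                 for h in (min_specializations(g, x) if covers(g, x) else [g])]
            G = [g for g in G
                 if not any(g != h and more_general_or_equal(h, g) for h in G)
                 and (not S or any(more_general_or_equal(g, s) for s in S))]
    return S, G

EXS = [("+", ("small", "red", "ball")), ("-", ("small", "blue", "ball")),
       ("+", ("large", "red", "ball")), ("-", ("large", "red", "cube"))]
print(candidate_elimination(EXS))
# ([('?', 'red', 'ball')], [('?', 'red', 'ball')]), i.e. S = G = { obj(X, red, ball) }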

Implementation of the algorithm specific to general (Prolog)

% Learning examples: positive and negative instances described as [Size, Color, Shape].
exemple([pos([large,white,ball]), neg([small,red,brick]),
         pos([small,blue,ball]), neg([large,blue,cube])]).

% acopera (covers): a hypothesis covers an instance if every position matches
% or the hypothesis has a variable in that position.
acopera([], []).
acopera([H1|T1], [H2|T2]) :- var(H1), var(H2), acopera(T1, T2).
acopera([H1|T1], [H2|T2]) :- var(H1), atom(H2), acopera(T1, T2).
acopera([H1|T1], [H2|T2]) :- atom(H1), atom(H2), H1 = H2, acopera(T1, T2).

% maigeneral (more general): X is strictly more general than Y.
maigeneral(X, Y) :- not(acopera(Y, X)), acopera(X, Y).

% generaliz (generalize): most specific generalization of a hypothesis that covers
% the instance; positions that differ are replaced with a fresh variable.
generaliz([], [], []).
% keep positions that have already been generalized to a variable
generaliz([Atrib|Rest], [_|RestInst], [Atrib|RestGen]) :-
    var(Atrib), generaliz(Rest, RestInst, RestGen).
generaliz([Atrib|Rest], [Inst|RestInst], [Atrib|RestGen]) :-
    Atrib == Inst, generaliz(Rest, RestInst, RestGen).
generaliz([Atrib|Rest], [Inst|RestInst], [_|RestGen]) :-
    Atrib \= Inst, generaliz(Rest, RestInst, RestGen).

Implementation of the algorithm specific to general (cont.)

% specgen: start the specific-to-general search from the first positive example.
specgen :-
    exemple([pos(H)|Rest]),
    speclagen([H], [], Rest).

% speclagen(Hypotheses, Negatives, RemainingExamples): process the examples one by one,
% then print the final hypothesis set H and the stored negative examples N.
speclagen(H, N, []) :-
    print('H='), print(H), nl,
    print('N='), print(N), nl.
speclagen(H, N, [Ex|RestEx]) :-
    process(Ex, H, N, H1, N1),
    speclagen(H1, N1, RestEx).

% Positive example: generalize the hypotheses to cover it, then drop hypotheses that are
% more general than another hypothesis or that cover a stored negative example.
process(pos(Ex), H, N, H1, N) :-
    generalizset(H, HGen, Ex),
    elim(X, HGen, (member(Y, HGen), maigeneral(X, Y)), H2),
    elim(X, H2, (member(Y, N), acopera(X, Y)), H1).
% Negative example: drop the hypotheses that cover it and remember it in N.
process(neg(Ex), H, N, H1, [Ex|N]) :-
    elim(X, H, acopera(X, Ex), H1).

% elim(X, L, Goal, L1): L1 contains the elements X of L for which Goal fails.
elim(X, L, Goal, L1) :-
    (bagof(X, (member(X, L), not(Goal)), L1) ; L1 = []).

Implementation of the algorithm specific to general (cont.)

% generalizset(Hypotheses, NewHypotheses, Example): replace every hypothesis (Ipot) that
% does not cover the example with its generalizations; keep the ones that already cover it.
generalizset([], [], _).
generalizset([Ipot|Rest], IpotNoua, Ex) :-
    not(acopera(Ipot, Ex)),
    (bagof(X, generaliz(Ipot, Ex, X), ListIpot) ; ListIpot = []),
    generalizset(Rest, RestNou, Ex),
    append(ListIpot, RestNou, IpotNoua).
generalizset([Ipot|Rest], [Ipot|RestNou], Ex) :-
    acopera(Ipot, Ex),
    generalizset(Rest, RestNou, Ex).

% Sample run (the learned hypothesis is obj(X, Y, ball)):
?- specgen.
H=[[_G390, _G393, ball]]
N=[[large, blue, cube], [small, red, brick]]