EIE426-AICV
Machine Learning
Contents
- Machine learning concepts and procedures
- Learning by recording cases
- Learning by building identification trees
- Simplification of decision rules
Machine Learning
There are two kinds of learning:
1. Learning based on coupling new information to previously acquired knowledge; usually, a great deal of reasoning is involved:
(1) Learning by analyzing differences
(2) Learning by managing multiple models
(3) Learning by explaining experience
(4) Learning by correcting mistakes
2. Learning based on digging useful regularity out of data:
(1) Learning by recording cases
(2) Learning by building identification trees
(3) Learning by training neural nets
(4) Learning by simulated evolution
Learning by Recording Cases
The consistency heuristic: whenever you want to guess a property of something, given nothing else to go on but a set of reference cases, find the most similar case, as measured by known properties, for which the property is known. Guess that the unknown property is the same as that known property.
This technique is good for problem domains in which good models are impossible to build.
The learner does nothing with the information in the recorded cases until that information is used.
Finding Nearest Neighbors
The straightforward way: calculate the distance to each other object and find the minimum among those distances. For n other objects, there are n distances to compute and (n - 1) distance comparisons to do.
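A minimal Python sketch of the straightforward way (the case data and function names below are illustrative, not from the slides):

import math

def nearest_neighbor(unknown, cases):
    """Brute force: compute all n distances, take the minimum."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(cases, key=lambda case: distance(unknown, case))

# Hypothetical recorded cases, as (width, height) measurements.
cases = [(1.0, 4.0), (2.5, 1.5), (4.0, 5.0), (5.5, 2.0)]
print(nearest_neighbor((3.0, 2.0), cases))  # -> (2.5, 1.5)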
Decision Trees
A decision tree is a representation
that is a semantic tree
in which
- Each leaf node is connected to a set of possible answers.
- Each non-leaf node is connected to a test that splits its set of possible answers into subsets corresponding to different test results.
- Each branch carries a particular test result's subset to another node.
K-D Tree
A k-d tree is a representation
that is a decision tree
in which
- The set of possible answers consists of points, one of which may be the nearest neighbor to a given point.
- Each test specifies a coordinate, a threshold, and a neutral zone around the threshold containing no points.
- Each test divides a set of points into two sets, according to which side of the threshold each point lies on.
K-D Tree (cont.)
[Figure: eight colored blocks (two red, plus orange, yellow, purple, violet, blue, and green) plotted by width and height, with an unknown block U and distance annotations of 2.00, 4.00, and 2.00]
K-D Tree (cont.)
[Figure: the k-d tree built for the blocks; its internal nodes test Height > 3.5, Width > 3.5, Width > 3.0, Height > 1.5, and Height > 5.5, and its leaves hold the blocks Orange, Red, Yellow, Purple, Violet, Blue, and Green]
K-D Tree (cont.)
To divide the cases into sets:
1. If there is only one case, stop.
2. If this is the first division of cases, pick the vertical axis for comparison; otherwise, pick the axis that is different from the axis at the next higher level.
3. Considering only the axis of comparison, find the average position of the two middle objects. Call this average position the threshold, and construct a decision-tree test that compares unknowns in the axis of comparison against the threshold. Also note the positions of the two middle objects in the axis of comparison. Call these positions the upper and lower boundaries.
4. Divide up all the objects into two subsets, according to which side of the average position they lie on.
5. Divide up the objects in each subset, forming a subtree for each, using this procedure.
(A combined sketch of this building procedure and the nearest-neighbor search follows the search procedure below.)
K-D Tree (cont.)
To find the nearest neighbor using the k-d procedure:
1. Determine whether there is only one element in the set under consideration. If there is only one, report it.
2. Otherwise, compare the unknown, in the axis of comparison, against the current node's threshold. The result determines the likely set.
3. Find the nearest neighbor in the likely set using this procedure.
4. Determine whether the distance from the unknown to the nearest neighbor in the likely set is less than or equal to the distance from the unknown to the other set's boundary in the axis of comparison:
- If it is, then report the nearest neighbor in the likely set.
- If it is not, check the unlikely set using this procedure; return the nearer of the nearest neighbors in the likely set and in the unlikely set.
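A minimal Python sketch of both k-d procedures, under two assumptions of my own: the points are two-dimensional, and the other set's boundary is approximated by the threshold itself (which can only trigger extra checks of the unlikely set, never skip a needed one). All names are illustrative, not from the slides.

import math

class Node:
    def __init__(self, axis=None, threshold=None, low=None, high=None, point=None):
        self.axis, self.threshold = axis, threshold  # test: coordinate vs. threshold
        self.low, self.high = low, high              # subtrees for each side
        self.point = point                           # set only at leaves (one case)

def build(points, axis=1):
    # One case: stop (make a leaf).
    if len(points) == 1:
        return Node(point=points[0])
    # Threshold = average position of the two middle objects on this axis.
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    threshold = (points[mid - 1][axis] + points[mid][axis]) / 2
    # Alternate axes: vertical (index 1) first, then the other axis, and so on.
    return Node(axis, threshold,
                build(points[:mid], 1 - axis), build(points[mid:], 1 - axis))

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(node, unknown):
    if node.point is not None:
        return node.point                            # only one element: report it
    # The unknown's side of the threshold determines the likely set.
    likely, unlikely = ((node.high, node.low)
                        if unknown[node.axis] > node.threshold
                        else (node.low, node.high))
    best = nearest(likely, unknown)
    # If the best-so-far is farther away than the other set's boundary
    # (approximated here by the threshold), the unlikely set must be checked too.
    if distance(unknown, best) > abs(unknown[node.axis] - node.threshold):
        rival = nearest(unlikely, unknown)
        if distance(unknown, rival) < distance(unknown, best):
            best = rival
    return best

points = [(1, 4), (2, 6), (3, 1), (4, 5), (5, 2), (6, 7), (7, 3), (8, 6)]
print(nearest(build(points), (5.2, 2.4)))  # -> (5, 2)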
Learning by Building Identification Trees
Identification-tree building is the most widely used learning method. Thousands of practical identification trees, for applications ranging from medical diagnosis to process control, have been built using the method.
From Data to Identification Trees

Name  | Hair   | Height  | Weight  | Lotion | Result
------+--------+---------+---------+--------+----------
Sarah | blonde | average | light   | no     | sunburned
Dana  | blonde | tall    | average | yes    | none
Alex  | brown  | short   | average | yes    | none
Annie | blonde | short   | average | no     | sunburned
Emily | red    | average | heavy   | no     | sunburned
Pete  | brown  | tall    | heavy   | no     | none
John  | brown  | average | heavy   | no     | none
Katie | blonde | short   | light   | yes    | none
From Data to Identification Trees (cont.)
An identification tree is a representation
that is a decision tree
in which
- Each set of possible conclusions is established implicitly by a list of samples of known class.
In the table, there are 3 x 3 x 3 x 2 = 54 possible combinations. The probability of an exact match with someone already observed is 8/54. It can be impractical to classify an unknown object by looking for an exact match.
Identification Tree
[Figure: one possible identification tree over the sample data, rooted at a Height test (tall / average / short) with Weight, Hair-color, and Hair tests at lower levels; its leaves hold the samples Dana, Pete, Sarah, Alex, Katie, Annie, Emily, and John]
Identification Tree (cont.)
The world is inherently simple. Therefore, the smallest identification tree that is consistent with the samples is the one that is most likely to identify unknown objects correctly.
Which is the right identification tree? How can you construct the smallest identification tree?
Tests Should Minimize Disorder
[Figure slides: the sample partitions produced by each candidate first test; for example, the blonde branch of the hair-color test receives four samples: Sarah, Dana, Annie, and Katie]
Information Theory Supplies a Disorder Formula
\[
\text{Average disorder} \;=\; \sum_b \frac{n_b}{n_t}\left(-\sum_c \frac{n_{bc}}{n_b}\,\log_2\frac{n_{bc}}{n_b}\right)
\]
where $n_b$ is the number of samples in branch $b$, $n_t$ is the total number of samples in all branches, and $n_{bc}$ is the number of samples in branch $b$ of class $c$.
Disorder Formula
For two classes, A and B, the disorder of a single branch $b$ is
\[
\text{Disorder} = -\frac{n_{bA}}{n_b}\log_2\frac{n_{bA}}{n_b} \;-\; \frac{n_{bB}}{n_b}\log_2\frac{n_{bB}}{n_b}.
\]
If they are perfectly balanced, that is, $n_{bc}/n_b = 0.5$ for $c = A, B$, then $\text{Disorder} = -0.5\log_2 0.5 - 0.5\log_2 0.5 = 1$.
If there are only A's or only B's (perfect homogeneity), then $\text{Disorder} = 0$ (taking $0\log_2 0 = 0$).
Disorder Measure
As a branch's sample set moves from perfect homogeneity to perfect balance, its disorder varies smoothly between zero and one.
Disorder Measure (cont.)
The first test is chosen by computing the average disorder of each candidate:

Test   | Disorder
-------+---------
Hair   | 0.5
Height | 0.69
Weight | 0.94
Lotion | 0.61

Thus, the hair-color test is the winner.
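The following Python sketch reproduces these numbers from the sample table (the data encoding and function name are mine, not from the slides):

from collections import Counter
from math import log2

# (hair, height, weight, lotion, result) for each person in the table.
samples = [
    ("blonde", "average", "light",   "no",  "sunburned"),
    ("blonde", "tall",    "average", "yes", "none"),
    ("brown",  "short",   "average", "yes", "none"),
    ("blonde", "short",   "average", "no",  "sunburned"),
    ("red",    "average", "heavy",   "no",  "sunburned"),
    ("brown",  "tall",    "heavy",   "no",  "none"),
    ("brown",  "average", "heavy",   "no",  "none"),
    ("blonde", "short",   "light",   "yes", "none"),
]

def average_disorder(samples, attribute):
    """Sum over branches: (n_b / n_t) times that branch's disorder."""
    total = len(samples)
    disorder = 0.0
    for value in {s[attribute] for s in samples}:
        classes = Counter(s[-1] for s in samples if s[attribute] == value)
        n_b = sum(classes.values())
        disorder += (n_b / total) * -sum(
            (n_bc / n_b) * log2(n_bc / n_b) for n_bc in classes.values())
    return disorder

for name, column in [("Hair", 0), ("Height", 1), ("Weight", 2), ("Lotion", 3)]:
    print(name, round(average_disorder(samples, column), 2))
# Hair 0.5, Height 0.69, Weight 0.94, Lotion 0.61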
Disorder Measure (cont.)
Once the hair test is selected, the choice of another test to separate out the sunburned people from among Sarah, Dana, Annie, and Katie is decided by the following calculations:

Test   | Disorder
-------+---------
Height | 0.5
Weight | 1
Lotion | 0

Thus, the lotion-used test is the clear winner.
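These second-round numbers follow by restricting the same computation (reusing samples and average_disorder from the sketch above) to the blonde branch:

blondes = [s for s in samples if s[0] == "blonde"]
for name, column in [("Height", 1), ("Weight", 2), ("Lotion", 3)]:
    print(name, round(average_disorder(blondes, column), 2))
# Height 0.5, Weight 1.0, Lotion 0.0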
Identification Tree Algorithm
To generate an identification tree using SPROUTER:
Until each leaf node is populated by as homogeneous a sample set as possible:
1. Select a leaf node with an inhomogeneous sample set.
2. Replace that leaf node by a test node that divides the inhomogeneous sample set into minimally inhomogeneous subsets, according to some measure of disorder.
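A recursive SPROUTER-style sketch (reusing samples and average_disorder from the earlier sketch; representing the tree as nested tuples is my own choice, not the slides'):

FEATURES = {"hair": 0, "height": 1, "weight": 2, "lotion": 3}

def sprout(samples, unused):
    results = {s[-1] for s in samples}
    if len(results) == 1 or not unused:   # homogeneous leaf (or no tests left)
        return results.pop()
    # Replace the leaf by the test yielding minimally disordered subsets.
    best = min(unused, key=lambda f: average_disorder(samples, FEATURES[f]))
    rest = [f for f in unused if f != best]
    return (best, {value: sprout([s for s in samples
                                  if s[FEATURES[best]] == value], rest)
                   for value in {s[FEATURES[best]] for s in samples}})

tree = sprout(samples, list(FEATURES))
print(tree)
# ('hair', {'blonde': ('lotion', {'yes': 'none', 'no': 'sunburned'}),
#           'red': 'sunburned', 'brown': 'none'})  (branch order may vary)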
From Trees to Rules
If the person's hair color is blonde and the person uses lotion, then nothing happens.
If the person's hair color is blonde and the person uses no lotion, then the person turns red.
If the person's hair color is red, then the person turns red.
If the person's hair color is brown, then nothing happens.
Unnecessary Rule Antecedents Should Be Eliminated
If the person's hair color is blonde and the person uses lotion, then nothing happens.
can be simplified to
If the person uses lotion, then nothing happens.
Contingency Table
To test the first antecedent (is blonde), keep the second antecedent fixed (lotion users only; samples: Dana, Alex, Katie):

                     | No change | Sunburned
Person is blonde     | 2         | 0
Person is not blonde | 1         | 0

The first antecedent can be eliminated.

To test the second antecedent (uses lotion), keep the first antecedent fixed (blondes only; samples: Sarah, Dana, Annie, Katie):

                      | No change | Sunburned
Person uses lotion    | 2         | 0
Person uses no lotion | 0         | 2

The second antecedent cannot be eliminated.
Contingency Table (cont.)
If the person's hair color is blonde and the person does not use lotion, then the person turns red.

To test the first antecedent (is blonde), keep the second antecedent fixed (non-lotion users only; samples: Sarah, Annie, Emily, Pete, John):

                     | No change | Sunburned
Person is blonde     | 0         | 2
Person is not blonde | 2         | 1

The first antecedent cannot be eliminated.

To test the second antecedent (uses no lotion), keep the first antecedent fixed (blondes only; samples: Sarah, Dana, Annie, Katie):

                      | No change | Sunburned
Person uses no lotion | 0         | 2
Person uses lotion    | 2         | 0

The second antecedent cannot be eliminated either.
Contingency Table (cont.)
If the person's hair color is red, then the person turns red. (There is no other antecedent, so all 8 samples are considered.)

                         | No change | Sunburned
Person is red-haired     | 0         | 1
Person is not red-haired | 5         | 2

The antecedent cannot be eliminated.

If the person's hair color is brown, then nothing happens. (There is no other antecedent, so all 8 samples are considered.)

                           | No change | Sunburned
Person is brown-haired     | 3         | 0
Person is not brown-haired | 2         | 3

The antecedent cannot be eliminated.
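A sketch of the contingency-table computation (reusing the samples list from the disorder sketch; column indices follow that encoding, and all helper names are mine):

from collections import Counter

def contingency(samples, fixed, tested):
    """Count results for samples matching the `fixed` antecedents,
    split by whether they also satisfy the `tested` antecedent."""
    rows = {True: Counter(), False: Counter()}
    for s in samples:
        if all(s[i] == v for i, v in fixed):
            rows[all(s[i] == v for i, v in tested)][s[-1]] += 1
    return rows

# Rule: blonde and lotion -> nothing happens. Test the "blonde" antecedent
# while holding "uses lotion" fixed:
print(contingency(samples, fixed=[(3, "yes")], tested=[(0, "blonde")]))
# {True: Counter({'none': 2}), False: Counter({'none': 1})}
# Nobody is sunburned in either row, so the blonde antecedent can go.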
Unnecessary Rules Should Be Eliminated
Rule 1: If the person's hair color is blonde and the person uses no lotion, then the person turns red.
Rule 2: If the person uses lotion, then nothing happens.
Rule 3: If the person's hair color is red, then the person turns red.
Rule 4: If the person's hair color is brown, then nothing happens.
Default Rules and Tie Breakers
A default rule is added, either
Rule 5: If no other rule applies, then the person turns red.
or
Rule 6: If no other rule applies, then nothing happens.
Choose the default rule to minimize the total number of rules.
Tie breaker 1: Choose the default rule that covers the most common consequent in the sample set. Rule 6 is used together with Rules 1 and 3.
Tie breaker 2: Choose the default rule that produces the simplest rules. Rule 5 is used together with Rules 2 and 4.
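Tie breaker 1 can be read directly off the sample set (again reusing samples from the earlier sketch):

from collections import Counter

consequents = Counter(s[-1] for s in samples)
print(consequents.most_common(1))  # [('none', 5)] -> choose Rule 6 as default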
Rule Generation Algorithm
To generate rules from an identification tree using PRUNER:
1. Create one rule for each root-to-leaf path in the identification tree.
2. Simplify each rule by discarding antecedents that have no effect on the conclusion reached by the rule.
3. Replace those rules that share the most common consequent by a default rule that is triggered when no other rule is triggered (eliminating as many other rules as possible). In the event of a tie, use some heuristic tie breaker to choose a default rule.
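A sketch of step 1, extracting one rule per root-to-leaf path from the nested-tuple tree produced by the SPROUTER sketch above (helper names are mine):

def paths_to_rules(tree, antecedents=()):
    if not isinstance(tree, tuple):              # leaf: the rule is complete
        return [(antecedents, tree)]
    feature, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += paths_to_rules(subtree, antecedents + ((feature, value),))
    return rules

for antecedents, consequent in paths_to_rules(tree):
    conditions = " and ".join(f"{f} is {v}" for f, v in antecedents)
    print(f"if {conditions} then {consequent}")
# Four rules in all, e.g.: if hair is blonde and lotion is no then sunburned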