L6. Learning Systems in Java
Necessity of Learning
– No prior knowledge about all of the situations.
– Being able to adapt to changes in the environment.
– Getting better at a task through experience.
Some Forms of Learning
Rote learning
– Copy examples and exactly reproduce the behavior.
Parameter or weight adjustment
– Adjust weight factors over time.
Induction
– A process of learning by example: extract the important characteristics of the problem and generalize to novel situations or inputs.
– The key is that the examples are processed and automatically transformed into a knowledge representation.
– Used for classification or regression (prediction) problems.
Clustering
– Grouping examples and generalizing to new situations.
– Used for data mining.
Learning Paradigms
Supervised learning – programming by example
– The learning agent is trained by showing it examples of the problem state or attributes along with the desired output or action.
– The learning agent makes a prediction based on the inputs; if the output differs from the desired output, the agent is adjusted or adapted to produce the correct output (see the sketch after this list).
– E.g.: the back propagation neural network, a decision tree.
Unsupervised learning
– The learning agent needs to recognize similarities between inputs or to identify features in the input data.
– It partitions the data into groups.
– E.g.: a Kohonen map.
Reinforcement learning
– A type of supervised learning, but the error information is less specific.
– Used when exact prior information about the desired output is not available.
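As a concrete illustration of the supervised, error-driven adjustment described above, here is a minimal Java sketch of a single perceptron-style unit. It is not the back propagation network or Kohonen map mentioned in the examples; the class and method names (Perceptron, predict, train) are illustrative assumptions only.

// Minimal sketch of supervised learning by error-driven weight adjustment
// (a single perceptron-style unit, much simpler than back propagation).
public class Perceptron {
    private final double[] weights;
    private double bias = 0.0;
    private final double learningRate;

    public Perceptron(int numInputs, double learningRate) {
        this.weights = new double[numInputs];
        this.learningRate = learningRate;
    }

    // Predict 1 or 0 from the weighted sum of the inputs.
    public int predict(double[] inputs) {
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return sum >= 0.0 ? 1 : 0;
    }

    // Supervised step: compare the prediction with the desired output
    // and adjust the weights in proportion to the error.
    public void train(double[] inputs, int desired) {
        int error = desired - predict(inputs);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += learningRate * error * inputs[i];
        }
        bias += learningRate * error;
    }
}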
Classifier Systems
Classifier systems
– They were introduced by John Holland as a way to add learning to rule-based systems.
– The learning mechanism is based on a technique known as genetic algorithms.
– The rule base is modified by applying genetic algorithms.
Genetic algorithms
– The rules are represented as binary strings.
– The rules are modified by genetic operators (see the sketch below).
– The evaluation (fitness) function is the key to a genetic algorithm.
– The whole process is based on Darwin's evolutionary principle.
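To make the genetic-algorithm idea concrete, here is a small, hypothetical Java sketch of the two standard genetic operators applied to rules encoded as binary strings, together with a placeholder evaluation function. The class name GeneticOperators and the toy fitness measure are assumptions for illustration, not part of Holland's classifier system code.

import java.util.Random;

// Sketch of genetic operators on rules encoded as binary strings.
// Assumes equal-length, non-empty bit strings such as "101100".
public class GeneticOperators {
    private static final Random random = new Random();

    // Single-point crossover: combine two parent rule strings.
    public static String crossover(String parentA, String parentB) {
        int point = random.nextInt(parentA.length());
        return parentA.substring(0, point) + parentB.substring(point);
    }

    // Mutation: flip each bit with a small probability.
    public static String mutate(String rule, double mutationRate) {
        StringBuilder child = new StringBuilder(rule);
        for (int i = 0; i < child.length(); i++) {
            if (random.nextDouble() < mutationRate) {
                child.setCharAt(i, child.charAt(i) == '0' ? '1' : '0');
            }
        }
        return child.toString();
    }

    // Evaluation (fitness) function, here only a toy placeholder:
    // in a classifier system it would score how well the decoded rule performs.
    public static double evaluate(String rule) {
        return rule.chars().filter(c -> c == '1').count();
    }
}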
Decision Trees
– Learn from example data sets.
– Used as classifiers and prediction models.
– Apply information theory, by Shannon and Weaver (1949).
– The unit of information is a bit; the amount of information in a single binary answer is -log2 P(v), where P(v) is the probability of event v occurring (e.g., a fair coin flip carries -log2(1/2) = 1 bit).
– Information needed for a correct answer, given p positive and n negative examples:
  I(p/(p+n), n/(p+n)) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
– Information still needed in the sub-trees after testing attribute A, summed over its values i:
  Remainder(A) = Σ_i (p_i + n_i)/(p+n) × I(p_i/(p_i + n_i), n_i/(p_i + n_i))
– Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A)
Information Gain (an example)
Suppose that there are a total of 1000 customers, men renew 90 percent of the time, women renew 70 percent, and the customer set is made up half of men and half of women.
What is the information gain from testing whether a customer is male or female?
Gain(Sex) = 1 - [(500/1000) I(450/500, 50/500) + (500/1000) I(350/500, 150/500)]
          = 1 - (0.5) I(0.9, 0.1) - (0.5) I(0.7, 0.3)
          = 1 - 0.5 × 0.469 - 0.5 × 0.881 ≈ 0.32 bits
Now suppose that we had grouped the customers' usage habits into 3 groups: under 4 hours a month, from 4 to 10 hours, and over 10. The customers are evenly split among the three groups. The first group renews at 50 percent, the second at 90 percent, and the third at 100 percent.
What is the information gain from testing on the attribute usage?
Gain(Usage) = 1 - [(1/3) I(1/2, 1/2) + (1/3) I(9/10, 1/10) + (1/3) I(1, 0)]
            = 1 - (1/3) × 1.0 - (1/3) × 0.469 - (1/3) × 0 ≈ 0.51 bits
Conclusion: In building a decision tree, it is better to first split the data on how much connect-time the customer used, since usage has the higher information gain, and then on whether the customer is male or female.
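The arithmetic above can be checked with a few lines of Java that reuse the I(·) formula defined earlier. This GainExample class is only an illustrative cross-check, not part of the DecisionTree code shown on the following slides.

// Standalone check of the worked example: compute I(p, n)
// and the information gains for the Sex and Usage attributes.
public class GainExample {
    // Information needed for a two-class answer with p positive and n negative examples.
    static double info(double p, double n) {
        double pos = p / (p + n), neg = n / (p + n);
        double log2 = Math.log(2);
        double result = 0.0;
        if (pos > 0) result -= pos * Math.log(pos) / log2;
        if (neg > 0) result -= neg * Math.log(neg) / log2;
        return result;
    }

    public static void main(String[] args) {
        // Sex: 500 men (450 renew, 50 do not), 500 women (350 renew, 150 do not)
        double gainSex = 1.0 - 0.5 * info(450, 50) - 0.5 * info(350, 150);
        // Usage: three equal groups renewing at 50%, 90%, and 100%
        double gainUsage = 1.0 - (info(1, 1) + info(9, 1) + info(1, 0)) / 3.0;
        System.out.printf("Gain(Sex)   = %.2f bits%n", gainSex);   // about 0.32
        System.out.printf("Gain(Usage) = %.2f bits%n", gainUsage); // about 0.51
    }
}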
Implementation of a Decision Tree (DecisionTree.txt)

// compute information content,
// given # of pos and neg examples
double computeInfo(int p, int n) {
    double total = p + n;
    double pos = p / total;
    double neg = n / total;
    double temp;
    if ((p == 0) || (n == 0)) {
        temp = 0.0;
    } else {
        temp = (-1.0 * (pos * Math.log(pos) / Math.log(2)))
               - (neg * Math.log(neg) / Math.log(2));
    }
    return temp;
}

// compute the information still needed after splitting
// the examples on the given variable (Remainder(A))
double computeRemainder(Variable variable, Vector examples) {
    int positive[] = new int[variable.labels.size()];
    int negative[] = new int[variable.labels.size()];
    int index = variable.column;
    int classIndex = classVar.column;
    double sum = 0;
    double numValues = variable.labels.size();
    double numRecs = examples.size();
    for (int i = 0; i < numValues; i++) {
        String value = variable.getLabel(i);
        Enumeration e = examples.elements();
        while (e.hasMoreElements()) {
            String record[] = (String[]) e.nextElement(); // get next record
            if (record[index].equals(value)) {
                if (record[classIndex].equals("yes")) {
                    positive[i]++;
                } else {
                    negative[i]++;
                }
            }
        } /* endwhile */
        double weight = (positive[i] + negative[i]) / numRecs;
        double myrem = weight * computeInfo(positive[i], negative[i]);
        sum = sum + myrem;
    } /* endfor */
    return sum;
}
Implementation of a Decision Tree (continued)

// return the variable with the most gain
Variable chooseVariable(Hashtable variables, Vector examples) {
    Enumeration e = variables.elements();
    double gain = 0.0, bestGain = 0.0;
    Variable best = null;
    int counts[];
    counts = getCounts(examples);
    int pos = counts[0];
    int neg = counts[1];
    double info = computeInfo(pos, neg);
    while (e.hasMoreElements()) {
        Variable tempVar = (Variable) e.nextElement();
        gain = info - computeRemainder(tempVar, examples);
        if (gain > bestGain) {
            bestGain = gain;
            best = tempVar;
        }
    }
    return best;
}
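For context, here is a hedged sketch of how chooseVariable() could drive the rest of an ID3-style tree builder. The Node class, the helpers subset() and majorityClass(), and the best.name field are assumptions for illustration; they are not part of the DecisionTree.txt listing above.

// Sketch of recursive tree building: stop when the examples are pure
// or no variables remain, otherwise split on the highest-gain variable.
Node buildTree(Hashtable variables, Vector examples) {
    int counts[] = getCounts(examples);
    if (counts[0] == 0) return new Node("no");   // no positive examples -> leaf
    if (counts[1] == 0) return new Node("yes");  // no negative examples -> leaf
    if (variables.isEmpty()) return new Node(majorityClass(examples));

    Variable best = chooseVariable(variables, examples); // highest information gain
    Node node = new Node(best.name);
    Hashtable remaining = (Hashtable) variables.clone();
    remaining.remove(best.name);

    // One branch per value of the chosen variable, built from the
    // subset of examples that take that value.
    for (int i = 0; i < best.labels.size(); i++) {
        String value = best.getLabel(i);
        Vector subset = subset(examples, best.column, value);
        node.addLink(value, buildTree(remaining, subset));
    }
    return node;
}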
Demo
A decision tree: C:\huang\DAI\L5_2004\learning\learn\appletTest.jpr
Example data: resttree.dat.txt
Variables: alternate, bar, FriSat, hungry, patrons, price, raining, reservation, rtype, waitEstimate, ClassField
Starting DecisionTree
Info = 1.0
waitEstimate gain =
raining gain = 0.0
hungry gain =
price gain =
FriSat gain =
bar gain = 0.0
patrons gain =
alternate gain = 0.0
rtype gain = E-16
reservation gain =
Choosing best variable: patrons
Subset - there are 4 records with patrons = some
Subset - there are 6 records with patrons = full
Info =
waitEstimate gain =
raining gain =
hungry gain =
price gain =
FriSat gain =
bar gain = 0.0
patrons gain = 0.0
alternate gain =
rtype gain =
reservation gain =
Choosing best variable: waitEstimate
Subset - there are 0 records with waitEstimate = 0-10
Subset - there are 2 records with waitEstimate = 30-60
Info = 1.0
waitEstimate gain = 0.0
raining gain = 0.0
hungry gain = 0.0
price gain = 0.0
FriSat gain = 1.0
bar gain = 1.0
patrons gain = 0.0
alternate gain = 0.0
rtype gain = 1.0
reservation gain = 0.0
Choosing best variable: FriSat
Subset - there are 1 records with FriSat = no
Subset - there are 1 records with FriSat = yes
Subset - there are 2 records with waitEstimate = 10-30
Info = 1.0
waitEstimate gain = 0.0
raining gain = 0.0
hungry gain = 0.0
price gain = 1.0
FriSat gain = 0.0
bar gain = 1.0
patrons gain = 0.0
alternate gain = 0.0
rtype gain = 1.0
reservation gain = 1.0
Choosing best variable: price
Subset - there are 1 records with price = $$$
Subset - there are 1 records with price = $
Subset - there are 0 records with price = $$
Subset - there are 2 records with waitEstimate = >60
Subset - there are 2 records with patrons = none

DecisionTree -- classVar = ClassField
Interior node - patrons
  Link - patrons=some
    Leaf node - yes
  Link - patrons=full
    Interior node - waitEstimate
      Link - waitEstimate=0-10
        Leaf node - yes
      Link - waitEstimate=30-60
        Interior node - FriSat
          Link - FriSat=no
            Leaf node - no
          Link - FriSat=yes
            Leaf node - yes
      Link - waitEstimate=10-30
        Interior node - price
          Link - price=$$$
            Leaf node - no
          Link - price=$
            Leaf node - yes
          Link - price=$$
            Leaf node - yes
      Link - waitEstimate=>60
        Leaf node - no
  Link - patrons=none
    Leaf node - no
Stopping DecisionTree - success!
Draw a decision tree!
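Once a tree like the one printed above exists, classifying a new record is a simple walk from the root, following at each interior node the link whose value matches the record. The sketch below assumes a hypothetical Node type with isLeaf(), getChild(), a variable reference, and a label field; it is not the applet's actual API.

// Sketch: classify a record by walking the finished decision tree.
String classify(Node node, String[] record) {
    while (!node.isLeaf()) {
        String value = record[node.variable.column]; // e.g. patrons = "full"
        node = node.getChild(value);                 // follow the matching link
    }
    return node.label;                               // leaf class, e.g. "yes" or "no"
}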
Another Demo
C:\huang\DAI\L5_2004\learning\DecisionTreeApplet_3.20\source\DecisionTreeApplet.html
load: basketball
Algorithm -> set splitting function: gain
References
miniconference/papers/swere.pdf

Suggestion: Make a presentation – Decision tree and a rule base
(Optional) Apply decision tree learning to your rule-based system