CS 8520: Artificial Intelligence
Weka Lab
Paula Matuszek
Spring, 2013
CSC 8520 Spring 2013. Paula Matuszek

Weka is
- Waikato Environment for Knowledge Analysis
- Machine learning software suite from the University of Waikato
- Under development for 20 years
- Well-developed, maintained, supported
- Open source
- Windows, Mac and Unix versions
- http://www.cs.waikato.ac.nz/ml/weka/index.html
- Lots of help available at the wiki: http://weka.wikispaces.com/

ROC Curve
- {Receiver|Relative} Operating Characteristic curve
- Name derives from signal detection theory
- Plots sensitivity (true positive rate) on the Y axis against 1 - specificity (false positive rate) on the X axis
- Ideal is the point (0, 1). Random guessing falls on the diagonal, e.g. (0.5, 0.5) in a balanced domain
- Useful for:
  - evaluating a classifier
  - comparing classifiers
  - setting cutoffs for class membership

http://en.wikipedia.org/wiki/File:ROC_space-2.png

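To make the ROC construction concrete, here is a minimal sketch in plain Python (not Weka code) that sweeps the cutoff over a classifier's scores and records one (1 - specificity, sensitivity) point per cutoff. The function names `roc_points` and `auc` and the sample labels and scores are made up for illustration, and the sketch assumes a binary problem with at least one positive and one negative instance.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier confidence that each instance is positive.
    Assumes at least one positive and one negative instance.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    # Rank instances from most to least confidently positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # cutoff above every score: nothing called positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Lower the cutoff past all instances that share this score.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scores, for illustration only.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(example_labels, example_scores):
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
```

A curve that passes through (0, 1) corresponds to a classifier that ranks every positive above every negative. The area under the curve is one common way to compare two classifiers with a single number.
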
More Weka
- Last week: cross-validated decision tree.
- Go through section 4.2 of the tutorial.
- What data set did you use?
- Which classifier did better based on the confusion matrix?
- What about the ROC curve?

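When comparing classifiers by confusion matrix, it can help to reduce each matrix to a few rates. The sketch below is plain Python, not Weka output. The function name `confusion_metrics`, the variable names, and all counts are hypothetical, and it assumes a binary problem in which both classes are present.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Summarize a 2x2 confusion matrix.

    tp, fn: actual positives classified as positive / negative.
    fp, tn: actual negatives classified as positive / negative.
    Assumes both classes are present (no division by zero).
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts for two classifiers on the same 100 instances.
classifier_a = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
classifier_b = confusion_metrics(tp=45, fn=5, fp=10, tn=40)

for name, metrics in [("A", classifier_a), ("B", classifier_b)]:
    # Each confusion matrix is a single point on the ROC plot.
    roc_point = (1 - metrics["specificity"], metrics["sensitivity"])
    print(name, metrics, "ROC point:", roc_point)
```

In this made-up example both classifiers have the same accuracy, but one favors specificity and the other sensitivity. Which did "better" depends on which kind of error matters more for your data set.
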
Trying a Support Vector Classifier
- SMO is a support vector classifier: http://weka.sourceforge.net/doc/weka/classifiers/functions/SMO.html
- libSVM is a faster SVM, but it is not installed with Weka; Weka includes only a wrapper for it.

Decision Tree vs SMO
- Repeat section 4.2, replacing the RandomForest classifier with SMO.
- What were the results for your data source?

Moving on to the Weka Explorer
- Explore some of the data sets included with Weka.
- Restart Weka, using the Explorer instead of the KnowledgeFlow.
- Make sure the Preprocess step is highlighted.
- Use the Open File option to look at some of the data sets.
- Choose one that is binary (usually there is a feature just labeled "class") and that looks interesting.

Exploring with Weka
- We will go through a different tutorial, which uses the Explorer interface.
- The tutorial is at http://www.ibm.com/developerworks/opensource/library/os-weka2/index.html
- It uses data that can be downloaded from the Download section, about 2/3 of the way down the page.

Decision Tree Again
- The first part of the tutorial creates a decision tree using J48, as in the KnowledgeFlow tutorial.
- This should give exactly the same results as the KnowledgeFlow approach; it’s just a different interface. Which did you find easier?
- Try it on the data set you chose earlier. How well did it do?

Clustering
- The second part of the tutorial uses the SimpleKMeans clustering algorithm.
- Try it on the sample data they provide. Do the results for their data make sense?
- Set the number of clusters to 2 and try it on the data set you chose. Do the results make sense? Do the two clusters match the two classes in your data?
- Try it again after removing the “class” feature. Do you still get reasonable results?

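The question of whether two clusters match two classes can be sketched outside Weka as well. The following is a minimal k-means in plain Python, not Weka's SimpleKMeans implementation. The data rows, the helper names, and the choice of taking the first and last rows as starting centroids are assumptions made for illustration; a real implementation would typically choose starting centroids at random from a seed. Note that the class label is left out of the features, which mirrors clustering with the “class” feature removed.

```python
from collections import Counter


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iterations=100):
    """Basic k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        assignments = assign(points, centroids)
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical data: two numeric features plus a class label.
rows = [
    (1.0, 1.0, "a"), (1.2, 0.8, "a"), (0.9, 1.1, "a"),
    (5.0, 5.0, "b"), (5.1, 4.9, "b"), (4.8, 5.2, "b"),
]
features = [(x, y) for x, y, _ in rows]  # class label deliberately excluded
classes = [label for _, _, label in rows]

# Assumption: start from the first and last rows instead of a random seed.
clusters, centroids = kmeans(features, [features[0], features[-1]])

# Cross-tabulate cluster number against class label.
crosstab = Counter(zip(clusters, classes))
for (cluster, label), count in sorted(crosstab.items()):
    print(f"cluster {cluster}, class {label}: {count}")
```

If each cluster contains instances of only one class, the clusters match the classes. Cluster numbers are arbitrary, so cluster 0 may correspond to either class.
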
Explore!
- Go ahead and try some of the other capabilities in Weka.