Machine Learning with WEKA
Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

Outline:
- The Explorer: classification and regression, clustering, association rules, attribute selection, data visualization
- The Experimenter
- The Knowledge Flow GUI
- Conclusions
2016/7/10, University of Waikato
WEKA: the bird (photo copyright: Martin Kramer)
WEKA: the software
- Machine learning/data mining software written in Java (distributed under the GNU General Public License)
- Used for research, education, and applications
- Complements the book "Data Mining" by Witten & Frank
- Main features: a comprehensive set of data pre-processing tools, learning algorithms, and evaluation methods; graphical user interfaces (including data visualization); an environment for comparing learning algorithms
WEKA: versions
There are several versions of WEKA:
- WEKA 3.0: the "book version", compatible with the description in the data mining book
- WEKA 3.2: the "GUI version", which adds graphical user interfaces (the book version is command-line only)
- WEKA 3.3: the "development version", with many improvements
This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4).
WEKA only deals with "flat" files. Data is stored in the ARFF format, for example:

@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male }
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina }
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes }
@attribute class { present, not_present }
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
Explorer: pre-processing the data
- Data can be imported from files in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called "filters"
- WEKA contains filters for discretization, normalization, resampling, attribute selection, transforming and combining attributes, and more
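As a toy illustration of what a discretization filter does to a numeric attribute (plain Python, not WEKA code; the function name and bin labels are my own):

```python
# Illustrative sketch of equal-width discretization: map each numeric
# value to one of a fixed number of interval labels.
def discretize_equal_width(values, bins=3):
    """Return a bin label like 'B0', 'B1', ... for each numeric value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid zero width for constant attributes
    labels = []
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        labels.append(f"B{idx}")
    return labels

cholesterol = [233, 286, 229, 199, 340]
print(discretize_equal_width(cholesterol, bins=3))  # ['B0', 'B1', 'B0', 'B0', 'B2']
```

WEKA's actual filters additionally record the bin boundaries so the same mapping can be applied to new data.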
Explorer: building "classifiers"
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, and more
- "Meta"-classifiers include bagging, boosting, stacking, error-correcting output codes, locally weighted learning, and more
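To make the bagging idea above concrete, here is a self-contained sketch (plain Python, not WEKA's implementation; `train_stump` and `bagging` are illustrative names): a weak base learner is trained on bootstrap samples and the resulting models vote.

```python
import random
from collections import Counter

def train_stump(data):
    """One-rule 'decision stump' on a single numeric feature: pick the
    midpoint threshold and orientation with the fewest training errors.
    Expects data sorted by the feature value."""
    best = None
    for i in range(len(data) - 1):
        thr = (data[i][0] + data[i + 1][0]) / 2
        for left in (0, 1):  # class predicted on the 'x <= thr' side
            errs = sum(1 for x, y in data
                       if (left if x <= thr else 1 - left) != y)
            if best is None or errs < best[0]:
                best = (errs, thr, left)
    _, thr, left = best
    return lambda x: left if x <= thr else 1 - left

def bagging(data, n_models=11, seed=0):
    """Train stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    models = [train_stump(sorted(rng.choices(data, k=len(data))))
              for _ in range(n_models)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

examples = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]
clf = bagging(examples)
```

On noisy data the vote of many unstable models is typically more accurate than any single one, which is the point of the meta-scheme.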
(Screenshots: the tree visualizer displays the induced tree as a graph with nodes 0-5.)
(Screenshot: the ROC curve.)
Use a numeric attribute as the output.
M5 model trees and rules: combines a decision tree with linear regression. Use M5P to predict the petal length of an iris flower.
(Screenshots: the tree generates three models; each leaf holds a linear model, Linear Model 1-3.)
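The model-tree idea, a tree with a linear regression model in each leaf, can be sketched with a single split (plain Python; WEKA's M5P grows a full tree and smooths the leaf models; all names here are illustrative):

```python
def fit_line(points):
    """Ordinary least squares for y = b*x + a on (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx else 0.0
    return b, my - b * mx  # slope, intercept

def sse(points, line):
    b, a = line
    return sum((y - (b * x + a)) ** 2 for x, y in points)

def one_split_model_tree(points, candidates):
    """Try each candidate split point; keep the one minimising the
    total squared error of the two leaf models."""
    best = None
    for s in candidates:
        left = [p for p in points if p[0] <= s]
        right = [p for p in points if p[0] > s]
        if not left or not right:
            continue
        ll, rl = fit_line(left), fit_line(right)
        err = sse(left, ll) + sse(right, rl)
        if best is None or err < best[0]:
            best = (err, s, ll, rl)
    _, s, ll, rl = best

    def predict(x):
        b, a = ll if x <= s else rl
        return b * x + a
    return predict

# Piecewise-linear data: y = x below 5, y = 3x - 10 from 5 upward.
pts = [(x, x if x <= 5 else 3 * x - 10) for x in range(11)]
model = one_split_model_tree(pts, candidates=range(1, 10))
```

The sketch recovers both line segments exactly, because a split near x = 5 lets each leaf fit its own regime.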
Right-click the result entry and choose "Visualize classifier errors".
Click a data point to show the data window.
Explorer: clustering data
- WEKA contains "clusterers" for finding groups of similar instances in a dataset
- Implemented schemes include k-means, EM, Cobweb, X-means, and FarthestFirst
- Clusters can be visualized and compared to "true" clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
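A minimal sketch of k-means, the first scheme listed above (1-D and plain Python for brevity; not WEKA's SimpleKMeans):

```python
# Assign each value to the nearest centre, recompute centres as
# cluster means, and repeat until the centres settle.
def kmeans_1d(values, k, iters=20):
    # crude initialisation: evenly spaced seeds from the sorted values
    centers = sorted(values)[::max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

measurements = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.9, 10.1, 10.0]
print(kmeans_1d(measurements, k=3))
```

On this toy data the three centres converge to roughly 1, 5, and 10, matching the three visible groups.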
Set the number of clusters to 3.
(Screenshots: right-click the result entry to visualize the cluster assignments.)
Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
- It works only with discrete data
- It can identify statistical dependencies between groups of attributes, e.g.: milk, butter -> bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
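The two measures Apriori thresholds on, support and confidence, can be computed directly (illustrative plain Python, not WEKA code; the tiny transaction list is made up):

```python
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "butter", "eggs"},
    {"bread", "eggs"},
]

def support(itemset):
    """Number of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(lhs, rhs):
    """Fraction of transactions containing lhs that also contain rhs."""
    return support(lhs | rhs) / support(lhs)

# The rule milk, butter -> bread holds with support 2 and confidence 2/3.
print(support({"milk", "butter", "bread"}),
      confidence({"milk", "butter"}, {"bread"}))
```

Apriori's contribution is not these formulas but the pruning that avoids counting every possible itemset.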
Load the vote data set.
(Screenshots: with minimum confidence 0.9 and minimum support 0.45, Apriori finds 10 rules; the number of rules can be raised, e.g. to 15.)
Explorer: attribute selection
- A panel for investigating which (subsets of) attributes are the most predictive
- Attribute selection methods consist of two parts: a search method (best-first, forward selection, random, exhaustive, genetic algorithm, ranking) and an evaluation method (correlation-based, wrapper, information gain, chi-squared, ...)
- Very flexible: WEKA allows (almost) arbitrary combinations of the two
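As an illustration of one of the evaluation methods, information gain, here is a toy attribute ranking (plain Python, not WEKA code; the data and names are made up):

```python
import math

def entropy(labels):
    n = len(labels)
    counts = {l: labels.count(l) for l in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(rows, attr_index, labels):
    """Class entropy minus the expected entropy after splitting
    on the given attribute."""
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

# (outlook, windy) -> play; outlook predicts the class perfectly here.
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["no", "no", "yes", "yes"]
ranking = sorted(range(2), key=lambda i: info_gain(rows, i, labels), reverse=True)
print(ranking)  # attribute 0 (outlook) ranks first
```

A search method would explore attribute subsets; a ranker like this simply scores each attribute on its own.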
Explorer: data visualization
- Visualization is very useful in practice: for example, it helps to gauge the difficulty of a learning problem
- WEKA can visualize single attributes (1D) and pairs of attributes (2D)
- To do: rotating 3D visualizations (XGobi-style)
- Class values are color-coded
- A "jitter" option deals with nominal attributes (and helps detect "hidden" data points)
- A "zoom-in" function is available
Load the Glass data set.
(Screenshots: changing PointSize and PlotSize in the visualization panel.)
Double-click a plot to enlarge it.
Use the zoom-in function to inspect a region of the plot.
Performing experiments
- The Experimenter makes it easy to compare the performance of different learning schemes
- For classification and regression problems
- Results can be written to a file or a database
- Evaluation options: cross-validation, learning curve, hold-out
- Can also iterate over different parameter settings
- Significance testing is built in!
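The cross-validation option can be sketched as a fold generator (simplified plain Python, without stratification; the function name is my own):

```python
import random

def cross_validation_folds(n_instances, k=10, seed=1):
    """Return k (train_indices, test_indices) pairs covering all instances."""
    idx = list(range(n_instances))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # every k-th shuffled index per fold
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

splits = cross_validation_folds(20, k=5)
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 folds, 16 train / 4 test
```

Each instance appears in exactly one test fold, so every scheme is evaluated on all of the data while never being tested on instances it was trained on.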
1. Add a few data sets. 2. Add a few algorithms.
(Screenshot: the run panel shows the running status and result messages.)
The Knowledge Flow GUI
- A new graphical user interface for WEKA
- A JavaBeans-based interface for setting up and running machine learning experiments
- Data sources, classifiers, etc. are beans and can be connected graphically
- Data "flows" through the components, e.g. "data source" -> "filter" -> "classifier" -> "evaluator"
- Layouts can be saved and loaded again later
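The flow idea, components passing data down a chain, can be mimicked with plain functions (an analogy only, not the actual KnowledgeFlow beans or API; all names are made up):

```python
def data_source():
    return [("sunny", "no"), ("rain", "yes"), ("rain", "yes")]

def uppercase_filter(instances):
    """A trivial 'filter': transform the attribute values."""
    return [(a.upper(), c) for a, c in instances]

def majority_classifier(instances):
    """A trivial 'classifier': remember the majority class."""
    classes = [c for _, c in instances]
    return max(set(classes), key=classes.count)

def evaluator(prediction, instances):
    """A trivial 'evaluator': accuracy of a constant prediction."""
    return sum(1 for _, c in instances if c == prediction) / len(instances)

# "data source" -> "filter" -> "classifier" -> "evaluator"
flow_data = uppercase_filter(data_source())
model = majority_classifier(flow_data)
print(model, evaluator(model, flow_data))
```

In the real GUI the same chain is drawn graphically, and each bean fires its output to whatever is connected downstream.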
Conclusion: try it yourself!
- WEKA is available from the project website, which also has a list of projects based on WEKA
- WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang