Machine Learning with WEKA

Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

- WEKA: A Machine Learning Toolkit
- The Explorer
  - Classification and Regression
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
- The Experimenter
- The Knowledge Flow GUI
- Conclusions

6/21/2015, University of Waikato

WEKA: the bird
Copyright: Martin Kramer

WEKA: the software
- Machine learning/data mining software written in Java (distributed under the GNU General Public License)
- Complements the book “Data Mining” by Witten & Frank
- Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms, and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms

WEKA only deals with “flat” files

  age
  sex { female, ...
  chest_pain_type { typ_angina, asympt, non_anginal, ...
  cholesterol
  exercise_induced_angina { no, ...
  class { present, ...

  63,male,typ_angina,233,no,not_present
  67,male,asympt,286,yes,present
  67,male,asympt,229,yes,present
  38,female,non_anginal,?,no,not_present
  ...

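To make the flat-file idea concrete, here is a minimal sketch in plain Python (not WEKA's own Java loader) that reads a small ARFF-style file. The `@relation`, `@attribute`, and `@data` keywords are standard ARFF. The `parse_arff` helper, the relation name, and the shortened four-attribute header are assumptions made for illustration, because the attribute declarations on the slide are incomplete. A `?` marks a missing value.

```python
def parse_arff(text):
    """Parse a tiny subset of ARFF: numeric and nominal attributes only."""
    attributes = []  # (name, kind); kind is "numeric" or a list of nominal labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":            # missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:                       # nominal label, kept as a string
                    row[name] = value
            rows.append(row)
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [label.strip() for label in spec.strip("{}").split(",")]
            else:
                kind = "numeric"            # assumption: anything else is numeric
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


# Hypothetical, shortened version of the heart-disease example.
EXAMPLE = """\
@relation heart-disease-sketch
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""

attributes, rows = parse_arff(EXAMPLE)
```

Each row comes back as a dictionary keyed by attribute name, so the missing cholesterol value in the second row is available as `None` rather than as the literal `?`.
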
Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called “filters”
- WEKA contains filters for:
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

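Two of the filter types named above, normalization and discretization, can be sketched in a few lines of plain Python. This is only an illustration of the underlying idea, not WEKA's filter implementation: the function names `normalize` and `discretize` are hypothetical, normalization is assumed to mean min-max scaling to [0, 1], and discretization is assumed to use equal-width bins.

```python
def normalize(values):
    """Min-max normalization: rescale numeric values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Equal-width discretization: map each value to a bin index in 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Cholesterol values taken from the example data rows.
cholesterol = [233, 286, 229]
scaled = normalize(cholesterol)
binned = discretize(cholesterol, bins=3)
```

Both functions read the attribute's minimum and maximum from the data they are given, so in practice the same range would have to be reused when transforming new instances.
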
Explorer: building “classifiers”
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
- “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …

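As a small illustration of one scheme family from the list, the sketch below implements an instance-based classifier (1-nearest-neighbour) in plain Python. It is not WEKA's implementation: the `nearest_neighbour` function is hypothetical, Euclidean distance on unscaled attributes is an assumption, and the training instances reuse only the age, cholesterol, and class values of the example data rows.

```python
import math


def nearest_neighbour(train, query):
    """Predict the class of `query` as the class of the closest training instance.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    _, label = min(train, key=lambda inst: math.dist(inst[0], query))
    return label


# (age, cholesterol) -> class, taken from the example data rows.
train = [
    ((63, 233), "not_present"),
    ((67, 286), "present"),
    ((67, 229), "present"),
]

prediction = nearest_neighbour(train, (66, 280))
```

Because the attributes are on different scales, cholesterol dominates the distance here; normalizing the attributes first would usually be a sensible step.
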
Explorer: clustering data
- WEKA contains “clusterers” for finding groups of similar instances in a dataset
- Implemented schemes are:
  - k-Means, EM, Cobweb, X-means, FarthestFirst
- Clusters can be visualized and compared to “true” clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution

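The first scheme in the list, k-means, alternates between assigning instances to their nearest centroid and moving each centroid to the mean of its members. The following plain-Python sketch shows that loop. It is not WEKA's clusterer: the `k_means` function, the fixed iteration count, and the choice of the first k points as initial centroids are all assumptions made to keep the example short and deterministic.

```python
import math


def k_means(points, k, iterations=10):
    """Basic k-means; the first k points serve as the initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignment


# Hypothetical two-dimensional data with two well-separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]
centroids, assignment = k_means(points, k=2)
```

The result depends on the initial centroids: starting from two points in the same group can leave this simple version stuck in a poor solution, which is why practical implementations usually randomize the starting centroids.
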
Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
- Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence

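Support and confidence can be made concrete with a small Apriori-style sketch in plain Python. It is a simplified illustration rather than WEKA's implementation: the function names and the five shopping baskets are hypothetical, support is taken to be a transaction count (as in the "support 2000" example), and a rule is kept when its confidence is at least the given threshold. The confidence of a rule A ⇒ B is the support of A and B together divided by the support of A.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Level-wise search for all itemsets with support >= min_support."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while current:
        counts = {s: support_count(s, transactions) for s in current}
        kept = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(kept)
        size += 1
        # Candidates of the next size are unions of two frequent itemsets.
        current = list({a | b for a in kept for b in kept if len(a | b) == size})
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for each kept rule."""
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                # Every subset of a frequent itemset is itself frequent.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, count, confidence


# Hypothetical market-basket data.
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]
frequent = frequent_itemsets(transactions, min_support=2)
found = list(rules(frequent, min_confidence=0.6))
```

In this toy data the rule milk, butter ⇒ bread, eggs has support 2 (two baskets contain all four items) and confidence 2/3, because three baskets contain both milk and butter.
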
Conclusion: try it yourself!
- WEKA is available at …
  - The site also has a list of projects based on WEKA