Machine Learning with WEKA - a reminder (?)
based on notes by Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

WEKA: A Machine Learning Toolkit
  The Explorer
    Classification and Regression
    Clustering
    Association Rules
    Attribute Selection
    Data Visualization
  The Experimenter
  The Knowledge Flow GUI
  Conclusions
10/25/2015 University of Waikato
WEKA: the bird
Copyright: Martin Kramer
WEKA: the software
Machine learning/data mining software written in Java (distributed under the GNU General Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features:
  Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  Graphical user interfaces (incl. data visualization)
  Environment for comparing learning algorithms
WEKA: versions
There are several versions of WEKA:
  WEKA 3.0: “book version”, compatible with the description in the 1st edition of the data mining book
  WEKA 3.2: “GUI version”, adds graphical user interfaces (the earlier version is command-line only)
  WEKA on SoC Linux and ISS Windows
This talk is based on snapshots of WEKA 3.3, with some extra up-to-date snapshots
  The only changes are “layout” and some extras
WEKA only deals with “flat” files

age
sex { female, ...
chest_pain_type { typ_angina, asympt, non_anginal, ...
cholesterol
exercise_induced_angina { no, ...
class { present, ...

63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
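As a minimal sketch of what “flat” means here: each data row is one instance, with one comma-separated value per attribute, and "?" marks a missing value. The Python below parses the four rows from the slide into dictionaries. The attribute names are taken from the slide; the function name `parse_rows` is hypothetical and is not part of WEKA.

```python
# Attribute names as listed on the slide, in column order.
ATTRIBUTES = ["age", "sex", "chest_pain_type", "cholesterol",
              "exercise_induced_angina", "class"]

# The four data rows shown on the slide.
ROWS = """63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present"""


def parse_rows(text, attributes):
    """Turn comma-separated lines into dicts; '?' becomes None (missing)."""
    instances = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(attributes):
            raise ValueError(
                f"expected {len(attributes)} values, got {len(values)}: {line!r}")
        instances.append({name: (None if value == "?" else value)
                          for name, value in zip(attributes, values)})
    return instances


instances = parse_rows(ROWS, ATTRIBUTES)
# Indices of the rows that contain at least one missing value.
missing = [i for i, inst in enumerate(instances) if None in inst.values()]
print(len(instances), "instances; rows with missing values:", missing)
```

All values are kept as strings in this sketch; converting numeric columns such as age is left out to keep it short.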
Explorer: pre-processing the data
Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
Pre-processing tools in WEKA are called “filters”
However, it may be easier to convert your data to ARFF yourself: write a short program in Python or Java, or simply type the text into WordPad, taking care that the format is right. Doing this also helps you understand the data.
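Converting to ARFF yourself can be sketched in a few lines of Python. This is an illustrative sketch, not a WEKA tool: the function name `csv_to_arff`, the relation name "heart" and the small CSV sample are assumptions made for the example. It assumes the first CSV row holds the column names, that you supply the allowed values for each nominal column, and that every other column is numeric.

```python
import csv
import io


def csv_to_arff(csv_text, relation, nominal):
    """Convert CSV text (first row = header) to ARFF text.

    `nominal` maps a column name to its list of allowed values;
    every other column is declared numeric. Empty cells become '?'.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = ["@relation %s" % relation, ""]
    for name in header:
        if name in nominal:
            values = ",".join(nominal[name])
            lines.append("@attribute %s {%s}" % (name, values))
        else:
            lines.append("@attribute %s numeric" % name)
    lines += ["", "@data"]
    for row in data:
        # ARFF marks a missing value with a question mark.
        lines.append(",".join(cell.strip() or "?" for cell in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample, loosely based on the heart-disease rows.
CSV_TEXT = """age,sex,cholesterol,class
63,male,233,not_present
67,male,286,present
38,female,,not_present
"""

arff = csv_to_arff(
    CSV_TEXT,
    "heart",
    {"sex": ["female", "male"], "class": ["present", "not_present"]},
)
print(arff)
```

Listing the nominal values by hand is deliberate: deciding which columns are nominal and which values they may take is part of understanding the data.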
Explorer: building “classifiers”
Classifiers in WEKA are models for predicting nominal or numeric quantities
Implemented learning schemes include:
  Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
You explore by trying different classifiers to see which works best for your data
WEKA has more…
Clustering data into groups
Finding associations between attributes
Visualisation - online analytical processing
Experimenter to run and compare different learning schemes
Knowledge Flow GUI
3rd-party add-ons: sourceforge.net
WEKA from ISS PC 2009
@relation center centre centerpercent color colour colorpercent english
1,32,3, 0,20,0, UK
0,25,0, 0,12,0, UK
9,27,25, 0,84,0, UK
0,19,0, 0,24,0, UK
0,16,0, 0,14,0, UK
0,16,0, 0,12,0, UK
0,21,0, 0,38,0, UK
0,25,0, 0,34,0, UK
2,26,7, 2,3,40, UK
2,32,5, 1,59,2, UK
31,0,100, 55,0,100, US
61,0,100, 26,0,100, US
24,0,100, 11,0,100, US
12,1,92, 21,4,84, US
8,0,100, 4,2,67, US
10,0,100, 8,0,100, US
19,0,100, 22,0,100, US
14,0,100, 7,0,100, US
14,0,100, 6,0,100, US
8,5,62, 24,0,100, US
@relation center centre centerpercent color colour colorpercent english
10,5,33, 0,20,0, UK
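One way to read the two files above is as a training set (the twenty labelled rows) and a single test row. As a minimal sketch under that assumption, the Python below applies a 1-nearest-neighbour rule, a simple instance-based classifier, using plain Euclidean distance on the six numeric columns with no attribute scaling. This is an illustration only and not WEKA's implementation; the names `TRAIN`, `TEST` and `nearest_label` are hypothetical.

```python
import math

# Training rows from the slide: (center, centre, centerpercent,
# color, colour, colorpercent) followed by the class label.
TRAIN = [
    ((1, 32, 3, 0, 20, 0), "UK"),
    ((0, 25, 0, 0, 12, 0), "UK"),
    ((9, 27, 25, 0, 84, 0), "UK"),
    ((0, 19, 0, 0, 24, 0), "UK"),
    ((0, 16, 0, 0, 14, 0), "UK"),
    ((0, 16, 0, 0, 12, 0), "UK"),
    ((0, 21, 0, 0, 38, 0), "UK"),
    ((0, 25, 0, 0, 34, 0), "UK"),
    ((2, 26, 7, 2, 3, 40), "UK"),
    ((2, 32, 5, 1, 59, 2), "UK"),
    ((31, 0, 100, 55, 0, 100), "US"),
    ((61, 0, 100, 26, 0, 100), "US"),
    ((24, 0, 100, 11, 0, 100), "US"),
    ((12, 1, 92, 21, 4, 84), "US"),
    ((8, 0, 100, 4, 2, 67), "US"),
    ((10, 0, 100, 8, 0, 100), "US"),
    ((19, 0, 100, 22, 0, 100), "US"),
    ((14, 0, 100, 7, 0, 100), "US"),
    ((14, 0, 100, 6, 0, 100), "US"),
    ((8, 5, 62, 24, 0, 100), "US"),
]

# The single row from the second file, without its label.
TEST = (10, 5, 33, 0, 20, 0)


def nearest_label(train, query):
    """Return the label of the training row closest to `query` (1-NN)."""
    _features, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


print(nearest_label(TRAIN, TEST))
```

For the test row above this prints UK, which matches the label given in the second file. The closest training row is 0,16,0, 0,14,0, UK.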