Published by Patrick Carpenter. Modified over 8 years ago.

Machine Learning with WEKA
Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

- WEKA: A Machine Learning Toolkit
- The Explorer
  - Classification and Regression
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
- The Experimenter
- The Knowledge Flow GUI
- Conclusions

2016/7/10, University of Waikato

WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)

WEKA: the software
- Machine learning/data mining software written in Java (distributed under the GNU General Public License)
- Used for research, education, and applications
- Complements “Data Mining” by Witten & Frank
- Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms

WEKA: versions
- There are several versions of WEKA:
  - WEKA 3.0: “book version”, compatible with the description in the data mining book
  - WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
  - WEKA 3.3: “development version” with lots of improvements
- This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

WEKA only deals with “flat” files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
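
To make the structure of the flat file concrete, here is a minimal sketch of a reader for the subset of ARFF shown above. It is not WEKA's own loader (WEKA is written in Java); the function name `parse_arff` is hypothetical, and the sketch assumes only `numeric` and nominal attributes, unquoted values, and `?` for missing values.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Assumption: only 'numeric' and nominal '{a, b}' attributes, no quoting.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # blank line or ARFF comment
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # "?" marks a missing value; represent it as None
            rows.append([None if v == "?" else v for v in values])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):              # nominal: list of allowed values
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:                                 # e.g. "numeric"
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    # Convert the numeric columns from strings to floats.
    for row in rows:
        for i, (_, kind) in enumerate(attributes):
            if kind == "numeric" and row[i] is not None:
                row[i] = float(row[i])
    return relation, attributes, rows


SAMPLE = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Each row is one instance and each column one attribute, which is what “flat” means here: there are no nested or relational structures to resolve.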

Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called “filters”
- WEKA contains filters for:
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
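
As a rough illustration of what two of these filters do to a numeric attribute, the sketch below rescales values to [0, 1] and bins them into equal-width intervals. The function names `normalize` and `discretize` are hypothetical and this is not WEKA's implementation; equal-width binning is just one possible discretization strategy, chosen here for brevity.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Map numeric values to equal-width bin indices 0 .. bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Cholesterol values taken from the ARFF example (missing value left out).
cholesterol = [233.0, 286.0, 229.0]
scaled = normalize(cholesterol)
binned = discretize(cholesterol, bins=3)
```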

Explorer: building “classifiers”
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
- “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
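
The idea of a “meta”-classifier, a scheme that wraps another learning scheme, can be sketched with bagging around a simple instance-based learner. This is an illustrative toy under stated assumptions, not WEKA code: the names `one_nearest_neighbour` and `bagging`, the one-dimensional data, and the default of 10 models are all made up for the example.

```python
import random
from collections import Counter


def one_nearest_neighbour(train_xs, train_ys):
    """Base learner: predict the label of the closest training value."""
    def predict(x):
        nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
        return train_ys[nearest]
    return predict


def bagging(learn, xs, ys, n_models=10, seed=1):
    """Train n_models base models on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw n training indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(learn([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]
    return predict


# Made-up, well-separated toy data.
xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
bagged = bagging(one_nearest_neighbour, xs, ys)
```

Because `bagging` only needs a function that turns training data into a predictor, any base learner with that shape can be plugged in, which mirrors how a meta-classifier is combined with a base scheme.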

[Screenshot: ROC curve]
[Screenshot: use a numeric attribute as output]

M5 model trees and rules
- Combine a decision tree with linear regression
- Example: use M5P to predict the petal length of an iris flower
[Screenshot: generate 3 models (Model 1, Model 2, Model 3)]
[Screenshot: Linear Model 1, Linear Model 2, Linear Model 3]
[Screenshot: right click, “Visualize classifier error”]
[Screenshot: click a data point to show the data window]

Explorer: clustering data
- WEKA contains “clusterers” for finding groups of similar instances in a dataset
- Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst
- Clusters can be visualized and compared to “true” clusters (if given)
- Evaluation is based on log likelihood if the clustering scheme produces a probability distribution
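
For readers unfamiliar with k-Means, the following is a minimal sketch of the basic algorithm (alternating assignment and centroid update). It is not WEKA's clusterer: the function names are hypothetical, the initial centroids are simply the first k points (an assumption made to keep the example deterministic), and the data points are made up.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100):
    """Basic k-means; returns (centroids, assignment of each point to a cluster)."""
    centroids = list(points[:k])       # assumption: seed with the first k points
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:   # converged: nothing moved
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Made-up 2D data with two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
centroids, assignment = k_means(points, k=2)
```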
[Screenshot: enter 3 clusters]
[Screenshot: right click]

Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
- Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
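
The terms support and confidence can be made concrete with a small level-wise (Apriori-style) search. This is a simplified sketch, not WEKA's implementation: the function names and the five shopping baskets are made up, support is an absolute count, and the confidence threshold is applied with `>=` for simplicity.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, pruning infrequent ones."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        counts = {c: support_count(c, transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size = len(next(iter(level))) + 1 if level else 0
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Keep a candidate only if all its smaller subsets are frequent.
            if len(union) == size and all(
                frozenset(s) in level for s in combinations(union, size - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


def rules(frequent, min_confidence):
    """Yield (lhs, rhs, confidence, support) for rules meeting the threshold."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                confidence = count / frequent[lhs]   # support(all) / support(lhs)
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence, count))
    return found


# Made-up baskets for illustration.
baskets = [
    frozenset({"milk", "butter", "bread", "eggs"}),
    frozenset({"milk", "butter", "bread", "eggs"}),
    frozenset({"milk", "butter", "bread"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "eggs"}),
]
frequent = frequent_itemsets(baskets, min_support=2)
strong_rules = rules(frequent, min_confidence=0.9)
```

In this toy data, "butter ⇒ milk, bread" holds in every basket that contains butter, so its confidence is 1.0, while "milk, butter ⇒ bread, eggs" holds in only two of three such baskets and falls below the 0.9 threshold.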
[Screenshot: load the vote data set]
[Screenshot: expand the window]
[Screenshot: change parameters (minimum confidence = 0.9, minimum support = 0.45); 10 rules]
[Screenshot: minimum confidence, minimum support; set number of rules = 15]
[Screenshot: 15 rules]

Explorer: attribute selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
- Very flexible: WEKA allows (almost) arbitrary combinations of these two
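
One such combination, ranking as the search method and information gain as the evaluation method, can be sketched in a few lines. The function names and the four-instance data set are made up for illustration (attribute names are borrowed from the heart-disease example); this is not how WEKA implements it.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy after splitting on a nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Ranking search: order attribute names by decreasing information gain."""
    return sorted(columns, key=lambda name: information_gain(columns[name], labels),
                  reverse=True)


# Made-up toy data: one attribute predicts the class perfectly, one not at all.
labels = ["present", "present", "not_present", "not_present"]
columns = {
    "sex": ["male", "female", "male", "female"],
    "exercise_induced_angina": ["yes", "yes", "no", "no"],
}
ranking = rank_attributes(columns, labels)
```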

Explorer: data visualization
- Visualization is very useful in practice: e.g. it helps to determine the difficulty of the learning problem
- WEKA can visualize single attributes (1D) and pairs of attributes (2D)
  - To do: rotating 3D visualizations (Xgobi-style)
- Color-coded class values
- “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
- “Zoom-in” function
[Screenshot: load the Glass data set]
[Screenshot: change PointSize]
[Screenshot: change PlotSize]
[Screenshot: PlotSize is changed]
[Screenshot: double click to enlarge]
[Screenshot: zoom-in]

Performing experiments
- The Experimenter makes it easy to compare the performance of different learning schemes
- For classification and regression problems
- Results can be written to a file or database
- Evaluation options: cross-validation, learning curve, hold-out
- Can also iterate over different parameter settings
- Significance testing built in!
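
The kind of comparison the Experimenter automates can be sketched as follows: run two learning schemes through the same cross-validation folds and compute a paired t-statistic on the per-fold accuracies. Everything here is an illustrative assumption (function names, the toy one-dimensional data, the two toy learners, unshuffled folds), and the plain paired t-statistic shown is not necessarily the exact test WEKA applies.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation (no shuffling)."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def cross_validate(learn, xs, ys, k):
    """Return the accuracy of `learn` on each of the k test folds."""
    accuracies = []
    for train, test in k_fold_indices(len(xs), k):
        predict = learn([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(1 for i in test if predict(xs[i]) == ys[i])
        accuracies.append(correct / len(test))
    return accuracies


def majority_class(train_xs, train_ys):
    """Baseline learner: always predict the most common training label."""
    label = Counter(train_ys).most_common(1)[0][0]
    return lambda x: label


def one_nearest_neighbour(train_xs, train_ys):
    """Instance-based learner: predict the label of the closest training value."""
    def predict(x):
        nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
        return train_ys[nearest]
    return predict


def paired_t_statistic(scores_a, scores_b):
    """t = mean(d) / (sd(d) / sqrt(n)) over the per-fold differences d."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Made-up data: values 1..6 belong to class "a", 7..12 to class "b".
xs = list(range(1, 13))
ys = ["a" if x <= 6 else "b" for x in xs]
acc_1nn = cross_validate(one_nearest_neighbour, xs, ys, k=3)
acc_majority = cross_validate(majority_class, xs, ys, k=3)
t_value = paired_t_statistic(acc_1nn, acc_majority)
```

Using the same folds for both schemes is what makes the test paired: each difference compares the two schemes on identical test instances.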
[Screenshot: 1. Add a few data sets; 2. Add a few algorithms]
[Screenshot: result message and running status]

The Knowledge Flow GUI
- New graphical user interface for WEKA
- Java-Beans-based interface for setting up and running machine learning experiments
- Data sources, classifiers, etc. are beans and can be connected graphically
- Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
- Layouts can be saved and loaded again later

Conclusion: try it yourself!
- WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  - Also has a list of projects based on WEKA
- WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang