Download presentation
Presentation is loading. Please wait.
Published byDustin Hodges Modified over 9 years ago
1
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Flow GUI Conclusions Machine Learning with WEKA
2
11/10/2015University of Waikato2 WEKA: the bird Copyright: Martin Kramer (mkramer@wxs.nl)
3
11/10/2015University of Waikato3 WEKA: the software Machine learning/data mining software written in Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features: Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods Graphical user interfaces (incl. data visualization) Environment for comparing learning algorithms
4
11/10/2015University of Waikato4 WEKA: versions There are several versions of WEKA: WEKA 3.0: “book version” compatible with description in data mining book WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only) WEKA 3.3: “development version” with lots of improvements This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
5
11/10/2015University of Waikato5 @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files
6
11/10/2015University of Waikato6 @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files
7
11/10/2015University of Waikato7
8
11/10/2015University of Waikato8
9
11/10/2015University of Waikato9
10
11/10/2015University of Waikato10 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
11
11/10/2015University of Waikato11
12
11/10/2015University of Waikato12
13
11/10/2015University of Waikato13
14
11/10/2015University of Waikato14
15
11/10/2015University of Waikato15
16
11/10/2015University of Waikato16
17
11/10/2015University of Waikato17
18
11/10/2015University of Waikato18
19
11/10/2015University of Waikato19
20
11/10/2015University of Waikato20
21
11/10/2015University of Waikato21
22
11/10/2015University of Waikato22
23
11/10/2015University of Waikato23
24
11/10/2015University of Waikato24
25
11/10/2015University of Waikato25
26
11/10/2015University of Waikato26
27
11/10/2015University of Waikato27
28
11/10/2015University of Waikato28
29
11/10/2015University of Waikato29
30
11/10/2015University of Waikato30
31
11/10/2015University of Waikato31
32
11/10/2015University of Waikato32 Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … “Meta”-classifiers include: Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
33
11/10/2015University of Waikato33
34
11/10/2015University of Waikato34
35
11/10/2015University of Waikato35
36
11/10/2015University of Waikato36
37
11/10/2015University of Waikato37
38
11/10/2015University of Waikato38
39
11/10/2015University of Waikato39
40
11/10/2015University of Waikato40
41
11/10/2015University of Waikato41
42
11/10/2015University of Waikato42
43
11/10/2015University of Waikato43
44
11/10/2015University of Waikato44
45
11/10/2015University of Waikato45
46
11/10/2015University of Waikato46
47
11/10/2015University of Waikato47
48
11/10/2015University of Waikato48
49
11/10/2015University of Waikato49
50
11/10/2015University of Waikato50
51
11/10/2015University of Waikato51
52
11/10/2015University of Waikato52
53
11/10/2015University of Waikato53
54
11/10/2015University of Waikato54
55
11/10/2015University of Waikato55
56
11/10/2015University of Waikato56
57
11/10/2015University of Waikato57
58
11/10/2015University of Waikato58
59
11/10/2015University of Waikato59
60
11/10/2015University of Waikato60
61
11/10/2015University of Waikato61
62
11/10/2015University of Waikato62
63
11/10/2015University of Waikato63
64
11/10/2015University of Waikato64
65
11/10/2015University of Waikato65
66
11/10/2015University of Waikato66
67
11/10/2015University of Waikato67
68
11/10/2015University of Waikato68
69
11/10/2015University of Waikato69
70
11/10/2015University of Waikato70
71
11/10/2015University of Waikato71
72
11/10/2015University of Waikato72
73
11/10/2015University of Waikato73
74
11/10/2015University of Waikato74
75
11/10/2015University of Waikato75
76
11/10/2015University of Waikato76
77
11/10/2015University of Waikato77
78
11/10/2015University of Waikato78
79
11/10/2015University of Waikato79
80
11/10/2015University of Waikato80
81
11/10/2015University of Waikato81
82
11/10/2015University of Waikato82
83
11/10/2015University of Waikato83
84
11/10/2015University of Waikato84
85
11/10/2015University of Waikato85
86
11/10/2015University of Waikato86
87
11/10/2015University of Waikato87
88
11/10/2015University of Waikato88
89
11/10/2015University of Waikato89
90
11/10/2015University of Waikato90
91
11/10/2015University of Waikato91
92
11/10/2015University of Waikato92 Explorer: clustering data WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true” clusters (if given) Evaluation based on loglikelihood if clustering scheme produces a probability distribution
93
11/10/2015University of Waikato93
94
11/10/2015University of Waikato94
95
11/10/2015University of Waikato95
96
11/10/2015University of Waikato96
97
11/10/2015University of Waikato97
98
11/10/2015University of Waikato98
99
11/10/2015University of Waikato99
100
11/10/2015University of Waikato100
101
11/10/2015University of Waikato101
102
11/10/2015University of Waikato102
103
11/10/2015University of Waikato103
104
11/10/2015University of Waikato104
105
11/10/2015University of Waikato105
106
11/10/2015University of Waikato106
107
11/10/2015University of Waikato107
108
11/10/2015University of Waikato108 Explorer: finding associations WEKA contains an implementation of the Apriori algorithm for learning association rules Works only with discrete data Can identify statistical dependencies between groups of attributes: milk, butter bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence
109
11/10/2015University of Waikato109
110
11/10/2015University of Waikato110
111
11/10/2015University of Waikato111
112
11/10/2015University of Waikato112
113
11/10/2015University of Waikato113
114
11/10/2015University of Waikato114
115
11/10/2015University of Waikato115
116
11/10/2015University of Waikato116 Explorer: attribute selection Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts: A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible: WEKA allows (almost) arbitrary combinations of these two
117
11/10/2015University of Waikato117
118
11/10/2015University of Waikato118
119
11/10/2015University of Waikato119
120
11/10/2015University of Waikato120
121
11/10/2015University of Waikato121
122
11/10/2015University of Waikato122
123
11/10/2015University of Waikato123
124
11/10/2015University of Waikato124
125
11/10/2015University of Waikato125 Explorer: data visualization Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function
126
11/10/2015University of Waikato126
127
11/10/2015University of Waikato127
128
11/10/2015University of Waikato128
129
11/10/2015University of Waikato129
130
11/10/2015University of Waikato130
131
11/10/2015University of Waikato131
132
11/10/2015University of Waikato132
133
11/10/2015University of Waikato133
134
11/10/2015University of Waikato134
135
11/10/2015University of Waikato135
136
11/10/2015University of Waikato136
137
11/10/2015University of Waikato137
138
11/10/2015University of Waikato138 Conclusion: try it yourself! WEKA is available at http://www.cs.waikato.ac.nz/ml/weka Also has a list of projects based on WEKA WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank,Gabi Schmidberger,Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall,Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.