Presentation is loading. Please wait.

Presentation is loading. Please wait.

Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.

Similar presentations


Presentation on theme: "Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression."— Presentation transcript:

1 Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression Clustering Association Rules Attribute Selection Data Visualization The Experimenter The Knowledge Flow GUI Conclusions Machine Learning with WEKA

2 6/9/2015University of Waikato2 WEKA: the bird Copyright: Martin Kramer (mkramer@wxs.nl)

3 6/9/2015University of Waikato3 WEKA: the software Machine learning/data mining software written in Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features:  Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods  Graphical user interfaces (incl. data visualization)  Environment for comparing learning algorithms

4 6/9/2015University of Waikato4 WEKA: versions There are several versions of WEKA:  WEKA 3.0: “book version” compatible with description in data mining book  WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)  WEKA 3.3: “development version” with lots of improvements This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

5 6/9/2015University of Waikato5 @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files

6 6/9/2015University of Waikato6 @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files

7 6/9/2015University of Waikato7

8 6/9/2015University of Waikato8

9 6/9/2015University of Waikato9

10 6/9/2015University of Waikato10 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for:  Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

11 6/9/2015University of Waikato11

12 6/9/2015University of Waikato12

13 6/9/2015University of Waikato13

14 6/9/2015University of Waikato14

15 6/9/2015University of Waikato15

16 6/9/2015University of Waikato16

17 6/9/2015University of Waikato17

18 6/9/2015University of Waikato18

19 6/9/2015University of Waikato19

20 6/9/2015University of Waikato20

21 6/9/2015University of Waikato21

22 6/9/2015University of Waikato22

23 6/9/2015University of Waikato23

24 6/9/2015University of Waikato24

25 6/9/2015University of Waikato25

26 6/9/2015University of Waikato26

27 6/9/2015University of Waikato27

28 6/9/2015University of Waikato28

29 6/9/2015University of Waikato29

30 6/9/2015University of Waikato30

31 6/9/2015University of Waikato31

32 6/9/2015University of Waikato32 Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include:  Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … “Meta”-classifiers include:  Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …

33 6/9/2015University of Waikato33

34 6/9/2015University of Waikato34

35 6/9/2015University of Waikato35

36 6/9/2015University of Waikato36

37 6/9/2015University of Waikato37

38 6/9/2015University of Waikato38

39 6/9/2015University of Waikato39

40 6/9/2015University of Waikato40

41 6/9/2015University of Waikato41

42 6/9/2015University of Waikato42

43 6/9/2015University of Waikato43

44 6/9/2015University of Waikato44

45 6/9/2015University of Waikato45

46 6/9/2015University of Waikato46

47 6/9/2015University of Waikato47

48 6/9/2015University of Waikato48

49 6/9/2015University of Waikato49

50 6/9/2015University of Waikato50

51 6/9/2015University of Waikato51

52 6/9/2015University of Waikato52

53 6/9/2015University of Waikato53

54 6/9/2015University of Waikato54

55 6/9/2015University of Waikato55

56 6/9/2015University of Waikato56

57 6/9/2015University of Waikato57

58 6/9/2015University of Waikato58

59 6/9/2015University of Waikato59

60 6/9/2015University of Waikato60

61 6/9/2015University of Waikato61

62 6/9/2015University of Waikato62

63 6/9/2015University of Waikato63

64 6/9/2015University of Waikato64

65 6/9/2015University of Waikato65

66 6/9/2015University of Waikato66

67 6/9/2015University of Waikato67

68 6/9/2015University of Waikato68

69 6/9/2015University of Waikato69

70 6/9/2015University of Waikato70

71 6/9/2015University of Waikato71

72 6/9/2015University of Waikato72

73 6/9/2015University of Waikato73

74 6/9/2015University of Waikato74

75 6/9/2015University of Waikato75

76 6/9/2015University of Waikato76

77 6/9/2015University of Waikato77

78 6/9/2015University of Waikato78

79 6/9/2015University of Waikato79

80 6/9/2015University of Waikato80

81 6/9/2015University of Waikato81

82 6/9/2015University of Waikato82

83 6/9/2015University of Waikato83

84 6/9/2015University of Waikato84

85 6/9/2015University of Waikato85

86 6/9/2015University of Waikato86

87 6/9/2015University of Waikato87

88 6/9/2015University of Waikato88

89 6/9/2015University of Waikato89

90 6/9/2015University of Waikato90

91 6/9/2015University of Waikato91

92 6/9/2015University of Waikato92 Explorer: clustering data WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are:  k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “true” clusters (if given) Evaluation based on loglikelihood if clustering scheme produces a probability distribution

93 6/9/2015University of Waikato93

94 6/9/2015University of Waikato94

95 6/9/2015University of Waikato95

96 6/9/2015University of Waikato96

97 6/9/2015University of Waikato97

98 6/9/2015University of Waikato98

99 6/9/2015University of Waikato99

100 6/9/2015University of Waikato100

101 6/9/2015University of Waikato101

102 6/9/2015University of Waikato102

103 6/9/2015University of Waikato103

104 6/9/2015University of Waikato104

105 6/9/2015University of Waikato105

106 6/9/2015University of Waikato106

107 6/9/2015University of Waikato107

108 6/9/2015University of Waikato108 Explorer: finding associations WEKA contains an implementation of the Apriori algorithm for learning association rules  Works only with discrete data Can identify statistical dependencies between groups of attributes:  milk, butter  bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence

109 6/9/2015University of Waikato109

110 6/9/2015University of Waikato110

111 6/9/2015University of Waikato111

112 6/9/2015University of Waikato112

113 6/9/2015University of Waikato113

114 6/9/2015University of Waikato114

115 6/9/2015University of Waikato115

116 6/9/2015University of Waikato116 Explorer: attribute selection Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts:  A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking  An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible: WEKA allows (almost) arbitrary combinations of these two

117 6/9/2015University of Waikato117

118 6/9/2015University of Waikato118

119 6/9/2015University of Waikato119

120 6/9/2015University of Waikato120

121 6/9/2015University of Waikato121

122 6/9/2015University of Waikato122

123 6/9/2015University of Waikato123

124 6/9/2015University of Waikato124

125 6/9/2015University of Waikato125 Explorer: data visualization Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)  To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function

126 6/9/2015University of Waikato126

127 6/9/2015University of Waikato127

128 6/9/2015University of Waikato128

129 6/9/2015University of Waikato129

130 6/9/2015University of Waikato130

131 6/9/2015University of Waikato131

132 6/9/2015University of Waikato132

133 6/9/2015University of Waikato133

134 6/9/2015University of Waikato134

135 6/9/2015University of Waikato135

136 6/9/2015University of Waikato136

137 6/9/2015University of Waikato137

138 6/9/2015University of Waikato138 Performing experiments Experimenter makes it easy to compare the performance of different learning schemes For classification and regression problems Results can be written into file or database Evaluation options: cross-validation, learning curve, hold-out Can also iterate over different parameter settings Significance-testing built in!

139 6/9/2015University of Waikato139

140 6/9/2015University of Waikato140

141 6/9/2015University of Waikato141

142 6/9/2015University of Waikato142

143 6/9/2015University of Waikato143

144 6/9/2015University of Waikato144

145 6/9/2015University of Waikato145

146 6/9/2015University of Waikato146

147 6/9/2015University of Waikato147

148 6/9/2015University of Waikato148

149 6/9/2015University of Waikato149

150 6/9/2015University of Waikato150

151 6/9/2015University of Waikato151

152 6/9/2015University of Waikato152 The Knowledge Flow GUI New graphical user interface for WEKA Java-Beans-based interface for setting up and running machine learning experiments Data sources, classifiers, etc. are beans and can be connected graphically Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator” Layouts can be saved and loaded again later

153 6/9/2015University of Waikato153

154 6/9/2015University of Waikato154

155 6/9/2015University of Waikato155

156 6/9/2015University of Waikato156

157 6/9/2015University of Waikato157

158 6/9/2015University of Waikato158

159 6/9/2015University of Waikato159

160 6/9/2015University of Waikato160

161 6/9/2015University of Waikato161

162 6/9/2015University of Waikato162

163 6/9/2015University of Waikato163

164 6/9/2015University of Waikato164

165 6/9/2015University of Waikato165

166 6/9/2015University of Waikato166

167 6/9/2015University of Waikato167

168 6/9/2015University of Waikato168

169 6/9/2015University of Waikato169

170 6/9/2015University of Waikato170

171 6/9/2015University of Waikato171

172 6/9/2015University of Waikato172

173 6/9/2015University of Waikato173 Conclusion: try it yourself! WEKA is available at http://www.cs.waikato.ac.nz/ml/weka  Also has a list of projects based on WEKA  WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank,Gabi Schmidberger,Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall,Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang


Download ppt "Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression."

Similar presentations


Ads by Google