CSCI 347 / CS 4206: Data Mining
Module 05: WEKA
Topic 04: Data Preparation Tools
Explorer: Pre-Processing the Data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called “filters”
- WEKA contains filters for: discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
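To make two of the filter types listed above concrete, here is a minimal Python sketch of min-max normalization and equal-width discretization. It does not use WEKA and is not WEKA's implementation; the function names and the `temperature` values are made up for illustration, and WEKA's own filters offer more options than shown here.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1] (min-max)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Map each numeric value to a bin index 0..bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical numeric attribute, used only for this illustration.
temperature = [64, 65, 68, 69, 70, 71, 72, 75, 80, 85]

scaled = normalize(temperature)
binned = discretize_equal_width(temperature, bins=3)

print(scaled[0], scaled[-1])  # smallest value maps to 0.0, largest to 1.0
print(binned)                 # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
```

With three bins over the range 64 to 85, each bin is 7 units wide, so the values 64 to 70 fall into bin 0, 71 to 75 into bin 1, and 80 and 85 into bin 2.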
WEKA GUI Chooser
Preprocessing Data in WEKA
WEKA Explorer: Attribute Selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
- Very flexible: WEKA allows (almost) arbitrary combinations of these two
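The two-part structure above (an evaluation method plus a search method) can be sketched in plain Python. This example pairs information gain as the evaluator with simple ranking as the search. It is an illustration of the idea rather than WEKA's code: the function names and the tiny nominal dataset are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(column, labels):
    """Evaluation method: reduction in class entropy after splitting
    the instances on one nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(data, labels):
    """Search method: score every attribute on its own, sort best first."""
    scores = {name: info_gain(col, labels) for name, col in data.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two nominal attributes and a class label.
data = {
    "outlook": ["sunny", "sunny", "overcast", "rainy", "rainy", "overcast"],
    "windy":   ["no", "yes", "no", "no", "yes", "yes"],
}
labels = ["no", "no", "yes", "yes", "no", "yes"]

ranking = rank_attributes(data, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

On this toy data, `outlook` ranks first because two of its three values identify the class perfectly, while `windy` leaves both of its subsets mixed. Swapping `info_gain` for another scoring function, or the ranking for a subset search, mirrors the combinations the panel allows.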
Attribute Selection in WEKA
Explorer: Data Visualization
- Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
- WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
- Color-coded class values
- “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
- “Zoom-in” function
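The reason jitter reveals “hidden” points is that instances with identical nominal values are drawn at exactly the same plot position. The sketch below shows the idea in Python by adding a small random offset to each point. It is an assumption-laden illustration: the `jitter` function, the uniform noise, and the sample points are hypothetical, and how WEKA generates its jitter internally is not covered in these slides.

```python
import random


def jitter(points, amount, seed=0):
    """Return a copy of the 2-d points, each shifted by a small random
    offset of at most `amount` along both axes."""
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [(x + rng.uniform(-amount, amount),
             y + rng.uniform(-amount, amount))
            for x, y in points]


# Hypothetical instances with two nominal attributes coded as 0/1.
# Several instances share the same pair of values and would overlap.
points = [(0, 0), (0, 0), (0, 0), (1, 0), (1, 1), (1, 1)]

distinct_before = len(set(points))   # 3 visible positions for 6 instances
jittered = jitter(points, amount=0.2)
distinct_after = len(set(jittered))  # overlapping instances are spread out

print(distinct_before, distinct_after)
```

Keeping `amount` well below the spacing between attribute values keeps each cluster of jittered points close to its true position, so the plot still shows which value each instance has.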
Data Visualization in WEKA
The Mystery Sound
And what would this be?