CSCI 347 / CS 4206: Data Mining Module 05: WEKA Topic 04: Data Preparation Tools.

2 CSCI 347 / CS 4206: Data Mining Module 05: WEKA Topic 04: Data Preparation Tools

3 Explorer: Pre-Processing the Data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
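To make the filter idea concrete, here is a minimal plain-Java sketch of two of the operations listed above: min-max normalization and equal-width discretization. It does not use or reproduce WEKA's own filter classes. The class name FilterSketch, the method names, and the sample values are assumptions made only for illustration.

```java
import java.util.Arrays;

/** Plain-Java illustration of two filter ideas; hypothetical names, not WEKA code. */
class FilterSketch {

    /** Min-max normalization: rescales the values linearly into [0, 1]. */
    static double[] normalize(double[] values) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant attribute has zero range; map it to 0 to avoid dividing by zero.
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    /** Equal-width discretization: maps each value to a bin index in [0, bins - 1]. */
    static int[] discretize(double[] values, int bins) {
        if (bins < 1) {
            throw new IllegalArgumentException("bins must be >= 1");
        }
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double width = (max - min) / bins;
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int bin = width == 0.0 ? 0 : (int) ((values[i] - min) / width);
            // The maximum value would land one past the end, so clamp it into the last bin.
            out[i] = Math.min(bin, bins - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up numeric attribute values, used only to exercise the two methods.
        double[] temperature = {64, 65, 68, 69, 70, 71, 72, 75, 80, 85};
        System.out.println(Arrays.toString(normalize(temperature)));
        System.out.println(Arrays.toString(discretize(temperature, 3)));
    }
}
```

Both methods return new arrays and leave the input untouched, which mirrors the general idea of a filter producing a transformed copy of the data.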

4 WEKA GUI Chooser (screenshot)

5-24 Preprocessing Data in WEKA (screenshots)

25 WEKA Explorer: Attribute Selection
• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
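As an illustration of pairing an evaluation method with a search method, the following plain-Java sketch scores nominal attributes by information gain and then applies a simple ranking. It is a simplified stand-in, not WEKA's implementation. InfoGainSketch, its methods, and the toy data in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Information-gain evaluation plus ranking; hypothetical names, not WEKA code. */
class InfoGainSketch {

    /** Shannon entropy (in bits) of a list of nominal labels. */
    static double entropy(String[] labels) {
        if (labels.length == 0) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.length;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Evaluation step: information gain of one nominal attribute w.r.t. the class. */
    static double infoGain(String[] attribute, String[] classes) {
        // Group the class labels by the attribute value of each instance.
        Map<String, List<String>> partitions = new HashMap<>();
        for (int i = 0; i < attribute.length; i++) {
            partitions.computeIfAbsent(attribute[i], k -> new ArrayList<>()).add(classes[i]);
        }
        double remainder = 0.0;
        for (List<String> part : partitions.values()) {
            double weight = (double) part.size() / classes.length;
            remainder += weight * entropy(part.toArray(new String[0]));
        }
        return entropy(classes) - remainder;
    }

    /** Search step (ranking): attribute indices ordered by decreasing gain. */
    static Integer[] rank(String[][] attributes, String[] classes) {
        double[] gains = new double[attributes.length];
        Integer[] order = new Integer[attributes.length];
        for (int a = 0; a < attributes.length; a++) {
            gains[a] = infoGain(attributes[a], classes);
            order[a] = a;
        }
        Arrays.sort(order, (x, y) -> Double.compare(gains[y], gains[x]));
        return order;
    }

    public static void main(String[] args) {
        // Toy data, column-wise: one String[] per attribute, one entry per instance.
        String[][] attributes = {
            {"x", "y", "x", "y"},   // attribute 0
            {"a", "a", "b", "b"}    // attribute 1
        };
        String[] classes = {"yes", "yes", "no", "no"};
        System.out.println(Arrays.toString(rank(attributes, classes)));
    }
}
```

Keeping the scoring function separate from the ordering step reflects the slide's point that evaluation and search are independent parts that can be combined.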

26-34 Attribute Selection in WEKA (screenshots)

35 Explorer: Data Visualization
• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
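The jitter idea can be sketched in a few lines of plain Java: instances that share the same nominal value are plotted at the same coordinate and hide one another, so a small random offset is added to each coordinate to pull them apart. This is an assumption-laden illustration, not how WEKA's visualizer is coded. JitterSketch, the uniform noise model, and the seed parameter are all hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Random jitter for plot coordinates; hypothetical names, not WEKA code. */
class JitterSketch {

    /**
     * Returns a copy of the coordinates with uniform random noise added.
     * Each offset lies within `amount` of zero; a fixed seed keeps the result repeatable.
     */
    static double[] jitter(double[] coords, double amount, long seed) {
        Random rng = new Random(seed);
        double[] out = new double[coords.length];
        for (int i = 0; i < coords.length; i++) {
            double noise = (rng.nextDouble() * 2.0 - 1.0) * amount;
            out[i] = coords[i] + noise;
        }
        return out;
    }

    public static void main(String[] args) {
        // Three instances share the nominal value encoded as 1, so they overlap when plotted.
        double[] x = {1, 1, 1, 2};
        System.out.println(Arrays.toString(jitter(x, 0.2, 42L)));
    }
}
```

The jitter is applied only to the plotted copy, so the underlying data values stay unchanged.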

36-44 Data Visualization in WEKA (screenshots)

45 The Mystery Sound
• And what would this be?

