W E K A Waikato Environment for Knowledge Analysis Branko Kavšek, MPŠ Jožef Stefan, November 2005


1 W E K A Waikato Environment for Knowledge Analysis Branko Kavšek, MPŠ Jožef Stefan, November 2005

2 Goals
– Acquisition of functional knowledge about the WEKA platform
– Ability to process (your own) data in WEKA:
    – identify a problem
    – transform it into data
    – choose an appropriate DM technique
    – apply it to the data
    – evaluate the results
    – interpret the results

3 What is WEKA?
Some basic facts about WEKA:
– WEKA (1) = a flightless bird with an inquisitive nature (found only on the islands of New Zealand)
– WEKA (2) = a software ‘workbench’ incorporating several standard ML/DM techniques
– Authors = Ian H. Witten, Eibe Frank (et al.)
– Programming language = Java
– Origin = The University of Waikato, New Zealand
– Literature = Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999
– Homepage = http://www.cs.waikato.ac.nz/~ml/weka

4 Objectives of WEKA
– make ML/DM techniques generally available
– apply them to practical problems (in agriculture)
– develop new ML/DM algorithms
– contribute to the theoretical framework of the field (ML/DM)

5 Versions of WEKA
There are several versions of WEKA:
– WEKA 3.0: “book version”, compatible with the description in the data mining book
– WEKA 3.2: “first GUI version”, adds graphical user interfaces (the book version is command-line only)
– WEKA 3.5: “development version”, with lots of improvements
This workshop is based on WEKA 3.5(.2).

6 Outline
– WEKA on the WEB
– Transforming data into the “right” format
– Using the “Explorer”
– WEKA from the command line (Simple CLI)
– Knowledge flow in brief
– Performing the experiments
– Tips & tricks
– The PROs and the CONs of WEKA

7 WEKA on the WEB

8 The input to WEKA
ARFF (Attribute-Relation File Format) format: “flat” files
Example: Play-tennis domain
    % this is an example of a knowledge
    % domain in ARFF format
    @relation weather
    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}
    @data
    sunny,85,85,FALSE,no
    sunny,80,90,TRUE,no
    overcast,83,86,FALSE,yes
    rainy,70,96,FALSE,yes
    rainy,68,80,FALSE,yes
    rainy,65,70,TRUE,no
    overcast,64,65,TRUE,yes
    sunny,72,95,FALSE,no
    sunny,69,70,FALSE,yes
    rainy,75,80,FALSE,yes
    sunny,75,70,TRUE,yes
    overcast,72,90,TRUE,yes
    overcast,81,75,FALSE,yes
    ...
Conversion to the ARFF format? Example: converting from MS-EXCEL to ARFF
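The conversion step can be illustrated with a small sketch. This is not WEKA's own converter: the class name CsvToArffSketch and its methods are hypothetical. It assumes a comma-separated export with a header row, no quoted fields and no missing values. A column whose values all parse as numbers is declared real; any other column becomes a nominal attribute listing its distinct values in order of first appearance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative CSV-to-ARFF converter (hypothetical helper, not WEKA's CSVLoader). */
class CsvToArffSketch {

    /** Returns true if the text parses as a floating-point number. */
    static boolean isNumeric(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /**
     * Converts CSV lines (first line = header) into ARFF text.
     * Assumes every row has as many fields as the header.
     */
    static String toArff(String relation, List<String> csvLines) {
        String[] names = csvLines.get(0).split(",");
        List<String[]> rows = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            rows.add(line.split(","));
        }

        StringBuilder out = new StringBuilder("@relation " + relation + "\n");
        for (int col = 0; col < names.length; col++) {
            boolean numeric = true;
            Set<String> values = new LinkedHashSet<>(); // keeps first-seen order
            for (String[] row : rows) {
                String v = row[col].trim();
                values.add(v);
                numeric &= isNumeric(v);
            }
            String type = numeric ? "real" : "{" + String.join(", ", values) + "}";
            out.append("@attribute ").append(names[col].trim())
               .append(' ').append(type).append('\n');
        }

        out.append("@data\n");
        for (String[] row : rows) {
            out.append(String.join(",", row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
            "outlook,temperature,humidity,windy,play",
            "sunny,85,85,FALSE,no",
            "overcast,83,86,FALSE,yes",
            "rainy,65,70,TRUE,no");
        System.out.print(toArff("weather", csv));
    }
}
```

Because nominal values are collected in order of first appearance, the declared value order may differ from a hand-written header such as {TRUE, FALSE}.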

9 Starting WEKA – the GUI

10 A quick tour of the “Explorer”: Preprocess panel
Labelled areas: Domain info. panel, Attributes panel, Status bar, Filters panel, Attribute info. panel, Log file, Attribute visualization panel

11 A quick tour of the “Explorer”: Classify panel
Labelled areas: Classifier panel, Class attribute, Output panel, Test options panel, Result panel

12 A quick tour of the “Explorer”: Visualize panel

13 The command line
Example:
    C:\Temp>java weka.classifiers.trees.J48
    Weka exception: No training file and no object input file given.
General options:
    -t  Sets training file.
    -T  Sets test file. If missing, a cross-validation will be performed on the training data.
    -c  Sets index of class attribute (default: last).
    -x  Sets number of folds for cross-validation (default: 10).
    -s  Sets random number seed for cross-validation (default: 1).
    -m  Sets file with cost matrix.
    -l  Sets model input file.
    -d  Sets model output file.
    -v  Outputs no statistics for training data.
    -o  Outputs statistics only, not the classifier.
    -i  Outputs detailed information-retrieval statistics for each class.
    -k  Outputs information-theoretic statistics.
    -p  Only outputs predictions for test instances.
    -r  Only outputs cumulative margin distribution.
    -z  Only outputs the source representation of the classifier, giving it the supplied name.
    -g  Only outputs the graph representation of the classifier.
Options specific to weka.classifiers.j48.J48:
    -U  Use unpruned tree.
    -C  Set confidence threshold for pruning. (default 0.25)
    -M  Set minimum number of instances per leaf. (default 2)
    -R  Use reduced error pruning.
    -N  Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)
    -B  Use binary splits only.
    -S  Don't perform subtree raising.
    -L  Do not clean up after the tree has been built.
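To make the -x (folds) and -s (seed) options concrete, here is a minimal sketch of how a seeded k-fold split can be formed. It is an illustration only: the class CrossValidationSketch is hypothetical, and WEKA's actual partitioning may differ, for example by stratifying on the class attribute. Each fold serves once as the test set while the remaining folds are used for training.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative seeded k-fold split (hypothetical helper, not WEKA code). */
class CrossValidationSketch {

    /**
     * Shuffles the instance indices 0..numInstances-1 with the given seed
     * and deals them round-robin into numFolds folds.
     */
    static List<List<Integer>> folds(int numInstances, int numFolds, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed)); // same seed, same split

        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < numInstances; i++) {
            result.get(i % numFolds).add(indices.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical dataset of 14 instances; 10 folds and seed 1
        // mirror the defaults listed for -x and -s.
        List<List<Integer>> folds = folds(14, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f + " test indices: " + folds.get(f));
        }
    }
}
```

Fixing the seed makes the split reproducible, which is why results from two runs with the same -s value can be compared directly.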

14 Using the “Simple CLI”

15 The “flow of knowledge”

16 Performing the experiments

17 Tips & tricks
– More memory:
    java -mx100000000 -oss100000000...
– Converting to ARFF and verifying:
    java weka.core.converters.CSVLoader filename.csv > filename.arff
    java weka.core.Instances filename.arff
– Checking available memory: right-click on the status bar
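Outside the GUI, the memory available to the JVM can also be inspected from code. The sketch below uses only the standard java.lang.Runtime API; the class name MemoryCheckSketch is hypothetical. Assuming -mx sets the maximum heap size in bytes (as -Xmx does on current JVMs), the value 100000000 above corresponds to roughly 95 MB.

```java
/** Prints the JVM's heap figures (illustrative helper, not part of WEKA). */
class MemoryCheckSketch {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory: upper limit the heap may grow to
        System.out.println("max heap (MB):   " + toMegabytes(rt.maxMemory()));
        // totalMemory: heap currently reserved by the JVM
        System.out.println("total heap (MB): " + toMegabytes(rt.totalMemory()));
        // freeMemory: unused part of the currently reserved heap
        System.out.println("free heap (MB):  " + toMegabytes(rt.freeMemory()));
    }
}
```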

18 GUI vs. command line
– GUI (+): visualisation of data and (some) models
– GUI (-): not all the parameters can be set (reduced functionality)
– Command line (+): full functionality, batch processing
– Command line (-): only textual visualisation of models, awkward to use
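Batch processing from the command line usually means generating one invocation per dataset. The following sketch only builds the command lines and does not run them. It uses the -t, -x and -C options from the J48 help output in this deck; the class BatchCommandSketch and the file names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Builds (but does not execute) one J48 command line per training file. */
class BatchCommandSketch {

    /** Returns the argument list for a cross-validated J48 run. */
    static List<String> j48Command(String trainingFile, int folds, double confidence) {
        return Arrays.asList(
            "java", "weka.classifiers.trees.J48",
            "-t", trainingFile,                  // training file
            "-x", Integer.toString(folds),       // number of cross-validation folds
            "-C", Double.toString(confidence));  // pruning confidence threshold
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        for (String file : Arrays.asList("weather.arff", "other.arff")) {
            System.out.println(String.join(" ", j48Command(file, 10, 0.25)));
        }
    }
}
```

The printed lines could be pasted into a shell script, or the argument lists handed to java.lang.ProcessBuilder, provided the WEKA classes are on the classpath.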

19 PROs & CONs of WEKA
PROs:
– open source (GNU licence)
– platform-independent (Java)
– easy to use
– (relatively) easy to modify
CONs:
– relatively slow (Java)
– ‘incomplete’ documentation (some GUI features could be explained better)
– some features available only from the command line

20 That’s it !!! Thanks

