Published by Ophelia Maxwell. Modified over 8 years ago.
1
WEKA: Waikato Environment for Knowledge Analysis
2
Goals of the workshop
– Acquire functional knowledge of the WEKA platform
– Be able to process (your own) data in WEKA
– Write a seminar paper:
    – identify a problem
    – transform it into data
    – choose an appropriate DM technique
    – apply it to the data
    – evaluate and interpret the results
3
What is WEKA?
Some basic facts about WEKA:
– WEKA (1) = a flightless bird with an inquisitive nature (found only on the islands of New Zealand)
– WEKA (2) = a software ‘workbench’ incorporating several standard ML/DM techniques
– Authors = Ian H. Witten, Eibe Frank (et al.)
– Programming language = Java
– Origin = The University of Waikato, New Zealand
– Literature = Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999
– Homepage = http://www.cs.waikato.ac.nz/~ml/weka
4
Objectives of WEKA
– make ML/DM techniques generally available
– apply them to practical problems (in agriculture)
– develop new ML/DM algorithms
– contribute to the theoretical framework of the field (ML/DM)
5
Versions of WEKA
There are several versions of WEKA:
– WEKA 3.0: “book version”, compatible with the description in the data mining book
– WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
– WEKA 3.4: “development version”, with lots of improvements
This workshop is based on WEKA 3.4 (specifically 3.4.3).
6
The input to WEKA
ARFF format (“flat” files). Example: the play-tennis domain

    % this is an example of a knowledge
    % domain in ARFF format
    @relation weather

    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}

    @data
    sunny,85,85,FALSE,no
    sunny,80,90,TRUE,no
    overcast,83,86,FALSE,yes
    rainy,70,96,FALSE,yes
    rainy,68,80,FALSE,yes
    rainy,65,70,TRUE,no
    overcast,64,65,TRUE,yes
    sunny,72,95,FALSE,no
    sunny,69,70,FALSE,yes
    rainy,75,80,FALSE,yes
    sunny,75,70,TRUE,yes
    overcast,72,90,TRUE,yes
    overcast,81,75,FALSE,yes
    ...

Conversion to the ARFF format? Example: converting from MS Excel to ARFF
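One possible route from a spreadsheet to ARFF is to export the sheet as CSV and generate the ARFF header from the columns. The sketch below is only an illustration under stated assumptions: the first CSV row holds the attribute names, a column is declared `real` when every value parses as a number and nominal otherwise, and missing values or values that need quoting are not handled. The function name `csv_to_arff` and the sample data are hypothetical, not part of WEKA.

```python
import csv
import io


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row first) into an ARFF string.

    Assumption: numeric columns become 'real', all others nominal.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data]
        if all(is_number(v) for v in column):
            lines.append("@attribute %s real" % name)
        else:
            # Nominal attribute: distinct values in order of first appearance.
            values = list(dict.fromkeys(column))
            lines.append("@attribute %s {%s}" % (name, ", ".join(values)))
    lines.append("")
    lines.append("@data")
    lines.extend(",".join(row) for row in data)
    return "\n".join(lines) + "\n"


# A few rows of the play-tennis data, as they might look after a CSV export.
sample = """outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
"""
arff = csv_to_arff(sample, "weather")
print(arff)
```

Nominal values are listed in the order they first appear in the data, so the declared value sets may be ordered differently from a hand-written header.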
7
Starting WEKA – the GUI
8
A quick tour of the “explorer”: Preprocess panel
Areas of the panel:
– Domain info. panel
– Attributes panel
– Status bar
– Filters panel
– Attribute info. panel
– Log file
– Attribute visualization panel
9
A quick tour of the “explorer”: Classify panel
Areas of the panel:
– Classifier panel
– Class attribute
– Output panel
– Test options panel
– Result panel
10
A quick tour of the “explorer”: Visualize panel
11
The command line
Example:

    C:\Temp>java weka.classifiers.trees.J48

    Weka exception: No training file and no object input file given.

    General options:

    -t  Sets training file.
    -T  Sets test file. If missing, a cross-validation will be performed on the training data.
    -c  Sets index of class attribute (default: last).
    -x  Sets number of folds for cross-validation (default: 10).
    -s  Sets random number seed for cross-validation (default: 1).
    -m  Sets file with cost matrix.
    -l  Sets model input file.
    -d  Sets model output file.
    -v  Outputs no statistics for training data.
    -o  Outputs statistics only, not the classifier.
    -i  Outputs detailed information-retrieval statistics for each class.
    -k  Outputs information-theoretic statistics.
    -p  Only outputs predictions for test instances.
    -r  Only outputs cumulative margin distribution.
    -z  Only outputs the source representation of the classifier, giving it the supplied name.
    -g  Only outputs the graph representation of the classifier.

    Options specific to weka.classifiers.j48.J48:

    -U  Use unpruned tree.
    -C  Set confidence threshold for pruning. (default 0.25)
    -M  Set minimum number of instances per leaf. (default 2)
    -R  Use reduced error pruning.
    -N  Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)
    -B  Use binary splits only.
    -S  Don't perform subtree raising.
    -L  Do not clean up after the tree has been built.
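To show how the options listed above combine, here is a small Python sketch that assembles a J48 call from `-t`, `-T`, `-x` and `-C`. It is a hedged illustration, not part of WEKA: the helper names `j48_command` and `run` and the file name `weather.arff` are hypothetical, and actually running the command assumes that Java is installed and that the WEKA classes are on the Java classpath.

```python
import subprocess


def j48_command(train_file, folds=10, confidence=0.25, test_file=None):
    """Build the argument list for a J48 run from options in the help text."""
    cmd = ["java", "weka.classifiers.trees.J48", "-t", train_file]
    if test_file is not None:
        # With a test file, WEKA evaluates on it instead of cross-validating.
        cmd += ["-T", test_file]
    else:
        cmd += ["-x", str(folds)]
    cmd += ["-C", str(confidence)]
    return cmd


def run(cmd):
    """Run the command and return its text output (requires Java and WEKA)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


cmd = j48_command("weather.arff", folds=5)
print(" ".join(cmd))
```

The sketch only prints the command; calling `run(cmd)` would execute it on a machine where WEKA is set up.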
12
GUI vs. command line
– GUI (+): visualisation of data and (some) models
– GUI (-): not all the parameters can be set (reduced functionality)
– Command line (+): full functionality (‘saving the model’), batch processing
– Command line (-): only textual visualisation of models, awkward to use
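The batch-processing and model-saving advantages of the command line can be sketched as a loop that prepares one call per data set, using `-t` for the training file and `-d` for the model output file. This is a minimal illustration under assumptions: the helper `batch_commands`, the input file names and the `.model` naming scheme are hypothetical, and the commands are only built here, not executed.

```python
import os


def batch_commands(train_files, classifier="weka.classifiers.trees.J48"):
    """Build one command per training file, saving each model next to its data."""
    commands = []
    for path in train_files:
        # Hypothetical naming scheme: weather.arff -> weather.model
        model = os.path.splitext(path)[0] + ".model"
        commands.append(["java", classifier, "-t", path, "-d", model])
    return commands


commands = batch_commands(["weather.arff", "iris.arff"])
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line could be pasted into a shell or batch file, or passed to a process runner, on a machine where Java and WEKA are available.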
13
PROs & CONs of WEKA
PROs:
– open source (GNU General Public License)
– platform-independent (Java)
– easy to use
– (relatively) easy to modify
CONs:
– relatively slow (Java)
– ‘incomplete’ documentation (some GUI features could be explained better)
– some features available only from the command line
14
Let’s go to work