Published by Charles Preston. Modified over 9 years ago.
2
The Weka
The weka is a well-known bird of New Zealand.
WEKA stands for W(aikato) E(nvironment) for K(nowledge) A(nalysis)
Developed by the University of Waikato in New Zealand
It is a comprehensive suite of Java class libraries
Implements many state-of-the-art machine learning and data mining algorithms
It supports data files such as CSV (comma-separated values) and ARFF (Attribute-Relation File Format)…
3
Collection of ML (machine learning) algorithms – open-source Java package
Schemes for classification include: decision trees, rule learners, naive Bayes, decision tables, locally weighted regression, SVMs, instance-based learners, logistic regression, voted perceptrons, multi-layer perceptron
Schemes for numeric prediction include: linear regression, model tree generators, locally weighted regression, instance-based learners, decision tables, multi-layer perceptron
Meta-schemes include: bagging, boosting, stacking, regression via classification, classification via regression, cost-sensitive classification
Schemes for clustering: EM and Cobweb
4
49 data preprocessing tools
76 classification/regression algorithms
8 clustering algorithms
15 attribute/subset evaluators + 10 search algorithms for feature selection
3 algorithms for finding association rules
3 graphical user interfaces:
  "The Explorer" (exploratory data analysis)
  "The Experimenter" (experimental environment)
  "The Knowledge Flow" (new process-model-inspired interface)
5
ARFF files require declarations of @RELATION, @ATTRIBUTE and @DATA
The @RELATION declaration associates a name with the dataset
  Syntax: @RELATION <relation-name>
  E.g. @RELATION stud
The @ATTRIBUTE declaration specifies the name and type of an attribute
  Syntax: @ATTRIBUTE <attribute-name> <datatype>
  The datatype can be numeric, nominal, string or date
  E.g.
  @ATTRIBUTE sepallength NUMERIC
  @ATTRIBUTE petalwidth NUMERIC
  @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
The @DATA declaration is a single line denoting the start of the data segment
Missing values are represented by ?
  @DATA
  5.1, 3.5, 1.4, 0.2, Iris-setosa
  4.9, ?, 1.4, ?, Iris-versicolor
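To make the three declarations concrete, here is a minimal sketch of a reader for small, dense ARFF files. It is not Weka's own parser: `parse_arff` and `SAMPLE` are names made up for this illustration, the `sepalwidth` and `petallength` attributes are filled in from the standard iris dataset so that the header matches the five-value data rows, and quoted attribute names containing spaces are not handled.

```python
import csv
import io


def parse_arff(text):
    """Parse a small dense ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and % comments
            continue
        if in_data:
            # skipinitialspace lets "5.1, 3.5" parse the same as "5.1,3.5"
            values = next(csv.reader(io.StringIO(line), skipinitialspace=True))
            # an unquoted ? marks a missing value; store it as None
            rows.append([None if v.strip() == "?" else v.strip() for v in values])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # declarations are case-insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, datatype = rest.strip().partition(" ")
            datatype = datatype.strip()
            if datatype.startswith("{"):
                # nominal attribute: keep the list of allowed values
                datatype = [v.strip() for v in datatype.strip("{}").split(",")]
            else:
                datatype = datatype.lower()
            attributes.append((name, datatype))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1, 3.5, 1.4, 0.2, Iris-setosa
4.9, ?, 1.4, ?, Iris-versicolor
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Values are kept as strings here; converting them according to each attribute's declared type would be the natural next step.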
6
In addition to nominal and numeric attributes, exemplified by the weather data, the ARFF format has two further attribute types: string attributes and date attributes. String attributes have values that are textual. Suppose you have a string attribute that you want to call description. In the block defining the attributes, it is specified as follows: @attribute description string Then, in the instance data, include any character string in quotation marks (to include quotation marks in your string, use the standard convention of preceding each one by a backslash, \). Strings are stored internally in a string table and represented by their address in that table. Thus two strings that contain the same characters will have the same value.
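The quoting convention and the string table can be sketched as follows. This is an illustration of the idea, not Weka's implementation: `quote_arff_string` and `StringTable` are hypothetical names, and escaping the backslash itself is an assumption added so that the quoting stays unambiguous.

```python
def quote_arff_string(value):
    """Wrap a string in double quotes, preceding embedded quotes by a backslash."""
    # Assumption: backslashes are escaped too, so the result can be decoded again.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


class StringTable:
    """Store each distinct string once and hand out its position in the table."""

    def __init__(self):
        self._index = {}   # string -> position in self.values
        self.values = []   # position -> string

    def intern(self, text):
        """Return the table position of text, adding it if it is new."""
        if text not in self._index:
            self._index[text] = len(self.values)
            self.values.append(text)
        return self._index[text]


table = StringTable()
first = table.intern("a short description")
second = table.intern("a short description")  # same characters, same value
other = table.intern("something else")
quoted = quote_arff_string('say "hi"')
```

Because equal strings share one table entry, comparing two string values reduces to comparing two integers.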
7
Date attributes are strings with a special format and are introduced like this: @attribute today date (for an attribute called today). Weka, the machine learning software discussed in Part II of this book, uses the ISO-8601 combined date and time format yyyy-MM-dd'T'HH:mm:ss with four digits for the year, two each for the month and day, then the letter T followed by the time with two digits for each of hours, minutes, and seconds. In the data section of the file, dates are specified as the corresponding string representation of the date and time, for example, 2004-04-03T12:00:00. Although they are specified as strings, dates are converted to numeric form when the input file is read. Dates can also be converted internally to different formats, so you can have absolute timestamps in the data file and use transformations to forms such as time of day or day of the week to detect periodic behavior.
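The conversion from date string to number and on to derived forms can be sketched like this. The numeric form used here, milliseconds since the Unix epoch with the timestamp read as UTC, is an assumption of this example, and the function names are made up for illustration.

```python
from datetime import datetime, timezone

# Python strptime equivalent of the ISO-8601 pattern used in the data section
ARFF_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def date_to_numeric(text):
    """Parse a date string into milliseconds since the Unix epoch (UTC assumed)."""
    parsed = datetime.strptime(text, ARFF_DATE_FORMAT).replace(tzinfo=timezone.utc)
    return parsed.timestamp() * 1000.0


def numeric_to_datetime(millis):
    """Turn the numeric form back into a timezone-aware datetime."""
    return datetime.fromtimestamp(millis / 1000.0, tz=timezone.utc)


stamp = date_to_numeric("2004-04-03T12:00:00")
moment = numeric_to_datetime(stamp)

# Derived forms that can expose periodic behavior
hour_of_day = moment.hour
day_of_week = moment.weekday()  # Monday is 0, Sunday is 6
```

Keeping the absolute timestamp in the file and deriving `hour_of_day` or `day_of_week` afterwards means no information is lost at load time.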
8
Sparse ARFF files are similar to ARFF files except that data values of 0 are not represented
Non-zero attributes are specified by attribute number (starting at 0) and value
For examples of ARFF files see $WEKAHOME/data
Standard ARFF:
  @data
  0, X, 0, Y, "class A"
  0, 0, W, 0, "class B"
Sparse ARFF:
  @data
  {1 X, 3 Y, 4 "class A"}
  {2 W, 4 "class B"}
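The mapping between the two layouts is mechanical, as the following sketch shows for the two example rows above. The function names are hypothetical, and values are treated as plain strings with "0" as the omitted value.

```python
def dense_to_sparse(values, zero="0"):
    """Keep only the (attribute number, value) pairs whose value is not zero."""
    return [(i, v) for i, v in enumerate(values) if v != zero]


def sparse_to_dense(pairs, width, zero="0"):
    """Rebuild a full row, filling every attribute that was left out with zero."""
    dense = [zero] * width
    for i, v in pairs:
        dense[i] = v
    return dense


def format_sparse(pairs):
    """Render the pairs in the curly-brace form used in a sparse data section."""
    return "{" + ", ".join(f"{i} {v}" for i, v in pairs) + "}"


row_a = ["0", "X", "0", "Y", '"class A"']
row_b = ["0", "0", "W", "0", '"class B"']

sparse_a = format_sparse(dense_to_sparse(row_a))
sparse_b = format_sparse(dense_to_sparse(row_b))
restored_a = sparse_to_dense(dense_to_sparse(row_a), width=len(row_a))
```

Note that anything left out of a sparse row reads back as 0, not as a missing value.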
9
Command-line options:
-t Specify training file
-T Specify test file; if none, CV is performed on training data
-x Number of folds for cross-validation
-s Random seed for CV
-l Use saved model
-d Output model to file
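As a rough illustration of how these options combine, the sketch below assembles an argument list for running a classifier from the command line. `build_weka_command` and the file names are hypothetical; the example assumes that the Weka classes are on the Java classpath and uses `weka.classifiers.trees.J48` as the classifier. It only builds the list and does not run anything.

```python
def build_weka_command(classifier, train_file=None, test_file=None,
                       folds=None, seed=None, load_model=None, save_model=None):
    """Assemble a java command line from the options listed above."""
    args = ["java", classifier]
    if train_file is not None:
        args += ["-t", train_file]       # training file
    if test_file is not None:
        args += ["-T", test_file]        # test file; without it, CV is used
    if folds is not None:
        args += ["-x", str(folds)]       # number of cross-validation folds
    if seed is not None:
        args += ["-s", str(seed)]        # seed for cross-validation
    if load_model is not None:
        args += ["-l", load_model]       # use a previously saved model
    if save_model is not None:
        args += ["-d", save_model]       # write the trained model to a file
    return args


# 10-fold cross-validation on a (hypothetical) training file, saving the model
command = build_weka_command(
    "weka.classifiers.trees.J48",
    train_file="train.arff",
    folds=10,
    seed=1,
    save_model="j48.model",
)
```

The resulting list could be passed to `subprocess.run` once the paths and classpath match the local installation.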
10
Internal variables are private
  They should have protected or package-level access
SparseInstance for strings requires a dummy at index 0
Problem:
  Strings are mapped into internal indices to an array
  The string at position 0 is mapped to value "0"
  When written out as a SparseInstance, it will not be written (0 value)
  If read back in, the first string is missing from Instances
Solution:
  Put a dummy string in position 0 when writing a SparseInstance with strings
  The dummy will be ignored while writing; the actual instance will be written properly
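The problem and the workaround can be reproduced in a few lines. This is a simplified model written for illustration, not Weka code: a single string attribute, a plain list as the string table, and a writer that drops every entry whose internal value is 0. The names `encode`, `write_sparse` and `DUMMY` are hypothetical.

```python
def encode(strings, table):
    """Map each string to its position in table, appending unseen strings."""
    codes = []
    for text in strings:
        if text not in table:
            table.append(text)
        codes.append(table.index(text))
    return codes


def write_sparse(code):
    """Write one single-attribute instance, omitting the entry if its value is 0."""
    return [(0, code)] if code != 0 else []


descriptions = ["alpha", "beta"]

# Without a dummy: "alpha" lands at position 0 and vanishes from the output.
plain_table = []
plain_output = [write_sparse(c) for c in encode(descriptions, plain_table)]

# With a dummy occupying position 0, every real string gets a non-zero value.
DUMMY = "_dummy_"  # hypothetical placeholder, never used as real data
padded_table = [DUMMY]
padded_output = [write_sparse(c) for c in encode(descriptions, padded_table)]
```

In `plain_output` the first instance is empty, which is exactly the lost string described above; in `padded_output` both instances carry their entry.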