Copyright: Martin Kramer (mkramer@wxs.nl) WEKA: the bird Copyright: Martin Kramer (mkramer@wxs.nl)
WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... Flat file in ARFF format
WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... numeric attribute nominal attribute
iris.arff @RELATION iris @ATTRIBUTE sepallength REAL @ATTRIBUTE sepalwidth REAL @ATTRIBUTE petallength REAL @ATTRIBUTE petalwidth REAL @ATTRIBUTE class {Iris-setosa, Iris-versicolor, Iris-virginica} @DATA 5.1, 3.5, 1.4, 0.2, Iris-setosa 4.9, 3.0, 1.4, 0.2, Iris-setosa … 7.0, 3.2, 4.7, 1.4, Iris-versicolor 6.3, 3.3, 6.0, 2.5, Iris-virginica
Start Weka 3.4 under Windows.
Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC, Java Database Connectivity ) Pre-processing tools in WEKA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
Selected attribute entries Name. The name of the attribute, the same as that given in the attribute list. Type. The type of attribute, most commonly Nominal or Numeric. Missing. The number (and percentage) of instances in the data for which this attribute is missing (unspecified). Distinct. The number of different values that the data contains for this attribute. Unique. The number (and percentage) of instances in the data having a value for this attribute that no other instances have. Unshared values.