1
Weka: Free and Open Source ML Suite
Ian Witten & Eibe Frank
University of Waikato
2
Overview
Classifiers, regressors, and clusterers
Multiple evaluation schemes
Bagging and boosting
Feature selection
Experimenter
Visualizer
The accompanying text is not up to date; the authors welcome additions.
3
Learning Tasks
Classification: given examples labelled from a finite domain, generate a procedure for labelling unseen examples.
Regression: given examples labelled with a real value, generate a procedure for labelling unseen examples.
Clustering: from a set of examples, partition the examples into “interesting” groups. What scientists want.
4
Data Format: IRIS (ARFF)
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
Etc.
General form: @ATTRIBUTE attribute-name REAL, or a list of nominal values in braces.
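The ARFF layout above can be read with a few lines of code. A minimal sketch of a reader (this is not Weka's own loader; it assumes the simple header/data layout shown in the iris example):

```python
def parse_arff(text):
    """Parse an ARFF string into (relation, attributes, data rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):      # skip blanks and comments
            continue
        upper = line.upper()
        if upper.startswith("@RELATION"):
            relation = line.split(None, 1)[1]
        elif upper.startswith("@ATTRIBUTE"):
            _, name, spec = line.split(None, 2)   # spec is REAL or {v1,v2,...}
            attributes.append((name, spec))
        elif upper.startswith("@DATA"):
            in_data = True
        elif in_data:
            rows.append(line.split(","))
    return relation, attributes, rows
```

For the iris header, `attributes` would contain pairs such as `("sepallength", "REAL")` and each data row becomes a list of five strings.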
5
J48 = Decision Tree
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)
(first number = instances under the node; second number = instances classified wrongly)
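The printed tree is just nested threshold tests, so it can be transcribed directly as conditionals. A sketch using the thresholds from the tree above:

```python
def classify_iris(petallength, petalwidth):
    """Apply the J48 tree's splits to one flower's measurements."""
    if petalwidth <= 0.6:
        return "Iris-setosa"
    if petalwidth <= 1.7:
        if petallength <= 4.9:
            return "Iris-versicolor"
        # petallength > 4.9: one more split on petalwidth
        return "Iris-virginica" if petalwidth <= 1.5 else "Iris-versicolor"
    return "Iris-virginica"
```

For example, the first data row above (petallength 1.4, petalwidth 0.2) falls into the first leaf and is classified Iris-setosa.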
6
Cross-validation
Correctly Classified Instances      143    95.33%
Incorrectly Classified Instances      7     4.67%
Default is 10-fold cross-validation, i.e.:
Split the data into 10 equal-sized pieces
Train on 9 pieces and test on the remainder
Do this for all 10 possibilities and average
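The split-train-test-average loop can be sketched in a few lines. This is an illustrative stand-in, not Weka's implementation: a trivial majority-class "classifier" takes the place of J48 so the folding logic stays visible.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k (nearly) equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(labels, k=10):
    """Estimate accuracy of a majority-class predictor by k-fold CV."""
    folds = k_fold_indices(len(labels), k)
    accs = []
    for i, test in enumerate(folds):
        train = [j for m, f in enumerate(folds) if m != i for j in f]
        # "Training" = find the most common label in the other k-1 folds.
        train_labels = [labels[j] for j in train]
        majority = max(set(train_labels), key=train_labels.count)
        correct = sum(labels[j] == majority for j in test)
        accs.append(correct / len(test))
    return sum(accs) / k
```

On a 70/30 two-class dataset this returns 0.70: the majority class wins every fold, so the averaged accuracy equals the majority class's overall frequency.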
7
J48 Confusion Matrix
Old data set from statistics: 50 of each class
  a  b  c   <-- classified as
           |  a = Iris-setosa
           |  b = Iris-versicolor
           |  c = Iris-virginica
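A confusion matrix just counts (actual class, predicted class) pairs, with rows for the true class and columns for the prediction. A sketch of building one (the sample predictions here are made up for illustration, not Weka's output):

```python
def confusion_matrix(actual, predicted, classes):
    """Return m[a][p] = number of instances of class a predicted as p."""
    m = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m
```

Off-diagonal cells are the errors; for iris, most confusion is between versicolor and virginica, while setosa is rarely misclassified.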
8
Other Evaluation Schemes
Leave-one-out cross-validation: cross-validation where n = number of training instances.
Specific train and test sets: allows for exact replication. OK if the train and test sets are large, e.g. in the 10,000 range.
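Leave-one-out is k-fold cross-validation taken to the extreme of k = n: each instance serves as the test set exactly once. A sketch on 1-D points with a 1-nearest-neighbour classifier (the classifier and data are illustrative, not from the slides):

```python
def loocv_1nn(points, labels):
    """Leave-one-out accuracy of 1-NN on 1-D points."""
    correct = 0
    for i, (x, y) in enumerate(zip(points, labels)):
        # Training set = everything except instance i.
        train = [(p, l) for j, (p, l) in enumerate(zip(points, labels)) if j != i]
        nearest = min(train, key=lambda t: abs(t[0] - x))
        correct += nearest[1] == y
    return correct / len(points)
```

The price of the unbiased estimate is cost: the model is retrained n times, which is why LOOCV is usually reserved for small datasets.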
9
Bootstrap sampling
Randomly select n instances with replacement from the n available
Expect about 2/3 to be chosen for training: probability of not being chosen = (1 - 1/n)^n ≈ 1/e
Test on the remainder
Repeat about 30 times and average
Avoids partition bias
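The 2/3 figure is easy to check empirically: draw n indices with replacement and measure what fraction of distinct instances got picked. A simulation sketch (parameter names are mine):

```python
import random

def bootstrap_coverage(n, trials=30, seed=0):
    """Average fraction of n instances hit by a size-n sample with replacement."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(trials):
        chosen = {rng.randrange(n) for _ in range(n)}  # distinct indices drawn
        fractions.append(len(chosen) / n)
    return sum(fractions) / trials
```

For large n the result settles near 1 - 1/e ≈ 0.632, leaving about 36.8% of the data out-of-bag for testing; this is why bootstrap evaluation is often called the .632 estimator.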