1
Weka: Free and Open Source ML Suite
Ian Witten & Eibe Frank
University of Waikato
2
Overview
Classifiers, regressors, and clusterers
Multiple evaluation schemes
Bagging and boosting
Feature selection
Experimenter
Visualizer
The accompanying text is not up to date; the authors welcome additions.
3
Learning Tasks
Classification: given examples labelled from a finite domain, generate a procedure for labelling unseen examples.
Regression: given examples labelled with a real value, generate a procedure for labelling unseen examples.
Clustering: from a set of examples, partition the examples into “interesting” groups. What scientists want.
4
Data Format: IRIS (ARFF)
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
Etc.
General form: @ATTRIBUTE attribute-name REAL, or a list of nominal values in braces.
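The ARFF layout above can be read with a few lines of code. A minimal sketch of a reader (this is not Weka's own loader; it assumes the simple header/data layout shown in the iris example):

```python
def parse_arff(text):
    """Parse an ARFF string into (relation, attributes, data rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):      # skip blanks and comments
            continue
        upper = line.upper()
        if upper.startswith("@RELATION"):
            relation = line.split(None, 1)[1]
        elif upper.startswith("@ATTRIBUTE"):
            _, name, spec = line.split(None, 2)   # spec is REAL or {v1,v2,...}
            attributes.append((name, spec))
        elif upper.startswith("@DATA"):
            in_data = True
        elif in_data:
            rows.append(line.split(","))
    return relation, attributes, rows
```

For the iris header, `attributes` would contain pairs such as `("sepallength", "REAL")` and each data row becomes a list of five strings.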
5
J48 = Decision Tree
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)
(first number = instances under the node; second number = instances classified wrongly)
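The printed tree is just nested threshold tests, so it can be transcribed directly as conditionals. A sketch using the thresholds from the tree above:

```python
def classify_iris(petallength, petalwidth):
    """Apply the J48 tree's splits to one flower's measurements."""
    if petalwidth <= 0.6:
        return "Iris-setosa"
    if petalwidth <= 1.7:
        if petallength <= 4.9:
            return "Iris-versicolor"
        # petallength > 4.9: one more split on petalwidth
        return "Iris-virginica" if petalwidth <= 1.5 else "Iris-versicolor"
    return "Iris-virginica"
```

For example, the first data row above (petallength 1.4, petalwidth 0.2) falls into the first leaf and is classified Iris-setosa.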
6
Cross-validation
Correctly Classified Instances      143    95.33%
Incorrectly Classified Instances      7     4.67%
Default is 10-fold cross-validation, i.e.:
Split the data into 10 equal-sized pieces
Train on 9 pieces and test on the remainder
Do this for all 10 possibilities and average
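The split-train-test-average loop can be sketched in a few lines. This is an illustrative stand-in, not Weka's implementation: a trivial majority-class "classifier" takes the place of J48 so the folding logic stays visible.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k (nearly) equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(labels, k=10):
    """Estimate accuracy of a majority-class predictor by k-fold CV."""
    folds = k_fold_indices(len(labels), k)
    accs = []
    for i, test in enumerate(folds):
        train = [j for m, f in enumerate(folds) if m != i for j in f]
        # "Training" = find the most common label in the other k-1 folds.
        train_labels = [labels[j] for j in train]
        majority = max(set(train_labels), key=train_labels.count)
        correct = sum(labels[j] == majority for j in test)
        accs.append(correct / len(test))
    return sum(accs) / k
```

On a 70/30 two-class dataset this returns 0.70: the majority class wins every fold, so the averaged accuracy equals the majority class's overall frequency.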
7
J48 Confusion Matrix
Old data set from statistics: 50 of each class
  a  b  c   <-- classified as
           |  a = Iris-setosa
           |  b = Iris-versicolor
           |  c = Iris-virginica
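A confusion matrix just counts (actual class, predicted class) pairs, with rows for the true class and columns for the prediction. A sketch of building one (the sample predictions here are made up for illustration, not Weka's output):

```python
def confusion_matrix(actual, predicted, classes):
    """Return m[a][p] = number of instances of class a predicted as p."""
    m = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m
```

Off-diagonal cells are the errors; for iris, most confusion is between versicolor and virginica, while setosa is rarely misclassified.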
8
Other Evaluation Schemes
Leave-one-out cross-validation: cross-validation where n = number of training instances.
Specific train and test sets: allows for exact replication. OK if the train and test sets are large, e.g. in the 10,000 range.
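Leave-one-out is k-fold cross-validation taken to the extreme of k = n: each instance serves as the test set exactly once. A sketch on 1-D points with a 1-nearest-neighbour classifier (the classifier and data are illustrative, not from the slides):

```python
def loocv_1nn(points, labels):
    """Leave-one-out accuracy of 1-NN on 1-D points."""
    correct = 0
    for i, (x, y) in enumerate(zip(points, labels)):
        # Training set = everything except instance i.
        train = [(p, l) for j, (p, l) in enumerate(zip(points, labels)) if j != i]
        nearest = min(train, key=lambda t: abs(t[0] - x))
        correct += nearest[1] == y
    return correct / len(points)
```

The price of the unbiased estimate is cost: the model is retrained n times, which is why LOOCV is usually reserved for small datasets.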
9
Bootstrap sampling
Randomly select n instances with replacement from the n available
Expect about 2/3 to be chosen for training: probability of not being chosen = (1 - 1/n)^n ≈ 1/e
Test on the remainder
Repeat about 30 times and average
Avoids partition bias
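The 2/3 figure is easy to check empirically: draw n indices with replacement and measure what fraction of distinct instances got picked. A simulation sketch (parameter names are mine):

```python
import random

def bootstrap_coverage(n, trials=30, seed=0):
    """Average fraction of n instances hit by a size-n sample with replacement."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(trials):
        chosen = {rng.randrange(n) for _ in range(n)}  # distinct indices drawn
        fractions.append(len(chosen) / n)
    return sum(fractions) / trials
```

For large n the result settles near 1 - 1/e ≈ 0.632, leaving about 36.8% of the data out-of-bag for testing; this is why bootstrap evaluation is often called the .632 estimator.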