
WEKA Machine Learning Toolbox

You can install Weka on your computer from

Click Explorer and open the file iris_train.arff; you should see the screen on the next page.
On the top-right, there is an edit window where you can view and edit the ARFF file.
On the bottom-left, you see the attributes panel, where you can select features to remove.
On the bottom-right (slide 4), you see the "Visualize all" sub-window, which shows you the distribution of features and classes.
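The same loading and attribute-removal steps can be scripted with Weka's Java API. A minimal sketch, assuming iris_train.arff is in the working directory, the class attribute is the last column, and we remove the first attribute purely as an example:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class LoadIris {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file (equivalent to Explorer -> Open file)
        Instances data = new DataSource("iris_train.arff").getDataSet();
        // Tell Weka which attribute is the class (the last one here)
        data.setClassIndex(data.numAttributes() - 1);

        // Remove the first attribute, as you would by ticking it in the
        // attributes panel and clicking Remove (index "1" is illustrative)
        Remove remove = new Remove();
        remove.setAttributeIndices("1");   // attribute indices are 1-based
        remove.setInputFormat(data);
        Instances filtered = Filter.useFilter(data, remove);

        System.out.println("Attributes after removal: " + filtered.numAttributes());
    }
}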

Here we see that there are 19 samples in total in the first bin, most of them coming from the blue class and one (in this case) from each of the other two classes.

Training
Choose Classify from the top tabs, then choose Classifier -> Trees -> J48. You may edit the parameters; you will see what each parameter does when you hover over it, but leave that for later.
Test options: you have a training file, and now you can say how testing should be done:
1. Using training set: this reports the training error by testing on the same data used for training. Do this only to see the training error; it does not indicate generalisation performance!
2. Supplied test set: use the training set for training AND a separate test set (e.g. iris-test.arff) for testing. The two files must match in number of features etc.
3. Cross-validation: use k-fold CV on the training data (5 or 10 folds is often good).
4. % split: split off part of the training data for testing. Do this only if you have lots and lots of data. Note that the split is random, so I don't suggest it; if you want to split off a test part, do it yourself, so it is not random and you can make it stratified (making sure to take samples from each class, not just randomly).
Choose Supplied test set and enter iris-test.arff.
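This training-and-evaluation step can also be scripted. A minimal sketch, assuming iris_train.arff and iris-test.arff are in the working directory; -M 20 mirrors the "at least 20 samples per leaf" setting from the slides, while -C 0.25 (J48's default pruning confidence) is an assumption, since the exact value is not shown in the output:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainJ48 {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("iris_train.arff").getDataSet();
        Instances test  = new DataSource("iris-test.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        J48 tree = new J48();
        // -M 20: at least 20 instances per leaf; -C 0.25 is assumed (J48 default)
        tree.setOptions(weka.core.Utils.splitOptions("-C 0.25 -M 20"));
        tree.buildClassifier(train);
        System.out.println(tree);   // prints the (sideways) pruned tree

        // Test option 2: supplied test set
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.println(eval.toSummaryString());

        // Test option 3: 10-fold cross-validation on the training data
        Evaluation cv = new Evaluation(train);
        cv.crossValidateModel(new J48(), train, 10, new Random(1));
        System.out.println(cv.toSummaryString());
    }
}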

Interpreting the Output
After you hit Start, training runs and finishes with testing. You see the whole output on the right-hand side:

=== Run information ===
Scheme:       weka.classifiers.trees.J48 -C M 20   //the classifier used
Relation:     whatever
Instances:    126   //number of samples/instances in the training data
Attributes:   5
              petalWidth
              petalHeight
              F3
              F4
              Class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===
J48 pruned tree   //this is the resulting tree (because I said have at least 20 samples in each leaf, the tree is pretty simple)

F4 <= 0.6: Iris-setosa (42.0/1.0)   //42 samples of this label (iris-setosa) and 1 of another label (whatever it is)
F4 > 0.6
|   F4 <= 1.7: Iris-versicolor (47.0/5.0)
|   F4 > 1.7: Iris-virginica (37.0)

Number of Leaves:  3
Size of the tree:  5

Time taken to build model: 0 seconds

=== Stratified cross-validation ===   //so it actually does stratified CV, which is good
Correctly Classified Instances        %
Incorrectly Classified Instances      %
Relative absolute error               %
Root relative squared error           %
Total Number of Instances   126

=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                                                                Iris-setosa
                                                                Iris-versicolor
                                                                Iris-virginica
Weighted Avg.

=== Confusion Matrix ===
 a  b  c   <-- classified as
           |  a = Iris-setosa
           |  b = Iris-versicolor
           |  c = Iris-virginica

Understanding Error Rates & Confusion Matrices
These are per-class accuracies. The True Positive rate (TP) for iris-setosa means:

TP iris-setosa = # correctly classified as iris-setosa / all iris-setosas = 40/41
FP iris-setosa = # falsely classified as iris-setosa / all NON-iris-setosas = 1/85
(i.e., out of all samples that are not iris-setosa, how many were mistakenly labelled iris-setosa)

=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                                                                Iris-setosa
                                                                Iris-versicolor
                                                                Iris-virginica
Weighted Avg.

=== Confusion Matrix ===
 a  b  c   <-- classified as
           |  a = Iris-setosa      //Out of the 41 iris-setosas, 40 are classified as iris-setosa and 1 as iris-versicolor
           |  b = Iris-versicolor  //Out of the 43 iris-versicolors, 39 are classified as iris-versicolor and 1 as iris-setosa…
           |  c = Iris-virginica   …
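These per-class figures can also be pulled out programmatically from a Weka Evaluation object. A sketch of a helper that could be dropped into the TrainJ48 class above (eval and data as defined there):

// Prints per-class TP rate, FP rate and precision, plus the same
// tables the Explorer shows, from an already-computed Evaluation.
static void printPerClassRates(weka.classifiers.Evaluation eval,
                               weka.core.Instances data) throws Exception {
    for (int c = 0; c < data.numClasses(); c++) {
        String name = data.classAttribute().value(c);
        System.out.printf("%-16s TP rate: %.3f  FP rate: %.3f  Precision: %.3f%n",
                name,
                eval.truePositiveRate(c),   // correct class-c / all actual class-c
                eval.falsePositiveRate(c),  // wrongly called class-c / all non-class-c
                eval.precision(c));
    }
    System.out.println(eval.toClassDetailsString());  // detailed accuracy by class
    System.out.println(eval.toMatrixString());        // confusion matrix
}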

Result-list
All of your runs can be viewed in the bottom-left window, ordered by time. Click on one to see its results in the right-hand window. Furthermore, you can right-click on a run to see several options:
Visualize classifier errors (the x-axis shows the actual class and the y-axis the predicted class; see the bottom-left image)
Visualize tree

Other sources for help: WEKA - Neural Network Tutorial Video or the full WEKA-Reference-tutorial under Lectures/

What To Know
File Open (in the future, prepare your own ARFF files)
Choose a classifier
Specify the test set, CV, etc.
Be able to understand the output (the most relevant parts for now):
 - Scheme: weka.classifiers.trees.J48 -C M 2 (the parameter set used)
 - The given (sideways) tree
 - Error measures: Correctly Classified Instances %, Incorrectly Classified Instances %, Total Number of Instances 24
 - Confusion matrix

Results-List Right-Click Options ctd.
Load and Save model are useful when training takes a long time (e.g. neural network or SVM training), or when you want to compare a model to a previous run. Note that if a learning algorithm is non-deterministic (e.g. a neural network starting from different initial weights), different runs may produce different models, so saving a model is the only way to get the exact same one back.
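Outside the GUI, saving and loading models can be done with Weka's serialization helper. A minimal sketch; the file name j48.model is just an example:

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class SaveLoadModel {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("iris_train.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);

        J48 tree = new J48();
        tree.buildClassifier(train);

        // Save the trained model (equivalent to Save model in the result list)
        SerializationHelper.write("j48.model", tree);

        // Load it back later -- useful when training is slow or non-deterministic
        J48 loaded = (J48) SerializationHelper.read("j48.model");
        System.out.println(loaded);
    }
}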