In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.

Slides:



Advertisements
Similar presentations
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.
Advertisements

Florida International University COP 4770 Introduction of Weka.
Weka & Rapid Miner Tutorial By Chibuike Muoh. WEKA:: Introduction A collection of open source ML algorithms – pre-processing – classifiers – clustering.
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.
WEKA (sumber: Machine Learning with WEKA). What is WEKA? Weka is a collection of machine learning algorithms for data mining tasks. Weka contains.
Introduction to Weka and NetDraw
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.
March 25, 2004Columbia University1 Machine Learning with Weka Lokesh S. Shrestha.
An Extended Introduction to WEKA. Data Mining Process.
1 Statistical Learning Introduction to Weka Michel Galley Artificial Intelligence class November 2, 2006.
Machine Learning with WEKA. WEKA: the bird Copyright: Martin Kramer
1 How to use Weka How to use Weka. 2 WEKA: the software Waikato Environment for Knowledge Analysis Collection of state-of-the-art machine learning algorithms.
CSCI 347 / CS 4206: Data Mining Module 05: WEKA Topic 04: Data Preparation Tools.
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
An Exercise in Machine Learning
CSCI 347 / CS 4206: Data Mining Module 05: WEKA Topic 01: WEKA Navigation.
 The Weka The Weka is an well known bird of New Zealand..  W(aikato) E(nvironment) for K(nowlegde) A(nalysis)  Developed by the University of Waikato.
Contributed by Yizhou Sun 2008 An Introduction to WEKA.
WEKA - Explorer (sumber: WEKA Explorer user Guide for Version 3-5-5)
WEKA and Machine Learning Algorithms. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of.
Appendix: The WEKA Data Mining Software
In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer.
Department of Computer Science, University of Waikato, New Zealand Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.
Machine Learning with Weka Cornelia Caragea Thanks to Eibe Frank for some of the slides.
Weka: Experimenter and Knowledge Flow interfaces Neil Mac Parthaláin
For ITCS 6265/8265 Fall 2009 TA: Fei Xu UNC Charlotte.
W E K A Waikato Environment for Knowledge Analysis Branko Kavšek MPŠ Jožef StefanNovember 2005.
Artificial Neural Network Building Using WEKA Software
1 1 Slide Using Weka. 2 2 Slide Data Mining Using Weka n What’s Data Mining? We are overwhelmed with data We are overwhelmed with data Data mining is.
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.
Weka – A Machine Learning Toolkit October 2, 2008 Keum-Sung Hwang.
Introduction to Weka Xingquan (Hill) Zhu Slides copied from Jeffrey Junfeng Pan (UST)
 A collection of open source ML algorithms ◦ pre-processing ◦ classifiers ◦ clustering ◦ association rule  Created by researchers at the University.
W E K A Waikato Environment for Knowledge Aquisition.
An Exercise in Machine Learning
***Classification Model*** Hosam Al-Samarraie, PhD. CITM-USM.
Introduction to Weka ML Seminar for Rookies Byoung-Hee Kim Biointelligence Lab, Seoul National University.
Weka Tutorial. WEKA:: Introduction A collection of open source ML algorithms – pre-processing – classifiers – clustering – association rule Created by.
Weka. Weka A Java-based machine vlearning tool Implements numerous classifiers and other ML algorithms Uses a common.
Machine Learning with WEKA - Yohan Chin. WEKA ? Waikato Environment for Knowledge Analysis A Collection of Machine Learning algorithms for data tasks.
WEKA's Knowledge Flow Interface Data Mining Knowledge Discovery in Databases ELIE TCHEIMEGNI Department of Computer Science Bowie State University, MD.
@relation age sex { female, chest_pain_type { typ_angina, asympt, non_anginal,
WEKA: A Practical Machine Learning Tool WEKA : A Practical Machine Learning Tool.
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.
An Introduction to WEKA
Machine Learning: Decision Trees in AIMA and WEKA
An Introduction to WEKA
Machine Learning: Decision Trees in AIMA and WEKA
Machine Learning with WEKA
Waikato Environment for Knowledge Analysis
WEKA.
Sampath Jayarathna Cal Poly Pomona
An Introduction to WEKA
Machine Learning with WEKA
Machine Learning with WEKA
Weka Package Weka package is open source data mining software written in Java. Weka can be applied to your dataset from the GUI, the command line or called.
Weka Free and Open Source ML Suite Ian Witten & Eibe Frank
Machine Learning with Weka
An Introduction to WEKA
Tutorial for WEKA Heejun Kim June 19, 2018.
Machine Learning with Weka
Machine Learning with WEKA
Lecture 10 – Introduction to Weka
Statistical Learning Introduction to Weka
Copyright: Martin Kramer
Machine Learning: Decision Trees in AIMA and WEKA
Data Mining CSCI 307, Spring 2019 Lecture 7
Data Mining CSCI 307, Spring 2019 Lecture 8
Presentation transcript:

In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer

What is WEKA? Waikato Environment for Knowledge Analysis A data mining/machine learning tool developed by Department of Computer Science, University of Waikato, New Zealand. Weka is also a bird found only on the islands of New Zealand. 2 6/10/2016

How does it works? First, you select a dataset and a Machine learning algorithm You can manipulate the dataset in sevral ways, as we will see. When datset is ready, you select a ML algorithm from the list, ad adjust learning parameters, as we will see When you run a ML algorithm, the system will: 1. Split the data set into training and testing subsets; 2. Learn a classification function C(x) based on examples in the training set; 3. Classify instances x in the test set based on the learned function C(x); 4. Measure the performances by comparing the generated classifications with the “ground truth” in the test set.

Download and Install WEKA Website: Support multiple platforms (written in java): Windows, Mac OS X and Linux 4 6/10/2016

Main Features 49 data preprocessing tools 76 classification/regression algorithms 8 clustering algorithms 3 algorithms for finding association rules 15 attribute/subset evaluators + 10 search algorithms for feature selection 5 6/10/2016

Main GUI Three graphical user interfaces “The Explorer” (exploratory data analysis) “The Experimenter” (experimental environment) “The KnowledgeFlow” (new process model inspired interface) Simple CLI- provides users without a graphic interface option the ability to execute commands from a terminal window 6 6/10/2016

Explorer The Explorer: Preprocess data Classification Clustering Association Rules Attribute Selection Data Visualization References and Resources 7 6/10/2016

8 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …

6/10/2016 age sex { female, chest_pain_type { typ_angina, asympt, non_anginal, cholesterol exercise_induced_angina { no, class { present, 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files

6/10/2016 age sex { female, chest_pain_type { typ_angina, asympt, non_anginal, cholesterol exercise_induced_angina { no, class { present, 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present... WEKA only deals with “flat” files

6/10/2016 University of Waikato 11

6/10/2016 University of Waikato 12

You can either open arff file or convert from other formats

IRIS dataset 150 instances of IRIS (a flower) 5 attributes, one is the classification c(x) 3 classes: iris setosa, iris versicolor, iris virginica

6/10/2016 University of Waikato 17 For any selected attribute you can get statistics

Attribute data Min, max and average value of attributes distribution of values :number of items for which: class: distribution of attribute values in the classes The class (e.g. C(x), the classification function to be learned) is by default THE LAST ATTRIBUTE of the list.

6/10/2016 University of Waikato 20

6/10/2016 University of Waikato 21

6/10/2016 University of Waikato 22

6/10/2016 University of Waikato 23 Here you can see the complete statistics (distribution of values) for all the 5 attributes Here you can see the complete statistics (distribution of values) for all the 5 attributes

Filtering attributes Once the initial data has been selected and loaded the user can select options for refining the experimental data. The options in the preprocess window include selection of optional filters to apply and the user can select or remove different attributes of the data set as necessary to identify specific information (or even write a regex in Perl). The user can modify the attribute selection and change the relationship among the different attributes by deselecting different choices from the original data set. There are many different filtering options available within the preprocessing window and the user can select the different options based on need and type of data present.

6/10/2016 University of Waikato 25

6/10/2016 University of Waikato 26

6/10/2016 University of Waikato 27

6/10/2016 University of Waikato 28

6/10/2016 University of Waikato 29

6/10/2016 University of Waikato 30

6/10/2016 University of Waikato 31

6/10/2016 University of Waikato 32

6/10/2016 University of Waikato 33 Discretizes in 10 bins of equal frequency

6/10/2016 University of Waikato 34 Discretizes in 10 bins of equal frequency

6/10/2016 University of Waikato 35 Discretizes in 10 bins of equal frequency

6/10/2016 University of Waikato 36

6/10/2016 University of Waikato 37

6/10/2016 University of Waikato 38

WILL SEE MORE ON FILTERING DURING FIRST LAB!!

6/10/ Explorer: building “classifiers” “Classifiers” in WEKA are machine learning algorithms for predicting nominal or numeric values of a selected attribute (e.g. the CLASS attribute in the IRIS file) Implemented learning algorithms include: Conjunctive rules, decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … Most, but not all, the algorithms that we will present in this course (e.g. no genetic or reinforcement algorithms)

Explore Conjunctive Rules learner Need a simple dataset with few attributes, let’s select the weather dataset

Select a Classifier

Right-click to select parameters numAntds= number of antecedents, -1= empty rule e.g. ()  class If -1 is selected, you obtain the most likely classification in the datsset numAntds= number of antecedents, -1= empty rule e.g. ()  class If -1 is selected, you obtain the most likely classification in the datsset

Select numAntds=10

Select training method Even if you do not understand for now, select “Cross validation” with 10 Folds. Even if you do not understand for now, select “Cross validation” with 10 Folds.

Select the right hand side of the rule (the classification function)

Run the algorithm

Performance data

Confusion Matrix System classified as aSystem classified as b Truly classified as a# of instances that system classifies a, ground truth is a # of instances that system classifies b, ground truth is a Truly classified as b# of instances that system classifies a, ground truth is b # of instances that system classifies b, ground truth is b Cells (1,1) and (2,2) represent “good” classifications. The others are wrong. In fact, we are told that there are 5 correctly classified instances and 9 errors. Cells (1,1) and (2,2) represent “good” classifications. The others are wrong. In fact, we are told that there are 5 correctly classified instances and 9 errors.