Data Mining – Algorithms: OneR (Chapter 4, Section 4.1)


Simplicity First
Simple algorithms sometimes work surprisingly well
It is worth trying simple approaches first
Different approaches may work better for different data – there is more than one simple approach
First to be examined: OneR (or 1R) – learns one rule for the dataset ("one rule" is actually a bit of a misnomer: the rule is really a one-level decision tree that branches on a single attribute's values)

OneR – Holte (1993)
Simple, cheap method
Often performs surprisingly well
Many real datasets may not have complicated things going on
Idea:
–Make rules that test a single attribute and branch accordingly (each branch corresponds to a different value of that attribute)
–Classification for a given branch is the majority class for that branch in the training data
–Evaluate each attribute via its error rate on the training data
–Choose the best attribute

Figure 4.1 in the book gives pseudo-code for 1R. At least in the simplest version, "missing" is treated as just another attribute value.
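The pseudo-code itself is short – this is a close paraphrase of the book's Figure 4.1:

```
For each attribute,
    For each value of that attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate
```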

Example: My Weather (Nominal)

Outlook   Temp  Humid   Windy  Play?
sunny     hot   high    FALSE  no
sunny     hot   high    TRUE   yes
overcast  hot   high    FALSE  no
rainy     mild  high    FALSE  no
rainy     cool  normal  FALSE  no
rainy     cool  normal  TRUE   no
overcast  cool  normal  TRUE   yes
sunny     mild  high    FALSE  yes
sunny     cool  normal  FALSE  yes
rainy     mild  normal  FALSE  no
sunny     mild  normal  TRUE   yes
overcast  mild  high    TRUE   yes
overcast  hot   normal  FALSE  no
rainy     mild  high    TRUE   no
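As a concrete companion to the pseudo-code, here is a minimal Python sketch of 1R on this dataset. The function name one_r and the data layout are mine, not the book's or WEKA's, and ties are broken by whichever class was seen first rather than randomly:

```python
from collections import Counter

# The "My Weather" rows: (outlook, temp, humid, windy, play)
data = [
    ("sunny",    "hot",  "high",   "FALSE", "no"),
    ("sunny",    "hot",  "high",   "TRUE",  "yes"),
    ("overcast", "hot",  "high",   "FALSE", "no"),
    ("rainy",    "mild", "high",   "FALSE", "no"),
    ("rainy",    "cool", "normal", "FALSE", "no"),
    ("rainy",    "cool", "normal", "TRUE",  "no"),
    ("overcast", "cool", "normal", "TRUE",  "yes"),
    ("sunny",    "mild", "high",   "FALSE", "yes"),
    ("sunny",    "cool", "normal", "FALSE", "yes"),
    ("rainy",    "mild", "normal", "FALSE", "no"),
    ("sunny",    "mild", "normal", "TRUE",  "yes"),
    ("overcast", "mild", "high",   "TRUE",  "yes"),
    ("overcast", "hot",  "normal", "FALSE", "no"),
    ("rainy",    "mild", "high",   "TRUE",  "no"),
]
ATTRS = ("outlook", "temp", "humid", "windy")

def one_r(rows):
    """Return (attribute index, {value: class}, training errors) for the
    attribute whose one-level rule set makes the fewest training errors."""
    best = None
    for a in range(len(ATTRS)):
        counts = {}                       # attribute value -> Counter of classes
        for row in rows:
            counts.setdefault(row[a], Counter())[row[-1]] += 1
        # Rule per value: predict the majority class; the rest are errors.
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values())
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (a, rules, errors)
    return best

a, rules, errors = one_r(data)
print(ATTRS[a], rules, f"{errors}/{len(data)} errors")
# -> outlook wins with 3/14 errors; 'overcast' is a 2-2 tie, so its
#    predicted class depends on how the tie is broken
```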

Let's make this a little more realistic than the book does: divide the data into training and test sets. Let's hold out the last record as the test.

For each attribute – start with Outlook
Make a rule for each value
–sunny → yes, 1/5 errors
–overcast → yes*, 2/4 errors
–rainy → no, 0/4 errors
–Total errors = 3/13
Move on to the next attribute – Temperature
–hot → no, 1/4 errors
–mild → yes, 2/5 errors
–cool → no*, 2/4 errors
–Total errors = 5/13
* means tie – broken arbitrarily (maybe randomly)

Continue with Humidity
Make a rule for each value
–high → yes*, 3/6 errors
–normal → no, 3/7 errors
–Total errors = 6/13
Move on to the next attribute – Windy
–FALSE → no, 2/8 errors
–TRUE → yes, 1/5 errors
–Total errors = 3/13
* means tie – broken arbitrarily (maybe randomly)
The first and last attributes (Outlook and Windy) tie at 3/13 errors – one would have to be chosen arbitrarily
On the test record, Outlook's rule would turn out to be correct; Windy's would not

Again being more realistic than the book, this will be cross-validated
Normally 10-fold is used, but with 14 instances that is a little awkward:
–6 of the tests would be on 1 instance
–4 of the tests would be on 2 instances
I'm going to do 14-fold instead – one test instance for each fold (sketched in code below)
Next fold: hold out the 13th instance as test data
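A sketch of that 14-fold (leave-one-out) loop, reusing data and one_r() from the earlier sketch:

```python
# Leave-one-out cross-validation: train on 13 instances, test on the 14th.
correct = 0
for i in range(len(data)):
    train = data[:i] + data[i + 1:]      # hold out instance i
    a, rules, _ = one_r(train)
    test = data[i]
    if rules.get(test[a]) == test[-1]:   # an unseen value gets no prediction
        correct += 1
print(f"{correct}/{len(data)} correct on held-out instances")
# WEKA reports 9/14 below; the count here can differ slightly because
# ties between attributes/classes are broken differently.
```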

For each attribute – start with Outlook
Make a rule for each value
–sunny → yes, 1/5 errors
–overcast → yes, 1/3 errors
–rainy → no, 0/5 errors
–Total errors = 2/13
Move on to the next attribute – Temperature
–hot → no, 1/3 errors
–mild → yes*, 3/6 errors
–cool → no*, 2/4 errors
–Total errors = 6/13
* means tie – broken arbitrarily (maybe randomly)

Continue with Humidity
Make a rule for each value
–high → no, 3/7 errors
–normal → yes*, 3/6 errors
–Total errors = 6/13
Move on to the next attribute – Windy
–FALSE → no, 2/7 errors
–TRUE → yes, 2/6 errors
–Total errors = 4/13
* means tie – broken arbitrarily (maybe randomly)
The first attribute (Outlook, 2/13 errors) wins
On the test record (overcast, hot, normal, FALSE, no), its rule predicts yes – an incorrect prediction (reproduced in the sketch below)
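This fold can be reproduced with the earlier sketch by holding out the 13th instance (index 12):

```python
a, rules, errors = one_r(data[:12] + data[13:])
print(ATTRS[a], rules, f"{errors}/13 errors")
# -> outlook {'sunny': 'yes', 'overcast': 'yes', 'rainy': 'no'} 2/13 errors
# rules['overcast'] == 'yes', but the held-out 13th instance is a 'no'
```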

In a 14-fold cross-validation, this would continue 12 more times. Let's run WEKA on this …

WEKA results – first look near the bottom
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances        9      64.2857 %
Incorrectly Classified Instances      5      35.7143 %
On the cross-validation it got 9 out of 14 tests correct. (I don't know which way WEKA went on the arbitrary tie-breaks, so we may not re-create its numbers exactly if we walk all of the way through.)

More Detailed Results
=== Confusion Matrix ===
  a  b   <-- classified as
  4  2 |  a = yes
  3  5 |  b = no
Here we see:
–The program predicted play = yes 7 times; on 4 of those it was correct
–The program predicted play = no 7 times; on 5 of those it was correct
–There were 6 instances whose actual value was play = yes; the program correctly predicted 4 of them
–There were 8 instances whose actual value was play = no; the program correctly predicted 5 of them
(The diagonal, 4 + 5 = 9, is the correctly classified count from the summary above.)

Part of our purpose is to have a take-home message for humans – not 14 take-home messages!
So instead of reporting each of the things learned on each of the 14 training sets, the program runs once more on ALL of the data and builds a pattern from that – the take-home message.

For each attribute – start with Outlook
Make a rule for each value
–sunny → yes, 1/5 errors
–overcast → yes*, 2/4 errors
–rainy → no, 0/5 errors
–Total errors = 3/14
Move on to the next attribute – Temperature
–hot → no, 1/4 errors
–mild → yes*, 3/6 errors
–cool → no*, 2/4 errors
–Total errors = 6/14
* means tie – broken arbitrarily (maybe randomly)

Continue with Humidity
Make a rule for each value
–high → no, 3/7 errors
–normal → no, 3/7 errors
–Total errors = 6/14
Move on to the next attribute – Windy
–FALSE → no, 2/8 errors
–TRUE → yes, 2/6 errors
–Total errors = 4/14
The first attribute (Outlook, 3/14 errors) wins – see the WEKA results on the next slide

WEKA – Take-Home
=== Classifier model (full training set) ===
outlook:
  sunny    -> yes
  overcast -> yes
  rainy    -> no
(11/14 instances correct)
This very simple rule set could be the take-home message from running this algorithm on this data – if you are satisfied with the results!
The 11/14 correct is NOT a good indicator of quality – it is the % correct on the TRAINING DATA
The cross-validation result shown previously (9/14) is a much fairer judgment, because it is measured on TEST DATA

Let's Try WEKA OneR on njcrimenominal
Try 10-fold
unemploy:
  hi  -> bad
  med -> ok
  low -> ok
(27/32 instances correct)
=== Confusion Matrix ===
  a   b   <-- classified as
  1   6 |  a = bad
  3  22 |  b = ok

Numeric Attributes
For OneR, numeric attributes are "discretized" – the range of values is divided into a set of intervals
(Too) simple method:
–Sort the values
–Put a breakpoint wherever the class changes (this is "supervised" discretization)
–See my weather data, as in the sketch below …

Temperature  64  65  68  69  70  71  72  72  75  75  80  81  83  85
Play?         Y   N   N   Y   N   N   Y   Y   N   Y   Y   N   N   N

With OneR there would be only one error on the training data (the two 75s share a value but have different classes) … but …
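A sketch of this naive scheme. Caveat: the temperature values here are an assumption – I have taken them from the book's weather data, which is consistent with the 64/65/70/71/75/80 values cited on these slides:

```python
# Sorted temperature column and the My Weather classes from the table above
# (temperature values assumed to match the book's weather data).
temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play  = ["Y","N","N","Y","N","N","Y","Y","N","Y","Y","N","N","N"]

# Naive supervised discretization: a breakpoint wherever the class changes.
# Identical values can never be split apart, which is where the single
# training error comes from (the two 75s have different classes).
breaks = [(temps[i] + temps[i + 1]) / 2
          for i in range(len(temps) - 1)
          if play[i] != play[i + 1] and temps[i] != temps[i + 1]]
print(breaks)  # [64.5, 68.5, 69.5, 71.5, 73.5, 80.5] -> 7 tiny intervals
```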

This is "overfitting"
What makes 64 a different group than 65?
Using this technique, the "ideal" division would come from a numeric primary key – every attribute value could get its own group and the error on the training data would be 0 (but that is unlikely to be valuable for future prediction)
Improvement via a heuristic: each group must have at least N members of the majority class (and the group keeps extending while the majority class keeps appearing)
In the book's example, N = 3. In WEKA, the default is N = 6.

With N = 3 on My Weather Temperature
Hit the 3rd No with 70, then continue and include 71 (still No)
Hit the 3rd Yes with 75, then continue and include 80 (still Yes)
We're actually just lucky here that the last group reaches 3 instances of its majority class – whatever was left over would have had to form the last group anyway, since there is no more data

Temperature  64  65  68  69  70  71 | 72  72  75  75  80 | 81  83  85
Play?         Y   N   N   Y   N   N |  Y   Y   N   Y   Y |  N   N   N

3 errors on this training data with this discretized attribute, but the groups are more likely to be useful for future predictions
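A sketch of the N = 3 heuristic, continuing the temps/play lists above. This greedy version ignores the extra rule that identical values must stay in the same group, which happens not to matter on this data:

```python
from collections import Counter

def discretize(values, labels, n=3):
    """Greedily grow a bucket until its majority class has n members,
    then keep absorbing instances of that same majority class."""
    buckets, cur = [], []
    for i, (v, c) in enumerate(zip(values, labels)):
        cur.append((v, c))
        majority, m = Counter(l for _, l in cur).most_common(1)[0]
        nxt = labels[i + 1] if i + 1 < len(labels) else None
        if m >= n and nxt != majority:   # close the bucket before a class switch
            buckets.append(cur)
            cur = []
    if cur:
        buckets.append(cur)              # whatever is left is the last bucket
    return buckets

for b in discretize(temps, play):
    print(b[0][0], "to", b[-1][0], Counter(l for _, l in b))
# 64 to 71 Counter({'N': 4, 'Y': 2})   <- majority No,  2 errors
# 72 to 80 Counter({'Y': 4, 'N': 1})   <- majority Yes, 1 error
# 81 to 85 Counter({'N': 3})           <- majority No,  0 errors
```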

With N = 3 on My Weather Humidity
In-class exercise – what groups will we have?

Humidity  (sorted)
Play?     Y  N  Y  Y  N  N  N  N  N  Y  Y  N  Y  N

Let's run WEKA on the My Weather data
First with the default options
Next with the minimum bucket size set to 3 (double-click the option area – this is WEKA's B option)

Another Thing or Two
Using this method, if two adjacent groups have the same majority class, they can be collapsed into one group (this happens not to occur for temperature or humidity here)
We can't do anything about missing values – they have to go in their own group

OneR in Context
The machine learning community had been using a shared set of available datasets to compare algorithms for a number of years
Algorithms were getting more and more complicated, with only small gains in performance
Holte (1993) said "the emperor has no clothes" – state-of-the-art methods were often only a few percentage points better, and produced much more complicated structural patterns (concept descriptions)
OneR can provide a "baseline" against which other, more complicated methods can be compared
–If they improve on it significantly, use them; otherwise …

Class Exercise

Let's run WEKA OneR on japanbank, with the B option (minimum bucket size) = 3

We can actually discretize and save data for future use using WEKA
Choose the Preprocess tab
Select Choose > Unsupervised > Attribute > Discretize
Choose options:
–Attribute indices (the attribute #s to be binned – e.g. attributes 3-4)
–FindNumBins – to have WEKA find a good number of groups for this data
–NumBins = the max # of groups to consider
Choose the Apply button
Choose the Save button to save to a permanent file
Undo if necessary
(A command-line sketch of the same filter follows.)
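For scripted use, the same filter can be run from the command line. A sketch, assuming WEKA is on the classpath and using file names of my own choosing – check the filter's -h output for the exact options in your WEKA version:

```
java weka.filters.unsupervised.attribute.Discretize -R 3-4 -B 10 \
    -i weather.arff -o weather-discretized.arff
```

Here -R gives the attribute range to bin, -B the (maximum) number of bins, and -i/-o the input and output ARFF files.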

End Section 4.1