COMP1942 Classification: More Concept Prepared by Raymond Wong

Slides:

Advertisements

Similar presentations

Decision Tree Algorithm

Advertisements

Basic Data Mining Techniques Chapter Decision Trees.

Basic Data Mining Techniques

Bagging and Boosting in Data Mining Carolina Ruiz

Three kinds of learning

Data Mining: A Closer Look Chapter Data Mining Strategies (p35) Moh!

CSCI 347 / CS 4206: Data Mining Module 06: Evaluation Topic 01: Training, Testing, and Tuning Datasets.

Basic Data Mining Techniques

Bayesian Networks. Male brain wiring Female brain wiring.

Evaluating Hypotheses Reading: Coursepack: Learning From Examples, Section 4 (pp )

Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.

Categorical data. Decision Tree Classification Which feature to split on? Try to classify as many as possible with each split (This is a good split)

Computer Go : A Go player Rohit Gurjar CS365 Project Presentation, IIT Kanpur Guided By – Prof. Amitabha Mukerjee.

ASSESSING LEARNING ALGORITHMS Yılmaz KILIÇASLAN. Assessing the performance of the learning algorithm A learning algorithm is good if it produces hypotheses.

Chapter 6 Classification and Prediction Dr. Bernard Chen Ph.D. University of Central Arkansas.

COMP53311 Knowledge Discovery in Databases Overview Prepared by Raymond Wong Presented by Raymond Wong

COMP53311 Classification Prepared by Raymond Wong The examples used in Decision Tree are borrowed from LW Chan ’ s notes Presented by Raymond Wong

COMP53311 Other Classification Models: Neural Network Prepared by Raymond Wong Some of the notes about Neural Network are borrowed from LW Chan’s notes.

Basic Data Mining Techniques Chapter 3-A. 3.1 Decision Trees.

CSE573 Autumn /09/98 Machine Learning Administrative –Last topic: Decision Tree Learning Reading: 5.1, 5.4 Last time –finished NLP sample system’s.

CSE573 Autumn /11/98 Machine Learning Administrative –Finish this topic –The rest of the time is yours –Final exam Tuesday, Mar. 17, 2:30-4:20.

Data Science Credibility: Evaluating What’s Been Learned

Ensemble Classifiers.

Machine Learning: Ensemble Methods

Name: Sushmita Laila Khan Affiliation: Georgia Southern University

DECISION TREES An internal node represents a test on an attribute.

Bagging and Random Forests

Other Classification Models: Neural Network

An Artificial Intelligence Approach to Precision Oncology

Classification Algorithms

Prepared by: Mahmoud Rafeek Al-Farra

Classification 3 (Nearest Neighbor Classifier)

COMP1942 Exploring and Visualizing Data Overview

Other Classification Models: Neural Network

CSSE463: Image Recognition Day 11

Chapter 6 Classification and Prediction

Classification with Perceptrons Reading:

SAD: 6º Projecto.

Dipartimento di Ingegneria «Enzo Ferrari»,

Decision Tree Saed Sayad 9/21/2018.

Final Year Project Presentation --- Magic Paint Face

Classification and Prediction

Lecture Notes for Chapter 4 Introduction to Data Mining

Features & Decision regions

Machine Learning Week 1.

Mitchell Kossoris, Catelyn Scholl, Zhi Zheng

Data Mining Practical Machine Learning Tools and Techniques

Volume 10, Issue 6, Pages (June 2018)

CSSE463: Image Recognition Day 11

Data Mining – Chapter 3 Classification

Machine Learning: Lecture 3

Classification and Prediction

CSCI N317 Computation for Scientific Applications Unit Weka

Machine Learning Interpretability

Model Evaluation and Selection

Evaluating Models Part 1

COMP5331 Advanced Topics Prepared by Raymond Wong

Ensemble learning.

CS539 Project Report -- Evaluating hypothesis

Decision Tree Concept of Decision Tree

Other Classification Models: Support Vector Machine (SVM)

CSSE463: Image Recognition Day 11

CSSE463: Image Recognition Day 11

Using Clustering to Make Prediction Intervals For Neural Networks

Practice Project Overview

COSC 4368 Intro Supervised Learning Organization

Report 7 Brandon Silva.

Information Organization: Evaluation of Classification Performance

An introduction to Machine Learning (ML)

Presentation transcript:

COMP1942 Classification: More Concept Prepared by Raymond Wong Presented by Raymond Wong raywong@cse COMP1942

Classification Concept Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942

Accuracy Given a classification model (e.g., decision tree) an arbitrary dataset where its target attribute is known the accuracy of a classification model is defined to be the proportion of the values in the target attribute correctly predicted. COMP1942 COMP1942 3

Accuracy e.g.1 root child=yes child=no Income=high Income=low Race Income Child Insurance black high no yes white low Training set Decision tree COMP1942 COMP1942 COMP1942 4 4

Accuracy e.g.1 root child=yes child=no Income=high Income=low Race Income Child Insurance black high no yes white low Predicted root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No yes yes yes yes no no no no Accuracy = 8/8*100 = 100% Training set Decision tree COMP1942 COMP1942 COMP1942 5 5

Accuracy e.g.1 root child=yes child=no Income=high Income=low Race Income Child Insurance white high yes low black no An arbitrary dataset Decision tree COMP1942 COMP1942 COMP1942 6 6

Accuracy e.g.1 root child=yes child=no Income=high Income=low Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes Accuracy = 5/6 * 100 = 83.3% An arbitrary dataset Decision tree COMP1942 COMP1942 COMP1942 7 7

Classification Concept Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942

Precision Precision Total no. of values in the target attribute correctly predicted as “Yes” = Total no. of values in the target attribute predicted as “Yes” root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes Precision = 2/3 = 0.67 An arbitrary dataset COMP1942 Decision tree

Classification Concept Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942

Recall Recall Total no. of values in the target attribute correctly predicted as “Yes” = Total no. of actual values in the target attribute equal to “Yes” root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes Recall = 2/2 = 1.0 An arbitrary dataset COMP1942 Decision tree

Classification Concept Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942

f-measure f-measure is also called “f1-score”. f-measure is a measurement by considering precision and recall together f-measure 2 x Precision x Recall = Precision + Recall COMP1942

f-measure root child=yes child=no Income=high Income=low Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes An arbitrary dataset f-measure = 2 x 0.67 x 1.0 0.67 + 1.0 = 0.80 Decision tree COMP1942

Classification Concept Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942

Specificity Specificity (similar to Recall (reverse way)) Total no. of values in the target attribute correctly predicted as “No” = Total no. of actual values in the target attribute equal to “No” root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes Recall = 3/4 = 0.75 An arbitrary dataset COMP1942 Decision tree

Classification Concept Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942

False Positive/False Negative True Positive The value is predicted as “Yes” and the actual value is “Yes” False Positive The value is predicted as “Yes” but the actual value is “No” True Negative The value is predicted as “No” and the actual value is “No” False Negative The value is predicted as “No” but the actual value is “Yes” COMP1942

False Positive/False Negative Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes No. of true positives = 2 No. of false positives = 1 No. of true negatives = 1 No. of false negatives = 2 COMP1942

False Positive/False Negative Race Income Child Insurance white high yes low black no Predicted yes no no no yes yes No. of true positives = ? No. of false positives = ? No. of true negatives = ? No. of false negatives = ? COMP1942

Classification Concept Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942

Two Phases Two Phases Training Phase Prediction Phase Training Set Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 22

Training Phase What we learnt is the following. root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance black high no yes white low COMP1942 Training set Decision tree

Training Phase Training set Original dataset Validation set Test set Race Income Child Insurance black high no yes white low Original dataset Race Income Child Insurance black high no yes white low … Validation set Race Income Child Insurance white high yes low black no Test set Race Income Child Insurance white low yes black no COMP1942

Two Phases Two Phases Training Phase Prediction Phase Training Set Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 25

Training Set Used to train or build a model e.g.1 root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Decision tree COMP1942 COMP1942 COMP1942 26 26

Training Set e.g.2 input Neuron Race output Income Insurance Child black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 27 27

Training Set e.g.3 input Race output Income Insurance Child black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 28 28

Training Set e.g.4 input Race output Income Insurance Child black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 29 29

Two Phases Two Phases Training Phase Prediction Phase Training Set Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 30

Validation Set We obtain a model from the training set. Validation set Used to fine-tune the model Different models have different ways of fine-tuning COMP1942 COMP1942 31

Validation Set – Neural Network e.g.1 input output Race Insurance Neuron Income Child Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 32 32

Validation Set – Neural Network e.g.1 Structure 1 input output Race Insurance Neuron Income Child Race Income Child Insurance white high yes low black no Accuracy = 80% Validation set Neural Network COMP1942 COMP1942 COMP1942 33 33

Validation Set – Neural Network e.g.2 input output Race Insurance Income Child Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 34 34

Validation Set – Neural Network e.g.2 Structure 2 input output Race Insurance Income Child Race Income Child Insurance white high yes low black no Accuracy = 95% Validation set Neural Network COMP1942 COMP1942 COMP1942 35 35

Validation Set – Neural Network e.g.3 input output Race Insurance Income Child Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 36 36

Validation Set – Neural Network e.g.3 Structure 3 input output Race Insurance Income Child Race Income Child Insurance white high yes low black no Accuracy = 70% Validation set Neural Network COMP1942 COMP1942 COMP1942 37 37

Validation Set – Neural Network Which structure is the best? Why? COMP1942

Validation Set – Decision Tree e.g.1 root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Decision tree COMP1942 COMP1942 COMP1942 39 39

Validation Set – Decision Tree e.g.1 Original tree root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance white high yes low black no Accuracy = 70% Validation set Decision tree COMP1942 COMP1942 COMP1942 40 40

Validation Set – Decision Tree An operation to remove the whole subtree Pruning e.g.1 Variation of the original tree root child=yes child=no Race Income Child Insurance white high yes low black no 80% Yes 20% No 100% Yes 0% No Accuracy = 95% Validation set Decision tree COMP1942 COMP1942 COMP1942 41 41

Validation Set – Decision Tree Which tree is the best? Why? COMP1942

Two Phases Two Phases Training Phase Prediction Phase Training Set Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 43

Test Set The validation test is often used to fine-tune the model In such a case, when a model is finally chosen, its accuracy with the validation set is still an optimistic estimate of how it would perform with unseen data Thus, we use test set to evaluate the accuracy of the model on completely unseen data COMP1942

Test set – Neural Network e.g.1 Structure 2 input output Race Insurance Income Child Race Income Child Insurance white high yes low black no Accuracy = 95% Validation set Neural Network COMP1942 COMP1942 COMP1942 45 45

Test set – Neural Network e.g.1 Structure 2 input output Race Insurance Income Child Race Income Child Insurance white low yes black no Accuracy = 90% Test set Neural Network COMP1942

Test set – Decision Tree e.g.2 Variation of the original tree root child=yes child=no Race Income Child Insurance white high yes low black no 80% Yes 20% No 100% Yes 0% No Accuracy = 95% Validation set Decision tree COMP1942 COMP1942 COMP1942 47 47

Test set – Decision Tree e.g.2 Variation of the original tree root child=yes child=no 80% Yes 20% No 100% Yes 0% No Race Income Child Insurance white low yes black no Accuracy = 90% Test set Decision tree COMP1942 COMP1942 COMP1942 48 48

Two Phases Two Phases Training Phase Prediction Phase Training Set Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 49

New Set New set white high no ? root child=yes child=no Income=high Suppose there is a person. Race Income Child Insurance white high no ? root Race Income Child Insurance black high no yes white low child=yes child=no 100% Yes 0% No Income=high Income=low 100% Yes 0% No 0% Yes 100% No Training set Decision tree COMP1942 50