Download presentation
Presentation is loading. Please wait.
Published byMegan Oliver Modified over 6 years ago
1
COMP1942 Classification: More Concept Prepared by Raymond Wong
Presented by Raymond Wong COMP1942
2
Classification Concept
Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942
3
Accuracy Given a classification model (e.g., decision tree) an arbitrary dataset where its target attribute is known the accuracy of a classification model is defined to be the proportion of the values in the target attribute correctly predicted. COMP1942 COMP1942 3
4
Accuracy e.g.1 root child=yes child=no Income=high Income=low
Race Income Child Insurance black high no yes white low Training set Decision tree COMP1942 COMP1942 COMP1942 4 4
5
Accuracy e.g.1 root child=yes child=no Income=high Income=low
Race Income Child Insurance black high no yes white low Predicted root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No yes yes yes yes no no no no Accuracy = 8/8*100 = 100% Training set Decision tree COMP1942 COMP1942 COMP1942 5 5
6
Accuracy e.g.1 root child=yes child=no Income=high Income=low
Race Income Child Insurance white high yes low black no An arbitrary dataset Decision tree COMP1942 COMP1942 COMP1942 6 6
7
Accuracy e.g.1 root child=yes child=no Income=high Income=low
Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes Accuracy = 5/6 * 100 = 83.3% An arbitrary dataset Decision tree COMP1942 COMP1942 COMP1942 7 7
8
Classification Concept
Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942
9
Precision Precision Total no. of values in the target attribute correctly predicted as “Yes” = Total no. of values in the target attribute predicted as “Yes” root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes Precision = 2/3 = 0.67 An arbitrary dataset COMP1942 Decision tree
10
Classification Concept
Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942
11
Recall Recall Total no. of values in the target attribute correctly predicted as “Yes” = Total no. of actual values in the target attribute equal to “Yes” root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes Recall = 2/2 = 1.0 An arbitrary dataset COMP1942 Decision tree
12
Classification Concept
Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942
13
f-measure f-measure is also called “f1-score”.
f-measure is a measurement by considering precision and recall together f-measure 2 x Precision x Recall = Precision + Recall COMP1942
14
f-measure root child=yes child=no Income=high Income=low
Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes An arbitrary dataset f-measure = 2 x 0.67 x 1.0 = 0.80 Decision tree COMP1942
15
Classification Concept
Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942
16
Specificity Specificity (similar to Recall (reverse way))
Total no. of values in the target attribute correctly predicted as “No” = Total no. of actual values in the target attribute equal to “No” root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes Recall = 3/4 = 0.75 An arbitrary dataset COMP1942 Decision tree
17
Classification Concept
Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942
18
False Positive/False Negative
True Positive The value is predicted as “Yes” and the actual value is “Yes” False Positive The value is predicted as “Yes” but the actual value is “No” True Negative The value is predicted as “No” and the actual value is “No” False Negative The value is predicted as “No” but the actual value is “Yes” COMP1942
19
False Positive/False Negative
Race Income Child Insurance white high yes low black no Predicted yes yes no no no yes No. of true positives = 2 No. of false positives = 1 No. of true negatives = 1 No. of false negatives = 2 COMP1942
20
False Positive/False Negative
Race Income Child Insurance white high yes low black no Predicted yes no no no yes yes No. of true positives = ? No. of false positives = ? No. of true negatives = ? No. of false negatives = ? COMP1942
21
Classification Concept
Other Five Measurements Accuracy Precision Recall f-measure Specificity False Positive/False Negative Two Phases COMP1942
22
Two Phases Two Phases Training Phase Prediction Phase Training Set
Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 22
23
Training Phase What we learnt is the following. root child=yes
child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance black high no yes white low COMP1942 Training set Decision tree
24
Training Phase Training set Original dataset Validation set Test set
Race Income Child Insurance black high no yes white low Original dataset Race Income Child Insurance black high no yes white low … Validation set Race Income Child Insurance white high yes low black no Test set Race Income Child Insurance white low yes black no COMP1942
25
Two Phases Two Phases Training Phase Prediction Phase Training Set
Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 25
26
Training Set Used to train or build a model e.g.1 root child=yes
child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Decision tree COMP1942 COMP1942 COMP1942 26 26
27
Training Set e.g.2 input Neuron Race output Income Insurance Child
black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 27 27
28
Training Set e.g.3 input Race output Income Insurance Child
black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 28 28
29
Training Set e.g.4 input Race output Income Insurance Child
black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 29 29
30
Two Phases Two Phases Training Phase Prediction Phase Training Set
Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 30
31
Validation Set We obtain a model from the training set. Validation set
Used to fine-tune the model Different models have different ways of fine-tuning COMP1942 COMP1942 31
32
Validation Set – Neural Network
e.g.1 input output Race Insurance Neuron Income Child Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 32 32
33
Validation Set – Neural Network
e.g.1 Structure 1 input output Race Insurance Neuron Income Child Race Income Child Insurance white high yes low black no Accuracy = 80% Validation set Neural Network COMP1942 COMP1942 COMP1942 33 33
34
Validation Set – Neural Network
e.g.2 input output Race Insurance Income Child Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 34 34
35
Validation Set – Neural Network
e.g.2 Structure 2 input output Race Insurance Income Child Race Income Child Insurance white high yes low black no Accuracy = 95% Validation set Neural Network COMP1942 COMP1942 COMP1942 35 35
36
Validation Set – Neural Network
e.g.3 input output Race Insurance Income Child Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Neural Network COMP1942 COMP1942 COMP1942 36 36
37
Validation Set – Neural Network
e.g.3 Structure 3 input output Race Insurance Income Child Race Income Child Insurance white high yes low black no Accuracy = 70% Validation set Neural Network COMP1942 COMP1942 COMP1942 37 37
38
Validation Set – Neural Network
Which structure is the best? Why? COMP1942
39
Validation Set – Decision Tree
e.g.1 root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance black high no yes white low Accuracy = 100% Training set Decision tree COMP1942 COMP1942 COMP1942 39 39
40
Validation Set – Decision Tree
e.g.1 Original tree root child=yes child=no Income=high Income=low 100% Yes 0% No 0% Yes 100% No Race Income Child Insurance white high yes low black no Accuracy = 70% Validation set Decision tree COMP1942 COMP1942 COMP1942 40 40
41
Validation Set – Decision Tree
An operation to remove the whole subtree Pruning e.g.1 Variation of the original tree root child=yes child=no Race Income Child Insurance white high yes low black no 80% Yes 20% No 100% Yes 0% No Accuracy = 95% Validation set Decision tree COMP1942 COMP1942 COMP1942 41 41
42
Validation Set – Decision Tree
Which tree is the best? Why? COMP1942
43
Two Phases Two Phases Training Phase Prediction Phase Training Set
Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 43
44
Test Set The validation test is often used to fine-tune the model
In such a case, when a model is finally chosen, its accuracy with the validation set is still an optimistic estimate of how it would perform with unseen data Thus, we use test set to evaluate the accuracy of the model on completely unseen data COMP1942
45
Test set – Neural Network
e.g.1 Structure 2 input output Race Insurance Income Child Race Income Child Insurance white high yes low black no Accuracy = 95% Validation set Neural Network COMP1942 COMP1942 COMP1942 45 45
46
Test set – Neural Network
e.g.1 Structure 2 input output Race Insurance Income Child Race Income Child Insurance white low yes black no Accuracy = 90% Test set Neural Network COMP1942
47
Test set – Decision Tree
e.g.2 Variation of the original tree root child=yes child=no Race Income Child Insurance white high yes low black no 80% Yes 20% No 100% Yes 0% No Accuracy = 95% Validation set Decision tree COMP1942 COMP1942 COMP1942 47 47
48
Test set – Decision Tree
e.g.2 Variation of the original tree root child=yes child=no 80% Yes 20% No 100% Yes 0% No Race Income Child Insurance white low yes black no Accuracy = 90% Test set Decision tree COMP1942 COMP1942 COMP1942 48 48
49
Two Phases Two Phases Training Phase Prediction Phase Training Set
Validation Set Test Set Prediction Phase New Set COMP1942 COMP1942 49
50
New Set New set white high no ? root child=yes child=no Income=high
Suppose there is a person. Race Income Child Insurance white high no ? root Race Income Child Insurance black high no yes white low child=yes child=no 100% Yes 0% No Income=high Income=low 100% Yes 0% No 0% Yes 100% No Training set Decision tree COMP1942 50
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.