COMP1942 Classification: More Concept Prepared by Raymond Wong


1 COMP1942 Classification: More Concept Prepared by Raymond Wong
Presented by Raymond Wong COMP1942

2 Classification Concept
Other Five Measurements; Accuracy; Precision; Recall; f-measure; Specificity; False Positive/False Negative; Two Phases

3 Accuracy
Given a classification model (e.g., a decision tree) and an arbitrary dataset whose target attribute is known, the accuracy of the classification model is defined to be the proportion of the values in the target attribute that are correctly predicted.

4 Accuracy e.g.1
Training set (columns: Race, Income, Child, Insurance) and the decision tree built from it (root; branches child=yes and child=no; branches Income=high and Income=low).

5 Accuracy e.g.1
Training set with the decision tree's predictions. Leaves of the tree: 100% Yes / 0% No and 0% Yes / 100% No. Predicted: yes, yes, yes, yes, no, no, no, no. Accuracy = 8/8 * 100 = 100%

6 Accuracy e.g.1
An arbitrary dataset (columns: Race, Income, Child, Insurance) and the decision tree.

7 Accuracy e.g.1
An arbitrary dataset with the decision tree's predictions. Predicted: yes, yes, no, no, no, yes. Accuracy = 5/6 * 100 = 83.3%
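The accuracy calculation on the arbitrary dataset can be sketched in Python. The predicted labels are the ones on the slide; the actual labels did not survive in this transcript, so the `actual` list is a hypothetical assignment chosen only so that 5 of the 6 predictions are correct.

```python
def accuracy(actual, predicted):
    """Proportion of target values that are predicted correctly."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Predicted labels from the slide.
predicted = ["yes", "yes", "no", "no", "no", "yes"]
# Hypothetical actual labels (assumption): 5 of the 6 predictions match.
actual = ["yes", "yes", "no", "no", "no", "no"]

print(f"Accuracy = {accuracy(actual, predicted) * 100:.1f}%")
```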

9 Precision
$$\text{Precision} = \frac{\text{Total no. of values in the target attribute correctly predicted as “Yes”}}{\text{Total no. of values in the target attribute predicted as “Yes”}}$$
An arbitrary dataset (columns: Race, Income, Child, Insurance) with the decision tree's predictions. Predicted: yes, yes, no, no, no, yes. Precision = 2/3 = 0.67
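A minimal Python sketch of the precision definition follows. The `actual` labels are hypothetical (the table rows are not recoverable from this transcript) and were chosen to reproduce the slide's 2/3.

```python
def precision(actual, predicted, positive="yes"):
    """Correctly predicted positives divided by all predicted positives."""
    actual_for_predicted_pos = [a for a, p in zip(actual, predicted) if p == positive]
    if not actual_for_predicted_pos:
        raise ValueError("no value was predicted as the positive class")
    hits = sum(a == positive for a in actual_for_predicted_pos)
    return hits / len(actual_for_predicted_pos)


predicted = ["yes", "yes", "no", "no", "no", "yes"]   # from the slide
actual = ["yes", "yes", "no", "no", "no", "no"]       # hypothetical (assumption)

print(f"Precision = {precision(actual, predicted):.2f}")
```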

11 Recall
$$\text{Recall} = \frac{\text{Total no. of values in the target attribute correctly predicted as “Yes”}}{\text{Total no. of actual values in the target attribute equal to “Yes”}}$$
An arbitrary dataset (columns: Race, Income, Child, Insurance) with the decision tree's predictions. Predicted: yes, yes, no, no, no, yes. Recall = 2/2 = 1.0
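Recall differs from precision only in its denominator, which a short Python sketch makes visible. As before, the `actual` labels are a hypothetical assignment consistent with the slide's 2/2.

```python
def recall(actual, predicted, positive="yes"):
    """Correctly predicted positives divided by all actual positives."""
    predicted_for_actual_pos = [p for a, p in zip(actual, predicted) if a == positive]
    if not predicted_for_actual_pos:
        raise ValueError("the dataset contains no actual positive value")
    hits = sum(p == positive for p in predicted_for_actual_pos)
    return hits / len(predicted_for_actual_pos)


predicted = ["yes", "yes", "no", "no", "no", "yes"]   # from the slide
actual = ["yes", "yes", "no", "no", "no", "no"]       # hypothetical (assumption)

print(f"Recall = {recall(actual, predicted):.1f}")
```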

13 f-measure
f-measure is also called “f1-score”.
f-measure is a measurement that considers precision and recall together.
$$\text{f-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

14 f-measure
An arbitrary dataset (columns: Race, Income, Child, Insurance) with the decision tree's predictions. Predicted: yes, yes, no, no, no, yes.
$$\text{f-measure} = \frac{2 \times 0.67 \times 1.0}{0.67 + 1.0} = 0.80$$
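The arithmetic can be checked with a few lines of Python, using the precision (2/3) and recall (2/2) from the earlier slides. Returning 0.0 when both inputs are zero is a common convention assumed here, not something the slides state.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the f1-score)."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined 0/0 case
    return 2 * precision * recall / (precision + recall)


f = f_measure(2 / 3, 2 / 2)
print(f"f-measure = {f:.2f}")
```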

16 Specificity
Specificity is similar to Recall, applied the reverse way (to “No” instead of “Yes”).
$$\text{Specificity} = \frac{\text{Total no. of values in the target attribute correctly predicted as “No”}}{\text{Total no. of actual values in the target attribute equal to “No”}}$$
An arbitrary dataset (columns: Race, Income, Child, Insurance) with the decision tree's predictions. Predicted: yes, yes, no, no, no, yes. Specificity = 3/4 = 0.75
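In code, specificity is the recall computation applied to the “No” class. The sketch below uses hypothetical `actual` labels, chosen to reproduce the 3/4 on the slide.

```python
def specificity(actual, predicted, negative="no"):
    """Correctly predicted negatives divided by all actual negatives."""
    predicted_for_actual_neg = [p for a, p in zip(actual, predicted) if a == negative]
    if not predicted_for_actual_neg:
        raise ValueError("the dataset contains no actual negative value")
    hits = sum(p == negative for p in predicted_for_actual_neg)
    return hits / len(predicted_for_actual_neg)


predicted = ["yes", "yes", "no", "no", "no", "yes"]   # from the slide
actual = ["yes", "yes", "no", "no", "no", "no"]       # hypothetical (assumption)

print(f"Specificity = {specificity(actual, predicted):.2f}")
```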

18 False Positive/False Negative
True Positive: the value is predicted as “Yes” and the actual value is “Yes”.
False Positive: the value is predicted as “Yes” but the actual value is “No”.
True Negative: the value is predicted as “No” and the actual value is “No”.
False Negative: the value is predicted as “No” but the actual value is “Yes”.

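19 False Positive/False Negative
An arbitrary dataset (columns: Race, Income, Child, Insurance) with the decision tree's predictions. Predicted: yes, yes, no, no, no, yes.
No. of true positives = 2
No. of false positives = 1
No. of true negatives = 3
No. of false negatives = 0
(These counts agree with Accuracy = 5/6, Precision = 2/3, Recall = 2/2 and Specificity = 3/4 computed on the earlier slides.)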
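The four counts can be tallied in one pass, and the earlier measurements then follow from them. In this sketch the `actual` labels are hypothetical, chosen to agree with the precision (2/3), recall (2/2) and specificity (3/4) values on the earlier slides; with them the tally is TP = 2, FP = 1, TN = 3, FN = 0.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="yes"):
    """Count true/false positives and negatives for a two-class target."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


predicted = ["yes", "yes", "no", "no", "no", "yes"]   # from the slide
actual = ["yes", "yes", "no", "no", "no", "no"]       # hypothetical (assumption)

counts = confusion_counts(actual, predicted)
for name in ("TP", "FP", "TN", "FN"):
    print(name, counts[name])

# The earlier measurements expressed through the four counts.
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
specificity = counts["TN"] / (counts["TN"] + counts["FP"])
accuracy = (counts["TP"] + counts["TN"]) / sum(counts.values())
```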

20 False Positive/False Negative
An arbitrary dataset (columns: Race, Income, Child, Insurance) with a different set of predictions. Predicted: yes, no, no, no, yes, yes.
No. of true positives = ?
No. of false positives = ?
No. of true negatives = ?
No. of false negatives = ?

22 Two Phases
Training Phase: Training Set, Validation Set, Test Set
Prediction Phase: New Set

23 Training Phase
What we learnt is the following: a decision tree (root; branches child=yes and child=no; branches Income=high and Income=low; leaves 100% Yes / 0% No and 0% Yes / 100% No) is built from a training set (columns: Race, Income, Child, Insurance).

24 Training Phase
The original dataset (columns: Race, Income, Child, Insurance) is divided into a training set, a validation set, and a test set.
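One common way to produce the three sets is to shuffle the original dataset once and cut it into three parts. The sketch below is an illustration only: the 60/20/20 proportions, the fixed seed, and the helper name `split_dataset` are assumptions, since the slides do not say how the split is made.

```python
import random


def split_dataset(records, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains
    return train, val, test


records = list(range(20))                   # stand-in for 20 dataset rows
train, val, test = split_dataset(records)
print(len(train), len(val), len(test))
```

Every record lands in exactly one of the three sets, so no test record is ever seen during training or fine-tuning.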

26 Training Set
Used to train or build a model.
e.g.1 Decision tree (root; branches child=yes and child=no; branches Income=high and Income=low; leaves 100% Yes / 0% No and 0% Yes / 100% No) built from the training set (columns: Race, Income, Child, Insurance). Accuracy = 100%

27 Training Set
e.g.2 Neural network (inputs: Race, Income, Child; output: Insurance) built from the training set. Accuracy = 100%

28 Training Set
e.g.3 Neural network (inputs: Race, Income, Child; output: Insurance) built from the training set. Accuracy = 100%

29 Training Set
e.g.4 Neural network (inputs: Race, Income, Child; output: Insurance) built from the training set. Accuracy = 100%

31 Validation Set
We obtain a model from the training set.
The validation set is used to fine-tune the model.
Different models have different ways of fine-tuning.

32 Validation Set – Neural Network
e.g.1 Neural network (inputs: Race, Income, Child; output: Insurance) evaluated on the training set. Accuracy = 100%

33 Validation Set – Neural Network
e.g.1 Structure 1, evaluated on the validation set. Accuracy = 80%

34 Validation Set – Neural Network
e.g.2 Neural network (inputs: Race, Income, Child; output: Insurance) evaluated on the training set. Accuracy = 100%

35 Validation Set – Neural Network
e.g.2 Structure 2, evaluated on the validation set. Accuracy = 95%

36 Validation Set – Neural Network
e.g.3 Neural network (inputs: Race, Income, Child; output: Insurance) evaluated on the training set. Accuracy = 100%

37 Validation Set – Neural Network
e.g.3 Structure 3, evaluated on the validation set. Accuracy = 70%

38 Validation Set – Neural Network
Which structure is the best? Why?
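All three structures reach 100% on the training set, so training accuracy cannot separate them; the validation accuracies can. A minimal sketch of that selection step, using the three validation figures from the slides (the dictionary layout is an assumption):

```python
# Validation accuracies reported on the slides.
validation_accuracy = {
    "Structure 1": 0.80,
    "Structure 2": 0.95,
    "Structure 3": 0.70,
}

# Keep the structure that does best on data it was not trained on.
best_structure = max(validation_accuracy, key=validation_accuracy.get)
print(best_structure, validation_accuracy[best_structure])
```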

39 Validation Set – Decision Tree
e.g.1 Decision tree (root; branches child=yes and child=no; branches Income=high and Income=low; leaves 100% Yes / 0% No and 0% Yes / 100% No) evaluated on the training set. Accuracy = 100%

40 Validation Set – Decision Tree
e.g.1 Original tree, evaluated on the validation set. Accuracy = 70%

41 Validation Set – Decision Tree
Pruning: an operation to remove the whole subtree.
e.g.1 Variation of the original tree (root; branches child=yes and child=no; leaves 80% Yes / 20% No and 100% Yes / 0% No), evaluated on the validation set. Accuracy = 95%
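Pruning can be sketched on a small tree structure: removing a subtree turns its root into a leaf that predicts the majority class of the training records that reached it. The `Node` class, the helper names, and all record counts below are hypothetical; the counts were picked so that the pruned leaf comes out as 80% Yes / 20% No, and placing the Income split under child=no is one reading of the flattened figure.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    counts: dict                        # class label -> no. of training records at this node
    attribute: Optional[str] = None     # attribute tested here (None for a leaf)
    children: dict = field(default_factory=dict)   # attribute value -> child Node


def predict(node, record):
    """Follow the branches matching the record, then return the majority class."""
    while node.children:
        node = node.children[record[node.attribute]]
    return max(node.counts, key=node.counts.get)


def prune(node):
    """Remove the whole subtree below `node`, turning it into a leaf."""
    node.attribute = None
    node.children = {}


# Hypothetical counts (assumption): 4 Yes and 1 No reach the Income split.
income_node = Node(
    counts={"yes": 4, "no": 1},
    attribute="income",
    children={
        "high": Node(counts={"yes": 4, "no": 0}),
        "low": Node(counts={"yes": 0, "no": 1}),
    },
)
root = Node(
    counts={"yes": 7, "no": 1},
    attribute="child",
    children={"yes": Node(counts={"yes": 3, "no": 0}), "no": income_node},
)

record = {"child": "no", "income": "low"}
before = predict(root, record)   # follows child=no, then Income=low
prune(income_node)
after = predict(root, record)    # child=no is now an 80% Yes / 20% No leaf
print(before, after)
```

Whether the pruned tree is kept is then decided by comparing validation accuracies, as on the slides (70% for the original tree against 95% for the variation).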

42 Validation Set – Decision Tree
Which tree is the best? Why?

44 Test Set
The validation set is often used to fine-tune the model.
In such a case, when a model is finally chosen, its accuracy on the validation set is still an optimistic estimate of how it would perform on unseen data.
Thus, we use the test set to evaluate the accuracy of the model on completely unseen data.
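Why the validation accuracy of the chosen model is optimistic can be illustrated with a small simulation. It is a sketch under strong assumptions: every "model" here is a hypothetical pure guesser whose true accuracy is 50%, and the set sizes and seed are arbitrary. Picking the guesser with the best validation score still yields a validation accuracy at or above the average, while its test accuracy is not inflated by the selection.

```python
import random

rng = random.Random(0)                  # fixed seed so the run is repeatable
N_VAL, N_TEST, N_MODELS = 40, 40, 100   # arbitrary sizes (assumption)


def random_labels(n):
    return [rng.choice(["yes", "no"]) for _ in range(n)]


def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


val_actual = random_labels(N_VAL)
test_actual = random_labels(N_TEST)

# Each hypothetical model guesses at random on both sets.
results = []
for _ in range(N_MODELS):
    val_acc = accuracy(val_actual, random_labels(N_VAL))
    test_acc = accuracy(test_actual, random_labels(N_TEST))
    results.append((val_acc, test_acc))

# Choose the model by validation accuracy only, then look at its test accuracy.
best_val_acc, best_test_acc = max(results, key=lambda r: r[0])
mean_val_acc = sum(v for v, _ in results) / N_MODELS

print(f"chosen model: validation {best_val_acc:.2f}, test {best_test_acc:.2f}")
print(f"average validation accuracy over all models: {mean_val_acc:.2f}")
```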

45 Test set – Neural Network
e.g.1 Structure 2 (inputs: Race, Income, Child; output: Insurance), evaluated on the validation set. Accuracy = 95%

46 Test set – Neural Network
e.g.1 Structure 2, evaluated on the test set. Accuracy = 90%

47 Test set – Decision Tree
e.g.2 Variation of the original tree (root; branches child=yes and child=no; leaves 80% Yes / 20% No and 100% Yes / 0% No), evaluated on the validation set. Accuracy = 95%

48 Test set – Decision Tree
e.g.2 Variation of the original tree, evaluated on the test set. Accuracy = 90%

50 New Set
Suppose there is a person: Race = white, Income = high, Child = no, Insurance = ?
Decision tree from the training set (columns: Race, Income, Child, Insurance): root; branches child=yes and child=no, with a leaf 100% Yes / 0% No; branches Income=high and Income=low, with leaves 100% Yes / 0% No and 0% Yes / 100% No.
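The prediction phase amounts to walking the new record down the tree. The sketch below encodes one reading of the flattened figure, which is an assumption: child=yes is a leaf predicting Yes, and child=no leads to the Income split (high predicts Yes, low predicts No). The function and key names are hypothetical.

```python
def predict_insurance(record):
    """Walk the decision tree (as read from the slide) for one record."""
    if record["child"] == "yes":
        return "yes"                 # leaf: 100% Yes / 0% No
    if record["income"] == "high":
        return "yes"                 # leaf: 100% Yes / 0% No
    return "no"                      # leaf: 0% Yes / 100% No


# The new person from the slide; Race is not tested by this tree.
new_person = {"race": "white", "income": "high", "child": "no"}
prediction = predict_insurance(new_person)
print("Predicted Insurance:", prediction)
```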

