Published by Glenna Sutedja. Modified over 6 years ago.
1
Cost Sensitive Evaluation Measures for F-term Classification
Yaoyong Li, Kalina Bontcheva, Hamish Cunningham
Department of Computer Science, University of Sheffield
2
Outline
Measures for the closeness of two concepts in a taxonomy.
Cost sensitive measures for document classification and information retrieval.
Applying the new measures to the submitted runs of the F-term patent classification sub-task of NTCIR-6.
3
Hierarchical Classification
Class labels are organised in a hierarchy.
The cost of a misclassification depends on the closeness of the two classes involved.
An evaluation measure should therefore be sensitive to closeness.
4
Closeness of Concepts in Taxonomy
Agirre and Rigau (1996) proposed the following criteria for measuring the closeness of two concepts in a taxonomy.
C1. The measure should depend on the length of the shortest path connecting the two concepts.
C2. Concepts in a deeper part of the taxonomy should be closer.
C3. Concepts in a denser part should be relatively closer.
C4. The measure should be independent of the number of concepts in the graph.
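As a rough illustration of criterion C1 only, the sketch below computes the shortest path between two concepts in a small tree and turns it into a closeness score. The taxonomy, the node names, and the formula 1 / (1 + path length) are assumptions made for this example; they are not taken from the slides, and the sketch does not address C2 to C4.

```python
# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "A": "root",
    "B": "root",
    "A1": "A",
    "A2": "A",
    "A11": "A1",
}


def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_length(x, y):
    """Number of edges on the shortest path between x and y in the tree."""
    steps_from_y = {n: i for i, n in enumerate(ancestors(y))}
    for steps_from_x, n in enumerate(ancestors(x)):
        if n in steps_from_y:
            # n is the lowest common ancestor of x and y.
            return steps_from_x + steps_from_y[n]
    raise ValueError("nodes are not in the same tree")


def distance_closeness(x, y):
    """Illustrative closeness in (0, 1]; an assumed formula, not the slides'."""
    return 1.0 / (1.0 + path_length(x, y))
```

With this toy tree, `path_length("A1", "A2")` is 2 and `path_length("A11", "B")` is 4, so the siblings A1 and A2 score as closer than A11 and B.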
5
Three Measures of Closeness
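Three measures and their compatibility with the four criteria C1 to C4: distance based, learning accuracy, and BDM.
[Table: one row per measure, one column per criterion; the check marks are not recoverable from this transcript.]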
6
Conventional Precision
Given a particular class X and a subset S of the test examples with predicted class labels, the conventional precision is $P = n_{match} / |S|$, where $n_{match}$ is the number of exactly matched examples.
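A minimal sketch of this definition follows. It assumes one reading of the slide: S is the set of examples assigned to class X, and an example matches exactly when its other label (here `labels`) equals X. The function and argument names are hypothetical.

```python
def conventional_precision(target, labels):
    """Fraction of the examples in S whose label exactly equals target.

    labels holds one label per example in S (assumed reading of the slide).
    """
    if not labels:
        raise ValueError("S must not be empty")
    n_match = sum(1 for y in labels if y == target)
    return n_match / len(labels)
```

For example, `conventional_precision("X", ["X", "X", "Y", "Z"])` gives 0.5: two of the four examples match exactly, and a near miss counts the same as a distant one.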
7
Cost Sensitive Precision
Under the same assumptions as on the previous slide, let $S_0$ consist of the examples in the test set S that belong to class X, and let $Y_e$ be the predicted label of example e. The cost sensitive precision is then $P = \sum_{e \in S_0} CL(X, Y_e) / |S|$, where $CL(X, Y_e)$ denotes the closeness of the two classes X and $Y_e$.
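The formula can be sketched as below. The closeness function CL is passed in as a parameter, and the toy closeness values are invented for illustration; the slides do not give numeric closeness scores. The sketch also shows that a binary closeness (1 for identical classes, 0 otherwise) reduces the measure to conventional precision.

```python
def cost_sensitive_precision(target, labels, closeness, size=None):
    """Sum of CL(target, Y_e) over the given labels, divided by |S|.

    labels: the labels Y_e of the examples summed over.
    size:   |S|; defaults to len(labels) when every example of S is summed.
    """
    if size is None:
        size = len(labels)
    if size <= 0:
        raise ValueError("|S| must be positive")
    return sum(closeness(target, y) for y in labels) / size


def binary_closeness(x, y):
    """Exact-match scoring: recovers conventional precision."""
    return 1.0 if x == y else 0.0


# Hypothetical closeness scores, for illustration only.
TOY_CL = {("X", "X"): 1.0, ("X", "X_sibling"): 0.5, ("X", "far"): 0.0}


def toy_closeness(x, y):
    return TOY_CL[(x, y)]


labels = ["X", "X_sibling", "far", "X"]
binary_p = cost_sensitive_precision("X", labels, binary_closeness)
graded_p = cost_sensitive_precision("X", labels, toy_closeness)
```

Here `binary_p` is 0.5 and `graded_p` is 0.625: the prediction into a sibling class earns partial credit instead of none.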
8
Cost Sensitive Measures
Three measures are commonly used in document classification and information retrieval: A-Precision, R-Precision, and F-measure. A cost sensitive version of each can be obtained directly by using the cost sensitive precision defined on the previous slide.
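As one illustration of the substitution, the sketch below computes a balanced F-measure from a precision and a recall value. The slides do not state which F-measure variant or which recall definition was used, so the balanced form F = 2PR / (P + R) and the example numbers are assumptions.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0  # convention: avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs: a conventional and a cost sensitive precision,
# each paired with the same recall value.
f_conventional = f_measure(0.5, 0.5)
f_cost_sensitive = f_measure(0.625, 0.5)
```

Feeding a cost sensitive precision into the same formula is all that changes; in this made-up case the F-measure rises from 0.5 to about 0.556.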
9
Measures for F-term classification
F-term patent classification consists of hierarchical classification problems.
We defined two cost sensitive measures for F-term classification, both based on BDM scores.
Generous: BDM_high.
Conservative: BDM_low.
10
Results of F-term Classification
Differences in A-Precision between the BDM measures and the conventional measure for all officially submitted runs of the NTCIR-6 F-term classification sub-task.
11
Effect on Ranking of Submitted Runs
Kendall's tau between the ranking of all submitted runs under each BDM score and the ranking under the binary score.

Measure     A-Precision  R-Precision  F-measure
BDM_high    0.887        0.861        0.614
BDM_low     0.912        0.848        0.752
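For readers unfamiliar with the statistic, here is a minimal sketch of Kendall's tau for two rankings without ties (the tau-a form). The run names and rank positions are invented for illustration and are not the NTCIR-6 results; whether the slides used this exact variant is not stated.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a for two rankings of the same items, assuming no ties.

    rank_a, rank_b: dicts mapping item -> rank position (1 = best).
    """
    items = list(rank_a)
    if set(items) != set(rank_b):
        raise ValueError("both rankings must cover the same items")
    n = len(items)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for i, j in combinations(items, 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1  # the pair is ordered the same way in both
        elif sign < 0:
            discordant += 1  # the pair is swapped between the rankings
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical rankings of four runs under two scoring schemes.
binary_rank = {"run1": 1, "run2": 2, "run3": 3, "run4": 4}
bdm_rank = {"run1": 1, "run2": 3, "run3": 2, "run4": 4}
tau = kendall_tau(binary_rank, bdm_rank)
```

A value of 1 means the two rankings agree on every pair of runs and -1 means they are exact reverses. In the toy data one of the six pairs is swapped, so `tau` is (5 - 1) / 6, roughly 0.667.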
12
Conclusions
We extended the measures for document classification to take into account the closeness of two concepts in a taxonomy.
We applied the new measures to the submitted runs of the NTCIR-6 F-term patent classification sub-task.
The new measures show the differences between learning strategies better than the conventional ones.
They do have an effect on the rankings of the runs.