1
Processing of large document collections Part 2
2
Feature selection: IG
- Information gain: measures the number of bits of information obtained for category prediction by knowing the presence or absence of a term in a document
3
Feature selection: estimating the probabilities
- Let:
  - term t occur in B documents, A of which are in category c
  - category c contain D documents, out of the N documents in the whole collection
4
Feature selection: estimating the probabilities
- For instance:
  - P(t) = B/N
  - P(~t) = (N-B)/N
  - P(c) = D/N
  - P(c|t) = A/B
  - P(c|~t) = (D-A)/(N-B)
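The estimates above can be sketched directly from the counts A, B, D, N. The slide does not give the information-gain formula explicitly, so the `information_gain` function below uses a standard entropy-based formulation, IG(t, c) = H(c) - P(t) H(c|t) - P(~t) H(c|~t); treat it as an illustrative assumption, not the course's exact definition.

```python
from math import log2

def term_category_probabilities(A, B, D, N):
    """Probability estimates from the slide's counts:
    term t occurs in B documents, A of them in category c;
    category c has D documents out of N in the collection."""
    return {
        "P(t)":    B / N,
        "P(~t)":   (N - B) / N,
        "P(c)":    D / N,
        "P(c|t)":  A / B,
        "P(c|~t)": (D - A) / (N - B),
    }

def information_gain(A, B, D, N):
    """IG(t, c) for the binary term/category case (standard
    formulation, assumed here since the slide omits the formula)."""
    def H(p):
        # binary entropy, with 0 * log2(0) taken as 0
        if p == 0.0 or p == 1.0:
            return 0.0
        return -(p * log2(p) + (1 - p) * log2(1 - p))
    Pt = B / N
    return H(D / N) - Pt * H(A / B) - (1 - Pt) * H((D - A) / (N - B))

# Hypothetical counts: N=1000 documents, term in B=100, A=80 of those in c, D=200
p = term_category_probabilities(A=80, B=100, D=200, N=1000)
```

When P(c|t) and P(c|~t) both equal P(c), the term tells us nothing about the category and the gain is zero, which is why such terms are removed by feature selection.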
5
Evaluation of text classifiers
- Evaluation of document classifiers is typically conducted experimentally, rather than analytically
- reason: to evaluate a system analytically, we would need a formal specification of the problem that the system is trying to solve
- text categorization is not formalisable
6
Evaluation
- The experimental evaluation of a classifier usually measures its effectiveness (rather than its efficiency)
  - effectiveness = ability to take the right classification decisions
  - efficiency = time and space requirements
7
Evaluation
- After a classifier is constructed using a training set, its effectiveness is evaluated using a test set
- the following counts are computed for each category i:
  - TP_i: true positives
  - FP_i: false positives
  - TN_i: true negatives
  - FN_i: false negatives
8
Evaluation
- TP_i: true positives w.r.t. category c_i
  - the set of documents that both the classifier and the previous judgments (as recorded in the test set) classify under c_i
- FP_i: false positives w.r.t. category c_i
  - the set of documents that the classifier classifies under c_i, but the test set indicates that they do not belong to c_i
9
Evaluation
- TN_i: true negatives w.r.t. c_i
  - both the classifier and the test set agree that the documents in TN_i do not belong to c_i
- FN_i: false negatives w.r.t. c_i
  - the classifier does not classify the documents in FN_i under c_i, but the test set indicates that they should be classified under c_i
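The four counts can be computed per category with a small helper. The boolean-list representation below is an illustrative assumption; the slides only define the counts themselves.

```python
def confusion_counts(predicted, actual):
    """TP, FP, TN, FN for one category c_i, from two parallel lists of
    booleans: the classifier's decision and the test-set judgment.
    Representation is illustrative, not prescribed by the slides."""
    tp = sum(p and a for p, a in zip(predicted, actual))          # both say c_i
    fp = sum(p and not a for p, a in zip(predicted, actual))      # classifier only
    tn = sum(not p and not a for p, a in zip(predicted, actual))  # both say not c_i
    fn = sum(not p and a for p, a in zip(predicted, actual))      # test set only
    return tp, fp, tn, fn
```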
10
Evaluation measures
- Precision w.r.t. c_i: P_i = TP_i / (TP_i + FP_i)
- Recall w.r.t. c_i: R_i = TP_i / (TP_i + FN_i)
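These two ratios can be sketched as functions of the counts. Returning 0 when the denominator is empty is a common convention assumed here, not something the slides specify.

```python
def precision(tp, fp):
    # P_i = TP_i / (TP_i + FP_i); 0 by convention if nothing was assigned to c_i
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # R_i = TP_i / (TP_i + FN_i); 0 by convention if c_i has no positive documents
    return tp / (tp + fn) if tp + fn else 0.0
```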
11
Evaluation measures
- To obtain estimates of precision and recall for the collection as a whole, two different methods may be adopted:
  - microaveraging
    - the counts of true positives, false positives and false negatives for all categories are first summed up
    - precision and recall are then calculated from these global values
  - macroaveraging
    - the average of precision (recall) over the individual categories
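The two averaging methods can be sketched side by side; the per-category tuple format is an illustrative assumption.

```python
def micro_macro(per_category):
    """per_category: list of (TP, FP, FN) tuples, one per category.
    Returns ((micro_P, micro_R), (macro_P, macro_R))."""
    def prec(tp, fp): return tp / (tp + fp) if tp + fp else 0.0
    def rec(tp, fn):  return tp / (tp + fn) if tp + fn else 0.0

    # microaveraging: sum the counts over all categories, then compute P and R once
    TP = sum(tp for tp, fp, fn in per_category)
    FP = sum(fp for tp, fp, fn in per_category)
    FN = sum(fn for tp, fp, fn in per_category)
    micro = (prec(TP, FP), rec(TP, FN))

    # macroaveraging: compute P and R per category, then average the values
    n = len(per_category)
    macro = (sum(prec(tp, fp) for tp, fp, fn in per_category) / n,
             sum(rec(tp, fn) for tp, fp, fn in per_category) / n)
    return micro, macro

# A large category done well and a small one done badly illustrate the gap:
(micro, macro) = micro_macro([(90, 10, 10), (1, 9, 9)])
```

Here microaveraged precision stays above 0.8 because the large category dominates the summed counts, while macroaveraged precision drops to 0.5 because the poorly handled low-generality category counts just as much as the large one.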
12
Evaluation measures
- Microaveraging and macroaveraging may give quite different results if the categories have very different generality
- e.g. the ability of a classifier to behave well also on categories with low generality (i.e. categories with few positive training instances) will be emphasized by macroaveraging
- the choice depends on the application
13
Evaluation measures
- Accuracy: A_i = (TP_i + TN_i) / (TP_i + FP_i + TN_i + FN_i)
- accuracy is not widely used in TC
  - the large value of the denominator makes A insensitive to variations in the number of correct decisions (TP + TN)
  - the trivial rejector (which assigns no document to the category) tends to outperform all non-trivial classifiers
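A small worked example with hypothetical counts shows why the trivial rejector dominates on accuracy: most categories are rare, so TN swamps the denominator.

```python
def accuracy(tp, fp, tn, fn):
    # A_i = (TP_i + TN_i) / (TP_i + FP_i + TN_i + FN_i)
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical low-generality category: 10 positives among 1000 documents.
# Trivial rejector (assigns nothing): TP=0, FP=0, TN=990, FN=10
trivial = accuracy(0, 0, 990, 10)
# A useful classifier finding 8 of the 10 positives, with 20 false positives
real = accuracy(8, 20, 970, 2)
```

The rejector scores 0.99 while the genuinely useful classifier scores lower, even though the rejector retrieves nothing at all, which is exactly the pathology the slide describes.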
14
Evaluation measures
- Efficiency
  - seldom used, although important for applicative purposes
  - difficult to measure: environment parameters change
  - two parts:
    - training efficiency = average time it takes to build a classifier for a category from a training set
    - classification efficiency = average time it takes to classify a new document under a category
15
Combined effectiveness measures
- Neither precision nor recall makes sense in isolation from the other
- the trivial acceptor (each document is classified under each category) has recall = 1
  - in this case, precision would usually be very low
- higher levels of precision may be obtained at the price of low values of recall
16
Combined effectiveness measures
- A classifier should therefore be evaluated by means of a measure that combines recall and precision
17
Reminder: inductive construction of classifiers
- A hard classifier for a category is
  - a function that returns true or false, or
  - a function that returns a value between 0 and 1, followed by a threshold
    - if the value is higher than the threshold -> true
    - otherwise -> false
18
Combined effectiveness measures
- 11-point average precision
- the breakeven point
- the F1 measure
19
11-point average precision
- In constructing the classifier, the threshold is repeatedly tuned so as to allow recall (for the category) to take the values 0.0, 0.1, ..., 0.9, 1.0
- precision (for the category) is computed at these 11 recall levels and averaged over the 11 resulting values
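A sketch of the computation, assuming the classifier exposes a 0..1 score per document: lowering a threshold through the ranked scores plays the role of the repeated threshold tuning, and interpolated precision (the best precision at or beyond each recall level, a common IR convention not stated on the slide) handles recall levels that are never hit exactly.

```python
def eleven_point_average_precision(scored, relevant):
    """scored: list of (doc_id, score) pairs from the classifier's 0..1 function;
    relevant: set of doc_ids the test set places in the category.
    Illustrative sketch of 11-point interpolated average precision."""
    ranked = sorted(scored, key=lambda x: x[1], reverse=True)
    # (recall, precision) at each point where lowering the threshold adds a hit
    pr = []
    hits = 0
    for rank, (doc, _) in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            pr.append((hits / len(relevant), hits / rank))
    # for each of the 11 recall levels, take the best precision at recall >= level
    levels = [i / 10 for i in range(11)]
    interp = [max((p for r, p in pr if r >= level), default=0.0) for level in levels]
    return sum(interp) / 11

ap = eleven_point_average_precision(
    [("a", 0.9), ("b", 0.8), ("c", 0.2)], relevant={"a", "b"})
```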
20
Breakeven point
- A process analogous to the one used for 11-point average precision is used
- a plot of precision as a function of recall is computed by repeatedly varying the threshold
- the breakeven point is the value at which precision equals recall
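In practice the threshold sweep rarely produces a point where precision and recall are exactly equal, so an interpolated breakeven is often reported. The sketch below assumes the simple convention of averaging the closest observed pair; the slides do not specify an interpolation method.

```python
def breakeven_point(pr_points):
    """pr_points: list of (precision, recall) pairs from a threshold sweep.
    Returns the average of the pair where P and R are closest -- a common
    interpolated-breakeven convention, assumed here."""
    p, r = min(pr_points, key=lambda x: abs(x[0] - x[1]))
    return (p + r) / 2
```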
21
F1 measure
- The F1 measure is defined as: F1 = 2PR / (P + R)
- the breakeven of a classifier is always less than or equal to its F1 value
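As the harmonic mean of precision and recall, F1 can be computed in one line; the zero-denominator convention is an assumption, as before.

```python
def f1(p, r):
    # F1 = 2PR / (P + R): the harmonic mean of precision and recall
    return 2 * p * r / (p + r) if p + r else 0.0
```

At the breakeven point P = R, so F1 = P = R there; since F1 is reported at the classifier's best operating point, the breakeven value cannot exceed it.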
22
Effectiveness
- Once an effectiveness measure is chosen, a classifier can be tuned (e.g. thresholds and other parameters can be set) so that the resulting effectiveness is the best achievable by that classifier
23
Conducting experiments
- In general, different sets of experiments may be used for cross-classifier comparison only if the experiments have been performed
  - on exactly the same collection (same documents and same categories)
  - with the same split between training set and test set
  - with the same evaluation measure