SAD: Project 6
Lift Charts
Comparing classifiers: suppose there are 1,000,000 prospective respondents. Mailing to all 1,000,000 households is predicted to yield a 0.1% response rate, while mailing only to a specified subset of 100,000 homes is predicted to yield a 0.4% response rate. The lift factor is the increase in response rate: 0.4% / 0.1% = 4. Given a classifier that outputs probabilities for the predicted class value of each test instance, what do we do with them?
Lift Factor
sample success proportion = (number of positive instances in sample) / (sample size)
lift factor = (sample success proportion) / (total test set success proportion)
Lift chart (figure)
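To answer the question above, one option, sketched below in Python with made-up scores and labels, is to rank the test instances by the predicted probability of the positive class, take the top of the ranking as the sample, and compare its success proportion with the overall success proportion; doing this for increasing sample sizes gives the points of a lift chart.

def lift_factor(ranked, sample_size):
    # ranked: list of (p_yes, actual) pairs sorted by p_yes descending,
    # where actual is True for a positive ("yes") instance.
    total_rate = sum(actual for _, actual in ranked) / len(ranked)
    sample = ranked[:sample_size]
    sample_rate = sum(actual for _, actual in sample) / sample_size
    return sample_rate / total_rate

# Hypothetical scores: instances ranked by the predicted probability of "yes".
scored = sorted([(0.9, True), (0.8, False), (0.7, True), (0.4, False),
                 (0.3, False), (0.2, False), (0.1, False), (0.05, False)],
                key=lambda pair: pair[0], reverse=True)
print(lift_factor(scored, 4))   # success rate of the top half vs. the whole set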
Evaluation: The confusion matrix

    a   b   <-- classified as
    7   2   |   a = yes
    4   1   |   b = no

The diagonal entries (7 and 1) are the correctly classified instances; the off-diagonal entries (2 and 4) are the incorrectly classified instances.

Comments: For a boolean classification, the entropy is 0 if all entities belong to the same class, and the entropy is 1 if the collection contains an equal number of positive and negative examples. The typical measure of entropy is the number of bits of information needed to encode the classification. Note that the first term of the gain is the entropy of the original collection, and the second term is the expected value of the entropy after C is partitioned using attribute A. Gain is the expected reduction in entropy caused by knowing the value of attribute A.
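As an illustration of reading this matrix, a minimal Python sketch (the dictionary layout is just one way to hold the counts):

# Confusion matrix from the slide: outer keys are the actual class,
# inner keys are the predicted class.
matrix = {"yes": {"yes": 7, "no": 2},   # actual a = yes
          "no":  {"yes": 4, "no": 1}}   # actual b = no

correct = sum(matrix[c][c] for c in matrix)                 # diagonal: 7 + 1
total = sum(sum(row.values()) for row in matrix.values())   # all 14 instances
incorrect = total - correct                                 # off-diagonal: 2 + 4

print(f"correctly classified:   {correct} of {total}")
print(f"incorrectly classified: {incorrect} of {total}")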
b) (1 pt) Perform cross-validation of all your algorithms with fold counts 4 and 8; Maximum Cases should be 1000. Which algorithm is the best, and which varies less? Which is the better choice? (A sketch of an analogous check appears after the list of algorithms below.)
Decision Tree
Naïve Bayes
NN
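A sketch of an analogous comparison using scikit-learn rather than the assignment's own tool; the dataset and the choice of MLPClassifier to stand in for "NN" are assumptions made for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X, y = X[:1000], y[:1000]   # cap the number of cases at 1000, as in the assignment

models = {"DT": DecisionTreeClassifier(random_state=0),
          "NB": GaussianNB(),
          "NN": MLPClassifier(max_iter=2000, random_state=0)}

for folds in (4, 8):
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=folds)
        # The mean tells which algorithm does best; the standard deviation
        # tells which one varies less across folds.
        print(f"{folds}-fold {name}: {scores.mean():.3f} +- {scores.std():.3f}")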
Paired Sample t Test
Given a set of paired observations (from two normal populations):

    A     B     D = A - B
    x1    y1    x1 - y1
    x2    y2    x2 - y2
    x3    y3    x3 - y3
    x4    y4    x4 - y4
    x5    y5    x5 - y5
Calculate the mean d̄ and the standard deviation s of the differences; the test statistic is t = (d̄ - μ0) / (s / √n) with n - 1 degrees of freedom.
H0: μ = 0 (no difference)
H0: μ = k (the difference is a constant k)
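A minimal sketch of the paired test in Python, using hypothetical per-fold accuracies for two classifiers A and B evaluated on the same folds:

from math import sqrt
from scipy import stats

a = [0.91, 0.89, 0.93, 0.90, 0.92]   # classifier A, one value per fold (made up)
b = [0.88, 0.87, 0.90, 0.89, 0.90]   # classifier B on the same folds (made up)

d = [x - y for x, y in zip(a, b)]    # paired differences D = A - B
n = len(d)
mean_d = sum(d) / n
s = sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
t = mean_d / (s / sqrt(n))           # test statistic for H0: mu = 0

print(f"mean difference = {mean_d:.4f}, s = {s:.4f}, t = {t:.3f}")
print(stats.ttest_rel(a, b))         # the same test via SciPy, for comparison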
DT
NB
NN
DT  188 ± 5.79     [182 – 188 – 193.79]
NB  196 ± 7.43     [184 – 191.5 – 196.93]
NN  166 ± 2.9      [163.1 – 166 – 168]
DT  94 ± 4.72      [89.28 – 94 – 98.72]
NB  96.13 ± 4.4    [91.73 – 96.13 – 100.53]
NN  83.38 ± 7      [76.38 – 83.38 – 90.38]
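The bracketed intervals above appear to be mean ± the reported spread. A sketch of producing such an interval from per-fold results (the fold values here are hypothetical):

from math import sqrt

# Hypothetical per-fold results for one classifier; the real values come from
# the cross-validation runs above.
folds = [90, 98, 95, 93]

n = len(folds)
mean = sum(folds) / n
s = sqrt(sum((x - mean) ** 2 for x in folds) / (n - 1))

# Reported as: mean ± s  [mean - s  –  mean  –  mean + s]
print(f"{mean:.2f} ± {s:.2f} [{mean - s:.2f} – {mean:.2f} – {mean + s:.2f}]")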
Shannon formalized these intuitions. Given a universe of messages M = {m1, m2, ..., mn} and a probability p(mi) for the occurrence of each message, the information content (also called entropy) of a message M is given by:

    I[M] = Σ -p(mi) log2 p(mi)   (summed over i = 1, ..., n)
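A minimal sketch of this formula in Python:

from math import log2

def information(probabilities):
    # I[M] = sum of -p(mi) * log2 p(mi) over all messages mi
    return sum(-p * log2(p) for p in probabilities if p > 0)

# Boolean classification: 1 bit for an even split, 0 when one class holds all examples.
print(information([0.5, 0.5]))   # 1.0
print(information([1.0]))        # 0.0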
The amount of information needed to complete the tree is defined as the weighted average of the information content of each subtree, weighted by the percentage of the examples present in it. Let C be a set of training instances. If property P (for example, income) has n values, C will be divided into the subsets {C1, C2, ..., Cn}. The expected information needed to complete the tree after making P the root is:

    E[P] = Σ (|Ci| / |C|) I[Ci]   (summed over i = 1, ..., n)
The gain from the property P is computed by subtracting the expected information needed to complete the tree, E[P], from the total information:

    gain(P) = I[C] - E[P]
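A sketch that puts I[C], E[P], and gain(P) together in Python; the tiny training set with an income property is hypothetical:

from math import log2

def information(instances, target):
    # I[C]: entropy of the class attribute over the instance set
    counts = {}
    for row in instances:
        counts[row[target]] = counts.get(row[target], 0) + 1
    total = len(instances)
    return sum(-c / total * log2(c / total) for c in counts.values())

def expected_information(instances, prop, target):
    # E[P]: information of each subset Ci weighted by |Ci| / |C|
    subsets = {}
    for row in instances:
        subsets.setdefault(row[prop], []).append(row)
    total = len(instances)
    return sum(len(ci) / total * information(ci, target)
               for ci in subsets.values())

def gain(instances, prop, target):
    # gain(P) = I[C] - E[P]
    return information(instances, target) - expected_information(instances, prop, target)

# Tiny hypothetical training set with an income property and a yes/no class.
C = [{"income": "high",   "class": "no"},
     {"income": "high",   "class": "no"},
     {"income": "medium", "class": "yes"},
     {"income": "low",    "class": "yes"},
     {"income": "low",    "class": "yes"}]

print(gain(C, "income", "class"))   # I[C] - E[income]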
2. (6 pts) Decision Tree