Data Science
Credibility: Evaluating What’s Been Learned
Evaluating Numeric Prediction
WFH: Data Mining, Section 5.8
Rodney Nielsen
Many of these slides were adapted from I. H. Witten, E. Frank and M. A. Hall.
Credibility: Evaluating What’s Been Learned
Issues: training, testing, tuning
Predicting performance
Holdout, cross-validation, bootstrap
Comparing schemes: the t-test
Predicting probabilities: loss functions
Cost-sensitive measures
Evaluating numeric prediction
The Minimum Description Length principle
Evaluating Numeric Prediction
Same strategies: independent training, validation and test sets, significance tests, etc. (avoid cross-validation and bootstrapping for reporting)
Difference: error measures
Actual target values: $y_1, y_2, \ldots, y_N$
Predicted target values: $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N$
Most popular measure: mean-squared error
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$
Easy to manipulate mathematically
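As a minimal sketch of the MSE formula, assuming the actual and predicted values arrive as plain Python lists: the function name `mean_squared_error` is hypothetical and written from scratch here, not taken from any library, and the example data is made up.

```python
def mean_squared_error(actual, predicted):
    """Mean-squared error: (1/N) * sum over i of (yhat_i - y_i)^2."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    n = len(actual)
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n


# Hypothetical example data, not from the slides.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 3.0, 8.0]
print(mean_squared_error(actual, predicted))  # 0.5625
```

The squared errors are 0.25, 0, 1 and 1, so their mean is 2.25 / 4 = 0.5625.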
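Other Measures
The root mean-squared error:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$$
The mean absolute error is less sensitive to outliers than the mean-squared error, because errors are not squared and so a few large errors carry less weight:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert$$
Sometimes relative error values are more appropriate (e.g., 10% for an error of 50 when predicting 500)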
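A small sketch comparing the root mean-squared error and the mean absolute error on two hypothetical prediction sets with the same MAE: one spreads its error evenly, the other concentrates it in a single outlier. All function names are illustrative, `mean_squared_error` is repeated so the block stands alone, and `relative_error` assumes the actual value is non-zero.

```python
import math


def mean_squared_error(actual, predicted):
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    # RMSE is in the same units as the target values.
    return math.sqrt(mean_squared_error(actual, predicted))


def mean_absolute_error(actual, predicted):
    # Errors are not squared, so one large error carries less weight.
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def relative_error(actual_value, predicted_value):
    # Assumes actual_value != 0.
    return abs(predicted_value - actual_value) / abs(actual_value)


actual = [10, 10, 10, 10]
spread_out = [12, 8, 12, 8]     # four errors of size 2
one_outlier = [10, 10, 10, 18]  # three perfect predictions, one error of 8

for name, pred in [("spread_out", spread_out), ("one_outlier", one_outlier)]:
    print(name, mean_absolute_error(actual, pred), root_mean_squared_error(actual, pred))
# spread_out 2.0 2.0
# one_outlier 2.0 4.0

print(relative_error(500, 550))  # 0.1, i.e. 10%
```

Both prediction sets have an MAE of 2.0, but the single outlier doubles the RMSE, which is the sensitivity difference described above.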
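Improvement on the Mean
How much does the scheme improve on simply predicting the average target value, $\bar{y}$?
The relative squared error is:
$$\mathrm{RSE} = \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (\bar{y} - y_i)^2}$$
Root relative squared error:
$$\mathrm{RRSE} = \sqrt{\mathrm{RSE}}$$
Relative absolute error:
$$\mathrm{RAE} = \frac{\sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert}{\sum_{i=1}^{N} \lvert \bar{y} - y_i \rvert}$$
Values below 1 (100%) mean the scheme does better than always predicting the mean.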
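A minimal sketch of the three relative measures. By default it assumes $\bar{y}$ is the mean of the actual values passed in; the hypothetical `reference_mean` parameter lets a different baseline (for example a training-set mean) be supplied instead. Function names and data are illustrative, and the denominators are zero (raising `ZeroDivisionError`) when all actual values are equal.

```python
import math


def _reference_mean(actual, reference_mean):
    # By default ybar is the mean of the given actual values; a different
    # baseline (e.g. a training-set mean) can be passed instead.
    return sum(actual) / len(actual) if reference_mean is None else reference_mean


def relative_squared_error(actual, predicted, reference_mean=None):
    ybar = _reference_mean(actual, reference_mean)
    num = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    den = sum((ybar - a) ** 2 for a in actual)  # zero if all actual values are equal
    return num / den


def root_relative_squared_error(actual, predicted, reference_mean=None):
    return math.sqrt(relative_squared_error(actual, predicted, reference_mean))


def relative_absolute_error(actual, predicted, reference_mean=None):
    ybar = _reference_mean(actual, reference_mean)
    num = sum(abs(p - a) for a, p in zip(actual, predicted))
    den = sum(abs(ybar - a) for a in actual)
    return num / den


actual = [2, 4, 6, 8]      # mean 5
model = [3, 4, 5, 8]       # hypothetical predictions
mean_only = [5, 5, 5, 5]   # always predict the mean

print(relative_squared_error(actual, model))      # 0.1
print(relative_absolute_error(actual, model))     # 0.25
print(relative_squared_error(actual, mean_only))  # 1.0
```

Predicting the mean itself scores exactly 1.0 (100%), so the hypothetical model's 0.1 and 0.25 show a clear improvement on the trivial baseline.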
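Correlation Coefficient
Measures the statistical correlation between the predicted values and the actual values
Pearson product-moment correlation coefficient, ρ:
$$\rho = \frac{S_{PA}}{\sqrt{S_P S_A}}, \quad S_{PA} = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})}{N-1}, \quad S_P = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}{N-1}, \quad S_A = \frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N-1}$$
where $\bar{y}$ and $\bar{\hat{y}}$ are the means of the actual and predicted values.
Scale independent, between -1 and +1
Good performance leads to large values (close to +1)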
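A direct translation of the ρ formula into Python, as a sketch: the function name `pearson_correlation` is hypothetical, and it raises an error when either variance is zero because ρ is undefined there. The (N - 1) factors cancel, but they are kept so the code mirrors the formula term by term.

```python
import math


def pearson_correlation(actual, predicted):
    """rho = S_PA / sqrt(S_P * S_A), with sample (N - 1) normalisation."""
    n = len(actual)
    if n < 2 or len(predicted) != n:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    s_pa = sum((p - mean_p) * (a - mean_a) for a, p in zip(actual, predicted)) / (n - 1)
    s_p = sum((p - mean_p) ** 2 for p in predicted) / (n - 1)
    s_a = sum((a - mean_a) ** 2 for a in actual) / (n - 1)
    if s_p == 0 or s_a == 0:
        raise ValueError("rho is undefined when either variance is zero")
    return s_pa / math.sqrt(s_p * s_a)


actual = [1.0, 2.0, 3.0, 4.0]
print(pearson_correlation(actual, [1.1, 1.9, 3.2, 3.8]))  # close to +1
print(pearson_correlation(actual, [4.0, 3.0, 2.0, 1.0]))  # -1 (up to rounding)
```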
Pearson product-moment correlation coefficient
[Figure: examples of scatter diagrams with different values of the correlation coefficient (ρ)]
Pearson product-moment correlation coefficient
[Figure: several sets of (x, y) points, with the correlation coefficient of x and y for each set]
Note that the correlation reflects the noisiness (strength) and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row).
Note: the figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.
Which Measure?
Best to look at all of them
Often it doesn’t matter: the best scheme usually comes out best under every measure
Student Q: In what situations would we want to use the correlation coefficient as a performance measure for numeric prediction?
Example:
                               A        B        C        D
Root mean-squared error        67.8     91.7     63.3     57.4
Mean absolute error            41.3     38.5     33.4     29.2
Root relative squared error    42.2%    57.2%    39.4%    35.8%
Relative absolute error        43.1%    40.1%    34.8%    30.4%
Correlation coefficient        0.88     0.88     0.89     0.91
D is best, C second-best; A vs. B is arguable
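To check the conclusion against the example table, this short sketch ranks the four models on each measure (lower is better for the error measures, higher for the correlation) and compares A with B directly. The dictionary layout and the helper names `ranking` and `better` are illustrative; percentages are stored as plain numbers, and A and B share a correlation of 0.88.

```python
# Values from the example table; percentages are stored as plain numbers.
results = {
    "Root mean-squared error": {"A": 67.8, "B": 91.7, "C": 63.3, "D": 57.4},
    "Mean absolute error": {"A": 41.3, "B": 38.5, "C": 33.4, "D": 29.2},
    "Root relative squared error": {"A": 42.2, "B": 57.2, "C": 39.4, "D": 35.8},
    "Relative absolute error": {"A": 43.1, "B": 40.1, "C": 34.8, "D": 30.4},
    "Correlation coefficient": {"A": 0.88, "B": 0.88, "C": 0.89, "D": 0.91},
}
# For the error measures lower is better; for the correlation higher is better.
HIGHER_IS_BETTER = {"Correlation coefficient"}


def ranking(measure):
    """Models ordered from best to worst on one measure (ties keep A-D order)."""
    scores = results[measure]
    return sorted(scores, key=scores.get, reverse=measure in HIGHER_IS_BETTER)


def better(measure, m1, m2):
    """The better of two models on one measure, or None on a tie."""
    s = results[measure]
    if s[m1] == s[m2]:
        return None
    if measure in HIGHER_IS_BETTER:
        return m1 if s[m1] > s[m2] else m2
    return m1 if s[m1] < s[m2] else m2


for measure in results:
    print(f"{measure}: best-to-worst {ranking(measure)}, A vs B -> {better(measure, 'A', 'B') or 'tie'}")
```

D ranks first and C second on every measure, while A beats B on the two squared-error measures, B beats A on the two absolute-error measures, and they tie on correlation, which is why the A versus B comparison is arguable.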