Slide 1: 10-610 The KDD Lab Intro: Outcome Analysis
Sebastian Thrun, Carnegie Mellon University
www.cs.cmu.edu/~10610
© Sebastian Thrun, CMU, 2000
Slide 2: Problem 1
On testing data, your speech recognizer recognizes sentences with 68% word accuracy, whereas previous recognizers achieve 60%. Would you advise a company to adopt your speech recognizer?
Slide 3: Problem 2
On testing data, your data mining algorithm predicts emergency C-sections with 68% accuracy, whereas a previous $1,000 test achieves 60% accuracy. Do you recommend replacing the previous test with your new method?
Slide 4: Characterize: What Should We Worry About?
- Pattern classification (+/-): classification error, FP/FN errors, cost/loss
- Regression: quadratic error
- Unsupervised learning: log likelihood
Slide 5: ROC Curves (ROC = Receiver Operating Characteristic)
Slide 6: Error Types
- Type I error (alpha error, false positive): probability of accepting the hypothesis when it is not true
- Type II error (beta error, false negative): probability of rejecting the hypothesis when it is true
Slide 7: ROC Curves (ROC = Receiver Operating Characteristic)
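An ROC curve is traced by sweeping a decision threshold over the classifier's scores and recording the false-positive and true-positive rates at each threshold. The following is a minimal sketch, not part of the original slides; the `scores` and `labels` arrays are made-up example data.

```python
# Sketch: trace ROC points by sweeping a threshold over classifier scores.
import numpy as np

def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    order = np.argsort(-scores)              # sort examples by descending score
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels == 1)             # true positives above each threshold
    fps = np.cumsum(labels == 0)             # false positives above each threshold
    tpr = tps / max((labels == 1).sum(), 1)
    fpr = fps / max((labels == 0).sum(), 1)
    return fpr, tpr

# Hypothetical scores and ground-truth labels
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1,   1,   0,   1,   0,    1,   0,   0  ])
fpr, tpr = roc_points(scores, labels)
print(list(zip(fpr.round(2), tpr.round(2))))
```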
Slide 8: ROC Curves (ROC = Receiver Operating Characteristic)
- Sensitivity: probability that a test result will be positive when the disease is present
- Specificity: probability that a test result will be negative when the disease is not present
- Positive likelihood ratio: ratio of the probability of a positive test result given the presence of the disease to the probability of a positive test result given the absence of the disease
- Negative likelihood ratio: ratio of the probability of a negative test result given the presence of the disease to the probability of a negative test result given the absence of the disease
- Positive predictive value (PPV): probability that the disease is present when the test is positive
- Negative predictive value (NPV): probability that the disease is not present when the test is negative
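All six quantities on this slide follow directly from a 2x2 confusion matrix. A small sketch with assumed counts (not from the slides):

```python
# Sketch: the six ROC-slide quantities computed from confusion-matrix counts.
def diagnostic_metrics(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)                 # P(test + | disease present)
    specificity = tn / (tn + fp)                 # P(test - | disease absent)
    lr_pos = sensitivity / (1 - specificity)     # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity     # negative likelihood ratio
    ppv = tp / (tp + fp)                         # P(disease present | test +)
    npv = tn / (tn + fn)                         # P(disease absent  | test -)
    return dict(sensitivity=sensitivity, specificity=specificity,
                lr_pos=lr_pos, lr_neg=lr_neg, ppv=ppv, npv=npv)

# Assumed example counts; PPV/NPV depend on the prevalence in the test set.
print(diagnostic_metrics(tp=30, fp=10, fn=5, tn=55))
```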
Slide 9: Evaluating Machine Learning Algorithms
Plenty of data vs. little data
Slide 10: Holdout Set
Split the data: train on one part, evaluate the error on the held-out part. Often also used for parameter optimization.
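As a minimal sketch of the holdout idea (not from the slides): the data arrays and the `fit`/`predict` callables below are hypothetical placeholders for whatever learner is being evaluated.

```python
# Sketch: estimate the error on a randomly held-out portion of the data.
import numpy as np

def holdout_error(X, y, fit, predict, holdout_frac=0.3, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_hold = int(holdout_frac * len(y))
    hold, train = idx[:n_hold], idx[n_hold:]
    model = fit(X[train], y[train])                       # train on the training part
    return np.mean(predict(model, X[hold]) != y[hold])    # misclassification rate on holdout
```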
Slide 11: Example
A hypothesis misclassifies 12 out of 40 examples in the cross-validation set S.
Q: What will the "true" error on future examples be?
A:
Slide 12: Finite Cross-Validation Set
D = all data, S = test data, m = number of test examples
True error (true risk): e_D(h) = Pr_{x in D} [ h(x) ≠ f(x) ]
Test error (empirical risk): e_S(h) = (1/m) Σ_{x in S} δ( h(x) ≠ f(x) )
Slide 13: Confidence Intervals (see Mitchell 97)
If S contains m examples, drawn independently, and m ≥ 30, then with approximately 95% probability the true error e_D lies in the interval
e_S(h) ± 1.96 · sqrt( e_S(h) (1 − e_S(h)) / m )
Slide 14: Example
A hypothesis misclassifies 12 out of 40 examples in the cross-validation set S.
Q: What will the "true" error on future examples be?
A: With 95% confidence, the true error lies in the interval
0.30 ± 1.96 · sqrt( 0.30 · 0.70 / 40 ) ≈ 0.30 ± 0.14, i.e. roughly [0.16, 0.44].
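A quick check of the slide's arithmetic, as a sketch:

```python
# Sketch: 12 errors out of 40 test examples, 95% interval e_S ± 1.96·sqrt(e_S(1-e_S)/m).
import math

m, errors = 40, 12
e_s = errors / m                                      # 0.30
half_width = 1.96 * math.sqrt(e_s * (1 - e_s) / m)    # about 0.142
print(f"true error in [{e_s - half_width:.2f}, {e_s + half_width:.2f}] with ~95% confidence")
# prints roughly [0.16, 0.44]
```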
Slide 15: Confidence Intervals (see Mitchell 97)
If S contains n examples, drawn independently, and n ≥ 30, then with approximately N% probability the true error e_D lies in the interval
e_S(h) ± z_N · sqrt( e_S(h) (1 − e_S(h)) / n )

N%:   50%   68%   80%   90%   95%   98%   99%
z_N:  0.67  1.00  1.28  1.64  1.96  2.33  2.58
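The z_N row of the table is just the two-sided quantile of the standard normal distribution. A sketch, assuming scipy is available:

```python
# Sketch: recover the z_N values in the table from the normal quantile function.
from scipy.stats import norm

def z_for_confidence(conf):
    """Two-sided z value, e.g. conf=0.95 gives about 1.96."""
    return norm.ppf(0.5 + conf / 2.0)

for conf in (0.50, 0.68, 0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{int(conf * 100)}%: z = {z_for_confidence(conf):.2f}")
```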
Slide 16: Finite Cross-Validation Set
True error: e_D(h); test error: e_S(h)
Number of test errors: r = m · e_S(h)
r is binomially distributed: P(r) = C(m, r) · e_D(h)^r · (1 − e_D(h))^(m − r)
Slide 17: Binomial Distribution
[Figure: binomial distribution P(k) for e_D = 0.3 and m = 40.]
Approximates a Normal distribution (Central Limit Theorem).
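The slide's setting (e_D = 0.3, m = 40) can be reproduced numerically to see how close the binomial count of test errors is to its normal approximation. A sketch, assuming scipy is available:

```python
# Sketch: binomial distribution of the error count vs. its normal approximation.
import numpy as np
from scipy.stats import binom, norm

e_d, m = 0.3, 40
k = np.arange(m + 1)
p_binom = binom.pmf(k, m, e_d)                                        # exact P(k errors)
p_norm = norm.pdf(k, loc=m * e_d, scale=np.sqrt(m * e_d * (1 - e_d)))  # CLT approximation
print("max |binomial - normal| =", np.abs(p_binom - p_norm).max())     # small for m = 40
```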
Slide 18: 95% Confidence Intervals
Slide 19: Question
What's the difference between the variance and the confidence interval? Basically a scaling factor: the interval half-width is the standard error (the square root of the estimate's variance) scaled by z_N, e.g. 1.96 for 95%.
Slide 20: Common Performance Plot
[Figure: testing error plotted with 95% confidence-interval error bars.]
Slide 21: Comparing Different Hypotheses
True difference: d = e_D(h1) − e_D(h2)
Test set difference: d̂ = e_S1(h1) − e_S2(h2)
95% confidence interval: d̂ ± 1.96 · sqrt( e_S1(h1)(1 − e_S1(h1))/m1 + e_S2(h2)(1 − e_S2(h2))/m2 )
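A sketch of the interval for the difference in true errors; the error rates echo Problem 1 (68% vs. 60% accuracy), and the test set sizes of 500 are an assumed example, not from the slides:

```python
# Sketch: confidence interval for the difference in true errors of two hypotheses.
import math

def diff_confidence_interval(e1, m1, e2, m2, z=1.96):
    d_hat = e1 - e2
    sd = math.sqrt(e1 * (1 - e1) / m1 + e2 * (1 - e2) / m2)
    return d_hat - z * sd, d_hat + z * sd

# Error rates 0.32 vs 0.40 on (assumed) 500-example test sets:
print(diff_confidence_interval(0.32, 500, 0.40, 500))
# interval is roughly (-0.14, -0.02), i.e. it excludes zero
```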
Slide 22: Evaluating Machine Learning Algorithms
Plenty of data vs. little data
Slide 23: Holdout Set
Split the data: train on one part, evaluate the error on the held-out part.
Slide 24: k-fold Cross Validation
Split the data k ways. For each of the k partitions, train on the other k−1 parts (yellow) and evaluate on the held-out part (pink), yielding error_1, ..., error_k.
error = (1/k) · Σ_i error_i
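A minimal sketch of the procedure above; as before, `fit` and `predict` are hypothetical callables for the learner being evaluated, and k = 8 matches the eight folds drawn on the slide:

```python
# Sketch: k-fold cross-validation, averaging the per-fold misclassification rates.
import numpy as np

def k_fold_error(X, y, fit, predict, k=8, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]                                            # held-out fold (pink)
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # remaining folds (yellow)
        model = fit(X[train], y[train])
        errors.append(np.mean(predict(model, X[test]) != y[test]))
    return np.mean(errors), errors                                  # error = sum(error_i) / k
```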
Slide 25: The Jackknife
[Figure: jackknife partitioning of the data.]
Slide 26: The Bootstrap
Draw a training sample from the data (yellow), evaluate the error on the remaining points (pink); repeat and average.
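A sketch of bootstrap error estimation in the sense of the slide (sample a training set with replacement, test on the points left out, repeat and average); `fit` and `predict` are again hypothetical callables:

```python
# Sketch: bootstrap error estimation with out-of-sample evaluation.
import numpy as np

def bootstrap_error(X, y, fit, predict, n_rounds=100, seed=0):
    rng = np.random.default_rng(seed)
    n, errors = len(y), []
    for _ in range(n_rounds):
        train = rng.integers(0, n, size=n)            # sample n indices with replacement
        test = np.setdiff1d(np.arange(n), train)      # points never sampled ("pink")
        if len(test) == 0:
            continue
        model = fit(X[train], y[train])
        errors.append(np.mean(predict(model, X[test]) != y[test]))
    return np.mean(errors)                            # average over bootstrap rounds
```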
Slide 27: What's the Problem?
Confidence intervals assume independence, but our individual estimates are dependent.
Slide 28: Comparing Different Hypotheses: Paired t Test
True difference: d = e_D(h1) − e_D(h2)
For each partition k: d_k = e_{S_k}(h1) − e_{S_k}(h2), the test-error difference on partition k
Average: d̄ = (1/K) · Σ_k d_k
N% confidence interval: d̄ ± t_{N, K−1} · s_d̄, where s_d̄ = sqrt( Σ_k (d_k − d̄)² / (K(K−1)) ),
K−1 is the number of degrees of freedom and N is the confidence level.

t values by degrees of freedom ν and confidence level N:
         90%    95%    98%    99%
ν = 2    2.92   4.30   6.96   9.92
ν = 5    2.02   2.57   3.36   4.03
ν = 10   1.81   2.23   2.76   3.17
ν = 20   1.72   2.09   2.53   2.84
ν = 30   1.70   2.04   2.46   2.75
ν = 120  1.66   1.98   2.36   2.62
ν = ∞    1.64   1.96   2.33   2.58
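A sketch of the paired interval from per-partition errors; the two error lists are made-up example values, and scipy is assumed to be available for the t quantile:

```python
# Sketch: paired t confidence interval for the true difference between two hypotheses.
import numpy as np
from scipy.stats import t

def paired_t_interval(errors_a, errors_b, conf=0.95):
    d = np.asarray(errors_a) - np.asarray(errors_b)    # d_k for each partition k
    k = len(d)
    d_bar = d.mean()
    s_dbar = np.sqrt(np.sum((d - d_bar) ** 2) / (k * (k - 1)))  # standard error of d_bar
    t_val = t.ppf(0.5 + conf / 2.0, df=k - 1)           # k-1 degrees of freedom
    return d_bar - t_val * s_dbar, d_bar + t_val * s_dbar

# Hypothetical per-partition test errors of two hypotheses:
print(paired_t_interval([0.12, 0.10, 0.15, 0.11, 0.13],
                        [0.14, 0.13, 0.16, 0.15, 0.14]))
```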
Slide 29: Evaluating Machine Learning Algorithms
Plenty of data vs. little data vs. unlimited data
Slide 30: Asymptotic Prediction
Useful for very large data sets.
Slide 31: Summary
- Know your loss function!
- Finite testing data: report confidence intervals.
- Scarce data: repartition the training/testing set.
- Asymptotic prediction: exponential.
- Put thought into your evaluation, and be critical. Convince yourself!