Slide 1: 10-610 The KDD Lab Intro: Outcome Analysis. Sebastian Thrun, Carnegie Mellon University. www.cs.cmu.edu/~10610 (© Sebastian Thrun, CMU, 2000)

Slide 2: Problem 1. On testing data, your speech recognizer recognizes sentences with 68% word accuracy, whereas previous recognizers achieve 60%. Would you advise a company to adopt your speech recognizer?

Slide 3: Problem 2. On testing data, your data mining algorithm predicts emergency C-sections with 68% accuracy, whereas a previous $1,000 test achieves 60% accuracy. Do you recommend replacing the previous test with your new method?

Slide 4: Characterize: What Should We Worry About? The cost/loss to measure depends on the task: pattern classification (+/−) → classification error and FP/FN errors; regression → quadratic error; unsupervised learning → log likelihood.
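To make the three loss families concrete, here is a minimal sketch; the function names and toy numbers are illustrative, not from the slides:

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """Pattern classification: fraction of misclassified examples."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

def quadratic_loss(y_true, y_pred):
    """Regression: mean squared error."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def log_likelihood(probs):
    """Unsupervised learning: log likelihood of the data, given the
    model's probability for each observed example."""
    return np.sum(np.log(probs))

print(zero_one_loss([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
print(quadratic_loss([1.0, 2.0], [1.5, 1.0]))     # 0.625
print(log_likelihood([0.9, 0.8, 0.7]))            # about -0.69
```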

Slide 5: ROC Curves (ROC = Receiver Operating Characteristic).

Slide 6: Error Types.
- Type I error (alpha error, false positive): probability of accepting the hypothesis when it is not true.
- Type II error (beta error, false negative): probability of rejecting the hypothesis when it is true.

Slide 7: ROC Curves (ROC = Receiver Operating Characteristic). [Figure slide.]

Slide 8: ROC Curves (ROC = Receiver Operating Characteristic).
- Sensitivity: probability that the test result is positive when the disease is present.
- Specificity: probability that the test result is negative when the disease is not present.
- Positive likelihood ratio: P(positive test | disease present) / P(positive test | disease absent).
- Negative likelihood ratio: P(negative test | disease present) / P(negative test | disease absent).
- Positive predictive value (PPV): probability that the disease is present when the test is positive.
- Negative predictive value (NPV): probability that the disease is not present when the test is negative.
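A minimal sketch computing all six quantities from a 2×2 confusion matrix; the counts are made up for illustration:

```python
# Confusion-matrix counts (illustrative numbers, not from the slides)
tp, fn = 80, 20   # disease present: test positive / test negative
fp, tn = 10, 90   # disease absent:  test positive / test negative

sensitivity = tp / (tp + fn)               # P(test+ | disease)
specificity = tn / (tn + fp)               # P(test- | no disease)
lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
ppv = tp / (tp + fp)                       # P(disease | test+)
npv = tn / (tn + fn)                       # P(no disease | test-)

print(sensitivity, specificity)  # 0.8 0.9
print(lr_pos, lr_neg)            # 8.0 0.222...
print(ppv, npv)                  # 0.888... 0.818...
```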

Slide 9: Evaluating Machine Learning Algorithms. Two regimes to consider: plenty of data vs. little data.

Slide 10: Holdout Set. Split the data: train on one part, evaluate on the held-out part → error. Often also used for parameter optimization.
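A minimal holdout-split sketch (numpy only; the 70/30 ratio and function name are illustrative choices, not from the slides):

```python
import numpy as np

def holdout_split(X, y, frac_train=0.7, seed=0):
    """Randomly split the data into a training set and a held-out
    evaluation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(frac_train * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

# X_tr, y_tr, X_te, y_te = holdout_split(X, y)
# Train on (X_tr, y_tr); evaluate on (X_te, y_te) -> error.
```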

Slide 11: Example. A hypothesis misclassifies 12 out of 40 examples in cross-validation set S. Q: What will the "true" error be on future examples? A: (answered on slide 14).

Slide 12: Finite Cross-Validation Set. Let D = all data, S = test data, m = # test samples.
- True error (true risk): e_D(h) = Pr_{x ~ D}[ h(x) ≠ f(x) ].
- Test error (empirical risk): e_S(h) = (1/m) · Σ_{x ∈ S} δ( h(x) ≠ f(x) ).

Slide 13: Confidence Intervals (see Mitchell 97). If S contains m examples drawn independently, with m ≥ 30, then with approximately 95% probability the true error e_D lies in the interval e_S ± 1.96 · sqrt( e_S · (1 − e_S) / m ).

Slide 14: Example (continued). The hypothesis misclassifies 12 out of 40 examples in cross-validation set S, so e_S = 12/40 = 0.30. Q: What will the "true" error be on future examples? A: With 95% confidence, the true error lies in the interval 0.30 ± 1.96 · sqrt( 0.30 · 0.70 / 40 ) ≈ 0.30 ± 0.14, i.e. roughly [0.16, 0.44].
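The same computation as a short sketch; the function name is illustrative, and z = 1.96 is the 95% value from slide 15's table:

```python
import math

def error_confidence_interval(errors, m, z=1.96):
    """Approximate confidence interval for the true error, given
    `errors` misclassifications out of m test examples (m >= 30)."""
    e = errors / m
    half = z * math.sqrt(e * (1 - e) / m)
    return e - half, e + half

print(error_confidence_interval(12, 40))  # roughly (0.158, 0.442)
```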

Slide 15: Confidence Intervals (see Mitchell 97). If S contains n examples drawn independently, with n ≥ 30, then with approximately N% probability the true error e_D lies in the interval e_S ± z_N · sqrt( e_S · (1 − e_S) / n ), where z_N is the two-sided standard-normal quantile:

  N%:   50%   68%   80%   90%   95%   98%   99%
  z_N:  0.67  1.00  1.28  1.64  1.96  2.33  2.58

Slide 16: Finite Cross-Validation Set.
- True error: e_D(h).
- Test error: e_S(h).
- Number of test errors: r = m · e_S(h), which is binomially distributed: P(r) = C(m, r) · e_D^r · (1 − e_D)^(m − r).
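A sketch checking the binomial claim with scipy, using e_D = 0.3 and m = 40 to match slide 17's figure:

```python
from scipy.stats import binom

m, e_D = 40, 0.3
# Probability of observing exactly r = 12 test errors out of m = 40
# when the true error is e_D = 0.3:
print(binom.pmf(12, m, e_D))                   # about 0.14
# Mean and standard deviation of the error count:
print(binom.mean(m, e_D), binom.std(m, e_D))   # 12.0, about 2.9
```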

Slide 17: Binomial Distribution. [Figure: the binomial distribution P(k) for e_D = 0.3 and m = 40.] It approximates a Normal distribution (Central Limit Theorem).

Slide 18: 95% Confidence Intervals. [Figure.]

Slide 19: Question. What's the difference between variance and confidence intervals? Basically a factor: the confidence interval is the standard deviation scaled by z_N (about 2 for 95% confidence).

Slide 20: Common Performance Plot. [Figure: testing error plotted with 95% confidence intervals.]

Slide 21: Comparing Different Hypotheses.
- True difference: d = e_D(h1) − e_D(h2).
- Test set difference: d̂ = e_S1(h1) − e_S2(h2).
- 95% confidence interval: d̂ ± 1.96 · sqrt( e_S1(h1)·(1 − e_S1(h1))/m1 + e_S2(h2)·(1 − e_S2(h2))/m2 ).
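A sketch of the difference interval, assuming (as the formula does) that the two hypotheses are tested on independent test sets S1 and S2; the example plugs in the two recognizers from Problem 1 with a made-up test-set size of 500 sentences each:

```python
import math

def difference_confidence_interval(e1, m1, e2, m2, z=1.96):
    """Approximate CI for the true difference in error between two
    hypotheses with test errors e1 (on m1 examples) and e2 (on m2)."""
    d = e1 - e2
    half = z * math.sqrt(e1 * (1 - e1) / m1 + e2 * (1 - e2) / m2)
    return d - half, d + half

# 68% vs. 60% word accuracy = error rates 0.32 vs. 0.40:
print(difference_confidence_interval(0.32, 500, 0.40, 500))
# roughly (-0.14, -0.02): the interval excludes 0, so the improvement
# would be statistically significant at this (made-up) test-set size.
```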

Slide 22: Evaluating Machine Learning Algorithms (recap). Regimes: plenty of data vs. little data.

Slide 23: Holdout Set (recap). Train on one part of the data, evaluate on the held-out part → error.

Slide 24: k-fold Cross-Validation. Perform a k-way split of the data. For each fold i = 1..k, train on the other k − 1 folds and evaluate on fold i → error_i. Overall: error = Σ_i error_i / k.
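A minimal k-fold sketch; the `train` and `evaluate` callables are placeholders for whatever learner is being assessed (k = 8 matches the eight folds pictured on the slide):

```python
import numpy as np

def k_fold_error(X, y, train, evaluate, k=8, seed=0):
    """k-way split: train on k - 1 folds, evaluate on the remaining
    fold -> error_i; return the average of the k per-fold errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    errors = []
    for i in range(k):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = train(X[train_idx], y[train_idx])
        errors.append(evaluate(model, X[folds[i]], y[folds[i]]))
    return np.mean(errors)
```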

Slide 25: The Jackknife. The extreme case of k-fold cross-validation with k = number of examples: hold out one example at a time (leave-one-out).

Slide 26: The Bootstrap. Draw a bootstrap sample of the data (sampling with replacement), train on it, and evaluate on the remaining examples → error. Repeat and average.
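A bootstrap sketch with the same placeholder `train`/`evaluate` interface as above. Evaluating on the examples left out of the resample is an assumption on my part, since the slide's figure is lost:

```python
import numpy as np

def bootstrap_error(X, y, train, evaluate, repeats=100, seed=0):
    """Resample the data with replacement, train on the resample,
    evaluate on the examples never drawn; repeat and average."""
    rng = np.random.default_rng(seed)
    n, errors = len(X), []
    for _ in range(repeats):
        sample = rng.integers(0, n, size=n)        # n draws with replacement
        out = np.setdiff1d(np.arange(n), sample)   # out-of-sample examples
        model = train(X[sample], y[sample])
        errors.append(evaluate(model, X[out], y[out]))
    return np.mean(errors)
```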

Slide 27: What's the Problem? Confidence intervals assume independence, but our individual error estimates are dependent (the training sets overlap).

Slide 28: Comparing Different Hypotheses: Paired t Test.
- True difference: d = e_D(h1) − e_D(h2).
- For each partition k, compute the test-set difference d̂_k = test error of h1 on partition k − test error of h2 on partition k.
- Average: d̂ = (1/K) Σ_k d̂_k, where K is the number of partitions, so K − 1 is the degrees of freedom.
- N% confidence interval (N is the confidence level): d̂ ± t_{N, K−1} · s_d̂, with s_d̂ = sqrt( (1/(K(K−1))) · Σ_k (d̂_k − d̂)² ). Values of t_{N, ν}:

  ν (dof)   90%    95%    98%    99%
  2         2.92   4.30   6.96   9.92
  5         2.02   2.57   3.36   4.03
  10        1.81   2.23   2.76   3.17
  20        1.72   2.09   2.53   2.84
  30        1.70   2.04   2.46   2.75
  120       1.66   1.98   2.36   2.62
  ∞         1.64   1.96   2.33   2.58
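A sketch of the paired-test interval; scipy's `t.ppf` reproduces the table above (e.g. t.ppf(0.975, 10) ≈ 2.23):

```python
import numpy as np
from scipy.stats import t

def paired_difference_interval(errors_h1, errors_h2, confidence=0.95):
    """Confidence interval for the true error difference between two
    hypotheses, from their test errors on the same K partitions."""
    d = np.asarray(errors_h1) - np.asarray(errors_h2)
    K = len(d)
    d_bar = d.mean()
    s = np.sqrt(np.sum((d - d_bar) ** 2) / (K * (K - 1)))  # std. error of d_bar
    t_val = t.ppf((1 + confidence) / 2, df=K - 1)          # two-sided quantile
    return d_bar - t_val * s, d_bar + t_val * s
```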

Slide 29: Evaluating Machine Learning Algorithms. Regimes: plenty of data, little data, unlimited data.

Slide 30: Asymptotic Prediction. Useful for very large data sets.
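The summary calls asymptotic prediction "exponential". One plausible reading, an assumption on my part since the slide's figure is lost, is fitting an exponentially decaying learning curve e(m) = a + b·exp(−c·m) to measured test errors and reading off the asymptote a:

```python
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(m, a, b, c):
    """Test error as a function of training-set size m:
    decays exponentially toward the asymptote a."""
    return a + b * np.exp(-c * m)

# Illustrative (made-up) measurements: error at increasing training-set sizes.
sizes  = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
errors = np.array([0.35, 0.28, 0.22, 0.18, 0.165, 0.16])

(a, b, c), _ = curve_fit(learning_curve, sizes, errors, p0=(0.1, 0.3, 0.001))
print(f"predicted asymptotic error: {a:.3f}")
```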

Slide 31: Summary.
- Know your loss function!
- Finite testing data: report confidence intervals.
- Scarce data: repartition the training/testing set (cross-validation, jackknife, bootstrap).
- Asymptotic prediction: exponential.
- Put thought into your evaluation, and be critical. Convince yourself!

