CS 8751 ML & KDD: Evaluating Hypotheses

Slide 1: Overview
- Sample error, true error
- Confidence intervals for observed hypothesis error
- Estimators
- Binomial distribution, Normal distribution, Central Limit Theorem
- Paired t-tests
- Comparing learning methods

Slide 2: Problems Estimating Error
1. Bias: if S is the training set, error_S(h) is optimistically biased (bias = E[error_S(h)] - error_D(h)). For an unbiased estimate, h and S must be chosen independently.
2. Variance: even with an unbiased S, error_S(h) may still vary from error_D(h).

Slide 3: Two Definitions of Error
The true error of hypothesis h with respect to target function f and distribution D is the probability that h misclassifies an instance drawn at random according to D.
The sample error of h with respect to target function f and data sample S is the proportion of examples in S that h misclassifies.
How well does error_S(h) estimate error_D(h)?
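In symbols (Mitchell, Ch. 5), with n = |S| and $\delta(\cdot)$ equal to 1 when its argument is true and 0 otherwise:

$$\text{error}_{\mathcal{D}}(h) \equiv \Pr_{x \sim \mathcal{D}}[f(x) \neq h(x)]
\qquad\qquad
\text{error}_S(h) \equiv \frac{1}{n} \sum_{x \in S} \delta(f(x) \neq h(x))$$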

Slide 4: Example
Hypothesis h misclassifies 12 of the 40 examples in S, so error_S(h) = 12/40 = 0.30. What is error_D(h)?

Slide 5: Estimators
Experiment:
1. Choose a sample S of size n according to distribution D.
2. Measure error_S(h).
error_S(h) is a random variable (i.e., the result of an experiment), and it is an unbiased estimator of error_D(h).
Given an observed error_S(h), what can we conclude about error_D(h)?

Slide 6: Confidence Intervals
If S contains n examples (n >= 30), drawn independently of h and of each other, then with approximately N% probability, error_D(h) lies in the interval

$$\text{error}_S(h) \pm z_N \sqrt{\frac{\text{error}_S(h)\,(1-\text{error}_S(h))}{n}}$$

Slide 7: Confidence Intervals (95%)
If S contains n examples (n >= 30), drawn independently of h and of each other, then with approximately 95% probability, error_D(h) lies in the interval

$$\text{error}_S(h) \pm 1.96 \sqrt{\frac{\text{error}_S(h)\,(1-\text{error}_S(h))}{n}}$$
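Plugging in the slide 4 example (12 of 40 examples misclassified):

$$0.30 \pm 1.96 \sqrt{\frac{0.30 \times 0.70}{40}} \approx 0.30 \pm 0.14$$

so with roughly 95% confidence, error_D(h) lies between 0.16 and 0.44.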

Slide 8: error_S(h) is a Random Variable
Rerun the experiment with a different randomly drawn S (of size n). If error_D(h) = p, the probability of observing r misclassified examples is binomial:

$$P(r) = \frac{n!}{r!\,(n-r)!}\; p^r (1-p)^{n-r}$$

Slide 9: Binomial Probability Distribution
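For reference, if X counts the misclassifications among n independent draws, X ~ Binomial(n, p), then:

$$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}, \qquad E[X] = np, \qquad \mathrm{Var}(X) = np(1-p), \qquad \sigma_X = \sqrt{np(1-p)}$$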

Slide 10: Normal Probability Distribution
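For reference, the Normal density with mean $\mu$ and variance $\sigma^2$ is:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$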

Slide 11: Normal Distribution Approximates Binomial
error_S(h) follows a Binomial distribution with mean error_D(h) and standard deviation sqrt(error_D(h)(1 - error_D(h))/n); for sufficiently large n, this is closely approximated by a Normal distribution with the same mean and standard deviation.

Slide 12: Normal Probability Distribution (two-sided bounds)
80% of the area under a Normal density lies within mu +/- 1.28 sigma; more generally, N% of the area lies within mu +/- z_N sigma.
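The standard two-sided z_N values (Mitchell, Table 5.1):

N%:   50%   68%   80%   90%   95%   98%   99%
z_N:  0.67  1.00  1.28  1.64  1.96  2.33  2.58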

Slide 13: Confidence Intervals, More Correctly
If S contains n examples (n >= 30), drawn independently of h and of each other, then with approximately 95% probability, error_S(h) lies in the interval

$$\text{error}_{\mathcal{D}}(h) \pm 1.96 \sqrt{\frac{\text{error}_{\mathcal{D}}(h)\,(1-\text{error}_{\mathcal{D}}(h))}{n}}$$

equivalently, error_D(h) lies in the interval

$$\text{error}_S(h) \pm 1.96 \sqrt{\frac{\text{error}_{\mathcal{D}}(h)\,(1-\text{error}_{\mathcal{D}}(h))}{n}}$$

which is approximately

$$\text{error}_S(h) \pm 1.96 \sqrt{\frac{\text{error}_S(h)\,(1-\text{error}_S(h))}{n}}$$

Slide 14: Calculating Confidence Intervals
1. Pick the parameter p to be estimated: error_D(h).
2. Choose an estimator: error_S(h).
3. Determine the probability distribution that governs the estimator: error_S(h) is governed by a Binomial distribution, approximated by a Normal distribution when n * error_S(h) * (1 - error_S(h)) >= 5.
4. Find the interval (L, U) such that N% of the probability mass falls within it, using a table of z_N values. A sketch of the whole procedure follows.
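A minimal Python sketch of these four steps, for r observed misclassifications out of n test examples (the function name is illustrative):

```python
import math

# Two-sided z_N values for common confidence levels (Mitchell, Table 5.1)
Z = {0.80: 1.28, 0.90: 1.64, 0.95: 1.96, 0.98: 2.33, 0.99: 2.58}

def error_confidence_interval(r, n, confidence=0.95):
    """Approximate N% confidence interval for error_D(h), given r
    misclassifications observed on a test sample of size n."""
    err = r / n                                  # step 2: the estimator error_S(h)
    if n * err * (1 - err) < 5:                  # step 3: Normal approximation check
        raise ValueError("Normal approximation to the Binomial is unreliable here")
    half_width = Z[confidence] * math.sqrt(err * (1 - err) / n)
    return err - half_width, err + half_width    # step 4: the interval (L, U)

# Slide 4's example: 12 of 40 examples misclassified
print(error_confidence_interval(12, 40))         # approximately (0.16, 0.44)
```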

Slide 15: Central Limit Theorem
Given independent, identically distributed random variables Y_1, ..., Y_n with mean mu and finite variance sigma^2, as n grows the distribution of the sample mean (1/n) * sum(Y_i) approaches a Normal distribution with mean mu and variance sigma^2 / n. This is why the Normal approximation above is reasonable for large n.

Slide 16: Difference Between Hypotheses
To estimate the difference d = error_D(h1) - error_D(h2), test h1 on sample S1 (size n1) and h2 on sample S2 (size n2), and use the estimator d_hat = error_S1(h1) - error_S2(h2). The approximate N% confidence interval for d is

$$\hat{d} \pm z_N \sqrt{\frac{\text{error}_{S_1}(h_1)\,(1-\text{error}_{S_1}(h_1))}{n_1} + \frac{\text{error}_{S_2}(h_2)\,(1-\text{error}_{S_2}(h_2))}{n_2}}$$

Slide 17: Paired t Test to Compare h_A, h_B
1. Partition the data into k disjoint test sets T_1, ..., T_k of equal size (ideally at least 30 examples each).
2. For each T_i, compute the difference delta_i = error_{T_i}(h_A) - error_{T_i}(h_B).
3. Return the mean difference; the approximate N% confidence interval for the true difference is

$$\bar{\delta} \pm t_{N,k-1}\, s_{\bar{\delta}}, \qquad s_{\bar{\delta}} = \sqrt{\frac{1}{k(k-1)} \sum_{i=1}^{k} (\delta_i - \bar{\delta})^2}$$
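A minimal sketch of the paired comparison in Python with SciPy (the per-set error rates below are made up for illustration):

```python
import numpy as np
from scipy import stats

# Test-set error rates for hypotheses h_A and h_B, paired by test set T_i
errors_a = np.array([0.22, 0.25, 0.19, 0.24, 0.21])  # hypothetical values
errors_b = np.array([0.26, 0.28, 0.24, 0.25, 0.27])  # hypothetical values

deltas = errors_a - errors_b                            # the differences delta_i
t_stat, p_value = stats.ttest_rel(errors_a, errors_b)   # paired t test
print(f"mean difference = {deltas.mean():.3f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```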

Slide 18: N-Fold Cross Validation
A popular testing methodology:
- Divide the data into N even-sized random folds.
- For n = 1 to N:
  - Training set = all folds except fold n.
  - Test set = fold n.
  - Train a learner on the training set.
  - Count the number of errors on the test set.
- Accumulate the errors across the N test sets and divide by the total number of examples; the result is an error rate.
For comparing algorithms, use the same set of folds to train and test each learner, so the results are paired (see the sketch below).
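A minimal scikit-learn sketch of paired N-fold cross validation; the data set and the two learners are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
folds = KFold(n_splits=10, shuffle=True, random_state=0)

errors_tree, errors_nb = [], []
for train_idx, test_idx in folds.split(X):      # the same folds for both learners
    for model, errors in ((DecisionTreeClassifier(random_state=0), errors_tree),
                          (GaussianNB(), errors_nb)):
        model.fit(X[train_idx], y[train_idx])
        errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))

# errors_tree[i] and errors_nb[i] come from the same fold, so they are paired
print(f"tree: {np.mean(errors_tree):.3f}   naive Bayes: {np.mean(errors_nb):.3f}")
```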

Slide 19: N-Fold Cross Validation (continued)
Advantages/disadvantages:
- Gives an error estimate from within a single data set.
- Every point is used exactly once as a test point.
- At the extreme (N = the size of the data set), this is called leave-one-out testing.
- Results are affected by the random choice of folds. This is sometimes addressed by repeating the process with multiple random fold assignments, though Dietterich (1998) expressed significant reservations about the statistical validity of such resampled tests.

Slide 20: Results Analysis: Confusion Matrix
For many problems (especially multiclass problems), it is often useful to examine the sources of error with a confusion matrix, as in the example below.
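A small hypothetical three-class example (rows are actual classes, columns are predicted classes; the counts are made up):

            Predicted A   Predicted B   Predicted C
Actual A        30             2             3
Actual B         4            25             1
Actual C         0             5            30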

Slide 21: Results Analysis: Confusion Matrix (continued)
Building a confusion matrix:
- Zero all entries.
- For each data point, add one to the entry in the row of its actual class and the column of its predicted class.
A perfect predictor has all of its counts on the diagonal; the off-diagonal entries often tell us what is being mis-predicted.
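A minimal sketch of that procedure, assuming integer class labels 0 .. num_classes - 1:

```python
import numpy as np

def confusion_matrix(actual, predicted, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)  # zero all entries
    for a, p in zip(actual, predicted):
        matrix[a, p] += 1      # row = actual class, column = predicted class
    return matrix

print(confusion_matrix(actual=[0, 0, 1, 2, 2], predicted=[0, 1, 1, 2, 0],
                       num_classes=3))
```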

Slide 22: Receiver Operating Characteristic (ROC) Curves
Originally from signal detection; now very popular in ML. Used for:
- Two-class problems
- Settings where predictions are ordered in some way (e.g., a neural network's activation is often taken as an indication of how strong or weak a prediction is)
Plotting an ROC curve (see the sketch below):
- Sort the predictions by their predicted strength, strongest first.
- Start at the bottom left.
- For each positive example, go up 1/P units, where P is the number of positive examples.
- For each negative example, go right 1/N units, where N is the number of negative examples.
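A minimal sketch of this plotting procedure (the scores and labels at the bottom are hypothetical):

```python
import numpy as np

def roc_points(scores, labels):
    """Trace the ROC curve: walk the examples from strongest to weakest
    prediction, going up 1/P for a positive and right 1/N for a negative."""
    order = np.argsort(scores)[::-1]            # strongest predictions first
    labels = np.asarray(labels)[order]
    num_pos = labels.sum()
    num_neg = len(labels) - num_pos
    fpr, tpr, points = 0.0, 0.0, [(0.0, 0.0)]   # start at the bottom left
    for label in labels:
        if label == 1:
            tpr += 1.0 / num_pos                # up 1/P
        else:
            fpr += 1.0 / num_neg                # right 1/N
        points.append((fpr, tpr))
    return points              # (false positive rate, true positive rate) pairs

print(roc_points(scores=[0.9, 0.8, 0.7, 0.6, 0.4], labels=[1, 1, 0, 1, 0]))
```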

Slide 23: ROC Curve
[Figure: example ROC curve; x-axis: False Positives (%), y-axis: True Positives (%)]

Slide 24: ROC Properties
- Visualizes the trade-off between coverage and accuracy: as we lower the prediction threshold, how many more true positives do we get in exchange for more false positives?
- Gives a better feel when comparing algorithms, since algorithms may do well in different portions of the curve.
- A perfect curve starts at the bottom left, goes straight up to the top left, then across to the top right; a random-prediction curve is the diagonal from the bottom left to the top right.
- When comparing curves, check whether one curve dominates the other (is at least as good everywhere), or compare the area under the curve (AUC), which is very popular; some people even run t-tests on these numbers. A sketch of the AUC computation follows.
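A minimal AUC sketch using the trapezoidal rule over the points produced by the roc_points helper above (both helpers are illustrative, not a standard API):

```python
import numpy as np

def auc(points):
    """Area under an ROC curve given (false positive, true positive) pairs."""
    fpr = np.array([p[0] for p in points])
    tpr = np.array([p[1] for p in points])
    return float(np.trapz(tpr, fpr))     # trapezoidal rule

print(auc(roc_points(scores=[0.9, 0.8, 0.7, 0.6, 0.4], labels=[1, 1, 0, 1, 0])))
```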