Evaluating Classifiers Lecture 2 Instructor: Max Welling

Evaluation Given: Hypothesis h(x): X  C, in hypothesis space H, mapping attributes x to classes c=[1,2,3,...C] A data-sample S(n) of size n. Questions: What is the error of “h” on unseen data? If we have two competing hypotheses, which one is better on unseen data? How do we compare two learning algorithms in the face of limited data? How certain are we about our answers?

Sample and True Error

We can define two errors:

1) Error(h|S) is the error on the sample S:

   error(h|S) = (1/n) Σ_{x in S} δ( h(x) ≠ f(x) )

2) Error(h|P) is the true error on the unseen data sampled from the distribution P(x):

   error(h|P) = E_{x ~ P(x)} [ δ( h(x) ≠ f(x) ) ]

where f(x) is the true hypothesis and δ(·) is 1 when its argument is true and 0 otherwise.
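A minimal Python sketch of the sample error, counting the fraction of examples in S that h misclassifies; the hypothesis and the toy sample below are made-up stand-ins, not part of the lecture.

import numpy as np

def sample_error(h, X, y_true):
    # error(h|S): fraction of examples in the sample that h misclassifies.
    y_pred = np.array([h(x) for x in X])
    return float(np.mean(y_pred != np.asarray(y_true)))

# Toy usage (hypothetical hypothesis and sample):
h = lambda x: 1 if x[0] > 0.5 else 0
X = [[0.2], [0.7], [0.9], [0.4]]
y = [0, 1, 0, 0]
print(sample_error(h, X, y))   # 0.25 on this sample; error(h|P) still has to be estimated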

Flow of Thought

1) Determine the property you want to know about the future data (e.g. error(h|P)).
2) Find an unbiased estimator E for this quantity based on observing data X (e.g. error(h|X)).
3) Determine the distribution P(error) under the assumption that you have infinitely many sample sets X1, X2, ... of some size n.
4) Estimate the parameters of P(error) from an actual data sample S.
5) Compute the mean and variance of P(error) and pray that P(error) is close to a Normal distribution (sums of random variables converge to normal distributions: the central limit theorem).
6) State your confidence interval as: with confidence N%, error(h|P) is contained in the interval

   error(h|S) ± z_N sqrt( error(h|S) (1 - error(h|S)) / n )

where z_N is the two-sided N% critical value of the standard Normal distribution (e.g. z_95 ≈ 1.96).
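A minimal sketch of the last step, assuming the Normal approximation holds; SciPy is used only to look up the critical value z_N, and the sample error and test-set size in the usage line are made-up numbers.

import math
from scipy.stats import norm

def error_confidence_interval(err_s, n, confidence=0.95):
    # Normal-approximation interval for error(h|P), given sample error err_s on n test points.
    z = norm.ppf(0.5 + confidence / 2.0)            # e.g. 1.96 for 95%
    half_width = z * math.sqrt(err_s * (1.0 - err_s) / n)
    return err_s - half_width, err_s + half_width

# Made-up example: 30 mistakes on 100 held-out examples.
print(error_confidence_interval(0.30, 100))         # roughly (0.21, 0.39)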

Assumptions

- We only consider discrete-valued hypotheses (i.e. classes).
- Training data and test data are drawn IID from the same distribution P(x). (IID: independently and identically distributed.)
- The hypothesis must be chosen independently of the data sample S!
- When you obtain a hypothesis from a learning algorithm, split the data into a training set and a testing set. Find the hypothesis using the training set and estimate its error on the testing set (see the sketch below).
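A minimal scikit-learn sketch of that last point; the dataset and the classifier are arbitrary stand-ins, not prescribed by the lecture.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)          # stand-in dataset
# Hold out a test set before fitting, so the hypothesis is chosen independently of it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

h = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)   # stand-in learner
test_error = 1.0 - h.score(X_test, y_test)          # estimate of error(h|P) on unseen data
print(test_error)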

Paired Tests

Consider the following data:

error(h1|s1) = 0.10    error(h2|s1) = 0.11
error(h1|s2) = 0.20    error(h2|s2) = 0.21
error(h1|s3) = 0.66    error(h2|s3) = 0.67
error(h1|s4) = 0.45    error(h2|s4) = 0.46
and so on.

We have var(error(h1)) = large and var(error(h2)) = large, and the total variance of error(h1) - error(h2) is their sum. However, h1 is consistently better than h2: we ignored the fact that we compare on the same data. We want a different estimator that compares the samples one by one. You can use a "paired t-test" (e.g. in matlab) to see if the two errors are significantly different, or if one error is significantly larger than the other.

Paired t-test

On each subset T1, ..., Tk with |Ti| > 30, compute:

   d_i = error(h1|Ti) - error(h2|Ti)

Now compute:

   d_bar = (1/k) Σ_i d_i
   s_d   = sqrt( (1 / (k(k-1))) Σ_i (d_i - d_bar)² )

State: with N% confidence the difference in error between h1 and h2 is:

   d_bar ± t_{N,k-1} s_d

"t" is the t-statistic; its critical values t_{N,k-1} come from the Student-t distribution with k-1 degrees of freedom.
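The same computation in Python, on hypothetical per-subset errors (not the data from the previous slide): the interval is built from d_bar and s_d as above, and scipy.stats.ttest_rel runs the equivalent built-in paired t-test.

import numpy as np
from scipy import stats

# Hypothetical per-subset test errors of the two hypotheses on the same subsets Ti.
err_h1 = np.array([0.10, 0.20, 0.66, 0.45, 0.31])
err_h2 = np.array([0.11, 0.21, 0.67, 0.47, 0.32])
d = err_h1 - err_h2                                  # d_i = error(h1|Ti) - error(h2|Ti)
k = len(d)

d_bar = d.mean()
s_d = np.sqrt(np.sum((d - d_bar) ** 2) / (k * (k - 1)))   # standard error of d_bar
t_crit = stats.t.ppf(0.975, df=k - 1)                      # two-sided 95% critical value
print("95% CI for error(h1) - error(h2):", (d_bar - t_crit * s_d, d_bar + t_crit * s_d))

# The equivalent built-in paired t-test; a small p-value means the errors differ significantly.
t_stat, p_value = stats.ttest_rel(err_h1, err_h2)
print(t_stat, p_value)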

Comparing Learning Algorithms

In general it is a really bad idea to estimate error rates on the same data on which a learning algorithm is trained. So, just as in cross-validation, we split the data into k subsets: S → {T1, T2, ..., Tk}. Train both learning algorithm 1 (L1) and learning algorithm 2 (L2) on the complement of each subset, {S-T1, S-T2, ...}, to produce hypotheses {L1(S-Ti), L2(S-Ti)} for all i.

Compute for all i:

   d_i = error( L1(S-Ti) | Ti ) - error( L2(S-Ti) | Ti )

Note: we train on S-Ti, but test on Ti. As on the previous slide, perform a paired t-test on these differences to compute an estimate and a confidence interval for the relative error of the hypotheses produced by L1 and L2.
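A sketch of the whole procedure, assuming scikit-learn and two arbitrary stand-in learners (a decision tree and naive Bayes; the lecture does not prescribe these): train both on S-Ti, test both on Ti, and run a paired t-test on the per-fold differences.

import numpy as np
from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)           # stand-in dataset
kf = KFold(n_splits=10, shuffle=True, random_state=0)

err_L1, err_L2 = [], []
for train_idx, test_idx in kf.split(X):
    # Train both learners on S - Ti ...
    h1 = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    h2 = GaussianNB().fit(X[train_idx], y[train_idx])
    # ... but test both on Ti.
    err_L1.append(1.0 - h1.score(X[test_idx], y[test_idx]))
    err_L2.append(1.0 - h2.score(X[test_idx], y[test_idx]))

deltas = np.array(err_L1) - np.array(err_L2)
t_stat, p_value = stats.ttest_rel(err_L1, err_L2)     # paired t-test over the k folds
print("mean difference:", deltas.mean(), " t:", t_stat, " p:", p_value)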

Conclusion

Never (ever) draw error curves without confidence intervals.
(The second most important sentence of this course.)