Computer Vision, Lecture 8: Performance Evaluation
Oleh Tretiak © 2005

Slide 1: This Lecture

Estimates of random variables and confidence intervals
– Probability estimation
– Parameter estimation
Receiver operating characteristic
– Theory
– Experiment
Training and testing
– Bias due to re-testing
– Leave-one-out

Slide 2: Experiment

We develop a vision system that detects whether it is safe to cross the street. We test it 100 times, and it works every time. What can we say about the probability of success?

Slide 3: Repeated Tosses of a Fair Coin

We toss a coin 10 times.
– How many heads?
– Numerical experiment: the function RAND() in Excel produces random numbers uniformly distributed from 0 to 1, so the expression IF(RAND()>0.5, 1, 0) produces 0 and 1 each 50% of the time.
Number of 1's in five runs of 10 trials: 7, 6, 6, 6, 7.
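This numerical experiment is easy to reproduce outside Excel. Below is a minimal Python sketch (the names and library choices are ours, not the lecture's) that repeats the 10-toss count five times:

```python
import numpy as np

rng = np.random.default_rng()  # uniform random source, analogous to Excel's RAND()

# Count the 1's in 10 fair "tosses", repeated five times,
# mirroring the expression IF(RAND() > 0.5, 1, 0).
for _ in range(5):
    tosses = (rng.random(10) > 0.5).astype(int)
    print(tosses.sum(), "heads out of 10")
```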

Slide 4: Theory

The number of 1's in n trials is given by the binomial distribution

P(k, n, p) = C(k, n) p^k (1 - p)^(n - k),

where p is the probability of a 1 and C(k, n) = n!/(k!(n-k)!) is the binomial coefficient. For example, P(5, 10, 0.5) = 0.246 and P(6, 10, 0.5) = 0.205.
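The formula is easy to check numerically; here is a short sketch using only the Python standard library:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials, P(k, n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(5, 10, 0.5))  # 0.2461...
print(binom_pmf(6, 10, 0.5))  # 0.2051...
```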

Slide 5

From the 100-trial experiment, we can say with 95% confidence that the probability of failure is between 0 and 0.07; with 99% confidence, the interval is somewhat wider.
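One common way to obtain such a bound (a sketch of one possible method; the slide's exact figures may come from a different calculation) is to find the largest failure probability p that is still consistent with observing zero failures in n trials, i.e. solve (1 - p)^n = 1 - confidence for p:

```python
def zero_failure_upper_bound(n, confidence):
    """Largest failure probability p consistent, at the given confidence
    level, with observing 0 failures in n trials: (1 - p)**n = 1 - confidence."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)

print(zero_failure_upper_bound(100, 0.95))  # about 0.030
print(zero_failure_upper_bound(100, 0.99))  # about 0.045
```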

Slide 6: Parameter Estimation

We perform an experiment with a binary outcome n times and obtain k successes. What can we say about the probability of success? The expected number of successes is np, and the standard deviation of the number of successes is [np(1-p)]^(1/2). The usual estimate of the probability is p̂ = k/n. What range of values of p could have produced this result?

Slide 7: Cumulative Probability Plot

P(35) = 0.027 and P(54) = 0.977, so the interval [0.35, 0.54] has a probability of 0.95. We call this the 95% confidence interval for estimating the probability of success.
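The cumulative probabilities behind such a plot can be computed directly from the binomial formula. A sketch follows; the slide's underlying n and p are not stated in the transcript, so we assume n = 100 trials and p = 0.45, which gives values close to those quoted:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 100, 0.45  # assumed values, chosen to approximate the slide's plot
print(binom_cdf(35, n, p))                        # about 0.027
print(binom_cdf(54, n, p))                        # about 0.97
print(binom_cdf(54, n, p) - binom_cdf(35, n, p))  # about 0.94, near the slide's 0.95
```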

Slide 8: Experimental Design

We need to perform an experiment to test the reliability of a collection of parts. The test is destructive, so we want to test as few parts as possible. The test consists of examining n parts and rejecting the collection if k fail. The test is designed using the binomial formula. The design is complicated because:
– We need to specify the type of test to use, what constitutes a success or failure, and how many bad parts we are willing to accept.
– We need to specify how to select the parts to be tested.
– There are two types of errors, and we need to decide acceptable levels for both.

Slide 9: Two-Category Test

We perform a test on a patient and obtain a measurement x. There are two possibilities: the patient is healthy or sick. The density functions for the two possibilities are shown on the slide. We choose a threshold t and decide that the patient is sick if the value x is higher than t. The two resulting probabilities are called P(False Positive) and P(True Positive).
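As an illustration, here is a sketch with two assumed Gaussian densities (the means, variance, and threshold below are arbitrary choices for illustration, not values from the lecture):

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma**2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Hypothetical model: healthy measurements ~ N(0, 1), sick ~ N(2, 1).
mu_healthy, mu_sick, sigma = 0.0, 2.0, 1.0
t = 1.0  # decision threshold: call the patient sick when x > t

p_fp = 1.0 - normal_cdf(t, mu_healthy, sigma)  # healthy patient called sick
p_tp = 1.0 - normal_cdf(t, mu_sick, sigma)     # sick patient correctly flagged
print(f"P(FP) = {p_fp:.3f}, P(TP) = {p_tp:.3f}")  # 0.159, 0.841
```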

Slide 10: ROC Plot

As the threshold is varied, both P(FP) and P(TP) change. The resulting curve is called the Receiver Operating Characteristic (ROC).

Slide 11: Evaluating the ROC

The ROC lies in the unit square. Ideally, the curve should rise vertically from (0, 0) to (0, 1), and then run horizontally to (1, 1). Rather than evaluating one combination of the two probabilities, it is desirable to measure the whole curve. A single measure of quality is given by A_z, the area under the operating curve.
– On the right of the slide, the red curve has A_z = 1 and the blue curve has A_z = 0.89. (A numerical sketch follows.)
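Continuing the hypothetical two-Gaussian model from the previous sketch, the ROC curve can be traced by sweeping the threshold, and A_z approximated with the trapezoid rule:

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Sweep the threshold over a wide range and record (P(FP), P(TP)) pairs.
thresholds = np.linspace(-5.0, 7.0, 1001)
p_fp = np.array([1.0 - normal_cdf(t, 0.0, 1.0) for t in thresholds])
p_tp = np.array([1.0 - normal_cdf(t, 2.0, 1.0) for t in thresholds])

# Sort by P(FP) so the curve runs left to right, then integrate (trapezoid rule).
order = np.argsort(p_fp)
fp, tp = p_fp[order], p_tp[order]
a_z = np.sum(0.5 * (tp[1:] + tp[:-1]) * np.diff(fp))
print(f"A_z = {a_z:.3f}")  # about 0.92 for this separation of the means
```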

Slide 12: Experimental Results

In any experiment with actual data, the performance measure, such as the error probability or A_z, should be treated as a random variable. The evaluation results should therefore consist not only of the performance estimate but also of a confidence interval.

Slide 13: Performance of Classifiers

In the last lecture, we considered classifiers such as neural networks. These contain a number of parameters whose values must be chosen to obtain correct operation.
– For example, if the input to the classifier is one-dimensional and the classifier uses a threshold to discriminate between two categories, we need to find the threshold from the training data.

Slide 14: Training and Evaluation

The quality of performance depends on the accuracy with which we estimate the parameters. Suppose we are measuring the error rate. We represent the error rate obtainable with the best values of the classifier parameters by P_min.
– To measure this probability we would need a very large collection of samples, so that the confidence interval of the estimate is very small.
– If we do not use the right values of the parameters, then the error probability will be greater than P_min.

Slide 15: Training Error

We find the classifier parameters by examining training samples and adjusting the classifier to correctly identify these data.
– With a finite sample, the parameter values we get are random variables. Since they will differ from the best values, the performance with these parameters will be worse than with the best parameters. To avoid this we need a large training set.
– If we test the classifier on the same sample that was used for training, the performance may seem better than the true performance. This is an optimistic bias, and it can be very large, as the sketch below demonstrates.
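The size of this bias can be shown with a small numerical sketch (synthetic one-dimensional data and a simple threshold classifier; all of the specifics here are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def error_rate(threshold, x, labels):
    """Classify as class 1 when x > threshold; return the fraction misclassified."""
    return np.mean((x > threshold).astype(int) != labels)

def train_threshold(x, labels):
    """Pick the threshold (among the data values) with the lowest training error."""
    candidates = np.sort(x)
    errors = [error_rate(t, x, labels) for t in candidates]
    return candidates[int(np.argmin(errors))]

def sample(n):
    """Two overlapping classes: class 0 ~ N(0, 1), class 1 ~ N(1, 1)."""
    labels = rng.integers(0, 2, n)
    return rng.normal(labels.astype(float), 1.0), labels

x_train, y_train = sample(20)     # small training set
x_test, y_test = sample(100_000)  # large independent set, close to the true error

t = train_threshold(x_train, y_train)
print("error on the training data:", error_rate(t, x_train, y_train))  # optimistic
print("error on independent data: ", error_rate(t, x_test, y_test))    # typically larger
```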

Slide 16: Independent Testing

To avoid the bias of testing on the same data used for training, it is necessary to divide the available data into two groups: a training set and a test set. One set is used for training and the other for testing. If the total number of available data items is N, then one can use N_1 items for training and N_2 items for testing, with N = N_1 + N_2.

Slide 17: Testing Dilemma

N = N_1 + N_2:
– To obtain accurate parameter values, N_1 should be large.
– To obtain accurate performance estimates, N_2 should be large.

Slide 18: Leave-One-Out

One way of ameliorating this problem is to use the leave-one-out design (sketched in code below):
– Exclude one data item and train the system on the remaining N-1 items.
– Test the system on the excluded item.
– Repeat for each item in the set.
This produces training sets with N_1 = N - 1 elements, and each test item is independent of the set used to train for it.
– The tests are not independent of one another, however, so confidence interval estimation is complicated.
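Here is a sketch of the leave-one-out procedure itself, again with a hypothetical one-dimensional threshold classifier:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_threshold(x, labels):
    """Pick the threshold (among the data values) with the lowest training error."""
    candidates = np.sort(x)
    errors = [np.mean((x > t).astype(int) != labels) for t in candidates]
    return candidates[int(np.argmin(errors))]

# Synthetic two-class data: class 0 ~ N(0, 1), class 1 ~ N(1, 1).
n = 30
y = rng.integers(0, 2, n)
x = rng.normal(y.astype(float), 1.0)

mistakes = 0
for i in range(n):
    keep = np.arange(n) != i               # leave item i out
    t = train_threshold(x[keep], y[keep])  # train on the other N-1 items
    pred = int(x[i] > t)                   # test on the held-out item
    mistakes += int(pred != y[i])

print("leave-one-out error estimate:", mistakes / n)
```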

Slide 19: Summary

Complex classifiers can be designed to correctly identify large data sets. However, they may not perform well on data they have not encountered.
– This is called the generalization problem.
To obtain a valid evaluation of performance, one must use independent training and test data. The appropriate complexity of a classifier depends not only on the problem (the data distribution) but also on the size of the training set.
– A classifier with very many parameters may perform poorly when trained with a small set, because the parameters are not estimated accurately.