1 SYSTEM EVALUATION  The goal is to estimate the error probability of the designed classification system.  Error Counting Technique  Let ω_1, …, ω_M be the M classes.  Let N_i be the number of data points in class ω_i used for testing, and N the total number of test points.  Let P_i be the probability of error for class ω_i.  The classifier is assumed to have been designed using another, independent data set.  Assuming that the feature vectors in the test data set are independent, the probability of k_i vectors from ω_i being in error follows the binomial distribution: P(k_i) = (N_i choose k_i) P_i^k_i (1 − P_i)^(N_i − k_i)
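Under the independence assumption, the binomial error-count probability described above can be sketched in Python (the function name is illustrative, not from the slides):

```python
from math import comb

def prob_k_errors(n_i: int, k_i: int, p_i: float) -> float:
    """Probability that exactly k_i of the N_i independent test vectors
    from class omega_i are misclassified, given true error rate P_i:
    C(N_i, k_i) * P_i**k_i * (1 - P_i)**(N_i - k_i)."""
    return comb(n_i, k_i) * p_i**k_i * (1 - p_i)**(n_i - k_i)
```

Summing this probability over all k_i from 0 to N_i gives 1, as it must for a valid distribution.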

2  Since the P_i’s are not known, estimate P_i by maximizing the above binomial distribution. It turns out that P̂_i = k_i / N_i.  Thus, count the errors and divide by the total number of test points in the class.  Total probability of error: P̂ = Σ_i P(ω_i) k_i / N_i
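A minimal sketch of the error counting described above (function names are my own), combining the per-class estimates with the class priors P(ω_i):

```python
def class_error_rate(k_i: int, n_i: int) -> float:
    """Maximum-likelihood estimate of P_i: counted errors k_i divided
    by the number of test points N_i of that class."""
    return k_i / n_i

def total_error(priors, errors, sizes) -> float:
    """Total probability of error: sum over classes of
    P(omega_i) * k_i / N_i."""
    return sum(p * k / n for p, k, n in zip(priors, errors, sizes))
```

For example, two equiprobable classes with 5 and 10 errors out of 100 test points each give a total error of 0.075.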

3  Statistical Properties: E[P̂_i] = P_i and σ²(P̂_i) = P_i (1 − P_i) / N_i. Thus the estimator is unbiased but only asymptotically consistent: its variance vanishes only as N_i → ∞. Hence, for small N, the estimate may not be reliable.  A theoretically derived estimate exists for a sufficient number N of points in the test data set.
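These properties can be sketched numerically. The sufficient-size rule below assumes the commonly quoted form N ≈ 100/P for the slide's estimate; treat that constant as an assumption rather than the slide's exact formula:

```python
import math

def estimator_variance(p_i: float, n_i: int) -> float:
    """Variance of the error-counting estimator, P_i(1 - P_i)/N_i.
    It shrinks only as N_i grows, hence 'asymptotically consistent'."""
    return p_i * (1 - p_i) / n_i

def sufficient_test_size(p: float) -> int:
    """Assumed rule of thumb N ~ 100 / P for a sufficient test set,
    i.e. enough points to expect on the order of 100 errors."""
    return math.ceil(100 / p)
```

Note how a true error rate of 1% would call for on the order of 10,000 test points, illustrating why small test sets give unreliable estimates.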

4  Exploiting the finite size of the data set.  Resubstitution method: Use the same data for training and testing. It underestimates the error (an overoptimistic “apparent” error). The estimate improves for large N and large ratios of N to the feature-space dimensionality.  Holdout method: Given N points, divide them into N_1 training points and N_2 test points, with N = N_1 + N_2. Problem: less data for both training and testing.
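A minimal holdout split, assuming the data set is a plain Python list (the function name and fixed seed are illustrative):

```python
import random

def holdout_split(data, n_train: int, seed: int = 0):
    """Shuffle the N points and split them into N1 training points
    and N2 = N - N1 test points; both sets are smaller than N."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

Every point lands in exactly one of the two sets, which is what makes the test error estimate independent of training but also what shrinks both sets.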

5  Leave-One-Out Method. The steps: Choose one sample out of the N. Train the classifier using the remaining N − 1 samples. Test the classifier on the selected sample and count an error if it is misclassified. Repeat the above, excluding a different sample each time. Compute the error probability by averaging the counted errors.
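The steps above can be sketched as a generic loop; `train_fn` and `predict_fn` stand in for any classifier and are illustrative names, not from the slides:

```python
def leave_one_out_error(samples, labels, train_fn, predict_fn) -> float:
    """Leave-one-out estimate: for each of the N samples, train on the
    remaining N-1, test on the held-out one, and average the errors."""
    n = len(samples)
    errors = 0
    for i in range(n):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train_fn(train_x, train_y)
        if predict_fn(model, samples[i]) != labels[i]:
            errors += 1
    return errors / n
```

Plugging in, say, a one-nearest-neighbour rule for `train_fn`/`predict_fn` reproduces the classic leave-one-out error count.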

6  Advantages: All data are used for both testing and training. Independence between test and training samples is assured.  Disadvantages: High computational complexity.  Variants of the method exclude k > 1 points each time to reduce the complexity.
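One such variant, excluding k points per round (a sketch under the same illustrative `train_fn`/`predict_fn` convention as before):

```python
def leave_k_out_error(samples, labels, train_fn, predict_fn, k: int) -> float:
    """Hold out k points per round instead of one, so the classifier is
    retrained only about N/k times instead of N times."""
    n = len(samples)
    errors = 0
    for start in range(0, n, k):
        held = set(range(start, min(start + k, n)))
        train_x = [samples[j] for j in range(n) if j not in held]
        train_y = [labels[j] for j in range(n) if j not in held]
        model = train_fn(train_x, train_y)
        for j in held:
            if predict_fn(model, samples[j]) != labels[j]:
                errors += 1
    return errors / n
```

With k = 1 this reduces to leave-one-out; larger k trades a slightly more pessimistic estimate for fewer retrainings.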