Visual Recognition Tutorial


236607 Visual Recognition Tutorial
- Bias and variance of estimators
- The score and Fisher information
- The Cramer-Rao inequality

Estimators and their Properties

Let $\mathcal{P} = \{p(x \mid \theta) : \theta \in \Theta\}$ be a parametric set of distributions. Given a sample $x_1, \dots, x_n$ drawn i.i.d. from one of the distributions in the set, we would like to estimate its parameter $\theta$ (thus identifying the distribution). An estimator for $\theta$ w.r.t. $\mathcal{P}$ is any function $\hat{\theta} = \hat{\theta}(x_1, \dots, x_n)$; notice that an estimator is a random variable. How do we measure the quality of an estimator?

Consistency: an estimator $\hat{\theta}_n$ for $\theta$ is consistent if $\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \varepsilon) = 0$ for every $\varepsilon > 0$. This is a (desirable) asymptotic property that motivates us to acquire large samples, but we are also interested in quality measures for finite (and small!) sample sizes.
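
For instance (a minimal numeric sketch, not part of the original slides), the sample mean of i.i.d. normal draws is a consistent estimator of the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0  # true mean of a N(theta, 1) distribution

# The sample mean is consistent: |mean - theta| shrinks as n grows.
for n in [10, 1_000, 100_000]:
    x = rng.normal(theta, 1.0, size=n)
    print(n, abs(x.mean() - theta))
```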

Estimators and their Properties

Bias: define the bias of an estimator $\hat{\theta}$ to be $b(\hat{\theta}) = E_\theta[\hat{\theta}] - \theta$. Here the expectation is w.r.t. the distribution $p(x \mid \theta)$. The estimator is unbiased if its bias is zero. Example: the sample mean $\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i$ and the single observation $x_1$, as estimators for the mean of a normal distribution, are both unbiased. The estimator $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu})^2$ for its variance is biased, whereas the estimator $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \hat{\mu})^2$ is unbiased.

Variance: another important property of an estimator is its variance $\mathrm{Var}(\hat{\theta})$. We would like to find estimators with minimum bias and variance. Which is more important, bias or variance?
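
A quick Monte Carlo check of the variance example (a sketch, not from the slides): with $n = 5$ normal samples, the $1/n$ estimator underestimates $\sigma^2$ on average, while the $1/(n-1)$ version does not:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, trials = 0.0, 4.0, 5, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

print((ss / n).mean())        # ~ 3.2: biased low, E = (n-1)/n * sigma^2
print((ss / (n - 1)).mean())  # ~ 4.0: unbiased
```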

Risky Estimators

We can employ our decision-theoretic framework to measure the quality of estimators. Abbreviate $\hat{\theta} = \hat{\theta}(x_1, \dots, x_n)$ and consider the squared-error loss function $L(\theta, \hat{\theta}) = (\hat{\theta} - \theta)^2$. The conditional risk associated with $\hat{\theta}$ when $\theta$ is the true parameter is $R(\hat{\theta} \mid \theta) = E_\theta[(\hat{\theta} - \theta)^2]$.

Claim: $R(\hat{\theta} \mid \theta) = \mathrm{Var}(\hat{\theta}) + b(\hat{\theta})^2$.

Proof: write $m = E_\theta[\hat{\theta}]$. Then
$E_\theta[(\hat{\theta} - \theta)^2] = E_\theta[((\hat{\theta} - m) + (m - \theta))^2] = E_\theta[(\hat{\theta} - m)^2] + 2(m - \theta)\,E_\theta[\hat{\theta} - m] + (m - \theta)^2 = \mathrm{Var}(\hat{\theta}) + b(\hat{\theta})^2$,
since $E_\theta[\hat{\theta} - m] = 0$.
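
A numeric check of the claim (an illustrative sketch with a made-up shrinkage estimator): the Monte Carlo risk matches variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, trials = 2.0, 10, 500_000

# A deliberately biased estimator: shrink the sample mean toward zero.
est = 0.8 * rng.normal(theta, 1.0, size=(trials, n)).mean(axis=1)

mse = ((est - theta) ** 2).mean()
bias = est.mean() - theta
print(mse, est.var() + bias**2)  # agree up to Monte Carlo noise
```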

Bias vs. Variance

So, for a given level of conditional risk, there is a tradeoff between bias and variance. This tradeoff is among the most important facts in pattern recognition and machine learning. The classical approach: consider only unbiased estimators and try to find those with the minimum possible variance. This approach is not always fruitful:
- Unbiasedness only means that the average of the estimator (w.r.t. $p(x \mid \theta)$) is $\theta$. It does not mean the estimate will be near $\theta$ for a particular sample (if the variance is large).
- In general, an unbiased estimator is not even guaranteed to exist.
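
To make the tradeoff concrete (an illustrative sketch, not from the slides): shrinking the sample mean toward zero introduces bias but can reduce the overall risk when the true mean is small:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 0.5, 2.0, 5

x = rng.normal(mu, sigma, size=(500_000, n))
unbiased = x.mean(axis=1)  # zero bias, variance sigma^2/n = 0.8
shrunk = 0.7 * unbiased    # biased toward zero, smaller variance

print(((unbiased - mu) ** 2).mean())  # ~ 0.80
print(((shrunk - mu) ** 2).mean())    # ~ 0.41: bias bought a lower risk
```

Whether shrinkage helps depends on the unknown $\mu$: for large $|\mu|$ the bias term dominates and the unbiased estimator wins.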

The Score

The score of the family $\mathcal{P}$ is the random variable $V = \frac{\partial}{\partial \theta} \ln p(x \mid \theta)$. It measures the "sensitivity" of $\ln p(x \mid \theta)$ as a function of the parameter $\theta$.

Claim: $E_\theta[V] = 0$.

Proof: $E_\theta[V] = \int \frac{\partial_\theta p(x \mid \theta)}{p(x \mid \theta)}\, p(x \mid \theta)\, dx = \frac{\partial}{\partial \theta} \int p(x \mid \theta)\, dx = \frac{\partial}{\partial \theta} 1 = 0$ (assuming the order of differentiation and integration can be exchanged).

Corollary: $\mathrm{Var}(V) = E_\theta[V^2]$.
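
The zero-mean property is easy to verify numerically (a sketch using the normal-family score derived on the next slide):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 1.5, 2.0

# Score of N(mu, sigma^2) w.r.t. mu is (x - mu) / sigma^2.
x = rng.normal(mu, sigma, size=1_000_000)
print(((x - mu) / sigma**2).mean())  # ~ 0: zero mean under p(. | mu)
```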

The Score - Example

Consider the normal distribution $p(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$ with known $\sigma^2$. Then $\ln p(x \mid \mu) = -\frac{(x - \mu)^2}{2\sigma^2} - \ln(\sqrt{2\pi}\,\sigma)$, so the score is $V = \frac{\partial}{\partial \mu} \ln p(x \mid \mu) = \frac{x - \mu}{\sigma^2}$. Clearly $E[V] = 0$, and $\mathrm{Var}(V) = \frac{\mathrm{Var}(x)}{\sigma^4} = \frac{1}{\sigma^2}$.

The Score - Vector Form

In the case where $\theta = (\theta_1, \dots, \theta_k)$ is a vector, the score is the vector $V = \nabla_\theta \ln p(x \mid \theta)$, whose $i$th component is $V_i = \frac{\partial}{\partial \theta_i} \ln p(x \mid \theta)$.

Example: for the normal family with $\theta = (\mu, \sigma^2)$,
$V = \left(\frac{x - \mu}{\sigma^2},\; \frac{(x - \mu)^2 - \sigma^2}{2\sigma^4}\right)$.

Fisher Information

Fisher information is designed to provide a measure of how much information the parametric probability law $p(x \mid \theta)$ carries about the parameter $\theta$. An adequate definition of such information should possess the following properties:
- The larger the sensitivity of $\ln p(x \mid \theta)$ to changes in $\theta$, the larger the information should be.
- The information should be additive: the information carried by the combined law $p(x \mid \theta)\, p(y \mid \theta)$ should be the sum of those carried by $p(x \mid \theta)$ and $p(y \mid \theta)$.
- The information should be insensitive to the sign of the change in $\theta$, and preferably positive.
- The information should be a deterministic quantity; it should not depend on the specific random observation.

Fisher Information

Definition (scalar form): the Fisher information $J(\theta)$ (about $\theta$) is the variance of the score:
$J(\theta) = \mathrm{Var}(V) = E_\theta\left[\left(\frac{\partial}{\partial \theta} \ln p(x \mid \theta)\right)^2\right]$.

Example: consider a random variable $x \sim \mathcal{N}(\mu, \sigma^2)$ with known $\sigma^2$. From the score example, $V = (x - \mu)/\sigma^2$, so $J(\mu) = \mathrm{Var}(V) = \frac{1}{\sigma^2}$.
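
The definition can be checked by simulation (a sketch for the normal example):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 1.5, 2.0

# Fisher information = variance of the score; for N(mu, sigma^2) it is 1/sigma^2.
x = rng.normal(mu, sigma, size=1_000_000)
score = (x - mu) / sigma**2
print(score.var(), 1 / sigma**2)  # both ~ 0.25
```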

Fisher Information - Cntd.

Whenever $\theta$ is a vector, the Fisher information is the matrix $J(\theta) = [J_{ij}(\theta)]$, where $J_{ij}(\theta) = E_\theta[V_i V_j] = E_\theta\left[\frac{\partial \ln p(x \mid \theta)}{\partial \theta_i} \cdot \frac{\partial \ln p(x \mid \theta)}{\partial \theta_j}\right]$. Reminder: $E_\theta[V] = 0$, so $J(\theta)$ is the covariance matrix of the score.

Remark: the Fisher information is only defined when the distributions satisfy some regularity conditions. (For example, they should be differentiable w.r.t. $\theta$, and all the distributions in the parametric family must have the same support set.)

Fisher Information - Cntd.

Claim: let $x_1, \dots, x_n$ be i.i.d. random variables $\sim p(x \mid \theta)$. The score of the joint law $p(x_1, \dots, x_n \mid \theta)$ is the sum of the individual scores.

Proof: $V_n(\theta) = \frac{\partial}{\partial \theta} \ln \prod_{i=1}^n p(x_i \mid \theta) = \sum_{i=1}^n \frac{\partial}{\partial \theta} \ln p(x_i \mid \theta) = \sum_{i=1}^n V_i(\theta)$.

Example: if $x_1, \dots, x_n$ are i.i.d. $\mathcal{N}(\mu, \sigma^2)$, the score is $V_n(\mu) = \sum_{i=1}^n \frac{x_i - \mu}{\sigma^2} = \frac{n(\bar{x} - \mu)}{\sigma^2}$.

Fisher Information - Cntd.

Based on $n$ i.i.d. samples, the Fisher information about $\theta$ is
$J_n(\theta) = \mathrm{Var}(V_n) = \sum_{i=1}^n \mathrm{Var}(V_i) = n J(\theta)$.
Thus the Fisher information is additive w.r.t. i.i.d. random variables.

Example: suppose $x_1, \dots, x_n$ are i.i.d. $\mathcal{N}(\mu, \sigma^2)$. From the previous example we know that the Fisher information about $\mu$ based on one sample is $J(\mu) = 1/\sigma^2$. Therefore, based on the entire sample, $J_n(\mu) = n/\sigma^2$.
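
Additivity can also be seen numerically (a sketch for the normal family):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n = 1.5, 2.0, 8

# The joint score of n i.i.d. samples is the sum of per-sample scores,
# so its variance (the Fisher information) is n times larger.
x = rng.normal(mu, sigma, size=(1_000_000, n))
joint_score = ((x - mu) / sigma**2).sum(axis=1)
print(joint_score.var(), n / sigma**2)  # both ~ 2.0
```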

The Cramer-Rao Inequality

Theorem: let $\hat{\theta}$ be an unbiased estimator for $\theta$. Then $\mathrm{Var}(\hat{\theta}) \ge \frac{1}{J(\theta)}$.

Proof: using $E_\theta[V] = 0$ we have
$\mathrm{Cov}(\hat{\theta}, V) = E_\theta[\hat{\theta} V] - E_\theta[\hat{\theta}]\, E_\theta[V] = E_\theta[\hat{\theta} V]$.

The Cramer-Rao Inequality - Cntd.

Now
$E_\theta[\hat{\theta} V] = \int \hat{\theta}(x)\, \frac{\partial_\theta p(x \mid \theta)}{p(x \mid \theta)}\, p(x \mid \theta)\, dx = \frac{\partial}{\partial \theta} \int \hat{\theta}(x)\, p(x \mid \theta)\, dx = \frac{\partial}{\partial \theta} E_\theta[\hat{\theta}] = \frac{\partial}{\partial \theta}\, \theta = 1$,
where the last steps use unbiasedness.

The Cramer-Rao Inequality - Cntd.

So $\mathrm{Cov}(\hat{\theta}, V) = 1$. By the Cauchy-Schwarz inequality,
$1 = \mathrm{Cov}(\hat{\theta}, V)^2 \le \mathrm{Var}(\hat{\theta})\, \mathrm{Var}(V) = \mathrm{Var}(\hat{\theta})\, J(\theta)$.
Therefore $\mathrm{Var}(\hat{\theta}) \ge \frac{1}{J(\theta)}$.

For a biased estimator with bias $b(\theta) = E_\theta[\hat{\theta}] - \theta$ we have
$\mathrm{Var}(\hat{\theta}) \ge \frac{(1 + b'(\theta))^2}{J(\theta)}$.
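
A simulation sketch of the bound (not from the slides): for i.i.d. normals, the sample median is also unbiased for $\mu$ (by symmetry), but its variance stays strictly above $1/J_n(\mu)$, while the sample mean attains it:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n = 1.5, 2.0, 11

x = rng.normal(mu, sigma, size=(500_000, n))
print("CRLB  :", sigma**2 / n)                # 1 / J_n(mu) ~ 0.364
print("mean  :", x.mean(axis=1).var())        # attains the bound
print("median:", np.median(x, axis=1).var())  # ~ (pi/2) sigma^2/n, above the bound
```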

The Cramer-Rao Inequality - General Case

The Cramer-Rao inequality also holds in a general (vector) form: the error covariance matrix of an unbiased estimator $\hat{\theta}$ is bounded as follows:
$\mathrm{Cov}(\hat{\theta}) \succeq J(\theta)^{-1}$,
i.e. $\mathrm{Cov}(\hat{\theta}) - J(\theta)^{-1}$ is positive semidefinite.

The Cramer-Rao Inequality - Cntd.

Example: let $x_1, \dots, x_n$ be i.i.d. $\mathcal{N}(\mu, \sigma^2)$. From the previous example, $J_n(\mu) = n/\sigma^2$. Now let $\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i$ be an (unbiased) estimator for $\mu$. Then $\mathrm{Var}(\hat{\mu}) = \frac{\sigma^2}{n} = \frac{1}{J_n(\mu)}$, so $\hat{\mu}$ matches the Cramer-Rao lower bound.

Definition: an unbiased estimator whose covariance meets the Cramer-Rao lower bound is called efficient.

Efficiency

Theorem (efficiency): the unbiased estimator $\hat{\theta}$ is efficient, that is $\mathrm{Var}(\hat{\theta}) = \frac{1}{J(\theta)}$, iff $V = J(\theta)\,(\hat{\theta} - \theta)$.

Proof (if): if $V = J(\theta)(\hat{\theta} - \theta)$ then $\mathrm{Var}(V) = J(\theta)^2\, \mathrm{Var}(\hat{\theta})$; since $\mathrm{Var}(V) = J(\theta)$, this means $\mathrm{Var}(\hat{\theta}) = \frac{1}{J(\theta)}$.
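
For the normal-mean example the efficiency condition holds exactly, not just in expectation (a sketch, assuming known $\sigma^2$):

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sigma, n = 1.5, 2.0, 6

x = rng.normal(mu, sigma, size=n)
score = ((x - mu) / sigma**2).sum()
print(score, (n / sigma**2) * (x.mean() - mu))  # identical: V = J_n(mu) (mu_hat - mu)
```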

Efficiency

Only if: recall the cross covariance between $\hat{\theta}$ and $V$: $\mathrm{Cov}(\hat{\theta}, V) = 1$. The Cauchy-Schwarz inequality for random variables says $\mathrm{Cov}(\hat{\theta}, V)^2 \le \mathrm{Var}(\hat{\theta})\, \mathrm{Var}(V)$, with equality iff $V = a(\theta)(\hat{\theta} - E_\theta[\hat{\theta}])$ for some deterministic $a(\theta)$. If $\hat{\theta}$ is efficient, equality holds; taking variances gives $a(\theta)^2\, \mathrm{Var}(\hat{\theta}) = J(\theta)$, and with $\mathrm{Var}(\hat{\theta}) = 1/J(\theta)$ this yields $a(\theta) = J(\theta)$ (the sign is fixed by $\mathrm{Cov}(\hat{\theta}, V) = 1 > 0$), i.e. $V = J(\theta)(\hat{\theta} - \theta)$.

Cramer-Rao Inequality and ML - Cntd.

Theorem: suppose there exists an efficient estimator $\hat{\theta}$ for all $\theta$. Then the ML estimator is $\hat{\theta}$.

Proof: by assumption and the previous claim, $V = \frac{\partial}{\partial \theta} \ln p(x \mid \theta) = J(\theta)(\hat{\theta} - \theta)$ for all $\theta$. This holds in particular at $\theta = \hat{\theta}_{ML}$, and since this is a maximum point of the log-likelihood, the left side is zero there, so $J(\hat{\theta}_{ML})(\hat{\theta} - \hat{\theta}_{ML}) = 0$ and hence $\hat{\theta}_{ML} = \hat{\theta}$ (as $J > 0$).
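
A closing numeric sketch (not from the slides, using SciPy's generic scalar optimizer): maximizing the Gaussian log-likelihood in $\mu$ with $\sigma$ known recovers the efficient estimator, the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(9)
sigma = 2.0
x = rng.normal(1.5, sigma, size=1000)

# Minimize the negative Gaussian log-likelihood in mu (constants dropped):
# the maximizer of the likelihood coincides with the sample mean.
nll = lambda mu: ((x - mu) ** 2).sum() / (2 * sigma**2)
print(minimize_scalar(nll).x, x.mean())  # identical up to solver tolerance
```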