
CHAPTER 8 More About Estimation

8.1 Bayesian Estimation

In this chapter we introduce further concepts related to estimation, and we begin by considering Bayesian estimates, which are also based upon sufficient statistics if the latter exist. We shall now describe the Bayesian approach to the problem of estimation. This approach takes into account any prior knowledge of the experiment that the statistician has, and it is one application of a principle of statistical inference that may be called Bayesian statistics.

Consider a random variable $X$ that has a distribution of probability that depends upon the symbol $\theta$, where $\theta$ is an element of a well-defined set $\Omega$.

$\Theta$: a random variable that has a distribution of probability over the set $\Omega$.
$x$: a possible value of the random variable $X$.

$\theta$: a possible value of the random variable $\Theta$.

The distribution of $X$ depends upon $\theta$, an experimental value of the random variable $\Theta$. We shall denote the p.d.f. of $\Theta$ by $h(\theta)$, and we take $h(\theta) = 0$ when $\theta$ is not an element of $\Omega$. Moreover, we now denote the p.d.f. of $X$ by $f(x|\theta)$, since we think of it as a conditional p.d.f. of $X$, given $\Theta = \theta$. Say $X_1, X_2, \ldots, X_n$ is a random sample from this conditional distribution of $X$. Thus we can write the joint conditional p.d.f. of $X_1, X_2, \ldots, X_n$, given $\Theta = \theta$, as

$$f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta).$$

Thus the joint p.d.f. of $X_1, X_2, \ldots, X_n$ and $\Theta$ is

$$g(x_1, x_2, \ldots, x_n, \theta) = f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta)\, h(\theta).$$

If $\Theta$ is a random variable of the continuous type, the joint marginal p.d.f. of $X_1, X_2, \ldots, X_n$ is given by

$$k_1(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n, \theta)\, d\theta.$$

If $\Theta$ is a random variable of the discrete type, integration would be replaced by summation. In either case the conditional p.d.f. of $\Theta$, given $X_1 = x_1, \ldots, X_n = x_n$, is

$$k(\theta|x_1, \ldots, x_n) = \frac{g(x_1, \ldots, x_n, \theta)}{k_1(x_1, \ldots, x_n)} = \frac{f(x_1|\theta) \cdots f(x_n|\theta)\, h(\theta)}{k_1(x_1, \ldots, x_n)}.$$

This relationship is another form of Bayes' formula.

Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with mean $\theta$, where $\theta$ is the observed value of a random variable $\Theta$ having a gamma distribution with known parameters $\alpha$ and $\beta$. Thus

$$g(x_1, \ldots, x_n, \theta) = \frac{\theta^{\sum x_i} e^{-n\theta}}{x_1! \cdots x_n!} \cdot \frac{\theta^{\alpha-1} e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}},$$

provided that $x_i = 0, 1, 2, \ldots$ and $0 < \theta < \infty$, and is equal to zero elsewhere. Then

$$k_1(x_1, \ldots, x_n) = \int_0^{\infty} g(x_1, \ldots, x_n, \theta)\, d\theta = \frac{\Gamma\!\left(\sum x_i + \alpha\right)\left[\beta/(n\beta+1)\right]^{\sum x_i + \alpha}}{x_1! \cdots x_n!\, \Gamma(\alpha)\, \beta^{\alpha}}.$$

Finally, the conditional p.d.f. of $\Theta$, given $X_1 = x_1, \ldots, X_n = x_n$, is

$$k(\theta|x_1, \ldots, x_n) = \frac{\theta^{\sum x_i + \alpha - 1}\, e^{-\theta/[\beta/(n\beta+1)]}}{\Gamma\!\left(\sum x_i + \alpha\right)\left[\beta/(n\beta+1)\right]^{\sum x_i + \alpha}},$$

provided that $0 < \theta < \infty$, and is equal to zero elsewhere. This conditional p.d.f. is one of the gamma type with parameters $\alpha^* = \sum x_i + \alpha$ and $\beta^* = \beta/(n\beta + 1)$.
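The posterior update above is a one-line computation in practice. Here is a minimal sketch, assuming numpy and scipy are available; the prior parameters and the observed counts are illustrative values, not from the text:

```python
import numpy as np
from scipy import stats

# Known prior parameters and observed Poisson counts (illustrative values).
alpha, beta = 2.0, 3.0          # prior: gamma with shape alpha, scale beta
x = np.array([4, 2, 5, 3, 4])
n = len(x)

# Posterior gamma parameters, as derived above:
alpha_star = x.sum() + alpha            # alpha* = sum(x_i) + alpha
beta_star = beta / (n * beta + 1)       # beta* = beta / (n*beta + 1)
posterior = stats.gamma(a=alpha_star, scale=beta_star)

# Under squared-error loss, the Bayes point estimate is the posterior mean.
print(posterior.mean())   # equals alpha_star * beta_star
```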

Bayesian statisticians frequently write that the posterior p.d.f. $k(\theta|x_1, \ldots, x_n)$ is proportional to the joint p.d.f. $g(x_1, \ldots, x_n, \theta)$; that is,

$$k(\theta|x_1, \ldots, x_n) \propto f(x_1|\theta) \cdots f(x_n|\theta)\, h(\theta).$$

In Example 1, the Bayesian statistician would simply write

$$k(\theta|x_1, \ldots, x_n) \propto \theta^{\sum x_i} e^{-n\theta} \cdot \theta^{\alpha - 1} e^{-\theta/\beta},$$

or, equivalently,

$$k(\theta|x_1, \ldots, x_n) \propto \theta^{\sum x_i + \alpha - 1}\, e^{-\theta(n + 1/\beta)}, \quad 0 < \theta < \infty,$$

and is equal to zero elsewhere.

In Bayesian statistics, the p.d.f. $h(\theta)$ is called the prior p.d.f. of $\Theta$, and the conditional p.d.f. $k(\theta|x_1, \ldots, x_n)$ is called the posterior p.d.f. of $\Theta$. Suppose that we want a point estimate of $\theta$. This really amounts to selecting a decision function $w$, so that $w(x_1, \ldots, x_n)$ is a predicted value of $\theta$ when the computed values $x_1, \ldots, x_n$ are known.

$\theta$: an experimental value of the random variable $\Theta$;
$E(\Theta|x_1, \ldots, x_n)$: the mean of the posterior distribution of $\Theta$;
$\mathcal{L}[\theta, w(x_1, \ldots, x_n)]$: the loss function.

A Bayes solution is a decision function $w$ that minimizes the conditional expectation of the loss,

$$E\{\mathcal{L}[\Theta, w(x_1, \ldots, x_n)] \mid x_1, \ldots, x_n\} = \int_{-\infty}^{\infty} \mathcal{L}[\theta, w(x_1, \ldots, x_n)]\, k(\theta|x_1, \ldots, x_n)\, d\theta,$$

if $\Theta$ is of the continuous type. In particular, with the squared-error loss $\mathcal{L}[\theta, w] = (\theta - w)^2$, the Bayes solution is the mean $E(\Theta|x_1, \ldots, x_n)$ of the posterior distribution.

If an interval estimate of $\theta$ is desired, we can find two functions $u(x_1, \ldots, x_n)$ and $v(x_1, \ldots, x_n)$ so that the conditional probability

$$P[u(x_1, \ldots, x_n) < \Theta < v(x_1, \ldots, x_n) \mid X_1 = x_1, \ldots, X_n = x_n] = \int_{u}^{v} k(\theta|x_1, \ldots, x_n)\, d\theta$$

is large, say 0.95.
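For Example 1 the posterior is a gamma distribution, so an equal-tailed 95% interval can be read directly from its quantiles. A minimal sketch, assuming scipy and the same illustrative data as the previous snippet:

```python
import numpy as np
from scipy import stats

alpha, beta = 2.0, 3.0          # illustrative prior parameters
x = np.array([4, 2, 5, 3, 4])   # illustrative Poisson counts
n = len(x)
posterior = stats.gamma(a=x.sum() + alpha, scale=beta / (n * beta + 1))

# u and v: the 2.5% and 97.5% posterior quantiles give a 95% interval.
u, v = posterior.ppf([0.025, 0.975])
print(f"P({u:.3f} < Theta < {v:.3f} | data) = 0.95")
```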

8.2 Fisher Information and the Rao-Cramér Inequality

Let $X$ be a random variable with p.d.f. $f(x;\theta)$, $\theta \in \Omega$, where the parameter space $\Omega$ is an interval. We consider only special cases, sometimes called regular cases, of probability density functions, as we wish to differentiate under an integral sign. We have that

$$\int_{-\infty}^{\infty} f(x;\theta)\, dx = 1$$

and, by taking the derivative with respect to $\theta$,

$$\int_{-\infty}^{\infty} \frac{\partial f(x;\theta)}{\partial \theta}\, dx = 0. \qquad (1)$$

The latter expression can be rewritten as

$$\int_{-\infty}^{\infty} \frac{\partial f(x;\theta)/\partial \theta}{f(x;\theta)}\, f(x;\theta)\, dx = 0,$$

or, equivalently,

$$\int_{-\infty}^{\infty} \frac{\partial \ln f(x;\theta)}{\partial \theta}\, f(x;\theta)\, dx = 0.$$

If we differentiate again, it follows that

$$\int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x;\theta)}{\partial \theta^2}\, f(x;\theta)\, dx + \int_{-\infty}^{\infty} \frac{\partial \ln f(x;\theta)}{\partial \theta}\, \frac{\partial f(x;\theta)}{\partial \theta}\, dx = 0. \qquad (2)$$

We rewrite the second term of the left-hand member of this equation as

$$\int_{-\infty}^{\infty} \frac{\partial \ln f(x;\theta)}{\partial \theta}\, \frac{\partial f(x;\theta)/\partial \theta}{f(x;\theta)}\, f(x;\theta)\, dx = \int_{-\infty}^{\infty} \left[\frac{\partial \ln f(x;\theta)}{\partial \theta}\right]^2 f(x;\theta)\, dx.$$

This is called Fisher information and is denoted by $I(\theta)$. That is,

$$I(\theta) = \int_{-\infty}^{\infty} \left[\frac{\partial \ln f(x;\theta)}{\partial \theta}\right]^2 f(x;\theta)\, dx = E\left\{\left[\frac{\partial \ln f(X;\theta)}{\partial \theta}\right]^2\right\};$$

but, from Equation (2), we see that $I(\theta)$ can be computed from

$$I(\theta) = -\int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x;\theta)}{\partial \theta^2}\, f(x;\theta)\, dx = -E\left[\frac{\partial^2 \ln f(X;\theta)}{\partial \theta^2}\right].$$

Sometimes one expression is easier to compute than the other, but often we prefer the second expression.

Example 1. Let $X$ be binomial $b(1, \theta)$. Thus

$$\ln f(x;\theta) = x \ln \theta + (1 - x) \ln(1 - \theta),$$

$$\frac{\partial \ln f(x;\theta)}{\partial \theta} = \frac{x}{\theta} - \frac{1 - x}{1 - \theta},$$

and

$$\frac{\partial^2 \ln f(x;\theta)}{\partial \theta^2} = -\frac{x}{\theta^2} - \frac{1 - x}{(1 - \theta)^2}.$$

Clearly,

$$I(\theta) = -E\left[-\frac{X}{\theta^2} - \frac{1 - X}{(1 - \theta)^2}\right] = \frac{\theta}{\theta^2} + \frac{1 - \theta}{(1 - \theta)^2} = \frac{1}{\theta} + \frac{1}{1 - \theta} = \frac{1}{\theta(1 - \theta)},$$

which is larger for $\theta$ values close to zero or 1.
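The identity $I(\theta) = E\{[\partial \ln f(X;\theta)/\partial \theta]^2\}$ is easy to check by Monte Carlo. A quick sketch, assuming numpy, with an illustrative value of $\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3
x = rng.binomial(1, theta, size=200_000)   # Bernoulli b(1, theta) sample

# Score of one observation: x/theta - (1-x)/(1-theta), as derived above.
score = x / theta - (1 - x) / (1 - theta)

print(np.mean(score**2))          # Monte Carlo estimate of I(theta)
print(1 / (theta * (1 - theta)))  # closed form 1/[theta(1-theta)]
```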

The likelihood function of a random sample $X_1, X_2, \ldots, X_n$ is

$$L(\theta) = f(x_1;\theta) f(x_2;\theta) \cdots f(x_n;\theta).$$

The Rao-Cramér inequality: if $Y = u(X_1, \ldots, X_n)$ is a statistic with mean $E(Y) = k(\theta)$, then

$$\mathrm{Var}(Y) \ge \frac{[k'(\theta)]^2}{n I(\theta)}.$$

Definition 1. Let $Y$ be an unbiased estimator of a parameter $\theta$ in such a case of point estimation. The statistic $Y$ is called an efficient estimator of $\theta$ if and only if the variance of $Y$ attains the Rao-Cramér lower bound.

Definition 2. In cases in which we can differentiate with respect to a parameter under an integral or summation symbol, the ratio of the Rao-Cramér lower bound to the actual variance of any unbiased estimator of a parameter is called the efficiency of that statistic.

Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution that has the mean $\theta > 0$.

It is known that $\bar{X}$ is an m.l.e. of $\theta$; we shall show that it is also an efficient estimator of $\theta$. We have

$$\frac{\partial \ln f(x;\theta)}{\partial \theta} = \frac{\partial (x \ln \theta - \theta - \ln x!)}{\partial \theta} = \frac{x}{\theta} - 1 = \frac{x - \theta}{\theta}.$$

Accordingly,

$$E\left\{\left[\frac{\partial \ln f(X;\theta)}{\partial \theta}\right]^2\right\} = \frac{E[(X - \theta)^2]}{\theta^2} = \frac{\sigma^2}{\theta^2} = \frac{\theta}{\theta^2} = \frac{1}{\theta}.$$

The Rao-Cramér lower bound in this case is $1/[n(1/\theta)] = \theta/n$. But $\theta/n$ is the variance of $\bar{X}$. Hence $\bar{X}$ is an efficient estimator of $\theta$.
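A simulation makes the conclusion of Example 2 concrete: the empirical variance of $\bar{X}$ matches the bound $\theta/n$. A minimal sketch, assuming numpy, with illustrative values of $\theta$ and $n$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 4.0, 25, 100_000

# Sampling distribution of the mean of n Poisson(theta) observations.
xbar = rng.poisson(theta, size=(reps, n)).mean(axis=1)

print(np.var(xbar))   # empirical variance of X-bar
print(theta / n)      # Rao-Cramer lower bound theta/n
```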

8.3 Limiting Distributions of Maximum Likelihood Estimators

We can differentiate under the integral sign, so that

$$Z = \sum_{i=1}^{n} \frac{\partial \ln f(X_i;\theta)}{\partial \theta}$$

has mean zero and variance $n I(\theta)$. In addition, we want to be able to find the maximum likelihood estimator $\hat{\theta}$ by solving

$$\sum_{i=1}^{n} \frac{\partial \ln f(X_i;\hat{\theta})}{\partial \theta} = 0.$$

By approximating its left-hand member by the first two terms of a Taylor's expansion about $\theta$, this equation can be rewritten as

$$\sqrt{n I(\theta)}\,(\hat{\theta} - \theta) \approx \frac{\displaystyle \frac{1}{\sqrt{n I(\theta)}} \sum_{i=1}^{n} \frac{\partial \ln f(X_i;\theta)}{\partial \theta}}{\displaystyle -\frac{1}{n I(\theta)} \sum_{i=1}^{n} \frac{\partial^2 \ln f(X_i;\theta)}{\partial \theta^2}}. \qquad (1)$$

Since $Z$ is the sum of the i.i.d. random variables $\partial \ln f(X_i;\theta)/\partial \theta$, each with mean zero and variance $I(\theta)$, the numerator of the right-hand member of Equation (1)

is limiting $N(0,1)$ by the central limit theorem, while the denominator converges in probability to 1; hence $\hat{\theta}$ has an approximate normal distribution with mean $\theta$ and variance $1/[n I(\theta)]$.

Example. Suppose that the random sample arises from a distribution with p.d.f.

$$f(x;\theta) = \theta x^{\theta - 1}, \quad 0 < x < 1, \quad \theta \in \Omega = \{\theta : 0 < \theta < \infty\},$$

zero elsewhere. We have

$$\ln f(x;\theta) = \ln \theta + (\theta - 1) \ln x,$$

$$\frac{\partial \ln f(x;\theta)}{\partial \theta} = \frac{1}{\theta} + \ln x,$$

and

$$\frac{\partial^2 \ln f(x;\theta)}{\partial \theta^2} = -\frac{1}{\theta^2}.$$

Since $I(\theta) = -E(-1/\theta^2) = 1/\theta^2$, the lower bound of the variance of every unbiased estimator of $\theta$ is $\theta^2/n$. Moreover, the maximum likelihood estimator $\hat{\theta} = -n/\ln\left(\prod_{i=1}^{n} X_i\right) = -n/\sum_{i=1}^{n} \ln X_i$ has an approximate normal distribution with mean $\theta$ and variance $\theta^2/n$. Thus, in a limiting sense, $\hat{\theta}$ is the unbiased minimum variance estimator of $\theta$; that is, $\hat{\theta}$ is asymptotically efficient.
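The asymptotic normality in this example can be checked by simulation: since $F(x) = x^{\theta}$ on $(0,1)$, draws are obtained by the inverse-CDF transform $X = U^{1/\theta}$. A sketch, assuming numpy, with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 100, 50_000

# Inverse CDF: F(x) = x**theta on (0,1), so X = U**(1/theta).
u = rng.uniform(size=(reps, n))
x = u ** (1 / theta)

# MLE theta_hat = -n / sum(ln X_i), as derived above.
theta_hat = -n / np.log(x).sum(axis=1)

print(theta_hat.mean(), theta_hat.var())  # approx theta and theta**2/n
print(theta, theta**2 / n)
```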

8.4 Robust M-Estimation

We have found the m.l.e. of the center $\theta$ of the Cauchy distribution with p.d.f.

$$f(x;\theta) = \frac{1}{\pi[1 + (x - \theta)^2]}, \quad -\infty < x < \infty,$$

where $-\infty < \theta < \infty$. The logarithm of the likelihood function of a random sample $X_1, X_2, \ldots, X_n$ from this distribution is

$$\ln L(\theta) = -n \ln \pi - \sum_{i=1}^{n} \ln[1 + (x_i - \theta)^2].$$

The equation $d \ln L(\theta)/d\theta = 0$, that is,

$$\sum_{i=1}^{n} \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2} = 0,$$

can be solved by some iterative process. Writing $\rho(x) = \ln \pi + \ln(1 + x^2)$, where $x$ plays the role of $x_i - \theta$, we have

$$\psi(x) = \rho'(x) = \frac{2x}{1 + x^2},$$

and the m.l.e. solves $\sum_{i=1}^{n} \psi(x_i - \theta) = 0$. In addition, we define a weight function as

$$w(x) = \frac{\psi(x)}{x},$$

which equals $2/(1 + x^2)$ in the Cauchy case.

Definition 1. An estimator that is fairly good (small variance, say) for a wide variety of distributions (not necessarily the best for any one of them) is called a robust estimator.

Definition 2. Estimators associated with the solution $\hat{\theta}$ of the equation

$$\sum_{i=1}^{n} \psi(x_i - \hat{\theta}) = 0$$

are frequently called robust M-estimators (denoted by $\hat{\theta}$) because they can be thought of as maximum likelihood estimators. Huber's $\psi$ function is

$$\psi(x) = \begin{cases} -k, & x < -k, \\ x, & -k \le x \le k, \\ k, & k < x, \end{cases}$$

with weight $w(x) = \psi(x)/x = 1$ provided that $|x| \le k$, and $w(x) = k/|x|$ provided that $|x| > k$. With Huber's $\psi$ function, another problem arises: if we double each $x_i$, estimators such as $\bar{X}$ and the median also double.
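Huber's $\psi$ and its weight are straightforward to code. A minimal sketch, assuming numpy; the helper names and the default tuning constant $k = 1.5$ are illustrative choices, not from the text:

```python
import numpy as np

def psi_huber(x, k=1.5):
    """Huber's psi: the identity on [-k, k], clipped to +/-k outside."""
    return np.clip(x, -k, k)

def weight_huber(x, k=1.5):
    """Weight w(x) = psi(x)/x: 1 for |x| <= k, k/|x| otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= k, 1.0, k / np.abs(x))
```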

This is not at all true with the solution $\hat{\theta}$ of the equation $\sum_{i=1}^{n} \psi(x_i - \hat{\theta}) = 0$, where the $\psi$ function is that of Huber. One way of forcing this scale property is to solve instead the equation

$$\sum_{i=1}^{n} \psi\!\left(\frac{x_i - \hat{\theta}}{d}\right) = 0, \qquad (1)$$

where $d$ is a robust estimate of the scale. A popular $d$ to use is

$$d = \frac{\mathrm{median}\,|x_i - \mathrm{median}(x_i)|}{0.6745}.$$

The scheme of selecting $d$ also provides us with a clue for selecting $k$.
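This robust scale is a one-liner. A sketch, assuming numpy; the helper name is illustrative:

```python
import numpy as np

def mad_scale(x):
    """Median absolute deviation divided by 0.6745, so that d
    estimates the standard deviation under normality."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x))) / 0.6745
```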

Most of the values should satisfy the inequality

$$\left|\frac{x_i - \theta}{d}\right| \le k,$$

because then

$$\psi\!\left(\frac{x_i - \theta}{d}\right) = \frac{x_i - \theta}{d}.$$

If all the values satisfy this inequality, then Equation (1) becomes

$$\sum_{i=1}^{n} \frac{x_i - \hat{\theta}}{d} = 0.$$

This has the solution $\hat{\theta} = \bar{x}$, which of course is most desirable with normal distributions.

To solve Equation (1), we use Newton's method. Let $\hat{\theta}_0$ be a first estimate of $\theta$, such as $\hat{\theta}_0 = \mathrm{median}(x_i)$. Approximating the left-hand member of Equation (1) by the first two terms of its Taylor's expansion about $\hat{\theta}_0$ and solving for $\hat{\theta}$, we obtain

$$\hat{\theta}_1 = \hat{\theta}_0 + \frac{d \displaystyle\sum_{i=1}^{n} \psi\!\left(\frac{x_i - \hat{\theta}_0}{d}\right)}{\displaystyle\sum_{i=1}^{n} \psi'\!\left(\frac{x_i - \hat{\theta}_0}{d}\right)}$$

(the one-step M-estimate of $\theta$). If we use $\hat{\theta}_1$ in place of $\hat{\theta}_0$, we obtain $\hat{\theta}_2$, the two-step M-estimate of $\theta$.
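A self-contained sketch of the one-step M-estimate under these formulas, assuming numpy; note that for Huber's $\psi$, $\psi'(z)$ is 1 on $[-k, k]$ and 0 outside, and the data values are illustrative:

```python
import numpy as np

def one_step_m_estimate(x, k=1.5):
    """One Newton step from the median toward the Huber M-estimate."""
    x = np.asarray(x, dtype=float)
    theta0 = np.median(x)                        # first estimate
    d = np.median(np.abs(x - theta0)) / 0.6745   # robust scale estimate
    z = (x - theta0) / d
    num = d * np.clip(z, -k, k).sum()    # d * sum of Huber's psi(z)
    den = (np.abs(z) <= k).sum()         # sum of psi'(z): 1 on [-k,k], else 0
    return theta0 + num / den

# A small sample with one gross outlier: the estimate stays near the bulk.
data = np.array([1.1, 0.8, 1.4, 0.9, 1.2, 12.0])
print(one_step_m_estimate(data))
```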

Suppose now that the scale parameter $d$ is known. Two terms of Taylor's expansion of

$$\sum_{i=1}^{n} \psi\!\left(\frac{X_i - \hat{\theta}}{d}\right) = 0$$

about $\theta$ provide the approximation

$$\sum_{i=1}^{n} \psi\!\left(\frac{X_i - \theta}{d}\right) + (\hat{\theta} - \theta)\left(-\frac{1}{d}\right) \sum_{i=1}^{n} \psi'\!\left(\frac{X_i - \theta}{d}\right) \approx 0.$$

This can be rewritten

$$\sqrt{n}\,(\hat{\theta} - \theta) \approx \frac{\displaystyle \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi\!\left(\frac{X_i - \theta}{d}\right)}{\displaystyle \frac{1}{nd} \sum_{i=1}^{n} \psi'\!\left(\frac{X_i - \theta}{d}\right)}. \qquad (2)$$

We have considered the i.i.d. random variables $\psi[(X_i - \theta)/d]$, $i = 1, 2, \ldots, n$, each with mean

$$E\left[\psi\!\left(\frac{X - \theta}{d}\right)\right] = 0$$

and variance $E\{\psi^2[(X - \theta)/d]\}$. Clearly, by the central limit theorem, the numerator of the right-hand member of Equation (2) is limiting $N\left(0, E\{\psi^2[(X - \theta)/d]\}\right)$, while the denominator converges in probability to $E\{\psi'[(X - \theta)/d]\}/d$. Thus Equation (2) can be rewritten as

$$\sqrt{n}\,(\hat{\theta} - \theta) \;\approx\; N\!\left(0,\; \frac{d^2\, E\{\psi^2[(X - \theta)/d]\}}{\left(E\{\psi'[(X - \theta)/d]\}\right)^2}\right); \qquad (3)$$

that is, the M-estimator $\hat{\theta}$ has an approximate normal distribution with mean $\theta$ and variance $d^2 E\{\psi^2[(X - \theta)/d]\} \big/ \left[n \left(E\{\psi'[(X - \theta)/d]\}\right)^2\right]$.