Chapter 6: Introduction to Statistical Inference

In this chapter, we will discuss the following problems:
• Point Estimation
• Confidence Intervals for Means
• Confidence Intervals for Differences of Means
• Tests of Statistical Hypotheses
• Additional Comments About Statistical Tests
• Chi-Square Tests

6.1 Point Estimation
6.1.1 Maximum Likelihood Estimation

In this chapter we develop statistical inference (estimation and testing) based on likelihood methods. We show that these procedures are asymptotically optimal under certain conditions. Suppose that X_1, …, X_n are iid copies of X, with pdf f(x; θ) (or pmf p(x; θ)), θ ∈ Ω. The parameter θ (possibly a vector) is unknown. The basis of our inferential procedures is the likelihood function, given by

L(θ; x) = ∏_{i=1}^n f(x_i; θ),   (6.1.1)

where x = (x_1, …, x_n). L(θ; x) is the joint pdf or pmf of the random sample X_1, …, X_n. We often write L(θ; x) as L(θ), because it is a function of θ. The log of L(θ) is usually more convenient to work with mathematically; we denote log L(θ) by l(θ).

l(θ) = log L(θ) = Σ_{i=1}^n log f(x_i; θ).   (6.1.2)

Note that there is no loss of information in using l(θ), because the log is a one-to-one function.

Example 6.1.1. Let X_1, …, X_n be iid with pmf p(x; θ), where the unknown parameter θ satisfies 0 ≤ θ ≤ 1. How can we estimate θ based on x_1, …, x_n, the observed values of the sample? According to the principle of maximum likelihood, the value of θ that maximizes L(θ), the joint pmf of the random sample, is taken as the best estimate of θ, because it makes the observed event {X_1 = x_1, …, X_n = x_n} occur with the largest possible probability.

Setting dl(θ)/dθ = 0 and solving for θ, we obtain the estimator θ̂ = θ̂(X_1, …, X_n). Here θ̂(X_1, …, X_n) is called the maximum likelihood estimator of θ, and θ̂(x_1, …, x_n) is called the maximum likelihood estimate of θ. Let θ_0 denote the true value of θ. Theorem 6.1.1 gives a theoretical reason for maximizing the likelihood function.
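As a concrete numerical illustration of the maximum likelihood idea (not taken from the slides), the following Python sketch assumes a Bernoulli model p(x; θ) = θ^x (1 − θ)^(1−x), for which the likelihood equation gives θ̂ = X̄; the simulated data and the grid search are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=200)      # simulated Bernoulli(0.3) sample

def log_likelihood(theta, x):
    # l(theta) = sum_i [x_i log(theta) + (1 - x_i) log(1 - theta)]
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

grid = np.linspace(0.001, 0.999, 999)
mle_numeric = grid[np.argmax([log_likelihood(t, x) for t in grid])]

print("closed-form mle (sample mean):", x.mean())
print("grid-search maximizer        :", mle_numeric)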

Assumptions 6.1.1 (Regularity Conditions).
(R0): The pdfs are distinct: i.e., θ ≠ θ′ implies f(x_i; θ) ≠ f(x_i; θ′).
(R1): The pdfs have common support for all θ.
(R2): The point θ_0 is an interior point of Ω.

Theorem 6.1.1. Let θ_0 be the true parameter. Under assumptions (R0) and (R1),

lim_{n→∞} P_{θ_0}[ L(θ_0; X) > L(θ; X) ] = 1, for all θ ≠ θ_0.   (6.1.3)

Theorem 6.1.1 says that asymptotically the likelihood function is maximized at the true value θ_0. So, in considering estimates of θ_0, it seems natural to consider the value of θ which maximizes the likelihood.

Definition 6.1.1 (Maximum Likelihood Estimator). We say that θ̂ = θ̂(X) is a maximum likelihood estimator (mle) of θ if

θ̂ = Argmax L(θ; X).   (6.1.4)

The notation Argmax means that L(θ; X) achieves its maximum value at θ̂. The mle of θ is denoted by θ̂. The equation

∂l(θ)/∂θ = 0   (6.1.5)

is called the estimating equation or likelihood equation.

Example 6.1.2 (Exponential Distribution). Suppose X_1, …, X_n are iid from X ~ Exp(θ). Find the mle of θ.
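A minimal Python sketch of this example (illustrative, not from the slides): it assumes the mean parameterization f(x; θ) = (1/θ) e^(−x/θ), x > 0, under which the likelihood equation gives θ̂ = X̄, and checks this numerically.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)   # simulated Exp sample with mean theta = 2

def neg_log_likelihood(theta):
    # -l(theta) = n log(theta) + sum(x)/theta   (mean parameterization, assumed)
    return len(x) * np.log(theta) + x.sum() / theta

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50), method="bounded")
print("numerical mle:", res.x, "  sample mean:", x.mean())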

Example 6.1.3 (Laplace Distribution). Let X_1, …, X_n be iid with density

f(x; θ) = (1/2) e^{−|x − θ|},  −∞ < x < ∞.   (6.1.6)

This pdf is referred to as either the Laplace or the double exponential distribution. Differentiating the log-likelihood gives

∂l(θ)/∂θ = Σ_{i=1}^n sgn(x_i − θ),   (6.1.7)

where sgn(t) = 1, 0, or −1 depending on whether t > 0, t = 0, or t < 0. Setting equation (6.1.7) to 0, the solution for θ is the median of the sample, which can be denoted by Q_2 (the second quartile of the sample); i.e., θ̂ = Q_2.
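To see Example 6.1.3 numerically, here is an illustrative Python sketch (the data are simulated, not from the book): it compares the log-likelihood at the sample median with a grid of candidate values of θ.

import numpy as np

rng = np.random.default_rng(2)
x = rng.laplace(loc=1.5, scale=1.0, size=201)   # simulated Laplace sample, theta = 1.5

def log_likelihood(theta):
    # l(theta) = -n log 2 - sum_i |x_i - theta|
    return -len(x) * np.log(2.0) - np.abs(x - theta).sum()

candidates = np.linspace(x.min(), x.max(), 2001)
best = candidates[np.argmax([log_likelihood(t) for t in candidates])]
print("sample median Q2:", np.median(x), "  grid maximizer:", best)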

Example 6.1.4 (Uniform Distribution). Let X_1, …, X_n be iid with the uniform (0, θ) density, i.e., f(x; θ) = 1/θ for 0 < x < θ and 0 elsewhere. Find the mle of θ. (Here the likelihood is not differentiable at its maximum; the mle is θ̂ = max(X_1, …, X_n).)

Theorem 6.1.2 (Invariance). Let X_1, …, X_n be iid with pdf f(x; θ), θ ∈ Ω. For a specified function g, let η = g(θ) be a parameter of interest. Then η̂ = g(θ̂) is the mle of η = g(θ).

Theorem 6.1.3. Assume that X_1, …, X_n satisfy the regularity conditions (R0) and (R1), θ_0 is the true parameter, and further that f(x; θ) is differentiable with respect to θ in Ω. Then the likelihood equation

∂l(θ)/∂θ = 0   (6.1.8)

has a solution θ̂_n that converges in probability to θ_0.

6.1.2 The Method of Moments

Let X_1, …, X_n be iid with pdf f(x; θ), θ ∈ Ω. The parameter θ (possibly a vector, θ = (θ_1, …, θ_r)) is unknown. Setting the first r sample moments equal to the corresponding population moments,

(1/n) Σ_{i=1}^n X_i^k = E(X^k),  k = 1, …, r,

we can solve for estimators of the parameters θ_1, …, θ_r. This method is called the method of moments, and the resulting estimators are denoted by θ̃_1, …, θ̃_r. Note that this could be done in an equivalent manner by equating the sample mean to E(X), the sample variance to Var(X), and so on, until unique solutions for θ_1, …, θ_r are obtained.
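As an illustration of the method (not taken from the slides), here is a Python sketch of the moment estimators for a Gamma(α, β) model, obtained by matching the first two sample moments to E(X) = αβ and Var(X) = αβ².

import numpy as np

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=3.0, size=1000)   # simulated data, true alpha = 2, beta = 3

m1 = x.mean()                      # matches E(X)   = alpha * beta
m2 = np.mean((x - m1) ** 2)        # matches Var(X) = alpha * beta**2

beta_mom = m2 / m1
alpha_mom = m1 ** 2 / m2
print("method-of-moments estimates: alpha =", alpha_mom, " beta =", beta_mom)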

6.2 Confidence Intervals for Means

1. A sample of size n with unknown mean μ but known variance σ².
If X_1, …, X_n ~ N(μ, σ²), then

(X̄ − μ) / (σ/√n) ~ N(0, 1).

Then a (1 − α) confidence interval for μ is

[ X̄ − z_{α/2} σ/√n,  X̄ + z_{α/2} σ/√n ].
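A minimal Python sketch of this interval (the data and the value of σ are invented for illustration):

import numpy as np
from scipy.stats import norm

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7])  # hypothetical sample
sigma = 0.5          # assumed known standard deviation
alpha = 0.05

z = norm.ppf(1 - alpha / 2)
half_width = z * sigma / np.sqrt(len(x))
print("95% CI for mu:", (x.mean() - half_width, x.mean() + half_width))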

2. A sample of size n with unknown mean μ and unknown variance σ².
If we do not know the distribution of X, then when n is large we still have, approximately (by the central limit theorem),

(X̄ − μ) / (σ/√n) ≈ N(0, 1).

See Example 1 on p. 270. Since σ² can be replaced by nS²/(n − 1), we obtain an approximate confidence interval for μ (when n is large):

[ X̄ − z_{α/2} S/√(n−1),  X̄ + z_{α/2} S/√(n−1) ].

As a matter of fact, if X_1, …, X_n ~ N(μ, σ²) (iid), then X̄ and S² are independent, and

(X̄ − μ) / (S/√(n−1)) ~ t(n − 1).

See Examples 2 and 3 in the text. Note: Suppose X_1, …, X_n ~ iid f(x; θ). A r.v. Q(X_1, …, X_n; θ) is a pivotal quantity (or pivot) if the distribution of Q does not depend on any unknown parameters. The above method of finding the confidence interval is called pivotal inference. As we know, when X_1, …, X_n ~ iid N(μ, σ²) with σ unknown, (X̄ − μ)/(S/√(n−1)) is a pivot. Similarly, (X̄ − μ)/(σ/√n) is also a pivot.
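Under normality, the t pivot above gives the usual interval. Here is an illustrative Python sketch; note that scipy and numpy work with the (n − 1)-denominator sample standard deviation, which is equivalent to the nS²/(n − 1) correction described above.

import numpy as np
from scipy.stats import t

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7])  # hypothetical data
alpha = 0.05
n = len(x)

s = x.std(ddof=1)                      # sqrt(n S^2 / (n - 1)) in the slides' notation
tcrit = t.ppf(1 - alpha / 2, df=n - 1)
half_width = tcrit * s / np.sqrt(n)
print("95% t-interval for mu:", (x.mean() - half_width, x.mean() + half_width))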

3. The confidence interval for p = P(success).
If Y ~ b(n, p), we can use the relative frequency Y/n to estimate p. What is its accuracy? By the central limit theorem, approximately,

(Y/n − p) / √( p(1 − p)/n ) ~ N(0, 1).

That is,

P( −z_{α/2} ≤ (Y/n − p)/√(p(1 − p)/n) ≤ z_{α/2} ) ≈ 1 − α.

Writing z = z_{α/2} for brevity, we have an approximate 100(1 − α)% confidence interval [p_1, p_2] for p. Replacing p by Y/n in p(1 − p)/n, we get the confidence interval

Y/n ± z_{α/2} √( (Y/n)(1 − Y/n)/n ).

(See Example 1 on p. 254.)
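A short Python sketch of this approximate interval, with an invented count for illustration:

import numpy as np
from scipy.stats import norm

y, n, alpha = 130, 400, 0.05           # hypothetical: 130 successes in 400 trials
phat = y / n
z = norm.ppf(1 - alpha / 2)
half_width = z * np.sqrt(phat * (1 - phat) / n)
print("approximate 95% CI for p:", (phat - half_width, phat + half_width))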

6.3 Confidence Intervals for Differences of Means

1. Confidence interval for the difference of the means of two normal distributions.

The confidence interval for μ_1 − μ_2 can be found from the following inequality:

| (X̄ − Ȳ) − (μ_1 − μ_2) | / ( S_w √(1/n_1 + 1/n_2) ) ≤ t_{α/2}(n_1 + n_2 − 2),

where S_w is the pooled sample standard deviation, S_w² = (n_1 S_1² + n_2 S_2²)/(n_1 + n_2 − 2). See Example 1 on page 278 of the book.
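An illustrative Python sketch of the pooled-variance interval; the two samples are invented, and the pooled variance is written in the equivalent (n_i − 1)-denominator form used by numpy's ddof=1 variance.

import numpy as np
from scipy.stats import t

x = np.array([5.1, 4.8, 5.3, 5.0, 4.9, 5.2])        # hypothetical sample 1
y = np.array([4.6, 4.7, 4.5, 4.9, 4.4, 4.8, 4.6])   # hypothetical sample 2
alpha = 0.05
n1, n2 = len(x), len(y)

sw2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
tcrit = t.ppf(1 - alpha / 2, df=n1 + n2 - 2)
half_width = tcrit * np.sqrt(sw2 * (1 / n1 + 1 / n2))
diff = x.mean() - y.mean()
print("95% CI for mu1 - mu2:", (diff - half_width, diff + half_width))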

2. The confidence interval for p_1 − p_2.
If Y_1 ~ b(n_1, p_1) and Y_2 ~ b(n_2, p_2) and they are independent, then, approximately (when n_1 and n_2 are large), the pivot

[ (Y_1/n_1 − Y_2/n_2) − (p_1 − p_2) ] / √( p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2 ) ~ N(0, 1).

Again replacing p_i by Y_i/n_i in p_i(1 − p_i)/n_i, i = 1, 2, we have an approximate 100(1 − α)% confidence interval for p_1 − p_2 as follows:

(Y_1/n_1 − Y_2/n_2) ± z_{α/2} √( (Y_1/n_1)(1 − Y_1/n_1)/n_1 + (Y_2/n_2)(1 − Y_2/n_2)/n_2 ).

See Example 2 on page 279 of the book.
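A corresponding Python sketch for p_1 − p_2 (the counts are invented for illustration):

import numpy as np
from scipy.stats import norm

y1, n1 = 78, 200        # hypothetical successes / trials in group 1
y2, n2 = 55, 180        # hypothetical successes / trials in group 2
alpha = 0.05

p1, p2 = y1 / n1, y2 / n2
z = norm.ppf(1 - alpha / 2)
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print("approx. 95% CI for p1 - p2:", (p1 - p2 - z * se, p1 - p2 + z * se))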

6.4 Tests of Statistical Hypotheses

Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference that is frequently used concerns tests of hypotheses. Suppose a r.v. X ~ f(x; θ), where θ ∈ Ω, and Ω = ω_0 ∪ ω_1 with ω_0 ∩ ω_1 = ∅. We label the hypotheses as

H_0: θ ∈ ω_0 versus H_1: θ ∈ ω_1.   (6.4.1)

The hypothesis H_0 is referred to as the null hypothesis, while H_1 is referred to as the alternative hypothesis. Often the null hypothesis represents no change or no difference from the past, while the alternative represents change or difference. The alternative is often referred to as the research worker's hypothesis.

Example 6.4.1 (Zea Mays Data). In 1878 Charles Darwin recorded some data on the heights of zea mays plants to determine what effect cross-fertilization or self-fertilization had on their heights. We will represent the data as (X_1, Y_1), …, (X_15, Y_15), where X_i and Y_i are the heights of the cross-fertilized and self-fertilized plants, respectively, in the ith pot. Let W_i = X_i − Y_i and μ = E(W_i). Our hypotheses are:

H_0: μ = 0 versus H_1: μ > 0.   (6.4.2)

Hence ω_0 = {0} represents no difference in the treatments and ω_1 = (0, ∞) represents a difference in the treatments. The decision to take H_0 or H_1 is based on a sample X_1, …, X_n from the distribution of X, and hence the decision could be wrong.

Table 6.4.1: 2 × 2 Decision Table for a Test of Hypothesis

                      True State of Nature
Decision         H_0 is true         H_1 is true
Reject H_0       Type I Error        Correct Decision
Accept H_0       Correct Decision    Type II Error

A test of H_0 versus H_1 is based on a subset C of the sample space D. This set C is called the critical region and its corresponding decision rule (test) is:

Reject H_0 (accept H_1) if (X_1, …, X_n) ∈ C;
Retain H_0 (reject H_1) if (X_1, …, X_n) ∈ C^c.   (6.4.3)

A Type I error occurs if H_0 is rejected when it is true, while a Type II error occurs if H_0 is accepted when H_1 is true.

Definition 6.4.1. We say a critical region C is of size α if

α = max_{θ ∈ ω_0} P_θ[(X_1, …, X_n) ∈ C].   (6.4.4)

When θ ∈ ω_1, we want to maximize

1 − P_θ[Type II Error] = P_θ[(X_1, …, X_n) ∈ C].

The probability on the right side of this equation is called the power of the test at θ. It is the probability that the test detects the alternative θ when θ ∈ ω_1 is the true parameter. So minimizing the probability of a Type II error is equivalent to maximizing power. We define the power function of a critical region to be

γ_C(θ) = P_θ[(X_1, …, X_n) ∈ C],  θ ∈ ω_1.   (6.4.5)

Hence, given two critical regions C_1 and C_2 which are both of size α, C_1 is better than C_2 if γ_{C_1}(θ) ≥ γ_{C_2}(θ) for all θ ∈ ω_1.

Example 6.4.2 (Test for a Binomial Proportion of Success). Let X be a Bernoulli r.v. with probability of success p. Suppose we want to test, at size α,

H_0: p = p_0 versus H_1: p < p_0,   (6.4.6)

where p_0 is specified. As an illustration, suppose "success" is dying from a certain disease and p_0 is the probability of dying under some standard treatment.

Remark 6.4.1 (Nomenclature). If a hypothesis completely specifies the underlying distribution, such as H_0: p = p_0 in Example 6.4.2, it is called a simple hypothesis. Most hypotheses, such as H_1: p < p_0, are composite hypotheses, because they are composed of many simple hypotheses and hence do not completely specify the distribution.

Frequently, α is also called the significance level of the test associated with the critical region.

Example 6.4.3 (Large-Sample Test for the Mean). Let X be a r.v. with mean μ and finite variance σ². We want to test the hypotheses

H_0: μ = μ_0 versus H_1: μ > μ_0,   (6.4.6)

where μ_0 is specified. To illustrate, suppose μ_0 is the mean level on a standardized test of students who have been taught a course by a standard method of teaching. Suppose it is hoped that a new method which incorporates computers will have a mean level μ > μ_0, where μ = E(X) and X is the score of a student taught by the new method.

Example 6.4.4 (Test for μ under Normality). Let X have a N(μ, σ²) distribution. Consider the hypotheses

H_0: μ = μ_0 versus H_1: μ > μ_0,   (6.4.6)

where μ_0 is specified. Assume that the desired size of the test is α, 0 < α < 1. Suppose X_1, …, X_n is a random sample from X. Using the distribution of t(n − 1), it is easy to show that the following rejection rule has exact level α:

Reject H_0 in favor of H_1 if T = (X̄ − μ_0)/(S/√n) ≥ t_{α, n−1},   (6.4.7)

where t_{α, n−1} is the upper α critical point of a t distribution with n − 1 degrees of freedom; i.e., α = P(T > t_{α, n−1}). This is often called the t-test of H_0: μ = μ_0.
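For this t-test, a brief illustrative Python sketch (the data are invented and μ_0 = 30 is an assumed null value; S here is the (n − 1)-denominator sample standard deviation):

import numpy as np
from scipy.stats import t

x = np.array([31.2, 29.8, 32.5, 30.9, 31.7, 30.2, 32.0, 31.1])  # hypothetical sample
mu0, alpha = 30.0, 0.05
n = len(x)

tstat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
tcrit = t.ppf(1 - alpha, df=n - 1)          # upper alpha critical point
print("T =", tstat, "  reject H0:", tstat >= tcrit)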

6.5 Additional Comments About Statistical Tests

All of the alternative hypotheses considered in Section 6.4 were one-sided hypotheses. For illustration, in Exercise 6.42 we tested H_0: μ = 30,000 against the one-sided alternative H_1: μ > 30,000, where μ is the mean of a normal distribution with known standard deviation σ. Perhaps in this situation, though, we think the manufacturer's process has changed but are unsure of the direction; that is, we are interested in the alternative H_1: μ ≠ 30,000. In this section, we further explore hypothesis testing, and we begin with the construction of a test for a two-sided alternative involving the mean of a r.v.

Example 1 (Large-Sample Two-Sided Test for the Mean). Let X be a r.v. with mean μ and finite variance σ². We want to test

H_0: μ = μ_0 versus H_1: μ ≠ μ_0,   (6.5.1)

where μ_0 is specified. Let X_1, …, X_n be a random sample from X. We will use the decision rule

Reject H_0 in favor of H_1 if X̄ ≤ h or X̄ ≥ k,   (6.5.2)

where h and k are such that

α = P_{H_0}[ X̄ ≤ h or X̄ ≥ k ].

Clearly h < k; hence we have

α = P_{H_0}[X̄ ≤ h] + P_{H_0}[X̄ ≥ k].

An intuitive rule is to divide α equally between the two terms on the right side of the above expression; that is, h and k are chosen by

P_{H_0}[X̄ ≤ h] = α/2 and P_{H_0}[X̄ ≥ k] = α/2.   (6.5.3)

By the CLT and the consistency of S² for σ², we have under H_0 that (X̄ − μ_0)/(S/√n) converges in distribution to N(0, 1). This and (6.5.3) lead to the approximate decision rule:

Reject H_0 in favor of H_1 if |X̄ − μ_0| / (S/√n) ≥ z_{α/2}.   (6.5.4)

To approximate the power function of the test, we use the CLT. Upon substituting σ for S, it readily follows that the approximate power function is

γ(μ) = Φ( √n(μ_0 − μ)/σ − z_{α/2} ) + 1 − Φ( √n(μ_0 − μ)/σ + z_{α/2} ).   (6.5.5)
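The approximate power function (6.5.5) is straightforward to evaluate numerically. The following Python sketch uses illustrative values of μ_0, σ, n, and α (none of them from the slides) and evaluates the power at several alternatives.

import numpy as np
from scipy.stats import norm

mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05     # illustrative values
z = norm.ppf(1 - alpha / 2)

def power(mu):
    # gamma(mu) = Phi(sqrt(n)(mu0 - mu)/sigma - z) + 1 - Phi(sqrt(n)(mu0 - mu)/sigma + z)
    a = np.sqrt(n) * (mu0 - mu) / sigma
    return norm.cdf(a - z) + 1 - norm.cdf(a + z)

for mu in [-0.6, -0.3, 0.0, 0.3, 0.6]:
    print(f"mu = {mu:+.1f}   approximate power = {power(mu):.3f}")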

 (  ) is strictly decreasing for  <  0 and strictly increasing for  >  0. Where  (z) is the pdf of a standard normal r. v. Accept H 0 if and only if (6.5.6) Example 2. Let X 1, …, X n 1 iid from N(  1,  2 ); Y 1, …, Y n 2 iid from N(  2,  2 ). At  =0.05, reject H 0 :  1 =  2 and accept the one-sided alternative H 1 :  1 >  2 if where

Example 3. Say X ~ b(1, p). Consider testing H_0: p = p_0 against H_1: p < p_0. Let X_1, …, X_n be a random sample from X and let p̂ = X̄ = (X_1 + … + X_n)/n. To test H_0 versus H_1, we use either

Z_1 = (p̂ − p_0) / √( p_0(1 − p_0)/n ) ≤ c   or   Z_2 = (p̂ − p_0) / √( p̂(1 − p̂)/n ) ≤ c.

If n is large, both Z_1 and Z_2 have approximate N(0, 1) distributions provided that H_0: p = p_0 is true. If α is given, c can be determined. Moreover, p̂ ± z_{α/2} √( p̂(1 − p̂)/n ) is an approximate 100(1 − α)% c.i. for p.

Example 4. Let X_1, …, X_10 be a random sample from a Poisson distribution with mean θ. A critical region for testing H_0: θ = 0.1 against H_1: θ > 0.1 is given on p. 290.

Remark (Observed Significance Level). Not many statisticians like randomized tests in practice, because using them means that two statisticians could make the same assumptions, observe the same data, apply the same test, and yet make different decisions. Hence they usually adjust their significance level so as not to randomize. In fact, many statisticians report what are commonly called observed significance levels, or p-values.

6.6 Chi-Square Tests

In this section we introduce tests of statistical hypotheses called chi-square tests. A test of this sort was originally proposed by Karl Pearson in 1900, and it provided one of the earliest methods of statistical inference. Let us now discuss some random variables that have approximate chi-square distributions. Let X_1 be b(n, p_1). Consider the r.v.

Y = (X_1 − n p_1) / √( n p_1 (1 − p_1) ),

which has, as n → ∞, a limiting distribution of N(0, 1). We strongly suspect that the limiting distribution of Z = Y² is χ²(1).

If G_n(y) represents the cdf of Y, we know by the CLT that G_n(y) → Φ(y) as n → ∞. Let H_n(z) be the cdf of Z = Y². Thus, if z ≥ 0,

H_n(z) = P(Z ≤ z) = P(−√z ≤ Y ≤ √z) = G_n(√z) − G_n(−√z) → Φ(√z) − Φ(−√z),

(and H_n(z) = 0 if z < 0). Therefore, the limiting distribution of Z is χ²(1). Now let X_1 ~ b(n, p_1), X_2 = n − X_1 and p_2 = 1 − p_1. Then

X_1 − n p_1 = (n − X_2) − n(1 − p_2) = −(X_2 − n p_2).

Hence

Q_1 = Y² = (X_1 − n p_1)²/(n p_1) + (X_2 − n p_2)²/(n p_2),

and we say that, as n → ∞, Q_1 has a limiting chi-square distribution with 1 degree of freedom. This result can be generalized as follows. Let X_1, …, X_{k−1} have a multinomial distribution with parameters n and p_1, …, p_{k−1}. Let X_k = n − (X_1 + … + X_{k−1}) and let p_k = 1 − (p_1 + … + p_{k−1}). Define Q_{k−1} by

Q_{k−1} = Σ_{i=1}^{k} (X_i − n p_i)² / (n p_i).

It is proved in a more advanced course that, as n → ∞, Q_{k−1} has a limiting distribution that is χ²(k − 1). If we accept this fact, we can say that Q_{k−1} has an approximate chi-square distribution with k − 1 degrees of freedom when n is large. This approximation is good when n is large enough (n ≥ 30) and np_i ≥ 5. The r.v. Q_{k−1} may serve as the basis of tests of certain statistical hypotheses, which we now discuss. Let the sample space be A = A_1 ∪ … ∪ A_k, with A_i ∩ A_j = ∅ for i ≠ j. Let P(A_i) = p_i, i = 1, …, k, where p_k = 1 − (p_1 + … + p_{k−1}), so that p_i is the probability that the outcome of the random experiment is an element of the set A_i. The random experiment is to be repeated n independent times, and X_i will represent the number of times the outcome is an element of the set A_i. That is, (X_1, …, X_{k−1}), with X_k = n − (X_1 + … + X_{k−1}), has the multinomial distribution with parameters n, p_1, …, p_{k−1}.

Consider the simple hypothesis (concerning this multinomial pmf)

H_0: p_1 = p_10, …, p_{k−1} = p_{k−1,0}  (and p_k = p_{k0} = 1 − p_10 − … − p_{k−1,0}),

where p_10, …, p_{k−1,0} are specified numbers. It is desired to test H_0 against all alternatives. If the hypothesis H_0 is true, the r.v.

Q_{k−1} = Σ_{i=1}^{k} (X_i − n p_{i0})² / (n p_{i0})

has an approximate χ²(k − 1) distribution. If the significance level α is given, then the critical region is Q_{k−1} ≥ χ²_α(k − 1); that is, P(Q_{k−1} ≥ χ²_α(k − 1)) = α. This is frequently called a goodness-of-fit test. Illustrative examples are given in Examples 1–4 on pp. 280–284.
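A closing Python sketch of the goodness-of-fit test (the observed counts are invented; H_0 takes all six faces of a die to be equally likely):

import numpy as np
from scipy.stats import chisquare, chi2

observed = np.array([18, 23, 16, 21, 14, 28])       # hypothetical die rolls, n = 120
p0 = np.full(6, 1 / 6)                              # H0: p_i = 1/6 for every face
expected = observed.sum() * p0

q = np.sum((observed - expected) ** 2 / expected)   # the Q_{k-1} statistic
critical = chi2.ppf(0.95, df=len(observed) - 1)     # chi-square critical value, alpha = 0.05
print("Q =", q, " critical value =", critical, " reject H0:", q >= critical)

# Equivalent computation using scipy's built-in test:
stat, pval = chisquare(observed, f_exp=expected)
print("scipy chisquare statistic and p-value:", stat, pval)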