Estimation Procedures
1. Point Estimation
2. Confidence Interval Estimation
Three Properties of Point Estimators
1. Unbiasedness
2. Consistency
3. Efficiency
[Figure: two panels plotting Error against Estimate Number, each centred on 0.] The estimates in green are more efficient (smaller standard error) but the estimates in red are unbiased.
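A small simulation can make this trade-off concrete. The sketch below is only an illustration under assumptions that are not in the slides: a normal population with mean 10 and standard deviation 4, samples of size 25, and a hypothetical "shrunken" estimator (0.9 times the sample mean) standing in for an efficient but biased estimator.

```python
import random
import statistics

def simulate_errors(estimator, true_mean=10.0, sd=4.0, n=25, reps=4000, seed=1):
    """Return estimation errors (estimate - true mean) over repeated samples."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        errors.append(estimator(sample) - true_mean)
    return errors

def sample_mean(sample):
    # Unbiased: errors average out to roughly zero.
    return statistics.fmean(sample)

def shrunken_mean(sample):
    # Hypothetical estimator: biased towards 0, but less variable.
    return 0.9 * statistics.fmean(sample)

for name, est in [("sample mean", sample_mean), ("shrunken mean", shrunken_mean)]:
    errs = simulate_errors(est)
    print(f"{name}: average error = {statistics.fmean(errs):.3f}, "
          f"spread of errors = {statistics.stdev(errs):.3f}")
```

The first estimator should show an average error near zero with the larger spread; the second should show a smaller spread around a non-zero average error.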
The Sampling Distribution of $\bar{x}$ for ‘large’ samples
[Figure: the sampling distribution plotted against $\bar{x}$.]
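The shape of this sampling distribution can be explored by simulation. The following is a minimal sketch under assumptions of our own: an exponential population with mean 1 and standard deviation 1 (deliberately non-normal), samples of size 36, and 5000 repetitions.

```python
import math
import random
import statistics

def sample_means(n, reps=5000, seed=0):
    """Means of `reps` samples of size n from an exponential population (mean 1, sd 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

means = sample_means(n=36)
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("theoretical sd:      ", round(1 / math.sqrt(36), 3))
```

The sample means should cluster around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.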
The standard error (s.e.) of estimation for $\bar{x}$ is given by $\text{s.e.} = \sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size.
Q. Why is the standard error (s.e.) directly related to $\sigma$? A. If the population is more varied (dispersed), it is more difficult to locate the ‘typical’ value. In which case are you likely to predict the population mean more accurately? 1. The age distribution of all students in English schools, or 2. The age distribution of all students in English sixth form colleges?
Q. Why is the s.e. inversely related to the sample size? A. The larger the $n$, the more ‘representative’ the sample is of the population, and hence the smaller the sampling error.
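Both relationships can be checked numerically. The sketch below uses illustrative values of our own choosing (a standard deviation of 16 or 32, a sample size of 36 or 144); the function name is hypothetical.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)

se_base = standard_error(16, 36)
se_more_spread = standard_error(32, 36)     # doubling sigma doubles the s.e.
se_bigger_sample = standard_error(16, 144)  # quadrupling n halves the s.e.

print(f"base: {se_base:.3f}")
print(f"more dispersed population: {se_more_spread:.3f}")
print(f"larger sample: {se_bigger_sample:.3f}")
```

Note that the standard error falls with the square root of the sample size, so halving it takes four times as many observations.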
Confidence Interval (CI)
Sometimes it is possible and convenient to predict, with a certain amount of confidence in the prediction, that the true value of the parameter lies within a specified interval. Such an interval is called a Confidence Interval (CI).
The statement ‘$[L, H]$ is the 95% CI of $\mu$’ is to be interpreted as follows: there is a 95% chance that an interval constructed in this way contains the population mean $\mu$, and a 5% chance that it does not.
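One way to read the 95% figure is as a long-run frequency over repeated samples. The simulation below is a sketch under assumed values that are not from the slides: a normal population with mean 50 and standard deviation 16, samples of size 36, and the usual 1.96 multiplier.

```python
import math
import random
import statistics

def coverage(mu=50.0, sigma=16.0, n=36, z=1.96, reps=2000, seed=42):
    """Fraction of repeated samples whose interval x_bar +/- z*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

print(f"share of intervals containing mu: {coverage():.3f}")
```

The printed share should be close to 0.95, with some simulation noise.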
Two points to appreciate about the CI
A. The larger the standard error, the longer the CI, ceteris paribus.
B. The higher the level of confidence, the longer the CI, ceteris paribus.
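[Figure: standard normal density, horizontal axis $z$.] The area shaded orange is approximately 98% of the whole.
[Figure: standard normal density, horizontal axis $z$.] The area shaded orange is approximately 95% of the whole.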
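The cut-off points that enclose a given central area can be read from a z table or computed. A minimal sketch using Python's standard library follows; the helper name `z_critical` is our own.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Positive z such that the central area under the standard normal equals `confidence`."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.98):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

Tables usually round these values to two or three decimals, for example 1.96 for 95% and 2.33 for 98%.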
Example 1 (Confidence Interval for the population mean): Suppose that the result of sampling yields the following: $\bar{x} = 25$; $n = 36$. Use this information to construct a 95% CI for $\mu$, given that $\sigma = 16$.
Since $n > 24$, we can say that $\bar{x}$ is approximately Normal($\mu$, $\sigma^2/36$). Standardisation means that $(\bar{x} - \mu)/(\sigma/6)$ is approximately $z$. Now find the two symmetric points around 0 in the z table such that the area between them is 0.95. The answer is $z = \pm 1.96$.
Now solve $(\bar{x} - \mu)/(\sigma/6) = (25 - \mu)/(16/6) = \pm 1.96$ to get two values, $\mu = 19.77$ and $\mu = 30.23$. Thus, the 95% CI for $\mu$ is $[19.77, 30.23]$.
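The arithmetic in this example can be reproduced with a few lines of code. This is a sketch, with a function name of our own, that takes the table value 1.96 as given.

```python
import math

def mean_ci_known_sigma(x_bar, sigma, n, z):
    """CI for the population mean when sigma is known: x_bar +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

low, high = mean_ci_known_sigma(x_bar=25, sigma=16, n=36, z=1.96)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

Solving the two equations by hand amounts to subtracting and adding the same half-width, 1.96 times the standard error, around the sample mean.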
Question: How is the length of the CI related to the standard error? Answer: Ceteris paribus, the length of the CI is directly related to the standard error.
Example 2 (Confidence Interval for the population mean): Suppose that the result of sampling yields the following: $\bar{x} = 25$; $n = 36$. Use this information to construct a 95% CI for $\mu$, given that $\sigma = 32$.
Now solve $(\bar{x} - \mu)/(\sigma/6) = (25 - \mu)/(32/6) = \pm 1.96$ to get two values, $\mu = 14.55$ and $\mu = 35.45$. Thus, the 95% CI for $\mu$ is $[14.55, 35.45]$. Compare with the 95% CI of $[19.77, 30.23]$ for $\sigma = 16$.
Question: How is the length of the CI related to the level of confidence? Answer: Ceteris Paribus, the CI will be longer the higher the level of confidence.
Example 3 (Confidence Interval for the population mean): Suppose that the result of sampling yields the following: $\bar{x} = 25$; $n = 36$. Use this information to construct a 90% CI for $\mu$, given that $\sigma = 16$.
Solve $(\bar{x} - \mu)/(\sigma/6) = (25 - \mu)/(16/6) = \pm 1.645$ to get two values, $\mu = 20.61$ and $\mu = 29.39$. Thus, the 90% CI for $\mu$ is $[20.61, 29.39]$. Compare with the 95% CI of $[19.77, 30.23]$.
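The two comparisons made in Examples 1 to 3 can be put side by side by computing the interval lengths. In this sketch the 90% multiplier is taken as 1.645, which is an assumption on our part (tables may show 1.64 or 1.65).

```python
import math

def ci_length(sigma, n, z):
    """Length of the interval x_bar +/- z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)

cases = {
    "Example 1 (95%, sigma = 16)": ci_length(16, 36, 1.96),
    "Example 2 (95%, sigma = 32)": ci_length(32, 36, 1.96),
    "Example 3 (90%, sigma = 16)": ci_length(16, 36, 1.645),  # assumed z value
}
for label, length in cases.items():
    print(f"{label}: length = {length:.2f}")
```

Doubling the standard deviation doubles the length, while lowering the confidence level from 95% to 90% shortens it.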
Some Procedural Problems in Parametric Analysis
1. The sample size $n$ is ‘small’. The CLT does not work! To do any kind of parametric analysis we need the population to be normally distributed.
Case 1: The population standard deviation is known.
Theory: If $X$ is normal($\mu$, $\sigma^2$) then $\bar{x}$ is also normal($\mu$, $\sigma^2/n$).
Example 4 (Confidence Interval for the population mean with small samples): Suppose that the result of sampling from a normal population with $\sigma = 4$ yields the following: $\bar{x} = 25$; $n = 18$. Use this information to construct the 90% CI for $\mu$.
Since $X$ is normal($\mu$, $4^2$), $\bar{x}$ is also normal($\mu$, $4^2/18$). Solving $(\bar{x} - \mu)/(4/\sqrt{18}) = (25 - \mu)/(4/\sqrt{18}) = \pm 1.645$ gives $\mu = 26.55$ or $\mu = 23.45$. The required CI is $[23.45, 26.55]$.
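As a cross-check on this example, the same interval can be obtained without a z table by taking the 5% and 95% quantiles of a normal distribution centred on the sample mean with the standard error as its spread. This is a sketch using Python's standard library, not the procedure shown on the slide.

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 25, 4, 18

# Normal distribution centred on the sample mean, with sd equal to the standard error.
centred = NormalDist(mu=x_bar, sigma=sigma / math.sqrt(n))
low = centred.inv_cdf(0.05)
high = centred.inv_cdf(0.95)
print(f"90% CI: [{low:.2f}, {high:.2f}]")
```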
1. The sample size $n$ is ‘small’.
Case 2: The population standard deviation is unknown.
Theory: If $X$ is normal($\mu$, $\sigma^2$) then $\bar{x}$ is also normal($\mu$, $\sigma^2/n$), with $\sigma$ unknown.
Theory: If $\bar{x}$ is normal($\mu$, $\sigma^2/n$) with $\sigma$ unknown, then $(\bar{x} - \mu)/(s/\sqrt{n})$ has a t-distribution with $(n-1)$ degrees of freedom, where
$s \equiv \sqrt{\sum (x_i - \bar{x})^2/(n-1)}$ for raw data,
$s \equiv \sqrt{\sum f_i (x_i - \bar{x})^2/(n-1)}$ for grouped data.
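The raw-data and grouped-data formulas for s give the same answer when the grouped data are just a frequency table of the raw observations. The sketch below illustrates this with a made-up data set; the function names and the numbers are our own.

```python
import math

def sample_sd_raw(xs):
    """s for raw data: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def sample_sd_grouped(values, freqs):
    """s for grouped data: sqrt(sum(f * (x - mean)^2) / (n - 1)), with n = sum of frequencies."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1))

raw = [2, 4, 4, 6]                  # hypothetical observations
values, freqs = [2, 4, 6], [1, 2, 1]  # the same data as a frequency table

print(round(sample_sd_raw(raw), 4))
print(round(sample_sd_grouped(values, freqs), 4))
```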
Example 5 (Confidence Interval for the population mean): Suppose that the result of sampling from a normal population yields the following: $\bar{x} = 25$; $n = 18$. Use this information to construct a 95% CI for $\mu$, given that $s^2 = 16$.
First, note that as $\sigma$ is unknown, we use $s$ for $\sigma$. But since $n < 24$, we can only say that $(\bar{x} - \mu)/(s/\sqrt{n})$ has a t-distribution with 17 degrees of freedom. Now find from the t-distribution table the two symmetric values of $t$ such that the area in between them is 0.95. The answer is $t = \pm 2.11$.
Now solve $(\bar{x} - \mu)/(s/\sqrt{18}) = (25 - \mu)/(4/\sqrt{18}) = \pm 2.11$, using $s = \sqrt{16} = 4$, to get two values, $L = 23.01$ and $H = 26.99$. Thus the 95% CI for $\mu$ is $[23.01, 26.99]$.
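The calculation can be sketched in code using the inputs as stated in the example: a sample variance of 16 (so s = 4), a sample size of 18, and the table value t = 2.11 for 17 degrees of freedom. The t value is hard-coded from the table because Python's standard library has no t-distribution; the function name is our own.

```python
import math

T_CRIT_95_DF17 = 2.11  # t-table value quoted in the example (17 d.f., central area 0.95)

def mean_ci_unknown_sigma(x_bar, s_squared, n, t_crit):
    """CI for the mean when sigma is unknown: x_bar +/- t * s / sqrt(n)."""
    s = math.sqrt(s_squared)  # the standard error uses s, not s squared
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

low, high = mean_ci_unknown_sigma(x_bar=25, s_squared=16, n=18, t_crit=T_CRIT_95_DF17)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

With these inputs the standard error is 4 divided by the square root of 18, so the half-width is a little under 2.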
2. The population standard deviation ($\sigma$) is unknown but the sample size is ‘large’: We estimate $\sigma$ by either of the two estimates, $s$ or $\hat{\sigma}$, where $\hat{\sigma} \equiv \sqrt{\sum (x_i - \bar{x})^2/n}$ for raw data, and $\hat{\sigma} \equiv \sqrt{\sum f_i (x_i - \bar{x})^2/n}$ for grouped data. Then we proceed as in Example 1 above.
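The Sampling Distribution of the Sample Proportion ($p$)
Suppose that the population proportion $\pi = 0.6$ and consider the following statistical process: draw repeated samples and record the value of $p$ for each.
[Table with columns: Sample Number, Value of $p$.]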
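This statistical process is easy to imitate on a computer. The sketch below assumes a sample size of 36 and 2000 repeated samples, neither of which is stated on the slide; only the population value of 0.6 comes from the text.

```python
import random
import statistics

def simulate_proportions(pi=0.6, n=36, reps=2000, seed=7):
    """Sample proportions from `reps` samples of n yes/no draws with success probability pi."""
    rng = random.Random(seed)
    return [sum(rng.random() < pi for _ in range(n)) / n for _ in range(reps)]

props = simulate_proportions()
print("Sample Number  Value of p")
for number, p in enumerate(props[:5], start=1):
    print(f"{number:>13}  {p:.3f}")
print("average of p over all samples:", round(statistics.fmean(props), 3))
```

Individual values of p vary from sample to sample, but their average should sit close to 0.6.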
[Figure: density of the sample proportion $p$.] This is the distribution of $p$ provided $n\pi$ and $n(1-\pi)$ are both at least 5.
As $n$ gets larger and larger, the distribution gets more compact around the mean value ($\pi$).
[Figure: The distribution of the sample proportion ($p$) for three sample sizes, $n_1 < n_2 < n_3$; horizontal axis: $p$, vertical axis: density.]
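Properties of $p$
1. $p$ is an unbiased estimator of the population proportion: $E(p) = \pi$.
2. The standard error of $p$ is given by $\text{s.e.}_p = \sqrt{\pi(1-\pi)/n}$.
Therefore, since the standard error shrinks towards zero as $n$ grows, $p$ is a consistent estimator of $\pi$.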
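The consistency property can be seen by evaluating the standard error formula for increasing sample sizes. The sketch below uses a population value of 0.6 and sample sizes chosen by us for illustration.

```python
import math

def proportion_se(pi, n):
    """Standard error of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    if not 0 <= pi <= 1:
        raise ValueError("pi must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(pi * (1 - pi) / n)

for n in (36, 144, 576):
    print(f"n = {n:>3}: s.e. = {proportion_se(0.6, n):.4f}")
```

Each fourfold increase in the sample size halves the standard error.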
Example 1 (Confidence Interval for the population proportion): Suppose that the result of sampling yields the following: $p = 0.4$; $n = 36$. Use this information to construct a 98% CI for $\pi$.
First, we do the validity check. This requires $n\pi \ge 5$ as well as $n(1-\pi) \ge 5$. Because we don’t know what $\pi$ is, we use $p$ in the place of $\pi$. Since $p = 0.4$ and $n = 36$ give $np = 14.4$ and $n(1-p) = 21.6$, the validity check is satisfied.
We can therefore say that $p$ is approximately $N(\pi, \sigma^2/36)$ where $\sigma^2 = \pi(1-\pi)$. Standardisation means that $(p - \pi)/(\sigma/6)$ is approximately $z$. Now find the two symmetric points around 0 in the z table such that the area between them is 0.98. The answer is $z = \pm 2.33$.
Now solve $(p - \pi)/(\sigma/6) = (0.4 - \pi)/(\sigma/6) = \pm 2.33$. In this expression we do not know what $\pi$ is, so we don’t know what $\sigma$ is. We use 0.4 as a point estimate of $\pi$ and calculate an estimate of $\sigma$: $\sigma^* = \sqrt{0.4 \times 0.6} \approx 0.49$.
Solving $(0.4 - \pi)/(0.49/6) = \pm 2.33$ gives two values, $L = 0.21$ and $H = 0.59$. Thus the 98% CI for $\pi$ is $[0.21, 0.59]$.
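The steps of this example, including the validity check, can be collected into one small function. This is a sketch with a function name of our own; it uses p in place of the unknown population proportion throughout, as the example does, and takes the table value 2.33 as given.

```python
import math

def proportion_ci(p, n, z):
    """Approximate CI for a population proportion, using p in place of the unknown value."""
    # Validity check for the normal approximation.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not valid: need n*p >= 5 and n*(1-p) >= 5")
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

low, high = proportion_ci(p=0.4, n=36, z=2.33)
print(f"98% CI: [{low:.2f}, {high:.2f}]")
```

A call such as `proportion_ci(0.05, 36, 2.33)` raises an error, because the expected count of 1.8 fails the validity check.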