ESTIMATION
STATISTICAL INFERENCE Statistical inference is the procedure by which conclusions about a population are drawn from the results obtained from a sample taken from that population.
STATISTICAL INFERENCE This can be achieved by: 1. Hypothesis testing 2. Estimation: point estimation and interval estimation (confidence interval)
Estimation If the mean and the variance of a normally distributed population are known, then the probabilities of various events can be determined. But these values are almost never known, and we have to estimate them from the information in a simple random sample.
Estimation The process of estimation involves calculating, from the data of a sample, some "statistic" that approximates the corresponding "parameter" of the population from which the sample was drawn.
POINT ESTIMATION A point estimate is a single numerical value, obtained from a random sample, that is used to estimate the corresponding population parameter. The sample mean ($\bar{X}$) is the best point estimate of the population mean ($\mu$).
POINT ESTIMATION The sample standard deviation ($s$) is the best point estimate of the population standard deviation ($\sigma$). The sample proportion ($\tilde{P}$) is the best point estimate of the population proportion ($P$).
Because of sampling variation, we cannot say that the exact parameter value is some specific number, but we can determine a range of values within which we are confident the unknown parameter lies.
INTERVAL ESTIMATION An interval estimate consists of two numerical values defining an interval that, with a specified degree of confidence, contains the unknown parameter we want to estimate.
A point estimate is a single number. An interval estimate provides more information about a population characteristic than a point estimate does, because it attaches a confidence level to the estimate. Such interval estimates are called confidence intervals.
[Figure: a confidence interval marked with the lower confidence limit, the point estimate, the upper confidence limit, and the width of the interval]
INTERVAL ESTIMATION The interval limits depend on the confidence level, which is equal to $1-\alpha$ ($\alpha$ is the probability of error). The interval estimate may be expressed as:
Estimator ± (Reliability coefficient × Standard error)
INTERVAL ESTIMATION Parameter, estimator, and standard error for each case:
Parameter | Estimator | Standard error
Population mean ($\mu$) | Sample mean ($\bar{X}$) | $\sigma/\sqrt{n}$
Difference between two population means ($\mu_1-\mu_2$) | Difference between two sample means ($\bar{X}_1-\bar{X}_2$) | $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Population proportion ($P$) | Sample proportion ($\tilde{P}$) | $\sqrt{\tilde{P}(1-\tilde{P})/n}$ (the sample proportion is used, since $P$ is unknown and is what we want to estimate)
Difference between two population proportions ($P_1-P_2$) | Difference between two sample proportions ($\tilde{P}_1-\tilde{P}_2$) | $\sqrt{\tilde{P}_1(1-\tilde{P}_1)/n_1 + \tilde{P}_2(1-\tilde{P}_2)/n_2}$
Reliability Coefficient in Z-test The reliability coefficient is the value of $Z_{1-\alpha/2}$ corresponding to the confidence level.
Reliability Coefficient
Confidence level | α-value | Z-value
90% | 0.10 | 1.645
95% | 0.05 | 1.96
99% | 0.01 | 2.58
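The tabulated Z-values can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `z_coefficient` is a hypothetical helper chosen for illustration, not something defined in these slides.

```python
from statistics import NormalDist


def z_coefficient(confidence_level: float) -> float:
    """Return Z_{1-alpha/2} for a two-sided interval at the given confidence level.

    Hypothetical helper for illustration only.
    """
    alpha = 1.0 - confidence_level
    # Quantile of the standard normal distribution at 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_coefficient(level):.3f}")
# 90%: Z = 1.645
# 95%: Z = 1.960
# 99%: Z = 2.576
```

To three decimals the 99% coefficient is 2.576, which the table rounds to 2.58.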
Reliability Coefficient in t-test The reliability coefficient is the value of $t_{1-\alpha/2}$ corresponding to the confidence level AND to the related degrees of freedom ($df = n-1$).
Confidence Interval The confidence interval is centered on, and symmetric around, the sample mean.
C.I. FOR POPULATION MEAN The sample mean is an estimate of the population mean. If the population variance is known, the C.I. for $\mu$ is:
$\bar{X} - Z_{1-\alpha/2}\,(\sigma/\sqrt{n}) < \mu < \bar{X} + Z_{1-\alpha/2}\,(\sigma/\sqrt{n})$
or
$\mu = \bar{X} \pm Z_{1-\alpha/2}\,(\sigma/\sqrt{n})$
EXERCISE The mean serum indirect bilirubin level of 16 infants (in their first four days of life) was found to be 5.98 mg/dl. The population SD is $\sigma = 3.5$ mg/dl. Assuming normality, find the 90%, 95%, and 99% CI for $\mu$, using:
$\bar{X} - Z_{1-\alpha/2}\,(\sigma/\sqrt{n}) < \mu < \bar{X} + Z_{1-\alpha/2}\,(\sigma/\sqrt{n})$
EXERCISE Substituting into $\bar{X} - Z_{1-\alpha/2}\,(\sigma/\sqrt{n}) < \mu < \bar{X} + Z_{1-\alpha/2}\,(\sigma/\sqrt{n})$:
90% CI: $5.98 - 1.645 \times 3.5/\sqrt{16} < \mu < 5.98 + 1.645 \times 3.5/\sqrt{16}$
90% CI: $5.98 - 1.44 < \mu < 5.98 + 1.44$
90% CI: $4.54 < \mu < 7.42$
95% CI: $5.98 - 1.96 \times 3.5/\sqrt{16} < \mu < 5.98 + 1.96 \times 3.5/\sqrt{16}$
95% CI: $5.98 - 1.715 < \mu < 5.98 + 1.715$
95% CI: $4.265 < \mu < 7.695$
99% CI: $5.98 - 2.58 \times 3.5/\sqrt{16} < \mu < 5.98 + 2.58 \times 3.5/\sqrt{16}$
99% CI: $5.98 - 2.258 < \mu < 5.98 + 2.258$
99% CI: $3.72 < \mu < 8.24$
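The bilirubin calculation can be checked with a few lines of Python. This is a minimal sketch, assuming a known population SD and the tabulated Z-values; `mean_ci` is a hypothetical helper name introduced here for illustration.

```python
from math import sqrt


def mean_ci(sample_mean: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """Confidence interval for a population mean when sigma is known.

    Hypothetical helper: estimator +/- (reliability coefficient x standard error).
    """
    standard_error = sigma / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Values from the exercise: mean 5.98 mg/dl, sigma 3.5 mg/dl, n = 16.
for label, z in (("90%", 1.645), ("95%", 1.96), ("99%", 2.58)):
    lower, upper = mean_ci(5.98, 3.5, 16, z)
    print(f"{label} CI: {lower} < mu < {upper}")
```

The printed limits agree with the hand calculation once rounded to two or three decimals.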
CI for difference between two population means A sample of 10 twelve-year-old boys and a sample of 10 twelve-year-old girls yielded mean heights of 59.8 inches (boys) and 58.5 inches (girls). Assume normality, with $\sigma_1 = 2$ inches and $\sigma_2 = 3$ inches. Find the 90% CI for the difference in mean height between boys and girls at this age.
CI for difference between two population means
$(\bar{X}_1-\bar{X}_2) - Z\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} < \mu_1-\mu_2 < (\bar{X}_1-\bar{X}_2) + Z\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
90% CI: $(59.8-58.5) - 1.645\sqrt{2^2/10 + 3^2/10} < \mu_1-\mu_2 < (59.8-58.5) + 1.645\sqrt{2^2/10 + 3^2/10}$
90% CI: $1.3 - 1.88 < \mu_1-\mu_2 < 1.3 + 1.88$
90% CI: $-0.58 < \mu_1-\mu_2 < 3.18$
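The same calculation for the height example can be sketched in Python. This assumes both population SDs are known, as in the example; `diff_means_ci` is a hypothetical helper name used only for illustration.

```python
from math import sqrt


def diff_means_ci(mean1: float, mean2: float,
                  sigma1: float, sigma2: float,
                  n1: int, n2: int, z: float) -> tuple[float, float]:
    """CI for mu1 - mu2 with known population SDs (hypothetical helper)."""
    # Standard error of the difference between two independent sample means.
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    difference = mean1 - mean2
    margin = z * standard_error
    return difference - margin, difference + margin


# Boys: mean 59.8, sigma 2, n 10. Girls: mean 58.5, sigma 3, n 10. 90% level.
lower, upper = diff_means_ci(59.8, 58.5, 2, 3, 10, 10, 1.645)
print(f"90% CI: {lower:.2f} < mu1 - mu2 < {upper:.2f}")
# 90% CI: -0.58 < mu1 - mu2 < 3.18
```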
CI for population proportion In a survey, 300 adults were interviewed, and 123 said they had a yearly medical checkup. Find the 95% CI for the true proportion of adults having a yearly medical checkup.
$\tilde{P} = 123/300 = 0.41$
CI for population proportion
$\tilde{P} - Z\sqrt{\tilde{P}(1-\tilde{P})/n} < P < \tilde{P} + Z\sqrt{\tilde{P}(1-\tilde{P})/n}$
95% CI: $0.41 - 1.96\sqrt{0.41(1-0.41)/300} < P < 0.41 + 1.96\sqrt{0.41(1-0.41)/300}$
95% CI: $0.41 - 0.06 < P < 0.41 + 0.06$
95% CI: $0.35 < P < 0.47$
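A short Python sketch of the survey calculation follows. It uses the normal approximation shown on the slide, with the sample proportion substituted into the standard error; `proportion_ci` is a hypothetical helper name.

```python
from math import sqrt


def proportion_ci(successes: int, n: int, z: float) -> tuple[float, float]:
    """Normal-approximation CI for a population proportion (hypothetical helper)."""
    p_hat = successes / n
    # The sample proportion stands in for the unknown P in the standard error.
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# 123 of 300 adults reported a yearly checkup; 95% level.
lower, upper = proportion_ci(123, 300, 1.96)
print(f"95% CI: {lower:.3f} < P < {upper:.3f}")
# 95% CI: 0.354 < P < 0.466
```

Carrying three decimals gives 0.354 to 0.466; the slide rounds the margin to 0.06 and reports 0.35 to 0.47.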
CI for difference between two population proportions 200 patients suffering from a certain disease were randomly divided into two equal groups. Of the 100 who received the NEW treatment, 90 recovered within three days. Of the other 100, who received the STANDARD treatment, 78 recovered within three days. Find the 95% CI for the difference between the proportions recovering in the populations receiving the two treatments.
Answer $\tilde{P}_1 - \tilde{P}_2 = 90/100 - 78/100 = 0.12$
Answer
$(\tilde{P}_1-\tilde{P}_2) - Z\sqrt{\tilde{P}_1(1-\tilde{P}_1)/n_1 + \tilde{P}_2(1-\tilde{P}_2)/n_2} < P_1-P_2 < (\tilde{P}_1-\tilde{P}_2) + Z\sqrt{\tilde{P}_1(1-\tilde{P}_1)/n_1 + \tilde{P}_2(1-\tilde{P}_2)/n_2}$
95% CI: $0.12 \pm 1.96\sqrt{0.9(1-0.9)/100 + 0.78(1-0.78)/100}$
95% CI: $0.12 \pm 0.10$
95% CI: $0.02 < P_1-P_2 < 0.22$
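The treatment comparison can be verified with the following minimal Python sketch, again under the normal approximation used on the slide; `diff_proportions_ci` is a hypothetical helper name.

```python
from math import sqrt


def diff_proportions_ci(x1: int, n1: int, x2: int, n2: int,
                        z: float) -> tuple[float, float]:
    """Normal-approximation CI for P1 - P2 (hypothetical helper)."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Standard error of the difference between two independent sample proportions.
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * standard_error
    return (p1 - p2) - margin, (p1 - p2) + margin


# NEW treatment: 90 of 100 recovered. STANDARD treatment: 78 of 100 recovered.
lower, upper = diff_proportions_ci(90, 100, 78, 100, 1.96)
print(f"95% CI: {lower:.2f} < P1 - P2 < {upper:.2f}")
# 95% CI: 0.02 < P1 - P2 < 0.22
```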
The width of the interval estimate is increased by:
1. Increasing the confidence level (i.e., decreasing the alpha value)
2. Decreasing the sample size
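Both effects follow from the margin term, reliability coefficient × standard error. The sketch below illustrates them for a mean with known SD; the value sigma = 3.5 and the sample sizes are arbitrary illustrative choices, and `ci_width` is a hypothetical helper name.

```python
from math import sqrt


def ci_width(z: float, sigma: float, n: int) -> float:
    """Full width of a Z-based CI for a mean: 2 x Z x sigma / sqrt(n).

    Hypothetical helper for illustration.
    """
    return 2 * z * sigma / sqrt(n)


sigma = 3.5  # illustrative value, assumed known

# Effect of the confidence level at a fixed sample size (n = 16).
for label, z in (("90%", 1.645), ("95%", 1.96), ("99%", 2.58)):
    print(f"n=16, {label}: width = {ci_width(z, sigma, 16):.2f}")

# Effect of the sample size at a fixed confidence level (95%).
for n in (16, 64, 256):
    print(f"95%, n={n}: width = {ci_width(1.96, sigma, n):.2f}")
```

Because the width is proportional to $1/\sqrt{n}$, quadrupling the sample size halves the width.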
A confidence interval can shed light on the following:
1. The range within which the true value of the estimated parameter lies.
2. The statistical significance of a difference in population means ($\mu_1-\mu_2$) or proportions ($P_1-P_2$). If ZERO is included in the interval for such a difference (i.e., the range runs from a negative value to a positive value), then we can state that there is no statistically significant difference between the two population values (parameters), even though the sample values (statistics) showed a difference.
3. The sample size. At a fixed confidence level, a narrow interval indicates a "large" sample size, while a wide interval indicates a "small" sample size.
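The zero check in point 2 can be sketched in Python as follows. The two intervals are the ones obtained in the worked examples for heights and for treatment recovery; `includes_zero` is a hypothetical helper name.

```python
def includes_zero(lower: float, upper: float) -> bool:
    """Return True if the interval for a difference contains zero (hypothetical helper)."""
    return lower <= 0.0 <= upper


# 90% CI for the difference in mean heights: -0.58 to 3.18.
print(includes_zero(-0.58, 3.18))  # True: no statistically significant difference

# 95% CI for the difference in recovery proportions: 0.02 to 0.22.
print(includes_zero(0.02, 0.22))   # False: the difference is statistically significant
```

The conclusion applies at the significance level matching the interval: $\alpha = 0.10$ for the 90% interval and $\alpha = 0.05$ for the 95% interval.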
EXERCISES In a study to assess the side effects of two drugs, 50 animals were given Drug A (11 showed undesirable side effects), and 50 were given Drug B (8 showed similar side effects). Find the 95% CI for $P_A - P_B$.
EXERCISES In a random sample of 100 workers, the mean blood lead level was 90 ppm. If the distribution of blood lead level in the worker population is normal with a standard deviation of 10 ppm, find the 90%, 95%, and 99% CI for the population mean.
EXERCISE In assessing the relationship between a certain drug and a certain anomaly in chick embryos, 50 fertilized eggs were injected with the drug on the 4th day of incubation. On the 20th day the embryos were examined, and the abnormality was observed in 12 of them. Find the 90%, 95%, and 99% CI for the population proportion.
EXERCISE The Hb level of males aged >10 years is normally distributed with a variance of 1.462 (gm/dl)^2, and that of males below 10 years is also normally distributed, with a variance of 0.867 (gm/dl)^2. Random samples of 10 older and 20 younger males showed sample means of 14.47 gm/dl and 12.64 gm/dl, respectively. Find the 90%, 95%, and 99% CI for the difference in population means.