4-1 Statistical Inference The field of statistical inference consists of those methods used to make decisions or draw conclusions about a population. These methods utilize the information contained in a sample from the population in drawing conclusions.
4-1 Statistical Inference
4-2 Point Estimation
4-3 Hypothesis Testing We like to think of statistical hypothesis testing as the data analysis stage of a comparative experiment, in which the engineer is interested, for example, in comparing the mean of a population to a specified value (e.g. mean pull strength) Statistical Hypotheses
4-3 Hypothesis Testing Statistical Hypotheses Two-sided Alternative Hypothesis One-sided Alternative Hypotheses
4-3 Hypothesis Testing Statistical Hypotheses Test of a Hypothesis A procedure leading to a decision about a particular hypothesis Hypothesis-testing procedures rely on using the information in a random sample from the population of interest. If this information is consistent with the hypothesis, then we will conclude that the hypothesis is true; if this information is inconsistent with the hypothesis, we will conclude that the hypothesis is false.
4-3 Hypothesis Testing Testing Statistical Hypotheses
4-3 Hypothesis Testing Testing Statistical Hypotheses
4-3 Hypothesis Testing Testing Statistical Hypotheses Sometimes the type I error probability is called the significance level, or the -error, or the size of the test.
4-3 Hypothesis Testing Testing Statistical Hypotheses Reduce α 1.push critical regions further 2.Increasing sample size increasing Z value
4-3 Hypothesis Testing Testing Statistical Hypotheses
4-3 Hypothesis Testing Testing Statistical Hypotheses error when μ =52 and n=16. Increasing the sample size results in a decrease in β.
4-3 Hypothesis Testing Testing Statistical Hypotheses error when μ =50.5 and n=10. β increases as the true value of μ approaches the hypothesized value.
4-3 Hypothesis Testing Testing Statistical Hypotheses
4-3 Hypothesis Testing Testing Statistical Hypotheses The power is computed as 1 - , and power can be interpreted as the probability of correctly rejecting a false null hypothesis. We often compare statistical tests by comparing their power properties. For example, consider the propellant burning rate problem when we are testing H 0 : = 50 centimeters per second against H 1 : not equal 50 centimeters per second. Suppose that the true value of the mean is = 52. When n = 10, we found that = , so the power of this test is 1 - = = when = 52.
4-3 Hypothesis Testing P-Values in Hypothesis Testing P-value carries info about the weight of evidence against H 0. The smaller the p-value, the greater the evidence against H 0.
4-3 Hypothesis Testing One-Sided and Two-Sided Hypotheses Two-Sided Test: One-Sided Tests:
4-3 Hypothesis Testing General Procedure for Hypothesis Testing
OPTIONS NODATE NONUMBER; DATA EX415; mu_h0=12; mu_h1=11.25; sigma=0.5; n=4; alpha=0.05; xbar=11.7; sem=sigma/sqrt(n); /* sem = mean standard error */ Z=(xbar-mu_h0)/sem; z_beta= probit(alpha); xbar_beta=mu_h0+z_beta*sem; z_beta=(xbar_beta-mu_h1)/sem; PVAL=PROBNORM(Z); beta=1-probnorm(z_beta); power=1-beta; PROC PRINT; VAR PVAL beta power; TITLE '4-15 (page 168)'; RUN;QUIT; 4-15 (page 168) OBS PVAL beta power Hypothesis Testing
4-4 Inference on the Mean of a Population, Variance Known Assumptions
4-4 Inference on the Mean of a Population, Variance Known Hypothesis Testing on the Mean We wish to test : The test statistic is :
4-4 Inference on the Mean of a Population, Variance Known Hypothesis Testing on the Mean
4-4 Inference on the Mean of a Population, Variance Known Hypothesis Testing on the Mean Reject H 0 if the observed value of the test statistic z 0 is either: or Fail to reject H 0 if
4-4 Inference on the Mean of a Population, Variance Known Hypothesis Testing on the Mean
4-4 Inference on the Mean of a Population, Variance Known
4-4 Inference on the Mean of a Population, Variance Known OPTIONS NODATE NONUMBER; DATA EX43; MU=50; SD=2; N=25; ALPHA=0.05; XBAR=51.3; SEM=SD/SQRT(N); /*MEAN STANDARD ERROR*/ Z=(XBAR-MU)/SEM; PVAL=2*(1-PROBNORM(Z)); PROC PRINT; VAR Z PVAL ALPHA; TITLE 'TEST OF MU=50 VS NOT=50'; RUN;QUIT; TEST OF MU=50 VS NOT=50 OBS Z PVAL ALPHA
4-4 Inference on the Mean of a Population, Variance Known Type II Error and Choice of Sample Size Finding The Probability of Type II Error
4-4 Inference on the Mean of a Population, Variance Known Type II Error and Choice of Sample Size Finding The Probability of Type II Error
4-4 Inference on the Mean of a Population, Variance Known Type II Error and Choice of Sample Size Sample Size Formulas
4-4 Inference on the Mean of a Population, Variance Known Type II Error and Choice of Sample Size Sample Size Formulas
4-4 Inference on the Mean of a Population, Variance Known Type II Error and Choice of Sample Size
4-4 Inference on the Mean of a Population, Variance Known Type II Error and Choice of Sample Size
4-4 Inference on the Mean of a Population, Variance Known Large Sample Test In general, if n 30, the sample variance s 2 will be close to σ 2 for most samples, and so s can be substituted for σ in the test procedures with little harmful effect.
4-4 Inference on the Mean of a Population, Variance Known Some Practical Comments on Hypothesis Testing The Seven-Step Procedure Only three steps are really required:
4-4 Inference on the Mean of a Population, Variance Known Some Practical Comments on Hypothesis Testing Statistical versus Practical Significance
4-4 Inference on the Mean of a Population, Variance Known Some Practical Comments on Hypothesis Testing Statistical versus Practical Significance
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean Two-sided confidence interval: One-sided confidence intervals: Confidence coefficient:
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean Relationship between Tests of Hypotheses and Confidence Intervals If [l,u] is a 100(1 - ) percent confidence interval for the parameter, then the test of significance level of the hypothesis will lead to rejection of H 0 if and only if the hypothesized value is not in the 100(1 - ) percent confidence interval [l, u].
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean Confidence Level and Precision of Estimation The length of the two-sided 95% confidence interval is whereas the length of the two-sided 99% confidence interval is
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean Choice of Sample Size
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean Choice of Sample Size
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean Choice of Sample Size
4-4 Inference on the Mean of a Population, Variance Known Confidence Interval on the Mean One-Sided Confidence Bounds
4-4 Inference on the Mean of a Population, Variance Known General Method for Deriving a Confidence Interval
4-5 Inference on the Mean of a Population, Variance Unknown Hypothesis Testing on the Mean
4-5 Inference on the Mean of a Population, Variance Unknown Hypothesis Testing on the Mean
4-5 Inference on the Mean of a Population, Variance Unknown Hypothesis Testing on the Mean
4-5 Inference on the Mean of a Population, Variance Unknown Hypothesis Testing on the Mean Calculating the P-value
4-5 Inference on the Mean of a Population, Variance Unknown Hypothesis Testing on the Mean
4-5 Inference on the Mean of a Population, Variance Unknown Hypothesis Testing on the Mean Fixed Significance Level Approach
4-5 Inference on the Mean of a Population, Variance Unknown Hypothesis Testing on the Mean
4-5 Inference on the Mean of a Population, Variance Unknown Fig 4-20 Normal probability plot of the coefficient of restitution data from Example 4-7.
4-5 Inference on the Mean of a Population, Variance Unknown OPTIONS NODATE NONUMBER; DATA ex47; xbar= ; sd= ; mu=0.82; n=15;alpha=0.05; df=n-1; t0=(xbar-mu)/(sd/sqrt(n)); pval = 1- PROBT(t0, df); /* PROBT function returns the probability from a t distribution */ /* less than or equal to x */ cr=tinv(1-alpha, df); /* critical region */ /* TINV function returns a quantile from the t distribution */ PROC PRINT; var t0 pval cr; title 'Test of mu=0.82 vs mu>0.82'; title2 'Ex. 4-7 in pp193'; RUN; QUIT; Test of mu=0.82 vs mu>0.82 Ex. 4-7 in pp193 OBS t0 pval cr
4-5 Inference on the Mean of a Population, Variance Unknown OPTIONS NOOVP NODATE NONUMBER; DATA ex47; input coeffs coeffs1=coeffs-0.82; cards; proc univariate data=ex47 plot normal; var coeffs1; title 'using proc uinivariate for t-test for one population mean'; title2 “Example 4-7 in page 191”; RUN; QUIT;
4-5 Inference on the Mean of a Population, Variance Unknown using proc uinivariate for t-test for one population mean Example 4-7 in page 191 UNIVARIATE 프로시저 변수 : coeffs1 적률 N 15 가중합 15 평균 관측치 합 표준 편차 분산 왜도 첨도 제곱합 수정 제곱합 변동계수 평균의 표준 오차 기본 통계 측도 위치측도 변이측도 평균 표준 편차 중위수 분산 최빈값. 범위 사분위 범위 위치모수 검정 : Mu0=0 검정 -- 통계량 p 값 스튜던트의 t t Pr > |t| 부호 M 2.5 Pr >= |M| 부호 순위 S 39 Pr >= |S| 정규성 검정 검정 ---- 통계량 p 값 Shapiro-Wilk W Pr < W Kolmogorov-Smirnov D Pr > D > Cramer-von Mises W-Sq Pr > W-Sq > Anderson-Darling A-Sq Pr > A-Sq >0.2500
4-5 Inference on the Mean of a Population, Variance Unknown using proc uinivariate for t-test for one population mean Example 4-7 in page 191 UNIVARIATE 프로시저 변수 : coeffs1 극 관측치 최소 최대 값 관측치 값 관측치 줄기 잎 # 상자그림 | | | | *--+--* | | | | 값 : ( 줄기. 잎 )*10**-2 정규 확률도 * +++* | *++++ | *+*++ | ** *+ | ++++** | ++*+* * | +++* *
4-5 Inference on the Mean of a Population, Variance Unknown OPTIONS NOOVP NODATE NONUMBER LS=80; DATA ex47; input coeffs cards; proc ttest data=ex47 h0=0.82 sides=u; /* h0 는 귀무가설, sides 는 양측일 경우 2, lower 단측 검정일경우 L, upper 단측 검정일 경우 u*/ var coeffs; title 'using t-test for one population mean'; title “Example 4-7 in page 191”; RUN; QUIT; using t-test for one population mean The TTEST Procedure Variable: coeffs N Mean Std Dev Std Err Minimum Maximum Mean 95% CL Mean Std Dev 95% CL Std Dev Infty DF t Value Pr > t
4-5 Inference on the Mean of a Population, Variance Unknown Type II Error and Choice of Sample Size Fortunately, this unpleasant task has already been done, and the results are summarized in a series of graphs in Appendix A Charts Va, Vb, Vc, and Vd that plot for the t-test against a parameter for various sample sizes n.
4-5 Inference on the Mean of a Population, Variance Unknown Type II Error and Choice of Sample Size These graphics are called operating characteristic (or OC) curves. Curves are provided for two-sided alternatives on Charts Va and Vb. The abscissa scale factor d on these charts is d efi ned as
4-5 Inference on the Mean of a Population, Variance Unknown Type II Error and Choice of Sample Size Chart V (c) in pp. 496
4-5 Inference on the Mean of a Population, Variance Unknown Confidence Interval on the Mean
4-5 Inference on the Mean of a Population, Variance Unknown Confidence Interval on the Mean
4-5 Inference on the Mean of a Population, Variance Unknown Confidence Interval on the Mean
4-5 Inference on the Mean of a Population, Variance Unknown OPTIONS NOOVP NODATE NONUMBER LS=80; DATA ex60; input diam diam1=diam-8.2; cards; proc univariate plot normal; var diam; title 'univariate for t-test with H0: mu=8.2'; proc univariate; var diam1; title 'univariate for t-test with H0: mu=0'; proc ttest data=ex60 h0=8.2; var diam; title 't-test for one population mean'; title “Example 4-60 in page 198”; RUN; QUIT;
4-5 Inference on the Mean of a Population, Variance Unknown univariate for t-test with H0: mu=8.2 UNIVARIATE 프로시저 변수 : diam 적률 N 12 가중합 12 평균 관측치 합 표준 편차 분산 왜도 첨도 제곱합 수정 제곱합 변동계수 평균의 표준 오차 기본 통계 측도 위치측도 변이측도 평균 표준 편차 중위수 분산 최빈값 범위 사분위 범위 Note: 표시된 최빈값은 2 개의 최빈값 ( 개수 : 2) 중에 가장 작습니다. univariate for t-test with H0: mu=8.2 UNIVARIATE 프로시저 변수 : diam 적률 위치모수 검정 : Mu0=0 검정 -- 통계량 p 값 스튜던트의 t t Pr > |t| <.0001 부호 M 6 Pr >= |M| 부호 순위 S 39 Pr >= |S| 정규성 검정 검정 ---- 통계량 p 값 Shapiro-Wilk W Pr < W Kolmogorov-Smirnov D Pr > D > Cramer-von Mises W-Sq Pr > W-Sq > Anderson-Darling A-Sq Pr > A-Sq >0.2500
4-5 Inference on the Mean of a Population, Variance Unknown univariate for t-test with H0: mu=8.2 UNIVARIATE 프로시저 변수 : diam 극 관측치 최소 최대 값 관측치 값 관측치 줄기 잎 # 상자그림 | 83 | *--+--* | | 값 : ( 줄기. 잎 )*10**-1 정규 확률도 * +*++++ | | *+*++* *+*+ | ++++*+* *
4-5 Inference on the Mean of a Population, Variance Unknown univariate for t-test with H0: mu=0 UNIVARIATE 프로시저 변수 : diam1 적률 N 12 가중합 12 평균 관측치 합 0.92 표준 편차 분산 왜도 첨도 제곱합 수정 제곱합 변동계수 평균의 표준 오차 기본 통계 측도 위치측도 변이측도 평균 표준 편차 중위수 분산 최빈값 범위 사분위 범위 Note: 표시된 최빈값은 2 개의 최빈값 ( 개수 : 2) 중에 가장 작습니다. 위치모수 검정 : Mu0=0 검정 -- 통계량 p 값 스튜던트의 t t Pr > |t| 부호 M 3 Pr >= |M| 부호 순위 S 31 Pr >= |S|
4-5 Inference on the Mean of a Population, Variance Unknown “Example 4-60 in page 198” The TTEST Procedure Variable: diam N Mean Std Dev Std Err Minimum Maximum Mean 95% CL Mean Std Dev 95% CL Std Dev DF t Value Pr > |t|
4-6 Inference on the Variance of a Normal Population Hypothesis Testing on the Variance of a Normal Population
4-6 Inference on the Variance of a Normal Population Hypothesis Testing on the Variance of a Normal Population
4-6 Inference on the Variance of a Normal Population Hypothesis Testing on the Variance of a Normal Population
4-6 Inference on the Variance of a Normal Population Hypothesis Testing on the Variance of a Normal Population
4-6 Inference on the Variance of a Normal Population Hypothesis Testing on the Variance of a Normal Population
4-6 Inference on the Variance of a Normal Population Hypothesis Testing on the Variance of a Normal Population
4-6 Inference on the Variance of a Normal Population Confidence Interval on the Variance of a Normal Population
4-6 Inference on the Variance of a Normal Population OPTIONS NODATE NONUMBER; DATA ex410; p_var=0.01; s_var=0.0153; n=20;alpha=0.05; df=n-1; chisq=(n-1)*s_var/p_var; p_value=1-probchi(chisq, df); cv1=cinv(1-alpha, df); /* critical value */ cv2=cinv(alpha, df); lcl=sqrt((n-1)*s_var/cv1); /* ucl= upper confidence limit */ PROC PRINT; var chisq p_value cv1 cv2 lcl; title 'Test of H0: var=0.01 vs H0: var>0.01'; title2 'Ex in pp202'; RUN; QUIT; Test of H0: var=0.01 vs H0: var>0.01 Ex in pp202 OBS chisq p_value cv1 cv2 lcl
4-7 Inference on Population Proportion Hypothesis Testing on a Binomial Proportion We will consider testing:
4-7 Inference on Population Proportion Hypothesis Testing on a Binomial Proportion
4-7 Inference on Population Proportion Hypothesis Testing on a Binomial Proportion
4-7 Inference on Population Proportion Hypothesis Testing on a Binomial Proportion
4-7 Inference on Population Proportion Type II Error and Choice of Sample Size
4-7 Inference on Population Proportion Type II Error and Choice of Sample Size
4-7 Inference on Population Proportion Confidence Interval on a Binomial Proportion
4-7 Inference on Population Proportion Confidence Interval on a Binomial Proportion
4-7 Inference on Population Proportion Confidence Interval on a Binomial Proportion Choice of Sample Size
4-8 Other Interval Estimates for a Single Sample Prediction Interval
4-8 Other Interval Estimates for a Single Sample Tolerance Intervals for a Normal Distribution
4-10 Testing for Goodness of Fit So far, we have assumed the population or probability distribution for a particular problem is known. There are many instances where the underlying distribution is not known, and we wish to test a particular distribution. Use a goodness-of-fit test procedure based on the chi- square distribution.