Applied Quantitative Analysis and Practices LECTURE#25 By Dr. Osman Sadiq Paracha
Previous Lecture Summary Inference about the slope t test for slope Inference about the slope (t test example) F test for significance Confidence Interval estimate for slope Confidence Interval concept and calculation
Inferences About the Slope: t Test t test for a population slope Is there a linear relationship between X and Y? Null and alternative hypotheses H 0 : β 1 = 0(no linear relationship) H 1 : β 1 ≠ 0(linear relationship does exist) Test statistic where: b 1 = regression slope coefficient β 1 = hypothesized slope S b1 = standard error of the slope
Inferences About the Slope: t Test Example House Price in $1000s (y) Square Feet (x) Estimated Regression Equation: The slope of this model is Is there a relationship between the square footage of the house and its sales price?
Inferences About the Slope: t Test Example H 0 : β 1 = 0 H 1 : β 1 ≠ 0 CoefficientsStandard Errort StatP-value Intercept Square Feet b1b1
Inferences About the Slope: t Test Example Test Statistic: t STAT = There is sufficient evidence that square footage affects house price Decision: Reject H 0 Reject H 0 /2=.025 -t α/2 Do not reject H 0 0 t α/2 /2= d.f. = = 8 H 0 : β 1 = 0 H 1 : β 1 ≠ 0
Inferences About the Slope: t Test Example H 0 : β 1 = 0 H 1 : β 1 ≠ 0 CoefficientsStandard Errort StatP-value Intercept Square Feet p-value There is sufficient evidence that square footage affects house price. Decision: Reject H 0, since p-value < α
F Test for Significance F Test statistic: where where F STAT follows an F distribution with k numerator and (n – k - 1) denominator degrees of freedom (k = the number of independent variables in the regression model)
F-Test for Significance Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations10 ANOVA dfSSMSFSignificance F Regression Residual Total With 1 and 8 degrees of freedom p-value for the F-Test
H 0 : β 1 = 0 H 1 : β 1 ≠ 0 =.05 df 1 = 1 df 2 = 8 Test Statistic: Decision: Conclusion: Reject H 0 at = 0.05 There is sufficient evidence that house size affects selling price 0 =.05 F.05 = 5.32 Reject H 0 Do not reject H 0 Critical Value: F = 5.32 F Test for Significance (continued) F
Confidence Interval Estimate for the Slope Confidence Interval Estimate of the Slope: Excel Printout for House Prices: At 95% level of confidence, the confidence interval for the slope is (0.0337, ) CoefficientsStandard Errort StatP-valueLower 95%Upper 95% Intercept Square Feet d.f. = n - 2
Since the units of the house price variable is $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $ per square foot of house size CoefficientsStandard Errort StatP-valueLower 95%Upper 95% Intercept Square Feet This 95% confidence interval does not include 0. Conclusion: There is a significant relationship between house price and square feet at the.05 level of significance Confidence Interval Estimate for the Slope (continued)
Confidence Interval Example Cereal fill example Population has µ = 368 and σ = 15. If you take a sample of size n = 25 you know 368 ± 1.96 * 15 / = (362.12, ). 95% of the intervals formed in this manner will contain µ. When you don’t know µ, you use X to estimate µ If X = the interval is ± 1.96 * 15 / = (356.42, ) Since ≤ µ ≤ the interval based on this sample makes a correct statement about µ. But what about the intervals from other possible samples of size 25?
Confidence Interval Example (continued) Sample #X Lower Limit Upper Limit Contain µ? Yes Yes No Yes Yes
Confidence Interval Example In practice you only take one sample of size n In practice you do not know µ so you do not know if the interval actually contains µ However you do know that 95% of the intervals formed in this manner will contain µ Thus, based on the one sample, you actually selected you can be 95% confident your interval will contain µ (this is a 95% confidence interval) (continued) Note: 95% confidence is based on the fact that we used Z = 1.96.
Estimation Process (mean, μ, is unknown) Population Random Sample Mean X = 50 Sample I am 95% confident that μ is between 40 & 60.
General Formula The general formula for all confidence intervals is: Point Estimate ± (Critical Value)(Standard Error) Where: Point Estimate is the sample statistic estimating the population parameter of interest Critical Value is a table value based on the sampling distribution of the point estimate and the desired confidence level Standard Error is the standard deviation of the point estimate
Confidence Intervals Population Mean σ Unknown Confidence Intervals Population Proportion σ Known
Confidence Interval for μ (σ Known) Assumptions Population standard deviation σ is known Population is normally distributed If population is not normal, use large sample (n > 30) Confidence interval estimate: where is the point estimate Z α/2 is the normal distribution critical value for a probability of /2 in each tail is the standard error
Finding the Critical Value, Z α/2 Consider a 95% confidence interval: Z α/2 = -1.96Z α/2 = 1.96 Point Estimate Lower Confidence Limit Upper Confidence Limit Z units: X units: Point Estimate 0
Common Levels of Confidence Commonly used confidence levels are 90%, 95%, and 99% Confidence Level Confidence Coefficient, Z α/2 value % 90% 95% 98% 99% 99.8% 99.9%
Intervals and Level of Confidence Confidence Intervals Intervals extend from to (1- )100% of intervals constructed contain μ; ( )100% do not. Sampling Distribution of the Mean x x1x1 x2x2
Example A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.
Example A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Solution: (continued) DCOVA
Interpretation We are 95% confident that the true mean resistance is between and ohms Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean DCOVA
t Test for a Correlation Coefficient Hypotheses H 0 : ρ = 0 (no correlation between X and Y) H 1 : ρ ≠ 0 (correlation exists) Test statistic (with n – 2 degrees of freedom)
t-test For A Correlation Coefficient Is there evidence of a linear relationship between square feet and house price at the.05 level of significance? H 0 : ρ = 0 (No correlation) H 1 : ρ ≠ 0 (correlation exists) =.05, df = = 8 (continued)
t-test For A Correlation Coefficient Conclusion: There is evidence of a linear association at the 5% level of significance Decision: Reject H 0 Reject H 0 /2=.025 -t α/2 Do not reject H 0 0 t α/2 /2= d.f. = 10-2 = 8 (continued)
Pitfalls of Regression Analysis Lacking an awareness of the assumptions underlying least-squares regression Not knowing how to evaluate the assumptions Not knowing the alternatives to least-squares regression if a particular assumption is violated Using a regression model without knowledge of the subject matter Extrapolating outside the relevant range
Strategies for Avoiding the Pitfalls of Regression Start with a scatter plot of X vs. Y to observe possible relationship Perform residual analysis to check the assumptions Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity Use a histogram, stem-and-leaf display, boxplot, or normal probability plot of the residuals to uncover possible non-normality
Strategies for Avoiding the Pitfalls of Regression If there is violation of any assumption, use alternative methods or models If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals Avoid making predictions or forecasts outside the relevant range (continued)
Lecture Summary Confidence Interval Concept Confidence Interval examples t test for correlation coefficient Application of t test in correlation coefficient Pitfalls of Regression Strategies to avoid pitfalls in regression SPSS Application in regression