Linear Regression Inference


1 Linear Regression Inference
AP Statistics Linear Regression Inference

2 Hypothesis Tests: Slopes
Given: The observed slope relating Education to Job Prestige is 2.47.
Question: Can we generalize this to the population of all Americans? How likely is it that this observed slope was drawn from a population whose slope is 0?
Solution: Conduct a hypothesis test.
Notation: sample slope = b, population slope = β
H0: Population slope β = 0
H1: Population slope β ≠ 0 (two-tailed test)

3 Review: Slope Hypothesis Tests
What information lets us do a hypothesis test?
Answer: Estimates of a slope (b) have a sampling distribution, like any other statistic. It is the distribution of the slope values we would obtain from all possible samples of size N.
If certain assumptions are met, the slope divided by its standard error follows a t-distribution.
Thus, we can assess the probability that a given value of b would be observed if β = 0.
If that probability is low (below alpha), we reject H0.

4 Review: Slope Hypothesis Tests
Visually: If the population slope (β) is zero, then the sampling distribution of b is centered at zero.
Since the sampling distribution is a probability distribution, we can identify the likely values of b if the population slope is zero.
If β = 0, observed slopes should usually fall near zero, too.
[Figure: sampling distribution of the slope b, centered at 0]
If the observed slope falls very far from 0, it is improbable that β is really equal to zero. Thus, we can reject H0.
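The idea of a sampling distribution under H0 can be illustrated with a small simulation. This sketch is not part of the original slides and rests on assumptions chosen only for illustration: a hypothetical population with β = 0, X uniform on 0 to 20 (read it as years of education), and normal errors with SD 10. It draws many samples, records each sample's least-squares slope, and checks how often a slope as far from 0 as 2.47 turns up.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope: b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


def simulate_null_slopes(n=50, reps=2000, beta=0.0, seed=1):
    """Draw `reps` samples of size n from a hypothetical population whose slope
    is `beta`, and return the slope estimated from each sample."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs = [rng.uniform(0, 20) for _ in range(n)]          # hypothetical X values
        ys = [50 + beta * x + rng.gauss(0, 10) for x in xs]  # normal errors, SD 10
        slopes.append(ols_slope(xs, ys))
    return slopes


slopes = simulate_null_slopes()
print("mean of simulated slopes:", round(statistics.fmean(slopes), 3))
print("SD of simulated slopes:  ", round(statistics.stdev(slopes), 3))
# Share of samples whose slope is at least as far from 0 as an observed slope of 2.47
print("share with |b| >= 2.47:  ", sum(abs(b) >= 2.47 for b in slopes) / len(slopes))
```

With these made-up settings the simulated slopes cluster tightly around 0, so a slope as large as 2.47 essentially never appears. Whether a real observed slope counts as "far" from 0 depends on its standard error, which the later slides show how to estimate.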

5 Bivariate Regression Assumptions
Assumptions for bivariate regression hypothesis tests:
1. Random sample. Ideally N > 20, but different rules of thumb exist (10, 30, etc.).
2. Variables are linearly related, i.e., the mean of Y changes linearly with X. Check the scatter plot for a general linear trend, and watch out for non-linear relationships (e.g., U-shaped).
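The scatter plot is the main check for linearity. As a rough numeric companion (an approach assumed for this sketch, not one given in the slides), one can fit a straight line and look at the average residual in the low, middle, and high thirds of X: a U-shaped relationship leaves positive residuals at both ends and negative residuals in the middle. The data below are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b * x_bar, b


def mean_residual_by_third(xs, ys):
    """Mean residual in the lowest, middle, and highest thirds of the X values."""
    a, b = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))
    k = len(pairs) // 3
    thirds = [pairs[:k], pairs[k:2 * k], pairs[2 * k:]]
    return [statistics.fmean(y - (a + b * x) for x, y in part) for part in thirds]


xs = list(range(-9, 10))          # 19 hypothetical X values
linear = [3 + 2 * x for x in xs]  # straight-line relationship
u_shaped = [x ** 2 for x in xs]   # U-shaped relationship
print("linear:  ", [round(r, 2) for r in mean_residual_by_third(xs, linear)])
print("U-shaped:", [round(r, 2) for r in mean_residual_by_third(xs, u_shaped)])
```

For the straight-line data the mean residuals are all zero; for the U-shaped data the signs run positive, negative, positive, which a straight line cannot capture.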

6 Bivariate Regression Assumptions
3. Y is normally distributed for every value of X in the population (“conditional normality”).
Example: X = years of education, Y = job prestige.
Suppose we look only at the sub-sample with X = 12 years of education. Is a histogram of job prestige approximately normal? What about for people with X = 4? With X = 16?
If all are roughly normal, the assumption is met.

7 Bivariate Regression Assumptions
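Examine sub-samples at different values of X. Make histograms and check for normality.
[Figure: histograms of Y for sub-samples at different values of X, panels labeled “Normality: Good” and “Normality: Not very good”]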
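The sub-sample check can be sketched in code: group the Y values by X and print a crude text histogram for each group. This is a minimal illustration on simulated, hypothetical data (job prestige generated as 20 + 2.5 × education plus normal noise); the helper names `group_by_x` and `bin_counts` are made up for this sketch, and a real analysis would use actual survey data and proper plots.

```python
import random
from collections import defaultdict


def group_by_x(xs, ys):
    """Collect the Y values observed at each distinct X value."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return dict(groups)


def bin_counts(values, bins=6):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against all-equal values
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


rng = random.Random(3)
educ = [x for x in (4, 12, 16) for _ in range(60)]          # hypothetical sub-samples
prestige = [20 + 2.5 * x + rng.gauss(0, 6) for x in educ]  # hypothetical outcome
for x, ys in sorted(group_by_x(educ, prestige).items()):
    print(f"X = {x}")
    for count in bin_counts(ys):
        print("  " + "*" * count)
```

Each group should show a single hump that is roughly symmetric; strong skew or two humps in some group would suggest the conditional-normality assumption is doubtful there.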

8 Bivariate Regression Assumptions
4. The variance of the prediction errors is the same at every value of X.
Recall: The error is the deviation of Y from the regression line.
Is the dispersion of the errors consistent across values of X?
Definition: “homoskedasticity” = error dispersion is consistent across values of X.
Opposite: “heteroskedasticity” = error dispersion varies with X.
Test: Compare the errors for X = 12 years of education with the errors for X = 2, X = 8, etc. Are the errors around the line similar, or different?

9 Bivariate Regression Assumptions
Homoskedasticity: Equal Error Variance
Examine the error at different values of X. Is it roughly equal? Here, things look pretty good.

10 Bivariate Regression Assumptions
Heteroskedasticity: Unequal Error Variance
At higher values of X, the error variance increases a lot. This looks pretty bad.
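The comparison of errors across values of X can also be sketched numerically: fit the line, compute the residuals, and compare their standard deviation at each X value. The two simulated data sets are hypothetical: one has a constant error SD of 5 (homoskedastic), the other has an error SD of 0.5 × X, so the spread grows with X (heteroskedastic). The helper `residual_sd_by_x` is a name made up for this sketch.

```python
import random
import statistics
from collections import defaultdict


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    return y_bar - b * x_bar, b


def residual_sd_by_x(xs, ys):
    """Standard deviation of the regression residuals at each distinct X value."""
    a, b = fit_line(xs, ys)
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y - (a + b * x))
    return {x: statistics.stdev(r) for x, r in sorted(groups.items()) if len(r) > 1}


rng = random.Random(7)
xs = [x for x in (2, 8, 12, 16) for _ in range(40)]          # hypothetical X values
homo = [20 + 2.5 * x + rng.gauss(0, 5) for x in xs]          # error SD = 5 everywhere
hetero = [20 + 2.5 * x + rng.gauss(0, 0.5 * x) for x in xs]  # error SD grows with X
for name, ys in (("homoskedastic", homo), ("heteroskedastic", hetero)):
    sds = residual_sd_by_x(xs, ys)
    print(name, {x: round(sd, 1) for x, sd in sds.items()})
```

In the first data set the residual SDs are similar at every X; in the second they climb steadily with X, the numeric counterpart of the fanning-out pattern in the heteroskedasticity plot.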

11 Bivariate Regression Assumptions
Notes/Comments:
1. Overall, regression is fairly robust to violations of its assumptions. It often gives reasonable results even when the assumptions aren’t perfectly met.
2. Variations of regression can handle situations where the assumptions aren’t met.
3. But there are also further diagnostics to help ensure that results are meaningful.

12 Regression Hypothesis Tests
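If the assumptions are met, the sampling distribution of the slope (b) is approximately normal, and the slope divided by its estimated standard error follows a t-distribution.
The standard deviation of the sampling distribution is called the standard error of the slope ($\sigma_b$).
Population formula of the standard error:
$$\sigma_b = \sqrt{\frac{\sigma_e^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}}$$
where $\sigma_e^2$ is the variance of the regression error.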
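The population formula can be evaluated directly when σ_e is known (in practice it rarely is; the value of 10 below is hypothetical, as are the X values). The sketch shows one consequence of the formula: with the same error variance, spreading X out more widely, or collecting more observations, makes the standard error of the slope smaller.

```python
import math


def population_se_slope(sigma_e, xs):
    """sigma_b = sqrt(sigma_e^2 / sum((x - x_bar)^2)) for a known error SD sigma_e."""
    x_bar = sum(xs) / len(xs)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(sigma_e ** 2 / s_xx)


sigma_e = 10.0                     # hypothetical error SD
narrow = [10, 11, 12, 13, 14] * 4  # 20 X values, narrow spread
wide = [4, 8, 12, 16, 20] * 4      # 20 X values, wide spread
doubled = wide * 2                 # 40 X values, wide spread
for label, xs in (("narrow", narrow), ("wide", wide), ("wide, 2x N", doubled)):
    print(label, round(population_se_slope(sigma_e, xs), 4))
```

Doubling N doubles the sum of squared deviations, so the standard error shrinks by a factor of √2 rather than by half.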

13 Regression Hypothesis Tests
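Estimating $\sigma_e^2$ lets us estimate the standard error:
$$s_e^2 = \frac{\sum_{i=1}^{N}(Y_i - \hat{Y}_i)^2}{N-2}$$
Now we can estimate the S.E. of the slope:
$$s_b = \sqrt{\frac{s_e^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}}$$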
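Both estimation steps fit in a short function. The data set is a small hypothetical one (not the education and prestige data from the slides), and `slope_standard_error` is a name invented for this sketch. It fits the least-squares line, divides the sum of squared residuals by N − 2 to get s_e², and then plugs s_e² into the standard-error formula.

```python
import math


def slope_standard_error(xs, ys):
    """Return (b, s_e^2, s_b) for a bivariate least-squares regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared errors
    s_e2 = sse / (n - 2)                                        # estimate of sigma_e^2
    s_b = math.sqrt(s_e2 / s_xx)                                # estimated S.E. of the slope
    return b, s_e2, s_b


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
b, s_e2, s_b = slope_standard_error(xs, ys)
print(f"b = {b:.3f}, s_e^2 = {s_e2:.3f}, s_b = {s_b:.3f}")
```

Here the residuals sum to 2.4 when squared, so s_e² = 2.4 / 3 = 0.8 and s_b = √(0.8 / 10) ≈ 0.283.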

14 Regression Hypothesis Tests
Finally, a t-value can be calculated. It is the slope divided by the standard error:
$$t_{N-2} = \frac{b}{s_b}$$
where $s_b$ is the sample point estimate of the standard error. The t-value is based on N-2 degrees of freedom.
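The t-value and its degrees of freedom follow directly. In this sketch the critical value is passed in by hand (it would come from a t-table or a statistics library); the inputs b = 2.5, s_b = 0.10, and N = 104 (so 102 d.f.) are the slope, S.E., and degrees of freedom from the confidence-interval example, and `slope_t_test` is a hypothetical helper name.

```python
def slope_t_test(b, s_b, n, t_critical):
    """Two-tailed test of H0: beta = 0. Returns (t, df, reject_h0)."""
    t = b / s_b   # slope divided by its standard error
    df = n - 2    # bivariate regression uses N - 2 degrees of freedom
    return t, df, abs(t) > t_critical


# t_critical = 1.98 approximates the two-tailed 5% value for about 100 d.f.
t, df, reject = slope_t_test(b=2.5, s_b=0.10, n=104, t_critical=1.98)
print(f"t = {t:.1f} on {df} d.f.; reject H0: {reject}")
```

A t-value of 25 is far beyond the critical value, so H0: β = 0 would be rejected at α = .05.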

15 Regression Confidence Intervals
You can also use the standard error of the slope to estimate confidence intervals:
$$b \pm (t_{N-2})(s_b)$$
where $t_{N-2}$ is the critical t-value for a two-tailed test at the desired α-level.
Example: Observed slope = 2.5, S.E. = 0.10
The 95% t-value for 102 d.f. is approximately 2.
95% C.I. = 2.5 ± 2(0.10)
Confidence interval: 2.3 to 2.7
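The interval arithmetic from the example can be written as a one-line helper (the name `slope_confidence_interval` is made up for this sketch). It uses the slide's rounded t-value of 2.

```python
def slope_confidence_interval(b, s_b, t_critical):
    """Confidence interval b +/- t_critical * s_b for the population slope."""
    margin = t_critical * s_b
    return b - margin, b + margin


# Slide example: b = 2.5, S.E. = 0.10, 95% t-value for 102 d.f. rounded to 2
low, high = slope_confidence_interval(2.5, 0.10, 2.0)
print(f"95% C.I.: {low:.1f} to {high:.1f}")  # prints "95% C.I.: 2.3 to 2.7"
```

Using the more precise table value (about 1.98) changes the limits only in the third decimal place.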

