The “Big Picture” (from Heath 1995)
Simple Linear Regression
History: Simple linear regression was developed by Sir Francis Galton (1822-1911) in his 1886 article "Regression towards mediocrity in hereditary stature."
Purpose:
To describe the linear relationship between two continuous variables: the response variable Y and a single predictor variable X.
To determine how much of the variation in Y can be "explained" by the linear relationship with X.
To predict new values of Y from new values of X.
The linear regression model is:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

Xᵢ and Yᵢ are paired observations (i = 1 to n). β₀ is the population intercept (the value of Yᵢ when Xᵢ = 0). β₁ is the population slope (the change in Yᵢ per unit change in Xᵢ). εᵢ is the random, or unexplained, error associated with the ith observation. The εᵢ are assumed to be independent and distributed as N(0, σ²).
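The slides carry no code, but a minimal Python sketch (with made-up parameter values, chosen only for illustration) shows what sampling from this model looks like:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population parameters, chosen only for illustration
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50

x = rng.uniform(0.0, 10.0, size=n)     # X_i: fixed predictor values, measured without error
eps = rng.normal(0.0, sigma, size=n)   # eps_i: independent N(0, sigma^2) errors
y = beta0 + beta1 * x + eps            # Y_i = beta0 + beta1 * X_i + eps_i
```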
Assumptions:
The linear model correctly describes the functional relationship between X and Y.
The Xᵢ are known constants and are measured without error.
For a given value of X, the sampled Y values are independent with normally distributed errors.
Variances are constant along the regression line.
[Figure: the linear relationship between X and Y, showing the intercept β₀, the slope β₁ (the rise in Y per 1.0 unit of X), and the error εᵢ separating an observation (Xᵢ, Yᵢ) from the line]
Linear models may approximate non-linear functions over a limited domain. Interpolation (predicting within the range of the observed X values) is then reasonably safe; extrapolation (predicting beyond that range) is not, because the linear approximation can break down.
For a given value of X, the sampled Y values are independent with normally distributed errors:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

$$E(\varepsilon_i) = 0 \quad\Rightarrow\quad E(Y_i) = \beta_0 + \beta_1 X_i$$

[Figure: normal curves of Y centered on the regression line at X₁ and X₂, with means E(Y₁) and E(Y₂)]
Fitting data to a linear model: the vertical distance between an observed value Yᵢ and the fitted value Ŷᵢ at Xᵢ is the residual,

$$Y_i - \hat{Y}_i = \varepsilon_i \quad \text{(residual)}$$
The squared residual: $(Y_i - \hat{Y}_i)^2$

The residual sum of squares: $$SS_{\text{residual}} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Estimating Regression Parameters

The "best fit" estimates of the regression population parameters (β₀ and β₁) are the values that minimize the residual sum of squares (SS_residual) between each observed value and the value predicted by the model:

$$SS_{\text{residual}} = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n}\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2$$
Sum of squares of X: $$SS_X = \sum_{i=1}^{n} (X_i - \bar{X})^2$$

Sum of cross products: $$SS_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$
Least-squares estimate for the slope parameter: $$\hat{\beta}_1 = \frac{SS_{XY}}{SS_X}$$
Sample variance of X: $$s_X^2 = \frac{SS_X}{n-1}$$

Sample covariance: $$s_{XY} = \frac{SS_{XY}}{n-1}$$

so the slope estimate can equivalently be written $\hat{\beta}_1 = s_{XY} / s_X^2$.
Solving for the intercept: $$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

Thus, our estimated regression equation is: $$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
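A sketch of these estimators in code (the six data points are made up for illustration; numpy's polyfit or scipy's linregress would return the same estimates):

```python
import numpy as np

# Made-up example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.1, 12.0])

ss_x  = np.sum((x - x.mean()) ** 2)               # sum of squares of X
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))   # sum of cross products

b1 = ss_xy / ss_x                 # least-squares slope
b0 = y.mean() - b1 * x.mean()     # intercept: the line passes through (X-bar, Y-bar)

print(f"Y-hat = {b0:.3f} + {b1:.3f} X")
```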
Hypothesis Tests with Regression

The null hypothesis is that there is no linear relationship between X and Y:

H₀: β₁ = 0, i.e., Yᵢ = β₀ + εᵢ
Hₐ: β₁ ≠ 0, i.e., Yᵢ = β₀ + β₁Xᵢ + εᵢ

We can use an F-ratio (i.e., a ratio of variances) to test these hypotheses.
Variance of the error of regression: $$MS_{\text{residual}} = \frac{SS_{\text{residual}}}{n-2} = \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}$$

NOTE: this is also referred to as the residual variance, the mean squared error (MSE), or the residual mean square (MS_residual).
Mean square of regression: $$MS_{\text{regression}} = \frac{SS_{\text{regression}}}{1}$$

The F-ratio is $MS_{\text{regression}} / MS_{\text{residual}}$. Under H₀, this ratio follows the F-distribution with (1, n-2) degrees of freedom.
ANOVA table for regression:

Source       df      Sum of squares    Mean square                Expected mean square    F-ratio
Regression   1       SS_regression     SS_regression / 1          σ² + β₁²·SS_X           MS_regression / MS_residual
Residual     n - 2   SS_residual       SS_residual / (n - 2)      σ²
Total        n - 1   SS_total
Publication form of ANOVA table for regression:

Source       Sum of squares   df   Mean square   F-ratio   P-value
Regression   2.168            1    2.168         21.118    0.00035
Residual     1.540            15   0.103
Total        3.708            16
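A sketch of how such a table can be computed in Python (the data are the made-up values from the earlier sketch, not the data behind the table above; scipy supplies the F-distribution tail probability):

```python
import numpy as np
from scipy import stats

# Made-up example data (not the data behind the published table)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.1, 12.0])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_total = np.sum((y - y.mean()) ** 2)     # df = n - 1
ss_resid = np.sum((y - y_hat) ** 2)        # df = n - 2
ss_reg   = ss_total - ss_resid             # df = 1

ms_reg   = ss_reg / 1
ms_resid = ss_resid / (n - 2)              # the MSE
f_ratio  = ms_reg / ms_resid
p_value  = stats.f.sf(f_ratio, 1, n - 2)   # upper-tail P-value of F(1, n-2)

print(f"F = {f_ratio:.3f}, P = {p_value:.5f}")
```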
Variance components and the Coefficient of Determination (r² or R²)

The total variation in Y partitions into the part explained by the regression and the residual part:

$$SS_{\text{total}} = SS_{\text{regression}} + SS_{\text{residual}}$$
Coefficient of determination: $$r^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}} = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}$$

the proportion of the variation in Y explained by the linear relationship with X.
Pearson's product-moment correlation coefficient (r): $$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{s_{XY}}{s_X s_Y}$$
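Both quantities in a few lines of Python (same made-up data as the earlier sketches):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.1, 12.0])

ss_x  = np.sum((x - x.mean()) ** 2)
ss_y  = np.sum((y - y.mean()) ** 2)
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))

r  = ss_xy / np.sqrt(ss_x * ss_y)   # Pearson product-moment correlation
r2 = r ** 2                         # coefficient of determination
print(f"r = {r:.3f}, r^2 = {r2:.3f}")
```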
Parametric Confidence Intervals

If we assume our parameter of interest has a particular sampling distribution, and we have estimated its expected value and variance, we can construct a confidence interval for a given percentile.

Example: if we assume Y is a normal random variable with unknown mean μ and variance σ², then

$$Z = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}}$$

is distributed as a standard normal variable. But since we don't know σ, we must divide by the estimated standard error instead:

$$t = \frac{\bar{Y} - \mu}{s / \sqrt{n}}$$

giving us a t-distribution with (n-1) degrees of freedom. The 100(1-α)% confidence interval for μ is then given by:

$$\bar{Y} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$$

IMPORTANT: this does not mean "there is a 100(1-α)% chance that the true population mean μ occurs inside this interval." It means that if we were to repeatedly sample the population in the same way, 100(1-α)% of the confidence intervals so constructed would contain the true population mean μ.
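A sketch of the t-based interval for μ (made-up sample; scipy's t quantile function assumed):

```python
import numpy as np
from scipy import stats

y = np.array([2.1, 3.9, 6.2, 7.8, 9.1, 12.0])   # made-up sample
n = len(y)
alpha = 0.05

se = y.std(ddof=1) / np.sqrt(n)                  # estimated standard error of the mean
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # two-sided critical value, n-1 df

lo, hi = y.mean() - t_crit * se, y.mean() + t_crit * se
print(f"{100 * (1 - alpha):.0f}% CI for mu: ({lo:.2f}, {hi:.2f})")
```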
Variance of the estimated slope: $$s^2_{\hat{\beta}_1} = \frac{MS_{\text{residual}}}{SS_X}$$

Since $\frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}}$ is distributed as a t-distribution with (n-2) df, we can generate a 100(1-α)% confidence interval for β₁ as follows:

$$\hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \, s_{\hat{\beta}_1}$$
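The same recipe for the slope, as a sketch (made-up data again):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.1, 12.0])
n = len(x)

ss_x = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_x
b0 = y.mean() - b1 * x.mean()
ms_resid = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

se_b1 = np.sqrt(ms_resid / ss_x)            # standard error of the slope
t_crit = stats.t.ppf(0.975, df=n - 2)       # 95% two-sided critical value
print(f"95% CI for beta1: ({b1 - t_crit * se_b1:.3f}, {b1 + t_crit * se_b1:.3f})")
```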
Variance of the estimated intercept: $$s^2_{\hat{\beta}_0} = MS_{\text{residual}} \left( \frac{1}{n} + \frac{\bar{X}^2}{SS_X} \right)$$
Variance of the fitted value Ŷ at X = X₀: $$s^2_{\hat{Y}} = MS_{\text{residual}} \left( \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_X} \right)$$
Variance of the predicted value Ỹ (a single new observation at X = X₀): $$s^2_{\tilde{Y}} = MS_{\text{residual}} \left( 1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_X} \right)$$
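A sketch contrasting the two variances at an arbitrary new value x0 = 3.5 (made-up data; the interval for a single new observation is wider because of the extra "1 +" term):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.1, 12.0])
n = len(x)

ss_x = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_x
b0 = y.mean() - b1 * x.mean()
ms_resid = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

x0 = 3.5                                    # arbitrary new X value
y0 = b0 + b1 * x0
lev = 1.0 / n + (x0 - x.mean()) ** 2 / ss_x

se_fit  = np.sqrt(ms_resid * lev)           # SE of the fitted mean Y-hat at x0
se_pred = np.sqrt(ms_resid * (1.0 + lev))   # SE of a single new observation at x0

t_crit = stats.t.ppf(0.975, df=n - 2)
print(f"95% CI for E(Y|x0): {y0:.2f} +/- {t_crit * se_fit:.2f}")
print(f"95% PI for new Y:   {y0:.2f} +/- {t_crit * se_pred:.2f}")
```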
THURS.: Confidence Intervals and Prediction Intervals
[Figure: residual plot for the species-area relationship]