Presentation on theme: "The “Big Picture” (from Heath 1995). Simple Linear Regression." — Presentation transcript:

1 The “Big Picture” (from Heath 1995)

2 Simple Linear Regression

3 History: Developed by Sir Francis Galton (1822-1911) in his article “Regression towards mediocrity in hereditary stature.”

4 Purpose: To describe the linear relationship between two continuous variables, the response variable Y and a single predictor variable X; to determine how much of the variation in Y can be “explained” by the linear relationship with X; and to predict new values of Y from new values of X.

5

6 The linear regression model is: Yᵢ = β₀ + β₁Xᵢ + εᵢ. Xᵢ and Yᵢ are paired observations (i = 1 to n); β₀ = population intercept (the value of Yᵢ when Xᵢ = 0); β₁ = population slope (measures the change in Yᵢ per unit change in Xᵢ); εᵢ = the random or unexplained error associated with the i-th observation. The εᵢ are assumed to be independent and distributed as N(0, σ²).
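
As a sketch of what the model means, it can be simulated directly. The following Python snippet is not from the original slides; the parameter values and the use of numpy are assumptions, chosen only to illustrate the roles of β₀, β₁, and εᵢ:

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, sigma = 2.0, 0.5, 1.0        # hypothetical population values
X = np.linspace(0.0, 10.0, 20)             # fixed X values, measured without error
eps = rng.normal(0.0, sigma, size=X.size)  # eps_i ~ N(0, sigma^2), independent
Y = beta0 + beta1 * X + eps                # Y_i = beta0 + beta1*X_i + eps_i
```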

7 Assumptions: The linear model correctly describes the functional relationship between X and Y. The Xᵢ are known constants and are measured without error. For a given value of X, the sampled Y values are independent with normally distributed errors. Variances are constant along the regression line.

8 Linear relationship (figure: regression line with intercept β₀, slope β₁ shown as the rise per 1.0 unit of X, and residual εᵢ separating an observed point (Xᵢ, Yᵢ) from the line).

9 Linear models may approximate non-linear functions over a limited domain (figure: interpolation within the range of the data vs. extrapolation beyond it).

10 For a given value of X, the sampled Y values are independent with normally distributed errors: Yᵢ = β₀ + β₁Xᵢ + εᵢ with ε ~ N(0, σ²), so E(εᵢ) = 0 and E(Yᵢ) = β₀ + β₁Xᵢ. (Figure: normal error distributions centered on the expected values E(Y₁) and E(Y₂) at X₁ and X₂.)

11 Fitting data to a linear model: the residual for the i-th observation is εᵢ = Yᵢ − Ŷᵢ, the vertical distance between the observed value Yᵢ and the fitted value Ŷᵢ at Xᵢ.

12 The squared residual: εᵢ² = (Yᵢ − Ŷᵢ)². The residual sum of squares: SS_residual = Σ (Yᵢ − Ŷᵢ)².
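
A minimal sketch of these two quantities in Python; the data and the candidate parameter values below are hypothetical, used only for illustration:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
b0, b1 = 1.0, 1.0                        # candidate parameter values
Y_hat = b0 + b1 * X                      # fitted values
residuals = Y - Y_hat                    # eps_i = Y_i - Yhat_i
ss_residual = np.sum(residuals ** 2)     # residual sum of squares
```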

13 Estimating Regression Parameters: The “best fit” estimates for the population regression parameters (β₀ and β₁) are the values that minimize the residual sum of squares (SS_residual) between each observed value and the value predicted by the model.

14 Sum of squares of X: SSX = Σ (Xᵢ − X̄)². Sum of cross products: SSXY = Σ (Xᵢ − X̄)(Yᵢ − Ȳ).

15 Least-squares estimate for the slope parameter: b₁ = SSXY / SSX = Σ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σ (Xᵢ − X̄)².

16 Sample variance of X: s²_X = SSX / (n − 1). Sample covariance: s_XY = SSXY / (n − 1). Thus the slope estimate can also be written b₁ = s_XY / s²_X.

17 Solving for the intercept: b₀ = Ȳ − b₁X̄. Thus, our estimated regression equation is: Ŷᵢ = b₀ + b₁Xᵢ.
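
To make the estimation concrete, here is a sketch in Python (hypothetical data) that computes SSX, SSXY, and the least-squares estimates, and checks the equivalent covariance/variance form of the slope from the previous slide:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

ssx = np.sum((X - X.mean()) ** 2)                # sum of squares of X
ssxy = np.sum((X - X.mean()) * (Y - Y.mean()))   # sum of cross products
b1 = ssxy / ssx                      # least-squares slope estimate
b0 = Y.mean() - b1 * X.mean()        # intercept: line passes through (X-bar, Y-bar)

# Equivalent form: slope = sample covariance / sample variance of X
assert np.isclose(b1, np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1))
```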

18 Hypothesis Tests with Regression: The null hypothesis is that there is no linear relationship between X and Y: H₀: β₁ = 0 → Yᵢ = β₀ + εᵢ; Hₐ: β₁ ≠ 0 → Yᵢ = β₀ + β₁Xᵢ + εᵢ. We can use an F-ratio (i.e., the ratio of variances) to test these hypotheses.

19 Variance of the error of regression: MS_residual = SS_residual / (n − 2). NOTE: this is also referred to as the residual variance, mean squared error (MSE), or residual mean square (MS_residual).

20 Mean square of regression: MS_regression = SS_regression / 1. The F-ratio is (MS_regression)/(MS_residual). Under H₀, this ratio follows the F-distribution with (1, n − 2) degrees of freedom.
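
A sketch of the whole F-test in Python on hypothetical data; the use of scipy for the F-distribution tail probability is an assumption, not something the slides prescribe:

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = X.size

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

ss_reg = np.sum((Y_hat - Y.mean()) ** 2)   # regression sum of squares
ss_res = np.sum((Y - Y_hat) ** 2)          # residual sum of squares
ms_reg = ss_reg / 1                        # regression mean square (df = 1)
ms_res = ss_res / (n - 2)                  # residual mean square (MSE)
F = ms_reg / ms_res
p = stats.f.sf(F, 1, n - 2)                # upper-tail P-value, df = (1, n-2)
```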

21 ANOVA table for regression

Source       df     Sum of squares            Mean square               Expected mean square   F-ratio
Regression   1      SS_reg = Σ(Ŷᵢ − Ȳ)²       MS_reg = SS_reg / 1       σ² + β₁²·SSX           MS_reg / MS_res
Residual     n−2    SS_res = Σ(Yᵢ − Ŷᵢ)²      MS_res = SS_res / (n−2)   σ²
Total        n−1    SS_total = Σ(Yᵢ − Ȳ)²

22 Publication form of ANOVA table for regression

Source       Sum of Squares   df   Mean Square   F-ratio   P-value
Regression   2.168            1    2.168         21.118    0.00035
Residual     1.540            15   0.103
Total        3.708            16

23 Variance components and the Coefficient of Determination (r² or R²): the total sum of squares partitions into the two components, SS_total = SS_regression + SS_residual.

24 Coefficient of determination: r² = SS_regression / SS_total = 1 − SS_residual / SS_total, the proportion of the variation in Y explained by the linear regression on X.

25 Pearson’s product-moment correlation coefficient (r): r = SSXY / √(SSX·SSY) = s_XY / (s_X·s_Y); its square equals the coefficient of determination.
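
A sketch verifying, on hypothetical data, that the coefficient of determination equals the square of Pearson’s r:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
Y_hat = Y.mean() + b1 * (X - X.mean())   # fitted values

ss_total = np.sum((Y - Y.mean()) ** 2)
ss_res = np.sum((Y - Y_hat) ** 2)
r2 = 1.0 - ss_res / ss_total          # proportion of variation "explained"
r = np.corrcoef(X, Y)[0, 1]           # Pearson's product-moment correlation
assert np.isclose(r2, r ** 2)         # r^2 equals the squared correlation
```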

26 Parametric Confidence Intervals: If we assume our parameter of interest has a particular sampling distribution, and we have estimated its expected value and variance, we can construct a confidence interval for a given percentile. Example: if we assume Y is a normal random variable with unknown mean μ and variance σ², then Z = (Ȳ − μ)/(σ/√n) is distributed as a standard normal variable. But since we don’t know σ, we must divide by the standard error instead: t = (Ȳ − μ)/(s/√n), giving us a t-distribution with (n − 1) degrees of freedom. The 100(1 − α)% confidence interval for μ is then given by: Ȳ ± t_{α/2, n−1}·s/√n. IMPORTANT: this does not mean “there is a 100(1 − α)% chance that the true population mean μ occurs inside this interval.” It means that if we were to repeatedly sample the population in the same way, 100(1 − α)% of the confidence intervals so constructed would contain the true population mean μ.
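
A sketch of this t-based interval for a mean, using a hypothetical sample and scipy’s t-distribution (library choice assumed, not from the slides):

```python
import numpy as np
from scipy import stats

Y = np.array([4.1, 5.2, 3.8, 4.9, 5.5, 4.4])   # hypothetical sample
n = Y.size
se = Y.std(ddof=1) / np.sqrt(n)                # standard error of the mean
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # two-sided critical value
ci = (Y.mean() - t_crit * se, Y.mean() + t_crit * se)  # 95% CI for mu
```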

27 Variance of the estimated slope: Var(b₁) = MS_residual / SSX. Since (b₁ − β₁)/SE(b₁) is distributed as a t-distribution with (n − 2) df, we can generate a 100·(1 − α)% confidence interval for β₁ as follows: b₁ ± t_{α/2, n−2}·√(MS_residual / SSX).
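
A sketch of the slope’s standard error and confidence interval on hypothetical data:

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = X.size

ssx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / ssx
b0 = Y.mean() - b1 * X.mean()
mse = np.sum((Y - (b0 + b1 * X)) ** 2) / (n - 2)   # residual mean square

se_b1 = np.sqrt(mse / ssx)                    # standard error of the slope
t_crit = stats.t.ppf(0.975, df=n - 2)         # alpha = 0.05, two-sided
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # 95% CI for beta1
```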

28 Variance of the estimated intercept: Var(b₀) = MS_residual·(1/n + X̄²/SSX).

29 Variance of the fitted value Ŷ at X = X₀: Var(Ŷ) = MS_residual·(1/n + (X₀ − X̄)²/SSX).

30 Variance of the predicted value (Ỹ) for a single new observation at X = X₀: Var(Ỹ) = MS_residual·(1 + 1/n + (X₀ − X̄)²/SSX).
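
A sketch contrasting the two variances on hypothetical data: the interval for the fitted mean response uses Var(Ŷ), while the wider prediction interval for a single new observation uses Var(Ỹ):

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = X.size
ssx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / ssx
b0 = Y.mean() - b1 * X.mean()
mse = np.sum((Y - (b0 + b1 * X)) ** 2) / (n - 2)

x0 = 3.5                                  # a new X value (hypothetical)
y0 = b0 + b1 * x0
se_fit = np.sqrt(mse * (1/n + (x0 - X.mean())**2 / ssx))       # for E(Y | x0)
se_pred = np.sqrt(mse * (1 + 1/n + (x0 - X.mean())**2 / ssx))  # for a new Y
t_crit = stats.t.ppf(0.975, df=n - 2)
conf_int = (y0 - t_crit * se_fit, y0 + t_crit * se_fit)    # CI for the mean
pred_int = (y0 - t_crit * se_pred, y0 + t_crit * se_pred)  # wider PI for one obs
```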

31 THURS.: Confidence Intervals and Prediction Intervals

32 Residual plot for species-area relationship (figure).

