Published by Bertina Charles. Modified over 6 years ago.
1
Topic 13: Quantitative-Quantitative Association, Part 1
Introduction to linear regression
Finding the best fit line by least squares regression
Linear Regression and Outliers
2
Introduction to linear regression
3
Poverty vs. high school graduation rate
The scatterplot below shows the relationship between the high school graduation rate in all 50 US states and DC and the percentage of residents who live below the poverty line (income below $23,050 for a family of 4 in 2012).
Explanatory variable? % HS grad
Response variable? % in poverty
Relationship? Linear, negative, moderately strong
4
Quantifying the relationship
The correlation coefficient (r) describes the strength of the linear association between two quantitative variables.
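As a minimal sketch of how r can be computed by hand, the Python below uses only the standard library. The function name `pearson_r` and the five data points are assumptions made up for illustration; they are not the state data shown on the slides.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values (NOT the real state data).
hs_grad = [80.0, 82.0, 85.0, 88.0, 90.0]
poverty = [17.0, 15.0, 13.5, 11.0, 9.0]
r = pearson_r(hs_grad, poverty)
print(round(r, 3))  # negative, because poverty falls as graduation rises
```

The sign of r gives the direction of the association and its size gives the strength of the linear pattern.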
5
Strong, Moderate, or Weak?
Interpreting r requires knowledge of one's field. A value of r that implies a strong relationship in one field may not in another. The table below serves as a starting point until you learn more about your own field.

Strength      Positive association   Negative association
Very strong   0.8 to 1.0             -0.8 to -1.0
Strong        0.6 to 0.8             -0.6 to -0.8
Moderate      0.4 to 0.6             -0.4 to -0.6
Weak          0.2 to 0.4             -0.2 to -0.4
Very weak     0.0 to 0.2             0.0 to -0.2
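The starting-point table can be written as a small lookup, sketched below. The function name `describe_strength` is hypothetical. Because the ranges in the table share their endpoints, this sketch assumes a boundary value such as 0.8 goes to the stronger category; the slides do not say which way to break such ties.

```python
def describe_strength(r):
    """Return (strength, direction) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    # Assumption: shared endpoints are assigned to the stronger label.
    if size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "very weak"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_strength(-0.75))  # ('strong', 'negative')
```

The range check also shows why a value such as -1.5 can never be a correlation coefficient.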
6
Guessing the correlation
Which of the following is the best guess for the correlation coefficient between % in poverty and % HS grad?
0.6
-0.75
-0.1
0.02
-1.5
7
Guessing the correlation
Which of the following is the best guess for the correlation between % in poverty and % female householder with no husband present?
0.1
-0.6
-0.4
0.9
0.5
8
Assessing the correlations
Which of the following has the strongest correlation, that is, a correlation coefficient closest to 1 or -1?
9
Finding the best fit line by least squares regression
10
Residuals
Residuals are the vertical differences between the observations and the line: the observed value of the response minus the value the line predicts.
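A short sketch of the residual calculation, taking each residual as observed response minus the value predicted by the line. The helper name `residuals`, the line (intercept 0, slope 2), and the three data points are all made up for illustration.

```python
def residuals(xs, ys, intercept, slope):
    """Observed y minus the y predicted by the line intercept + slope * x."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up points and a made-up line, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.5, 3.5, 6.0]
res = residuals(xs, ys, intercept=0.0, slope=2.0)
print(res)  # [0.5, -0.5, 0.0]
```

A positive residual means the point lies above the line, a negative one means it lies below, and zero means it lies on the line.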
11
Method of least squares
We find the line that minimizes the sum of the squares of the residuals. Consider the GeoGebra applet.
Conditions for the least squares line:
(1) Linearity
(2) Nearly normal residuals
(3) Constant variability
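The least squares criterion on this slide can be sketched in a few lines of Python. The closed-form slope and intercept used here are the standard textbook formulas, which the slides do not state; the function names and the four data points are assumptions for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical differences between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
b0, b1 = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
nudged = sum_squared_residuals(xs, ys, b0, b1 + 0.1)  # any other line does worse
print(b0, b1, best, nudged)
```

Changing the slope or the intercept away from the fitted values increases the sum of squared residuals, which is what "best fit" means here.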
12
Conditions: (1) Linearity
The relationship between the explanatory and the response variable should be approximately linear.
13
Conditions: (2) Nearly normal residuals
The residuals should be nearly normal.
14
Conditions: (3) Constant variability
The variability of the points around the least squares line should be roughly constant.
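The slides check this condition by eye on a plot. As a rough numerical companion, the sketch below compares the spread of the residuals in the lower and upper halves of the explanatory variable. This is only an informal heuristic, not a formal test; the helper name `spread_by_half` and the fan-shaped residuals are made up for illustration.

```python
import statistics

def spread_by_half(xs, residuals):
    """Standard deviation of the residuals for the lower and upper half of x."""
    pairs = sorted(zip(xs, residuals))          # order the residuals by x
    mid = len(pairs) // 2
    lower = [e for _, e in pairs[:mid]]
    upper = [e for _, e in pairs[mid:]]
    return statistics.pstdev(lower), statistics.pstdev(upper)

# Made-up residuals that fan out as x grows (non-constant variability).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
fan = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
low_sd, high_sd = spread_by_half(xs, fan)
print(low_sd, high_sd)  # the second value is much larger than the first
```

Two very different spreads suggest the constant variability condition deserves a closer look on a residual plot.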
15
Checking conditions
What condition is this model obviously violating?
Constant variability
Linear relationship
Normal residuals
16
Checking conditions
What condition is this model obviously violating?
Constant variability
Linear relationship
Normal residuals
17
r²
The strength of the fit of a linear model is most commonly evaluated using r², the square of the correlation coefficient. It tells us what percentage of the variability in the response variable is explained by the model. The remainder of the variability is due to other variables not included in the model or to inherent randomness in the data.
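To connect the two descriptions of r² (the square of the correlation coefficient, and the share of variability explained), the sketch below computes it both ways for a least squares line. The function name `r_squared_two_ways` and the data are made up for illustration; the formula 1 - SSE/SST is the standard one and is not stated on the slides.

```python
import math

def r_squared_two_ways(xs, ys):
    """Return (1 - SSE/SST, r squared) for the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)   # total variability in the response (SST)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variability left over after fitting the line (SSE).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    explained_share = 1 - sse / syy
    r = sxy / math.sqrt(sxx * syy)
    return explained_share, r ** 2

# Made-up data, for illustration only.
from_fit, from_r = r_squared_two_ways([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0])
print(from_fit, from_r)  # the two values agree up to rounding
```

For a least squares line with one explanatory variable the two numbers always coincide, which is why r² can be read as "percent of variability explained".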
18
Interpretation of r²
r = -0.62, r² = 0.38
38% of the variability in the % of residents living in poverty among the 50 states and DC is explained by the model.
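The arithmetic behind the 38% figure, written out as a worked step (the rounding to two decimals is an assumption about how the slide's value was obtained):

```latex
r^2 = (-0.62)^2 = 0.3844 \approx 0.38
```

Squaring removes the sign, so r² alone does not say whether the association is positive or negative.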
19
Linear Regression and Outliers
20
Outliers and direction of the association
Data are available on the surface temperature and light intensity of 47 stars in the star cluster CYG OB1.
21
Outliers and association strength
r = 0.08, r² =
r = 0.79, r² =
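A minimal sketch of how a few outliers can change the strength, and even the direction, of a correlation. The data below are made up for illustration and are not the star measurements; `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(points):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: five points on a rising trend, plus two outliers far to the
# upper left (NOT the real star data).
main = [(4.0, 4.0), (5.0, 5.1), (6.0, 5.9), (7.0, 7.2), (8.0, 8.0)]
outliers = [(1.0, 9.0), (1.5, 9.5)]

r_main = pearson_r(main)
r_all = pearson_r(main + outliers)
print(round(r_main, 2), round(r_all, 2))  # strongly positive vs. negative
```

With the two outliers included, the correlation turns negative even though most points follow a clear positive trend, so it is worth examining the scatterplot before trusting r.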