Published by Hillary Kennedy. Modified over 9 years ago.
1
Chapter 20 Linear Regression
2
What if… we believe that an important relation exists between two measures? For example, we ask 5 people about their salary and education level. For each observation we have two measures, and both measures come from the same person.
3
What would we “predict”? Does more education mean more salary? Does more salary mean more education? Does more education mean less salary? Does more salary mean less education? Are salary and education related?
4
Regression
Descriptive vs. Inferential
Bivariate data: measurements on two variables for each observation
–Heights (X) and weights (Y)
–IQ (X) and SAT (Y) scores
–Years of education (X) and annual salary (Y)
–Number of policemen (X) and number of crimes (Y) in US cities
5
Regression
How are the two sets of scores related? Using a scatterplot, we can “look” at the relationship. A scatterplot is constructed by plotting each of the bivariate observations (X, Y).
6
Regression
Which one’s X and which one’s Y? That’s up to you, but generally the X variable is thought of as the “predictor” variable: we try to predict a Y score given an X score.
7
Regression
If the scores seem to “line up,” we call this a “linear relationship.”
8
Interpreting Scatterplots
If the following relations hold:
–low X, high Y
–mid X, mid Y
–high X, low Y
we have “a negative linear relationship.”
9
Interpreting Scatterplots
If the following relations hold:
–low X, low Y
–mid X, mid Y
–high X, high Y
we have “a positive linear relationship.”
10
Interpreting Scatterplots
However, there can also be “no relation.”
11
Interpreting Scatterplots
Curvilinear: the points follow a curve rather than a straight line.
12
Measuring Linear Relationships
The first measure of a linear relationship (not in the book) is COVARIANCE ($s_{XY}$).
13
Or… $SP_{XY}$ is known as the “Sum of Products,” that is, the sum of the products of the deviations of X and Y from their means: $SP_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y})$
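The covariance formula itself was an image and is missing from this transcript. As a minimal sketch, the code below computes the sum of products exactly as described in words and then divides by n − 1 to get a sample covariance. The n − 1 divisor is an assumption (some texts divide by n), and treating INCOME as X and NUMDRK as Y, using the values from the practice table later in the deck, is an assumption based on column order.

```python
# Practice data from later in the deck; INCOME as X and NUMDRK as Y is assumed.
income = [1, 6, 5, 4, 6]
numdrk = [1, 2, 8, 1, 3]


def sum_of_products(xs, ys):
    """SP_XY: sum of the products of the deviations of X and Y from their means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def covariance(xs, ys):
    """Sample covariance s_XY = SP_XY / (n - 1); the n - 1 divisor is an assumption."""
    return sum_of_products(xs, ys) / (len(xs) - 1)


sp_xy = sum_of_products(income, numdrk)
s_xy = covariance(income, numdrk)
print(f"SP_XY = {sp_xy:.2f}, s_XY = {s_xy:.2f}")
```

For these five observations the sum of products comes out at about 9 and the covariance at about 2.25, both positive, which points to a positive linear relationship.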
14
Easy Calculation
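The formula for this slide did not survive extraction. A common computational shortcut, which is probably what “easy calculation” refers to (an assumption), gets the sum of products from raw sums so that no deviations need to be computed:

```latex
SP_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y})
        = \sum X_i Y_i - \frac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n}
```

With the sums from the practice table later in the deck (ΣX = 22, ΣY = 15, ΣXY = 75, n = 5), this gives 75 − (22)(15)/5 = 75 − 66 = 9.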
15
Covariance
Interpretation:
–positive = positive linear relationship
–negative = negative linear relationship
–zero = no linear relationship
Magnitude (strength of the relationship)?
–Uninterpretable
–For example, a large covariance does not necessarily mean a strong relationship
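One way to see why the magnitude is uninterpretable is to change the units of one variable. In the sketch below, every X value from the practice table is multiplied by 1,000 (a hypothetical change of units, not something the slides do). The points line up exactly as before, yet the covariance becomes 1,000 times larger. The n − 1 divisor is again an assumption.

```python
def covariance(xs, ys):
    """Sample covariance with an assumed n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sp / (n - 1)


income = [1, 6, 5, 4, 6]   # X, from the practice table
numdrk = [1, 2, 8, 1, 3]   # Y, from the practice table

# Hypothetical change of units: same observations, X expressed 1,000 times larger.
income_rescaled = [x * 1000 for x in income]

cov_original = covariance(income, numdrk)
cov_rescaled = covariance(income_rescaled, numdrk)
print(cov_original, cov_rescaled)
```

The sign of the covariance is unchanged by the rescaling, so the sign stays meaningful while the size depends on the measurement units.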
16
But, we can use covariance
Which line best fits our data? Do we just draw one that looks good? No, we can use something called “least squares regression” to find the equation of the best-fit line (“best-fit linear regression”).
17
Linear Equations
$Y_i = mX_i + b$
–m = slope
–b = y-intercept
18
Finding the Slope
19
Or…
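The slope formulas on these two slides were images and are missing from the transcript. A sketch of the two standard least squares forms, assuming the first slide uses the covariance and the “Or…” slide uses the sum of products. The symbols $s_X^2$ (variance of X) and $SS_X$ (sum of squares of X) are notation assumed here, not confirmed by the slides:

```latex
m = \frac{s_{XY}}{s_X^2}
  = \frac{SP_{XY} / (n - 1)}{SS_X / (n - 1)}
  = \frac{SP_{XY}}{SS_X},
\qquad
SS_X = \sum (X_i - \bar{X})^2
```

The divisors cancel, so both forms give the same slope.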
20
Finding the y-intercept (b)
After finding the slope (m), find b using: $b = \bar{Y} - m\bar{X}$
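The two steps, slope first and then intercept, can be sketched as a small function. This assumes the slope is computed as SP_XY / SS_X and the intercept as the mean of Y minus m times the mean of X, which are the standard least squares formulas. `fit_line` is a hypothetical helper name, and the data are the practice values from later in the deck with INCOME taken as X.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least squares line Y = mX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    m = sp_xy / ss_x          # slope: SP_XY / SS_X
    b = mean_y - m * mean_x   # y-intercept: mean of Y minus m times mean of X
    return m, b


income = [1, 6, 5, 4, 6]   # X
numdrk = [1, 2, 8, 1, 3]   # Y
m, b = fit_line(income, numdrk)
print(f"m = {m:.3f}, b = {b:.3f}")
```

Because b is defined from the two means, the fitted line always passes through the point (mean of X, mean of Y).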
21
Least Squares Criterion
The best line has the property of least squares: the sum of the squared deviations of the points from the line is a minimum.
22
What’s the “least” again?
What are we trying to minimize?
–The best-fit line will be described by the function $Y_i = mX_i + b$
–Thus, for any $X_i$, we can estimate a corresponding $Y_i$ value
–Problem: for some $X_i$’s we already have $Y_i$’s
–So, let’s call the estimated value $\hat{Y}_i$ (“Y-sub-i-hat”), to differentiate it from the “real” $Y_i$
23
Least Squares Criterion
For example, when $X_i = 15$ we would estimate that $\hat{Y}_i = 44{,}000$. But we have a “real” $Y_i$ value corresponding to $X_i = 15$ (35,000).
[Figure: at $X_i = 15$, our estimated Y value is 44,000 against a “real” Y value of 35,000]
24
Minimize this…
For every $X_i$, we have a value $Y_i$ and an estimate of $Y_i$ ($\hat{Y}_i$).
Consider the quantity: $Y_i - \hat{Y}_i$
–Which is the deviation of the real score from the estimated score, for any given $X_i$ value
The sum of these deviations will be zero.
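The claim that the deviations sum to zero can be checked numerically. This sketch assumes the least squares line for the practice data later in the deck, with slope SP_XY / SS_X = 9 / 17.2 and intercept from the two column means (3 and 4.4). The variable names are illustrative.

```python
income = [1, 6, 5, 4, 6]   # X, from the practice table
numdrk = [1, 2, 8, 1, 3]   # Y, from the practice table

m = 9 / 17.2        # slope: SP_XY / SS_X for these data
b = 3 - m * 4.4     # intercept: mean of Y minus m times mean of X

predicted = [m * x + b for x in income]                          # Y-hat values
residuals = [y - y_hat for y, y_hat in zip(numdrk, predicted)]   # Y minus Y-hat

print([round(r, 3) for r in residuals])
print(sum(residuals))   # zero up to floating-point rounding
```

Positive and negative deviations cancel, which is why the raw sum cannot be used to compare candidate lines.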
25
But, by squaring those deviations and summing, we get $\sum (Y_i - \hat{Y}_i)^2$. We want the line that makes this quantity a minimum (the least squares criterion). This is also called the sum of squares error, or SSE (how much do our estimates “err” from our real values?).
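A quick numerical check of the least squares criterion: compute SSE for the fitted line and for a handful of nearby lines, and confirm that every alternative does worse. The nudges applied to the slope and intercept are arbitrary illustrative values, and the fitted line is the one for the practice data later in the deck.

```python
def sse(xs, ys, m, b):
    """Sum of squared deviations of the real Y values from the line's estimates."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


income = [1, 6, 5, 4, 6]   # X, from the practice table
numdrk = [1, 2, 8, 1, 3]   # Y, from the practice table

m = 9 / 17.2        # least squares slope for these data
b = 3 - m * 4.4     # least squares intercept for these data
best = sse(income, numdrk, m, b)

# Nudge the slope and intercept (arbitrary amounts) to get alternative lines.
alternatives = [
    (m + dm, b + db)
    for dm in (-0.1, 0, 0.1)
    for db in (-0.5, 0, 0.5)
    if (dm, db) != (0, 0)
]
worse = [sse(income, numdrk, m2, b2) for m2, b2 in alternatives]

print(round(best, 3), round(min(worse), 3))
```

This only samples a few neighbouring lines, so it illustrates the criterion rather than proving it.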
26
How accurate are our Estimates?
Two ways to measure how “good” our estimates are:
–Standard Error of the Estimate
–Coefficient of Determination (not covered in our book, yet)
27
Standard Error of the Estimate
But this term is very hard to interpret. (Hurrah, there are better ways to measure the goodness of the fit!)
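The formula on this slide is missing from the transcript. A common definition, assumed here, is the square root of SSE divided by n − 2. Some introductory texts divide by n instead, so the divisor should be checked against the course textbook. The function name is hypothetical, and the data are the practice values from later in the deck.

```python
import math


def standard_error_of_estimate(xs, ys, m, b):
    """sqrt(SSE / (n - 2)); the n - 2 divisor is an assumption."""
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


income = [1, 6, 5, 4, 6]   # X, from the practice table
numdrk = [1, 2, 8, 1, 3]   # Y, from the practice table

m = 9 / 17.2        # least squares slope for these data
b = 3 - m * 4.4     # least squares intercept for these data

se = standard_error_of_estimate(income, numdrk, m, b)
print(round(se, 3))
```

The result is in the units of Y, which is part of why it is hard to compare across data sets.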
28
Coefficient of Determination
$cd = r^2$
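As a sketch of how the coefficient of determination might be computed, the code below takes r to be the Pearson correlation, SP_XY / sqrt(SS_X · SS_Y), squares it, and checks that the result equals 1 − SSE / SS_Y, the proportion of variation in Y accounted for by the line. The slide does not define r, so this reading is an assumption. The data are the practice values from later in the deck.

```python
import math

income = [1, 6, 5, 4, 6]   # X, from the practice table
numdrk = [1, 2, 8, 1, 3]   # Y, from the practice table

n = len(income)
mean_x = sum(income) / n
mean_y = sum(numdrk) / n

sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, numdrk))
ss_x = sum((x - mean_x) ** 2 for x in income)
ss_y = sum((y - mean_y) ** 2 for y in numdrk)

r = sp_xy / math.sqrt(ss_x * ss_y)   # Pearson correlation (assumed meaning of r)
r_squared = r ** 2

# Cross-check against the fitted line: r^2 should equal 1 - SSE / SS_Y.
m = sp_xy / ss_x
b = mean_y - m * mean_x
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(income, numdrk))

print(round(r_squared, 3), round(1 - sse / ss_y, 3))
```

Unlike the covariance and the standard error of the estimate, this value has no units and always falls between 0 and 1.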
29
Now You:
ID     INCOME  NUMDRK
2001   1       1
2002   6       2
2003   5       8
2004   4       1
2005   6       3
30
Practice:
ID     INCOME  NUMDRK  XY
2001   1       1
2002   6       2
2003   5       8
2004   4       1
2005   6       3
Σ
n
M
SS(X)
31
Practice:
ID     INCOME  NUMDRK  XY
2001   1       1       1
2002   6       2       12
2003   5       8       40
2004   4       1       4
2005   6       3       18
Σ      22      15      75
n      5       5
M      4.4     3
SS(X)  17.2    34
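The filled-in rows of the practice table can be reproduced with a few lines of code. This sketch assumes INCOME is X and NUMDRK is Y (from the column order) and that the SS row holds each column's sum of squared deviations from its own mean. `column_summary` is a hypothetical helper name.

```python
income = [1, 6, 5, 4, 6]   # INCOME column
numdrk = [1, 2, 8, 1, 3]   # NUMDRK column
xy = [x * y for x, y in zip(income, numdrk)]   # XY column


def column_summary(values):
    """Return the sum, n, mean, and sum of squared deviations for one column."""
    n = len(values)
    total = sum(values)
    mean = total / n
    ss = sum((v - mean) ** 2 for v in values)
    return {"sum": total, "n": n, "mean": mean, "ss": ss}


income_summary = column_summary(income)
numdrk_summary = column_summary(numdrk)

print(xy, sum(xy))
print(income_summary)
print(numdrk_summary)
```

These values match the Σ, n, M, and SS rows in the table up to floating-point rounding.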