YOU NEED TO KNOW WHAT THIS MEANS


1 YOU NEED TO KNOW WHAT THIS MEANS
REGRESSION ANALYSIS
In the OUTCOME that you will commence in the second week back, you might be given data and asked to perform a REGRESSION ANALYSIS. YOU NEED TO KNOW WHAT THIS MEANS.

2 REGRESSION ANALYSIS is the process of fitting a linear model to a data set.
The aim is to determine the best linear model possible and to use it to make predictions.
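As a rough illustration of what "fitting a linear model" involves, here is a minimal Python sketch of a least squares fit. Python is not used in the original slides (they use a calculator), and the function name `least_squares` and the four sample points are made up for this example.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (sum of products of deviations) / (sum of squared x deviations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b

# Made-up points that lie exactly on y = 1 + 2x (not the GDP data)
a, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Once a and b are known, a prediction is just a + b × x for the x value of interest.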

3 What do we mean by the “best possible linear model”?
The best possible linear model is the one in which: a. The data is linear or has been linearized by a data transformation

4 and we also want b. the linear model which has the greatest possible value of r²
REMEMBER: the value of the coefficient of determination (r²) measures the predictive power of our regression model.
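For readers who want to see where the coefficient of determination comes from, the sketch below computes it in Python as the square of the correlation coefficient. This is an illustration only: the function name `r_squared` and both small data sets are invented for the example and are not from the slides.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # r^2 is the squared correlation coefficient
    return sxy ** 2 / (sxx * syy)

# Made-up data: the first set is perfectly linear, the second is not
perfect = r_squared([0, 1, 2, 3], [1, 3, 5, 7])
weak = r_squared([1, 2, 3], [1, 3, 2])
print(perfect, weak)  # 1.0 0.25
```

A value of 1.0 (100%) means the line explains all of the variation in y; 0.25 means it explains only 25%.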

5 If r² > 30%, then our model will have
predictive power

6 STEP 1: Construct a scatterplot of the RAW (original) data and note:
Its shape
The value of the coefficient of determination
FIRST: We must decide which is the INDEPENDENT (x) VARIABLE and which is the DEPENDENT (y) VARIABLE.
We are predicting LIFE EXPECTANCY from GDP, so:
x = GDP
y = LIFE EXPECTANCY

7 CONCLUSION: Data is NON-LINEAR
List A = gdp, List B = le (life expectancy)
gdp      lifeex
950      58
1670     65
4250     68
11520    74
12280    73
4170
14300    75
5540     71
9830     72
1680     61
320      67
22260    66
550      50
930
940      64
2670
11220
1420     48
150      41
330      44
520      49
350
180

8 From the Home screen determine the value of r². Value of r² = 0.3665.

9 CHECK THE CIRCLE OF TRANSFORMATIONS!!
STEP 2: We seek a transformation to linearize the data. Our scatterplot most closely resembles Quadrant 2!
POTENTIALLY SUITABLE TRANSFORMATIONS are: y², log x, 1/x
[Figure: circle of transformations, Quadrants 1 to 4]

10 STEP 3: Try each of these transformations to determine which one effectively linearizes the data and gives the highest value for r². In each case, obtain a RESIDUAL PLOT to confirm that the transformed data is linear.
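Step 3 can be sketched in Python as a loop over the candidate transformations. This is an illustration under stated assumptions, not the calculator procedure the slides use: the names `pairs`, `r_squared`, `candidates` and `results` are invented here, and the data are only the 17 (gdp, le) pairs that are complete in this transcript. Because six life expectancy values are missing, the printed r² values should not be expected to match the figures quoted on the slides.

```python
import math

# Only the 17 complete (gdp, le) pairs from the transcript;
# six life expectancy values are missing from it.
pairs = [
    (950, 58), (1670, 65), (4250, 68), (11520, 74), (12280, 73),
    (14300, 75), (5540, 71), (9830, 72), (1680, 61), (320, 67),
    (22260, 66), (550, 50), (940, 64), (1420, 48), (150, 41),
    (330, 44), (520, 49),
]
gdp = [p[0] for p in pairs]
le = [p[1] for p in pairs]

def r_squared(xs, ys):
    """Coefficient of determination for a straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Each candidate is an (x list, y list) pair after transformation
candidates = {
    "raw": (gdp, le),
    "y squared": (gdp, [y ** 2 for y in le]),
    "log x": ([math.log10(x) for x in gdp], le),
    "1/x": ([1 / x for x in gdp], le),
}

results = {name: r_squared(xs, ys) for name, (xs, ys) in candidates.items()}
for name, value in results.items():
    print(f"{name:10s} r^2 = {value:.3f}")
```

A high r² alone is not enough; each candidate still needs its residual plot checked before it is accepted.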

11 Y SQUARED TRANSFORMATION: r² = 38.3%
List A = gdp (x variable), List B = le (y variable), List C = lesqu (transformed y variable)
gdp      le    lesqu
950      58    3364
1670     65    4225
4250     68    4624
11520    74    5476
12280    73    5329
4170
14300    75    5625
5540     71    5041
9830     72    5184
1680     61    3721
320      67    4489
22260    66    4356
550      50    2500
930
940      64    4096
2670
11220
1420     48    2304
150      41    1681
330      44    1936
520      49    2401
350
180
r² = 38.3%
CONCLUSION: The transformed data STILL APPEARS NON-LINEAR.

12 Establish the value of r² on the Home screen:

13 CONFIRM WITH RESIDUAL PLOT
Remember: to get the correct residual plot, use the split screen view. Make sure that the scatterplot at the top has the correct transformed variable. CONCLUSION: The residual plot shows a definite curved pattern, indicating that the transformed data is still not linear. The y² transformation has NOT succeeded in producing an effective linear model.
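To show what a "definite curved pattern" in the residuals looks like numerically, here is a small Python sketch. It uses made-up data (y = x², not the GDP data) and a hypothetical helper named `residuals`; it only illustrates the idea behind the calculator's residual plot.

```python
def residuals(xs, ys):
    """Fit y = a + b*x by least squares and return actual minus predicted."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up curved data (y = x squared), not the GDP data
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]
res = residuals(xs, ys)
print(res)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals run positive, negative, negative, negative, positive: a clear curve, which signals that a straight line is the wrong model. For linear data the signs would be scattered with no pattern.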

14 NEXT STEP…. You guessed it!! Now we try the next potential candidate transformation. It was the log x transformation!

15 (Delete the y² column, as we have discarded this transformation.)
List A = gdp, List B = le, List C = loggdp
GDP      lifeex   logGDP
950      58       2.98
1670     65       3.22
4250     68       3.63
11520    74       4.06
12280    73       4.09
4170              3.62
14300    75       4.16
5540     71       3.74
9830     72       3.99
1680     61       3.23
320      67       2.51
22260    66       4.35
550      50       2.74
930               2.97
940      64       2.97
2670              3.43
11220             4.05
1420     48       3.15
150      41       2.18
330      44       2.52
520      49       2.72
350               2.54
180               2.26
r² = 66.0%
CONCLUSION: It appears that the log(GDP) transformation has successfully linearized the data! The scatterplot appears linear, and r² has increased.
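The log column can be reproduced outside the calculator. The sketch below assumes the calculator's log is base 10 (the listed values, such as 2.98 for 950, are consistent with that) and rounds to two decimals as on the slide. The variable names are made up for the example.

```python
import math

# List A (gdp) as it appears in the transcript
gdp = [950, 1670, 4250, 11520, 12280, 4170, 14300, 5540, 9830, 1680,
       320, 22260, 550, 930, 940, 2670, 11220, 1420, 150, 330, 520,
       350, 180]

# List C = log(List A), base 10, rounded to 2 decimals like the slide
loggdp = [round(math.log10(x), 2) for x in gdp]

for x, lx in zip(gdp, loggdp):
    print(f"{x:6d}  {lx:.2f}")
```

The log transformation compresses the large GDP values much more than the small ones, which is why it can straighten a scatterplot that rises steeply and then flattens.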

16 NOTE THE VARIABLES ARE LISTED HERE SO YOU CAN CHECK

17 The value of r² has now increased to 66.0%.
Now confirm this by creating a RESIDUAL PLOT for the log(x) transformation. Open a new graphing screen!! CONCLUSION: The residual plot shows a random scattering of points with no pattern, indicating that the transformed data is linear. The log x transformation has succeeded in producing an effective linear model for the data with significant predictive power.

18 And now… Yes, you guessed it! We need to check out the reciprocal x transformation, because… maybe it will give a higher coefficient of determination than the log x! (here we go again)

19 Don’t delete the log x column, because we think this model was effective!
List A = gdp, List B = le (life expectancy), List C = loggdp = log(List A), List D = recgdp = 1/(List A)
gdp      le    loggdp
950      58    2.98
1670     65    3.22
4250     68    3.63
11520    74    4.06
12280    73    4.09
4170           3.62
14300    75    4.16
5540     71    3.74
9830     72    3.99
1680     61    3.23
320      67    2.51
22260    66    4.35
550      50    2.74
930            2.97
940      64    2.97
2670           3.43
11220          4.05
1420     48    3.15
150      41    2.18
330      44    2.52
520      49    2.72
350            2.54
180            2.26
1/GNP: r² = 51.5%
CONCLUSION: The transformed data appears to be linear, but the value of the coefficient of determination is 51.5%, lower than for the loggdp transformation.

20 Coefficient of determination

21 Remember to create a new graphing screen for the new transformation!!
CONCLUSION: The residual plot shows a random scattering of points with no pattern, indicating that the 1/x transformation has made the data linear.

22 OVERALL CONCLUSIONS
We have tested three transformations:
y² transformation: Ineffective (did not linearize the data)
log(x) transformation: Effective in linearizing data, with r² = 66.0%
1/x transformation: Effective in linearizing data, with r² = 51.5%
Based on this regression analysis, we conclude that the log(GDP) transformation provides the best model for making predictions from this data.

23 MAKING A PREDICTION
Use your linear regression model to predict the Life Expectancy in a country where the GNP is $8000.
List 1 = gnp, List 2 = le, List 3 = loggnp
gnp      le    loggnp
950      58    2.98
1670     65    3.22
4250     68    3.63
11520    74    4.06
12280    73    4.09
4170           3.62
14300    75    4.16
5540     71    3.74
9830     72    3.99
1680     61    3.23
320      67    2.51
22260    66    4.35
550      50    2.74
930            2.97
940      64    2.97
2670           3.43
11220          4.05
1420     48    3.15
150      41    2.18
330      44    2.52
520      49    2.72
350            2.54
180            2.26
Find the equation of the LEAST SQUARES REGRESSION line for the log transformation:
Regression (a + bx): Xlist = log(GNP), Ylist = le
a = 14.3, b = 14.5
Life Expectancy = 14.3 + 14.5 × log(GNP)
Life Expectancy = 14.3 + 14.5 × log(8000) = 70.9
Predicted Life Expectancy = 70.9 years
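The arithmetic of this prediction can be checked with a few lines of Python. The coefficients a = 14.3 and b = 14.5 are taken from the slide; the function name `predict_life_expectancy` is made up, and the log is assumed to be base 10, as on the calculator.

```python
import math

# Coefficients of the least squares line as given on the slide
a = 14.3
b = 14.5

def predict_life_expectancy(gnp):
    """Life Expectancy = a + b * log10(GNP)."""
    return a + b * math.log10(gnp)

prediction = predict_life_expectancy(8000)
print(round(prediction, 1))  # 70.9
```

Note that the model takes log(GNP) as its input, so the GNP value must be logged before it is multiplied by b.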

