Introduction to Statistics for the Social Sciences. SBS200 - Lecture Section 001, Spring 2017. Room 150 Harvill Building, 9:00 - 9:50 Mondays, Wednesdays & Fridays. Welcome http://www.youtube.com/watch?v=oSQJP40PcGI
A note on doodling
Schedule of readings: Before our fourth and final exam (May 1st), read OpenStax Chapters 1 – 13 (Chapter 12 is emphasized) and Plous Chapter 17: Social Influences and Chapter 18: Group Judgments and Decisions.
Homework on class website: Please complete homework worksheets #25 and #26 (Multiple Regression Worksheet and Test Review). Due: Wednesday, April 26th.
Test review and tutoring. Lab sessions: Everyone will want to be enrolled in one of the lab sessions. Optional: test review and tutoring.
By the end of lecture today (4/24/17): Multiple regression. Using multiple predictor (independent) variables to make predictions about the predicted (dependent) variable. More than one regression coefficient (also called “b”s or slopes).
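The slides give the regression coefficients but not how they are found. A minimal sketch, assuming ordinary least squares (the method behind the "least squares equation" mentioned later in these slides): it builds the normal equations and solves them in pure Python. The function name and the data are made up for illustration; with k predictors it returns one intercept and k slopes.

```python
def fit_multiple_regression(rows, y):
    """Least squares estimates [a, b1, ..., bk] for predictor rows and outcomes y."""
    # Design matrix: a leading 1 in each row produces the intercept a.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y.
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Solve the augmented system [XtX | Xty] by Gauss-Jordan elimination with partial pivoting.
    M = [XtX[i] + [Xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    # After elimination the left block is the identity, so the last column holds the solution.
    return [M[i][p] for i in range(p)]

# Synthetic (made-up) data generated exactly from Y = 5 + 2*x1 - 3*x2,
# so least squares should recover a = 5, b1 = 2, b2 = -3.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [5 + 2 * x1 - 3 * x2 for x1, x2 in rows]
coefs = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefs])  # [5.0, 2.0, -3.0]
```

Real data are never exactly linear, so in practice the recovered b's are the values that minimize the sum of squared prediction errors rather than an exact fit.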
Multiple Linear Regression - Example. Can we predict heating cost? Three variables are thought to relate to heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age of the furnace in years. To investigate, Salisbury's research department selected a random sample of 20 recently sold homes and determined the cost to heat each home last January.
The Multiple Regression Equation – Interpreting the Regression Coefficients. b1: The regression coefficient for mean outside temperature (X1) is -4.583. The coefficient is negative, showing a negative relationship between heating cost and temperature: as the outside temperature increases, the cost to heat the home decreases. The numeric value of the coefficient tells us more. If temperature increases by 1 degree while the other two independent variables are held constant, we estimate a decrease of $4.583 in monthly heating cost.
The Multiple Regression Equation – Interpreting the Regression Coefficients. b2: The regression coefficient for attic insulation (X2) is -14.831. The coefficient is negative, showing a negative relationship between heating cost and insulation: the more insulation in the attic, the less it costs to heat the home, so the negative sign is logical. For each additional inch of insulation, we expect the cost to heat the home to decline by $14.83 per month, holding the outside temperature and the age of the furnace constant.
The Multiple Regression Equation – Interpreting the Regression Coefficients. b3: The regression coefficient for the age of the furnace (X3) is 6.101. The coefficient is positive, showing a positive relationship between heating cost and furnace age: as the age of the furnace goes up, the cost to heat the home increases. Specifically, for each additional year of furnace age, we expect the cost to increase by $6.10 per month, holding the other two variables constant.
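To make "holding the other variables constant" concrete, here is a small sketch using the three slopes above (b1 = -4.583, b2 = -14.831, b3 = 6.101). The dictionary keys and function names are illustrative choices, and the example home uses the values from the estimation question that follows. The intercept is left out on purpose: it cancels when two predictions are subtracted.

```python
# Slopes from the heating-cost example (b1, b2, b3). The intercept cancels out
# of any difference between two predictions, so it is not needed here.
B = {"temperature": -4.583, "insulation": -14.831, "furnace_age": 6.101}

def slope_terms(x):
    """Sum of b_j * x_j over the three predictors (the prediction minus the intercept)."""
    return sum(B[name] * x[name] for name in B)

def change_from_one_unit(x, name):
    """Change in predicted cost when `name` rises by 1 and the other predictors stay fixed."""
    bumped = dict(x)
    bumped[name] += 1
    return slope_terms(bumped) - slope_terms(x)

home = {"temperature": 30, "insulation": 5, "furnace_age": 10}
for name in B:
    # Each one-unit change reproduces that predictor's own coefficient.
    print(name, round(change_from_one_unit(home, name), 3))
```

The result does not depend on the starting home: in a linear model, a one-unit change in X_j always moves Y’ by exactly b_j.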
Applying the Model for Estimation What is the estimated heating cost for a home if: the mean outside temperature is 30 degrees, there are 5 inches of insulation in the attic, and the furnace is 10 years old?
Preview Homework
[Figure: scatterplots of Heating Cost against Average Temperature, Insulation, and Age of Furnace. Initial estimates: r(18) = -0.50, -0.40, and +0.60; computed correlations: r(18) = -0.812, -0.257, and +0.537.]
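Each correlation above is reported as r(18) because the sample has 20 homes and a correlation test has n - 2 degrees of freedom. A minimal pure-Python sketch of computing Pearson's r and formatting it this way; the function names and the sample data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def report(xs, ys):
    """Format r as 'r(df) = value', with df = n - 2."""
    return f"r({len(xs) - 2}) = {pearson_r(xs, ys):+.3f}"

print(report([1, 2, 3, 4], [8, 6, 4, 2]))  # r(2) = -1.000
```

With the 20 homes from this example, the same report would show 18 in parentheses.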
Y’ = 427.19 - 4.5827x1 - 14.8308x2 + 6.1010x3, where x1 = mean outside temperature, x2 = inches of attic insulation, and x3 = age of the furnace in years.
Y’ = 427.19 - 4.5827(30) - 14.8308(5) + 6.1010(10) = 427.19 - 137.481 - 74.154 + 61.010 = $276.57
Calculate the predicted heating cost using a new value for the age of the furnace (11 years instead of 10), and use the regression coefficient for the furnace ($6.10) to estimate the change: Y’ = 427.19 - 4.5827(30) - 14.8308(5) + 6.1010(11) = 427.19 - 137.481 - 74.154 + 67.111 = $282.67. Then 282.67 - 276.57 = 6.10: the two homes differ by only one year of furnace age, and the predicted heating cost changed by $6.10.
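The arithmetic on these slides can be checked with a short sketch. The function and constant names are illustrative; the coefficients are the ones in the heating-cost regression equation. Printing three decimals shows the unrounded values before they are rounded to cents.

```python
# Coefficients from the heating-cost regression equation on these slides.
INTERCEPT = 427.19
B_TEMP, B_INSUL, B_AGE = -4.5827, -14.8308, 6.1010

def predict_heating_cost(temp, insulation, furnace_age):
    """Predicted monthly heating cost Y' in dollars."""
    return INTERCEPT + B_TEMP * temp + B_INSUL * insulation + B_AGE * furnace_age

y10 = predict_heating_cost(30, 5, 10)  # 30 degrees, 5 inches, 10-year-old furnace
y11 = predict_heating_cost(30, 5, 11)  # same home, furnace one year older
print(f"{y10:.3f} {y11:.3f} {y11 - y10:.3f}")  # 276.565 282.666 6.101
```

The difference equals B_AGE because every other term is identical in the two predictions.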
[Figure: scatterplots of GPA against High School GPA, SAT (Verbal), and SAT (Mathematical). Initial estimates: r(7) = +0.50, +0.80, and +0.80; computed correlations: r(7) = +0.911, +0.616, and +0.487.]
Regression coefficients for predicting GPA:
Intercept (a): -0.41107 (No)
High School GPA (x1): +1.2013 (Yes)
SAT Verbal (x2): +0.0016 (No)
SAT Mathematical (x3): -0.0019 (No)
Answer: High School GPA
Y’ = -0.41107 + 1.2013x1 + 0.0016x2 - 0.0019x3
Y’ = -0.41107 + 1.2013(2.8) + 0.0016(430) - 0.0019(460) ≈ 2.77
Y’ = -0.41107 + 1.2013(3.8) + 0.0016(430) - 0.0019(460) ≈ 3.97
3.97 - 2.77 = 1.20. Yes: the two predictions differ only in high school GPA (by one point), so the regression coefficient for HS GPA (1.2) estimates the change.
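A similar sketch for the GPA model, this time listing each term of Y’ separately so you can see how much each predictor contributes. It assumes x2 and x3 are the SAT Verbal and SAT Mathematical scores (430 and 460), following the order of the scatterplots; the function name is illustrative.

```python
# GPA regression from the slides: intercept, then slopes for x1, x2, x3.
# Assumption: x2 = SAT Verbal, x3 = SAT Mathematical (scatterplot order).
INTERCEPT = -0.41107
SLOPES = [1.2013, 0.0016, -0.0019]

def terms(xs):
    """Return the intercept followed by each b_j * x_j term of Y'."""
    return [INTERCEPT] + [b * x for b, x in zip(SLOPES, xs)]

for hs_gpa in (2.8, 3.8):
    parts = terms([hs_gpa, 430, 460])
    # Individual terms, then the rounded prediction Y' (their sum).
    print(hs_gpa, [round(p, 4) for p in parts], round(sum(parts), 2))
```

The two SAT terms (about +0.688 and -0.874) are the same for both students, so only the high school GPA term changes between the predictions.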
Let’s try one: When using hypothesis testing for correlation, what is our null hypothesis? a. There is no relationship between the variables (r = 0) b. There is a relationship between the variables (r ≠ 0) c. Not enough info is given. Correct answer: a.
Let’s try one: When using hypothesis testing for correlation, if we reject the null, what are we concluding? a. There is no relationship between the variables (r = 0) b. There is a relationship between the variables (r ≠ 0) c. Not enough info is given. Correct answer: b.
Let’s try one: Winnie found an observed correlation coefficient of 0. What should she conclude? a. Reject the null hypothesis b. Do not reject the null hypothesis c. Not enough info is given. Correct answer: b.
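The decisions in these questions can be sketched with the usual t test for a correlation, t = r√(n - 2) / √(1 - r²) with df = n - 2. The critical value 2.101 is assumed from a standard t table (two-tailed, α = .05, df = 18), so as written it only applies when n = 20; the function names are illustrative. The formula is undefined when r = ±1.

```python
import math

T_CRITICAL = 2.101  # assumed: two-tailed alpha = .05 with df = 18, from a t table

def t_for_r(r, n):
    """t statistic for testing H0: no correlation, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def decide(r, n, t_crit=T_CRITICAL):
    """Reject H0 when |t| exceeds the critical value."""
    t = t_for_r(r, n)
    return "reject H0" if abs(t) > t_crit else "do not reject H0"

print(decide(-0.811508835, 20))  # temperature vs. heating cost: reject H0
print(decide(0.0, 20))           # an observed r of 0: do not reject H0
```

An observed r of 0 gives t = 0, which can never exceed a positive critical value, so Winnie cannot reject the null.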
In the regression equation Y’ = a + b1x1 + b2x2 + b3x3 + b4x4, what does the letter "a" represent? a. Y intercept b. Slope of the line c. Any value of the independent variable that is selected d. None of these. Correct answer: a.
Assume the least squares equation is Y’ = 10 + 20X. What does the value of 10 in the equation indicate? a. Y intercept b. For each unit increase in Y, X increases by 10 c. For each unit increase in X, Y increases by 10 d. None of these. Correct answer: a.
In the least squares equation Y’ = 10 + 20X, the value of 20 indicates: a. the Y intercept. b. slope (so for each unit increase in X, Y’ increases by 20). c. slope (so for each unit increase in Y’, X increases by 20). d. none of these. Correct answer: b.
In the equation Y’ = a + bX, what is Y’? a. Slope of the line b. Y intercept c. Predicted value of Y, given a specific X value d. Value of Y when X = 0. Correct answer: c.
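For the single-predictor equation Y’ = 10 + 20X used in these questions, a two-line check shows what the intercept and slope mean; `y_prime` is an illustrative name.

```python
def y_prime(x, a=10, b=20):
    """Predicted value Y' = a + bX for the least squares equation Y' = 10 + 20X."""
    return a + b * x

print(y_prime(0))               # 10: the Y intercept (Y' when X = 0)
print(y_prime(3) - y_prime(2))  # 20: the slope (change in Y' per one-unit increase in X)
```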
If there are four independent variables in a multiple regression equation (Y’ = a + b1x1 + b2x2 + b3x3 + b4x4), there are also four: a. Y-intercepts (a). b. regression coefficients (slopes or bs). c. dependent variables. d. constant terms (k). Correct answer: b.
According to the Central Limit Theorem, which is false? a. As n ↑, x̄ will approach µ b. As n ↑, the curve will approach a normal shape c. As n ↑, the curve's variability gets larger d. As n ↑ Correct answer: c.
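Option c is false because the spread of the sampling distribution of the mean, the standard error σ/√n, shrinks as n grows. A small simulation sketch, assuming an exponential population with mean 1 and standard deviation 1 (chosen only because it is skewed, so the normal shape of the sample means is not built in); the function name, number of repetitions, and seed are arbitrary choices.

```python
import random
import statistics

def sd_of_sample_means(n, reps=5000, seed=1):
    """Standard deviation of `reps` sample means, each from n draws of a skewed population."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

for n in (4, 16, 64):
    # For this population sigma = 1, so theory predicts a standard error of 1 / sqrt(n).
    print(n, round(sd_of_sample_means(n), 3), round(1 / n ** 0.5, 3))
```

Each fourfold increase in n roughly halves the spread of the sample means, which is the opposite of what option c claims.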
Thank you! See you next time!!