Published by Bonnie Fay Mason. Modified over 9 years ago.
2
Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section 001, Spring 2015 Room 150 Harvill Building 8:00 - 8:50 Mondays, Wednesdays & Fridays.
4
Labs continue this week with Multiple Regression
5
Schedule of readings
Before next exam (Monday, May 4th):
Please read Chapters 10 – 14
Please read Chapters 17 and 18 in Plous
Chapter 17: Social Influences
Chapter 18: Group Judgments and Decisions
6
Homework due – Wednesday (April 29th)
On class website: Homework worksheet #22
Completing 6 types of analysis using Excel: Semester Summary
Extra Credit Opportunity
Discuss resources available on website
7
Pair and share
Please share the 3 multiple choice questions that you wrote for today’s homework with two classmates. Each multiple choice question must contain: a person’s name, only one correct answer, and 3 incorrect options (for a total of 4 options for each question).
8
Pop Quiz – First 5 Questions
1. What is regression used for? Include an example
2. What is a residual? How would you find it?
3. What is Standard Error of the Estimate? (How is it related to residuals?)
4. Give one fact about r²
5. How is regression line like a mean?
9
Writing Assignment - 5 Solutions
1. What is regression used for? Include an example
Regressions are used to take advantage of relationships between variables described in correlations. We choose a value on the independent variable (on x axis) to predict values for the dependent variable (on y axis).
10
Writing Assignment - 5 solutions 2. What is a residual? How would you find it? Residuals are the difference between our predicted y (y’) and the actual y data points. Once we choose a value on our independent variable and predict a value for our dependent variable, we look to see how close our prediction was. We are measuring how “wrong” we were, or the amount of “error” for that guess. Y – Y’
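The Y – Y’ calculation can be sketched in a few lines of Python. The slope, intercept, and data points below are hypothetical values chosen only for illustration; none of them come from the lecture.

```python
# Hypothetical regression line Y' = 2.0 * X + 1.0 (illustrative, not from the slides)
SLOPE = 2.0
INTERCEPT = 1.0

def predict(x):
    """Return the predicted value Y' for a chosen value of X."""
    return SLOPE * x + INTERCEPT

def residual(x, y):
    """Return the residual Y - Y': the actual value minus the predicted value."""
    return y - predict(x)

# Hypothetical observed (X, Y) pairs
observations = [(1.0, 3.5), (2.0, 4.0), (3.0, 7.0)]
residuals = [residual(x, y) for x, y in observations]
print(residuals)  # [0.5, -1.0, 0.0]
```

A positive residual means the actual point sits above the prediction line, a negative one means it sits below, and zero means the guess was exactly right.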
11
Writing Assignment - 5 solutions
3. What is Standard Error of the Estimate? (How is it related to residuals?)
The average length of the residuals
The average error of our guess
The average length of the green lines
The standard deviation of the data points around the regression line
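As a rough sketch, the standard error of the estimate for a simple regression is commonly computed as the square root of the sum of squared residuals divided by n – 2. The data points below are hypothetical, and the line Y’ = 0.8X + 1.5 is simply the least-squares line for those four made-up points.

```python
import math

# Hypothetical data (not from the lecture); Y' = 0.8X + 1.5 is the
# least-squares line for these four points.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 3.0, 5.0]
SLOPE, INTERCEPT = 0.8, 1.5

def standard_error_of_estimate(xs, ys, slope, intercept):
    """Square root of (sum of squared residuals / (n - 2))."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sum_of_squares = sum(r ** 2 for r in residuals)
    return math.sqrt(sum_of_squares / (len(ys) - 2))

see = standard_error_of_estimate(xs, ys, SLOPE, INTERCEPT)
print(round(see, 4))
```

The result is in the same units as Y, which is why it can be read as a typical size of the error in our guesses.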
12
Writing Assignment - 5 solutions
4. Give one fact about r²
5. How is regression line like a mean?
13
14-12 Multiple Linear Regression - Example
Can we predict heating cost? Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace. To investigate, Salisbury's research department selected a random sample of 20 recently sold homes. It determined the cost to heat each home last January.
15
14-14 The Multiple Regression Equation – Interpreting the Regression Coefficients
b₁ = -4.583. The regression coefficient for mean outside temperature (X₁) is -4.583. The coefficient is negative and shows a negative relationship between heating cost and temperature. As the outside temperature increases, the cost to heat the home decreases. The numeric value of the regression coefficient provides more information. If we increase temperature by 1 degree and hold the other two independent variables constant, we can estimate a decrease of $4.583 in monthly heating cost.
16
14-15 The Multiple Regression Equation – Interpreting the Regression Coefficients
b₂ = -14.831. The regression coefficient for mean attic insulation (X₂) is -14.831. The coefficient is negative and shows a negative relationship between heating cost and insulation. The more insulation in the attic, the less the cost to heat the home. So the negative sign for this coefficient is logical. For each additional inch of insulation, we expect the cost to heat the home to decline $14.83 per month, holding the outside temperature and the age of the furnace constant.
17
14-16 The Multiple Regression Equation – Interpreting the Regression Coefficients
b₃ = 6.101. The regression coefficient for age of the furnace (X₃) is 6.101. The coefficient is positive and shows a positive relationship between heating cost and furnace age. As the age of the furnace goes up, the cost to heat the home increases. Specifically, for each additional year older the furnace is, we expect the cost to increase $6.10 per month.
24
Applying the Model for Estimation What is the estimated heating cost for a home if: the mean outside temperature is 30 degrees, there are 5 inches of insulation in the attic, and the furnace is 10 years old?
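One way to set this estimate up in Python is sketched below. The three slopes are the ones interpreted on the earlier slides; the intercept (a) does not appear in this transcript, so it is left as a parameter the reader must supply rather than filled in. The function names are illustrative.

```python
# Regression coefficients from the slides (dollars per unit of each predictor)
B_TEMPERATURE = -4.583   # per degree of mean outside temperature
B_INSULATION = -14.831   # per inch of attic insulation
B_FURNACE_AGE = 6.101    # per year of furnace age

def slope_contribution(temperature, insulation, furnace_age):
    """Sum of b * X over the three predictors (everything except the intercept)."""
    return (B_TEMPERATURE * temperature
            + B_INSULATION * insulation
            + B_FURNACE_AGE * furnace_age)

def estimated_heating_cost(intercept, temperature, insulation, furnace_age):
    """Y' = a + b1*X1 + b2*X2 + b3*X3; the intercept is not given in this transcript."""
    return intercept + slope_contribution(temperature, insulation, furnace_age)

# 30 degrees, 5 inches of insulation, 10-year-old furnace
contribution = slope_contribution(30, 5, 10)
print(round(contribution, 3))  # -150.635
```

Adding the intercept from the full regression output to this contribution gives the estimated monthly heating cost for that home.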
25
Multiple regression equations
Prediction line: Y’ = b₁X₁ + b₂X₂ + b₃X₃ + a
Very often we want to select students or employees who have the highest probability of success in our school or company. Andy is an administrator at a paralegal program and he wants to predict the Grade Point Average (GPA) for the incoming class. He thinks these independent variables will be helpful in predicting GPA:
High School GPA (X₁)
SAT - Verbal (X₂)
SAT - Mathematical (X₃)
Andy completes a multiple regression analysis and comes up with this regression equation:
Y’ = 1.2X₁ + .00163X₂ - .00194X₃ - .411
Y’ = 1.2 (gpa) + .00163 (sat verb) - .00194 (sat math) - .411
26
Prediction line: Y’ = b₁X₁ + b₂X₂ + b₃X₃ + a
Y’ = 1.2X₁ + .00163X₂ - .00194X₃ - .411
Y’ = 1.2 (gpa) + .00163 (sat verb) - .00194 (sat math) - .411

Here comes Victoria. Her scores are as follows: High School GPA = 3.81, SAT Verbal = 500, SAT Mathematical = 600. What would we predict her GPA to be in the paralegal program?
Y’ = 1.2 (3.81) + .00163 (500) - .00194 (600) - .411
Y’ = 4.572 + .815 - 1.164 - .411 = 3.812
We predict Victoria will have a GPA of 3.812.

Predict Victor’s GPA. His scores are as follows: High School GPA = 2.63, SAT - Verbal = 469, SAT - Mathematical = 440.
Y’ = 1.2 (2.63) + .00163 (469) - .00194 (440) - .411
Y’ = 3.156 + .76447 - .8536 - .411 = 2.656
We predict Victor will have a GPA of 2.656 = 2.66.
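The two predictions can be checked with a short Python sketch. The coefficients and scores are the ones on the slide; the function and variable names are just illustrative.

```python
def predict_gpa(hs_gpa, sat_verbal, sat_math):
    """Andy's regression equation: Y' = 1.2*X1 + .00163*X2 - .00194*X3 - .411"""
    return 1.2 * hs_gpa + 0.00163 * sat_verbal - 0.00194 * sat_math - 0.411

victoria = predict_gpa(hs_gpa=3.81, sat_verbal=500, sat_math=600)
victor = predict_gpa(hs_gpa=2.63, sat_verbal=469, sat_math=440)

print(round(victoria, 3))  # 3.812
print(round(victor, 3))    # 2.656
```

Note that the SAT - Mathematical coefficient is negative in this equation, so a higher math score lowers the predicted GPA slightly when the other two predictors are held constant.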
27
Pop Quiz – Second 5 Questions
6. How does a multiple regression differ from a simple regression?
7. How many dependent variables are in a multiple regression?
8. How are “slopes”, “b”s, and “regression coefficients” related?
9. What possible values can each of these coefficients have? r, r², b
10. What possible values can the standard error of the estimate have?
28
Pop Quiz – Second 5 Solutions
6. How does a multiple regression differ from a simple regression?
Simple regression has one predicted variable (dependent variable) and one predictor variable (independent variable). Multiple regression has one predicted variable (dependent variable) and more than one predictor variable (independent variables).
7. How many dependent variables are in a multiple regression?
There is only one predicted variable (dependent variable).
8. How are “slopes”, “b”s, and “regression coefficients” related?
They are synonyms – they refer to the same concept.
9. What possible values can each of these coefficients have?
r: Can vary from -1.0 to +1.0
r²: Can vary from 0.0 to +1.0
b: Can be any number
10. What possible values can the standard error of the estimate have?
Can be any positive number
29
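Pop Quiz – Second 5 Questions
6. How does a multiple regression differ from a simple regression?
Simple regression has one predicted variable (dependent variable) and one predictor variable (independent variable). Multiple regression has one predicted variable (dependent variable) and more than one predictor variable (independent variables).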
30
Pop Quiz – Second 5 Questions
6. How does a multiple regression differ from a simple regression? Include an example of each
7. How many dependent variables are in a multiple regression?
8. How are “slopes”, “b”s, and “regression coefficients” related?
9. What possible values can each of these coefficients have? r, r², b
10. What possible values can the standard error of the estimate have?
32
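Mean = 19831.93
Standard deviation = 8196.32
n = 804
Square root of n = 28.3549
Standard error of the mean = 8196.32 / 28.3549 = 289.0618 (289.06)
Other worksheet values (unlabeled): 50387, 266
Yes – close enough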
33
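Mean = 19831.93; standard error of the mean = 289.06; z = 1.96 (95% confidence) and z = 2.58 (99% confidence)
95% confidence interval: 19831.93 ± (1.96) (289.06) = 19831.93 ± 566.56
Lower limit: 19265.37 (19265); upper limit: 20398.49 (20398)
99% confidence interval: 19831.93 ± (2.58) (289.06) = 19831.93 ± 745.77
Lower limit: 19086.16 (19086); upper limit: 20577.70 (20577)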
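The interval arithmetic can be sketched in Python as follows. The mean, standard deviation, sample size, and z values are the worksheet numbers from the slides; the function names are illustrative, and the standard error is rounded to two decimals to mirror the hand calculation.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

def confidence_interval(mean, standard_error, z):
    """Return (lower, upper) limits: mean -/+ z * standard error."""
    margin = z * standard_error
    return mean - margin, mean + margin

MEAN = 19831.93
se = round(standard_error_of_mean(8196.32, 804), 2)  # rounded as on the worksheet

low95, high95 = confidence_interval(MEAN, se, 1.96)
low99, high99 = confidence_interval(MEAN, se, 2.58)
print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

The 99% interval is wider than the 95% interval because the larger z value multiplies the same standard error.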
35
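Independent variable: Number of doors
Number of levels: 2 (2-doors vs 4-doors)
Type of study: Quasi-experiment
Dependent variable: Price of car (dollars)
Scale of measurement: Ratio
Design: Between
Test: Two-tailed
Mean price, 2-door cars: $23,807.14
Mean price, 4-door cars: $20,580.67
alpha = 0.05
Observed t = 3.9677
Critical t = 1.9629
df = 802
p = .000079
The average price for 2-door cars was $23,807.14, while the average price for 4-door cars was $20,580.67. A t-test was conducted and found this to be a significant difference, t(802) = 3.9677; p < 0.01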
36
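Independent variable: Size of the engine
Number of levels: 3 (4- versus 6- versus 8-cylinder engines)
Type of study: Quasi-experiment
Dependent variable: Price of car (dollars)
Scale of measurement: Ratio
Design: Between
Mean prices: $17,862 (4-cylinder), $20,081 (6-cylinder), $38,968 (8-cylinder)
alpha = 0.05
Observed F = 345.3577
Critical F = 3.006964
df = 2 (between) and 801 (within)
p = .(17 zeroes, then)69755
The average prices were $17,862, $20,081 and $38,968 for the 4-, 6-, and 8-cylinder engines (respectively). An ANOVA was conducted and found this to be a significant difference, F(2,801) = 345; p < 0.01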
37
df = 802
Critical r = .195
Observed r = -0.1431
The correlation between mileage and car price was -0.14. This is a weak and not significant correlation, r(802) = -0.14305; n.s.
38
df = 802; r = -0.1431; b = -0.1725; a = 24765
Direction: negative (lower mileage, higher price)
Y’ = -0.1725x + 24765
For each additional mile driven (as x goes up by 1), cost of the car goes down by 17.25 cents. The base cost for the car (before taking into account the mileage) is $24,765.
Predicted price at 30,000 miles: (30,000)(-0.1725) + 24765 = $19,590
r² = (-0.1431)² = .0204633 (computed from the unrounded r = -0.14305), or 2.04%
The proportion of total variance for price of car that was accounted for by miles was 2.04%
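A short Python sketch of the same two calculations, using the slope, intercept, and unrounded r reported on the slides (the function name is illustrative):

```python
SLOPE = -0.1725      # b: change in price (dollars) per additional mile
INTERCEPT = 24765    # a: predicted price at zero miles
R = -0.14305         # unrounded correlation between mileage and price

def predict_price(miles):
    """Prediction line Y' = bX + a for car price given mileage."""
    return SLOPE * miles + INTERCEPT

price_at_30k = predict_price(30_000)
r_squared = R ** 2   # proportion of variance in price accounted for by mileage

print(round(price_at_30k, 2))  # 19590.0
print(round(r_squared, 7))     # 0.0204633
```

Squaring r removes the sign, so r² says how much variance is accounted for but not the direction of the relationship; the direction comes from the sign of r or b.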
39
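Dependent variable: Y’ = cost of car (dollars)
Predictors: mileage, size of car
a = 3145.75; b for mileage = -0.15243; b for car size = 4027.67
Y’ = 3145.75 + (-0.15243)(mileage) + (4027.67)(car size)
Worksheet check: yes
For each additional mile, the predicted cost goes down by 15.243 cents. For each one-unit increase in car size, the predicted cost goes up by $4,027.67.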