Introduction to Statistics for the Social Sciences
SBS200 - Lecture Section 001, Spring 2017
Room 150 Harvill Building, 9:00 - 9:50 Mondays, Wednesdays & Fridays
Welcome
http://www.youtube.com/watch?v=oSQJP40PcGI
A note on doodling
Schedule of readings
Before our fourth and final exam (May 1st):
OpenStax Chapters 1 – 13 (Chapter 12 is emphasized)
Plous Chapter 17: Social Influences
Plous Chapter 18: Group Judgments and Decisions
By the end of lecture today 4/17/17 Simple Regression Using correlation for predictions
Homework on class website: Please complete homework worksheet #24 Simple Regression Worksheet Extended due date: Wednesday, April 19th
Lab sessions Everyone will want to be enrolled in one of the lab sessions Project 4
Project 4 - Two Correlations
We will use these to create two regression analyses. This lab builds on the work we did in our very first lab, but now we are using the correlation for prediction. This is called regression analysis.
Correlation: Independent and dependent variables
When used for prediction, we refer to the predicted variable as the dependent variable and the predictor variable as the independent variable.
What are we predicting?
[Figure labels: Dependent Variable, Independent Variable]
Revisit this slide
Correlation - What do we need to define a line?
Y-intercept = “a” (also “b0”): where the line crosses the Y axis
Slope = “b” (also “b1”): how steep the line is
The predicted variable goes on the “Y” axis and is called the dependent variable.
The predictor variable goes on the “X” axis and is called the independent variable.
[Figure: Yearly Income (Y axis) against Expenses per year (X axis). If you spend this much (X), you probably make this much (Y).]
Revisit this slide
Assumptions Underlying Linear Regression
For each value of X, there is a group of Y values.
These Y values are normally distributed.
The means of these normal distributions of Y values all lie on the straight line of regression.
The standard deviations of these normal distributions are equal.
Revisit this slide
What is r²?
r² = the proportion of the total variance in one variable that is predictable by its relationship with the other variable.
Examples
If mother’s and daughter’s heights are correlated with an r = .8, then what amount (proportion or percentage) of variance of mother’s height is accounted for by daughter’s height? .64 because (.8)² = .64
If mother’s and daughter’s heights are correlated with an r = .8, then what proportion of variance of mother’s height is not accounted for by daughter’s height? .36 because (1.0 - .64) = .36, or 36% because 100% - 64% = 36%
If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is accounted for by temperature? .25 because (.5)² = .25
If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is not accounted for by temperature? .75 because (1.0 - .25) = .75, or 75% because 100% - 25% = 75%
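The r² arithmetic in the examples above can be checked with a short Python sketch. The function name `variance_explained` is our own label for illustration and is not part of the course materials.

```python
def variance_explained(r):
    """Return (proportion accounted for, proportion not accounted for) for a correlation r."""
    r_squared = r ** 2
    return r_squared, 1.0 - r_squared


# The two correlations used in the lecture examples.
examples = [("mother/daughter heights", 0.8), ("ice cream sales/temperature", 0.5)]

for label, r in examples:
    explained, unexplained = variance_explained(r)
    print(f"{label}: r = {r}, r^2 = {explained:.2f}, not accounted for = {unexplained:.2f}")
```

Note that the sign of r does not matter here: r = -.8 gives the same r² of .64.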
Homework Review
r = +0.92 (positive, strong)
The relationship between the hours worked and weekly pay is a strong positive correlation. This correlation is significant, r(3) = 0.92; p < 0.05
Direction (worksheet fill-in answers): up, down
Intercept (a) = 55.286; slope (b) = 6.0857
Regression equation: y' = 6.0857x + 55.286
Predicted weekly pay: $207.43 and $85.71
r² = .846231, or about 85%: about 85% of the total variance of “weekly pay” is accounted for by “hours worked”
For each additional hour worked, weekly pay will increase by $6.09
[Scatterplot: Wait Time (Y axis, 280 to 400) against Number of Operators (X axis, 4 to 8)]
r = -.73 (negative, strong); critical r = 0.878
No, we do not reject the null.
The relationship between wait time and number of operators working is negative and strong. This correlation is not significant, r(3) = -0.73; n.s.
As the number of operators increases, wait time decreases.
Intercept (a) = 458; slope (b) = -18.5
Regression equation: y' = -18.5x + 458
Predicted wait times: 365 seconds and 328 seconds
r² = .53695, or 54%: the proportion of total variance of wait time accounted for by number of operators is 54%.
For each additional operator added, wait time will decrease by 18.5 seconds
[Scatterplot: Percent of BAs (Y axis, 21 to 39) against Median Income (X axis, 45 to 66)]
Predicted (dependent) variable: percent of residents with a BA degree
n = 10, df = 8
r = 0.8875 (positive, strong); critical r = 0.632
Yes, we reject the null.
The relationship between median income and percent of residents with a BA degree is strong and positive. This correlation is significant, r(8) = 0.89; p < 0.05.
As median income goes up, so does the percent of residents who have a BA degree.
Intercept (a) = 3.1819; slope (b) = 0.0005
Regression equation: y' = 0.0005x + 3.1819
Predicted values: 25% of residents and 35% of residents
r² = .78766, or about 79%: the proportion of total variance of % of BAs accounted for by median income is about 79%.
For each additional $1 in income, percent of BAs increases by .0005 percentage points
[Scatterplot: Crime Rate (Y axis, 12 to 30) against Median Income (X axis, 45 to 66)]
Predicted (dependent) variable: crime rate
n = 10, df = 8
r = -0.6293 (negative, moderate); critical r = 0.632
No, we do not reject the null: 0.6293 is not bigger than the critical r of 0.632.
The relationship between crime rate and median income is negative and moderate. This correlation is not significant, r(8) = -0.63; n.s.
As median income goes up, crime rate tends to go down.
Intercept (a) = 4662.5; slope (b) = -0.0499
Regression equation: y' = -0.0499x + 4662.5
Predicted values: 2,417 thefts and 1,418.5 thefts
r² = .396, or 40%: the proportion of total variance of thefts accounted for by median income is 40%.
For each additional $1 in income, thefts go down by .0499
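The decision rule used in the four homework problems above (compare the size of the observed r with the critical r) can be sketched in Python. The function name `is_significant` is hypothetical. The critical value of 0.878 for the first problem is an assumption on our part: it is borrowed from the second problem, which has the same degrees of freedom (df = 3).

```python
def is_significant(observed_r, critical_r):
    """Reject the null when the absolute value of the observed r exceeds the critical r."""
    return abs(observed_r) > critical_r


# (label, observed r, critical r) as reported in the homework review.
# The 0.878 for the first row is assumed from the matching df = 3 problem.
homework = [
    ("hours worked and weekly pay", 0.92, 0.878),
    ("number of operators and wait time", -0.73, 0.878),
    ("median income and percent of BAs", 0.8875, 0.632),
    ("median income and crime rate", -0.6293, 0.632),
]

for label, r, critical in homework:
    decision = "reject the null" if is_significant(r, critical) else "do not reject the null"
    print(f"{label}: r = {r}, critical r = {critical}, r^2 = {r ** 2:.3f} -> {decision}")
```

The absolute value matters for the negative correlations: -0.6293 is compared as 0.6293, which falls just short of 0.632.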
Regression Example
Rory is the owner of a small software company and employs 10 sales staff. Rory sends his staff all over the world consulting, selling and setting up his system. He wants to evaluate his staff in terms of who are the most (and least) productive salespeople, and also whether more sales calls actually result in more systems being sold. So, he simply measures the number of sales calls made by each salesperson and how many systems they successfully sold.
Regression Example
Do more sales calls result in more sales made?
Step 1: Draw scatterplot
Step 2: Estimate r
[Scatterplot: systems sold (dependent variable, Y axis, 10 to 70) against number of sales calls made (independent variable, X axis, 0 to 4). Labeled points: Ava, Emily, Isabella, Emma, Ethan, Joshua, Jacob]
Regression Example Do more sales calls result in more sales made? Step 3: Calculate r Step 4: Is it a significant correlation?
Do more sales calls result in more sales made?
Step 3: Calculate r
Step 4: Is it a significant correlation?
n = 10, df = 8, alpha = .05
Observed r is larger than critical r (0.71 > 0.632), therefore we reject the null hypothesis.
Yes, it is a significant correlation: r(8) = 0.71; p < 0.05
Regression: Predicting sales
Step 1: Draw prediction line
Draw a regression line and regression equation
r = 0.71
b = 11.579 (slope)
a = 20.526 (intercept)
What are we predicting?
Rory’s Regression: Predicting sales from number of visits (sales calls)
Describe relationship: r = 0.71
Correlation: This is a strong positive correlation. Sales tend to increase as sales calls increase.
Predict using regression line (and regression equation)
b = 11.579 (slope)
Slope: as sales calls increase by 1, predicted sales increase by 11.579.
a = 20.526 (intercept)
Intercept: the predicted sales for zero sales calls, which suggests a baseline of about 20.526 systems per salesperson.
Dependent variable: systems sold. Independent variable: number of sales calls.
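For readers curious where r, b, and a come from, here is a minimal least-squares sketch in plain Python. Rory's raw data are not listed on the slides, so the numbers below are hypothetical and will not reproduce r = 0.71, b = 11.579, or a = 20.526. The function name `simple_regression` is also our own.

```python
from math import sqrt


def simple_regression(xs, ys):
    """Return (r, slope b, intercept a) for the least-squares line y' = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return r, b, a


# Hypothetical data for illustration only, NOT Rory's actual records.
calls = [1, 2, 3, 4, 5]
sold = [2, 4, 5, 4, 5]

r, b, a = simple_regression(calls, sold)
print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}")
# prints: r = 0.775, b = 0.600, a = 2.200
```

In practice a spreadsheet or statistics package reports these values directly; the sketch only shows that the slope and intercept follow from the same deviations that define r.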
Regression: Predicting sales
Step 1: Predict sales for a certain number of sales calls
Step 2: State the regression equation
Y’ = a + bx
Y’ = 20.526 + 11.579x
Step 3: Solve for some value of Y’
If you make one sales call:
Y’ = 20.526 + 11.579(1)
Y’ = 32.105
What should you expect from a salesperson who makes 1 call? They should sell 32.105 systems.
If they sell more, they are overperforming. If they sell fewer, they are underperforming.
[Scatterplot labels at one sales call: Madison, Joshua]
Regression: Predicting sales
Step 1: Predict sales for a certain number of sales calls
Step 2: State the regression equation
Y’ = a + bx
Y’ = 20.526 + 11.579x
Step 3: Solve for some value of Y’
If you make two sales calls:
Y’ = 20.526 + 11.579(2)
Y’ = 43.684
What should you expect from a salesperson who makes 2 calls? They should sell 43.684 systems.
If they sell more, they are overperforming. If they sell fewer, they are underperforming.
[Scatterplot labels at two sales calls: Isabella, Jacob]
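The prediction steps above can be sketched in Python using the slope and intercept from the lecture. The function names and the "actual sold" figures passed to `evaluate` are hypothetical, since the slides do not list each salesperson's actual numbers.

```python
A = 20.526  # intercept (a) from the lecture
B = 11.579  # slope (b) from the lecture


def predict_sales(calls):
    """Predicted systems sold: Y' = a + b*x."""
    return A + B * calls


def evaluate(calls, actual_sold):
    """Compare actual sales with the regression prediction for that number of calls."""
    predicted = predict_sales(calls)
    if actual_sold > predicted:
        return "overperforming"
    if actual_sold < predicted:
        return "underperforming"
    return "as predicted"


for calls in (1, 2):
    print(f"{calls} call(s): Y' = {predict_sales(calls):.3f}")
# prints: 1 call(s): Y' = 32.105
# prints: 2 call(s): Y' = 43.684

# Hypothetical actual sales, for illustration only.
print(evaluate(1, 40))  # 40 is above 32.105
print(evaluate(2, 30))  # 30 is below 43.684
```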
Thank you! See you next time!!