Introduction to Statistics for the Social Sciences SBS200 - Lecture Section 001, Spring 2017 Room 150 Harvill Building 9:00 - 9:50 Mondays, Wednesdays.

Presentation transcript:

Introduction to Statistics for the Social Sciences SBS200 - Lecture Section 001, Spring 2017 Room 150 Harvill Building 9:00 - 9:50 Mondays, Wednesdays & Fridays. Welcome http://www.youtube.com/watch?v=oSQJP40PcGI

A note on doodling

Schedule of readings. Before our fourth and final exam (May 1st): OpenStax Chapters 1 – 13 (Chapter 12 is emphasized). Plous Chapter 17: Social Influences. Plous Chapter 18: Group Judgments and Decisions.

Homework on class website: No homework due Friday, April 21st

Lab sessions: Everyone will want to be enrolled in one of the lab sessions. Project 4.

By the end of lecture today (4/19/17): Residuals. Simple Regression. Using correlation for predictions. r versus r². Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent). Coefficient of correlation is the name for “r”. Coefficient of determination is the name for “r²” (remember it is always positive – no direction info). Standard error of the estimate is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like standard deviation). Coefficient of regression will be “b” for each variable (like slope).

Project 4 - Two Correlations - We will use these to create two regression analyses

Regression Example: Rory is the owner of a small software company and employs 10 sales staff. Rory sends his staff all over the world consulting, selling and setting up his system. He wants to evaluate his staff in terms of who are the most (and least) productive sales people, and also whether more sales calls actually result in more systems being sold. So, he simply measures the number of sales calls made by each sales person and how many systems they successfully sold. Review

Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for some value of Y’. If a salesperson makes one sales call: Y’ = 20.526 + 11.579(1), so Y’ = 32.105. What should you expect from a salesperson who makes 1 call? They should sell 32.105 systems. If they sell more → over performing. If they sell fewer → underperforming. (Scatterplot labels: Madison, Joshua.) Review

Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for some value of Y’. If a salesperson makes two sales calls: Y’ = 20.526 + 11.579(2), so Y’ = 43.684. What should you expect from a salesperson who makes 2 calls? They should sell 43.68 systems. If they sell more → over performing. If they sell fewer → underperforming. (Scatterplot labels: Isabella, Jacob.)

Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for some value of Y’. If a salesperson makes three sales calls: Y’ = 20.526 + 11.579(3), so Y’ = 55.263. What should you expect from a salesperson who makes 3 calls? They should sell 55.263 systems. If they sell more → over performing. If they sell fewer → underperforming. (Scatterplot labels: Ava, Emma.)

Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for some value of Y’. If a salesperson makes four sales calls: Y’ = 20.526 + 11.579(4), so Y’ = 66.842. What should you expect from a salesperson who makes 4 calls? They should sell 66.84 systems. If they sell more → over performing. If they sell fewer → underperforming. (Scatterplot label: Emily.)
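The four prediction slides all plug a number of sales calls into the same equation. A minimal Python sketch of that step, using the intercept and slope quoted on the slides (the function and constant names are made up for illustration):

```python
# Coefficients quoted on the slides: Y' = a + b*x
INTERCEPT = 20.526  # a: predicted sales when sales calls = 0
SLOPE = 11.579      # b: change in predicted sales per extra sales call


def predict_sales(calls):
    """Return the predicted number of systems sold (Y') for a number of sales calls."""
    return INTERCEPT + SLOPE * calls


# Reproduce Step 3 for one through four sales calls
predictions = {calls: round(predict_sales(calls), 3) for calls in (1, 2, 3, 4)}
for calls, predicted in predictions.items():
    print(f"{calls} call(s): expect about {predicted} systems")
```

Anyone who sells more than the printed value for their number of calls is over performing; anyone who sells fewer is underperforming.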

Regression: Evaluating Staff. Step 1: Compare expected sales levels to actual sales levels. What should you expect from each salesperson? They should sell x systems depending on sales calls. If they sell more → over performing. If they sell fewer → underperforming. (Scatterplot labels: Ava, Emma, Isabella, Emily, Madison, Joshua, Jacob.)

Regression: Evaluating Staff. Step 1: Compare expected sales levels to actual sales levels. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). How did Ava do? 70 - 55.3 = 14.7. Ava sold 14.7 more than expected, taking into account how many sales calls she made → over performing.

Regression: Evaluating Staff. Step 1: Compare expected sales levels to actual sales levels. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). How did Jacob do? 20 - 43.7 = -23.7. Jacob sold 23.684 fewer than expected, taking into account how many sales calls he made → under performing.
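The Ava and Jacob comparisons can be sketched in a few lines of Python. The actual sales (70 and 20) come from the slides; the call counts (3 and 2) are an assumption, inferred from the predicted values 55.3 and 43.7 used in the subtractions. The helper names are illustrative only.

```python
INTERCEPT = 20.526  # a, from the slides
SLOPE = 11.579      # b, from the slides


def predict_sales(calls):
    """Predicted systems sold (Y') for a given number of sales calls."""
    return INTERCEPT + SLOPE * calls


def residual(actual_sold, calls):
    """Residual = actual Y minus predicted Y'."""
    return actual_sold - predict_sales(calls)


# (sales calls, systems actually sold); call counts are inferred, not stated on the slides
staff = {"Ava": (3, 70), "Jacob": (2, 20)}

residuals = {name: residual(sold, calls) for name, (calls, sold) in staff.items()}
for name, value in residuals.items():
    label = "over performing" if value > 0 else "underperforming"
    print(f"{name}: residual {value:+.1f} ({label})")
```

A positive residual means the person sold more than the line predicts for their number of calls; a negative residual means they sold fewer.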

Regression: Evaluating Staff. Step 1: Compare expected sales levels to actual sales levels. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). Residuals marked on the scatterplot: Ava 14.7, Emily -6.8, Jacob -23.7, Madison 7.9. (Emma, Isabella and Joshua are also plotted.)

Does the prediction line perfectly predict the predicted variable when using the predictor variable? No, we are wrong sometimes… How can we estimate how much “error” we have? Exactly? The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score), for example 14.7 and -23.7. How would we find our “average residual”? The green lines show how much “error” there is in our prediction line…how much we are wrong in our predictions.

How do we find the average amount of error in our prediction? (The average amount by which actual scores deviate on either side of the predicted score.) Residual scores: Ava is 14.7, Jacob is -23.7, Emily is -6.8, Madison is 7.9. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). How would we find our “average residual”? Step 1: Find the error for each value (just the residuals): Y – Y’. Step 2: Add up the residuals: Σ(Y – Y’). Big problem: Σ(Y – Y’) = 0. So square the deviations: Σ(Y – Y’)². Divide by df: Σ(Y – Y’)² / (n - 2). Take the square root: √[Σ(Y – Y’)² / (n - 2)]. The green lines show how much “error” there is in our prediction line…how much we are wrong in our predictions.

How do we find the average amount of error in our prediction? Step 1: Find the error for each value (just the residuals): Y – Y’. Sound familiar?? These work like deviation scores: Diallo is 0”, Preston is 2”, Mike is -4”, Hunter is -2”. Step 2: Find the average: √[Σ(Y – Y’)² / (n - 2)]. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). How would we find our “average residual”? The green lines show how much “error” there is in our prediction line…how much we are wrong in our predictions.

These would be helpful to know by heart – please memorize these formulas. Standard error of the estimate (line) = √[Σ(Y – Y’)² / (n - 2)]
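The recipe for the standard error of the estimate (find the residuals, square them, add them up, divide by n - 2, take the square root) can be sketched in Python. The five data points below are hypothetical and chosen only to keep the arithmetic easy; they are not Rory's sales data, and the function name is made up for this example.

```python
import math


def standard_error_of_estimate(actual, predicted):
    """Square root of the sum of squared residuals divided by df = n - 2."""
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
    sum_of_squares = sum(r ** 2 for r in residuals)
    degrees_of_freedom = len(residuals) - 2
    return math.sqrt(sum_of_squares / degrees_of_freedom)


# Hypothetical actual scores (Y) and the predictions (Y') from their
# least-squares line Y' = 2.2 + 0.6x at x = 1..5
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

# The "big problem": raw residuals cancel out, so their sum is (about) zero
raw_sum = sum(y - y_hat for y, y_hat in zip(actual, predicted))
see = standard_error_of_estimate(actual, predicted)

print(f"sum of residuals: {raw_sum:.3f}")
print(f"standard error of the estimate: {see:.3f}")
```

Dividing by n - 2 rather than n is the only real difference from the familiar standard deviation recipe.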

Standard error of the estimate (line): How well does the prediction line predict the predicted variable when using the predictor variable? What if we want to know the “average deviation score”? Find the standard error of the estimate (line). Standard error of the estimate: a measure of the average amount of predictive error; the average amount that Y’ scores differ from Y scores; a mean of the lengths of the green lines. Slope doesn’t give “variability” info. Intercept doesn’t give “variability” info. Correlation “r” does give “variability” info. Residuals do give “variability” info.

Residuals: How well does the prediction line predict the Ys from the Xs? Shorter green lines suggest better prediction – smaller error. Longer green lines suggest worse prediction – larger error. Why are the green lines vertical? Remember, we are predicting the variable on the Y axis, so error would be how we are wrong about Y (vertical). A note about curvilinear relationships and patterns of the residuals.

Does the prediction line perfectly predict the predicted variable when using the predictor variable? No, we are wrong sometimes… How can we estimate how much “error” we have? The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score), for example 14.7 and -23.7. The green lines show how much “error” there is in our prediction line…how much we are wrong in our predictions. Perfect correlation = +1.00 or -1.00: each variable perfectly predicts the other; there is no variability in the scatterplot; the dots approximate a straight line.

Regression Analysis – Least Squares Principle. When we calculate the regression line we try to minimize the distance between the predicted Ys and the actual (data) Y points (the length of the green lines). Remember, because the negative and positive values cancel each other out, we have to square those distances (deviations). So we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values”.
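To make the least squares principle concrete, here is a small Python sketch. It fits a line with the usual textbook formulas (slope = sum of cross-products divided by sum of squares of x; intercept = mean of Y minus slope times mean of X) and then checks that nudging the line in either direction makes the sum of squared vertical distances larger. The data are hypothetical, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line Y' = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cross_products / squares_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_error(xs, ys, intercept, slope):
    """Sum of squared vertical distances between actual Y and predicted Y'."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the lecture
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_error(xs, ys, a, b)
steeper = sum_squared_error(xs, ys, a, b + 0.1)   # same intercept, steeper slope
higher = sum_squared_error(xs, ys, a + 0.1, b)    # same slope, higher intercept

print(f"fitted line: Y' = {a:.1f} + {b:.1f}x, sum of squares = {best:.2f}")
print(f"steeper line: {steeper:.2f}; higher line: {higher:.2f}")
```

Any other straight line through these points gives a larger sum of squares than the fitted one, which is what “least squares” means.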

Is the regression line better than just guessing the mean of the Y variable? How much does the information about the relationship actually help? Which minimizes error better? How much better does the regression line predict the observed results? r². Wow!
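One way to see how r² answers these questions is to compare two kinds of error: the squared error from always guessing the mean of Y, and the squared error left over after using the regression line. A minimal sketch with hypothetical data (not the lecture's), where all variable names are illustrative:

```python
# Hypothetical data, not taken from the lecture
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# Error when we ignore X and always guess the mean of Y
error_guessing_mean = sum((y - mean_y) ** 2 for y in ys)

# Error that remains when we predict with the regression line
error_using_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Proportion of the "guess the mean" error that the line removes
r_squared = 1 - error_using_line / error_guessing_mean

print(f"error guessing the mean: {error_guessing_mean:.1f}")
print(f"error using the line:    {error_using_line:.1f}")
print(f"r squared:               {r_squared:.2f}")
```

For a simple regression with one predictor, this proportion is the same number you get by squaring r.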

Some useful terms. Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent). Coefficient of correlation is the name for “r”. Coefficient of determination is the name for “r²” (remember it is always positive – no direction info). Coefficient of regression is the name for “b”. Residual is found by y – y’. Standard error of the estimate is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like standard deviation).

Rory’s Regression: Predicting sales from number of visits (sales calls). Describe relationship: r = 0.71. Correlation: This is a strong positive correlation. Sales tend to increase as sales calls increase. Predict using regression line (and regression equation): b = 11.579 (slope), a = 20.526 (intercept). Slope: as sales calls increase by 1, sales should increase by 11.579. Intercept: suggests that we can assume each salesperson will sell at least 20.526 systems. (Axis labels: Independent Variable, Dependent Variable.) Review

Review: 50% is explained (r² = 0.71² ≈ 0.50), so the other 50% has yet to be explained. (0.71 > 0.632)

Summary Intercept: suggests that we can assume each salesperson will sell at least 20.526 systems Slope: as sales calls increase by one, 11.579 more systems should be sold Review

Writing Assignment Worksheet Regressions use correlations to make predictions.

What is regression used for? Include an example. Regressions are used to take advantage of relationships between variables described in correlations. We choose a value on the independent variable (on x axis) to predict values for the dependent variable (on y axis).

Writing Assignment Worksheet. Regressions use correlations to make predictions. Two numeric variables represented by scatterplots. Make predictions and not just describe the relationship. Residuals (Y – Y’) are the difference between our predicted y (y’) and the actual y data points. A positive residual score means that they earned more than expected given the hours that they worked. A negative score means they earned less than expected.

What is a residual? How would you find it? Residuals are the difference between our predicted y (y’) and the actual y data points. Once we choose a value on our independent variable and predict a value for our dependent variable, we look to see how close our prediction was. We are measuring how “wrong” we were, or the amount of “error” for that guess. Y – Y’ = residual score

Writing Assignment Worksheet. Regressions use correlations to make predictions. Two numeric variables represented by scatterplots. Make predictions and not just describe the relationship. Residuals are the difference between our predicted y (y’) and the actual y data points. A positive residual score means that they earned more than expected given the hours that they worked. A negative score means they earned less than expected. The standard error of the estimate is the average of the values for all of the residuals.

What is the Standard Error of the Estimate? (How is it related to residuals?) The average length of the residuals. The average error of our guess. The average length of the green lines. The standard deviation of the regression line.

Writing Assignment Worksheet. Regressions use correlations to make predictions. Two numeric variables represented by scatterplots. Make predictions and not just describe the relationship. Residuals are the difference between our predicted y (y’) and the actual y data points. A positive residual score means that they earned more than expected given the hours that they worked. A negative score means they earned less than expected. The standard error of the estimate is the average of the values for all of the residuals. For r²: square the r value. The proportion of variance in the annual salary that is due to hours worked is 49%. (Note: The value for r in this example is 0.70.)

Give one fact about r². It is called the coefficient of determination. It is the square of the value for r. It is the proportion of variance of the dependent variable that is accounted for by its relationship with the independent variable.
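As a quick arithmetic check of these facts, the sketch below squares the two r values that appear in this lecture (0.70 in the worksheet example and 0.71 in Rory's regression). The function name is made up for illustration.

```python
def coefficient_of_determination(r):
    """Return r squared: the proportion of variance in Y accounted for by X."""
    return r ** 2


# r values quoted in the lecture: worksheet example and Rory's sales data
for r in (0.70, 0.71):
    print(f"r = {r:.2f}  ->  r squared = {coefficient_of_determination(r):.0%}")

# Squaring removes the sign, so r squared carries no direction info
same_strength = coefficient_of_determination(-0.70) == coefficient_of_determination(0.70)
print("negative and positive r give the same r squared:", same_strength)
```

This is why r² is always positive: a correlation of -0.70 and one of +0.70 both account for 49% of the variance.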

Thank you! See you next time!!