Introduction to Statistics for the Social Sciences SBS200 - Lecture Section 001, Fall 2016 Room 150 Harvill Building 10:00 - 10:50 Mondays, Wednesdays.


Introduction to Statistics for the Social Sciences SBS200 - Lecture Section 001, Fall 2016 Room 150 Harvill Building 10:00 - 10:50 Mondays, Wednesdays & Fridays. Welcome

Schedule of readings: Before our fourth and final exam this Monday (December 5th): OpenStax Chapters 1 – 13 (Chapter 12 is emphasized); Plous Chapter 17: Social Influences; Chapter 18: Group Judgments and Decisions

By the end of lecture today (12/2/16): Review for Exam 4. No new material. Review homework. Clicker questions.

Homework: No more homework!! Please click in: My last name starts with a letter somewhere between A. A – D B. E – L C. M – R D. S – Z

Just one quick favor: please use your phone or laptop and take just a minute to complete Course Evaluations online. Check your email for a link or go to tceonline.oia.arizona.edu

Correlation: What do we need to define a line? Y-intercept = “a” (also “b0”): where the line crosses the Y axis. Slope = “b” (also “b1”): how steep the line is. The predicted variable goes on the “Y” axis and is called the dependent variable. The predictor variable goes on the “X” axis and is called the independent variable. Example with expenses per year and yearly income: if you spend this much, you probably make this much.

Review: Rory’s Regression: Predicting sales (dependent variable) from number of visits, or sales calls (independent variable). Describe the relationship: r = 0.71. Correlation: this is a strong positive correlation; sales tend to increase as sales calls increase. Predict using the regression line (and regression equation): b = 11.579 (slope), a = 20.526 (intercept). Slope: as sales calls increase by 1, sales should increase by 11.579. Intercept: suggests that we can assume each salesperson will sell at least 20.526 systems.

Review: r = 0.71, so r² is about 0.50. 50% is explained, so the other 50% has yet to be explained. The correlation is significant (0.71 > critical r of 0.632).

Summary (review): Intercept: suggests that we can assume each salesperson will sell at least 20.526 systems. Slope: as sales calls increase by one, 11.579 more systems should be sold.
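As a minimal sketch (not part of the original slides), the regression equation from Rory's example can be written as a small Python function. The intercept 20.526 and slope 11.579 come from the slides; the function name `predict_sales` and the input of 20 sales calls are hypothetical, chosen only for illustration.

```python
# Intercept (a) and slope (b) as reported on the slide for Rory's regression.
A_INTERCEPT = 20.526
B_SLOPE = 11.579

def predict_sales(sales_calls):
    """Return the predicted sales Y' = a + bX for X sales calls."""
    return A_INTERCEPT + B_SLOPE * sales_calls

# Hypothetical input: a salesperson who makes 20 sales calls.
predicted = predict_sales(20)
print(round(predicted, 3))

# One more sales call raises the prediction by exactly the slope.
print(round(predict_sales(21) - predict_sales(20), 3))
```

The second print illustrates the slide's reading of the slope: each additional sales call adds 11.579 to the predicted sales.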

Regression Analysis – Least Squares Principle: When we calculate the regression line, we try to minimize the distance between the predicted Ys and the actual (data) Y points (the length of the green lines). Remember that because the negative and positive values cancel each other out, we have to square those distances (deviations). So we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values”.
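The least squares principle can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's own method: the five data points are hypothetical, and the helper names `least_squares_line` and `sum_of_squares` are invented for this example. The slope and intercept use the standard closed-form least squares formulas.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the line Y' = a + bX that minimizes the
    sum of squared vertical distances between Y and Y'."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances (actual Y minus predicted Y')."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_of_squares(xs, ys, a, b)

# The unsquared distances cancel out (they sum to about zero),
# which is why the distances are squared before adding them up.
raw_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))

print(a, b, best, raw_sum)
# Any other line has a larger sum of squares than the fitted one.
print(sum_of_squares(xs, ys, a + 0.1, b) > best)
print(sum_of_squares(xs, ys, a, b + 0.1) > best)
```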

Homework Review

Homework answers (hours worked and weekly pay): r = +0.92; positive; strong. The relationship between hours worked and weekly pay is a strong positive correlation. This correlation is significant, r(3) = 0.92; p < 0.05. up; down. Intercept = 55.286; slope = 6.0857; y' = 6.0857x + 55.286. Predicted values: 207.43; 85.71. r² = .846231 or 84%: 84% of the total variance of “weekly pay” is accounted for by “hours worked”. For each additional hour worked, weekly pay will increase by $6.09.

[Scatterplot: Wait Time (Y axis) by Number of Operators (X axis)]

Homework answers (number of operators and wait time): Critical r = 0.878. No, we do not reject the null. r = -.73; negative; strong. The relationship between wait time and number of operators working is negative and moderate. This correlation is not significant, r(3) = -0.73; n.s. As the number of operators increases, wait time decreases. Intercept = 458; slope = -18.5; y' = -18.5x + 458. Predicted values: 365 seconds; 328 seconds. r² = .53695 or 54%: the proportion of total variance of wait time accounted for by number of operators is 54%. For each additional operator added, wait time will decrease by 18.5 seconds.

[Scatterplot: Percent of BAs (Y axis) by Median Income (X axis)]

Homework answers (median income and percent of residents with a BA degree): Percent of residents with a BA degree. Critical r = 0.632. Yes, we reject the null. n = 10; df = 8. r = 0.8875; positive; strong. The relationship between median income and percent of residents with a BA degree is strong and positive. This correlation is significant, r(8) = 0.89; p < 0.05. As median income goes up, so does the percent of residents who have a BA degree. Intercept = 3.1819; slope = 0.0005; y' = 0.0005x + 3.1819. Predicted values: 25% of residents; 35% of residents. r² = .78766 or 78%: the proportion of total variance of % of BAs accounted for by median income is 78%. For each additional $1 in income, percent of BAs increases by .0005.

[Scatterplot: Crime Rate (Y axis) by Median Income (X axis)]

Homework answers (median income and crime rate): Crime Rate. Critical r = 0.632. No, we do not reject the null. n = 10; df = 8. r = -0.6293; negative; moderate. The relationship between crime rate and median income is negative and moderate. This correlation is not significant, r(8) = -0.63; n.s. [0.6293 is not bigger than the critical value of 0.632]. As median income goes up, crime rate tends to go down. Intercept = 4662.5; slope = -0.0499; y' = -0.0499x + 4662.5. Predicted values: 2,417 thefts; 1,418.5 thefts. r² = .396 or 40%: the proportion of total variance of thefts accounted for by median income is 40%. For each additional $1 in income, thefts go down by .0499.
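The decision rule used in all four homework problems (compare the size of the observed r with the critical r) can be sketched as follows. The r values and critical values are taken from the slides; the critical r of 0.878 for the hours-worked problem is an assumption based on it having the same degrees of freedom, r(3), as the wait-time problem. The function name `reject_null` is hypothetical.

```python
def reject_null(r, critical_r):
    """Reject the null hypothesis (no correlation) when the absolute
    value of the observed r is bigger than the critical r."""
    return abs(r) > critical_r

# (observed r, critical r) pairs from the homework review slides.
# Assumption: 0.878 also applies to the first problem, since it is r(3) too.
homework = {
    "hours worked and weekly pay": (0.92, 0.878),
    "number of operators and wait time": (-0.73, 0.878),
    "median income and percent of BAs": (0.8875, 0.632),
    "median income and crime rate": (-0.6293, 0.632),
}

for label, (r, critical_r) in homework.items():
    if reject_null(r, critical_r):
        decision = "reject the null"
    else:
        decision = "do not reject the null"
    print(f"{label}: r = {r}, critical r = {critical_r}: {decision}")
```

Note that the sign of r is ignored for the comparison: a correlation of -0.6293 falls just short of 0.632, so it is not significant even though it is moderate in size.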

Review Sheet

Review sheet answers: decrease; narrower; As variability goes down, it is easier to reject the null; ANOVA

Mean = 40. z = (52 - 40) / 5 = 2.4. Go to table: area between the mean and z = 2.4 is .4918. Add area of lower half: .4918 + .5000 = .9918 (also fine: 99.18%).
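The same z-score calculation can be checked without a printed table. The sketch below uses Python's standard library `statistics.NormalDist`; the mean of 40, standard deviation of 5, and score of 52 are read from the slide, while the variable names are only illustrative.

```python
from statistics import NormalDist

mean, sd, score = 40, 5, 52

# z = (score - mean) / standard deviation
z = (score - mean) / sd

# Area below z under the standard normal curve (lower half plus table area).
area_below = NormalDist().cdf(z)

# Area between the mean and z, which is what the z table on the slide reports.
area_mean_to_z = area_below - 0.5

print(z, round(area_mean_to_z, 4), round(area_below, 4))
```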

Review sheet answers: decrease; narrower; As variability goes down, it is easier to reject the null; ANOVA; 99.18%

Interval; True experiment. Income x Education would allow the most accurate predictions because this relationship has the largest correlation coefficient, 0.85. This would be statistically significant with alpha of 0.05 because p = 0.02, so p < 0.05. It would NOT be statistically significant with alpha of 0.01 because p = 0.02; p is not < .01.

IQ x Age would allow the weakest predictions because this relationship has the smallest correlation coefficient, -0.02. 0.91. No, because 0.91 is not less than 0.05. Education x Income is the only one that is significant; p = 0.02 so p < 0.05. None, because none have p < .01. r²

Standard error of the estimate, because it is a measure of the amount of error in the regression line (average of residuals). 81% because 0.9² = 0.81. 19% because 100% – 81% = 19%. The correlation between the heights of mothers and their daughters is moderate, positive and statistically significant, r(28) = 0.60; p < 0.05. 36% because 0.6² = 0.36. 64% because 100% – 36% = 64%. 75% because 100% – 25% = 75% (note: 0.5² = 0.25).
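The percentages on this review sheet all come from squaring r (the coefficient of determination) and subtracting from 100%. A minimal sketch, with a hypothetical helper name `variance_explained`:

```python
def variance_explained(r):
    """Return (explained, not explained) proportions of variance for a correlation r."""
    r_squared = r ** 2          # coefficient of determination
    return r_squared, 1 - r_squared

# The three correlations used on the review sheet.
for r in (0.9, 0.6, 0.5):
    explained, not_explained = variance_explained(r)
    print(f"r = {r}: {explained:.0%} explained, {not_explained:.0%} not explained")
```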

0.92; 0.92² = .8464; 6.0857; .8464 = 0.92²; .1536 = 1 - .8464; 6.0857; 55.286; Y’ = 6.0857x + 55.286; $6.09; residual

r; r²; b; r²; b; a

-1 +1 +1 (or 100%) anything anything +1 (or 100%) anything anything anything anything any positive number Y’ = 6.0857x + 55.286

A deviation score is the difference between an actual score and the mean. A residual is the difference between an actual score and the predicted score (which plays the role of the mean). Standard deviation is like the mean of the deviation scores; standard error of the estimate is like the mean of the residual scores: how far away is each score from the regression line (which is like the mean of a subgroup)?
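To make the parallel concrete, here is a rough sketch that computes deviation scores and residuals side by side. The data and the line Y' = 2.2 + 0.6X are hypothetical (the line is the least squares fit for these five points). The divisors n - 1 and n - 2 are the usual textbook choices and are assumed here, since the slide does not state a formula.

```python
from math import sqrt

# Hypothetical data and its least squares line Y' = a + bX.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

n = len(ys)
y_mean = sum(ys) / n

# Deviation score: actual score minus the mean.
deviations = [y - y_mean for y in ys]

# Residual: actual score minus the predicted score Y'.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Standard deviation: roughly the typical size of a deviation score.
standard_deviation = sqrt(sum(d ** 2 for d in deviations) / (n - 1))

# Standard error of the estimate: roughly the typical size of a residual.
standard_error_of_estimate = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(round(standard_deviation, 4), round(standard_error_of_estimate, 4))
```

In this example the standard error of the estimate is smaller than the standard deviation, because the scores sit closer to the regression line than to the overall mean.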

Do not reject the null hypothesis because observed F is less than one

Please hand in your homework

Today we will be reviewing for the test using clicker questions.

What if we were looking to see if our stop-smoking program affects people’s desire to smoke? What would the null hypothesis be? a. Can’t know without knowing the dependent variable b. The program does not work c. The program works d. Comparing the null and alternative hypothesis (Correct: b)

Which of the following is a Type I error? a. We conclude that the program works when in fact it doesn’t b. We conclude that the program works when in fact it does c. We conclude that the program doesn’t work when in fact it does d. We conclude that the program doesn’t work when in fact it doesn’t (Correct: a)

What is the null hypothesis of a correlation coefficient? a. It is zero (nothing going on) b. It is less than zero c. It is more than zero d. It equals the computed sample correlation (Correct: a)

Let’s try one: Winnie found an observed correlation coefficient of 0. What should she conclude? a. Reject the null hypothesis b. Do not reject the null hypothesis c. Not enough info is given (Correct: b)

In the regression equation, what does the letter "a" represent? a. Y intercept b. Slope of the line c. Any value of the independent variable that is selected d. None of these (Correct: a)

Assume the least squares equation is Y’ = 10 + 20X. What does the value of 10 in the equation indicate? a. Y intercept b. For each unit increase in Y, X increases by 10 c. For each unit increase in X, Y increases by 10 d. None of these (Correct: a)

In the least squares equation Y’ = 10 + 20X, the value of 20 indicates: a. the Y intercept. b. slope (so for each unit increase in X, Y’ increases by 20). c. slope (so for each unit increase in Y’, X increases by 20). d. none of these. (Correct: b)

In the equation Y’ = a + bX, what is Y’? a. Slope of the line b. Y intercept c. Predicted value of Y, given a specific X value d. Value of Y when X = 0 (Correct: c)

According to the Central Limit Theorem, which is false? a. As n ↑, x̄ will approach µ b. As n ↑, curve will approach normal shape c. As n ↑, curve variability gets larger d. As n ↑ (Correct: c)

coefficient of determination = r². If the coefficient of determination is 0.80, what percent of variation is explained? a. 20% b. 90% c. 64% d. 80% (Correct: d) What percent of variation is not explained? a. 20% b. 90% c. 64% d. 80% (Correct: a)

Which of the following represents a significant finding? a. p < 0.05 b. t(3) = 0.23; n.s. c. the observed t statistic is nearly zero d. we do not reject the null hypothesis (Correct: a)

If r = 1.00, which inference cannot be made? a. The dependent variable can be perfectly predicted by the independent variable b. This provides evidence that the dependent variable is caused by the independent variable c. All of the variation in the dependent variable can be accounted for by the independent variable d. Coefficient of determination is 100%. (Correct: b)

Let’s try one: In a regression analysis, what do we call the variable used to predict the value of another variable? a. Independent b. Dependent c. Correlation d. Determination (Correct: a)

What can we conclude if the coefficient of determination is 0.94? a. r² = 0.94 b. direction of relationship is positive c. 94% of total variation of one variable is explained by variation in the other variable. d. Both A and C (Correct: d)

Which of the following statements regarding the coefficient of correlation is true? a. It ranges from -1.0 to +1.0 b. It measures the strength of the relationship between two variables c. A value of 0.00 indicates two variables are not related d. All of these (Correct: d)

coefficient of correlation = r; coefficient of determination = r². What does a coefficient of correlation of 0.70 imply? (r = +0.70) a. Almost no correlation because 0.70 is close to 1.0 b. 70% of the variation in one variable is explained by the other c. Coefficient of determination is 0.49 d. Coefficient of nondetermination is 0.30 (Correct: c)

If r = 0.65, what does the coefficient of determination equal? a. 0.194 b. 0.423 c. 0.577 d. 0.806 (Correct: b, because 0.65² = 0.4225)

If the coefficient of correlation is 0.60, what percent of variation is not explained? a. 20% b. 90% c. 64% d. 80% (Correct: c, because 0.60² = 0.36 and 100% – 36% = 64%)

If the coefficient of determination is 0.20, what percent of variation is not explained? a. 20% b. 90% c. 64% d. 80% (Correct: d)

What is the measure that indicates how precise a prediction of Y is based on X or, conversely, how inaccurate the prediction might be? a. Regression equation b. Slope of the line c. Standard error of estimate d. Least squares principle (Correct: c)

Let’s try one: Agnes compared the heights of the women’s gymnastics team and the women’s basketball team. If she doubled the number of players measured (but ended up with the same means), what effect would that have on the results? a. the means are the same, so the t-test would yield the same results. b. the means are the same, but the variability would increase so it would be harder to reject the null hypothesis. c. the means are the same, but the variability would decrease so it would be easier to reject the null hypothesis. (Correct: c)

Let’s try one: Agnes compared the heights of the women’s gymnastics team and the scores they got. If she doubled the number of players measured, but ended up with the same correlation (r), what effect would that have on the results? a. the r is the same, so the conclusion would be the same b. the r is the same, but with more people, degrees of freedom (df) would go up and it would be harder to reject the null hypothesis. c. the r is the same, but with more people, degrees of freedom (df) would go up and it would be easier to reject the null hypothesis. (Correct: c)
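One way to see why more people makes the same r easier to call significant is the standard conversion of r to a t statistic, t = r × √df / √(1 − r²), with df = n − 2. The sketch below is only an illustration: the value r = 0.5 and the sample sizes 10 and 20 are hypothetical, and the slides themselves use a table of critical r values rather than this formula.

```python
from math import sqrt

def t_from_r(r, n):
    """Convert a correlation r from n pairs of scores into a t statistic (df = n - 2)."""
    df = n - 2
    return r * sqrt(df) / sqrt(1 - r ** 2)

r = 0.5                      # hypothetical correlation, kept the same in both cases
t_small = t_from_r(r, 10)    # 10 players measured
t_large = t_from_r(r, 20)    # twice as many players, same r

print(round(t_small, 3), round(t_large, 3))
```

With r held constant, the observed t grows as df goes up, so the same correlation is more likely to exceed the critical value.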

Which of the following is true about the standard error of estimate? a. It is a measure of the accuracy of the prediction b. It is based on squared vertical deviations between Y and Y’ c. It cannot be negative d. All of these (Correct: d) Standard error of the estimate (line): a measure of the average amount of predictive error; the average amount that Y’ scores differ from Y scores; a mean of the lengths of the green lines.

If all the points on a scatter diagram lie on a straight line (perfect correlation), what is the standard error of estimate? a. -1 b. +1 c. 0 d. Infinity (Correct: c) Standard error of the estimate (line): a measure of the average amount of predictive error; the average amount that Y’ scores differ from Y scores; a mean of the lengths of the green lines.

Let’s try one Scatterplot A Scatterplot B Scatterplot C Which of these correlations would be most likely to have the highest positive value for r? a. Scatterplot A b. Scatterplot B c. Scatterplot C d. Can not be determined from the information given Correct

Let’s try one Scatterplot A Scatterplot B Scatterplot C Which of these scatterplots will have the smallest “y intercept”? a. Scatterplot A b. Scatterplot B c. Scatterplot C d. Cannot be determined from the information given Correct

Let’s try one Scatterplot A Scatterplot B Scatterplot C Which of these correlations would be most likely to represent the correlation between salary and expenses? a. Scatterplot A b. Scatterplot B c. Scatterplot C d. Cannot be determined from the information given Correct

Let’s try one: Which of the following correlations would allow you the most accurate predictions? a. r = +0.01 b. r = -0.10 c. r = +0.40 d. r = -0.65 (Correct: d)

Let’s try one After duplicate correlations have been discarded and trivial correlations have been ignored, there remain a. two correlations b. three correlations c. six correlations d. nine correlations Correct

Let’s try one Which of the following conclusions can not be made from the data in the matrix? a. There is a significant correlation between Science and Reading b. There is a significant correlation between Math and Reading c. There is a significant correlation between Math and Science Correct

Let’s try one: Winnie found an observed t of .04. What should she conclude? (Hint: notice that .04 is less than 1) a. Reject the null hypothesis b. Do not reject the null hypothesis c. Not enough info is given (Correct: b; small observed t score)

Thank you for a wonderful semester, and good luck with your studies! See you at the final exam.