Published by Myron Cunningham. Modified over 6 years ago.
1
Introduction to Statistics for the Social Sciences, SBS200 - Lecture Section 001, Fall 2016. Room 150 Harvill Building, 10:00 - 10:50, Mondays, Wednesdays & Fridays. Welcome
2
Schedule of readings This Monday
Before our fourth and final exam (December 5th):
OpenStax Chapters 1 – 13 (Chapter 12 is emphasized)
Plous Chapter 17: Social Influences
Plous Chapter 18: Group Judgments and Decisions
3
By the end of lecture today 12/2/16
Review for Exam 4: no new material. Review homeworks. Clicker questions.
4
Homework: No more homework! Please click in: My last name starts with a letter somewhere between
A. A – D
B. E – L
C. M – R
D. S – Z
5
Just one quick favor… Please use your phone or laptop
Please take just a minute to complete Course Evaluations online. Check your email for a link or go to tceonline.oia.arizona.edu
6
Correlation: What do we need to define a line?
[Figure: a line with Expenses per year on the X axis and Yearly Income on the Y axis: “If you spend this much, you probably make this much.”]
Y-intercept = “a” (also “b0”): where the line crosses the Y axis.
Slope = “b” (also “b1”): how steep the line is.
The predicted variable goes on the “Y” axis and is called the dependent variable. The predictor variable goes on the “X” axis and is called the independent variable.
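As a minimal sketch, the two numbers a and b are all a line needs: given any X, the line returns a predicted Y’. The function name `predict` is a hypothetical choice, and the values a = 10 and b = 20 are borrowed from the Y’ = 10 + 20X clicker question in this review.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return the predicted value Y' = a + b*X for a line with
    Y-intercept a (b0) and slope b (b1). Hypothetical helper name."""
    return a + b * x


# With a = 10 and b = 20: the line crosses the Y axis at 10 (when X = 0),
# and each 1-unit increase in X raises Y' by 20.
for x in [0, 1, 2]:
    print(x, predict(10, 20, x))
```

The intercept is simply the prediction at X = 0, and the slope is the difference between predictions one X unit apart.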
7
Review: Rory’s Regression: Predicting sales from number of visits (sales calls)
Describe relationship: r = 0.71. Correlation: This is a strong positive correlation. Sales tend to increase as sales calls increase.
Predict using the regression line (and regression equation):
b = slope. Slope: as sales calls increase by 1, sales should increase by ___ systems.
a = intercept. Intercept: suggests that we can assume each salesperson will sell at least ___ systems.
Dependent variable (Y axis): sales. Independent variable (X axis): number of sales calls.
8
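Review: r² = 0.71² ≈ 0.50, so 50% of the variance in sales is explained by the number of sales calls, and the other 50% has yet to be explained.
The observed r is larger than the critical r (0.71 > 0.632), so we reject the null hypothesis.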
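The 50/50 split follows from squaring r. A short Python check of that arithmetic (r = 0.71 is Rory’s value; the variable names are illustrative):

```python
r = 0.71  # Rory's observed correlation

explained = r ** 2            # coefficient of determination (r squared)
unexplained = 1 - explained   # coefficient of nondetermination

print(f"explained:   {explained:.0%}")    # explained:   50%
print(f"unexplained: {unexplained:.0%}")  # unexplained: 50%
```

The exact values are 0.5041 and 0.4959, which round to the 50% and 50% on the slide.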
9
Review Summary: Intercept: suggests that we can assume each salesperson will sell at least ___ systems. Slope: as sales calls increase by one, ___ more systems should be sold.
10
Regression Analysis – Least Squares Principle
When we calculate the regression line, we try to minimize the distance between the predicted Ys and the actual (data) Y points (the lengths of the green lines). Because the positive and negative deviations cancel each other out when summed, we have to square those distances (deviations). So we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values.”
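The least squares principle can be sketched in a few lines of Python. This sketch uses the standard textbook formulas for the slope and intercept, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄, which the slide does not spell out, plus a small made-up data set; the function names are hypothetical. It also shows why the deviations must be squared: the raw residuals sum to zero.

```python
import math


def least_squares(xs, ys):
    """Fit Y' = a + bX by ordinary least squares.

    b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    a = mean_y - b * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_of_squared_residuals(xs, ys, a, b):
    """The quantity least squares minimizes: the sum of (Y - Y')^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for illustration (not from the homework).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 7, 8]

a, b = least_squares(xs, ys)
raw_residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"Y' = {a:.1f} + {b:.1f}X")                          # Y' = 1.8 + 1.2X
print(math.isclose(sum(raw_residuals), 0.0, abs_tol=1e-9))  # positives and negatives cancel: True
print(round(sum_of_squared_residuals(xs, ys, a, b), 10))    # 2.8
```

Any other line through these points, for example one with a slightly different slope, gives a larger sum of squared residuals than 2.8.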
11
Homework Review
12
r = +0.92: positive and strong. The relationship between the hours worked and weekly pay is a strong positive correlation. This correlation is significant, r(3) = 0.92; p < 0.05. As hours worked go up, weekly pay goes up.
a = 55.286, b = 6.0857, so y’ = 55.286 + 6.0857x
Predicted weekly pay: $207.43 and $85.71
r² = 0.8464, or 84%: 84% of the total variance of “weekly pay” is accounted for by “hours worked”.
For each additional hour worked, weekly pay will increase by $6.09.
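A quick way to check the homework’s predictions is to plug X values into the regression equation. The inputs of 25 and 5 hours are an inference (the slide shows only the predicted pay), chosen because they reproduce $207.43 and $85.71; the helper name `predicted_pay` is hypothetical.

```python
# Regression equation from the homework answer: y' = 55.286 + 6.0857x
a = 55.286   # intercept: predicted pay at 0 hours
b = 6.0857   # slope: extra pay per additional hour
r = 0.92


def predicted_pay(hours):
    """Hypothetical helper: predicted weekly pay for a given number of hours."""
    return a + b * hours


print(round(predicted_pay(25), 2))  # 207.43
print(round(predicted_pay(5), 2))   # 85.71
print(round(r ** 2, 4))             # 0.8464, the coefficient of determination
```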
13
[Figure: scatterplot of Wait Time against Number of Operators]
14
Critical r = 0.878. No, we do not reject the null.
r = -0.73: negative and strong. The relationship between wait time and the number of operators working is negative and strong. This correlation is not significant, r(3) = -0.73; n.s. As the number of operators increases, wait time decreases.
a = 458, b = -18.5, so y’ = -18.5x + 458
Predicted wait times: 365 seconds and 328 seconds
r² = 54%: the proportion of total variance of wait time accounted for by the number of operators is 54%.
For each additional operator added, wait time will decrease by 18.5 seconds.
15
[Figure: scatterplot of Percent of BAs against Median Income]
16
Critical r = 0.632. Yes, we reject the null.
Percent of residents with a BA degree: n = 10, df = 8.
r = 0.8875: positive and strong. The relationship between median income and percent of residents with a BA degree is strong and positive. This correlation is significant, r(8) = 0.89; p < 0.05. As median income goes up, so does the percent of residents who have a BA degree.
a = 3.1819, b = 0.0005, so y’ = 3.1819 + 0.0005x
Predictions: 25% of residents and 35% of residents
r² = 78%: the proportion of total variance of % of BAs accounted for by median income is 78%.
For each additional $1 in income, the percent of BAs increases by .0005.
17
[Figure: scatterplot of Crime Rate against Median Income]
18
Critical r = 0.632. No, we do not reject the null.
Crime rate: n = 10, df = 8.
r = -0.63: negative and moderate. The relationship between crime rate and median income is negative and moderate. This correlation is not significant, r(8) = -0.63; n.s. (0.63 is not bigger than the critical value of 0.632). As median income goes up, the crime rate tends to go down.
a = 4662.5, b = -0.0499, so y’ = 4662.5 - 0.0499x
Predictions: 2,417 thefts and 1,418.5 thefts
r² = .396, or 40%: the proportion of total variance of thefts accounted for by median income is 40%.
For each additional $1 in income, thefts go down by .0499.
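All four homework problems use the same decision rule: compare the size of r (ignoring its sign) with the critical r for the problem’s degrees of freedom. A minimal sketch using the observed and critical values from the answers above; the critical r of 0.878 for the hours-worked problem is an assumption based on its df of 3 (the same df as the wait-time problem), and `reject_null` is a hypothetical helper name.

```python
def reject_null(r: float, critical_r: float) -> bool:
    """Reject H0 (no correlation in the population) when |r| exceeds the
    critical value. The sign of r only tells the direction."""
    return abs(r) > critical_r


# (label, observed r, critical r)
homework = [
    ("hours worked x weekly pay", 0.92, 0.878),
    ("number of operators x wait time", -0.73, 0.878),
    ("median income x % with BA", 0.8875, 0.632),
    ("median income x crime rate", -0.63, 0.632),
]

for label, r, critical_r in homework:
    decision = "reject the null" if reject_null(r, critical_r) else "do not reject the null"
    print(f"{label}: |r| = {abs(r)}, critical r = {critical_r} -> {decision}")
```

Taking the absolute value matters: -0.73 is a fairly large correlation, but it still falls short of 0.878.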
19
Review Sheet
20
Answers: decrease; narrower; ANOVA. As variability goes down, it is easier to reject the null.
21
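$z = \frac{52 - 40}{5} = 2.4$ (mean = 40). Go to the table: the area between the mean and z = 2.4 is .4918. Add the area of the lower half (.5000): .4918 + .5000 = .9918 (also fine as a percentage: 99.18%).
[Figure: normal curve centered at 40, labeled with the lower half (.5000) and the area between 40 and z = 2.4 (.4918)]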
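The table lookup can be reproduced numerically. This sketch computes the area below z from the standard normal cumulative distribution, written with Python’s `math.erf`, instead of a printed z table; the helper names are hypothetical.

```python
import math


def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd


def area_below(z):
    """Proportion of the standard normal curve below z
    (the table area between the mean and z, plus .5000 for the lower half)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


z = z_score(52, 40, 5)
print(z)                              # 2.4
print(round(area_below(z) - 0.5, 4))  # area between the mean and z: 0.4918
print(round(area_below(z), 4))        # 0.9918
```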
22
Answers: decrease; narrower; ANOVA; 99.18%. As variability goes down, it is easier to reject the null.
23
Answers: Interval; True experiment.
Income x Education would allow the most accurate predictions because this relationship has the largest correlation coefficient (0.85).
This would be statistically significant with an alpha of 0.05 because p = 0.02, so p < 0.05.
It is NOT statistically significant with an alpha of 0.01 because p = 0.02 is not < .01.
24
IQ x Age would allow the weakest predictions because this relationship has the correlation coefficient closest to zero (-0.02).
0.91: No, because 0.91 is not less than 0.05.
Education x Income is the only one that is significant: p = 0.02, so p < 0.05.
None, because none have p < .01.
r²
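Choosing the best predictor and checking significance are both simple comparisons: the largest |r| gives the most accurate predictions, and a result is significant when p < alpha. A sketch using only the two pairs whose values appear in these answers; the other pairs from the original question are not reproduced, so this is not the full set.

```python
# Only the pairs whose r and p appear in the answers above.
correlations = {
    "Income x Education": {"r": 0.85, "p": 0.02},
    "IQ x Age": {"r": -0.02, "p": 0.91},
}

# Prediction accuracy depends on the size of r, not its sign.
strongest = max(correlations, key=lambda k: abs(correlations[k]["r"]))
weakest = min(correlations, key=lambda k: abs(correlations[k]["r"]))


def significant(p, alpha):
    """A result is statistically significant when p is below alpha."""
    return p < alpha


print("most accurate predictions:", strongest)
print("weakest predictions:", weakest)
for name, stats in correlations.items():
    print(name, "alpha .05:", significant(stats["p"], 0.05),
          "alpha .01:", significant(stats["p"], 0.01))
```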
25
Standard error of the estimate
Because it is a measure of the amount of error in the regression line (roughly, the typical size of the residuals).
81% because 0.9² = .81
19% because 100% – 81% = 19%
The correlation between the heights of mothers and their daughters is moderate, positive and statistically significant, r(28) = 0.60; p < 0.05.
36% because 0.6² = .36
64% because 100% – 36% = 64%
75% because 100% – 25% = 75% (note: 0.5² = .25)
26
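r = 0.92, so r² = 0.92² = .8464 and 1 – r² = 1 – .8464 = .1536. b = 6.0857 and a = 55.286, so Y’ = 55.286 + 6.0857x: each additional hour adds about $6.09. The difference between an actual Y and the predicted Y’ is the residual.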
27
Answers: r, r², b, r², b, a
28
-1 +1 +1 (or 100%) anything anything +1 (or 100%) anything anything anything anything any positive number Y’ = x
29
A deviation score is a difference score: the distance from an actual score to the mean. A residual is the difference between an actual score and the predicted score (the predicted score works like the mean of the subgroup with that X value). Standard deviation is like the mean of the deviation scores; the standard error of estimate is like the mean of the residual scores, that is, how far away scores typically are from the regression line.
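The parallel between standard deviation and standard error of estimate can be sketched in Python. The formulas below (divide the sum of squares by n − 1 for the standard deviation and by n − 2 for the standard error of estimate, then take a square root) are the conventional sample formulas and are an assumption here, since the slide only describes them as “like the mean”. The data are made up, and a = 1.8, b = 1.2 are the least-squares values for that data.

```python
import math


def standard_deviation(ys):
    """Typical distance of scores from the mean (sample formula, n - 1)."""
    n = len(ys)
    mean = sum(ys) / n
    deviations = [y - mean for y in ys]          # deviation scores
    return math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))


def standard_error_of_estimate(xs, ys, a, b):
    """Typical distance of scores from the regression line (n - 2)."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # residual scores
    return math.sqrt(sum(e ** 2 for e in residuals) / (len(ys) - 2))


# Hypothetical data; Y' = 1.8 + 1.2X is the least-squares line for it.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 7, 8]

sd = standard_deviation(ys)
see = standard_error_of_estimate(xs, ys, 1.8, 1.2)
print(round(sd, 3), round(see, 3))
```

Because the regression line follows the data more closely than a flat line at the mean, the standard error of estimate comes out smaller than the standard deviation here. If every point sat exactly on the line, every residual would be 0 and so would the standard error of estimate.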
30
Do not reject the null hypothesis because observed F is less than one
31
Please hand in your homework
32
Today we will be reviewing for the test using clicker questions.
33
What if we were looking to see if our stop-smoking program affects people’s desire to smoke? What would the null hypothesis be?
a. Can’t know without knowing the dependent variable
b. The program does not work
c. The program works
d. Comparing the null and alternative hypothesis
Correct answer: b
34
Which of the following is a Type I error?
a. We conclude that the program works when in fact it doesn’t
b. We conclude that the program works when in fact it does
c. We conclude that the program doesn’t work when in fact it does
d. We conclude that the program doesn’t work when in fact it doesn’t
Correct answer: a
35
What is the null hypothesis of a correlation coefficient?
a. It is zero (nothing going on)
b. It is less than zero
c. It is more than zero
d. It equals the computed sample correlation
Correct answer: a
36
Let’s try one: Winnie found an observed correlation coefficient of 0. What should she conclude?
a. Reject the null hypothesis
b. Do not reject the null hypothesis
c. Not enough info is given
Correct answer: b
37
In the regression equation, what does the letter “a” represent?
a. Y intercept
b. Slope of the line
c. Any value of the independent variable that is selected
d. None of these
Correct answer: a
38
Assume the least squares equation is Y’ = 10 + 20X. What does the value of 10 in the equation indicate?
a. Y intercept
b. For each unit increase in Y, X increases by 10
c. For each unit increase in X, Y increases by 10
d. None of these
Correct answer: a
39
In the least squares equation Y’ = 10 + 20X, the value of 20 indicates
a. the Y intercept.
b. the slope (so for each unit increase in X, Y’ increases by 20).
c. the slope (so for each unit increase in Y’, X increases by 20).
d. none of these.
Correct answer: b
40
In the equation Y’ = a + bX, what is Y’?
a. Slope of the line
b. Y intercept
c. Predicted value of Y, given a specific X value
d. Value of Y when X = 0
Correct answer: c
41
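According to the Central Limit Theorem, which is false?
a. As n ↑, x̄ will approach µ
b. As n ↑, the curve will approach a normal shape
c. As n ↑, the curve’s variability gets larger
d. As n ↑
Correct answer: c (as n increases, the variability of the sampling distribution gets smaller, not larger)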
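A small simulation makes the false statement easy to see: as n grows, the spread of the sample means shrinks, roughly like σ/√n. The population (exponential, mean 1, standard deviation 1), the sample sizes, and the number of trials are arbitrary choices for illustration, and `sd_of_sample_means` is a hypothetical helper name.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A skewed, non-normal population: exponential with mean 1 and SD 1.
population_sd = 1.0


def sd_of_sample_means(n, trials=2000):
    """Draw many samples of size n and return the SD of their means."""
    means = [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
             for _ in range(trials)]
    return statistics.stdev(means)


for n in [4, 16, 64]:
    simulated = sd_of_sample_means(n)
    theoretical = population_sd / n ** 0.5  # standard error: sigma / sqrt(n)
    print(n, round(simulated, 3), round(theoretical, 3))
```

Each fourfold increase in n should roughly halve the spread of the sample means.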
42
If the coefficient of determination is 0.80, what percent of variation is explained?
a. 20%
b. 90%
c. 64%
d. 80%
Correct answer: d (coefficient of determination = r²)
What percent of variation is not explained?
a. 20%
b. 90%
c. 64%
d. 80%
Correct answer: a
43
Which of the following represents a significant finding?
a. p < 0.05
b. t(3) = 0.23; n.s.
c. the observed t statistic is nearly zero
d. we do not reject the null hypothesis
Correct answer: a
44
If r = 1.00, which inference cannot be made?
a. The dependent variable can be perfectly predicted by the independent variable
b. This provides evidence that the dependent variable is caused by the independent variable
c. All of the variation in the dependent variable can be accounted for by the independent variable
d. Coefficient of determination is 100%.
Correct answer: b
45
Let’s try one: In a regression analysis, what do we call the variable used to predict the value of another variable?
a. Independent
b. Dependent
c. Correlation
d. Determination
Correct answer: a
46
What can we conclude if the coefficient of determination is 0.94?
a. r² = 0.94
b. direction of relationship is positive
c. 94% of total variation of one variable is explained by variation in the other variable.
d. Both a and c
Correct answer: d
47
Which of the following statements regarding the coefficient of correlation is true?
a. It ranges from -1.0 to +1.0
b. It measures the strength of the relationship between two variables
c. A value of 0.00 indicates two variables are not related
d. All of these
Correct answer: d
48
What does a coefficient of correlation of 0.70 imply? (r = +0.70)
a. Almost no correlation because 0.70 is close to 1.0
b. 70% of the variation in one variable is explained by the other
c. Coefficient of determination is 0.49
d. Coefficient of nondetermination is 0.30
Correct answer: c (coefficient of correlation = r; coefficient of determination = r² = 0.70² = 0.49)
49
If r = 0.65, what does the coefficient of determination equal?
a. 0.194
b. 0.423
c. 0.577
d. 0.806
Correct answer: b (0.65² ≈ 0.423)
50
If the coefficient of correlation is 0.60, what percent of variation is not explained?
a. 20%
b. 90%
c. 64%
d. 80%
Correct answer: c (0.60² = 0.36, so 100% – 36% = 64%)
51
If the coefficient of determination is 0.20, what percent of variation is not explained?
a. 20%
b. 90%
c. 64%
d. 80%
Correct answer: d (100% – 20% = 80%)
52
What is the measure that indicates how precise a prediction of Y is based on X or, conversely, how inaccurate the prediction might be?
a. Regression equation
b. Slope of the line
c. Standard error of estimate
d. Least squares principle
Correct answer: c
53
Let’s try one: Agnes compared the heights of the women’s gymnastics team and the women’s basketball team. If she doubled the number of players measured (but ended up with the same means), what effect would that have on the results?
a. the means are the same, so the t-test would yield the same results.
b. the means are the same, but the variability would increase so it would be harder to reject the null hypothesis.
c. the means are the same, but the variability would decrease so it would be easier to reject the null hypothesis.
Correct answer: c
54
Let’s try one: Agnes compared the heights of the women’s gymnastics team and the scores they got. If she doubled the number of players measured but ended up with the same correlation (r), what effect would that have on the results?
a. the r is the same, so the conclusion would be the same
b. the r is the same, but with more people, degrees of freedom (df) would go up and it would be harder to reject the null hypothesis.
c. the r is the same, but with more people, degrees of freedom (df) would go up and it would be easier to reject the null hypothesis.
Correct answer: c
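Why does more data help when r stays the same? One standard way to test r is to convert it to a t statistic with df = n − 2, t = r√(n − 2) / √(1 − r²). This is an alternative to the critical-r table used in these slides, and the r = 0.5 below is hypothetical. With r held fixed, a larger n gives a larger t, which is easier to push past the critical value.

```python
import math


def t_for_r(r, n):
    """t statistic for testing H0: population correlation = 0, df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.5  # hypothetical correlation, held fixed
for n in [10, 20]:
    print("n =", n, "df =", n - 2, "t =", round(t_for_r(r, n), 3))
```

Doubling n here raises t from about 1.63 to about 2.45, even though r itself did not change.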
55
Standard error of the estimate (line)
Which of the following is true about the standard error of estimate?
a. It is a measure of the accuracy of the prediction
b. It is based on squared vertical deviations between Y and Y’
c. It cannot be negative
d. All of these
Correct answer: d
Standard error of the estimate: a measure of the average amount of predictive error; the average amount that Y’ scores differ from Y scores; a mean of the lengths of the green lines.
56
If all the points on a scatter diagram lie on a straight line (perfect correlation), what is the standard error of estimate?
a. -1
b. +1
c. 0
d. Infinity
Correct answer: c
57
Let’s try one: [Figure: Scatterplot A, Scatterplot B, Scatterplot C]
Which of these correlations would be most likely to have the highest positive value for r?
a. Scatterplot A
b. Scatterplot B
c. Scatterplot C
d. Cannot be determined from the information given
58
Let’s try one: [Figure: Scatterplot A, Scatterplot B, Scatterplot C]
Which of these scatterplots will have the smallest “y intercept”?
a. Scatterplot A
b. Scatterplot B
c. Scatterplot C
d. Cannot be determined from the information given
59
Let’s try one: [Figure: Scatterplot A, Scatterplot B, Scatterplot C]
Which of these correlations would be most likely to represent the correlation between salary and expenses?
a. Scatterplot A
b. Scatterplot B
c. Scatterplot C
d. Cannot be determined from the information given
60
Let’s try one: Which of the following correlations would allow you the most accurate predictions?
a. r =
b. r =
c. r =
d. r =
61
Let’s try one: After duplicate correlations have been discarded and trivial correlations have been ignored, there remain
a. two correlations
b. three correlations
c. six correlations
d. nine correlations
62
Let’s try one: Which of the following conclusions cannot be made from the data in the matrix?
a. There is a significant correlation between Science and Reading
b. There is a significant correlation between Math and Reading
c. There is a significant correlation between Math and Science
63
Let’s try one: Winnie found an observed t of .04. What should she conclude? (Hint: notice that .04 is less than 1, a small observed t score.)
a. Reject the null hypothesis
b. Do not reject the null hypothesis
c. Not enough info is given
Correct answer: b
64
Thank you! Thank you for a wonderful semester, and good luck with your studies! See you at the final exam.