Presentation is loading. Please wait.

Presentation is loading. Please wait.

Welcome to Week 05 College Statistics

Similar presentations


Presentation on theme: "Welcome to Week 05 College Statistics"— Presentation transcript:

1 Welcome to Week 05 College Statistics http://media.dcnews.ro/image/201109/w670/statistics.jpg

2 Two Variables So far we have used only one variable measured on each of our sample subjects What if we had more than one?

3 Two Variables Subject | Height | Weight 1 5’8’’ 170 2 5’3’’ 120 3 6’ 200 4 5’7’’ 250 5 6’2’’ 195

4 Two Variables With two variables measured on each subject, the measurements are “paired” – linked to each other

5 If you calculate means or standard deviations for one variable then the other, the pairing is lost, and you lose information (a bad thing…) Two Variables

6 Relationships Using the pairing, you can determine if there is a relationship between the two variables

7

8 Relationships The relationship can be seen by graphing the variables in a scatter graph Our brains are programmed to see visual relationships

9 Correlation There are all sorts of relationships, but we define a “correlation” to be a linear relationship

10 a linear relationship between two variables (x,y) Correlation

11 Francis Galton

12 Correlation Karl Pearson’s Correlation Coefficient: r

13 Correlation Correlation coefficient “r”: Measures the strength of the linear relationship between two variables (x,y)

14 Correlation coefficient “r”: Measured by the closeness of the data to a straight line of “best fit” Correlation

15 In real life, you practically never get all of your data points on the line of best fit How close is close enough?

16 Correlation Strong Weak relationship

17 Correlation So… a correlation is actually a measure of

18 Correlation So… a correlation is actually a measure of VARIABILITY!

19 Which has the strongest linear relationship? Correlation PROJECT QUESTION

20 Relationships https://images.sciencedaily.com/2015/05/150513135011_1_900x600.jpg Can be positive…

21 Relationships http://cdn2.momjunction.com/wp-content/uploads/2016/01/Relationship-With-Kids-After-Divorce.jpg

22 Correlation Positive Correlation: as one variable increases, the other also increases

23 Correlation Negative correlation: as one variable increases, the other decreases

24 Correlation Zero correlation: no LINEAR relationship

25 Correlation Zero correlation: there may be a strong nonlinear relationship

26 Correlation Zero correlation: If it is a horizontal line

27 Positive, negative, or zero? Correlation PROJECT QUESTION

28 Correlation Correlation coefficients range from -1.0 to +1.0 r=-1.0 r=-0.9 r=-0.5 r=0.0 r=0.5 r=0.9 r=1.0

29 Correlation Correlation coefficients can’t be bigger than 1 or less than -1 -1 ≤ corr ≤ 1

30 Greeky Stuff The sample correlation coefficient is called “r” The population coefficient is called “ρ ” Pronounced “rho”

31 Which has the highest positive correlation? The highest negative correlation? Correlation PROJECT QUESTION

32 Correlation If you square the correlation coefficient (R 2 or RSQ) the coefficient tells how well the (x,y) values are “fitted” by the line of best fit

33 Correlation R 2 tells you in % how well the trend line fits the data

34 Is it a good fit? Correlation PROJECT QUESTION

35 R 2 = 0.988 means 98.8% of the variability in the data is “explained” by the fit line Correlation

36 Is it a good fit? Correlation PROJECT QUESTION

37 Is it a good fit? Correlation PROJECT QUESTION

38 Is it a good fit? Correlation PROJECT QUESTION

39 Is it a good fit? Correlation PROJECT QUESTION

40 Questions?

41 How to Lie with Statistics #5 Just because two variables have a high correlation coefficient doesn’t mean one causes the other

42 They could both be caused by a third variable Or… it could just be a coincidence How to Lie with Statistics #5

43

44 How to Lie with Statistics Sleep studies

45 How to Lie with Statistics How would you “prove” that getting a certain amount of sleep causes you to live longer?

46 Correlation vs Causation Causation is hard to prove In research, causation can only be shown by a carefully controlled experiment

47 Correlation vs Causation So you can’t show causation by observation or by using a survey

48 Correlation vs Causation The research must be carefully designed so that the suspected cause can be the only cause of the result

49 Correlation vs Causation Just because a scientist speaks very persuasively about the results of their experiment does not mean he/she has proven causation

50 Correlation vs Causation You have to be skeptical!

51 Correlation vs Causation Spanish influenza video

52 Who “proved” causation? CAUSATION IN-CLASS PROBLEMS

53 Questions?

54 Causation If you know that one variable DOES cause the other, put the variable that is the “causal” or “explanatory” variable on the horizontal “x” axis

55 Causation The causal or explanatory or predictor variable is called the “independent” variable Changing its value changes the value of the y variable

56 Causation The variable affected by the causal variable (the response variable) is called the “dependent” variable Its value depends on the value of x

57 CAUSATION IN-CLASS PROBLEMS Which graph has the correct placement for the causative variable?

58 Questions?

59 Line of Best Fit The “line of best fit” is created by minimizing the total distance of all the points to the line (deviations)

60 Regression The line of best fit is called the “regression” line

61 Regression Because it is a line, it has an equation: y = b + mx m = slope b = y-intercept

62 Regression The slope “m” and the correlation coefficient “r” will both have the same sign

63 Regression R 2 tells how closely the regression line “fits” the data – “goodness of fit”

64 Regression As you can imagine, the calculations for correlation and the regression line are scary

65 Hooray for Excel! Regression

66 Francis Galton

67

68 Questions?

69 Regression in Excel Edwin Hubble gathered and analyzed data from astronomical objects He used regression to show that the universe is expanding http://www.thefamouspeople.com/profiles/images/edwin-powell-hubble-1.jpg

70 Regression in Excel Let’s take a look!

71

72 Regression in Excel What do we do first?

73 Regression in Excel What do we do first? GRAPH THE DATA!

74 Regression in Excel

75 Does it look like a straight line would fit the data well?

76 Regression in Excel Now we’re going to go to: Data Data Analysis Regression

77 Regression in Excel They want “y” first (I HATE this…)

78 Regression in Excel Let’s use “distance” for “x” and “velocity” for “y”

79 Regression in Excel Eeek! What’s all this????

80 Regression in Excel Here’s the RSQ. What is the %?

81 Regression in Excel For the trend line, you need:

82 Regression in Excel This (believe it or not) is the equation of the line of best fit!

83 Regression in Excel Line of best fit: y = mx + b

84 Regression in Excel Our equation is: Vel = 505.8409 x Dist + -48.3429

85 Regression in Excel Highlight and copy:

86 Regression in Excel Paste on the “Hubble” page

87 Regression in Excel Add a new column heading: Trend

88 We’re going to calculate our line:

89 Copy it down…

90 Oops! That doesn’t look right!

91 The reference cells are changing for each row We need to make those constant

92 Go back to the first entry Add a $ before the row numbers you want to keep constant

93 Now, copy it down!

94 Much better!

95 Regression in Excel Create a new graph:

96 Regression in Excel Make it purty! To make the trend line a line: change Marker Options to “none” change line color to “solid line”

97 TAH-DAH!

98 What Does It Say?

99 Regression Note: some people use the symbol “ŷ” for the trend data corresponding to the “y” observation data

100 Regression in Excel Summary of Regression Analysis: Data/Data Analysis/Regression Enter “y” first Copy the first two coefficients in the bottom table Trend line is: =Coeff*Data+Intercept Make a graph Change the trend dots into a line

101 Questions?


Download ppt "Welcome to Week 05 College Statistics"

Similar presentations


Ads by Google