Download presentation
Presentation is loading. Please wait.
1
Welcome to Week 05 College Statistics http://media.dcnews.ro/image/201109/w670/statistics.jpg
2
Two Variables So far we have used only one variable measured on each of our sample subjects What if we had more than one?
3
Two Variables Subject | Height | Weight 1 5’8’’ 170 2 5’3’’ 120 3 6’ 200 4 5’7’’ 250 5 6’2’’ 195
4
Two Variables With two variables measured on each subject, the measurements are “paired” – linked to each other
5
If you calculate means or standard deviations for one variable then the other, the pairing is lost, and you lose information (a bad thing…) Two Variables
6
Relationships Using the pairing, you can determine if there is a relationship between the two variables
8
Relationships The relationship can be seen by graphing the variables in a scatter graph Our brains are programmed to see visual relationships
9
Correlation There are all sorts of relationships, but we define a “correlation” to be a linear relationship
10
a linear relationship between two variables (x,y) Correlation
11
Francis Galton
12
Correlation Karl Pearson’s Correlation Coefficient: r
13
Correlation Correlation coefficient “r”: Measures the strength of the linear relationship between two variables (x,y)
14
Correlation coefficient “r”: Measured by the closeness of the data to a straight line of “best fit” Correlation
15
In real life, you practically never get all of your data points on the line of best fit How close is close enough?
16
Correlation Strong Weak relationship
17
Correlation So… a correlation is actually a measure of
18
Correlation So… a correlation is actually a measure of VARIABILITY!
19
Which has the strongest linear relationship? Correlation PROJECT QUESTION
20
Relationships https://images.sciencedaily.com/2015/05/150513135011_1_900x600.jpg Can be positive…
21
Relationships http://cdn2.momjunction.com/wp-content/uploads/2016/01/Relationship-With-Kids-After-Divorce.jpg
22
Correlation Positive Correlation: as one variable increases, the other also increases
23
Correlation Negative correlation: as one variable increases, the other decreases
24
Correlation Zero correlation: no LINEAR relationship
25
Correlation Zero correlation: there may be a strong nonlinear relationship
26
Correlation Zero correlation: If it is a horizontal line
27
Positive, negative, or zero? Correlation PROJECT QUESTION
28
Correlation Correlation coefficients range from -1.0 to +1.0 r=-1.0 r=-0.9 r=-0.5 r=0.0 r=0.5 r=0.9 r=1.0
29
Correlation Correlation coefficients can’t be bigger than 1 or less than -1 -1 ≤ corr ≤ 1
30
Greeky Stuff The sample correlation coefficient is called “r” The population coefficient is called “ρ ” Pronounced “rho”
31
Which has the highest positive correlation? The highest negative correlation? Correlation PROJECT QUESTION
32
Correlation If you square the correlation coefficient (R 2 or RSQ) the coefficient tells how well the (x,y) values are “fitted” by the line of best fit
33
Correlation R 2 tells you in % how well the trend line fits the data
34
Is it a good fit? Correlation PROJECT QUESTION
35
R 2 = 0.988 means 98.8% of the variability in the data is “explained” by the fit line Correlation
36
Is it a good fit? Correlation PROJECT QUESTION
37
Is it a good fit? Correlation PROJECT QUESTION
38
Is it a good fit? Correlation PROJECT QUESTION
39
Is it a good fit? Correlation PROJECT QUESTION
40
Questions?
41
How to Lie with Statistics #5 Just because two variables have a high correlation coefficient doesn’t mean one causes the other
42
They could both be caused by a third variable Or… it could just be a coincidence How to Lie with Statistics #5
44
How to Lie with Statistics Sleep studies
45
How to Lie with Statistics How would you “prove” that getting a certain amount of sleep causes you to live longer?
46
Correlation vs Causation Causation is hard to prove In research, causation can only be shown by a carefully controlled experiment
47
Correlation vs Causation So you can’t show causation by observation or by using a survey
48
Correlation vs Causation The research must be carefully designed so that the suspected cause can be the only cause of the result
49
Correlation vs Causation Just because a scientist speaks very persuasively about the results of their experiment does not mean he/she has proven causation
50
Correlation vs Causation You have to be skeptical!
51
Correlation vs Causation Spanish influenza video
52
Who “proved” causation? CAUSATION IN-CLASS PROBLEMS
53
Questions?
54
Causation If you know that one variable DOES cause the other, put the variable that is the “causal” or “explanatory” variable on the horizontal “x” axis
55
Causation The causal or explanatory or predictor variable is called the “independent” variable Changing its value changes the value of the y variable
56
Causation The variable affected by the causal variable (the response variable) is called the “dependent” variable Its value depends on the value of x
57
CAUSATION IN-CLASS PROBLEMS Which graph has the correct placement for the causative variable?
58
Questions?
59
Line of Best Fit The “line of best fit” is created by minimizing the total distance of all the points to the line (deviations)
60
Regression The line of best fit is called the “regression” line
61
Regression Because it is a line, it has an equation: y = b + mx m = slope b = y-intercept
62
Regression The slope “m” and the correlation coefficient “r” will both have the same sign
63
Regression R 2 tells how closely the regression line “fits” the data – “goodness of fit”
64
Regression As you can imagine, the calculations for correlation and the regression line are scary
65
Hooray for Excel! Regression
66
Francis Galton
68
Questions?
69
Regression in Excel Edwin Hubble gathered and analyzed data from astronomical objects He used regression to show that the universe is expanding http://www.thefamouspeople.com/profiles/images/edwin-powell-hubble-1.jpg
70
Regression in Excel Let’s take a look!
72
Regression in Excel What do we do first?
73
Regression in Excel What do we do first? GRAPH THE DATA!
74
Regression in Excel
75
Does it look like a straight line would fit the data well?
76
Regression in Excel Now we’re going to go to: Data Data Analysis Regression
77
Regression in Excel They want “y” first (I HATE this…)
78
Regression in Excel Let’s use “distance” for “x” and “velocity” for “y”
79
Regression in Excel Eeek! What’s all this????
80
Regression in Excel Here’s the RSQ. What is the %?
81
Regression in Excel For the trend line, you need:
82
Regression in Excel This (believe it or not) is the equation of the line of best fit!
83
Regression in Excel Line of best fit: y = mx + b
84
Regression in Excel Our equation is: Vel = 505.8409 x Dist + -48.3429
85
Regression in Excel Highlight and copy:
86
Regression in Excel Paste on the “Hubble” page
87
Regression in Excel Add a new column heading: Trend
88
We’re going to calculate our line:
89
Copy it down…
90
Oops! That doesn’t look right!
91
The reference cells are changing for each row We need to make those constant
92
Go back to the first entry Add a $ before the row numbers you want to keep constant
93
Now, copy it down!
94
Much better!
95
Regression in Excel Create a new graph:
96
Regression in Excel Make it purty! To make the trend line a line: change Marker Options to “none” change line color to “solid line”
97
TAH-DAH!
98
What Does It Say?
99
Regression Note: some people use the symbol “ŷ” for the trend data corresponding to the “y” observation data
100
Regression in Excel Summary of Regression Analysis: Data/Data Analysis/Regression Enter “y” first Copy the first two coefficients in the bottom table Trend line is: =Coeff*Data+Intercept Make a graph Change the trend dots into a line
101
Questions?
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.