Published by Dylan Shanon Gray. Modified over 9 years ago.
1
Foundation Statistics Copyright Douglas L. Dean, 2015
2
Types of Variables
1. Input variables
   - Might explain the output variable
2. Output variable
   - The variable you want to explain or predict
3
Outline
1. One-way ANOVA: does a category make a significant difference?
2. Bivariate statistics: are numeric variables related to each other?
3. Plotting bivariate relationships and lines of best fit
4. Simple linear regression
4
One-way ANOVA
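One-way ANOVA asks whether group means differ by more than the spread within groups would suggest. The sketch below computes only the F statistic for three made-up groups. The function name and the data are illustrative assumptions, not taken from the slides, and turning F into a p-value needs an F distribution that the Python standard library does not provide.

```python
def one_way_anova_f(groups):
    """Return the F statistic: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical outcome values for three categories.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(round(f_stat, 2))
```

A large F means the category explains much more variation than is left unexplained inside the groups.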
5
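Outlier Thresholds
Outlier thresholds lie 1.5 x the interquartile range (IQR) beyond the quartiles: below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR.
The interquartile range is the distance between the first and third quartiles (IQR = Q3 - Q1).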
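As a rough illustration of the 1.5 x IQR rule, the sketch below computes the two thresholds for a small made-up sample. The function name, the data, and the choice of the "inclusive" quartile method are assumptions. Statistical packages use slightly different quartile conventions, so thresholds can differ a little between tools.

```python
from statistics import quantiles


def outlier_thresholds(values, k=1.5):
    """Return (lower, upper) thresholds: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up sample
low, high = outlier_thresholds(data)

# Values outside the thresholds are the points a box-whisker plot draws individually.
outliers = [v for v in data if v < low or v > high]
print(low, high, outliers)
```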
6
Annotated Box-whisker Plot
7
Bivariate descriptive stats
- Slope (m)
- Correlation (r)
- Coefficient of determination (R²)
8
Slope
Every non-vertical straight line can be represented by an equation: y = mx + b
The slope m describes both the direction and the steepness of the line; b is the y-intercept.
9
Slope Tree
10
How slope is calculated
m = (y2 - y1) / (x2 - x1), the rise over the run between two points (x1, y1) and (x2, y2)
11
A positive slope example
Points: (0, 1) and (3, 3)
m = (3 - 1) / (3 - 0) = 2/3
12
Another example
Points: (0, 1) and (3, 4)
m = (4 - 1) / (3 - 0) = 1
13
Negative slope example
Points: (3, 0) and (1, 4)
m = (4 - 0) / (1 - 3) = -2
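The three example slides each give two points, so the rise-over-run calculation can be checked with a few lines of code. This is a minimal sketch; the function name `slope` is an illustrative choice.

```python
def slope(p1, p2):
    """Slope of the line through two (x, y) points: rise over run."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)


positive = slope((0, 1), (3, 3))  # rises 2 over a run of 3
steeper = slope((0, 1), (3, 4))   # rises 3 over a run of 3
negative = slope((3, 0), (1, 4))  # rises 4 while x falls by 2
print(positive, steeper, negative)
```

The order of the two points does not matter, because swapping them flips the sign of both the rise and the run.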
14
Correlation Coefficient
The Pearson correlation coefficient (r) measures the strength of the linear relationship between two numeric variables.
The value of r ranges from -1 to 1.
15
Correlation Coefficient
Which correlation is stronger: r = -.80 or r = .80?
Neither. They are the same strength; the sign gives the direction and the absolute value gives the strength.
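To see that the sign of r carries direction rather than strength, the sketch below computes Pearson's r for a small made-up data set and for the same data with y mirrored. The function name and the numbers are assumptions for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys_up = [2, 1, 4, 3, 6]            # generally rising with x
ys_down = [-y for y in ys_up]      # same pattern, mirrored to fall with x

r_up = pearson_r(xs, ys_up)
r_down = pearson_r(xs, ys_down)
print(round(r_up, 3), round(r_down, 3))
```

The two values differ only in sign, so the points cluster around their lines equally tightly in both cases.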
16
Examples of Perfect Correlation
17
Examples of Strong Correlation
18
Examples of Weak Correlation
19
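No Correlation
Ways to get r = 0 (or no measurable linear correlation):
- Pure randomness (r close to 0)
- Perfectly horizontal straight line (y never changes, so strictly r is undefined rather than 0)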
20
Correlation
- Often the first thing we check to see whether a relationship between variables may exist
- A good place to start, not to finish
- Is a standardized value
  - Units of measure are factored out
  - One variable can be on a small scale and the related variable on a large scale; whatever the scales, r always falls between -1 and 1
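The point that units are factored out can be checked directly: rescaling either variable should leave r unchanged. The sketch below uses made-up heights and weights in two sets of units. The data and names are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical measurements: small-scale units first.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [52.0, 61.0, 66.0, 80.0, 85.0]

# The same measurements on much larger scales.
heights_cm = [h * 100 for h in heights_m]
weights_g = [w * 1000 for w in weights_kg]

r_small = pearson_r(heights_m, weights_kg)
r_large = pearson_r(heights_cm, weights_g)
print(round(r_small, 4), round(r_large, 4))
```

Both calls give the same r up to floating-point rounding, because the unit conversions cancel out of the ratio.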
21
Limitations of Correlation
- Correlation measures linear association, not causality
- Correlation is only one important measure of a possible linear relationship
- Correlations lack statistical control for other possibly related variables
22
Correlation ≠ Causality
- The Japanese eat very little fat and drink little red wine, and suffer fewer heart attacks than the British or Americans.
- The French eat a lot of fat and drink a lot of red wine, and suffer fewer heart attacks than the British or Americans.
- The Germans drink a lot of beer and eat a lot of sausages and fat, and suffer fewer heart attacks than the British or Americans.
Conclusion: Eat and drink what you like. Apparently it is speaking English that kills you.
23
24
Ambiguities in causality abound…
When X and Z are correlated, the direction of causality might be
X → Z
or
Z → X
25
Ambiguities in causality abound…
When a correlation exists between Y and Z, causality might take any of several forms once a third variable X is involved.
[Figure: four causal diagrams relating X, Y, and Z]
26
Statistical control
The purpose of statistical control is to find the degree of association between two variables after removing the effects of other variables.
- Correlation lacks statistical control
- Many variables may exert influence on the variable being predicted
- You cannot control for multiple influences with r alone
- Some forms of statistics and data mining methods give you statistical control
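The slides do not name a specific method for statistical control. One common option is partial correlation, sketched below on simulated data. Everything here is an assumption for illustration: the variable names, the noise levels, the fixed seed, and the choice of partial correlation itself. In the simulation, x and y are both driven by a hidden variable z and have no direct link to each other.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]      # hidden common cause
x = [zi + rng.gauss(0, 0.5) for zi in z]     # x depends only on z plus noise
y = [zi + rng.gauss(0, 0.5) for zi in z]     # y depends only on z plus noise

r_xy = pearson_r(x, y)
r_controlled = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))
print(round(r_xy, 2), round(r_controlled, 2))
```

The raw correlation between x and y is strong, yet it shrinks toward zero once z is controlled for, which is the kind of adjustment r alone cannot make.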
27
Sign (+ or -) of Basic statistics
- Slope and r can be positive or negative
  - Slope and r have the same sign
    - If one is positive, so is the other
    - If one is negative, so is the other
- R² is never negative (it ranges from 0 to 1)
28
R-Squared (R²)
R² is the coefficient of determination: the proportion of the variance in Y attributable to the variance in X (if there is only one X), or to the set of X variables if more than one predictor is included in the model.
29
Calculation of R²
If only one predictor: R² = r²
- Example: r = .80, so R² = .80² = .64
With multiple input variables:
- The math to calculate R² is more complex
- That math is beyond the scope of this course
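For the one-predictor case, the relationship between slope, r, and R² can be checked numerically. The sketch below fits a least-squares line to made-up data and computes R² from the residuals rather than by squaring r, so the two can be compared. The function name and the data are illustrative assumptions.

```python
from math import sqrt


def simple_regression(xs, ys):
    """Least-squares line plus r and R² for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)

    # R² as the share of the variation in y that the fitted line accounts for.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r, r_squared


xs = [1, 2, 3, 4, 5]
ys = [9, 7, 8, 4, 3]  # made-up data that generally falls as x rises
slope, intercept, r, r_squared = simple_regression(xs, ys)
print(slope, round(r, 3), round(r_squared, 3), round(r ** 2, 3))
```

Here the slope and r are both negative, while R² is positive and matches r squared.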
30
Why we need more than just slope
If we have the slope, why do we need r and R²?
- Relationships are rarely perfectly linear in their ability to predict
- Slopes of "best-fit" lines do not give us a measure of variability in how x and y relate to each other
Correlation (r) measures the strength of the linear association, that is, how tightly the points cluster around the line.
R² measures the proportion of variance in y attributable to all of the input variables included in the model.
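A small sketch makes the point concrete: two data sets can share exactly the same best-fit slope while differing sharply in r. The data below are made up, and the scatter pattern was chosen by hand so that it leaves the fitted slope untouched. All names are illustrative assumptions.

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Least-squares slope and Pearson r for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
pattern = [1, -2, 2, -2, 1]  # sums to zero and is uncorrelated with xs

ys_tight = [2 * x for x in xs]                            # points exactly on y = 2x
ys_loose = [2 * x + 3 * e for x, e in zip(xs, pattern)]   # same trend, wide scatter

slope_tight, r_tight = slope_and_r(xs, ys_tight)
slope_loose, r_loose = slope_and_r(xs, ys_loose)
print(slope_tight, r_tight)
print(slope_loose, round(r_loose, 3))
```

Both lines have a slope of 2, but only the first predicts y reliably, which is the information r and R² add.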