Published by Primrose Bates. Modified over 9 years ago.
2
Correlation Coefficients
Pearson’s product-moment correlation coefficient: interval or ratio data only.
What about ordinal data?
3
Spearman’s Rank Correlation Coefficient
$r_s = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n}$
4
Spearman’s Rank Correlation Coefficient: Example
http://www.mnstate.edu/wasson/ed602spearcorr.htm
6
A Significance Test for $r_s$
$SE_{r_s} = \dfrac{1}{\sqrt{n - 1}}$
$t_{test} = \dfrac{r_s}{SE_{r_s}} = r_s \sqrt{n - 1}$
$df = n - 1$
7
Spearman’s Rank Correlation Coefficient: Example
http://www.mnstate.edu/wasson/ed602spearcorr.htm
8
Pearson’s r - Assumptions
1. Interval or ratio scale data
2. Observations selected randomly
3. Linear relationship
4. Joint bivariate normal distribution
Normality can be checked in S-Plus (qqnorm).
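The normality check mentioned on this slide can be approximated outside S-Plus. The sketch below is a rough Python analogue of a normal Q-Q plot, not the S-Plus implementation: it pairs each sorted observation with a standard normal quantile. The plotting positions (i - 0.5) / n and the function name `normal_qq_points` are assumptions made for illustration, and the TVDI values are the ones tabulated later in the deck.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each sorted observation with a standard normal quantile."""
    n = len(sample)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are an assumption; qqnorm may use others.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))


# TVDI (x) values from the worked example in this deck.
tvdi = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]

points = normal_qq_points(tvdi)
for quantile, value in points:
    print(f"{quantile:6.3f}  {value:.3f}")
```

If the printed pairs fall close to a straight line when plotted, the variable looks roughly normal. This only checks each variable's marginal distribution, which is necessary but not sufficient for joint bivariate normality.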
9
Spearman’s Rank Correlation Coefficient
Ordinal data: already in a ranked form.
Interval or ratio data: convert it to rankings.
10
Spearman’s Rank Correlation Coefficient
TVDI (x)   Rank (x)   Theta (y)   Rank (y)   Difference (d_i)
0.274      1          0.414       9          -8
0.542      7          0.359       7          0
0.419      4          0.396       8          -4
0.286      2          0.458       10         -8
0.374      3          0.350       5          -2
0.489      5          0.357       6          -1
0.623      8          0.255       4          4
0.506      6          0.189       3          3
0.768      10         0.171       2          8
0.725      9          0.119       1          8
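The ranking step and the $r_s$ formula can be traced on this data with a short Python sketch. The helper names `ranks` and `spearman_rs` are illustrative, and the ranking helper assumes there are no tied values, which holds for this example.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n^3 - n), valid when there are no ties."""
    n = len(x)
    d = [rx - ry for rx, ry in zip(ranks(x), ranks(y))]
    return 1 - 6 * sum(di ** 2 for di in d) / (n ** 3 - n)


tvdi = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
theta = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]

differences = [rx - ry for rx, ry in zip(ranks(tvdi), ranks(theta))]
r_s = spearman_rs(tvdi, theta)

print(differences)    # [-8, 0, -4, -8, -2, -1, 4, 3, 8, 8]
print(round(r_s, 3))  # -0.83
```

The squared differences sum to 302, so $r_s = 1 - 1812/990 \approx -0.83$: a strong negative rank correlation between TVDI and Theta.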
11
A Significance Test for $r_s$
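As a sketch of how the test from slide 6 (SE = 1 / sqrt(n - 1), t = r_s / SE, df = n - 1) applies to the TVDI and Theta ranks, the Python below computes the test statistic. The function name `spearman_t` is illustrative. The critical value 2.262 is the two-tailed 5% value of t for 9 degrees of freedom from standard tables; choosing a two-tailed 5% test is an assumption, since the slides do not state a significance level.

```python
import math


def spearman_t(r_s, n):
    """Return (SE, t, df) using SE = 1 / sqrt(n - 1) and t = r_s / SE."""
    se = 1 / math.sqrt(n - 1)
    t = r_s / se  # same as r_s * sqrt(n - 1)
    return se, t, n - 1


# Ranks of TVDI (x) and Theta (y) from the worked example.
rank_x = [1, 7, 4, 2, 3, 5, 8, 6, 10, 9]
rank_y = [9, 7, 8, 10, 5, 6, 4, 3, 2, 1]

n = len(rank_x)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
r_s = 1 - 6 * sum_d2 / (n ** 3 - n)

se, t, df = spearman_t(r_s, n)

# Assumed two-tailed 5% critical value of t for df = 9 (standard tables).
T_CRIT_05_DF9 = 2.262
significant = abs(t) > T_CRIT_05_DF9

print(round(t, 2), df, significant)  # -2.49 9 True
```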
12
S-Plus http://www.mnstate.edu/wasson/ed602spearcorr.htm
13
TVDI (x)   Theta (y)
0.274      0.414
0.542      0.359
0.419      0.396
0.286      0.458
0.374      0.350
0.489      0.357
0.623      0.255
0.506      0.189
0.768      0.171
0.725      0.119
14
Correlation and Regression
Correlation: direction & strength.
We might wish to go a little further: rate of change, predictability.
15
Two Sorts of Bivariate Relationships
Deterministic: perfect knowledge.
Probabilistic: an estimate, not with absolute accuracy (or certainty).
16
A Deterministic Relationship
Travel at a constant speed: time spent driving vs. distance traveled is deterministic.
$s = s_0 + vt$
$s$: distance traveled; $s_0$: initial distance; $v$: speed; $t$: time traveled
Figure: distance ($s$) against time ($t$), with slope $v$ and intercept $s_0$.
Truly deterministic relationships are rare.
17
A Probabilistic Relationship
More often the relationship is probabilistic, e.g., ages vs. heights (2 – 20 yrs).
Figure: height (meters) against age (years).
A good relationship, but with unpredictability or error.
18
Sampling and Regression
Our expectation is less than perfect:
Collecting data involves measurement errors (e.g., height).
Other factors are not accounted for in the model (e.g., plant growth vs. T).
19
Simple vs. Multiple Regression
Simple linear regression: $y$ as a function of $x$
Multiple linear regression: $y$ as a function of $x_1, x_2, \ldots, x_n$
20
Simple Linear Regression
Model: $y = a + bx + e$
$x$: independent variable
$y$: dependent variable
$b$: slope
$a$: intercept
$e$: error term
Figure: $y$ (dependent) against $x$ (independent), showing the slope $b$, the intercept $a$, and the error.
21
Fitting a Line to a Set of Points
Scatterplot: fitting a line.
Least squares method: minimize the error term $e$.
Figure: scatterplot of $y$ (dependent) against $x$ (independent).
22
Sampling and Regression
Sampled data model: $y = a + bx + e$
This is an attempt to estimate a “true” regression line: $y = \alpha + \beta x + \varepsilon$
Multiple samples give several similar regression lines, each an estimate of the population regression line.
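The idea that repeated samples yield several similar regression lines can be illustrated with a small simulation. Everything numeric below is hypothetical and not from the slides: the population intercept and slope, the noise level, the sample size, and the random seed are chosen only to make the sketch reproducible.

```python
import random

# Hypothetical population line and noise level (not from the slides).
TRUE_ALPHA, TRUE_BETA, NOISE_SD = 1.0, 2.0, 1.0


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


def draw_sample(rng, n=10):
    """Draw one synthetic sample around the hypothetical population line."""
    x = [float(i) for i in range(n)]
    y = [TRUE_ALPHA + TRUE_BETA * xi + rng.gauss(0.0, NOISE_SD) for xi in x]
    return x, y


rng = random.Random(0)  # fixed seed so the run is repeatable
fits = [fit_line(*draw_sample(rng)) for _ in range(5)]

for a, b in fits:
    print(f"a = {a:.2f}, b = {b:.2f}")
```

Each printed pair is one sample's estimate of the same underlying line. The estimates differ from sample to sample but cluster around the hypothetical population values.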
23
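Least Squares Method
Minimize the error term $e$.
The line of best fit: $\hat{y} = a + bx$
Figure: an observed $y$, its fitted value $\hat{y}$ on the line $\hat{y} = a + bx$, and the vertical distance $(y - \hat{y})$ between them.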
24
Estimates and Residuals
Errors (residuals): $e = y - \hat{y}$
Underestimate: $y > \hat{y}$, so $e > 0$.
Overestimate: $y < \hat{y}$, so $e < 0$.
25
Minimizing the Error Term
Errors (residuals): $e = (y - \hat{y})$
Overall error: simply summing these error terms gives 0, because positive and negative errors cancel.
Square the differences and then sum them up to create a useful estimate:
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
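A short Python sketch shows why the plain sum of errors is uninformative while the SSE is not. It uses the TVDI and Theta data from the earlier example and a deliberately poor candidate line: a horizontal line through the mean of y. That line is an illustrative choice, not one from the slides, and the helper names are hypothetical.

```python
def residuals(x, y, a, b):
    """Errors e_i = y_i - (a + b * x_i) for a candidate line."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def sse(x, y, a, b):
    """Sum of squared errors for a candidate line."""
    return sum(e ** 2 for e in residuals(x, y, a, b))


tvdi = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
theta = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]

# Illustrative candidate: a flat line at the mean of y (slope 0).
mean_y = sum(theta) / len(theta)
flat_residuals = residuals(tvdi, theta, mean_y, 0.0)

sum_of_errors = sum(flat_residuals)     # cancels to (numerically) zero
sse_flat = sse(tvdi, theta, mean_y, 0.0)  # stays clearly positive

print(abs(sum_of_errors) < 1e-12, sse_flat > 0)
```

The flat line ignores x entirely, yet its errors still sum to zero. Only the squared version registers how far the points lie from the line.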
26
Minimizing the SSE
$\min_{a,b} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min_{a,b} \sum_{i=1}^{n} (y_i - a - b x_i)^2$
27
Finding Regression Coefficients
Least squares method:
$b = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$a = \bar{y} - b\bar{x}$
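The two coefficient formulas translate directly into code. The sketch below applies them to the TVDI and Theta data from the earlier example, then checks that nudging either coefficient does not lower the sum of squared errors. The function names and the nudge size of 0.01 are illustrative assumptions.

```python
def least_squares(x, y):
    """Return (a, b) with b = S_xy / S_xx and a = mean(y) - b * mean(x)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def sse(x, y, a, b):
    """Sum of squared errors for the line y = a + b*x."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


tvdi = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
theta = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]

a, b = least_squares(tvdi, theta)
best = sse(tvdi, theta, a, b)

# Nudging either coefficient away from the fitted value should not lower the SSE.
nudges = [(0.01, 0.0), (-0.01, 0.0), (0.0, 0.01), (0.0, -0.01)]
nudged = [sse(tvdi, theta, a + da, b + db) for da, db in nudges]

print(f"a = {a:.3f}, b = {b:.3f}")
print(all(best <= s for s in nudged))
```

The fitted slope is negative, matching the negative rank correlation found for the same data.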
28
Interpreting Slope (b)
Slope of the line ($b$): the change in $y$ due to a unit change in $x$.
$b > 0$: $y$ increases as $x$ increases.
$b < 0$: $y$ decreases as $x$ increases.
29
Regression Slope and Correlation
$r = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$
$b = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$b = r \dfrac{s_y}{s_x}$
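The identity between the slope and the correlation coefficient can be checked numerically. The sketch below computes r and b separately from their definitions on the TVDI and Theta data used earlier, then compares b with r times s_y / s_x. The function names are illustrative, and `statistics.stdev` is used because it is the sample standard deviation with the n - 1 divisor.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """r = sum((x - x_bar) * (y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * stdev(x) * stdev(y))


def slope(x, y):
    """b = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = mean(x), mean(y)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / sum((xi - x_bar) ** 2 for xi in x)


tvdi = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
theta = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]

r = pearson_r(tvdi, theta)
b = slope(tvdi, theta)
b_from_r = r * stdev(theta) / stdev(tvdi)

print(abs(b - b_from_r) < 1e-12)
```

Because s_x and s_y are both positive, b and r always share the same sign. The slope is the correlation rescaled by the ratio of the two spreads.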