Psyc 235: Introduction to Statistics DON’T FORGET TO SIGN IN FOR CREDIT!


1 Psyc 235: Introduction to Statistics DON’T FORGET TO SIGN IN FOR CREDIT! http://www.psych.uiuc.edu/~jrfinley/p235/

2 End of Term Business. 3rd Graded Assessment next M & W: same procedure as the last two; if you want to sign up for a specific time, please email one of us. Extra Credit Final Requirements: 84 hours on ALEKS by 11:59 pm April 30th; you can check how many in-class hours you have during the assessment; you must schedule an extra credit final (no walk-ins) by 11:59 May 4th (email is fine); the assessment can be scheduled anytime 9-5 on May 8th or May 9th.

3 Questions? Grading scale (you master: you receive). 98% or more: A+; 92% – 97%: A; 84% – 91%: B; 75% – 83%: C; 65% – 74%: D; < 65%: F

4 Studying Relationships between Variables. When considering relationships: an independent variable (X) usually has many quantitative levels; we want to show that the dependent variable is some function of the independent variable. Correlation & Regression

5 Correlation & Regression What’s the difference? Usually, 2 observations for each of N subjects Consider these 2 examples:  running speed in a maze(Y) & number trials to reach criterion(X)  Running speed (Y) & the number of food pellets per reinforcement(X) What’s the difference here? In both, Y is a random variable. In one X is fixed and the other it is random.

6 Difference between correlation & regression. When X is a fixed variable: regression. When X is a random variable: correlation. Note: in practice they are very similar, often differentiated by the purpose of the research. Goal is to predict Y from X: regression. Goal is to know the degree of relationship: correlation.

7 Last Week: Scatter Plots X-axis: predictor variable Y-axis: criterion variable

8 Consider Data: What do you see? Unexpected? As we continue, think about why we might see this pattern What is this line? Regression Line: Y predicted on X Given a value of X, our prediction of what Y is likely to be

9 Correlation. Correlation is a measure of the degree to which the data cluster around that line. If all the data fall on the line, r = +1 (or r = -1 if the line slopes downward).

10 Correlation Coefficient: r sign: direction of relationship magnitude (number): strength of relationship -1 ≤ r ≤ 1 r=0 is no linear relationship r=-1 is “perfect” negative correlation r=1 is “perfect” positive correlation Notes:  Symmetric Measure (You can exchange X and Y and get the same value)  Measures linear relationship only
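
The properties on this slide can be checked numerically. The sketch below uses a small hypothetical data set (not from the lecture) and a hand-written helper, `pearson_r`, to illustrate that r is +1 or -1 for points on a line, is unchanged when X and Y are exchanged, and can be 0 for a perfectly curved (non-linear) relationship.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration data.
x = [-2, -1, 0, 1, 2]
line_up = [2 * v + 1 for v in x]      # points on a rising line
line_down = [-3 * v + 4 for v in x]   # points on a falling line
parabola = [v ** 2 for v in x]        # exact but non-linear relationship

r_up = pearson_r(x, line_up)          # perfect positive correlation
r_down = pearson_r(x, line_down)      # perfect negative correlation
r_curve = pearson_r(x, parabola)      # no linear relationship
r_swapped = pearson_r(line_up, x)     # X and Y exchanged

print(r_up, r_down, r_curve, r_swapped)
```

The parabola case is why a scatter plot should be inspected before interpreting r: a value near 0 rules out a linear relationship only.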

11 Example for discussion: Howell (1988) investigated the relationship between stress and health. Measures: Stress: a scale to measure frequency, perceived importance, and desirability of life events. Health: Hopkins Symptom Checklist for 57 psychological symptoms.

12 What do we always do first in a study? Look at the raw data! What do you notice? unimodal positively skewed few outliers good variability

13 Covariance. The correlation coefficient is based on covariance. Covariance: a number that reflects the degree to which two variables vary together. $cov_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$
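
As a minimal sketch of the definitional formula, the function below computes the sample covariance of a tiny hypothetical data set (the values and the name `sample_covariance` are illustrative, not from the lecture).

```python
def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)

# Hypothetical toy data: Y tends to rise with X, so the covariance is positive.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
cov_toy = sample_covariance(toy_x, toy_y)
print(cov_toy)
```

When high X values pair with high Y values, the deviation products are mostly positive and so is the covariance; when they pair with low Y values, it is negative.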

14 Think about it… If there were a positive correlation between these two variables, what would we expect to find? $\sum X = 2297$, $\sum X^2 = 67489$, $\bar{X} = 21.467$, $s_X = 13.096$; $\sum Y = 9705$, $\sum Y^2 = 923787$, $\bar{Y} = 90.701$, $s_Y = 20.266$; $\sum XY = 222576$; $N = 107$

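15 Think about it… $\sum X = 2297$, $\sum X^2 = 67489$, $\bar{X} = 21.467$, $s_X = 13.096$; $\sum Y = 9705$, $\sum Y^2 = 923787$, $\bar{Y} = 90.701$, $s_Y = 20.266$; $\sum XY = 222576$; $N = 107$. $cov_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$ or, equivalently, $cov_{XY} = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{N - 1}$. $cov_{XY} = 134.301$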
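
The computational formula can be evaluated directly from the summary sums. This is a sketch, and it assumes $\sum XY = 222576$, which is the value consistent with the reported covariance of 134.301.

```python
# Summary statistics for the stress (X) and symptoms (Y) data.
n = 107
sum_x = 2297
sum_y = 9705
sum_xy = 222576  # assumed: the value that reproduces cov = 134.301

# Computational formula: (sum XY - (sum X)(sum Y) / N) / (N - 1)
cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)
print(round(cov_xy, 3))
```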

16 Pearson Correlation Coefficient (r). The way we were talking about the covariance suggests that it might be a measure of the relationship between 2 variables. But: covariance is also a function of the standard deviations of X and Y. To resolve: $r = \frac{cov_{XY}}{s_X s_Y}$, so $r = \frac{134.301}{(13.1)(20.3)} = .506$
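
A short sketch of why dividing by the standard deviations helps: rescaling X (here by a hypothetical factor of 10, e.g. a change of units) multiplies both the covariance and $s_X$ by 10, so the covariance changes but r does not.

```python
# Values from the slides.
cov_xy = 134.301
s_x = 13.096
s_y = 20.266

r = cov_xy / (s_x * s_y)

# Hypothetical rescaling of X by 10: covariance and s_X both scale by 10.
scale = 10
cov_scaled = scale * cov_xy
r_scaled = cov_scaled / ((scale * s_x) * s_y)

print(round(r, 3), round(r_scaled, 3))
```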

17 Last week you learned: Formula: alt. formula (ALEKS): So this is the same formula for r, just in a slightly different form. r is the covariance, adjusted by the standard deviation Also Note: this r is not an unbiased estimate of the correlation coefficient in the population: so there is an adjusted formula for a small sample size (not common)

18 Looking at the data… Equation for a straight line: $\hat{Y} = bX + a$, where $\hat{Y}$ = predicted Y value, b = slope, a = intercept, X = value of the predictor variable. Our task is to solve for a & b to find the best-fitting linear function. A logical way to do this is in terms of errors of prediction… So, how far apart are the predicted values $\hat{Y}$ and the actual values Y?

19 Optimal values of a & b: $a = \bar{Y} - b\bar{X}$, $b = \frac{cov_{XY}}{s_X^2}$. In our stress data set: b = .7831, a = 73.891, so $\hat{Y} = .78X + 73.9$. So, we could find a predicted value of Y for any X… And then find the difference between predicted and actual.
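
The slope and intercept can be reproduced from the summary statistics given earlier. This is a sketch; the helper name `predict` and the example input X = 10 are illustrative choices, not part of the lecture.

```python
# Summary statistics from the stress data.
cov_xy = 134.301
s_x = 13.096
mean_x = 21.467
mean_y = 90.701

b = cov_xy / s_x ** 2        # slope: covariance over the variance of X
a = mean_y - b * mean_x      # intercept: line passes through (mean X, mean Y)

def predict(x):
    """Predicted Y (y-hat) for a given X under the fitted line."""
    return b * x + a

print(round(b, 4), round(a, 3), round(predict(10), 1))
```

Note that `predict(mean_x)` returns `mean_y`: the least-squares line always passes through the point of means.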

20 [Figure: scatter plot of Y against X with the estimated regression line] Find $(Y - \hat{Y})^2$

21 [Figure: scatter plot of Y against X with the estimated regression line] $\hat{Y}_i$ = “y hat”: predicted value of Y for $X_i$

22 ALEKS: $\hat{Y} = b_0 + b_1 X$, with $b_1$ (slope) and $b_0$ (Y intercept) computed using the correlation coefficient
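
The slide's formulas did not survive in the transcript. As a sketch, the standard identity $b_1 = r \cdot s_Y / s_X$ follows from $b = cov_{XY} / s_X^2$ and $r = cov_{XY} / (s_X s_Y)$; whether this is the exact form ALEKS presents is an assumption.

```python
# Values from the stress data.
r = 0.506
s_x = 13.096
s_y = 20.266
mean_x = 21.467
mean_y = 90.701

b1 = r * s_y / s_x           # slope from the correlation coefficient
b0 = mean_y - b1 * mean_x    # intercept, as before

print(round(b1, 3), round(b0, 2))
```

Up to rounding in r, this gives the same slope and intercept as computing b from the covariance.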

23 Standard Error of Estimate. So if we wanted to predict a value of Y from X, we could just plug it into the formula. But! We would expect our predicted Y to vary some from the actual data. So we calculate the likely error: $s^2_{Y \cdot X} = \frac{\sum (Y - \hat{Y})^2}{N - 2} = \frac{SS_{residual}}{N - 2}$

24 Data Example: $\sum (Y - \hat{Y}) = 0$, $\sum (Y - \hat{Y})^2 = 32{,}388.049$, $s^2_{Y \cdot X} = \frac{32388.049}{N - 2} = 308.458$, $s_{Y \cdot X} = 17.56$
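
A quick arithmetic check of the standard error of estimate, using the residual sum of squares and N = 107 from the slides. The quotient 32388.049 / 105 evaluates to about 308.458, whose square root is the reported 17.56.

```python
import math

n = 107
ss_residual = 32388.049      # sum of squared residuals from the slide

var_est = ss_residual / (n - 2)   # s^2 of Y given X, with N - 2 degrees of freedom
s_yx = math.sqrt(var_est)         # standard error of estimate

print(round(var_est, 3), round(s_yx, 2))
```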

25 Confidence Limits on Y. Although the standard error of estimate is useful as an overall measure of error, it is not a good estimate of the error in any single prediction. If we want to predict Y from X for a NEW member of the population, the error is given by…

26 CI on Y: $s'_{Y \cdot X} = s_{Y \cdot X} \sqrt{1 + \frac{1}{N} + \frac{(X_i - \bar{X})^2}{(N - 1) s_X^2}}$. And then we form the confidence interval as we have in the past: $CI(Y) = \hat{Y} \pm t_{\alpha/2} \, s'_{Y \cdot X}$. In our data: $s'_{Y \cdot X} = 17.7$, $CI(Y) = 81.7 \pm 1.984(17.7)$, so $46.6 < Y < 116.9$
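
The interval can be reproduced from the summary statistics. This sketch makes two assumptions that the slide does not state: the new observation has X = 10 (the value for which the fitted line gives a prediction of about 81.7), and the two-tailed critical t for 105 degrees of freedom at the .05 level is about 1.984.

```python
import math

# Summary statistics and fitted line from the slides.
n = 107
mean_x = 21.467
s_x = 13.096
s_yx = 17.56
b, a = 0.7831, 73.891

x_new = 10.0     # assumption: X value of the new observation
t_crit = 1.984   # assumption: critical t, df = 105, alpha = .05 two-tailed

y_hat = b * x_new + a
# Standard error for predicting a single new observation at x_new.
s_pred = s_yx * math.sqrt(
    1 + 1 / n + (x_new - mean_x) ** 2 / ((n - 1) * s_x ** 2)
)
lower = y_hat - t_crit * s_pred
upper = y_hat + t_crit * s_pred

print(round(s_pred, 1), round(lower, 1), round(upper, 1))
```

The last term under the square root grows as `x_new` moves away from the mean of X, so predictions far from the center of the data have wider intervals.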

27 Hypothesis Testing We know how to calculate r and b (the slope) We may want to test the null hypothesis that the corresponding population parameters are 0.

28 Testing r. We can show that when ρ (rho) = 0, r will be approximately normally distributed around 0, and $t = \frac{r \sqrt{N - 2}}{\sqrt{1 - r^2}}$ with df = N - 2. So, in our stress example: r = .506 and N = 107, so t = 6.011 and we reject $H_0$: ρ = 0
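
A sketch of the test statistic for r, using the values on the slide. The critical value 1.984 (two-tailed, .05 level, 105 degrees of freedom) is an assumption looked up from a t table, not something the slide provides.

```python
import math

r = 0.506
n = 107
t_crit = 1.984   # assumption: critical t, df = 105, alpha = .05 two-tailed

t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_null = abs(t_r) > t_crit   # True means we reject H0: rho = 0

print(round(t_r, 3), reject_null)
```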

29 Testing b. If you think about it, testing b is very similar to testing r… Right? (The slope is nonzero if X & Y are related.) b* = underlying population value of b. Standard error approximated by: $s_b = \frac{s_{Y \cdot X}}{s_X \sqrt{N - 1}}$

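30 Testing b: $t = \frac{b - b^*}{s_b} = \frac{b}{\frac{s_{Y \cdot X}}{s_X \sqrt{N - 1}}} = \frac{(b)(s_X)\sqrt{N - 1}}{s_{Y \cdot X}}$ (with b* = 0 under the null hypothesis). In our data: $t = \frac{(.7831)(13.096)\sqrt{106}}{17.56} = 6.011$. This is the same t value!!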
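
The agreement between the two tests can be checked numerically. This sketch computes the standard error of b and its t statistic from the slide's rounded values, then compares it with the t statistic for r; the two agree up to rounding in the inputs.

```python
import math

# Values from the slides.
n = 107
b = 0.7831
s_x = 13.096
s_yx = 17.56
r = 0.506

s_b = s_yx / (s_x * math.sqrt(n - 1))   # standard error of the slope
t_b = (b - 0) / s_b                     # null hypothesis: b* = 0

# t statistic for r, for comparison.
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(t_b, 2), round(t_r, 2))
```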

31 CI on b: $CI(b^*) = b \pm t_{\alpha/2} \cdot \frac{s_{Y \cdot X}}{s_X \sqrt{N - 1}}$
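
Applying this formula to the stress data gives an interval for the population slope. The slides do not report this interval, so the result below is only what the formula yields from the slide's rounded values, with the critical t of 1.984 (df = 105, .05 level, two-tailed) taken as an assumption.

```python
import math

# Values from the slides.
n = 107
b = 0.7831
s_x = 13.096
s_yx = 17.56
t_crit = 1.984   # assumption: critical t, df = 105, alpha = .05 two-tailed

s_b = s_yx / (s_x * math.sqrt(n - 1))   # standard error of the slope
lower = b - t_crit * s_b
upper = b + t_crit * s_b

print(round(lower, 3), round(upper, 3))
```

Because the interval excludes 0, it leads to the same conclusion as the t test on b.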

32 ICES Forms

