Correlation & Prediction REVIEW
Correlation
- Bivariate
- Direct/Indirect
- Cause/Effect
- Strength of relationships (is + stronger than negative?)
- Coefficient of determination (r²); predicts what?
- Linear vs curvilinear relationships
Inferential Statistics
Used to infer population characteristics from sample data
Table 5-2 Variable Classification
Independent | Dependent
Presumed cause | Presumed effect
The antecedent | The consequence
Manipulated/measured by researcher | Outcome (measured)
Predicted from | Predicted to
Predictor | Criterion
X | Y
Common Statistical Tests
Chi-Square: determines association between two nominally scaled variables.
Independent t-test: determines differences on one continuous DV between ONLY two groups.
Dependent t-test: compares 2 related (paired) groups on one continuous DV.
One-Way ANOVA: examines group differences with 1 continuous DV & 1 nominal IV; can handle more than two groups of data.
What Analysis?
IV | DV | Statistical Test
1 Nominal | 1 Nominal | Chi-Square
1 Nominal (2 groups) | 1 continuous | t-test
1 Nominal (>2 groups) | 1 continuous | One-Way ANOVA
Some Examples
Chi-Square: gender and knee injuries in collegiate basketball players (Q angle)
Independent t-test: differences between girls and boys (independent groups; mutually exclusive) on PACER laps
Dependent t-test: pre and post measurement of the same group, or matched pairs (siblings), on number of push-ups completed
One-Way ANOVA: major (AT, ES, PETE; IV with >2 levels) and pre-test grade in this class
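A minimal Python sketch of how each example above could be run with scipy.stats; the function choices match the tests named on this slide, but all the data values are made up for illustration.

```python
# Hypothetical data only; each test mirrors one example above.
import numpy as np
from scipy import stats

# Chi-square: association between two nominal variables
# (e.g., gender x knee-injury status), using a 2x2 count table.
counts = np.array([[12, 38],   # female: injured, not injured
                   [ 5, 45]])  # male:   injured, not injured
chi2, p_chi, dof, expected = stats.chi2_contingency(counts)

# Independent t-test: two mutually exclusive groups, one continuous DV
# (e.g., PACER laps for girls vs boys).
girls = [32, 41, 28, 35, 39]
boys  = [45, 38, 50, 42, 47]
t_ind, p_ind = stats.ttest_ind(girls, boys)

# Dependent (paired) t-test: same group measured twice
# (e.g., push-ups pre and post).
pre  = [20, 15, 22, 18, 25]
post = [24, 18, 25, 21, 29]
t_dep, p_dep = stats.ttest_rel(pre, post)

# One-way ANOVA: one nominal IV with >2 levels (major: AT, ES, PETE)
# and one continuous DV (pre-test grade).
at, es, pete = [78, 82, 75], [85, 88, 80], [70, 74, 72]
f, p_anova = stats.f_oneway(at, es, pete)

print(chi2, t_ind, t_dep, f)
```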
Norm-Referenced Measurement HPHE 3150 Dr. Ayers
Topics for Discussion
Reliability (variance & PPM correlation support reliability & validity)
- Consistency
- Repeatability
Validity
- Truthfulness
Objectivity
- Inter-rater reliability
Relevance
- Degree to which a test pertains to its objectives
Reliability: Observed, Error, and True Scores
Observed Score = True Score + Error Score
ALL scores have true and error portions
True scores are impossible to measure
Reliability THIS IS HUGE!!!!
S²o = S²t + S²e
Reliability is the proportion of observed score variance that is true score variance: r_xx = S²t / S²o
TIP: use algebra to isolate S²t (subtract S²e from both sides of the equation)
Desirable reliability: >.80
There is variation in observed, true & error scores
Error can be + (↑ observed score) or – (↓ observed score)
Error scores contribute little to observed variation; the error score mean is 0
S²o = S²t + S²e
Validity depends on reliability and relevance
Observed variance is necessary
Generally, longer tests are more reliable (length fosters variance)
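A small worked example of the variance partition above, using made-up numbers for the observed and error variances:

```python
# Hypothetical variances illustrating S²o = S²t + S²e and r_xx = S²t / S²o.
s2_observed = 100.0   # S²o: variance of the recorded scores
s2_error    = 20.0    # S²e: variance due to measurement error
s2_true     = s2_observed - s2_error      # S²t = S²o - S²e = 80
reliability = s2_true / s2_observed       # r_xx = 80 / 100 = 0.80
print(s2_true, reliability)               # 80.0, 0.80 -> meets the >.80 guideline
```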
Table 6-1 Systolic Blood Pressure Recordings for 10 Subjects
Columns: Subject, Observed BP = True BP + Error BP; summary rows: Sum, Mean (M), Variance (S²), S
Note: Se is the square root of S²e
Reliability Coefficients
Interclass Reliability: correlates 2 trials
Intraclass Reliability: correlates >2 trials
Interclass Reliability (Pearson Product Moment)
Test-Retest (administer test 2x & correlate scores)
- See Excel document (Norm-ref msmt examples)
- Issues: time, fatigue, practice effect
Equivalence (create 2 "equivalent" test forms)
- Odd/even test items on a single test
- Addresses most of the test/retest issues
- Reduces test length by 50% (not desirable); longer tests are more reliable
Split Halves
- Spearman-Brown prophecy formula
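A brief sketch of the Spearman-Brown prophecy formula named above, which estimates full-length reliability from a half-test (split-halves) correlation; the half-test r of .70 is hypothetical.

```python
# Spearman-Brown correction: a half-test correlation underestimates the
# reliability of the full-length test, since each half is only 50% as long.
def spearman_brown(r_half, k=2.0):
    """Estimated reliability of a test k times as long as the one yielding r_half."""
    return (k * r_half) / (1 + (k - 1) * r_half)

print(spearman_brown(0.70))   # half-test r = .70 -> full-test r ≈ .82
```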
Index of Reliability
The theoretical correlation between observed scores and true scores
High I of R = low error
Square root of the reliability coefficient: if r = .81, I of R = .90
Compare to the Coefficient of Determination: r² (shared variance)
I of R vs C of Det
If r = .81: I of R = ? C of Det = ?
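One possible worked answer to the prompt above, assuming r = .81 is the test's reliability coefficient:

```python
# Index of Reliability vs Coefficient of Determination for r = .81.
r = 0.81
index_of_reliability = r ** 0.5       # sqrt(.81) = .90 (observed vs true scores)
coeff_of_determination = r ** 2       # (.81)² ≈ .66 (shared variance)
print(index_of_reliability, coeff_of_determination)
```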
Reliability: So What?
Find a friend and talk about:
- 1 thing you "got" today
- 1 thing you "missed" today; can they help?
Reliability REVIEW
Inferential statistics: infer from sample findings to the entire population
- Chi-Square (2 nominal variables)
- t-test (1 nominal variable with 2 groups, 1 continuous)
- ANOVA (1 nominal variable with 2+ groups, 1 continuous)
Correlation
Are two variables related? What happens to Y when X changes?
Linear relationship between two variables
Quantifies the RELIABILITY & VALIDITY of a test or measurement
Reliability (0-1; .80+ goal)
All scores: observed = true + error
r_xx = S²t / S²o: the proportion of observed score variance that is true score variance
Interclass reliability coefficients (correlate 2 trials)
- Test/retest: time, fatigue, practice effect
- Equivalent: reduces test length by 50%
- Split-halves
Index of Reliability: tells you what? Related to C of D how?
Standard Error of Measurement (RELIABILITY MEASURE)
SEM = S√(1 − r_xx')
S = standard deviation of the test
r_xx' = reliability of the test
Reflects the degree to which a person's observed score fluctuates as a result of measurement error
EXAMPLE: test standard deviation = 100, r = .84
SEM = 100√(1 − .84) = 100√(.16) = 100(.4) = 40
SEM is the standard deviation of the measurement errors around an observed score
EXAMPLE: average test score = 500, SEM = 40
68% of all scores should fall between 500 ± 40 (460 and 540)
95% of all scores range between: ?
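A short sketch using the numbers above (S = 100, r = .84, observed score = 500) to show the SEM and the ± bands it implies:

```python
# SEM and the score bands around an observed score (values from the example).
import math

s    = 100     # standard deviation of the test
r_xx = 0.84    # reliability of the test

sem = s * math.sqrt(1 - r_xx)                         # 100 * sqrt(.16) = 40

observed = 500
band_68 = (observed - sem, observed + sem)            # ~68% band: 460 to 540
band_95 = (observed - 2 * sem, observed + 2 * sem)    # ~95% band: 420 to 580
print(sem, band_68, band_95)
```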
Standard Error of Estimate (VALIDITY MEASURE)
Reflects the accuracy of estimating a score on the criterion measure
Also called the Standard Error or Standard Error of Prediction
Standard Errors (both are standard deviations)
SE of Measurement (reliability)
SE of Estimate (criterion-related validity)
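A sketch contrasting the two standard errors: the SEM uses the formula from this deck, while the SEE shown is the standard regression form (S of the criterion times √(1 − r²)); the example values are hypothetical.

```python
# Two standard errors, two questions: how much does an observed score
# fluctuate (SEM), vs how far off is a predicted criterion score (SEE)?
import math

def sem(s, r_xx):
    """SE of Measurement: s = test SD, r_xx = test reliability."""
    return s * math.sqrt(1 - r_xx)

def see(s_y, r_xy):
    """SE of Estimate: s_y = criterion SD, r_xy = test-criterion validity r."""
    return s_y * math.sqrt(1 - r_xy ** 2)

print(sem(100, 0.84))   # 40.0
print(see(100, 0.84))   # ≈ 54.3
```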
Factors Affecting Test Reliability
1) Fatigue ↓
2) Practice ↑
3) Subject variability: homogeneous ↓, heterogeneous ↑
4) Time between testing: more time = ↓
5) Circumstances surrounding the testing periods: change = ↓
6) Test difficulty: too hard/easy = ↓
7) Precision of measurement: precise = ↑
8) Environmental conditions: change = ↓
SO WHAT? A test must first be reliable to be valid
Validity Types THIS SLIDE IS HUGE!!!!
Content-Related Validity (a.k.a. face validity)
- Should represent the knowledge to be learned
- Criterion for content validity rests with the interpreter
- Use "experts" to establish
Criterion-Related Validity
- Test has a statistical relationship with the trait measured
- Alternative measures validated with a criterion measure
- Concurrent: criterion & alternate measured at the same time
- Predictive: criterion measured in the future
Construct-Related Validity
- Validates theoretical measures that are unobservable
Methods of Obtaining a Criterion Measure
Actual participation (game play): skills tests, expert judges
Perform the criterion (treadmill test): distance runs, sub-maximal swim, run, cycle
Heart disease (developed later in life): present diet, behaviors, BP, family history
Success in grad school: GRE scores, UG GPA
Interpreting the “r” you obtain THIS IS HUGE!!!!
Correlation Matrix for Development of a Golf Skill Test (From Green et al., 1987)
Variables: Playing golf, Long putt, Chip shot, Pitch shot, Middle distance shot, Drive (diagonal = 1.00; remaining coefficients not reproduced here)
What are these? Concurrent validity coefficients
Interpret these correlations
Variables: Actual golf score (criterion), Putting Trial 1, Putting Trial 2, Driving Trial 1, Driving Trial 2, Observer 1, Observer 2 (diagonal = 1.00; coefficients not reproduced here)
What are these? Concurrent validity coefficients: each measure correlated with the criterion (actual golf score)
Interpret these correlations (same matrix as above)
What are these? Reliability coefficients: Trial 1 correlated with Trial 2 of the same test (e.g., Putting Trial 1 × Putting Trial 2)
Interpret these correlations (same matrix as above)
What is this? The objectivity coefficient: Observer 1 correlated with Observer 2
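A hypothetical sketch of how a single correlation matrix like the one above yields validity, reliability, and objectivity coefficients at once; every score below is invented.

```python
# One matrix, three kinds of coefficients (all data hypothetical).
import pandas as pd

data = pd.DataFrame({
    "golf_score": [95, 88, 102, 91, 85, 99],      # criterion: actual play
    "putt_t1":    [12, 10, 15, 11, 9, 14],
    "putt_t2":    [13, 10, 14, 12, 9, 15],
    "drive_t1":   [180, 210, 160, 200, 220, 170],
    "drive_t2":   [185, 205, 165, 195, 225, 175],
    "observer_1": [7, 9, 5, 8, 10, 6],            # judges' ratings
    "observer_2": [6, 9, 5, 8, 9, 6],
})

r = data.corr()   # full Pearson correlation matrix

print(r.loc["putt_t1", "golf_score"])      # concurrent validity: test vs criterion
print(r.loc["putt_t1", "putt_t2"])         # reliability: trial 1 vs trial 2
print(r.loc["observer_1", "observer_2"])   # objectivity: observer 1 vs observer 2
```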
Concurrent Validity This square represents variance in performance in a skill (e.g., golf)
Concurrent Validity The different colors and patterns represent different parts of a skills test battery to measure the criterion (e.g., golf)
Concurrent Validity
The orange color represents ERROR, or unexplained variance in the criterion (e.g., golf)
Remember: ↑ error = ↓ validity
Concurrent Validity (batteries A, B, C, D)
Consider the concurrent validity of the above 4 possible skills test batteries
Concurrent Validity (batteries A, B, C, D)
Which test battery would you be LEAST likely to use? Why?
D – it has the MOST error and requires 4 tests to be administered
Concurrent Validity (batteries A, B, C, D)
Which test battery would you be MOST likely to use? Why?
C – it has the LEAST error, but it requires 3 tests to be administered
Concurrent Validity (batteries A, B, C, D)
Which test battery would you use if you are limited in time?
A or B – requires only 1 or 2 tests to be administered, but you lose some validity
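A rough sketch of the idea behind these diagrams: a battery's concurrent validity corresponds to the share of criterion variance it explains (R² from regressing the criterion on the battery items), and the leftover is error. The data and battery compositions below are invented.

```python
# Comparing hypothetical batteries by how much criterion variance they explain.
import numpy as np

rng = np.random.default_rng(0)
criterion = rng.normal(size=50)                             # e.g., actual golf score
test1 = criterion * 0.8 + rng.normal(scale=0.6, size=50)    # strongest single item
test2 = criterion * 0.5 + rng.normal(scale=0.8, size=50)
test3 = criterion * 0.3 + rng.normal(scale=0.9, size=50)

def r_squared(X, y):
    """Proportion of criterion variance explained by a battery (R²)."""
    X = np.column_stack([np.ones(len(y)), X])               # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print(r_squared(np.column_stack([test1]), criterion))                # 1-test battery
print(r_squared(np.column_stack([test1, test2, test3]), criterion))  # 3-test battery
```

More items usually explain more variance (less error), but at the cost of more administration time, which is the trade-off the slides above ask you to weigh.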