Published by Agatha Parker. Modified over 9 years ago.
1
Measures of Reliability in Sports Medicine and Science Will G. Hopkins Sports Medicine 30(4): 1-25, 2000
2
Measurement Error & Reliability Measurement error makes observed value differ from the true value. Reliability refers to reproducibility of values in repeated trials on the same subjects. The purpose is to quantify random error or ‘noise’. The smaller the random error, the better the measure.
3
Measures of Reliability Within-Subject Variation – Affects the precision of estimates of change in the variable of an experiment. – The smaller the within-subject variation, the easier it is to measure change in performance. Change in the Mean has two components: – Random change due to sampling error – Systematic change (learning or training effect) Retest Correlation – Represents how closely one trial matches another trial.
4
Within-Subject Variation Within-subject variation is the random variation in an individual over trials. Given 6 trials of subject 1: 71, 76, 74, 79, 79, 76. The SD of the within-subject variation is called the standard error of measurement (SEM). The SEM represents the ‘typical error’.
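As a rough illustration, the within-subject SD for the six trials quoted on this slide can be computed as below. This is only a sketch: it assumes the sample SD (n - 1 denominator) is intended, which the slide does not state, and the variable names are made up for the example.

```python
from statistics import mean, stdev

# The six trials of subject 1, as given on the slide.
trials = [71, 76, 74, 79, 79, 76]

subject_mean = mean(trials)
# Assumption: sample SD with an n - 1 denominator.
within_sd = stdev(trials)

print(f"mean = {subject_mean:.1f}, within-subject SD = {within_sd:.1f}")
```

A single subject gives only a noisy estimate of this SD, so it is best read as an illustration of the idea rather than as a usable reliability figure.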
5
‘Typical Error’ To estimate ‘typical error’ use many subjects and a few trials.
6
Computing Typical Error Compute the difference scores. Compute the SD of the difference scores. Divide the SD of the difference scores by $\sqrt{2}$: Typical Error = $4.1 / \sqrt{2}$ = 2.9
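The three steps on this slide can be sketched in a few lines of Python. The function name `typical_error` and the two small trial lists are hypothetical, invented only to show the calculation; the last line checks the slide's own arithmetic (an SD of 4.1 giving a typical error of about 2.9).

```python
from math import sqrt
from statistics import stdev

def typical_error(trial1, trial2):
    """SD of the difference scores divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]  # step 1: difference scores
    sd_diff = stdev(diffs)                           # step 2: SD of differences
    return sd_diff / sqrt(2)                         # step 3: divide by sqrt(2)

# Made-up scores for five subjects on two trials (illustration only).
trial1 = [71, 68, 75, 80, 66]
trial2 = [76, 70, 73, 85, 69]
te = typical_error(trial1, trial2)
print(f"typical error = {te:.2f}")

# The slide's example: SD of difference scores = 4.1.
slide_te = 4.1 / sqrt(2)
print(f"slide example = {slide_te:.1f}")
```

The division by the square root of 2 reflects that a difference score contains the error of two trials, so its variance is twice the error variance of a single trial.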
7
Typical Error as a Percentage For many measures the typical error gets bigger as the value gets bigger. Athlete 1 has a mean ± typical error of 378.6 ± 4.4. Athlete 2 has a mean ± typical error of 453.1 ± 6.1. When the typical error is expressed as a percentage of the respective means, the values are similar: 1.2% and 1.3%. This form of typical error is a coefficient of variation. Since it is a dimensionless measure, it allows direct comparison of reliability.
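The percentages quoted for the two athletes can be reproduced with a one-line calculation. The helper name `coefficient_of_variation` is an assumption made for this sketch.

```python
def coefficient_of_variation(typical_error, mean_value):
    """Typical error expressed as a percentage of the mean."""
    return 100.0 * typical_error / mean_value

# Values for the two athletes, as given on the slide.
cv_athlete1 = coefficient_of_variation(4.4, 378.6)
cv_athlete2 = coefficient_of_variation(6.1, 453.1)

print(f"athlete 1: {cv_athlete1:.1f}%, athlete 2: {cv_athlete2:.1f}%")
```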
8
Change in the Mean The change in the mean as a measure of reliability has two components: – Random change due to sampling error. – Systematic change due to learning effects, fatigue, lack of motivation or training effects. To avoid learning effects, give the subjects sufficient practice to become familiar with the test before the experiment begins.
9
Retest Correlation The retest correlation (r) is not as good a measure of reliability as ‘typical error’. The retest r is sensitive to the heterogeneity (spread) of values between participants. The ‘typical error’ can be estimated from a sample that isn’t even particularly representative. You cannot compare the reliability of two measures based on their retest r alone, because the retest r can change with a different sample if the heterogeneity is different.
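A small constructed example may make the point about heterogeneity concrete. Everything here is invented for illustration: the same measurement errors are added to a narrow and to a wide set of "true" values, so the typical error is identical in both samples while the retest r differs. The helper names `pearson_r`, `typical_error` and `simulate` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def typical_error(trial1, trial2):
    """SD of the difference scores divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return stdev(diffs) / sqrt(2)

# Made-up measurement errors, identical for both samples.
noise1 = [1, -1, 0, 1, -1]
noise2 = [-1, 1, 1, -1, 0]

def simulate(true_values):
    """Add the fixed errors to the true values and summarise reliability."""
    t1 = [t + e for t, e in zip(true_values, noise1)]
    t2 = [t + e for t, e in zip(true_values, noise2)]
    return pearson_r(t1, t2), typical_error(t1, t2)

r_narrow, te_narrow = simulate([50, 51, 52, 53, 54])  # homogeneous sample
r_wide, te_wide = simulate([30, 40, 50, 60, 70])      # heterogeneous sample

print(f"narrow: r = {r_narrow:.2f}, typical error = {te_narrow:.2f}")
print(f"wide:   r = {r_wide:.2f}, typical error = {te_wide:.2f}")
```

In this toy data the noise is the same in both samples, yet the retest r is much higher for the heterogeneous sample, which is the behaviour the slide warns about.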
10
Threshold for a ‘Real Change’ A change of 1.5 to 2.0 times the ‘typical error’ represents a real change. For example, if the ‘typical error’ for the sum of 7 skinfolds is 1.6 mm, an observed change of at least 2 to 3 mm would indicate a real change. The value of ‘typical error’ must come from a short time period (1-2 days for skinfolds), in which there is no change in the subjects between trials.
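The skinfold example can be worked through as follows. This is a sketch of the slide's rule of thumb only; the function names and the choice of 1.5 as the default multiplier are assumptions for illustration.

```python
def real_change_thresholds(typical_error, low=1.5, high=2.0):
    """Lower and upper thresholds for a 'real change' (1.5x to 2.0x typical error)."""
    return low * typical_error, high * typical_error

def is_real_change(observed_change, typical_error, multiplier=1.5):
    """True if the absolute change reaches multiplier x typical error."""
    return abs(observed_change) >= multiplier * typical_error

# Slide example: typical error of 1.6 mm for the sum of 7 skinfolds.
lower, upper = real_change_thresholds(1.6)
print(f"threshold range: {lower:.1f} to {upper:.1f} mm")
print(is_real_change(3.0, 1.6), is_real_change(1.0, 1.6))
```

The computed range of roughly 2.4 to 3.2 mm matches the slide's rounded statement of "at least 2 to 3 mm".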
11
Estimation of Sample Size To use ‘typical error’, the reliability study must have the same duration as the intended study. The ‘typical error’ of the dependent variable represents the noise that obscures the change in the mean from pre to post. Using ‘typical error’, the sample sizes will tend to be unrealistically large. Sample size should be chosen to give adequate precision for an outcome. Precision is defined by 95% confidence intervals. – The range in which the true value is 95% likely to occur.
12
Estimation of Sample Size In a (pre - post) design, statistical theory predicts confidence limits of $\pm t_{0.975,\,n-1} \, s \sqrt{2/n}$ – n is the sample size – s is the ‘typical error’ – t is the t statistic. Equating this to the confidence limits representing adequate precision ($\pm d$): $n = 2(t s / d)^2 \approx 8 s^2 / d^2$
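The sample-size formula can be sketched as a small function. It assumes t is fixed at about 2, which is what turns 2(ts/d)² into 8s²/d²; the function name `sample_size` and the example values of s and d are made up for illustration.

```python
def sample_size(s, d, t=2.0):
    """Approximate n = 2 * (t * s / d)**2 for a pre-post design.

    s: typical error; d: half-width of the confidence interval wanted.
    Assumption: t is fixed at about 2, giving n of about 8 * s**2 / d**2.
    """
    return 2.0 * (t * s / d) ** 2

n_equal = sample_size(s=1.0, d=1.0)   # typical error equals the target precision
n_double = sample_size(s=2.0, d=1.0)  # typical error doubled

print(n_equal, n_double)
```

Because t is larger than 2 for small samples (about 2.26 with 9 degrees of freedom), the exact answer for s = d comes out nearer 10 than 8. Doubling s quadruples n, since n depends on s squared.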
13
Sample Size and Reliability Sample size is proportional to the ‘typical error’ squared. Reduce the ‘typical error’ and you need fewer subjects. When the ‘typical error’ equals the smallest worthwhile effect (s = d), you need only about 10 subjects. A test with twice the typical error would require 4 times as many subjects.
14
Estimation of Individual Differences Individual differences occur when the response to a treatment differs between subjects. To estimate individual differences ($S_{diff}$): $S_{diff} = \sqrt{2 s_{expt}^2 - 2 s^2}$, where $s_{expt}$ is the inflated typical error of the experimental group and $s$ is the typical error in the control group (or from a reliability study).
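A minimal sketch of the individual-differences calculation follows. The function name and the example typical errors (2.0 and 1.5) are hypothetical, and raising an error when the experimental typical error is smaller than the control value is a design choice made here, not something the slide specifies.

```python
from math import sqrt

def individual_difference_sd(s_expt, s_control):
    """SD of individual responses: sqrt(2 * s_expt**2 - 2 * s_control**2)."""
    variance = 2.0 * s_expt ** 2 - 2.0 * s_control ** 2
    if variance < 0:
        # Assumption: treat a negative variance estimate as an error.
        raise ValueError("experimental typical error is smaller than control")
    return sqrt(variance)

# Made-up typical errors for illustration only.
s_diff = individual_difference_sd(s_expt=2.0, s_control=1.5)
print(f"S_diff = {s_diff:.2f}")
```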
15
Acceptable Likely Range for Typical Error With 15 subjects, 4 trials and a typical error of 1%, the true typical error ranges from 1% × 1.24 to 1% ÷ 1.24, i.e. from 1.24% to 0.81%. With 50 subjects and 3 trials, the factors become 1.32 and 0.76.
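The likely range is obtained by multiplying and dividing the observed typical error by a factor. The sketch below takes the factor as given (1.24, from the slide) rather than deriving it from the number of subjects and trials; the function name `likely_range` is an assumption.

```python
def likely_range(typical_error, factor):
    """Lower and upper limits: typical error divided and multiplied by the factor."""
    return typical_error / factor, typical_error * factor

# Slide example: observed typical error of 1% with a factor of 1.24.
lower, upper = likely_range(1.0, 1.24)
print(f"likely range: {lower:.2f}% to {upper:.2f}%")
```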
16
Analysis of Simple Studies Analysis of reliability with 2 trials is straightforward: compute the typical error from the difference scores; the change in the mean is simply the mean difference. For 3 or more trials, check for learning effects by comparing consecutive pairs (trials 1&2, trials 2&3…). Download the spreadsheet from SportSci.Org
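The consecutive-pairs analysis could look something like the sketch below. It is not the SportSci spreadsheet; the function name, the returned dictionary keys and the three-trial data set are all invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def consecutive_pair_reliability(trials):
    """For each consecutive pair of trials, return change in mean and typical error.

    trials: list of trials, each a list of scores in the same subject order.
    """
    results = []
    for first, second in zip(trials, trials[1:]):
        diffs = [b - a for a, b in zip(first, second)]
        results.append({
            "change_in_mean": mean(diffs),            # mean of difference scores
            "typical_error": stdev(diffs) / sqrt(2),  # SD of differences / sqrt(2)
        })
    return results

# Made-up scores for four subjects over three trials.
trials = [
    [10, 12, 11, 13],
    [12, 13, 13, 14],
    [12, 14, 13, 14],
]
results = consecutive_pair_reliability(trials)
for i, res in enumerate(results, start=1):
    print(f"trials {i}&{i + 1}: change = {res['change_in_mean']:.2f}, "
          f"typical error = {res['typical_error']:.2f}")
```

In this toy data the change in the mean is larger between trials 1 and 2 than between trials 2 and 3, the kind of pattern that would suggest a learning effect on the first trial.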
17
Excel Reliability (sportsci.org) Typical error = $1.2 / \sqrt{2}$ ≈ 0.83
18
Intraclass Correlation ICC(3,1) For a retest correlation measure of reliability, the ICC(3,1) [Shrout & Fleiss] is unbiased for any sample size. Use of the ICC is appropriate with more than 2 trials. To calculate ‘typical error’ from the ICC: $s = S\sqrt{1 - r}$, where s is the typical error, S is the average SD for subjects in each trial, and r is the ICC.
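Converting an ICC into a typical error is a one-line calculation. The sketch assumes the relationship s = S·sqrt(1 − r), with r between 0 and 1; the function name and the example values (S = 5, r = 0.91) are hypothetical.

```python
from math import sqrt

def typical_error_from_icc(between_subject_sd, icc):
    """Typical error s = S * sqrt(1 - r), with S the average SD across trials."""
    if not 0.0 <= icc <= 1.0:
        # Assumption: only ICC values between 0 and 1 are handled here.
        raise ValueError("icc must be between 0 and 1")
    return between_subject_sd * sqrt(1.0 - icc)

# Made-up values for illustration only.
s = typical_error_from_icc(between_subject_sd=5.0, icc=0.91)
print(f"typical error = {s:.2f}")
```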
19
Reliability Between Different Equipment, Methods, Installations Use the ICC(2,1) when retesting subjects on different equipment, methods or installations. The ICC(2,1) is derived from the fully random model, where subjects and trials are considered as random effects. Researchers have often misapplied the ICC(2,1) to data from a single item of equipment.