Client Assessment and Other New Uses of Reliability. Will G Hopkins, Physiology and Physical Education, University of Otago, Dunedin NZ.

Presentation transcript:

Client Assessment and Other New Uses of Reliability
Will G Hopkins, Physiology and Physical Education, University of Otago, Dunedin NZ
Reliability: the Essentials
Assessment of Individual Clients and Patients
Estimation of Sample Size for Experiments
Estimation of Individual Responses to a Treatment
Part of a mini-symposium presented at the Annual Meeting of the American College of Sports Medicine in Baltimore, May 31, 2001.

Reliability: the Essentials
Reliability is reproducibility of a measurement if or when you repeat the measurement.
It's crucial for clinicians … because you need good reproducibility to monitor small but clinically important changes in an individual patient or client.
It's crucial for researchers … because you need good reproducibility to quantify such changes in controlled trials with samples of reasonable size.

Reliability: the Essentials
How do we quantify reliability? Easy to understand for one subject tested many times:
Subject: Chris
Trial 1: 72, Trial 2: 76, Trial 3: 74, Trial 4: 79, Trial 5: 79, Trial 6: 77
Mean ± SD: 76.2 ± 2.8
The 2.8 is the standard error of measurement. I call it the typical error, because it's the typical difference between the subject's true value and the observed values. It's the random error or noise in our assessment of clients and in our experimental studies. Strictly, this standard deviation of a subject's values is the total error of measurement.
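
The single-subject calculation can be checked with a few lines of Python. This is a minimal sketch that assumes the six scores on the slide are Chris's trials 1 to 6 in that order; the variable names are illustrative only.

```python
from statistics import mean, stdev

# The six trial scores for Chris, as read from the slide (trial order assumed).
scores = [72, 76, 74, 79, 79, 77]

subject_mean = mean(scores)
# Sample standard deviation (n - 1 denominator) of the repeated scores:
# this is the typical error for this one subject.
typical_error = stdev(scores)

print(f"Mean ± SD: {subject_mean:.1f} ± {typical_error:.1f}")
```

The sample standard deviation is used here because the six trials are treated as a sample of the subject's possible scores.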

Reliability: the Essentials
We usually measure reliability with many subjects tested a few times:
[Table: Trial 1, Trial 2 and change score (Trial 2-1) for five subjects: Chris (72, 76, change 4), Jo, Kelly, Pat and Sam. Mean ± SD of the change scores: 2.6 ± 3.4.]
The 2.6 is the change in the mean.
The 3.4 divided by √2 is the typical error.
The 3.4 multiplied by ±1.96 are the limits of agreement.
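
A short sketch of how the typical error and limits of agreement follow from the standard deviation of the change scores. It assumes each trial carries independent error with the same typical error, so the change score has variance equal to twice the typical error squared, which is where the division by the square root of 2 comes from. The function names are hypothetical.

```python
from math import sqrt

def typical_error_from_changes(sd_change: float) -> float:
    """Typical error from the SD of test-retest change scores."""
    # A change score contains the error of two trials, so its variance is
    # 2 * (typical error)^2; dividing the SD by sqrt(2) undoes that.
    return sd_change / sqrt(2)

def limits_of_agreement(sd_change: float, z: float = 1.96) -> float:
    """Half-width of the 95% limits of agreement (z = 1.96 assumes normality)."""
    return z * sd_change

sd_change = 3.4  # SD of the Trial 2 - Trial 1 change scores on the slide
te = typical_error_from_changes(sd_change)
loa = limits_of_agreement(sd_change)
print(f"typical error = {te:.1f}, limits of agreement = ±{loa:.1f}")
```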

Reliability: the Essentials
And we can define retest correlations: Pearson (for two trials) and intraclass (two or more trials).
[Figure: Trial 2 plotted against Trial 1. Intraclass r = 0.95; Pearson r = 0.96.]

Assessment of Individual Clients and Patients
When you test or retest an individual, take account of the relative magnitudes of signal and noise.
The signal is what you are trying to measure. It's the smallest clinically or practically important change (within the individual) or difference (between two individuals or between an individual and a criterion value). Rarely, it's larger changes or differences.

Assessment of Individual Clients and Patients
The noise is the typical error of measurement.
It needs to come from a reliability study in which there are no real changes in the subjects, or in which any real changes are the same for all subjects. Otherwise the estimate of the noise will be too large.
Time between tests is therefore as short as necessary. A practice trial may be important, to avoid real changes.
If published error is not relevant to your situation, do your own reliability study.

Assessment of Individual Clients and Patients
If noise << signal... Example: body mass; noise in scales = 0.1 kg, signal = 1 kg. The scales are effectively noise-free. Accept the measurement without worry.
If noise >> signal... Example: speed at ventilatory threshold; noise = 3%, signal = 1%. The noise swamps all but large changes or differences. Find a better test.

Assessment of Individual Clients and Patients
If noise ≈ signal... Examples: many lab and field tests. Accept the result of the test cautiously. Or improve assessment by...
1. averaging several tests
2. using confidence limits
3. using likelihoods
4. possibly using Bayesian adjustment

Assessment of Individual Clients and Patients
1. Average several tests to reduce the noise. Noise reduces by a factor of 1/√n, where n = number of tests.
2. Use likely (confidence) limits for the subject's true value. Practically useful confidence is less than the 95% of research. For a single test, single score ± typical error are 68% confidence limits. For test and retest, change score ± typical error are 52% confidence limits.
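
The 1/√n reduction and the 68% and 52% figures can be reproduced with the standard library. This is a sketch under the assumption that measurement error is normally distributed; the function names are hypothetical. The 52% arises because a change score has an error SD of √2 × typical error, so ± one typical error spans only about ±0.71 of that SD.

```python
from math import sqrt
from statistics import NormalDist

def noise_after_averaging(typical_error: float, n_tests: int) -> float:
    """Typical error of the mean of n_tests independent tests."""
    return typical_error / sqrt(n_tests)

def coverage_within_one_te(error_sd_in_te_units: float) -> float:
    """Chance that the true value lies within ± 1 typical error of the
    observed value, when the error SD is the given multiple of the typical error."""
    z = 1.0 / error_sd_in_te_units
    return 2 * NormalDist().cdf(z) - 1

single_score = coverage_within_one_te(1.0)      # error SD = typical error
change_score = coverage_within_one_te(sqrt(2))  # error SD = sqrt(2) x typical error

print(f"single score: {single_score:.0%}, change score: {change_score:.0%}")
print(f"noise after averaging 4 tests: {noise_after_averaging(1.0, 4)}")
```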

Assessment of Individual Clients and Patients
Example of likely limits for a change score: noise (typical error) = 1.0, smallest important change = 0.9.
If you see a change of 1.5, the true change is 52% likely to be between 0.5 and 2.5: "a positive change?"
If you see a change of 0.5, the true change is 52% likely to be between -0.5 and 1.5: "no real change?"
[Figure: change-score axis divided into negative, trivial and positive regions.]

Assessment of Individual Clients and Patients
3. Use likelihoods that the true value is greater/less than an important reference value or values. More precise than confidence limits, but needs a spreadsheet for the calculations. For single scores, the reference value is usually a pass-fail threshold. For change scores, the best reference values are ± the smallest important change.

Assessment of Individual Clients and Patients
Same example of a change score, to illustrate likelihoods: noise (typical error) = 1.0, smallest important change = 0.9.
If you see 1.5, chances are... 66% the true change is positive; 29% the true change is trivial; 5% the true change is negative: "a positive change".
If you see 0.5, chances are... 39% the true change is positive; 45% the true change is trivial; 16% the true change is negative: "maybe no real change".
[Figure: change-score axis divided into negative, trivial and positive regions, with the three likelihoods marked for each observed change.]
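
The likelihoods in this example can be approximated without a spreadsheet. The sketch below is not the author's spreadsheet: it assumes the true change is normally distributed around the observed change with SD = √2 × typical error, and the function name is hypothetical. A normal approximation gives values close to those on the slide, with small rounding differences possible if the original used a t distribution.

```python
from math import sqrt
from statistics import NormalDist

def change_likelihoods(observed_change: float,
                       typical_error: float,
                       smallest_important: float) -> tuple[float, float, float]:
    """Return (P positive, P trivial, P negative) for the true change."""
    # A change score contains the error of two tests, hence sqrt(2).
    sd_change = sqrt(2) * typical_error
    true_change = NormalDist(mu=observed_change, sigma=sd_change)
    p_negative = true_change.cdf(-smallest_important)
    p_positive = 1 - true_change.cdf(smallest_important)
    p_trivial = 1 - p_positive - p_negative
    return p_positive, p_trivial, p_negative

for observed in (1.5, 0.5):
    pos, triv, neg = change_likelihoods(observed, typical_error=1.0,
                                        smallest_important=0.9)
    print(f"observed {observed}: positive {pos:.1%}, "
          f"trivial {triv:.1%}, negative {neg:.1%}")
```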

Assessment of Individual Clients and Patients
4. Go Bayesian? That is, take into account your prior belief about the likely outcome of the test. When you scale down or reject outright an unlikely high score, you are being a Bayesian... because you attribute the high score partly or entirely to noise, not the client.

Assessment of Individual Clients and Patients
To go Bayesian quantitatively…
1. specify your prior belief with likely limits;
2. combine your belief with the observed score and the noise to give…
3. an adjusted score with adjusted likely limits or likelihoods.
But how do you specify your prior belief believably? Example: if you believe a change couldn't be outside ±3, where does the ±3 come from, and what likely limits define "couldn't"? 80%, 90%, 95%, 99%...? So use Bayes qualitatively but not quantitatively.

Estimation of Sample Size for Experiments
Based on having acceptable precision for the effect. Precision is defined by 95% likely limits. Estimate of likely limits needs typical error from a reliability study in which the time frame is the same as in the experiment. If published error is not relevant, try to do your own reliability study. Acceptable limits…

Estimation of Sample Size for Experiments
Acceptable limits can't be both substantially positive and negative, in the worst case of observed effect = 0. Therefore 95% likely limits = smallest important effect = ±d.
[Figure: effect (change score) axis divided into negative, trivial and positive regions, with -d, 0 and d marked.]
For a crossover, 95% likely limits = $\pm\left[\sqrt{2} \times (\text{typical error}) / \sqrt{\text{sample size}}\right] \times t_{0.975,DF} = \pm d$, where DF is the degrees of freedom in the experiment.
Therefore sample size $\approx 8\,(\text{typical error})^2 / d^2$, so...

Estimation of Sample Size for Experiments
When typical error ≈ smallest effect, sample size ≈ 8. For a study with a control group, sample size ≈ 32 (4x as many). Beware: typical error in an experiment is often larger than in a reliability study, so you may need more subjects.
When typical error << smallest effect, sample size could be ~1, but use ~8 in each group to ensure sample is representative.
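
The sample-size rule of thumb can be sketched as follows. The code solves the crossover relation, limits = √2 × typical error / √n × t = d, for n, and assumes t ≈ 2, which is what turns the factor 2t² into 8. For small samples the t value is larger than 2, so these figures are lower bounds. The function names are hypothetical, and the 3:1 example simply reuses the noise and signal sizes from the ventilatory-threshold example.

```python
from math import ceil

def crossover_sample_size(typical_error: float,
                          smallest_effect: float,
                          t_value: float = 2.0) -> float:
    """Approximate subjects needed in a crossover.

    Rearranged from: sqrt(2) * typical_error / sqrt(n) * t = smallest_effect.
    t_value = 2.0 is an approximation to the 0.975 quantile of t.
    """
    return 2 * t_value ** 2 * (typical_error / smallest_effect) ** 2

def controlled_trial_sample_size(typical_error: float,
                                 smallest_effect: float,
                                 t_value: float = 2.0) -> float:
    """A study with a control group needs about 4x as many subjects."""
    return 4 * crossover_sample_size(typical_error, smallest_effect, t_value)

# Typical error equal to the smallest effect.
print(ceil(crossover_sample_size(1.0, 1.0)))         # crossover
print(ceil(controlled_trial_sample_size(1.0, 1.0)))  # with a control group

# Typical error three times the smallest effect (noise 3%, signal 1%).
print(ceil(crossover_sample_size(3.0, 1.0)))
```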

Estimation of Sample Size for Experiments
When typical error >> smallest effect…
Test 100s of subjects to estimate small effects.
Or test fewer subjects many times pre and post the treatment.
Or use a smaller sample and find a test with a smaller typical error.
Or use a smaller sample and hope for a large effect, because larger effects need less precision. If you get a small effect, tell the editor your study will contribute to a meta-analysis.

Individual Responses to a Treatment
An important but neglected aspect of research. How to see them? Three ways.
1. Display each subject's values:
[Figure: test score against time (pre, post) for each subject in the drug and placebo groups. Panels: "No Individual Responses to Drug" and "Substantial Individual Responses to Drug". Annotations: "same reliability", "different reliability".]

Individual Responses to a Treatment
2. Look for an increase in the standard deviation of the treatment group in the post test. But you might miss it:
[Figure: test score against time (pre, post) for the drug and placebo groups. Panels: "Each subject's values" and "Means and standard deviations". Annotation: "relatively larger".]

Individual Responses to a Treatment
3. Look for a bigger standard deviation of the post-pre change scores in the treatment group. Now much easier to see any individual responses:
[Figure: panels "Each subject's values" (test score, pre and post, drug and placebo) and "Means and SDs of change scores" (post-pre score, drug and placebo).]
To present the magnitude of individual responses...

Individual Responses to a Treatment
Express individual responses as a standard deviation. Example: effect of drug = 14 ± 7 units (mean ± SD). This SD for individual responses is free of measurement error. It is NOT the SD of the change score for the drug group. There is a simple formula for this SD (see next slide), but getting its likely limits is more challenging. If you find individual responses, try to account for them in your analysis using subject characteristics as covariates.

Individual Responses to a Treatment
How to derive this standard deviation:
From the standard deviations of the change scores of the treatment and control groups: $\sqrt{SD_{treat}^2 - SD_{cont}^2}$.
Or from analysis of the treatment and control groups as reliability studies: $\sqrt{2\,(error_{treat}^2 - error_{cont}^2)}$.
Or by using mixed modeling, especially to get its confidence limits.
Identify subject characteristics responsible for the individual responses by using repeated-measures analysis of covariance. This approach also increases precision of the estimate of the mean effect.
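
A minimal sketch of the first two derivations. The input values (change-score SDs of 8.0 and 4.0 units) are hypothetical and chosen only for illustration, and the function names are not from the presentation. Because the estimate is a difference of variances, sampling variation can make it negative; the sketch simply raises an error in that case rather than prescribing how to handle it.

```python
from math import sqrt

def sd_individual_responses(sd_change_treat: float, sd_change_cont: float) -> float:
    """SD of individual responses from the SDs of the change scores."""
    variance = sd_change_treat ** 2 - sd_change_cont ** 2
    if variance < 0:
        raise ValueError("control-group SD exceeds treatment-group SD")
    return sqrt(variance)

def sd_individual_responses_from_errors(error_treat: float, error_cont: float) -> float:
    """Same quantity from the typical errors of the two groups,
    each group analysed as a reliability study."""
    variance = 2 * (error_treat ** 2 - error_cont ** 2)
    if variance < 0:
        raise ValueError("control-group error exceeds treatment-group error")
    return sqrt(variance)

# Hypothetical change-score SDs for the treatment and control groups.
sd_treat, sd_cont = 8.0, 4.0
from_sds = sd_individual_responses(sd_treat, sd_cont)

# The typical error of each group is its change-score SD divided by sqrt(2),
# so both routes give the same answer.
from_errors = sd_individual_responses_from_errors(sd_treat / sqrt(2),
                                                  sd_cont / sqrt(2))
print(f"{from_sds:.2f} {from_errors:.2f}")
```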

This presentation, spreadsheets and more information at: A New View of Statistics, newstats.org