Clinical Research: Sample Measure (Intervene) Analyze Infer.


Clinical Research: Sample Measure (Intervene) Analyze Infer

A study can only be as good as the data... -J.M. Bland

Understanding Measurement: Aspects of Reproducibility and Validity
Reproducibility vs validity
Impact of reproducibility on validity & statistical precision
Assessing reproducibility of interval scale measurements
–within-subject standard deviation
–coefficient of variation
(Section: assessing validity of interval scale measurements)

Measurement Scales

Reproducibility vs Validity
Reproducibility
–the degree to which a measurement provides the same result each time it is performed on a given subject or specimen
–less than perfect reproducibility is typically caused by random error
Validity
–from the Latin validus (strong)
–the degree to which a measurement truly measures (represents) what it purports to measure (represent)
–less than perfect validity is caused by systematic error

Reproducibility vs Validity
Reproducibility
–aka: reliability, repeatability, precision, variability, dependability, consistency, stability
Validity
–aka: accuracy

Vocabulary for Error
Systematic error: validity (aka accuracy), for both overall inferences from studies and individual measurements
Random error: precision for overall inferences from studies; reproducibility for individual measurements

Reproducibility and Validity
[Figure: two panels, "Good Reproducibility, Poor Validity" and "Poor Reproducibility, Good Validity"]

Reproducibility and Validity
[Figure: two panels, "Good Reproducibility, Good Validity" and "Poor Reproducibility, Poor Validity"]

Why Care About Reproducibility? Impact on Validity
Mathematically, the upper limit of a measurement’s validity is a function of its reproducibility
Consider a study of height and basketball ability:
–Assume the height measurement has imperfect reproducibility
–If we measured height twice on a given person, most of the time we would get two different values; at least 1 of the 2 values must be wrong (imperfect validity)
–If the study measures everyone only once, the errors, despite being random, will lead to biased inferences when using these measurements (i.e. the measurements lack validity)

Impact of Reproducibility on Statistical Precision
Classical Measurement Theory:
–observed value (O) = true value (T) + measurement error (E)
–If we assume E is random and normally distributed: $E \sim N(0, \sigma_E^2)$
[Figure: distribution of the measurement error; axes: error and fraction]

Impact of Reproducibility on Statistical Precision
Assume:
–observed value (O) = true value (T) + measurement error (E)
–E is random and $E \sim N(0, \sigma_E^2)$
Then, when measuring a group of subjects, the variability of observed values ($\sigma_O^2$) is a combination of the variability in their true values ($\sigma_T^2$) and the variability in the measurement error ($\sigma_E^2$):
$\sigma_O^2 = \sigma_T^2 + \sigma_E^2$
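A minimal simulation sketch of the variance decomposition above. The height values (true standard deviation of 10, error standard deviation of 5), the seed, and the function name `simulate_observed` are illustrative assumptions, not values from the slides.

```python
import random
import statistics


def simulate_observed(n_subjects, true_mean, sd_true, sd_error, seed=0):
    """Draw true values T and observed values O = T + E, with E ~ N(0, sd_error^2)."""
    rng = random.Random(seed)
    true_values = [rng.gauss(true_mean, sd_true) for _ in range(n_subjects)]
    observed = [t + rng.gauss(0.0, sd_error) for t in true_values]
    return true_values, observed


# Hypothetical heights: true variance 10^2 = 100, error variance 5^2 = 25.
true_values, observed = simulate_observed(
    n_subjects=20000, true_mean=170.0, sd_true=10.0, sd_error=5.0
)
var_true = statistics.pvariance(true_values)
var_observed = statistics.pvariance(observed)

# The observed variance should be close to 100 + 25 = 125.
print(round(var_true, 1), round(var_observed, 1))
```

With a large simulated group, the observed variance lands near the sum of the two components, which is why noisier measurements spread the observed distribution.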

Why Care About Reproducibility?  2 O =  2 T +  2 E More measurement error means more variability in observed measurements –e.g. measure height in a group of subjects. –If no measurement error –If measurement error Height Frequency

More variability of observed measurements has profound influences on statistical precision/power
$\sigma_O^2 = \sigma_T^2 + \sigma_E^2$
Descriptive studies: wider confidence intervals
Analytic studies (observational studies/RCTs): power to detect an exposure (treatment) difference is reduced
[Figure: distributions labeled "truth" and "truth + error"]

Mathematical Definition of Reproducibility
$\text{Reproducibility} = \frac{\sigma_T^2}{\sigma_O^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$
Varies from 0 (poor) to 1 (optimal)
As $\sigma_E^2$ approaches 0 (no error), reproducibility approaches 1
Note: we can never directly measure this
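A small sketch of this definition, assuming reproducibility is the share of observed variance that comes from the true values, $\sigma_T^2 / (\sigma_T^2 + \sigma_E^2)$, which matches the stated limits of 0 and 1. The function name and the variance values are hypothetical.

```python
def reproducibility(var_true, var_error):
    """Fraction of observed variance due to true-value variance (between 0 and 1)."""
    if var_true < 0 or var_error < 0:
        raise ValueError("variances must be non-negative")
    var_observed = var_true + var_error
    if var_observed == 0:
        raise ValueError("observed variance is zero; reproducibility is undefined")
    return var_true / var_observed


# Hypothetical variances chosen only to show the behaviour at the extremes.
r_no_error = reproducibility(100.0, 0.0)      # no measurement error
r_some_error = reproducibility(100.0, 25.0)   # error variance is a quarter of true variance
r_equal = reproducibility(100.0, 100.0)       # error variance equals true variance

print(r_no_error, r_some_error, r_equal)
```

In practice the true-value variance is not observed, so the inputs here would themselves have to be estimated from replicate measurements.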

Simulation study looking at the association of a given risk factor and a certain disease
Truth is a risk ratio = 2.0
R = reproducibility
Power: probability of estimating a risk ratio within 15% of 2.0
Phillips and Smith, J Clin Epi 1993
[Figure; axis label: Power]

Sources of Random Measurement Error: What contributes to $\sigma_E^2$?
Observer (the person who performs the measurement)
–within-observer (intrarater)
–between-observer (interrater)
Instrument
–within-instrument
–between-instrument
Importance of each varies by study

Sources of Measurement Error
e.g., plasma HIV viral load
–observer: measurement to measurement differences in tube filling, time before processing
–instrument: run to run differences in reagent concentration, PCR cycle times, enzymatic efficiency

Within-Subject Biologic Variability
Although not the fault of the measurement process, moment-to-moment biological variability can have the same effect as errors in the measurement process
Recall that:
–observed value (O) = true value (T) + measurement error (E)
–T = the average of measurements taken over time
–E is always in reference to T
–Therefore, lots of moment-to-moment within-subject biologic variability will serve to increase the variability in the error term and thus increase overall variability because $\sigma_O^2 = \sigma_T^2 + \sigma_E^2$


Assessing Reproducibility Depends on measurement scale Interval Scale –within-subject standard deviation and derivatives –coefficient of variation Categorical Scale –Kappa (see Clinical Epidemiology course) –(can be used for both predictors and outcomes)

Reproducibility of an Interval Scale Measurement: Peak Flow
Assessment requires >1 measurement per subject
Peak Flow Rate in 17 adults (Bland & Altman)

Assessment by Simple Correlation

Pearson Product-Moment Correlation Coefficient
r (rho) ranges from -1 to +1
$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$
r describes the strength of linear association
$r^2$ = proportion of variance (variability) of one variable accounted for by the other variable

[Figure: example scatterplots showing r = -1.0, r = 0.0, r = 0.8, and r = 1.0]

Correlation Coefficient for Peak Flow Data
r (meas. 1, meas. 2) = 0.98

Limitations of Simple Correlation for Assessment of Reproducibility
Depends upon range of data
–e.g. Peak Flow
–r (full range of data) = 0.98
–r (peak flow <450) = 0.97
–r (peak flow >450) = 0.94
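The range dependence can be illustrated with simulated replicates. This is a sketch on invented data, not the Bland & Altman peak flow measurements: true values are assumed uniform between 200 and 600 and each replicate carries the same random error (standard deviation 15). The helper names `pearson_r` and `r_for` are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_for(pairs):
    """Correlation between first and second replicate within a set of pairs."""
    return pearson_r([a for a, _ in pairs], [b for _, b in pairs])


rng = random.Random(1)
pairs = []
for _ in range(5000):
    true_value = rng.uniform(200.0, 600.0)        # assumed range of true values
    meas_1 = true_value + rng.gauss(0.0, 15.0)    # same error SD for every subject
    meas_2 = true_value + rng.gauss(0.0, 15.0)
    pairs.append((meas_1, meas_2))

low = [p for p in pairs if (p[0] + p[1]) / 2 < 400]
high = [p for p in pairs if (p[0] + p[1]) / 2 >= 400]

r_full, r_low, r_high = r_for(pairs), r_for(low), r_for(high)
print(round(r_full, 3), round(r_low, 3), round(r_high, 3))
```

The measurement error is identical in every subset, yet the correlation drops when the range of true values is narrowed, so r reflects the spread of the subjects as much as the quality of the measurement.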

Additional Limitations of Simple Correlation for Assessment of Reproducibility
Depends upon ordering of data
–get different rho depending upon classification of meas 1 vs 2
Measures linear association only
–it would be amazing if the replicates weren’t related

[Figure: scatterplot of replicate measurements; axes: Meas. 1 and Meas. 2]

Final Limitation of Simple Correlation for Assessment of Reproducibility
Gives no meaningful parameter using the same scale as the original measurement
–What does rho = 0.7 vs 0.8 vs 0.9 mean in the context of peak flow data, which range from 200 to 600?
–(Note: rho is not “R” from prior slide)

Within-Subject Standard Deviation
Common (or mean) within-subject standard deviation ($s_w$) = 15.3 l/min
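One way to obtain a common within-subject standard deviation from replicate data is sketched below, assuming it is computed as the square root of the mean of the per-subject variances with the same number of replicates per subject. The four pairs of readings are invented for illustration and are not the peak flow data; `within_subject_sd` is a hypothetical helper name.

```python
import math
import statistics


def within_subject_sd(replicates_by_subject):
    """Square root of the mean within-subject variance.

    Assumes every subject has the same number (at least 2) of replicates.
    """
    variances = [statistics.variance(reps) for reps in replicates_by_subject]
    return math.sqrt(statistics.mean(variances))


# Invented readings (l/min): two replicates for each of four subjects.
hypothetical_peak_flow = [(490, 500), (400, 410), (510, 500), (430, 420)]
s_w = within_subject_sd(hypothetical_peak_flow)

# Every pair differs by 10, so each within-subject variance is 10^2 / 2 = 50.
print(round(s_w, 2))
```

The result is on the same scale as the measurement itself (l/min here), which is what makes it easier to interpret than a correlation coefficient.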

$s_w$: Further Interpretation
If we assume that replicate results:
–are normally distributed
–mean of replicates estimates true value
95% of replicates are within $(1.96)(s_w)$ of true value
[Figure: normal curve centered at $\bar{x} \approx$ true value, with $s_w$ and $(1.96)(s_w)$ marked]

Interpretation of $s_w$: Peak Flow Data
If we assume that replicate results:
–are normally distributed
–mean of replicates estimates true value
95% of replicates are within $(1.96)(15.3) = 30$ l/min of true value
[Figure: normal curve centered at $\bar{x} \approx$ true value; $s_w = 15.3$ l/min; $(1.96)(s_w) = (1.96)(15.3) = 30$]

$s_w$: Further Interpretation
Difference between any 2 replicates for the same person: diff = meas 1 - meas 2
Because var(diff) = var(meas 1) + var(meas 2) (for independent measurement errors):
$s_{diff}^2 = s_w^2 + s_w^2 = 2 s_w^2$
$s_{diff} = \sqrt{2 s_w^2} = \sqrt{2}\, s_w \approx 1.41\, s_w$

Interpreting $s_w$: Difference Between Two Replicates
If we assume that differences:
–are normally distributed and mean of differences is 0
–$s_{diff}$ estimates standard deviation
The difference between 2 measurements for the same subject is expected to be less than $(1.96)(s_{diff}) = (1.96)(1.41) s_w = 2.77 s_w$ for 95% of all pairs of measurements
[Figure: normal curve of differences centered at $\bar{x}_{diff} \approx 0$, with $s_{diff}$ and $(1.96)(s_{diff})$ marked]

$s_w$: Further Interpretation: The Repeatability Value
For Peak Flow data:
The difference between 2 measurements for the same subject is expected to be less than $2.77 s_w = (2.77)(15.3) = 42.4$ l/min for 95% of all pairs
i.e. the difference between 2 replicates may be as much as 42.4 l/min just by random measurement error alone
42.4 l/min is termed (by Bland-Altman) the “repeatability” or “repeatability coefficient” of measurement
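The arithmetic behind the two peak flow limits can be checked with a short sketch. It assumes normally distributed, independent errors as stated on the preceding slides; the function names are hypothetical, and 15.3 l/min is the $s_w$ reported for the peak flow data.

```python
import math


def replicate_limit(s_w, z=1.96):
    """Distance from the true value within which about 95% of replicates fall."""
    return z * s_w


def repeatability_coefficient(s_w, z=1.96):
    """Difference that about 95% of pairs of replicates stay below: z * sqrt(2) * s_w."""
    return z * math.sqrt(2.0) * s_w


peak_flow_s_w = 15.3  # l/min, from the peak flow data
limit = replicate_limit(peak_flow_s_w)
repeatability = repeatability_coefficient(peak_flow_s_w)

print(round(limit, 1), round(repeatability, 1))  # 30.0 and 42.4 l/min
```

The factor 2.77 quoted on the slides is simply 1.96 multiplied by the square root of 2.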

Interpreting the “Repeatability” Value: Is 42.4 l/min a lot?
Depends upon the context
Clinical management
If other gold standards exist that are more reproducible, and:
–differences < 42.4 are clinically relevant, then 42.4 is bad
–differences < 42.4 are not clinically relevant, then 42.4 is not bad
If no gold standards, probably unwise to consider differences as much as 42.4 to represent clinically important changes
–would be valuable to know “repeatability” for all clinical tests
Research
Depends upon the differences in peak flow you hope to detect
–If ~40, you’re in trouble
–If several hundred, then not bad

One Common Underlying $s_w$
Appropriate only if there is one $s_w$, i.e., $s_w$ does not vary with the true underlying value
[Figure: within-subject standard deviation plotted against subject mean peak flow; correlation coefficient = 0.17, p = 0.36]

Another Interval Scale Example
Salivary cotinine in children (Bland-Altman)
n = 20 participants measured twice

Cotinine: Absolute Difference vs. Mean
[Figure: subject absolute difference plotted against subject mean cotinine; correlation = 0.62, p = 0.001]

Logarithmic (base 10) Transformation

Log10 Transformed: Absolute Difference vs. Mean
[Figure: subject absolute log difference plotted against subject mean log cotinine; correlation = 0.07, p = 0.7]
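A simulation sketch of why the log transformation helps when error grows with the level of the measurement. The data are invented, not the cotinine measurements: true values are assumed uniform between 0.5 and 10, and each replicate is multiplied by a random factor whose log10 has standard deviation 0.175. `pearson_r` is a hypothetical helper.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
native_means, native_abs_diffs = [], []
log_means, log_abs_diffs = [], []
for _ in range(2000):
    true_value = rng.uniform(0.5, 10.0)
    # Multiplicative error: the error is constant on the log10 scale only.
    meas_1 = true_value * 10 ** rng.gauss(0.0, 0.175)
    meas_2 = true_value * 10 ** rng.gauss(0.0, 0.175)
    native_means.append((meas_1 + meas_2) / 2)
    native_abs_diffs.append(abs(meas_1 - meas_2))
    log_1, log_2 = math.log10(meas_1), math.log10(meas_2)
    log_means.append((log_1 + log_2) / 2)
    log_abs_diffs.append(abs(log_1 - log_2))

r_native = pearson_r(native_means, native_abs_diffs)
r_log = pearson_r(log_means, log_abs_diffs)
print(round(r_native, 2), round(r_log, 2))
```

On the native scale the absolute difference rises with the subject mean; after the log10 transformation the association is close to zero, so a single within-subject standard deviation becomes reasonable on the log scale.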

$s_w$ for log-transformed cotinine data
$s_w = 0.175$ log10
Because this is on the log scale, it refers to a multiplicative factor and hence is known as the geometric within-subject standard deviation
It describes variability in ratio terms (rather than absolute numbers)

Interpretation of $s_w$: Cotinine Data
If we assume that replicate results:
–are normally distributed
–mean of replicates estimates true value
95% of replicates are within 0.34 log10 of true value
[Figure: normal curve centered at $\bar{x} \approx$ true value; $s_w = 0.175$ log10; $(1.96)(s_w) = (1.96)(0.175) = 0.34$]

Interpretation of $s_w$: Cotinine Data
95% of replicates are within 0.34 log10 of true value
Back-transforming to the native scale:
–antilog(0.34) = $10^{0.34}$ = 2.2
95% of replicates are within a factor of 2.2 of true value
An observed cotinine value of 2 ng/ml would tell us that the true value may be:
–as little as 2/2.2 = 0.9
–as big as 2*2.2 = 4.4
–just by measurement error alone

Interpretation of $s_w$: Cotinine Data
Repeatability
The difference between 2 measurements for the same subject is expected to be less than a factor of $(1.96)(s_{diff}) = (1.96)(1.41) s_w = 2.77 s_w$ for 95% of all pairs of measurements
For cotinine data, $s_w = 0.175$ log10, therefore:
–2.77*0.175 = 0.485 log10
–back-transforming, antilog(0.485) = $10^{0.485}$ = 3.1
For 95% of all pairs of measurements, the ratio between the measurements may be as much as 3.1 fold (this is “repeatability”)
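The back-transformation for the cotinine example can be sketched as follows, assuming a log10-scale $s_w$ of 0.175 as used in the calculations above. The function name `fold_factors` is hypothetical.

```python
import math


def fold_factors(s_w_log10, z=1.96):
    """Back-transform log10-scale limits into multiplicative (fold) factors.

    Returns the fold range around the true value for single replicates and
    the fold difference that about 95% of replicate pairs stay below.
    """
    replicate_fold = 10 ** (z * s_w_log10)
    pair_fold = 10 ** (z * math.sqrt(2.0) * s_w_log10)
    return replicate_fold, pair_fold


replicate_fold, pair_fold = fold_factors(0.175)

# Range of plausible true values for an observed cotinine of 2 ng/ml.
observed = 2.0
lower, upper = observed / replicate_fold, observed * replicate_fold

print(round(replicate_fold, 1), round(pair_fold, 1))  # 2.2 and 3.1
print(round(lower, 1), round(upper, 1))               # 0.9 and 4.4
```

Because the limits are ratios, the lower and upper bounds are not symmetric around the observed value on the native scale.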

Coefficient of Variation
For cotinine data, the within-subject standard deviation (on the native scale) varies with the level of the measurement
If the within-subject standard deviation is proportional to the level of the measurement, this can be summarized as:
coefficient of variation = within-subject sd / mean = 0.49
At any level of cotinine, the within-subject standard deviation of repeated measures is 49% of the level

Coefficient of Variation for Peak Flow Data
When the within-subject standard deviation is not proportional to the mean value, as in the Peak Flow data, there is by definition no constant ratio between the within-subject standard deviation and the mean
Therefore, there is not one common coefficient of variation
Estimating the “average” coefficient of variation (within-subject sd/overall mean) is not meaningful

Peak Flow Data: Use of Coefficient of Variation when $s_w$ is Constant
Could report a family of CVs but this is tedious

Assessing Validity
Measures can be assessed for validity in 3 ways:
–Content validity
  Face
  Sampling
–Construct validity
–Empirical validity (aka criterion)
  Concurrent (i.e. when gold standards are present)
    –Interval scale measurement: 95% limits of agreement
    –Categorical scale measurement: sensitivity & specificity
  Predictive

Conclusions
Measurement reproducibility plays a key role in determining validity and statistical precision in all study designs
When assessing reproducibility of interval scale measurements:
–avoid correlation coefficients
–use within-subject standard deviation and derivatives like “repeatability”
–or coefficient of variation if within-subject sd is proportional to the magnitude of measurement
(For categorical scale measurements, use Kappa)
What is acceptable reproducibility depends upon desired use
Assessment of validity depends upon whether or not gold standards are present, and can be a challenge when they are absent