TESTING A TEST Ian McDowell Department of Epidemiology & Community Medicine January 2008.

2 The Challenge of Clinical Measurement
Diagnoses are based on information, from formal measurements and/or from your clinical judgment. This information is seldom perfectly accurate:
– Random errors can occur (machine not working?)
– Biases in judgment or measurement can occur (“this kid doesn’t look sick”)
– Due to biological variability, this patient may not fit the general rule
– Diagnosis (e.g., hypertension) involves a categorical judgment; this often requires dividing a continuous score (blood pressure) into categories. Choosing the cutting-point is challenging.

3 Therefore…
You need to be aware:
– That we express these complexities in terms of probabilities
– That using a quantitative approach is better than just guessing!
– That you will gradually become familiar with the typical accuracy of measurements in your chosen clinical field
– That the principles apply to both diagnostic and screening tests
– Of some of the ways to describe the accuracy of a measurement

4 Attributes of Tests or Measures
Safety, acceptability, cost, etc.
Reliability: consistency or reproducibility; this considers chance or random errors (which sometimes increase, sometimes decrease, scores)
Validity: “Is it measuring what it is supposed to measure?” By extension, “what diagnostic conclusion can I draw from a particular score on this test?”
Validity may be affected by bias, which refers to systematic errors (these fall in a certain direction)

5 Reliability and Validity
[Figure: target diagrams contrasting low and high reliability with low and high validity. High reliability with low validity gives a consistently biased result; low reliability with high validity means the average of the scattered, individually inaccurate results is not bad. This is probably how screening questionnaires (e.g., for depression) work.]

6 Ways of Assessing Validity
Content or “face” validity: does it make clinical or biological sense? Does it include the relevant symptoms?
Criterion validity: comparison to a “gold standard” definitive measure (e.g., biopsy, autopsy)
– Expressed as sensitivity and specificity
Construct validity: used with abstract themes, such as “quality of life”, for which there is no definitive standard

7 Criterion, or “Gold Standard”
The clinical observation or simple test is judged against either:
– More definitive (but expensive or invasive) tests, such as a complete work-up, or
– Eventual outcome (for screening tests, when work-up of well patients is unethical)
Sensitivity and specificity are calculated.

8 2 x 2 Table for Testing a Test
Golden rule: always calculate based on the gold standard.

                         Gold standard
                  Disease present   Disease absent
  Test positive       a (TP)            b (FP)
  Test negative       c (FN)            d (TN)

TP = true positive; FP = false positive; FN = false negative; TN = true negative
Validity: Sensitivity = a/(a+c); Specificity = d/(b+d)
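The arithmetic in this table is simple enough to script. A minimal sketch in Python, using hypothetical counts (a = 50, b = 10, c = 5, d = 100 are illustrative numbers, not from any real study):

```python
# Hypothetical 2 x 2 counts: rows = test result, columns = gold standard
a, b = 50, 10    # a = true positives (TP),  b = false positives (FP)
c, d = 5, 100    # c = false negatives (FN), d = true negatives (TN)

sensitivity = a / (a + c)   # ability to detect disease when present
specificity = d / (b + d)   # ability to detect absence of disease

print(f"Sensitivity = {sensitivity:.0%}")  # Sensitivity = 91%
print(f"Specificity = {specificity:.0%}")  # Specificity = 91%
```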

9 A Bit More on Sensitivity
Sensitivity = the test’s ability to detect disease when it is present: a/(a+c) = TP/(TP+FN)
Mnemonics:
– a sensitive person is one who is aware of your feelings
– (1 – seNsitivity) = false Negative rate: how many cases are missed by the screening test?

10 …and More on Specificity
Specificity = the test’s ability to detect the absence of disease when it is truly absent (can it detect non-disease?): d/(b+d) = TN/(FP+TN)
Mnemonics:
– a specific test would identify only that type of disease. “Nothing else looks like this”
– (1 – sPecificity) = false Positive rate: how many are falsely classified as having the disease?
The FP idea will arise again, so keep it in mind!

11 Most Tests Provide a Continuous Score: Selecting a Cutting-Point
[Figure: overlapping test-score distributions for a healthy population and a sick population, with a possible cut-point between them. Moving the cut-point toward the healthy scores increases sensitivity (includes more of the sick group); moving it toward the pathological scores increases specificity (excludes more healthy people).]
Crucial issue: changing the cut-point can improve sensitivity or specificity, but never both.

12 Clinical applications
A specific test can be useful to rule in a disease. Why?
– Very specific tests give few false positives. So, if the result is positive, you can be sure the patient has the condition (“nothing else would give this result”): “SpPin”
A sensitive test can be useful for ruling a disease out:
– A negative result on a very sensitive test (which detects all true cases) reassures you that the patient does not have the disease: “SnNout”

13 Problems with Wrong Results
False positives can arise due to other factors (such as other medications, diet, etc.). They entail the cost and danger of further investigations, labeling, and worry.
– This is similar to Type I or alpha error in a test of statistical significance: the possibility of falsely concluding that there is an effect of an intervention.
False negatives imply missed cases, and so potentially bad outcomes if untreated.
– cf. Type II or beta error: the chance of missing a true difference.

14 Practical Question: “Doctor, how likely am I to have this disease?” = Predictive Values
Sensitivity and specificity don’t tell you this, because they work from the gold standard. Now you need to work from the test result, but you won’t know whether this person is a true positive or a false positive (or a true or false negative). Hmmm…
How accurately will a positive (or negative) result predict disease (or health)?

15 Positive and Negative Predictive Values
Based on the rows of the 2 x 2 table, not the columns:
– Positive predictive value (PPV) = a/(a+b) = probability that a positive score is a true positive
– Negative predictive value (NPV) = d/(c+d); the same idea for a negative test result
BUT there’s a big catch: we are now working across the columns, so PPV and NPV depend critically on how many cases of disease there are (prevalence). As prevalence goes down, PPV goes down (it’s harder to find the smaller number of cases) and NPV rises.
So PPV and NPV must be determined for each clinical setting. But this is then immediately useful to the clinician: it reflects this population, so it tells us about this patient.

16 Prevalence and Predictive Values

A. Specialist referral hospital (prevalence = 55/165 = 33%)

                   D+      D-
  Test positive    50      10
  Test negative     5     100

Sensitivity = 50/55 = 91%; Specificity = 100/110 = 91%
PPV = 50/60 = 83%; NPV = 100/105 = 95%

B. Primary care (prevalence = 55/1155 ≈ 5%)

                   D+      D-
  Test positive    50     100
  Test negative     5    1000

Sensitivity = 50/55 = 91%; Specificity = 1000/1100 = 91%
PPV = 50/150 = 33%; NPV = 1000/1005 = 99.5%
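The prevalence effect can also be checked directly with Bayes’ theorem instead of filling in tables. A sketch (the function names are my own; the numbers are the illustrative ones from the two settings above):

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from sensitivity, specificity, prevalence."""
    tp = sens * prev               # expected true positive fraction
    fp = (1 - spec) * (1 - prev)   # expected false positive fraction
    return tp / (tp + fp)

def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value from the same three quantities."""
    tn = spec * (1 - prev)         # expected true negative fraction
    fn = (1 - sens) * prev         # expected false negative fraction
    return tn / (tn + fn)

sens, spec = 50 / 55, 100 / 110    # 91% each, as in both tables
print(f"Hospital, prev 33%: PPV = {ppv(sens, spec, 55/165):.0%}")       # 83%
print(f"Primary care, prev ~5%: PPV = {ppv(sens, spec, 55/1155):.0%}")  # 33%
```

Same test, same accuracy; only the prevalence changed, and the PPV fell from 83% to 33%.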

17 Imagine you know sensitivity and specificity. To work out PPV and NPV you need to estimate prevalence, then work backwards.
Fill the cells of the 2 x 2 table (“truth”: disease present / disease absent) in this order:
1st: the grand total N
2nd and 3rd: the diseased and non-diseased column totals (from estimated prevalence)
4th and 5th: cells a and c (from sensitivity)
6th and 7th: cells d and b (from specificity)
8th and 9th: the row totals
10th and 11th: the predictive values, calculated along the rows
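This back-calculation can be scripted: given sensitivity, specificity, an estimated prevalence, and any convenient total N, reconstruct the whole table. A sketch (function and variable names are my own):

```python
def fill_table(sens, spec, prev, n):
    """Reconstruct 2 x 2 cell counts from sens, spec, prevalence, total n."""
    diseased = prev * n        # column totals, from estimated prevalence
    healthy = n - diseased
    a = sens * diseased        # true positives, from sensitivity
    c = diseased - a           # false negatives
    d = spec * healthy         # true negatives, from specificity
    b = healthy - d            # false positives
    ppv = a / (a + b)          # predictive values, along the rows
    npv = d / (c + d)
    return a, b, c, d, ppv, npv

a, b, c, d, ppv, npv = fill_table(50/55, 100/110, 55/165, 165)
print(a, b, c, d)              # 50, 10, 5, 100 (within float rounding)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # PPV = 83%, NPV = 95%
```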

18 Gasp…! Isn’t there an easier way to do all this…?
Yes (good!) But first, you need a couple more concepts (less good…)
Before you apply a diagnostic test, prevalence gives your best guess about the chances that this patient has the disease. This is known as the “pretest probability of disease”: (a+c) / N in the 2 x 2 table.
It can also be expressed as the odds of disease: (a+c) / (b+d). When the disease is rare, the odds and the probability are nearly equal.
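Probability and odds convert back and forth with one line each; this conversion is essentially all the nomogram on the later slides does graphically. A sketch:

```python
def to_odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)

def to_prob(odds):
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)

# A pretest probability of 33% is pretest odds of 0.5 ("1 to 2");
# at a rare-disease prevalence of 1%, odds and probability nearly agree.
print(round(to_odds(1/3), 4))   # 0.5
print(round(to_odds(0.01), 4))  # 0.0101
```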

19 Test scores are continuous scales. You can use this to combine sensitivity and specificity: meet the Receiver Operating Characteristic (ROC) curve.
Work out sensitivity and specificity for every possible cut-point, then plot sensitivity (y-axis) against 1 – specificity, the false positive rate (x-axis). The area under the curve indicates the information provided by the test.
Note: the theme of sensitivity and (1 – specificity) will appear again!
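Working out sensitivity and 1 − specificity at every cut-point, and the area under the resulting curve by the trapezoid rule, takes only a few lines of plain Python. A sketch, with made-up illustrative scores:

```python
# Made-up scores for illustration; higher = more pathological
healthy = [1, 2, 2, 3, 3, 4]
sick = [3, 4, 4, 5, 5, 6]

def roc_points(healthy, sick):
    """(1 - specificity, sensitivity) at every possible cut-point."""
    points = [(0.0, 0.0), (1.0, 1.0)]  # the two extreme cut-points
    for cut in set(healthy + sick):    # call "score >= cut" test-positive
        sens = sum(s >= cut for s in sick) / len(sick)
        fpr = sum(h >= cut for h in healthy) / len(healthy)  # 1 - spec
        points.append((fpr, sens))
    return sorted(points)

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

print(f"AUC = {auc(roc_points(healthy, sick)):.3f}")  # AUC = 0.917
```

An AUC of 0.5 means the test carries no information; 1.0 means perfect separation of sick from healthy.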

20 This Leads to … Likelihood Ratios
Defined as the odds that a given level of a diagnostic test result would be expected in a patient with the disease, as opposed to a patient without: the true positive rate divided by the false positive rate.
Advantages:
– Combines sensitivity and specificity into one number
– Can be calculated for many levels of the test
– Can be turned into predictive values
LR for a positive test = Sensitivity / (1 – Specificity)
LR for a negative test = (1 – Sensitivity) / Specificity
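With the illustrative 91% sensitivity and specificity used earlier, the two ratios work out neatly. A sketch:

```python
sens = 50 / 55     # 91%, from the earlier illustrative table
spec = 100 / 110   # 91%

lr_pos = sens / (1 - spec)   # how much a positive result raises the odds
lr_neg = (1 - sens) / spec   # how much a negative result lowers the odds

print(f"LR+ = {lr_pos:.1f}")  # LR+ = 10.0
print(f"LR- = {lr_neg:.2f}")  # LR- = 0.10
```

A positive result multiplies the odds of disease by 10; a negative result divides them by 10.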

21 Practical application: a Nomogram
1) You need the LR for this test
2) Plot the likelihood ratio on the center axis (e.g., LR+ = 20)
3) Select the pretest probability (prevalence) on the left axis (e.g., prevalence = 30%)
4) Draw a line through these points to the right axis to read off the post-test probability of disease
Example: post-test probability = 91%
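The nomogram is graphical odds arithmetic: convert the pretest probability to odds, multiply by the LR, convert back. A sketch reproducing the slide's example (exact arithmetic gives about 90%; the 91% quoted is a reading off the printed nomogram):

```python
def post_test_probability(pretest_prob, lr):
    """What the nomogram computes: pretest odds x LR, back to probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)

print(f"{post_test_probability(0.30, 20):.0%}")  # 90%
```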

22 Chaining LRs Together (1)
Example: a 45-year-old woman presents with “chest pain.”
– Based on her age, the pretest probability that a vague chest pain indicates CAD is about 1%.
Take a fuller history. She reports a 1-month history of intermittent chest pain, suggesting angina (substernal pain; radiating down the arm; induced by effort; relieved by rest…).
– The LR of this history for angina is about 100.

23 The previous example: 1. From the History
[Nomogram: she’s young, so the pretest probability is about 1%; applying the history LR of 100, the post-test probability rises to 50%.]

24 Chaining LRs Together (2)
The 45-year-old woman with a 1-month history of intermittent chest pain…
After the history, the post-test probability is now about 50%. What will you do? Something more precise (but also more costly): record an ECG.
– Result = 2.2 mm ST-segment depression. The LR for a 2.2 mm ECG result is 10.
– The overall post-test probability is now >90% for coronary artery disease (see next slide).
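Chaining works because LRs multiply on the odds scale, each result updating the probability left by the last. A sketch of the whole worked example (pretest 1%, history LR 100, ECG LR 10, as in the slides):

```python
def apply_lr(prob, lr):
    """Update a probability by one likelihood ratio, via the odds scale."""
    odds = prob / (1 - prob) * lr
    return odds / (1 + odds)

p = 0.01                 # pretest probability of CAD from age alone
p = apply_lr(p, 100)     # history typical of angina, LR ~100
print(f"After history: {p:.0%}")   # After history: 50%
p = apply_lr(p, 10)      # 2.2 mm ST depression on ECG, LR ~10
print(f"After ECG: {p:.0%}")       # After ECG: 91%
```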

25 The previous example: ECG Results
[Nomogram: now start from the pretest probability of 50% (prior to the ECG, based on the history); with the ECG LR of 10, the post-test probability rises to about 90%.]