TESTING A TEST Ian McDowell Department of Epidemiology & Community Medicine November, 2004.


A Lab Report (Montfort Hospital Biochem Lab)

The Challenge of Clinical Measurement
Diagnoses are based on information from formal measurements or from your clinical judgment. This information is seldom perfectly accurate:
–Random errors can occur
–Biases in judgment or measurement can occur
–Due to biological variability, a given patient may not fit the general rule
–Diagnosis (e.g., hypertension) involves a categorical judgment; this often requires dividing a continuous score (blood pressure) into categories. Choosing the cutting-point may be arbitrary

Therefore…
You need to be aware…
–That diagnosis is a matter of probabilities
–That using a quantitative approach is better than just guessing!
–That you will ultimately become familiar with the typical accuracy of measurements in your chosen clinical field
–Of some of the ways to describe the accuracy of a measurement
–That the principles apply to both diagnostic and screening tests

Attributes of Tests or Measures
–Cost, safety, acceptability, etc.
–Reliability: reproducibility; this considers chance or random errors
–Validity: does it measure what it is supposed to measure? By extension, what diagnostic conclusion can I draw from a particular score on the test? Validity may be affected by bias, or systematic errors

Reliability and Validity
[Figure: the four combinations of low or high reliability with low or high validity]

Ways of Assessing Validity
–Face and content validity: does it make clinical or biological sense? Does it include the relevant symptoms?
–Criterion validity: comparison to a “gold standard” definitive measure, expressed as sensitivity and specificity
–Construct validity: used with abstract themes, such as “quality of life”, for which there is no definitive standard

“Gold Standards”
Sensitivity and specificity are judged against either:
–More definitive (but expensive or invasive) tests, such as a complete work-up, or
–Eventual outcome (for screening tests, when a work-up of well patients would be unethical)

2 x 2 Table for Testing a Test
                    Gold standard
                    Disease present    Disease absent
Positive test       a (TP)             b (FP)
Negative test       c (FN)             d (TN)
Validity: Sensitivity = a/(a+c); Specificity = d/(b+d)
TP = true positive; FP = false positive…
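As a minimal sketch (the class and function names are illustrative, not part of the original), the two validity formulas can be computed straight from the cell counts. The counts used here are the referral-hospital figures from the "Same Test, Two Clinical Situations" example.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts: test result (rows) against the gold standard (columns)."""
    a: int  # true positives: test positive, disease present
    b: int  # false positives: test positive, disease absent
    c: int  # false negatives: test negative, disease present
    d: int  # true negatives: test negative, disease absent


def sensitivity(t: TwoByTwo) -> float:
    """a / (a + c): proportion of people with the disease who test positive."""
    return t.a / (t.a + t.c)


def specificity(t: TwoByTwo) -> float:
    """d / (b + d): proportion of people without the disease who test negative."""
    return t.d / (t.b + t.d)


referral = TwoByTwo(a=50, b=10, c=5, d=100)
print(f"Sensitivity = {sensitivity(referral):.0%}")
print(f"Specificity = {specificity(referral):.0%}")
```

Note that sensitivity uses only the disease-present column and specificity only the disease-absent column, which is why neither depends on prevalence.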

A Bit More on Sensitivity
Sensitivity = ability to detect disease when it is present = a/(a+c) = TP/(TP+FN)
Mnemonics:
–A sensitive person is one who can detect your feelings
–(1 – seNsitivity) = false Negative rate (i.e., what proportion of cases does the screening test miss?)
–Cf. the power of a statistical test (1 – β)

…and More on Specificity
Specificity = ability to detect the absence of disease when it is truly absent (can the test detect non-disease?) = d/(b+d) = TN/(FP+TN)
Mnemonics:
–A specific test would identify only that type of disease: “Nothing else looks like this”
–(1 – sPecificity) = false Positive rate (what proportion of people without the disease are falsely classified as having it?)

Clinical Applications
A highly specific test can be useful to rule a disease in. If the result on a very specific test is positive, you can be fairly confident the patient has the condition: “SpPin”.
A highly sensitive test can be useful for ruling a disease out. A negative result on a very sensitive test reassures you that the patient is unlikely to have the disease: “SnNout”.

The Selection of a Cutting Point
[Figure: overlapping score distributions of the well and sick populations, from healthy to pathological scores; moving the cut-point toward the healthy scores increases sensitivity, moving it toward the pathological scores increases specificity]
Crucial issue: changing the cut-point can improve sensitivity or specificity, but only at the expense of the other.
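The trade-off can be seen with a small sketch. The scores below are entirely hypothetical (higher means more pathological), and the rule "score at or above the cut-point counts as positive" is an assumption made for illustration.

```python
# Hypothetical test scores, for illustration only (higher = more pathological).
SICK = [4.1, 5.0, 5.6, 6.2, 6.8, 7.5, 8.0, 8.9]
WELL = [1.2, 2.0, 2.7, 3.3, 3.9, 4.4, 5.2, 6.0]


def sens_spec_at(cut: float, sick: list[float], well: list[float]) -> tuple[float, float]:
    """Treat score >= cut as a positive test; return (sensitivity, specificity)."""
    tp = sum(score >= cut for score in sick)  # sick people correctly flagged
    tn = sum(score < cut for score in well)   # well people correctly cleared
    return tp / len(sick), tn / len(well)


for cut in (3.5, 5.0, 6.5):
    sens, spec = sens_spec_at(cut, SICK, WELL)
    print(f"cut-point {cut}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up scores, raising the cut-point from 3.5 to 6.5 moves sensitivity from 1.00 down to 0.50 while specificity rises from 0.50 to 1.00.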

Problems with Wrong Results
False positives can arise due to other factors (such as taking other medications, diet, etc.). They entail the cost and danger of further investigations, labeling, and worry.
–This is similar to a Type I or alpha error in a test of statistical significance: the possibility of falsely concluding that an intervention has an effect.
False negatives imply missed cases, and so potentially bad outcomes if the condition goes untreated.
–Cf. Type II or beta error: the chance of missing a true difference.

The Crucial Point: Predictive Values
Sensitivity and specificity are characteristics of the test. But the clinician, of course, receives only the test result and does not know whether this person is a true positive or a false positive (or a true or false negative). Hmmm… How do we assess the predictive value of a positive or negative result?

Predictive Values
Based on rows, not columns:
–PPV = a/(a+b); interprets a positive test
–NPV = d/(c+d); interprets a negative test
Immediately useful to the clinician: they tell us about the population, and thus about the patient.
They depend upon the prevalence of disease, so must be determined for each clinical setting. As prevalence goes down, PPV goes down and NPV rises.
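A minimal sketch of the row-wise calculation (the function name is illustrative), using the referral-hospital counts from the example that follows:

```python
def predictive_values(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """(PPV, NPV) from 2 x 2 cells: a=TP, b=FP, c=FN, d=TN."""
    ppv = a / (a + b)  # across the test-positive row: share of positives that are true
    npv = d / (c + d)  # across the test-negative row: share of negatives that are true
    return ppv, npv


ppv, npv = predictive_values(a=50, b=10, c=5, d=100)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```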

Same Test, Two Clinical Situations
A. Referral hospital
          D+     D-
T+        50     10
T-         5    100
Sensitivity = 50/55 = 91%; Specificity = 100/110 = 91%
Prevalence = 55/165 = 33%
PPV = 50/60 = 83%; NPV = 100/105 = 95%

B. Primary care
          D+     D-
T+        50    100
T-         5   1000
Sensitivity = 50/55 = 91%; Specificity = 1000/1100 = 91%
Prevalence = 55/1155 = 5%
PPV = 50/150 = 33%; NPV = 1000/1005 = 99.5%
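The comparison can be reproduced with a short sketch (the helper name is illustrative). The cell counts (a, b, c, d) are those of the two settings in the example above; only the number of people without the disease changes.

```python
def summarize(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Validity and predictive measures from 2 x 2 cells (a=TP, b=FP, c=FN, d=TN)."""
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "prevalence": (a + c) / n,
        "PPV": a / (a + b),
        "NPV": d / (c + d),
    }


settings = {
    "A. Referral hospital": (50, 10, 5, 100),
    "B. Primary care": (50, 100, 5, 1000),
}
for name, cells in settings.items():
    stats = summarize(*cells)
    print(name, ", ".join(f"{k} {v:.1%}" for k, v in stats.items()))
```

Sensitivity and specificity come out identical in both settings, while PPV falls from about 83% to 33% as prevalence drops from 33% to about 4.8% (55/1155).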

Practical Question: “Doctor, what’s my likelihood of having the disease?”
To answer this question, you need a general idea of the sensitivity and specificity of the test. To interpret the result, you also need to know roughly the prevalence of the condition in your practice. You can then work out the PPV and answer the patient’s question.
“Give me a break, dude… Surely there is an easier way to bring all this together?”

Prevalence of Disease
We have seen how prevalence influences the interpretation of a test score. Before you do the test, prevalence gives your best guess about the probability that the patient has the disease.
–Also known as the pretest probability of disease: (a+c)/N in the 2 x 2 table, where N = a+b+c+d
–Or it can be expressed as the odds of disease: (a+c)/(b+d)
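A minimal sketch of the two ways of expressing prevalence, plus the conversion between them (function names are illustrative). The referral-hospital cells give 55 diseased and 110 non-diseased people out of N = 165.

```python
def pretest_probability(a: int, b: int, c: int, d: int) -> float:
    """(a + c) / N: prevalence, read as the probability of disease before testing."""
    return (a + c) / (a + b + c + d)


def pretest_odds(a: int, b: int, c: int, d: int) -> float:
    """(a + c) / (b + d): the same information expressed as odds."""
    return (a + c) / (b + d)


def probability_to_odds(p: float) -> float:
    return p / (1 - p)


def odds_to_probability(odds: float) -> float:
    return odds / (1 + odds)


print(pretest_probability(50, 10, 5, 100))  # 55 / 165
print(pretest_odds(50, 10, 5, 100))         # 55 / 110
```

The two conversion helpers are what make odds convenient: they let a probability be turned into odds, adjusted, and turned back.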

Estimating predictive values for a specific setting is called ‘calibrating’ the test. You could:
–Apply the test and a definitive test to a consecutive series of patients (rarely feasible)
–Calculate from Bayes’s Theorem (ouch!)
–Draw a hypothetical table (maybe?)
–Use a nomogram (tell me how)
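The Bayes’s Theorem route is less painful than it sounds. A minimal sketch (function names are illustrative), checked against the primary-care figures:

```python
def ppv_bayes(sens: float, spec: float, prev: float) -> float:
    """P(disease | positive test) by Bayes's theorem."""
    true_pos = sens * prev               # P(test positive and disease)
    false_pos = (1 - spec) * (1 - prev)  # P(test positive and no disease)
    return true_pos / (true_pos + false_pos)


def npv_bayes(sens: float, spec: float, prev: float) -> float:
    """P(no disease | negative test) by Bayes's theorem."""
    true_neg = spec * (1 - prev)         # P(test negative and no disease)
    false_neg = (1 - sens) * prev        # P(test negative and disease)
    return true_neg / (true_neg + false_neg)


# Primary-care figures: sensitivity 50/55, specificity 1000/1100, prevalence 55/1155.
sens, spec, prev = 50 / 55, 1000 / 1100, 55 / 1155
print(f"PPV = {ppv_bayes(sens, spec, prev):.1%}, NPV = {npv_bayes(sens, spec, prev):.1%}")
```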

Calibration by Hypothetical Table
Fill the cells in the following order:
                 “Truth”
                 Disease present          Disease absent           Total   PV
Test pos         3rd (from sensitivity)   7th                      8th     10th
Test neg         4th                      6th (from specificity)   9th     11th
Total            2nd (from prevalence)    5th                      1st
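The hypothetical-table method can be sketched as below. The fill order is an assumption (pick a total, split it by prevalence, apply sensitivity and specificity within each column, subtract for the remaining cells, then sum rows), and the sensitivity, specificity and prevalence values are hypothetical.

```python
def hypothetical_table(sens: float, spec: float, prev: float, n: int = 10_000) -> dict[str, float]:
    """Fill a hypothetical 2 x 2 table from sensitivity, specificity and prevalence."""
    total = n                       # a convenient hypothetical total
    diseased = prev * total         # from prevalence
    tp = sens * diseased            # from sensitivity
    fn = diseased - tp              # by subtraction
    healthy = total - diseased      # by subtraction
    tn = spec * healthy             # from specificity
    fp = healthy - tn               # by subtraction
    test_pos, test_neg = tp + fp, fn + tn  # row totals
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "PPV": tp / test_pos,
        "NPV": tn / test_neg,
    }


table = hypothetical_table(sens=0.90, spec=0.90, prev=0.05)
for key, value in table.items():
    print(key, round(value, 3))
```

The total N cancels out of the predictive values, so any convenient round number works; it only affects how readable the cell counts are.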

Combining Sensitivity and Specificity: Receiver Operating Characteristic (ROC) Curves
Work out sensitivity and specificity at every possible cut-point, then plot these. The area under the curve indicates the information provided by the test: 0.5 for a test no better than chance, 1.0 for a perfect test.
[Figure: ROC curve; x-axis 1 – Specificity (= false positives), y-axis Sensitivity, both from 0 to 1]
Note: the theme of sensitivity and (1 – specificity) will appear again!
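A minimal sketch of building an ROC curve and its area, assuming "score at or above the cut-point is positive" and using the trapezoidal rule for the area. The scores are hypothetical and the function names are illustrative.

```python
def roc_points(sick: list[float], well: list[float]) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) at every cut-point; score >= cut is positive."""
    points = [(0.0, 0.0)]  # a cut-point above every score: nobody tests positive
    for cut in sorted(set(sick) | set(well), reverse=True):
        sens = sum(s >= cut for s in sick) / len(sick)
        fpr = sum(w >= cut for w in well) / len(well)  # 1 - specificity
        points.append((fpr, sens))
    return points  # the lowest cut-point calls everyone positive: (1.0, 1.0)


def area_under_curve(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores (higher = more pathological), for illustration only.
sick = [4.1, 5.0, 5.6, 6.2, 6.8, 7.5, 8.0, 8.9]
well = [1.2, 2.0, 2.7, 3.3, 3.9, 4.4, 5.2, 6.0]
print(f"AUC = {area_under_curve(roc_points(sick, well)):.3f}")
```

As a sanity check, perfectly separated groups give an area of 1.0 and identical groups give 0.5.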

Likelihood Ratios
Defined as the likelihood that a given level of a diagnostic test result would be expected in a patient with the disease, compared with the likelihood of that result in a patient without it; for a positive test, this is the true-positive rate divided by the false-positive rate.
Advantages:
–Express sensitivity and specificity in one number
–Can be calculated for many levels of the test
–Can be turned into predictive values
LR for a positive test = Sensitivity / (1 – Specificity)
LR for a negative test = (1 – Sensitivity) / Specificity
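The two formulas as a minimal sketch (the function name is illustrative), applied to the referral-hospital test with sensitivity 50/55 and specificity 100/110:

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sens / (1 - spec)  # true-positive rate / false-positive rate
    lr_negative = (1 - sens) / spec  # false-negative rate / true-negative rate
    return lr_positive, lr_negative


lr_pos, lr_neg = likelihood_ratios(50 / 55, 100 / 110)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

For this test a positive result multiplies the odds of disease by about 10, and a negative result divides them by about 10.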

Calibration with a Nomogram
1) You need the LR.
2) Select the pretest probability (prevalence) on the left axis.
3) Select the likelihood ratio on the center axis.
4) Draw a line through to the right axis to read off the post-test probability of disease.
Example: prevalence = 30%, LR+ = 20; post-test probability ≈ 90%.
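The nomogram is a graphical shortcut for one calculation: post-test odds = pretest odds × LR. A minimal sketch (the function name is illustrative):

```python
def post_test_probability(pretest_prob: float, lr: float) -> float:
    """Arithmetic behind the nomogram: post-test odds = pretest odds x LR."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_test_odds = pretest_odds * lr
    return post_test_odds / (1 + post_test_odds)


print(f"{post_test_probability(0.30, 20):.1%}")
```

For the example above (prevalence 30%, LR+ = 20) this gives about 89.6%; a reading taken from the nomogram is only approximate.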

Chaining LRs Together
Example: a 45-year-old woman with a 1-month history of intermittent chest pain.
–Pretest probability of CAD about 1%
–History suggestive of angina (substernal pain; radiating down the arm; induced by effort; relieved by rest…). The LR of this history for angina is about 100

The previous example, step 1 (from the history): she is young, so the pretest probability is about 1%; based on the history, the probability rises to about 50%.

Chaining LRs Together
45-year-old woman with a 1-month history of intermittent chest pain…
After the history, the post-test probability is now about 50%. What will you do?
–Record an ECG
–Result: 2.2 mm ST-segment depression. LR for an ECG with 2.2 mm depression = 10
–Overall post-test probability is now >90% for coronary artery disease (see next slide)

The previous example, step 2 (ECG results): now start the pretest probability (i.e., prior to the ECG) at 50%, based on the history. The post-test probability now rises to just over 90%.
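Chaining simply feeds each post-test odds in as the next pretest odds. A minimal sketch (the function name is illustrative) reproducing the chest-pain example; it assumes the findings are conditionally independent given disease status, which is the usual assumption behind multiplying LRs.

```python
def chain_likelihood_ratios(pretest_prob: float, lrs: list[float]) -> float:
    """Apply several LRs in sequence. Assumes the findings are conditionally
    independent given disease status; otherwise the chained estimate can mislead."""
    odds = pretest_prob / (1 - pretest_prob)
    for lr in lrs:
        odds *= lr  # each post-test odds becomes the next pretest odds
    return odds / (1 + odds)


after_history = chain_likelihood_ratios(0.01, [100])
after_ecg = chain_likelihood_ratios(0.01, [100, 10])
print(f"After history: {after_history:.0%}; after ECG: {after_ecg:.0%}")
```

Starting from about 1%, the history (LR 100) brings the probability to about 50%, and the ECG (LR 10) then takes it to about 91%.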
