Diagnostic tests Subodh S Gupta MGIMS, Sewagram


Diagnostic tests Subodh S Gupta MGIMS, Sewagram One of the important roles a physician plays is to diagnose illnesses. For reaching a diagnosis, a physician uses different pieces of clinical information, e.g. symptoms, signs and laboratory tests. With experience, a clinician learns the importance of various clinical information and how to interpret the positivity or negativity of a given diagnostic test. A junior clinician is at times surprised when a senior declares that even if a diagnostic test is positive, the chances that the patient suffers from the given illness are extremely low; or that even when a test is negative, the chance that the patient suffers from the illness is high. Is it just experience, or is there a science to it? Before I proceed, I want to clarify a point. The term 'diagnostic test' usually brings to mind a test performed in a laboratory, but what we are going to discuss today applies equally well to all kinds of clinical information, e.g. symptoms, signs and the various risk factors the patient is exposed to. It may also represent a combination of clinical information.

Standard 2 X 2 table (For Diagnostic Tests)

                          Gold Standard
                    Present (D+)  Absent (D-)  Total
Test Positive (T+)       a             b        a+b
Test Negative (T-)       c             d        c+d
Total                   a+c           b+d        N

Let us start from what all of us already know and build from there. A simple way of looking at the relationship between a test's results and the true diagnosis is given by a 2 X 2 table. The test result is either positive or negative, and the disease is either present or absent, so there are four possibilities.

Standard 2 X 2 table (For Diagnostic Tests)

                          Gold Standard
                    Present (D+)  Absent (D-)
Test Positive (T+)       TP            FP
Test Negative (T-)       FN            TN

The test gives a correct result when it is positive among those who are diseased (true positive) or negative among those who are non-diseased (true negative). It gives an incorrect result when it is positive among those without the disease (false positive) or negative in the presence of the disease (false negative).

Gold standard In any study of diagnosis, the method being evaluated has to be compared to something. The best available test that is used as the comparison is called the GOLD STANDARD. We need to remember that gold standards are not always gold; the new test may even be better than the gold standard.

Test parameters

                          Gold Standard
                    Present (D+)  Absent (D-)  Total
Test Positive (T+)       a             b        a+b
Test Negative (T-)       c             d        c+d
Total                   a+c           b+d        N

The rate of correct identification of those who are diseased is known as Sensitivity (the true-positive rate). Similarly, the rate of correct identification of those who are not diseased is known as Specificity (the true-negative rate). We know from our previous knowledge that a sensitive test should be chosen when we do not want to miss any case. Similarly, a specific test should be used when we want to confirm a diagnosis, or when we do not want anyone without the disease to be labeled diseased.
Sensitivity = Pr(T+|D+) = a/(a+c) -- Sensitivity is PID (Positive In Disease)
Specificity = Pr(T-|D-) = d/(b+d) -- Specificity is NIH (Negative In Health)

Test parameters Complementary to the concepts of Sensitivity and Specificity are the error rates. The false positive rate is the rate of incorrectly identifying the disease among those who are not diseased, and the false negative rate is the rate of incorrectly identifying a subject as not diseased among those who are diseased. There is another parameter, known as 'Diagnostic Accuracy' or simply 'Accuracy', which gives the proportion of study subjects in whom the test gives the correct diagnosis.
False Positive Rate (FP rate) = Pr(T+|D-) = b/(b+d)
False Negative Rate (FN rate) = Pr(T-|D+) = c/(a+c)
Diagnostic Accuracy = (a+d)/N

Test parameters Two more parameters are very important in relation to a diagnostic test: the positive predictive value and the negative predictive value. The probability of disease, given the result of a test, is called the 'predictive value' of the test. PPV is the probability of disease after the test has come back positive; NPV is the probability of no disease after the test has come back negative.
Positive Predictive Value (PPV) = Pr(D+|T+) = a/(a+b)
Negative Predictive Value (NPV) = Pr(D-|T-) = d/(c+d)

Test parameters: Example

                          Gold Standard
                    Present (D+)  Absent (D-)  Total
Test Positive (T+)       90            5         95
Test Negative (T-)       10           95        105
Total                   100          100        200

Based on this information, let us calculate the different test parameters.
Sensitivity = 90/(90+10) = 90%; Specificity = 95/(95+5) = 95%
FP rate = 5/(95+5) = 5%; FN rate = 10/(90+10) = 10%
Diagnostic Accuracy = (90+95)/(90+10+5+95) = 92.5%
PPV = 90/(90+5) = 94.7%; NPV = 95/(95+10) = 90.5%

PPV & NPV with Prevalence
Sensitivity: 90%; Specificity: 95%; False Negative Rate: 10%; False Positive Rate: 5%
PPV: 94.7%; NPV: 90.5%; Diagnostic Accuracy: 92.5%

Let us see this in a graphical form. Imagine a clinical test criterion that takes on a range of values. The first curve represents the distribution of the test criterion among healthy persons, and the curve on the right represents the distribution of the same criterion among patients suffering from a particular illness. Usually there is an overlap in the test criterion between the healthy and the diseased. Imagine the cut-off is set at the vertical line shown here.

Then the different areas, shaded in different colors, represent the TP, FP, FN and TN.

Healthy population vs sick population Now, let us imagine two different situations. In the first, we do the test in a setting where the chance of patients being diseased is high.

Predictive Values in hospital-based data Most test positives here are sick. But this is because there are as many sick as healthy people overall. What if fewer people were sick, relative to the healthy?

Predictive Values in population-based data Now most test positives below are healthy. This is because the number of false positives from the larger healthy group outweighs the true positives from the sick group. Thus, the chance that a test positive is sick depends on the prevalence of the disease in the group tested!

Test Parameters: Example

                          Gold Standard
                    Present (D+)  Absent (D-)  Total
Test Positive (T+)       90            5         95
Test Negative (T-)       10           95        105
Total                   100          100        200

Let us examine this with a numerical example. Prevalence = 50%; PPV = 94.7%; NPV = 90.5%; Diagnostic Accuracy = 92.5%

Test Parameters: Example

                          Gold Standard
                    Present (D+)  Absent (D-)  Total
Test Positive (T+)       90           95        185
Test Negative (T-)       10         1805       1815
Total                   100         1900       2000

Prevalence = 5%; PPV = 48.6%; NPV = 99.4%; Diagnostic Accuracy = 94.8%

Test Parameters: Example

                          Gold Standard
                    Present (D+)  Absent (D-)  Total
Test Positive (T+)       90          995       1085
Test Negative (T-)       10        18905      18915
Total                   100        19900      20000

Prevalence = 0.5%; PPV = 8.3%; NPV = 99.9%; Diagnostic Accuracy = 95%

Test Parameters: Example

                          Gold Standard
                    Present (D+)  Absent (D-)  Total
Test Positive (T+)       90         9995      10085
Test Negative (T-)       10       189905     189915
Total                   100       199900     200000

Prevalence = 0.05%; PPV = 0.9%; NPV = 100%; Diagnostic Accuracy = 95%

PPV & NPV with Prevalence

Prevalence             50%     5%    0.5%   0.05%
Sensitivity            90%    90%     90%     90%
Specificity            95%    95%     95%     95%
PPV                  94.7%  48.6%    8.3%    0.9%
NPV                  90.5%  99.4%   99.9%    100%
Diagnostic Accuracy  92.5%  94.8%     95%     95%
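The prevalence dependence in this table follows directly from Bayes' theorem, so the predictive values can be computed from sensitivity, specificity and prevalence alone. A small illustrative sketch (the function name is mine):

```python
# Predictive values from Bayes' theorem; the function name is illustrative.

def ppv_npv(sensitivity, specificity, prevalence):
    """PPV and NPV for a test applied at a given disease prevalence."""
    p, q = prevalence, 1 - prevalence
    ppv = sensitivity * p / (sensitivity * p + (1 - specificity) * q)
    npv = specificity * q / (specificity * q + (1 - sensitivity) * p)
    return ppv, npv

# With sensitivity 90% and specificity 95%, PPV collapses as prevalence falls:
for prev in (0.5, 0.05, 0.005, 0.0005):
    ppv, npv = ppv_npv(0.9, 0.95, prev)
    print(f"prevalence {prev:>7.2%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

This reproduces the table column by column without building each 2 X 2 table explicitly.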

Trade-offs between Sensitivity and Specificity Ideally we would have a test that is both highly sensitive and highly specific. However, this is usually not possible: there is a trade-off between sensitivity and specificity whenever we change the decision threshold for a test. Raising the threshold makes the test more specific but less sensitive; lowering it does the opposite.

Sensitivity and Specificity solve the wrong problem! When we use a diagnostic test clinically, we do not know who actually has and who does not have the target disorder; if we did, we would not need the diagnostic test. Our clinical concern is not the vertical one of sensitivity and specificity, but the horizontal one of the meaning of positive and negative test results. BE-Workshop-DT-July2007

When a clinician uses a test, which questions are important? If I obtain a positive test result, what is the probability that this person actually has the disease? If I obtain a negative test result, what is the probability that the person does not have the disease?

Test parameters

                          Gold Standard
                    Present (D+)  Absent (D-)  Total
Test Positive (T+)       a             b        a+b
Test Negative (T-)       c             d        c+d
Total                   a+c           b+d        N

Sensitivity = Pr(T+|D+) = a/(a+c)
Specificity = Pr(T-|D-) = d/(b+d)
PPV = Pr(D+|T+) = a/(a+b)
NPV = Pr(D-|T-) = d/(c+d)

Likelihood Ratios A likelihood ratio is a ratio of two probabilities. Likelihood ratios state how many times more (or less) likely a particular test result is observed in patients with the disease than in those without the disease. LR+ tells how much the odds of the disease increase when a test is positive; LR- tells how much the odds of the disease decrease when a test is negative.


Likelihood Ratios The LR for a positive test is defined as:
LR(+) = Prob(T+|D) / Prob(T+|ND)
LR(+) = [TP/(TP+FN)] / [FP/(FP+TN)]
LR(+) = Sensitivity / (1 - Specificity)

Likelihood Ratios The LR for a negative test is defined as:
LR(-) = Prob(T-|D) / Prob(T-|ND)
LR(-) = [FN/(TP+FN)] / [TN/(FP+TN)]
LR(-) = (1 - Sensitivity) / Specificity
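Both ratios depend only on sensitivity and specificity, which is why likelihood ratios, unlike predictive values, do not change with prevalence. A minimal sketch of the two definitions (the function name is mine):

```python
# The two likelihood-ratio definitions in code; the name is illustrative.

def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    lr_pos = sensitivity / (1 - specificity)   # Pr(T+|D) / Pr(T+|ND)
    lr_neg = (1 - sensitivity) / specificity   # Pr(T-|D) / Pr(T-|ND)
    return lr_pos, lr_neg
```

For the running example (sensitivity 90%, specificity 95%) this gives LR+ = 18 and LR- of about 0.105.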

What is a good Likelihood Ratio? An LR(+) greater than 10 or an LR(-) less than 0.1 provides convincing diagnostic evidence. An LR(+) greater than 5 or an LR(-) less than 0.2 is considered to give strong diagnostic evidence.

Likelihood Ratio: Example

                          Gold Standard
                    Present (D+)  Absent (D-)  Total
Test Positive (T+)       90            5         95
Test Negative (T-)       10           95        105
Total                   100          100        200

Likelihood ratio for a positive test = (90/100) / (5/100) = 90/5 = 18
Likelihood ratio for a negative test = (10/100) / (95/100) = 10/95 = 0.11

Exercise In a hypothetical example of a diagnostic test, serum levels of a biochemical marker of a particular disease were compared with the known diagnosis of the disease. A marker level of 100 international units or greater was taken as an arbitrary positive test result:

Example

                Disease Status
              Present  Absent  Total
Marker >=100    431      30     461
Marker <100      29     116     145
Total           460     146     606
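As a sketch of this exercise, the sensitivity and specificity at the 100 IU cut-off can be read straight off the table above (variable names are mine):

```python
# Reading the marker exercise off the table; variable names are mine.
tp, fp = 431, 30    # marker >= 100 IU
fn, tn = 29, 116    # marker < 100 IU

sensitivity = tp / (tp + fn)                    # 431/460, about 93.7%
specificity = tn / (fp + tn)                    # 116/146, about 79.5%
lr_positive = sensitivity / (1 - specificity)   # about 4.6
```

So this marker is very sensitive but only moderately specific, and a positive result raises the odds of disease roughly four- to five-fold.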

Exercise Initial creatine phosphokinase (CK) levels were related to the subsequent diagnosis of acute myocardial infarction (MI) in a group of patients with suspected MI. Four ranges of CK result were chosen for the study:

Exercise

                Disease Status
CK (IU)       Present  Absent  Total
>=280            97       1      98
80-279          118      15     133
40-79            13      26      39
1-39              2      88      90
Total           230     130     360
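With more than two result categories, a level-specific likelihood ratio can be computed for each CK stratum as P(level | MI) / P(level | no MI). A sketch of that calculation for the table above (the dictionary layout is mine; the counts are from the table):

```python
# Level-specific likelihood ratios, LR(level) = P(level|MI) / P(level|no MI);
# the dictionary layout is mine, the counts are from the table above.
strata = {">=280": (97, 1), "80-279": (118, 15), "40-79": (13, 26), "1-39": (2, 88)}

n_mi = sum(mi for mi, _ in strata.values())      # 230 patients with MI
n_no_mi = sum(no for _, no in strata.values())   # 130 patients without MI

lr = {level: (mi / n_mi) / (no / n_no_mi) for level, (mi, no) in strata.items()}
```

A very high CK (LR of roughly 55) is strong evidence for MI and a very low CK (LR near 0.01) is strong evidence against it, while the middle strata are far less informative.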

Odds and Probability

                Disease Status
              Present  Absent  Total
                 a        b     a+b

Probability of disease = (number with disease) / (number with and without disease) = a/(a+b)
Odds of disease = (number with disease) / (number without disease) = a/b
Probability = Odds/(Odds+1); Odds = Probability/(1-Probability)
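The two conversions above can be written as a pair of one-line helpers (names are illustrative):

```python
# One-line converters between probability and odds; names are illustrative.

def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(odds):
    return odds / (odds + 1)
```

For example, a probability of 0.2 corresponds to odds of 0.25 (1 to 4), and converting back recovers 0.2.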

Use of Likelihood Ratio Use the following three-step procedure: 1. Identify the pre-test probability and convert it to pre-test odds. 2. Determine the post-test odds using the formula: Post-test odds = Pre-test odds x Likelihood ratio. 3. Convert the post-test odds into the post-test probability.

Likelihood Ratio: Example A 52-year-old woman presents after detecting a 1.5 cm breast lump on self-examination. On clinical examination, the lump is not freely movable. If the pre-test probability is 20% and the LR for a non-movable breast lump is 4, calculate the probability that this woman has breast cancer.

Likelihood Ratio: Solution
First step: Pre-test probability = 0.2; Pre-test odds = Pre-test probability / (1 - Pre-test probability) = 0.2/0.8 = 0.25
Second step: Post-test odds = Pre-test odds x LR = 0.25 x 4 = 1
Third step: Post-test probability = Post-test odds / (1 + Post-test odds) = 1/(1+1) = 0.5
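The three steps of the solution can be wrapped into a single function; a sketch, with an illustrative name, that reproduces the 0.5 answer:

```python
# The three-step procedure as one function; the name is illustrative.

def post_test_probability(pre_test_prob, lr):
    pre_odds = pre_test_prob / (1 - pre_test_prob)   # step 1: probability -> odds
    post_odds = pre_odds * lr                        # step 2: multiply by the LR
    return post_odds / (1 + post_odds)               # step 3: odds -> probability
```

Calling post_test_probability(0.2, 4) returns 0.5, matching the worked solution.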

Receiver Operating Characteristic (ROC) Finding the best test; finding the best cut-off; finding the best combination. (The slide shows a scale of test results running from 'probably negative' through 'equivocal' and 'probably positive' to 'definitely positive'.) Another way to express the relationship between sensitivity and specificity for a given test is to construct a curve, called the Receiver Operating Characteristic (ROC) curve.

ROC curve constructed from multiple test thresholds

Receiver Operating Characteristic (ROC) ROC Curve allows comparison of different tests for the same condition without (before) specifying a cut-off point. The test with the largest AUC (Area under the curve) is the best.
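As an illustration of how an ROC curve and its AUC arise from multiple thresholds, the following sketch uses the CK strata from the earlier exercise: the cumulative sensitivity and false-positive rate at each successively lower cut-off give the curve's points, and the trapezoidal rule gives the AUC. The code layout is mine; only the counts come from the exercise.

```python
# An illustrative ROC construction from the multi-level CK data given earlier;
# the code layout is mine. Each successively lower cut-off adds a point
# (false-positive rate, sensitivity); the trapezoidal rule then gives the AUC.
mi_counts = [97, 118, 13, 2]     # MI patients per CK stratum, highest CK first
no_mi_counts = [1, 15, 26, 88]   # non-MI patients per stratum

total_mi, total_no_mi = sum(mi_counts), sum(no_mi_counts)
points = [(0.0, 0.0)]
tp = fp = 0
for mi, no in zip(mi_counts, no_mi_counts):
    tp += mi
    fp += no
    points.append((fp / total_no_mi, tp / total_mi))  # (FPR, sensitivity)

auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

For these data the curve ends at (1, 1) and the AUC comes out just under 0.95, i.e. CK discriminates MI from non-MI well.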

Features of a good diagnosis study It is comparative (it compares the new test against an old test). There should be a "gold standard". It should include both positive and negative results. It usually involves blinding of the patient, the tester, and the investigator.


Typical setting for finding Sensitivity and Specificity It is best if everyone who gets the new test also gets the "gold standard", but this doesn't happen in the real world. Often there is not even a sample of each (a case-control type design), just a case series of patients who happened to have both tests.

Setting for finding Sensitivity and Specificity Sensitivity should not be tested only in the "sickest of the sick"; the study should include the full spectrum of disease. Specificity should not be tested only in the "healthiest of the healthy"; the study should include similar conditions.

Precision How precise are the estimates of sensitivity, specificity, false positive rate, false negative rate, positive predictive value and negative predictive value? If they are reported without a measure of precision, clinicians cannot know the range within which the true values of these indices are likely to lie. When evaluations of diagnostic accuracy are reported, the precision of the test characteristics should be stated.

Sample size for adequate sensitivity

Sample size for adequate specificity

Exercise Dr Egbert Everard wants to test a new blood test (Sithtastic) for the diagnosis of the dark side gene. He wants the test to have a sensitivity of at least 70% and a specificity of at least 90%, estimated with 95% confidence. Disease prevalence in this population is 10%. (i) How many patients does Egbert need to be 95% sure his test is more than 70% sensitive? (ii) How many patients does Egbert need to be 95% sure that his test is more than 90% specific?
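The exercise does not state a precision margin, so as a hedged sketch I assume a margin d around the expected sensitivity/specificity and use the commonly cited Buderer-style formulas, in which the required total sample size is inflated by the prevalence (for sensitivity) or by its complement (for specificity). Function names and the chosen margin are mine.

```python
from math import ceil

# Hedged sketch: Buderer-style sample-size formulas, assuming a precision
# margin d that the exercise itself does not state. Names are illustrative.

def n_for_sensitivity(sens, d, prevalence, z=1.96):
    # Patients needed in total so that TP+FN suffices for the desired margin.
    return ceil(z**2 * sens * (1 - sens) / (d**2 * prevalence))

def n_for_specificity(spec, d, prevalence, z=1.96):
    # Patients needed in total so that FP+TN suffices for the desired margin.
    return ceil(z**2 * spec * (1 - spec) / (d**2 * (1 - prevalence)))
```

With the exercise's figures and an assumed margin of 0.1, these give roughly 807 patients for sensitivity and 39 for specificity; the low prevalence is what drives the first number up, and the answers change substantially with the chosen margin.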

Biases in Research on Diagnostic Tests Observer bias; spectrum bias; reference test bias; bias index; work-up (verification) bias; diagnostic suspicion bias.

Observer bias and Blinding Investigators should be blinded to the test results when interpreting the reference test, and blinded to the reference test results when interpreting the test. Should they also be blinded to other patient characteristics?

Spectrum bias Indeterminate results dropped from analysis

Reference Test Bias What if the "gold standard" is not gold after all? In the absence of a gold standard, methods to deal with the problem include: correcting for reference test bias (Gart & Buck); Bayesian estimation (Joseph, Gyorkos, Coupal); latent class modeling (Walter, Cook, Irwig).

BIAS INDEX What if the test itself commits certain types of errors more commonly than others? BI = (b - c)/N

Work-up (Verification) Bias This occurs when a test efficacy study is restricted to patients in whom the disease status is known. A study by Borow et al (Am Heart J, 1983) on patients who were referred for valve surgery on the basis of echocardiographic assessment reported excellent diagnostic agreement between the findings at echocardiography and at surgery.

Review Bias The test and the gold standard should follow a randomized sequence of administration. This tends to offset the diagnostic suspicion bias that may creep in when the gold standard is always applied and interpreted last. It will also balance any effect of time on rapidly increasing severity of disease, and thereby avoid a bias towards more positives in whichever test is performed later.

Ethical Issues in Diagnostic Test Research Invasive techniques Labeling Confidentiality Human subjects

QUALITIES OF STUDIES EVALUATING DIAGNOSTIC TESTS Reid MC et al. Use of methodological standards in diagnostic test research: getting better but still not good. JAMA 1995; 274: 645. Review of studies published between 1990 and 1993: Work-up bias: 38% of studies. Observer bias (blinding): 53% of studies. Bias from indeterminate results: 62% of studies. No assessment of variability across test observers, test instruments, or time: 68% of studies.

QUALITIES OF STUDIES EVALUATING DIAGNOSTIC TESTS Small sample size, with no description of confidence intervals: 76% of studies. Patient characteristics not described: 68% of studies. Possible interactions or effect modification ignored: 88% of studies. Only two (6%) of 34 articles published from 1990 to 1993 (N Engl J Med, JAMA, Lancet, BMJ) met six or more of the standards.

USERS' GUIDES TO THE MEDICAL LITERATURE How to use an article about a diagnostic test? Are the results of the study valid? What are the results, and will they help me in caring for my patients?

Methodological Questions for Appraising Journal Articles about Diagnostic Tests 1. Was there an independent, "blind" comparison with a "gold standard" of diagnosis? 2. Was the setting for the study, as well as the filter through which the study patients passed, adequately described? 3. Did the patient sample include an appropriate spectrum of disease? 4. Was an analysis of the pertinent subgroups done? 5. Were the tactics for carrying out the test described in sufficient detail to permit their exact replication?

6. Was the reproducibility of the test result (precision) and its interpretation (observer variation) determined? 7. Was the term "normal" defined sensibly? 8. Was the precision of the test statistics given? 9. Were indeterminate test results presented? 10. If the test is advocated as part of a cluster or sequence of tests, was its contribution to the overall validity of the cluster or sequence determined? 11. Was the "utility" of the test determined?

Thank you