Introduction to Diagnostic Test Accuracy. First Conference of the National Society of Medical Methodology and Statistics (Societatea Nationala de Metodologie si Statistica Medicala). Professor Ario Santini MD,BDS,DDS,PhD,FDS,FFGDP,DGDP(UK),DipFMed,FADM.


1 Introduction to Diagnostic Test Accuracy. Professor Ario Santini, The University of Medicine & Pharmacy, Tg-Mures, Romania; Hon. Fellow, The University of Edinburgh

2 Learning objectives
• To define what is meant by test accuracy
• To understand the basic primary study design for evaluating test accuracy
• To understand the meaning of Sensitivity, Specificity, Positive Predictive Value and Negative Predictive Value, and how to evaluate them numerically

3 A DIAGNOSTIC TEST ACCURACY STUDY compares a NEW TEST for diagnosing a condition with a GOLD STANDARD
[Diagram: a test population is classified as diseased or not diseased by the gold standard and as positive or negative by the new test; the results are compared blind (validation study).]
A good test will correctly identify patients with the condition (true positives) while MINIMISING the number of patients without the condition who also test positive (false positives). Similarly, it will correctly identify patients who do not have the condition (true negatives) and minimise the number of patients given negative results when they do have the condition (false negatives).
Screening tests look for the early signs of a disease in asymptomatic patients so that the disease can be treated before it reaches an advanced stage. The acceptability of false-positive and false-negative results depends in part on the seriousness of the condition and its treatment. A false-positive result causes unnecessary anxiety for the patient and can lead to expensive, unpleasant or dangerous treatments that are not indicated. A false-negative result, on the other hand, can lull a patient into a false sense of security, so that other symptoms and signs of disease might be ignored.

4 Introduction to Diagnostic Research
1. Terms used
2. EXPRESSING TEST ACCURACY NUMERICALLY

5 Index test, target condition and reference standard
• INDEX TEST: the test under evaluation, i.e. the test whose accuracy we want to know
• TARGET CONDITION: the condition we are trying to detect. It may be a pathologically defined condition (e.g. fracture) or simply an indication for treatment (e.g. high blood pressure)
• REFERENCE STANDARD: the best available way of identifying the target condition, against which the results of the index test are compared (i.e. used to verify the index test results)

6 What is test accuracy?
TEST ACCURACY is a comparison between the disease state (target condition) estimated by a test of interest (the index test) and the best estimate of the true disease state (the reference standard). It is an explicit recognition that most tests make errors even when correctly performed.

7 Notes on reference standards
• The accuracy of an INDEX TEST cannot be evaluated without a REFERENCE STANDARD.
• There should be consensus that the REFERENCE STANDARD is more accurate than the INDEX TEST [at least at the commencement of a study].
• There may be more than one acceptable REFERENCE STANDARD that would be appropriate for use in a test accuracy study.
• The REFERENCE STANDARD may comprise several pieces of information (several tests).
• A degree of pragmatism may be required when choosing an acceptable REFERENCE STANDARD: the most accurate REFERENCE STANDARD may not be feasible or ethical, and less accurate methods may have to be used.

8 Information about test accuracy can be useful in the following situations
• Which patients could develop the disease? [Predisposition]
• Which patients have asymptomatic disease? [Screening]
• Which patients have symptomatic disease? [Diagnosis]
• How advanced is the disease?
• Will the disease progress over time? [Prognosis]
• Is a drug effective?
• Is the disease controlled? [Monitoring]
• Has the disease recurred? [Relapse]

9 Introduction to Diagnostic Research
1. Methodology
• Did the patient sample include an appropriate spectrum of patients to whom the test will be applied?
• Was the REFERENCE STANDARD applied regardless of the INDEX TEST result?
• Was there an independent and blind comparison between the REFERENCE STANDARD and the INDEX TEST?
Studies should enrol either all eligible patients suspected of having the target condition during a specified period, or a random sample of those patients. The essential point is that investigators should have no freedom of choice as to which individual patients are or are not included. There is evidence that studies comparing patients with known disease with a control group without the condition tend to exaggerate diagnostic accuracy. Inappropriate exclusions may result in either overestimates (e.g. by excluding 'difficult to diagnose' patients) or underestimates (e.g. by excluding patients with 'red flags' suggesting presence of disease) of the degree of diagnostic accuracy. Patients included in the study should match the target population of the guideline in terms of severity of the target condition, demographic features, presence of differential diagnosis or co-morbidity, setting of the study and previous testing protocols.
This is similar to the question of 'blinding' in intervention studies. The index test should always be done first, or by a separate investigator with no knowledge of the outcome of the reference test. Bias can be introduced if a threshold level is set after the data have been collected; any minimum threshold should be specified at the start of the trial. Variations in test technology, execution or interpretation (e.g. use of a higher ultrasound transducer frequency) may affect estimates of diagnostic accuracy. Estimates of test accuracy are based on the assumption that the reference standard is 100% sensitive (i.e. that it accurately diagnoses the target condition).
This is similar to question 2.1, but in this case relates to making sure the reference standard is applied without any prior knowledge of the outcome of previous tests. The definition of the target condition used when testing the reference standard may differ from that used by the NHS in Scotland; e.g. threshold levels used in laboratory cultures may differ. The index test and reference standard should be performed as close together in time as possible; otherwise changes in the patient's condition are likely to invalidate the results. In some cases the choice of reference standard may be influenced by the outcome of the index test or by the urgency of the need for diagnosis. Use of different reference standards is likely to lead to overestimates of both sensitivity and specificity. Not including all patients in the analysis may lead to bias, as there may be some systematic difference between those lost to follow-up and those analysed.
Rate the overall methodological quality of the study, using the following as a guide:
• High quality (++): Majority of criteria met. Little or no risk of bias. Results unlikely to be changed by further research.
• Acceptable (+): Most criteria met. Some flaws in the study with an associated risk of bias. Conclusions may change in the light of further studies.
• Low quality (0): Either most criteria not met, or significant flaws relating to key aspects of study design. Conclusions likely to change in the light of further studies.

10 Introduction to Diagnostic Research
2. Applicability
• Are the intended patients similar to the TARGET population?
• Is it possible to integrate the INDEX TEST into a clinical setting?
• Who will conduct the INDEX TEST in a clinical setting, and who will interpret the results?
• Is the test affordable?

11 Basic study design to assess test accuracy
Series of patients → Index test (TEST UNDER EVALUATION) → Reference standard (BEST CURRENT WAY TO DETERMINE PRESENCE OR ABSENCE OF DISEASE) → calculate test accuracy
Ideally there should be BLIND and independent verification of INDEX TEST results with a REFERENCE STANDARD.

12 Introduction to Diagnostic Research
Results: EXPRESSING TEST ACCURACY NUMERICALLY
• Sensitivity
• Specificity
• Positive predictive value
• Negative predictive value
• Likelihood ratios
• Pre-test probability and odds
• Post-test probability and odds
• Receiver operating characteristic (ROC) curve

13 Characteristics of a test 2 x2 table
Each subject must take TWO diagnostic tests: 1. the REFERENCE TEST [the gold standard] and 2. the INDEX TEST [the new test] (validation study). The results of the comparison of a diagnostic test with a gold-standard test need to be tabulated in a 2 x 2 table (also known as a 2 x 2 matrix). If the values for the various features of the test (such as sensitivity and specificity) fall within reasonable limits, we would be able to say that the test is valid.

14 2 x2 table [binary classification]
The four possible outcomes of cross-classification are represented in the diagnostic 2 x 2 contingency table:

                  Reference test results (disease status)
                  Disease present         Disease absent
Index test +      True positives (a)      False positives (b)
Index test -      False negatives (c)     True negatives (d)

FALSE POSITIVE is an error in data reporting in which a test result incorrectly indicates the presence of a condition, such as a disease (the result is positive), when in reality the condition is absent.
FALSE NEGATIVE is an error in which a test result incorrectly indicates the absence of a condition (the result is negative), when in reality it is present.

15 TEST CHARACTERISTICS
• Sensitivity (true-positive rate)
• Specificity (true-negative rate)
• Positive predictive value (PPV)
• Negative predictive value (NPV)
• Likelihood ratio for a positive test result (LR+)
• Likelihood ratio for a negative test result (LR-)
• Accuracy of a test

16 [Table: index test results (+/-) cross-classified against reference test results (disease present/absent): true positives, false positives, false negatives, true negatives]
There are a number of words and phrases used to describe the characteristics of a diagnostic test. Each of these values should be calculated.

17 Sensitivity and specificity
Sensitivity (true-positive rate): the proportion of subjects with the disorder (by the reference test, i.e. the gold standard) who have a positive result on the index test (the new test) = a/(a + c)
Specificity (true-negative rate): the proportion of subjects who do not have the disorder (by the reference test) and who have a negative index test result = d/(b + d)
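The two formulas above translate directly into code. The following is a minimal Python sketch, assuming the cell labels used on these slides (a = true positives, b = false positives, c = false negatives, d = true negatives); the function names and the example counts are illustrative only.

```python
def sensitivity(a: int, c: int) -> float:
    """True-positive rate: diseased subjects (by reference test) who are index-test positive."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """True-negative rate: non-diseased subjects (by reference test) who are index-test negative."""
    return d / (b + d)


# Hypothetical counts: 90 true positives, 20 false positives, 10 false negatives, 180 true negatives
print(sensitivity(a=90, c=10))   # 0.9
print(specificity(b=20, d=180))  # 0.9
```

Note that sensitivity uses only the diseased column (a, c) and specificity only the non-diseased column (b, d) of the 2 x 2 table.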

18 Positive and negative predictive values
Positive predictive value (PPV): the proportion of subjects with a positive test result who do have the disorder = a/(a + b)
Negative predictive value (NPV): the proportion of subjects with a negative test result who do not have the disorder = d/(c + d)
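Unlike sensitivity and specificity, the predictive values are calculated along the rows of the 2 x 2 table (all index-test positives, all index-test negatives). A minimal sketch, again assuming the a, b, c, d cell labels; function names and counts are hypothetical:

```python
def positive_predictive_value(a: int, b: int) -> float:
    """Proportion of index-test positives who truly have the disorder: a / (a + b)."""
    return a / (a + b)


def negative_predictive_value(c: int, d: int) -> float:
    """Proportion of index-test negatives who truly do not have the disorder: d / (c + d)."""
    return d / (c + d)


# Hypothetical counts: a=90, b=20, c=10, d=180
print(round(positive_predictive_value(90, 20), 3))   # 0.818
print(round(negative_predictive_value(10, 180), 3))  # 0.947
```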

19 Likelihood ratios (LR+ and LR-)
Likelihood ratio for a positive test result (LR+): how much more likely is a positive test result to be found in a person with, as opposed to without, the condition? LR+ = sensitivity/(1 - specificity)
Likelihood ratio for a negative test result (LR-): how much more likely is a negative test result to be found in a person with, as opposed to without, the condition? LR- = (1 - sensitivity)/specificity
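Both likelihood ratios compare the probability of a given result in people with the condition to the probability of the same result in people without it, so LR+ = sensitivity/(1 - specificity) and LR- = (1 - sensitivity)/specificity. A minimal sketch with an illustrative function name; it assumes sensitivity and specificity are given as proportions strictly between 0 and 1 (a specificity of exactly 1 would make LR+ undefined and is not handled here).

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions (0-1)."""
    lr_pos = sens / (1 - spec)   # P(test + | disease) / P(test + | no disease)
    lr_neg = (1 - sens) / spec   # P(test - | disease) / P(test - | no disease)
    return lr_pos, lr_neg


# Hypothetical test: sensitivity 0.9, specificity 0.8
lr_pos, lr_neg = likelihood_ratios(sens=0.9, spec=0.8)
print(round(lr_pos, 2), round(lr_neg, 3))  # 4.5 0.125
```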

20 Accuracy of a test
The proportion of subjects given the correct results = (a + d)/(a + b + c + d)
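One point worth checking numerically: for a rare disorder, overall accuracy is dominated by the true negatives (d), so it can look high even when the test misses most cases. The sketch below (illustrative function name, hypothetical counts) scores a useless "test" that labels every subject negative in a group of 30 diseased and 970 non-diseased subjects.

```python
def accuracy(a: int, b: int, c: int, d: int) -> float:
    """Proportion of all subjects classified correctly: (a + d) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)


# Hypothetical: everyone called negative, so a=0, b=0, c=30, d=970
print(accuracy(a=0, b=0, c=30, d=970))  # 0.97
```

The accuracy is 97%, yet the sensitivity of this "test" is 0, which is why accuracy should always be read alongside sensitivity and specificity.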

21 UNDERSTANDING THE RESULTS

22 UNDERSTANDING THE RESULTS
Sensitivity, specificity and predictive values can be confusing. The following statements clarify the difference between sensitivity and positive predictive value:
• Sensitivity: if a patient has the disorder, what is the chance of getting a positive result on the INDEX TEST?
• Positive predictive value: if a patient has a positive result on the INDEX TEST, what is the chance that they do have the disorder?

23 UNDERSTANDING THE RESULTS
The following statements clarify the difference between specificity and negative predictive value:
• Specificity: if a patient does not have the disorder, what is the chance of getting a negative result on the INDEX TEST?
• Negative predictive value: if a patient has a negative result on the INDEX TEST, what is the chance that they do not have the disorder?

24 UNDERSTANDING THE RESULTS
• SpPin: when a highly specific test is used, a positive test result tends to rule in the disorder.
• SnNout: when a highly sensitive test is used, a negative test result tends to rule out the disorder.
• Sensitivity and specificity are not affected by changes in the prevalence of the disorder.
Prevalence: the proportion of individuals in a population having a disease or characteristic. Prevalence refers to the number of cases of a disease that are present in a particular population at a given time, whereas incidence refers to the number of new cases that develop in a given period of time.

25 Positive and negative predictive values: what are they?
PPV = positive predictive value: the proportion of those who test positive with the INDEX TEST who really have the disease.
NPV = negative predictive value: the proportion of those who test negative with the INDEX TEST who really do not have the disease.

26 UNDERSTANDING THE RESULTS
Predictive values depend on the PREVALENCE of the disorder. As the prevalence of a disorder in the population goes up:
• positive predictive value will increase;
• negative predictive value will decrease.
To decide whether predictive values are applicable to a particular population, it is necessary to know where the diagnostic study was conducted: the results are only applicable when the prevalence of the disorder is the same as in the study population.
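The effect of prevalence can be checked by deriving PPV and NPV from sensitivity, specificity and prevalence with Bayes' theorem: the expected proportions of true and false results in the population are computed first, then combined exactly as in the 2 x 2 table. A minimal sketch, assuming a hypothetical test whose sensitivity and specificity are both 0.9; the function name is illustrative.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied in a population with the given prevalence.

    Expected proportions of the whole population falling in each cell:
    TP = sens * p, FP = (1 - spec) * (1 - p), FN = (1 - sens) * p, TN = spec * (1 - p).
    """
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


for p in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.9, 0.9, p)
    print(f"prevalence {p:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
# prevalence 0.01: PPV 0.083, NPV 0.999
# prevalence 0.10: PPV 0.500, NPV 0.988
# prevalence 0.50: PPV 0.900, NPV 0.900
```

Sensitivity and specificity are unchanged across the three rows; only the predictive values move.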

27 UNDERSTANDING THE RESULTS
Likelihood ratios are often more useful than predictive values. Likelihood ratios are calculated from sensitivity and specificity, so they remain constant even when the PREVALENCE of the disorder changes [cf. predictive values]. Likelihood ratios show how many times more likely patients with a disorder are to have a particular test result than patients without the disorder.
• The likelihood ratio for a positive test result should be as high as possible above 1: positive results are desirable in patients with the disorder.
• The likelihood ratio for a negative test result should be as low as possible below 1: negative test results are undesirable in patients with the disorder.

28 UNDERSTANDING THE RESULTS
Likelihood ratio (Fagan) nomogram: the likelihood ratio nomogram enables the post-test probability to be read off graphically if the pre-test probability and the likelihood ratio are known. If a straight line is drawn through the pre-test probability of disease (left-hand scale) and the likelihood ratio (middle scale) and extended to the right, it intersects the right-hand scale at the post-test probability of disease.
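The nomogram is a graphical shortcut for a simple calculation: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert the result back to a probability. A minimal sketch of that numerical equivalent; the function name and the example values are hypothetical.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Numerical equivalent of the Fagan nomogram (Bayes' theorem in odds form)."""
    pre_test_odds = pre_test_prob / (1 - pre_test_prob)   # probability -> odds
    post_test_odds = pre_test_odds * likelihood_ratio      # odds x likelihood ratio
    return post_test_odds / (1 + post_test_odds)           # odds -> probability


# Hypothetical: pre-test probability 20%, positive result on a test with LR+ = 4
print(round(post_test_probability(0.20, 4), 3))  # 0.5
```

An LR of exactly 1 leaves the probability unchanged, which is why useful likelihood ratios lie well above or well below 1.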

29 UNDERSTANDING THE RESULTS
ROC curve: a receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true-positive rate (TPR) against the false-positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity (or recall in machine learning). The false-positive rate is also known as the fall-out and can be calculated as (1 - specificity). The ROC curve is thus sensitivity plotted as a function of fall-out. In general, if the probability distributions of the test result in diseased and non-diseased subjects are known, the ROC curve can be generated by plotting, for each threshold, the probability of detection (the area under the diseased subjects' distribution beyond the threshold) on the y-axis against the probability of a false alarm (the corresponding area for non-diseased subjects) on the x-axis.
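A minimal sketch of how ROC points are obtained from continuous test results, assuming that higher scores indicate disease, that a score at or above the threshold counts as positive, and that both diseased and non-diseased subjects are present. The scores and disease labels are hypothetical, and the area under the curve (computed here by the trapezoidal rule) is a common summary of the ROC curve that is not discussed on the slide.

```python
def roc_points(scores: list[float], diseased: list[bool]) -> list[tuple[float, float]]:
    """Return (FPR, TPR) pairs, one per distinct threshold, plus the (0, 0) starting point."""
    n_pos = sum(diseased)
    n_neg = len(diseased) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nobody is test-positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, diseased) if s >= t and d)
        fp = sum(1 for s, d in zip(scores, diseased) if s >= t and not d)
        points.append((fp / n_neg, tp / n_pos))  # (1 - specificity, sensitivity)
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule (points ordered by increasing FPR)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
diseased = [True, True, False, True, False, False, True, False]
points = roc_points(scores, diseased)
print(points[-1])              # (1.0, 1.0)
print(auc_trapezoid(points))   # 0.75
```

Lowering the threshold moves each point up (more true positives) and to the right (more false positives), tracing the trade-off between sensitivity and specificity.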

30 Validation of an Index Test against a Reference Standard
Validation of a Diagnostic [Index] test

31 Test accuracy Sensitivity, Specificity
                  Disease status by reference test [GOLD STANDARD]
                  Present        Absent         Total
Index test +      TP             FP             TP + FP
Index test -      FN             TN             FN + TN
Total             TP + FN        FP + TN        TP + FP + FN + TN

Sensitivity = TP/(TP + FN)     Specificity = TN/(FP + TN)
The row totals (TP + FP and FN + TN) are used when calculating the positive and negative predictive values.
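In practice the table is filled by cross-classifying each subject's pair of results. A minimal sketch, assuming each subject's index test and reference standard results are recorded as booleans (True = positive / disease present); the function name and the six subjects are hypothetical.

```python
from collections import Counter


def two_by_two(index_positive: list[bool], reference_positive: list[bool]) -> dict[str, int]:
    """Cross-classify each subject's index test result against the reference standard."""
    if len(index_positive) != len(reference_positive):
        raise ValueError("each subject needs both an index test and a reference test result")
    labels = {
        (True, True): "TP",    # index +, disease present
        (True, False): "FP",   # index +, disease absent
        (False, True): "FN",   # index -, disease present
        (False, False): "TN",  # index -, disease absent
    }
    counts = Counter(labels[pair] for pair in zip(index_positive, reference_positive))
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}  # missing cells count as 0


index = [True, True, False, False, True, False]
reference = [True, False, True, False, True, False]
table = two_by_two(index, reference)
print(table)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
print("Sensitivity:", table["TP"] / (table["TP"] + table["FN"]))
print("Specificity:", table["TN"] / (table["FP"] + table["TN"]))
```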

32 Test accuracy Positive Predictive Value & Negative Predictive Value
PPV = TP/(TP + FP)     NPV = TN/(FN + TN)
PPV: what proportion of those who test positive with the index test really have the disease?
NPV: what proportion of those who test negative with the index test really do not have the disease?
The nearer the PPV or NPV is to 1 (100%), the better the test.

33 Relationship between Sensitivity, Specificity, PPV and NPV and Test errors
• As sensitivity increases, NPV increases and false-negative test errors decrease.
• As specificity increases, PPV increases and false-positive test errors decrease.
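The relationship on this slide holds when the other quantities are kept fixed. The sketch below assumes a hypothetical cohort of 100 diseased and 900 non-diseased subjects with specificity held at 0.90, and shows expected false negatives falling and NPV rising as sensitivity increases; the function name is illustrative. The same reasoning applies to specificity, false positives and PPV.

```python
def negatives_summary(sens: float, spec: float, n_diseased: int, n_healthy: int) -> tuple[float, float]:
    """Return (expected false negatives, NPV) for a cohort of fixed composition."""
    fn = (1 - sens) * n_diseased   # diseased subjects missed by the index test
    tn = spec * n_healthy          # non-diseased subjects correctly testing negative
    return fn, tn / (tn + fn)


for sens in (0.70, 0.80, 0.90, 0.99):
    fn, npv = negatives_summary(sens, 0.90, n_diseased=100, n_healthy=900)
    print(f"sensitivity {sens:.2f}: false negatives {fn:.0f}, NPV {npv:.3f}")
```

Under these assumptions, false negatives fall from 30 to 1 while the NPV rises from about 0.964 to about 0.999.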

34 Relationship of test accuracy to patient outcomes from testing
• 'High' sensitivity and 'high' negative predictive value: FEWER FALSE NEGATIVES
• 'High' specificity and 'high' positive predictive value: FEWER FALSE POSITIVES

35 The consequences of Negative and Positive Index test results
In most testing situations ONE OR OTHER of false-negative and false-positive test errors is more important. Should we maximise sensitivity/NPV and so minimise false-negative test errors, OR maximise specificity/PPV and so minimise false-positive test errors?

36 Test Errors: CT scan in acute appendicitis
                  Reference test
                  Disease present    Disease absent
Index test +      TP                 FP
Index test -      FN                 TN

Let us consider the consequences of test errors. According to the American College of Radiology, computed tomography is the most accurate imaging study for evaluating suspected acute appendicitis and alternative aetiologies of right lower quadrant pain.
• FP: CT +ve but NO appendicitis present: unnecessary surgery with its associated risks.
• FN: CT -ve but appendicitis present: delayed diagnosis, with potential for adverse outcomes such as a ruptured appendix, abscess formation or peritonitis.

37 SUMMARY
• Test accuracy is a comparison between the disease state estimated by a test of interest (the INDEX TEST) and the best estimate of the true disease state provided by the REFERENCE STANDARD.
• Interpretation of numerical test accuracy summary metrics requires consideration of the number and consequences of test errors.
• To decide which dimension of test accuracy is more important in a testing situation, the consequences of a positive and of a negative INDEX TEST result need to be considered.

38 Results of a Validation Study
An Exercise you can complete

39 Results of a Validation Study
In this diagram, numbers have been added. Remember the previous slide.

                  Disease present    Disease absent    Total
Index test +      6 (a)              7 (b)             13
Index test -      21 (c)             966 (d)           987
Total             27                 973               1000

40 Results of a Validation Study
Use the data to calculate the following features of the INDEX TEST:
• Sensitivity = a/(a + c) = 6/27 = 22.2%
• Specificity = d/(b + d) = 966/973 = 99.3%
• Positive predictive value = a/(a + b) = 6/13 = 46.2%
• Negative predictive value = d/(c + d) = 966/987 = 97.9%
• Accuracy = (a + d)/(a + b + c + d) = 972/1000 = 97.2%
• Likelihood ratio of a positive test = sensitivity/(1 - specificity) = 22.2/0.7 ≈ 32 (about 30.9 if the unrounded values are used)
• Likelihood ratio of a negative test = (1 - sensitivity)/specificity = 77.8/99.3 = 0.78
I have retained these slides for your future use. The PDF is available and will help you calculate test characteristics in the future.
The urine glucose test is only 22% sensitive, which means that the test misses nearly four-fifths of true diabetics. In the presence of classic symptoms and a family history, the window-cleaner's baseline odds (pre-test likelihood) of having the condition are pretty high, and they are only reduced to about four-fifths of this (the likelihood ratio of a negative test, 0.78; see section 7.4) after a single negative urine test. In view of his symptoms, this man clearly needs to undergo a more definitive test for diabetes. Note that, as the definitions in Table 7.3 show, if the test had been positive the window-cleaner would have good reason to be concerned, since even though the test is not very sensitive (i.e. it is not good at picking up people with the disease), it is pretty specific (i.e. it is good at excluding people without the disease).
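The exercise can be checked with a few lines of Python. This sketch simply reuses the counts from the validation-study table (a = 6, b = 7, c = 21, d = 966) and the formulas on the slide. Because it works with unrounded values, LR+ comes out at about 30.9 rather than the 32 obtained from the rounded percentages 22.2/0.7; the other figures agree with the slide.

```python
# Counts from the validation-study exercise (index test vs reference standard)
a, b, c, d = 6, 7, 21, 966  # TP, FP, FN, TN

sens = a / (a + c)               # 6/27
spec = d / (b + d)               # 966/973
ppv = a / (a + b)                # 6/13
npv = d / (c + d)                # 966/987
acc = (a + d) / (a + b + c + d)  # 972/1000
lr_pos = sens / (1 - spec)
lr_neg = (1 - sens) / spec

for name, value in [("Sensitivity", sens), ("Specificity", spec), ("PPV", ppv),
                    ("NPV", npv), ("Accuracy", acc)]:
    print(f"{name}: {value:.1%}")
print(f"LR+: {lr_pos:.1f}")
print(f"LR-: {lr_neg:.2f}")
# Sensitivity: 22.2%
# Specificity: 99.3%
# PPV: 46.2%
# NPV: 97.9%
# Accuracy: 97.2%
# LR+: 30.9
# LR-: 0.78
```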

41 Useful sites: Cochrane Systematic Reviews of Diagnostic Test Accuracy (DTA) are published in the Cochrane Database of Systematic Reviews. A complete list of published DTA protocols and reviews is available at dta.cochrane.org/

42 Useful sites https://www.nice.org.uk/ NICE guidelines.
Diagnostics Assessment Programme. The aims of the Programme are:
• to promote the rapid and consistent adoption of innovative clinically and cost-effective diagnostic technologies in the NHS;
• to improve treatment choice or the length and quality of life, by evaluating diagnostic technologies that have the potential to improve key clinical decisions;
• to improve the efficient use of NHS resources, by evaluating diagnostic technologies that have the potential to improve systems and processes for the delivery of health and social care.

43 Useful sites http://www.stard-statement.org/ STARD statement
The objective of the STARD initiative (STAndards for the Reporting of Diagnostic Accuracy studies) is to improve the accuracy and completeness of reporting of studies of diagnostic accuracy. The STARD statement consists of a checklist of 25 items.

44 If you have been, thanks for listening
01/11/2015

