DIAGNOSTIC TESTS Assist. Prof. E. Çiğdem Kaspar Yeditepe University Faculty of Medicine Department of Biostatistics and Medical Informatics.

Why do we need a diagnostic test? We need information to make a decision, and that information is usually the result of a test. Medical tests serve to screen for a risk factor (screening test), to diagnose a disease (diagnostic test), or to estimate a patient's prognosis (prognostic test). When, and in whom, should a test be done? When the information from the test result has value.

Value of a diagnostic test. The ideal diagnostic test always gives the right answer: a positive result in everyone with the disease and a negative result in everyone else. It is also quick, safe, simple, painless, reliable, and inexpensive. But few, if any, tests are ideal; thus there is a need for clinically useful substitutes.

Is the test useful? Consider its reproducibility (precision), accuracy (compared with a "gold standard"), feasibility, effects on clinical decisions, and effects on outcomes.

Determining Usefulness of a Medical Test
Question 1: How reproducible is the test?
Possible designs: studies of intra- and inter-observer and intra- and inter-laboratory variability.
Statistics for results: proportion agreement, kappa, coefficient of variation, mean and distribution of differences (avoid the correlation coefficient).

Determining Usefulness of a Medical Test
Question 2: How accurate is the test?
Possible designs: cross-sectional, case-control, or cohort-type designs in which the test result is compared with a "gold standard".
Statistics for results: sensitivity, specificity, PV+, PV-, ROC curves, likelihood ratios.

Determining Usefulness of a Medical Test
Question 3: How often do test results affect clinical decisions?
Possible designs: diagnostic yield studies; studies of pre- and post-test clinical decision making.
Statistics for results: proportion abnormal; proportion with discordant results; proportion of tests leading to changes in clinical decisions; cost per abnormal result or per decision change.

Determining Usefulness of a Medical Test
Question 4: What are the costs, risks, and acceptability of the test?
Possible designs: prospective or retrospective studies.
Statistics for results: mean cost; proportions experiencing adverse effects; proportions willing to undergo the test.

Determining Usefulness of a Medical Test
Question 5: Does doing the test improve clinical outcomes, or does it have adverse effects?
Possible designs: randomized trials, cohort, or case-control studies in which the predictor variable is receiving the test and the outcome includes morbidity, mortality, or costs related either to the disease or to its treatment.
Statistics for results: risk ratios, odds ratios, hazard ratios, number needed to treat, rates and ratios of desirable and undesirable outcomes.

Common Issues for Studies of Medical Tests. Spectrum of disease severity and test results: how does the sample differ from the population? Almost all tests do well on the very sick and the very well; the greatest difficulty is distinguishing the healthy from those with early, presymptomatic disease. Subjects should have a spectrum of disease that reflects the clinical use of the test.

Common Issues for Studies of Medical Tests. Sources of variation: between patients, observers' skill, and equipment. Studies should therefore sample several different institutions to obtain a generalizable result.

Common Issues for Studies of Medical Tests. Importance of blinding (when possible): it minimizes observer bias. Example: ultrasound to diagnose appendicitis, where the blinded study setting differs from clinical practice.

Studies of the Accuracy of Tests. Does the test give the right answer? "Tests" in clinical practice include symptoms, signs, laboratory tests, and imaging tests. To find the right answer, a "gold standard" is required.

How accurate is the test? Validating tests against a gold standard: new tests should be validated by comparison against an established gold standard in an appropriate group of subjects. Diagnostic tests are seldom 100% accurate (false positives and false negatives will occur).

Describing the performance of a new diagnostic test. Physicians are often faced with the task of evaluating the merit of a new diagnostic test. An adequate critical appraisal of a new test requires a working knowledge of the properties of diagnostic tests and the mathematical relationships between them.

The gold standard test: Assessing a new diagnostic test begins with the identification of a group of patients known to have the disorder of interest, using an accepted reference test known as the gold standard. Limitations: 1) The gold standard is often the most risky, technically difficult, expensive, or impractical of available diagnostic options. 2) For some conditions, no gold standard is available.

The basic idea of diagnostic test interpretation is to calculate the probability that a patient has the disease under consideration given a certain test result. A 2 by 2 table can be used for this purpose. Be sure to label the table with the test results on the left side and the disease status on top, as shown here:

Test        Disease Present     Disease Absent
Positive    True Positive       False Positive
Negative    False Negative      True Negative

The sensitivity of a diagnostic test is the probability that a diseased individual will have a positive test result. Sensitivity is the true positive rate (TPR) of the test.
Sensitivity = P(T+|D+) = TPR = TP / (TP + FN)

The specificity of a diagnostic test is the probability that a disease-free individual will have a negative test result. Specificity is the true negative rate (TNR) of the test.
Specificity = P(T-|D-) = TNR = TN / (TN + FP)

False-positive rate: the likelihood that a nondiseased patient has an abnormal test result.
FPR = P(T+|D-) = FP / (FP + TN)

False-negative rate: the likelihood that a diseased patient has a normal test result.
FNR = P(T-|D+) = FN / (FN + TP)

Pretest probability is the estimated likelihood of disease before the test is done. It is the same as the prior probability and is often estimated. If a defined population of patients is being evaluated, the pretest probability equals the prevalence of disease in the population: the proportion of all patients who have the disease.
P(D+) = (TP + FN) / (TP + FP + TN + FN)

Predictive value. Sensitivity and specificity describe how well the test discriminates between patients with and without disease. However, they address a different question than the one we want answered when evaluating a patient. What we usually want to know is: given a certain test result, what is the probability of disease? This is the predictive value of the test.

Predictive value of a positive test is the proportion of patients with positive tests who have disease.
PVP = P(D+|T+) = TP / (TP + FP)
This is the same as the posttest probability of disease given a positive test. It measures how well the test rules in disease.

Predictive value of a negative test is the proportion of patients with negative tests who do not have disease. In probability notation:
PVN = P(D-|T-) = TN / (TN + FN)
It measures how well the test rules out disease. This is the posttest probability of non-disease given a negative test.

Evaluating a 2 by 2 table is simple if you are methodical in your approach.

Test        Disease Present       Disease Absent
Positive    TP                    FP                       Total positive
Negative    FN                    TN                       Total negative
            Total with disease    Total without disease    Grand total

Bayes' Rule Method. Bayes' rule is a mathematical formula that may be used as an alternative to the back-calculation method for obtaining unknown conditional probabilities such as PVP or PVN from known conditional probabilities such as sensitivity and specificity. The general form of Bayes' rule is
P(D+|T+) = P(T+|D+) P(D+) / [P(T+|D+) P(D+) + P(T+|D-) P(D-)]
Using Bayes' rule, PVP and PVN are defined as
PVP = (Sensitivity × Prevalence) / [Sensitivity × Prevalence + (1 - Specificity) × (1 - Prevalence)]
PVN = [Specificity × (1 - Prevalence)] / [Specificity × (1 - Prevalence) + (1 - Sensitivity) × Prevalence]
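The same relationships can be written as a short sketch; the function names are ours, but the formulas are simply Bayes' rule applied to sensitivity, specificity, and prevalence:

```python
def pvp_bayes(sens, spec, prev):
    """P(D+|T+) by Bayes' rule from sensitivity, specificity, and prevalence."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def pvn_bayes(sens, spec, prev):
    """P(D-|T-) by Bayes' rule."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# With the values from the DST example that follows (sens = 84/215,
# spec = 148/153, prev = 215/368), these reproduce the back-calculated
# PVP = 84/89 and PVN = 148/279.
print(pvp_bayes(84/215, 148/153, 215/368))
```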

Example: The following table summarizes the results of a study evaluating the dexamethasone suppression test (DST) as a diagnostic test for major depression. The study compared results on the DST to those obtained using the gold standard procedure (routine psychiatric assessment and structured interview) in 368 psychiatric patients.
1. What is the prevalence of major depression in the study group?
2. For the DST, determine:
   a. Sensitivity and specificity
   b. False positive rate (FPR) and false negative rate (FNR)
   c. Predictive value positive (PVP) and predictive value negative (PVN)

DST Result   Depression Present   Depression Absent   Total
Positive              84                    5            89
Negative             131                  148           279
Total                215                  153           368

Sensitivity = P(T+|D+) = TPR = TP/(TP+FN) = 84/215 = 0.391
Specificity = P(T-|D-) = TNR = TN/(TN+FP) = 148/153 = 0.967
FPR = P(T+|D-) = FP/(FP+TN) = 5/153 = 0.033
FNR = P(T-|D+) = FN/(FN+TP) = 131/215 = 0.609
PVP = P(D+|T+) = TP/(TP+FP) = 84/89 = 0.944
PVN = P(D-|T-) = TN/(TN+FN) = 148/279 = 0.530

FNR = 1 - Sensitivity = 1 - 0.391 = 0.609
FPR = 1 - Specificity = 1 - 0.967 = 0.033
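A quick numerical check of the DST example and the complement identities above (a minimal Python sketch):

```python
# DST example cell counts: TP=84, FP=5, FN=131, TN=148
tp, fp, fn, tn = 84, 5, 131, 148

sens = tp / (tp + fn)   # 84/215  ~ 0.391
spec = tn / (tn + fp)   # 148/153 ~ 0.967
fnr  = fn / (fn + tp)   # 131/215 ~ 0.609
fpr  = fp / (fp + tn)   # 5/153   ~ 0.033

# The complement identities hold exactly:
assert abs(fnr - (1 - sens)) < 1e-12
assert abs(fpr - (1 - spec)) < 1e-12
print(round(sens, 3), round(spec, 3))  # 0.391 0.967
```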

Validating tests against a gold standard. A test is valid if: it detects most people with the disorder (high sensitivity); it excludes most people without the disorder (high specificity); and a positive test usually indicates that the disorder is present (high PV+). The best measure of the usefulness of a test is the likelihood ratio (LR): how much more likely a positive test is to be found in someone with, as opposed to without, the disorder.

ROC (Receiver Operating Characteristic) Curve. We want to be able to compare the accuracy of diagnostic tests. Sensitivity and specificity are candidate measures of accuracy, but they have some problems, as we'll see. ROC curves are an alternative measure: we plot sensitivity against 1 - specificity to create the ROC curve for a test.

ROC (Receiver Operating Characteristic) Curve. The ROC curve is a graphic representation of the relationship between sensitivity and specificity for a diagnostic test. It provides a simple tool for applying the predictive value method to the choice of a positivity criterion. The ROC curve is constructed by plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) for several choices of the positivity criterion.

Plotting the ROC curve is a popular way of displaying the discriminatory accuracy of a diagnostic test for detecting whether or not a patient has a disease or condition. ROC methodology is derived from signal detection theory [1], where it is used to determine whether an electronic receiver can satisfactorily distinguish between signal and noise. It has been used in medical imaging and radiology, psychiatry, non-destructive testing, and manufacturing inspection systems.

Specific Example: consider the distribution of a continuous test result in patients with the disease and in patients without the disease.

[Figure slides: two overlapping distributions of the test result, one for patients without the disease and one for patients with the disease. A positivity threshold splits each distribution: patients above the threshold are called "positive," those below are called "negative." Diseased patients above the threshold are true positives; nondiseased patients above it are false positives; nondiseased patients below it are true negatives; diseased patients below it are false negatives. Moving the threshold right trades false positives for false negatives; moving it left does the reverse.]

Plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) for every possible threshold traces out the ROC curve.
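The threshold-sweep construction of the ROC curve can be sketched as follows (the score lists are hypothetical; "positive" here means a test value at or above the threshold):

```python
def roc_points(diseased, nondiseased, thresholds):
    """(FPR, TPR) pairs as the positivity threshold sweeps across the data."""
    pts = []
    for t in thresholds:
        tpr = sum(x >= t for x in diseased) / len(diseased)        # sensitivity
        fpr = sum(x >= t for x in nondiseased) / len(nondiseased)  # 1 - specificity
        pts.append((fpr, tpr))
    return pts

# Hypothetical test results:
points = roc_points(diseased=[3, 5, 6, 7, 9],
                    nondiseased=[1, 2, 3, 4, 6],
                    thresholds=[0, 4, 8, 10])
print(points)  # runs from (1.0, 1.0) at a very low threshold down to (0.0, 0.0)
```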

RECEIVER OPERATING CHARACTERISTIC (ROC) curve. Example: SGPT and hepatitis. [Table residue: SGPT cutoff values cross-tabulated against disease status (D+, D-), with sensitivity and 1 - specificity computed at each cutoff; the numerical entries were lost in transcription.]

ROC curve comparison: a good test has a curve that bows toward the upper-left corner (high true positive rate at a low false positive rate); a poor test lies close to the diagonal.

ROC curve extremes: for the best possible test the two distributions do not overlap at all and the curve passes through the upper-left corner; for the worst test the distributions overlap completely and the curve falls on the diagonal.

'Classical' estimation. Binormal model: X ~ N(0, 1) in the nondiseased population and X ~ N(a, 1/b) in the diseased population. Then ROC(t) = Φ(a + b·Φ⁻¹(t)) for 0 < t < 1. Estimate a and b by maximum likelihood using readings from sets of diseased and nondiseased patients.
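A minimal sketch of the binormal ROC function using only the standard library (the parameter values below are illustrative, not maximum-likelihood estimates):

```python
from statistics import NormalDist

def binormal_roc(t, a, b):
    """ROC(t) = Phi(a + b * Phi^-1(t)) under the binormal model."""
    nd = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^-1
    return nd.cdf(a + b * nd.inv_cdf(t))

# a = 0, b = 1 gives the chance line ROC(t) = t; larger a lifts the curve.
print(binormal_roc(0.5, a=1.0, b=1.0))  # Phi(1) ~ 0.841
```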

ROC curve estimation with continuous data. Many biochemical measurements are in fact continuous, e.g. blood glucose vs. diabetes. ROC analysis can also be done for continuous (rather than binary or ordinal) data. Estimate the ROC curve (and smooth it) based on the empirical 'survivor' function (1 - CDF) in the diseased and nondiseased groups. One can also do regression modeling of the test result. Another approach is to model the ROC curve directly as a function of covariates.

The most commonly used global index of diagnostic accuracy is the area under the ROC curve (AUC).

Area under the ROC curve (AUC). An overall measure of test performance. Comparisons between two tests are based on differences between (estimated) AUCs. For continuous data, the AUC is equivalent to the Mann-Whitney U-statistic (a nonparametric test of the difference in location between two populations).
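The Mann-Whitney equivalence has a direct interpretation: the AUC is the probability that a randomly chosen diseased patient scores higher than a randomly chosen nondiseased one, counting ties as half. A sketch with hypothetical scores:

```python
def auc_mann_whitney(diseased, nondiseased):
    """AUC as the normalized Mann-Whitney U: the fraction of (diseased,
    nondiseased) pairs where the diseased score is higher, ties counted 1/2."""
    wins = sum(1.0 if d > n else 0.5 if d == n else 0.0
               for d in diseased for n in nondiseased)
    return wins / (len(diseased) * len(nondiseased))

# Hypothetical score lists:
print(auc_mann_whitney([3, 5, 6, 7, 9], [1, 2, 3, 4, 6]))  # 0.84
```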

AUC for ROC curves: [figure residue showing four example ROC curves with AUC = 50% (the chance diagonal), 65%, 90%, and 100% (a perfect test)].

Examples using ROC analysis  Threshold selection for ‘tuning’ an already trained classifier (e.g. neural nets)  Defining signal thresholds in DNA microarrays (Bilban et al.)  Comparing test statistics for identifying differentially expressed genes in replicated microarray data (Lönnstedt and Speed)  Assessing performance of different protein prediction algorithms (Tang et al.)  Inferring protein homology (Karwath and King)

Homology Induction ROC

Example: One of the parameters evaluated for the diagnosis of CHD is the HDL/total cholesterol ratio. Consider a population consisting of 67 patients with CHD and 93 patients without CHD. The HDL/total cholesterol values of these two groups are as follows (lists truncated in the original):
CHD+: 0.29, 0.26, 0.39, 0.16, ...
CHD-: 0.25, 0.36, 0.30, 0.20, ...

To construct the ROC curve, we must find the sensitivity and specificity at each cutoff point. There are two ways to obtain these characteristics: cross tables, or the normal curve.

Descriptive statistics for HDL/total cholesterol:
Group   Mean     SD      Min    Max
CHD-    0.2926   0.066   0.16   0.52
CHD+    0.2301   0.048   0.06   0.34

If the HDL/total cholesterol ratio is less than or equal to 0.26, we classify the patient as diseased; the sensitivity and specificity at this cutoff follow from the resulting 2 by 2 table.

Best cutoff point. Let the cutoff be 0.171. Usually, the best cutoff point is where the ROC curve "turns the corner."
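"Turning the corner" is often formalized as maximizing Youden's index, J = sensitivity + specificity - 1 (this specific criterion is our addition, not stated on the slide). A sketch with hypothetical data, where, as in the HDL example, values at or below the cutoff are called positive:

```python
def best_cutoff_youden(diseased, nondiseased, candidates):
    """Candidate cutoff maximizing J = sensitivity + specificity - 1,
    with 'positive' meaning value <= cutoff (low ratios indicate disease)."""
    def j(c):
        sens = sum(x <= c for x in diseased) / len(diseased)
        spec = sum(x > c for x in nondiseased) / len(nondiseased)
        return sens + spec - 1
    return max(candidates, key=j)

# Hypothetical ratio values:
best = best_cutoff_youden(diseased=[0.10, 0.15, 0.20, 0.30],
                          nondiseased=[0.25, 0.30, 0.35, 0.40],
                          candidates=[0.15, 0.22, 0.32])
print(best)  # 0.22
```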

ROC Curve [figure: sensitivity plotted against 1 - specificity, both axes from 0.0 to 1.0; the axis label "1 - Seçicilik" is Turkish for "1 - specificity"]. At cutoff = 0.26: TPR = 0.78, FPR = 0.31, TNR = 0.69, FNR = 0.22.