HSS4303B – Intro to Epidemiology Feb 8, 2010 - Agreement.


HSS4303B – Intro to Epidemiology Feb 8, 2010 – Agreement

Answers from Thursday's Homework

CT result    Cancer present    Cancer absent
Positive     56                532
Negative     4                 (not shown)

– Prevalence of cancer: 0.44%
– Sensitivity and specificity: 93.3% and 96.1%
– % of false positives: 532/(56 + 532) = 90.5%
– % of false negatives: 4/(4 + true negatives), effectively 0%
– PV+ and PV−: 9.5% and 100%

Last Time… Screening Tests
– Validity and Reliability
– Specificity and Sensitivity
– Positive Predictive Value and Negative Predictive Value

Screening test results             Truly diseased (cases)    Truly non-diseased    Totals
Positive (thinks it's a case)      a                         b                     a+b
Negative (thinks it's not a case)  c                         d                     c+d
Totals                             a+c                       b+d                   a+b+c+d

Sensitivity = a/(a+c)
Specificity = d/(b+d)
PV+ = a/(a+b)
PV− = d/(c+d)
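
Since these four formulas drive everything that follows, here is a minimal Python sketch of them (the function name and output layout are illustrative, not from the lecture):

```python
def screening_indicators(a, b, c, d):
    """Validity indicators from a 2x2 screening table:
    a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),  # P(test + | truly diseased)
        "specificity": d / (b + d),  # P(test - | truly non-diseased)
        "PV+": a / (a + b),          # P(diseased | test +)
        "PV-": d / (c + d),          # P(non-diseased | test -)
    }

# Mammography example used later in the lecture: a=14, b=8, c=1, d=91
print(screening_indicators(14, 8, 1, 91))
# ~ sensitivity 0.93, specificity 0.92, PV+ 0.64, PV- 0.99
```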

Ultimately, What Do All These Indicators Want To Tell Us?
"What is the likelihood that you have the disease?"

Likelihood Ratio
– A way of using the sensitivity and specificity of a test to see whether a positive or negative result usefully changes the probability of having the disease
– Assesses the value of performing the screening test at all
– Who is this useful for?

Likelihood Ratio
LR+ (positive likelihood ratio)
– The probability of a positive test result for a person who really has the disease, divided by the probability of a positive test result for someone who doesn't really have the disease
– i.e. P(true positive) / P(false positive) = sensitivity / (1 − specificity)

Likelihood Ratio
LR− (negative likelihood ratio)
– The probability of a negative test result for a person who really has the disease, divided by the probability of a negative test result for someone who doesn't really have the disease
– i.e. P(false negative) / P(true negative) = (1 − sensitivity) / specificity

Screening test results             Truly diseased (cases)    Truly non-diseased     Totals
Positive (thinks it's a case)      a (true positives)        b (false positives)    a+b
Negative (thinks it's not a case)  c (false negatives)       d (true negatives)     c+d
Totals                             a+c                       b+d                    a+b+c+d

Sensitivity = a/(a+c); Specificity = d/(b+d); PV+ = a/(a+b); PV− = d/(c+d)

LR+ = P(true +ve) / P(false +ve)
    = (a/(a+c)) / (b/(b+d))
    = (a/(a+c)) / (1 − (d/(b+d)))
    = sensitivity / (1 − specificity)
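
The same derivation, as a small sketch (names are mine; the formulas are the ones on the slide):

```python
def likelihood_ratios(sensitivity, specificity):
    """LR+ and LR- of a test, from its sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)  # P(+ | diseased) / P(+ | non-diseased)
    lr_neg = (1 - sensitivity) / specificity  # P(- | diseased) / P(- | non-diseased)
    return lr_pos, lr_neg

# The FNA example below: sensitivity 93%, specificity 92%
lr_pos, lr_neg = likelihood_ratios(0.93, 0.92)
print(round(lr_pos, 1), round(lr_neg, 2))  # 11.6 0.08
```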

Interpreting the LR
– A likelihood ratio >1 indicates the test result is associated with presence of the disease
– A likelihood ratio <1 indicates the result is associated with absence of the disease
In other words:
– A high LR+ means strong suspicion that a +ve test result means the person has the disease
– A low LR− means strong suspicion that a −ve test result means the person doesn't have the disease
– What about a ratio of exactly 1?

Interpreting the LR
Arbitrary cutoffs:
– LR+ >10 means strong diagnostic value
– LR− <0.1 means strong diagnostic value
– (Some literature suggests 5 and 0.2 as more appropriate cutoffs)
The likelihood ratio, which combines information from sensitivity and specificity, indicates how much the odds of disease change based on a positive or a negative result.

LR+
– The smallest possible value of LR+ is zero, which occurs when sensitivity is zero.
– The maximum possible value of LR+ is infinity, which occurs when the denominator is minimized (specificity = 1, so 1 − specificity = 0).
– LR+ = 1 indicates a test with no value in sorting out persons with and without the disease of interest, since a positive test result is equally likely for affected and unaffected persons.

LR−
– The smallest value of LR− occurs when the numerator is minimized (sensitivity = 1, so 1 − sensitivity = 0), giving an LR− of zero.
– The largest value of LR− occurs when the denominator is minimized (specificity = 0), giving an LR− of positive infinity.
– LR− = 1 indicates a test with no value in sorting out persons with and without the disease of interest, since a negative test result is equally likely for affected and unaffected persons.

FNA test (fine needle aspiration)
A 2×2 table of FNA results (+ve/−ve) vs. cancer status (the cell counts were not preserved in the transcript). Compute:
– Prevalence
– Sensitivity
– Specificity
– PV+
– PV−

FNA test (fine needle aspiration)
– Prevalence = 38%
– Sensitivity = 93%
– Specificity = 92%
– PV+ = 88%
– PV− = 96%
LR+ = sensitivity / (1 − specificity) = 0.93 / (1 − 0.92) = 11.6, so the FNA test has high diagnostic value

Probability of presence of disease
– Pretest probability of disease: the likelihood that a person has the disease of interest before the test is performed.
– Pretest odds of disease: the estimate, before diagnostic testing, of the probability that a patient has the disease of interest divided by the probability that the patient does not.
– Posttest odds of disease: the estimate, after diagnostic testing, of the probability that a patient has the disease of interest divided by the probability that the patient does not.
– Posttest probability of disease: the likelihood that a person has the disease of interest after the test is performed.

Pretest probability and pretest odds

                      Cancer               No cancer
Mammography positive  14 (true positives)  8 (false positives)
Mammography negative  1 (false negative)   91 (true negatives)

Pretest probability = ?
Pretest odds = pretest probability / (1 − pretest probability) = ?

Pretest probability and pretest odds

                      Cancer               No cancer
Mammography positive  14 (true positives)  8 (false positives)
Mammography negative  1 (false negative)   91 (true negatives)

Pretest probability = 15/114 = 0.13
Pretest odds = pretest probability / (1 − pretest probability) = 0.13/0.87 = 0.15
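
The same two numbers, computed straight from the table (a quick sketch; variable names are mine):

```python
cases = 14 + 1      # cancer column: true positives + false negatives
non_cases = 8 + 91  # no-cancer column: false positives + true negatives

pretest_prob = cases / (cases + non_cases)        # 15/114 = 0.13
pretest_odds = pretest_prob / (1 - pretest_prob)  # 0.13/0.87 = 0.15
print(round(pretest_prob, 2), round(pretest_odds, 2))  # 0.13 0.15
```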

What does this have to do with LR?
LR = posttest odds / pretest odds
So now we can compute the odds of having the disease after applying the test and computing the LR.

Pretest probability and pretest odds

                      Cancer               No cancer
Mammography positive  14 (true positives)  8 (false positives)
Mammography negative  1 (false negative)   91 (true negatives)

Pretest odds = 0.15
Sensitivity = 93%
Specificity = 92%
Compute LR+ and LR−:
LR+ = 0.93/0.08 = 11.6
LR− = 0.07/0.92 = 0.08

So…
Knowing the pretest odds and the LR+, what are the posttest odds (i.e., the odds of having the disease after a positive test result)?
Posttest odds = LR+ × pretest odds = 11.6 × 0.15 = 1.74
NB: the textbook (p. 99) multiplies by 0.15 and gets 1.76, which is wrong.

And then….
Can you now compute the posttest probability?
– (Do you remember the difference between probability and odds?)
Posttest probability = posttest odds / (1 + posttest odds) = 1.74 / 2.74 = 0.64
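
The whole chain for the mammography example, as one sketch (rounded the same way as the slides):

```python
lr_pos = 11.6        # from sensitivity 0.93, specificity 0.92
pretest_odds = 0.15  # from pretest probability 0.13

posttest_odds = lr_pos * pretest_odds                # 11.6 x 0.15 = 1.74
posttest_prob = posttest_odds / (1 + posttest_odds)  # 1.74/2.74 = 0.64
print(round(posttest_odds, 2), round(posttest_prob, 2))  # 1.74 0.64
```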

LR vs PV
– Positive predictive value is the proportion of patients with positive test results who are correctly diagnosed.
– The likelihood ratio indicates the value of the test for increasing certainty about a positive diagnosis.
– It relates the pretest odds of having the disease to the posttest odds: LR+ = posttest odds / pretest odds

LR vs PV
– Remember that PV varies with the prevalence of the disease
– LR is independent of prevalence
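
This is easy to demonstrate: hold sensitivity and specificity fixed, vary the prevalence, and watch PV+ move while LR+ stays put (a sketch using the 93%/92% test from the mammography example):

```python
sens, spec = 0.93, 0.92
for prev in (0.01, 0.13, 0.50):
    # expected cell proportions per person screened
    a = sens * prev              # true positives
    b = (1 - spec) * (1 - prev)  # false positives
    pv_pos = a / (a + b)
    lr_pos = sens / (1 - spec)   # never touches prevalence
    print(f"prevalence {prev:.2f}: PV+ = {pv_pos:.2f}, LR+ = {lr_pos:.1f}")
# prevalence 0.01: PV+ = 0.11, LR+ = 11.6
# prevalence 0.13: PV+ = 0.63, LR+ = 11.6
# prevalence 0.50: PV+ = 0.92, LR+ = 11.6
```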

                      Cancer               No cancer
Mammography positive  14 (true positives)  8 (false positives)
Mammography negative  1 (false negative)   91 (true negatives)

Pretest odds = 0.15
Sensitivity = 93%
Specificity = 92%
LR+ = 11.6
LR− = 0.08
Posttest odds = 1.74
Posttest probability = 64%
The same can be done with LR−, but in general we don't bother.

Performance Yield

                           True Disease Status
                           +        −         Totals
Results of       +         400      995       1,395
Screening Test   −         100      98,505    98,605
                 Totals    500      99,500    100,000

(With 99,500 non-diseased persons and specificity of 99%, there are 995 false positives and 98,505 true negatives.)

Sensitivity: a/(a + c) = 400/(400 + 100) = 80%
Specificity: d/(b + d) = 98,505/(995 + 98,505) = 99%
PV+: a/(a + b) = 400/(400 + 995) = 29%
PV−: d/(c + d) = 98,505/(100 + 98,505) = 99%
Prevalence: (a + c)/(a + b + c + d) = 500/100,000 = 0.5%
LR+ = sens/(1 − spec) = 0.8/(1 − 0.99) = 80
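
Inlining the same formulas over this table reproduces the slide's numbers (a sketch; cell values as in the table above):

```python
a, b, c, d = 400, 995, 100, 98_505  # TP, FP, FN, TN from the table
print(f"sensitivity {a/(a+c):.0%}, specificity {d/(b+d):.0%}")  # 80%, 99%
print(f"PV+ {a/(a+b):.0%}, PV- {d/(c+d):.1%}")                  # 29%, 99.9%
print(f"prevalence {(a+c)/(a+b+c+d):.1%}")                      # 0.5%
print(f"LR+ {(a/(a+c)) / (1 - d/(b+d)):.0f}")                   # 80
```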

Comparing LR and PV
For the same screening table: PV+ = 29%, LR+ = 80
– Among persons who screen positive, 29% are found to have the disease.
– A positive test result increases your odds of having the disease 80-fold.

Homework #1
Greenberg p. 105, question 1-13:
– Japanese patients underwent CT scans to detect first signs of cancer, then had pathology tests 2 years later to confirm whether or not they actually had cancer

CT result    Cancer present    Cancer absent
Positive     56                532
Negative     4                 (not shown)

Compute:
1. LR+
2. LR−
3. Pre-test probability of cancer
4. Pre-test odds of cancer
5. Post-test odds of cancer
6. Post-test probability of cancer
(Answers are in the notes section of this slide)

What if you have a continuous variable?
– What kind of variable is "cancer vs no cancer"?
– What is a continuous diagnostic variable? Examples:
  – Body temperature
  – Blood pressure
  – Height
  – Weight
  – etc.

Receiver Operating Characteristic (ROC) Curve
(figure: overlapping "signal" and "noise" score distributions)

Useful for comparing two diagnostic tests. The greater the area under the curve, the better the signal-to-noise ratio and the better the test.
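
A minimal sketch of how an ROC curve is traced for a continuous marker: sweep a cutoff across the "signal" (diseased) and "noise" (non-diseased) score distributions, record (1 − specificity, sensitivity) at each cutoff, and take the area under the resulting curve. The simulated data and all names here are illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
diseased = rng.normal(2.0, 1.0, 500)  # "signal": scores of truly diseased
healthy = rng.normal(0.0, 1.0, 500)   # "noise": scores of truly non-diseased

# Sweep the cutoff from high to low; the test is positive when score >= cutoff.
cutoffs = np.sort(np.concatenate([diseased, healthy]))[::-1]
tpr = [(diseased >= c).mean() for c in cutoffs]  # sensitivity at each cutoff
fpr = [(healthy >= c).mean() for c in cutoffs]   # 1 - specificity at each cutoff

# Area under the curve by the trapezoid rule; 0.5 would mean pure noise.
auc = sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
          for i in range(1, len(fpr)))
print(f"AUC = {auc:.2f}")  # ~0.92 for these two simulated distributions
```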

Agreement
See the article on the website called "Kappa.pdf"

Remember Reliability?
– The extent to which the screening test will produce the same or very similar results each time it is administered.
– Inter-rater reliability is "the variation in measurements when taken by different persons but with the same method or instruments"
– Also called CONCORDANCE

Inter-rater Reliability
Is a measurement of Agreement
– A score of how much consensus there is among judges, observers, technicians, or any number of people who are using the same instrument(s) to measure the same data.
Examples:
– Judges scoring a beauty pageant contestant from 1-10
– Several psychologists using a PTSD scale to assess a patient
– Different devices measuring body temperature simultaneously on the same patient

How Do We Measure Agreement?
Lots of statistics available to us:
– Inter-rater correlation coefficient
– Intra-class correlation coefficient
– Concordance correlation coefficient
– Fleiss's kappa
– Cohen's kappa

Kappa (κ)
– Cohen: two raters
– Fleiss: an adaptation of Cohen's, applicable to multiple raters
Kappa is generally thought to be a more robust measure than a simple percent-agreement calculation, since κ takes into account agreement occurring by chance.

Cohen’s Kappa Cohen the Barbarian

Cohen’s Kappa Κ = {Pr(a) – Pr(e)} / {1-Pr(e)} Pr(a) = relative observed agreement Pr(e) = prob that agreement is due to chance Results in a ratio from 0 to 1

Two Judges Decide Whether Or Not 75 Beauty Pageant Contestants Are Hot
– Judge #1 = Hasselhoff
– Judge #2 = Shatner

The Data

                   Shatner: Yes    Shatner: No    Totals
Hasselhoff: Yes    41              3              44
Hasselhoff: No     4               27             31
Totals             45              30             75

(Each cell counts contestants: e.g., 41 were rated "hot" by both judges.)

The Data
Pr(a) = relative observed agreement = (41 + 27)/75 = 90.7%

The Data
Pr(a) = relative observed agreement = (41 + 27)/75 = 90.7%
Pr(e) = probability that the agreement is due to chance
      = ((44 × 45) + (31 × 30)) / 75² = (1,980 + 930)/5,625 = 51.7%
(multiply the marginal totals and divide by the grand total squared)

Compute Kappa
κ = (Pr(a) − Pr(e)) / (1 − Pr(e))
  = (0.907 − 0.517) / (1 − 0.517)
  = 0.81
How do we interpret this?
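
Verifying the hand calculation step by step (a sketch over the judges' table above):

```python
n11, n10, n01, n00 = 41, 3, 4, 27  # both-yes, H-yes/S-no, H-no/S-yes, both-no
n = n11 + n10 + n01 + n00                            # 75
pr_a = (n11 + n00) / n                               # 68/75 = 0.907
pr_e = ((n11 + n10) * (n11 + n01)
        + (n01 + n00) * (n10 + n00)) / n**2          # 2910/5625 = 0.517
print(round((pr_a - pr_e) / (1 - pr_e), 2))          # kappa = 0.81
```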

Interpreting Kappa
Hasselhoff and Shatner are in almost perfect agreement over who is hot and who is not. (On the commonly cited Landis and Koch benchmarks, a κ of 0.81–1.00 counts as "almost perfect" agreement.)

What if….?
– There are >2 raters?
– There are >2 categories?
  – E.g., "ugly, meh, hmm, pretty hot, very hot, smokin'"
  – E.g., "don't like, somewhat like, like"
Then it is possible to apply kappa, but only to determine complete agreement. So?
– Dichotomize the variables
– Use a weighted kappa

Homework #2
Compute Cohen's kappa in both cases and interpret. (The answers are in the notes section of this slide; the two data tables were not preserved in the transcript.)

So When/Why Use Screening Tests?

Basis for criteria: Effect of morbidity and mortality on the population
– Morbidity or mortality of the disease must be a sufficient concern to public health.
– A high-risk population must exist.
– Effective early intervention must be known to reduce morbidity or mortality.

Basis for criteria: Screening test
– The screening test should be sensitive and specific.
– The screening test must be acceptable to the target population.
– Minimal risk should be associated with the screening test.
– Diagnostic work-up for a positive test result must have acceptable morbidity given the number of false-positive results.