Epidemiology and Biostatistics 679: Clinical Epidemiology June 6-29, 2005 Instructors: Dr. Jean Bourbeau Dr. Dick Menzies

Instructors:
- Dr. Jean Bourbeau (jean.bourbeau@mcgill.ca)
- Dr. Dick Menzies (dick.menzies@mcgill.ca)
- Dr. Kevin Schwartzman (course coordinator; kevin.schwartzman@mcgill.ca)
Research Offices: Respiratory Epidemiology and Clinical Research Unit, Montreal Chest Institute, K1, 3650 St. Urbain

Course Objectives The general objective of this 3-credit course is to provide students with a basic understanding of the methods of epidemiology, as applied to clinical practice and clinical research. Specifically, we will address key principles of testing and measurement in the clinical context, as well as study design, analysis, and inference in the clinical research setting. Students will be encouraged to apply concepts covered in the course to their own areas of interest.

Course Materials
- Textbook: Fletcher, Clinical Epidemiology: The Essentials, 3rd edition
- Course pack with supplemental readings from McGill bookstore
- Lecture notes, handouts, assignments from course website (www.mcgill.ca/epi-biostat/summer/courses)
- Journal articles on-line from Health Sciences Library (www.health.library.mcgill.ca)

Format
- Ten classroom sessions, from 1:30-4:45 Mondays, Wednesdays, and Fridays for four weeks (no class June 24 and July 1).
- Attendance at all sessions is mandatory.
- Students will be divided into teams of 3-4, for purposes of assignments and presentations (8 groups total).

Assignments Before each lecture, an assignment addressing key points of that day’s lecture will be distributed. During each classroom session, one team will give an oral presentation outlining its answers to the assignment on the topic of that day’s lecture. Over the month, all students will be expected to present in this fashion. The written assignments must be handed in (1 per team) at the beginning of the following lecture.

Assignments
- For lecture 1 (today: diagnostic tests and screening), the oral presentation of the assignment will be during lecture 2, with the written assignment due at the beginning of lecture 3.
- For lecture 2 (Wednesday, June 8: measurement issues), the oral presentation will also be during that class, with the written assignment due at the beginning of lecture 3.
- After that, there will be one oral presentation per classroom session, with the written assignment due at the beginning of the following session.

Assignments
- The assignments will include questions about papers from the medical literature, which reflect issues addressed in the lectures.
- With the exception of assignments 1 and 2, these papers will be selected by the group responsible for each oral presentation, and identified ahead of time so that all students in the class use the same paper. Papers should be available on-line through the health sciences library.
- For example, the students responsible for the oral presentation on cohort studies will select a paper reporting a cohort study of interest to them.

Assignments For the final assignment, each group will hand in a summary (maximum 2 pages double-spaced) of an original proposed research protocol, addressing a clinical research question which group members consider relevant. Further details on content and format will be provided in class. These summaries will be presented by the groups in class on Monday, June 27 and handed in that day.

Final Exam A written final exam, in short-answer format, will be administered in class on Wednesday, June 29.

Grading
- Written homework assignments (8): 20%
- Oral presentation of homework assignment: 10%
- Written protocol summary: 20%
- Oral presentation of protocol summary: 10%
- Final exam: 30%
- Class participation: 10%
- TOTAL: 100%

Academic Integrity It is understood that assignments submitted by groups of students will include contributions of all group members; for such assignments, a single copy submitted with all group members’ names will be sufficient. However, we expect that each group will submit its own assignment, written separately from those of other groups. The same holds true for the protocol summaries. Where assignments cite others’ research work, appropriate references must be provided. Direct quotes from other writers should be indicated by quotation marks.

Academic Integrity
III. ACADEMIC OFFENCES
The integrity of University academic life and of the degrees the University confers is dependent upon the honesty and soundness of the teacher-student learning relationship and, as well, that of the evaluation process. Conduct by any member of the University community that adversely affects this relationship or this process must, therefore, be considered a serious offence.
15 Plagiarism
(a) No student shall, with intent to deceive, represent the work of another person as his or her own in any academic writing, essay, thesis, research report, project or assignment submitted in a course or program of study or represent as his or her own an entire essay or work of another, whether the material so represented constitutes a part or the entirety of the work submitted.
(b) Upon demonstration that the student has represented and submitted another person’s work as his or her own, it shall be presumed that the student intended to deceive; the student shall bear the burden of rebutting this presumption by evidence satisfying the person or body hearing the case that no such intent existed, notwithstanding Article 22 of the Charter of Student Rights.
(c) No student shall contribute any work to another student with the knowledge that the latter may submit the work in part or whole as his or her own. Receipt of payment for work contributed shall be cause for presumption that the student had such knowledge; the student shall bear the burden of rebutting this presumption by evidence satisfying the person or body hearing the case that no such intent existed (notwithstanding Article 22 of the Charter of Students’ Rights).
Downloaded and excerpted from A Handbook on Student Rights and Responsibilities, 2003, p. 17. Available on-line at http://upload.mcgill.ca/secretariat/greenbookenglish.pdf
Additional information is available at www.mcgill.ca/integrity/

# | Date | Topics | Instructor(s)
1 | Mon June 6 | Introduction, course overview; Diagnostic tests, screening, prevention | All; Schwartzman
2 | Wed June 8 | Measurement issues: precision, validity, responsiveness; clinical scales/scores | Bourbeau
3 | Fri June 10 | From clinical observations to research: hierarchy of study designs; Planning and designing a first study | Menzies; Dr. S. Dial, MUHC
4 | Mon June 13 | Measures of disease occurrence, association; descriptive, cross-sectional and ecologic studies | Menzies
5 | Wed June 15 | Cohort studies, survival analysis, selection bias | Menzies
6 | Fri June 17 | Clinical trials | Bourbeau
7 | Mon June 20 | Case-control studies; Beginning your own clinical research; Peer review process; protocol assignment | Schwartzman/Menzies
8 | Wed June 22 | Confounding, matching; analysis; Inference and hypothesis testing | Schwartzman
  | Fri June 24 | HOLIDAY, NO CLASS |
9 | Mon June 27 | Protocol summary presentations; Exam review | All
10 | Wed June 29 | Final exam | All

Lecture 1 Topic: DIAGNOSTIC TESTS AND SCREENING
Objectives
Students will be able to:
1. Define and calculate the following: sensitivity, specificity, positive and negative predictive values of diagnostic tests
2. Illustrate the influence of prevalence and/or pre-test probability on predictive values
3. Define pre- and post-test probabilities in terms of Bayes’ theorem and likelihood ratios
4. Identify key elements of screening programs and evaluations of their impact
5. Describe the impact of misclassification on results of clinical research studies

Diagnostic Tests and Screening
Readings:
- Fletcher, chapters 1 (Introduction), 3 (Diagnosis), 8 (Prevention)
- Barry MJ, Prostate-specific antigen testing for early diagnosis of prostate cancer, N Engl J Med 2001; 344:1373-1377 [Clinical Practice]
- Hamm CW et al, Emergency room triage of patients with acute chest pain by means of rapid testing for cardiac troponin T or troponin I, N Engl J Med 1997; 337:1648-53 (for assignment)

Diagnostic Tests
Tests as diagnostic aids and screening tools: a key element of clinical medicine and public health.
- Electrocardiogram, cardiac enzymes for diagnosis of myocardial infarction
- Murphy’s sign (right upper abdominal tenderness on inspiration) in diagnosis of acute cholecystitis
- Pap smear for detection of cervical cancer
Also essential in many epidemiologic studies where diagnostic criteria and/or tests are used to establish exposure, outcome status.
Goal is to minimize misclassification; yet some misclassification may be inevitable for logistical reasons.

Diagnostic Tests and Screening—Slide 2 Definitive diagnosis/classification may be difficult or impossible to obtain. “Gold standard” may be expensive, inappropriate (e.g. autopsy based) or unsuitable (e.g. clinical follow-up when immediate decision required). Tests may serve as surrogates but this requires that they be appropriately validated against a suitable gold standard - and that their properties be documented.

Diagnostic Tests and Screening--Slide 3
We will focus largely on the situation where the diagnosis/outcome and the test result are both dichotomous, i.e.
- Disease: Present vs. absent
- Test: Positive vs. negative
We need to know how well the test separates those who have the disease of interest from those who do not.

Diagnostic Tests and Screening-- Slide 4
We can use a 2x2 table to describe the various possibilities:

            Disease +    Disease -
Test +      True +       False +
Test -      False -      True -

True positive rate = P(T+ | D+) = TP/(TP+FN) = Sensitivity: the probability that a diseased individual will be identified as such by the test

Diagnostic Tests and Screening-- Slide 5

            Disease +    Disease -
Test +      True +       False +
Test -      False -      True -

True negative rate = P(T- | D-) = TN/(TN+FP) = Specificity: the probability that an individual without the disease will be identified as such by the test

Diagnostic Tests and Screening-- Slide 6
Complementary probabilities:
- False negative rate = FN/(TP+FN) = P(T- | D+) = 1 - sensitivity
- False positive rate = FP/(TN+FP) = P(T+ | D-) = 1 - specificity

Diagnostic Tests and Screening-- Slide 7 Example: A researcher develops a new saliva pregnancy test. She collects samples from 100 women known to be pregnant by blood test (the gold standard) and 100 women known not to be pregnant, also based on the same blood test. The saliva test is “positive” in 95 of the pregnant women. It is also “positive” in 15 of the non-pregnant women. What are the sensitivity and specificity?

Diagnostic Tests and Screening-- Slide 8

            Pregnant    Non-pregnant    Totals
Saliva +    95          15              110
Saliva -    5           85              90
Totals      100         100             200

Sensitivity = TP/(TP+FN) = 95/100 = 95%
Specificity = TN/(TN+FP) = 85/100 = 85%
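The calculation on this slide can be reproduced in a few lines. The following is a minimal sketch using the counts from the saliva test example; the function names are illustrative and are not part of the course material.

```python
def sensitivity(tp, fn):
    """P(T+ | D+): proportion of diseased individuals who test positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """P(T- | D-): proportion of non-diseased individuals who test negative."""
    return tn / (tn + fp)


# Counts from the saliva pregnancy test validation study
tp, fn = 95, 5    # pregnant women: test positive / test negative
fp, tn = 15, 85   # non-pregnant women: test positive / test negative

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"Sensitivity = {sens:.0%}")
print(f"Specificity = {spec:.0%}")
```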

Diagnostic Tests and Screening-- Slide 9 Is it more important that a test be sensitive or specific? It depends on its purpose. A cheap mass screening test should be sensitive (few cases missed). A test designed to confirm the presence of disease should be specific (few cases wrongly diagnosed). Note that sensitivity and specificity are two distinct properties. Where classification is based on a cutpoint along a continuum, there is a tradeoff between the two.

Diagnostic Tests and Screening-- Slide 10 Example: The saliva pregnancy test detects progesterone. A refined version is developed. Suppose you add a drop of indicator solution to the saliva sample. It can stay clear (0 reaction) or turn green (1+), red (2+), or black (3+). (For purposes of discussion we will ignore overlapping colors)

Diagnostic Tests and Screening-- Slide 11
The researcher conducts a validation study and finds the following:

             Pregnant    Non-pregnant    Totals
Saliva 3+    85          5               90
Saliva 2+    10          10              20
Saliva 1+    3           17              20
Saliva 0     2           68              70
Totals       100         100             200

Diagnostic Tests and Screening-- Slide 12
The sensitivity and specificity of the saliva test will depend on the definition of “positive” and “negative” used.
- If “positive” ≥ 1+, sensitivity = (85+10+3)/100 = 98%; specificity = 68/100 = 68%
- If “positive” ≥ 2+, sensitivity = (85+10)/100 = 95%; specificity = (68+17)/100 = 85%
- If “positive” = 3+, sensitivity = 85/100 = 85%; specificity = (68+17+10)/100 = 95%
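The tradeoff between sensitivity and specificity as the cutpoint moves can be tabulated directly from the validation counts. This is a small sketch, assuming reaction grades 0 to 3 are stored as dictionary keys; the data structure and function name are illustrative choices.

```python
# Validation counts per reaction grade: (pregnant, non-pregnant)
counts = {3: (85, 5), 2: (10, 10), 1: (3, 17), 0: (2, 68)}


def sens_spec_at_cutpoint(counts, cutpoint):
    """Treat any grade >= cutpoint as a 'positive' test."""
    tp = sum(preg for grade, (preg, _) in counts.items() if grade >= cutpoint)
    fn = sum(preg for grade, (preg, _) in counts.items() if grade < cutpoint)
    fp = sum(non for grade, (_, non) in counts.items() if grade >= cutpoint)
    tn = sum(non for grade, (_, non) in counts.items() if grade < cutpoint)
    return tp / (tp + fn), tn / (tn + fp)


results = {c: sens_spec_at_cutpoint(counts, c) for c in (1, 2, 3)}
for cutpoint, (sens, spec) in results.items():
    print(f"positive >= {cutpoint}+: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Raising the cutpoint lowers sensitivity and raises specificity, which is the tradeoff described on the slide.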

Diagnostic Tests and Screening-- Slide 13 The choice of cutpoint depends on the relative adverse consequences of false-negatives vs. false-positives. If it is most important not to miss anyone, use ↑ sensitivity and ↓ specificity. If it is most important that people not be erroneously labeled as having the condition, use ↓ sensitivity and ↑ specificity.

Diagnostic Tests and Screening-- Slide 14
In practice, the clinician or researcher needs to know how to interpret test results without the simultaneous gold standard measurement. (If you already know the “gold standard” result, why would you obtain the other test?) Hence we need to know:
1. How likely is a patient to have the condition of interest, given a “positive” test result? This is P(D+ | T+), or the positive predictive value of the test [= TP/(TP+FP)]
2. How likely is a patient not to have the condition of interest, given a “negative” test result? This is P(D- | T-), or the negative predictive value of the test [= TN/(TN+FN)]

Diagnostic Tests and Screening-- Slide 15 Key point: The positive and negative predictive values depend on the pretest probability of the condition of interest - in addition to the sensitivity and specificity of the test. This pretest probability is often the prevalence of the condition in the population of interest. But it can also reflect restriction of this population based on clinical features and/or other test results. For example, the pretest probability of pregnancy will be very different among young women using oral contraceptives from that among sexually active young women using no form of contraception.

Diagnostic Tests and Screening-- Slide 16
Example: The saliva pregnancy test is administered 30 days after the first day of the last menstrual period to two groups of women who have thus far “missed” a period.
Group 1: 1000 sexually active young women using no contraception. Pretest probability of pregnancy = 40% (hypothetical)
- Based on sensitivity of 95%, expected TP = 400 x 0.95 = 380; expected FN = 400 - 380 = 20
- Based on specificity of 85%, expected TN = 600 x 0.85 = 510; expected FP = 600 - 510 = 90

          Pregnant    Non-pregnant    Totals
Test +    380         90              470
Test -    20          510             530
Totals    400         600             1000

Diagnostic Tests and Screening-- Slide 17
Positive predictive value = TP/(TP+FP) = 380/470 = 81%
In this context, a woman with a positive saliva test has an 81% chance of being pregnant.
Negative predictive value = TN/(TN+FN) = 510/530 = 96%
In this context, a woman with a negative saliva test has a 96% chance of not being pregnant (and a 4% chance of being pregnant).

Diagnostic Tests and Screening-- Slide 18
Group 2: 1000 oral contraceptive users. Pretest probability of pregnancy = 10% (hypothetical)

          Pregnant    Non-pregnant    Totals
Test +    95          135             230
Test -    5           765             770
Totals    100         900             1000

- Using sensitivity = 95%, expected TP = 0.95 x 100 = 95; expected FN = 100 - 95 = 5
- Using specificity = 85%, expected TN = 0.85 x 900 = 765; expected FP = 900 - 765 = 135

Diagnostic Tests and Screening-- Slide 19 In this context, positive predictive value is only 95/230 = 41% [TP/(TP+FP)] Negative predictive value is [TN/(TN+FN)] = 765/770 = 99%

Diagnostic Tests and Screening-- Slide 20
In which situation is the saliva test more helpful?
Group 1 (pretest probability 40%):
- Test +: 81% probability of pregnancy
- Test -: 4% probability of pregnancy
Group 2 (pretest probability 10%):
- Test +: 41% probability of pregnancy
- Test -: 1% probability of pregnancy

Diagnostic Tests and Screening-- Slide 21 Note that the same test would likely be used and interpreted very differently in these two contexts. This does not imply any difference in the characteristics of the test itself, i.e. sensitivity and specificity are not altered by the pretest probability of the condition of interest. Tests are most useful when the pretest probability is in a middle range. They are unlikely to be useful when the pretest probability is already very high or low.
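To see how the usefulness of a test varies with pretest probability, one can compute the probability of disease after a positive and after a negative result across a range of starting values. The sketch below keeps the saliva test's sensitivity (95%) and specificity (85%) fixed; the extreme pretest values (1%, 90%, 99%) are hypothetical additions for illustration and do not come from the slides.

```python
def post_test_probabilities(pretest, sens=0.95, spec=0.85):
    """Return (P(D+ | T+), P(D+ | T-)) for a given pretest probability."""
    tp = pretest * sens               # diseased and test positive
    fn = pretest * (1 - sens)         # diseased and test negative
    tn = (1 - pretest) * spec         # not diseased and test negative
    fp = (1 - pretest) * (1 - spec)   # not diseased and test positive
    return tp / (tp + fp), fn / (fn + tn)


# 40% and 10% are the two groups from the slides; the others are hypothetical
for pretest in (0.01, 0.10, 0.40, 0.90, 0.99):
    after_pos, after_neg = post_test_probabilities(pretest)
    print(f"pretest {pretest:.0%}: after T+ {after_pos:.0%}, after T- {after_neg:.0%}")

pos40, neg40 = post_test_probabilities(0.40)
pos10, neg10 = post_test_probabilities(0.10)
```

At the extremes the test result moves the probability of disease only a little in one direction or the other, whereas in the middle range both a positive and a negative result shift it substantially.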

Diagnostic Tests and Screening-- Slide 22
Deriving predictive values (post-test probabilities) using a 2x2 table:
1. Fill in totals with/without disease based on pretest probabilities. In general these depend on external information about the population of interest and cannot be extrapolated from a validation study.
2. Fill in the true positives and false negatives using sensitivity.
   - TP = Number with disease x sensitivity
   - FN = Number with disease x (1-sensitivity)
3. Fill in true negatives and false positives using specificity.
   - TN = Number free of disease x specificity
   - FP = Number free of disease x (1-specificity)
4. Calculate PPV = TP/(TP+FP). Calculate NPV = TN/(TN+FN).
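The four steps above can be sketched as a short function that fills in the expected 2x2 table and returns the predictive values. The function name and dictionary layout are illustrative assumptions; the inputs are the two hypothetical groups from the earlier slides.

```python
def expected_2x2(n, pretest, sens, spec):
    """Build the expected 2x2 table and predictive values for n people."""
    # Step 1: totals with and without disease from the pretest probability
    diseased = n * pretest
    healthy = n - diseased
    # Step 2: true positives and false negatives from sensitivity
    tp = diseased * sens
    fn = diseased - tp
    # Step 3: true negatives and false positives from specificity
    tn = healthy * spec
    fp = healthy - tn
    # Step 4: predictive values
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp,
            "PPV": tp / (tp + fp), "NPV": tn / (tn + fn)}


group1 = expected_2x2(1000, 0.40, 0.95, 0.85)  # no contraception
group2 = expected_2x2(1000, 0.10, 0.95, 0.85)  # oral contraceptive users
for name, table in (("Group 1", group1), ("Group 2", group2)):
    print(f"{name}: PPV {table['PPV']:.0%}, NPV {table['NPV']:.0%}")
```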

Diagnostic Tests and Screening-- Slide 23
Bayes’ theorem: Allows us to calculate revised (“posterior” or post-test) probabilities, based on “prior” (pretest) probabilities and new information (here, test results).
General form:

$$P(B \mid A) = \frac{P(A \mid B) \times P(B)}{[P(A \mid B) \times P(B)] + [P(A \mid \bar{B}) \times P(\bar{B})]}$$

Note that $\bar{B}$ corresponds to “Not B”, so $P(\bar{B}) = 1 - P(B)$.

Diagnostic Tests and Screening-- Slide 24
For positive predictive value,

$$P(D^{+} \mid T^{+}) = \frac{P(T^{+} \mid D^{+}) \times P(D^{+})}{[P(T^{+} \mid D^{+}) \times P(D^{+})] + [P(T^{+} \mid D^{-}) \times P(D^{-})]}$$

Note this is identical to TP/(TP+FP).

Diagnostic Tests and Screening-- Slide 25
For negative predictive value,

$$P(D^{-} \mid T^{-}) = \frac{P(T^{-} \mid D^{-}) \times P(D^{-})}{[P(T^{-} \mid D^{-}) \times P(D^{-})] + [P(T^{-} \mid D^{+}) \times P(D^{+})]}$$

which is equal to TN/(TN+FN).

Diagnostic Tests and Screening-- Slide 26
Example: What would be the positive and negative predictive values for the saliva pregnancy test if the pretest probability of pregnancy is 20%? (sensitivity = 95%, specificity = 85%)

$$P(\text{pregnant} \mid T^{+}) = \frac{P(T^{+} \mid \text{pregnant}) \times P(\text{pregnant})}{[P(T^{+} \mid \text{pregnant}) \times P(\text{pregnant})] + [P(T^{+} \mid \text{not pregnant}) \times P(\text{not pregnant})]}$$

$$= \frac{0.95 \times 0.2}{(0.95 \times 0.2) + (0.15 \times 0.8)} = \frac{0.19}{0.19 + 0.12} = 0.61 \text{ or } 61\%$$

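Diagnostic Tests and Screening - Slide 27

$$P(\text{not pregnant} \mid T^{-}) = \frac{P(T^{-} \mid \text{not pregnant}) \times P(\text{not pregnant})}{[P(T^{-} \mid \text{not pregnant}) \times P(\text{not pregnant})] + [P(T^{-} \mid \text{pregnant}) \times P(\text{pregnant})]}$$

$$= \frac{0.85 \times 0.8}{(0.85 \times 0.8) + (0.05 \times 0.2)} = \frac{0.68}{0.68 + 0.01} = 0.99 \text{ or } 99\%$$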

Diagnostic Tests and Screening - Slide 28
Likelihood Ratios
An alternative way of developing post-test probabilities (predictive values).
Relationship between pre- and post-test odds, where Odds = [probability of x]/[1 - probability of x]
– If pre-test probability of pregnancy is 20%, then odds of pregnancy = 0.2/(1-0.2) = 0.25
– Odds of no pregnancy = 0.8/(1-0.8) = 4 [the reciprocal]
Probability = [odds of x]/[1 + odds of x]
– If prior odds of pregnancy = 0.25, then pre-test probability of pregnancy = 0.25/(1+0.25) = 0.2

Diagnostic Tests and Screening - Slide 29
Likelihood Ratios
Post-test odds = pre-test odds x likelihood ratio, where

$$\text{Likelihood ratio} = \frac{P(\text{test result} \mid \text{condition of interest})}{P(\text{test result} \mid \text{no condition of interest})}$$

Diagnostic Tests and Screening - Slide 30
Likelihood Ratios
Pregnancy example, saliva test as before
– Prior odds 0.25 (20% pre-test probability)
– Sensitivity 95%, specificity 85%
Post-test odds with positive test = 0.25 x (0.95/0.15) = 0.25 x 6.33 = 1.58
Post-test probability = 1.58/(1+1.58) = 61%
This approach can be particularly useful for tests with multiple categories, and for serial testing.
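The odds-based calculation can be checked numerically, and compared against the Bayes' theorem result for the same 20% pretest probability. This is a minimal sketch; the helper names are illustrative, and the likelihood ratio for a negative result, (1 - sensitivity)/specificity, is derived here from the general definition rather than stated on the slides.

```python
def prob_to_odds(p):
    """Odds = probability / (1 - probability)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)


def likelihood_ratios(sens, spec):
    """Return (LR for a positive result, LR for a negative result)."""
    lr_positive = sens / (1 - spec)   # P(T+ | D+) / P(T+ | D-)
    lr_negative = (1 - sens) / spec   # P(T- | D+) / P(T- | D-)
    return lr_positive, lr_negative


pre_odds = prob_to_odds(0.20)
lr_pos, lr_neg = likelihood_ratios(0.95, 0.85)

post_prob_pos = odds_to_prob(pre_odds * lr_pos)  # P(pregnant | T+)
post_prob_neg = odds_to_prob(pre_odds * lr_neg)  # P(pregnant | T-)
print(f"LR+ = {lr_pos:.2f}, P(pregnant | T+) = {post_prob_pos:.0%}")
print(f"LR- = {lr_neg:.2f}, P(pregnant | T-) = {post_prob_neg:.0%}")
```

The post-test probability after a positive result matches the 61% obtained with Bayes' theorem, and one minus the probability after a negative result matches the 99% negative predictive value.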

Diagnostic Tests and Screening-- Slide 31 Pitfalls in assessments of diagnostic test performance Importance of pretest probability, as discussed. Pretest probability (and predictive values) cannot ordinarily be extrapolated from a validation study, since the proportions with and without disease are determined by the investigator - unless there is truly random sampling that reflects the context in which the test will be applied.

Diagnostic Tests and Screening - Slide 32 Was the test applied in a consistent fashion to all members of the validation sample? e.g. was test interpretation properly blinded? (unrelated to “true” presence or absence of disease or clues to it) Was the gold standard applied in a consistent fashion to all members of the validation sample? (again, blinded application not related to results of test(s) being evaluated)

Diagnostic Tests and Screening-- Slide 33
Example: New diagnostic tests for pulmonary embolism
- “Positive” results confirmed by pulmonary angiography (an invasive test with some risk)
- “Negative” results confirmed by clinical follow-up, i.e. does the patient return with further symptoms or signs? This condition can resolve spontaneously and not recur.

Diagnostic Tests and Screening-- Slide 34
Result:
- Good documentation of true and false positives
- Overestimate true negatives, underestimate false negatives → sensitivity of test overestimated; specificity of test also overestimated

Diagnostic Tests and Screening-- Slide 35
Importance of the sample used for test validation: What was the spectrum of the condition evaluated? How similar is this to the situation in which the test will be used?
Example: saliva pregnancy test
Imagine that the test hinges on the ability to detect progesterone, a hormone whose level increases as pregnancy progresses. If the test is validated by comparing women who are 3 months pregnant with young, non-pregnant women, it will perform very well, as progesterone levels are very high by 3 months.

Diagnostic Tests and Screening - Slide 36 On the other hand, the sensitivity may be much lower if the pregnant group consists of women who are only 1 month after their last menstrual period. Conversely, the estimated specificity of the test will be higher if the comparison group has very low progesterone levels (e.g. postmenopausal women).

Diagnostic Tests and Screening-- Slide 37
You would reject results of a validation study involving women who are 3 months pregnant, or women who are postmenopausal:
- By 3 months, pregnancy is usually relatively obvious by history, and thus this is unlikely to be the situation where the test will be used.
- The test would never be administered to postmenopausal women!

Diagnostic Tests and Screening-- Slide 38 So: Sensitivity and specificity estimates do not depend on the prevalence of the condition in question. BUT their values and their validity depend on the context in which they were obtained, vis-a-vis the context in which they will be used. This in turn will affect positive and negative predictive values, quite apart from the prevalence/prior probability of the condition.

Diagnostic Tests and Screening - Slide 39 Misclassification The use of an imperfect diagnostic test leads to misclassification (assigning individuals to the wrong category). In research studies, it is most often nondifferential. That is, the probability of misclassification is not associated with the exposure or intervention under study. For example, the use of an imperfect cardiac enzyme assay to define myocardial infarction in a primary prevention study with a novel anti-platelet agent. Another example: ascertaining the development of HIV infection based on a saliva test, comparing injection drug users who do vs. who do not clean their needles (in a cohort study).

Diagnostic Tests and Screening-- Slide 40 The effect of nondifferential misclassification is to dilute any association which may be present, i.e. the effect measure is biased toward the null value. Consider the extreme case where the cardiac enzyme assay is no better than flipping a coin. Then no effect of the antiplatelet drug will be detected, even if it is truly very beneficial. If the degree of misclassification is known, then corrected 2x2 tables and parameter estimates can be derived.

Diagnostic Tests and Screening-- Slide 41 Differential misclassification implies that measurement error is associated with study group membership, i.e. it operates differentially between groups. For example, imagine that the antiplatelet drug directly interferes with the cardiac enzyme assay, leading to underestimation of enzyme levels. Here, the drug may appear to be protective even if in reality, it is no better than placebo. Hence depending on the specific circumstances, differential misclassification may lead to under- or overestimation of the true association between exposure and outcome.
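The contrast between nondifferential and differential misclassification can be illustrated with expected proportions. The sketch below is hypothetical throughout: the true risks (5% vs. 10%, or 10% in both groups), and the assay sensitivities and specificities, are invented for illustration and are not taken from the slides.

```python
def observed_risk(true_risk, sens, spec):
    """Proportion classified as having the outcome by an imperfect assay."""
    return true_risk * sens + (1 - true_risk) * (1 - spec)


def observed_risk_ratio(risk_drug, risk_placebo, assay_drug, assay_placebo):
    """Risk ratio as measured; each assay is a (sensitivity, specificity) pair."""
    return (observed_risk(risk_drug, *assay_drug)
            / observed_risk(risk_placebo, *assay_placebo))


# Nondifferential: the same imperfect assay in both groups (true RR = 0.5)
rr_perfect = observed_risk_ratio(0.05, 0.10, (1.0, 1.0), (1.0, 1.0))
rr_imperfect = observed_risk_ratio(0.05, 0.10, (0.8, 0.9), (0.8, 0.9))
rr_coin_flip = observed_risk_ratio(0.05, 0.10, (0.5, 0.5), (0.5, 0.5))

# Differential: drug has no true effect (true RR = 1.0) but interferes with
# the assay, lowering its sensitivity in the treated group only
rr_differential = observed_risk_ratio(0.10, 0.10, (0.5, 0.9), (0.8, 0.9))

print(f"True RR 0.5, perfect assay:        {rr_perfect:.2f}")
print(f"True RR 0.5, imperfect assay:      {rr_imperfect:.2f}")
print(f"True RR 0.5, coin-flip assay:      {rr_coin_flip:.2f}")
print(f"True RR 1.0, differential error:   {rr_differential:.2f}")
```

With the same imperfect assay in both groups the measured risk ratio moves toward 1, reaching exactly 1 for the coin-flip assay; with assay interference limited to the treated group, an ineffective drug appears protective.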

Screening
“The identification of an unrecognized disease or risk factor by…[a] procedure that can be applied rapidly.” (Fletcher, p. 167)
Screening is relevant only if disease is relatively common, testing is sensitive, specific, and cost-effective, and early treatment improves outcomes.
Sensitivity may be calculated by:
- Detection method: sensitivity = (cases found by screening) / (cases found by screening + those identified during follow-up of screened persons [interval cases])
- Incidence method: sensitivity = (incidence among unscreened - interval incidence among screened) / (incidence among unscreened)
The incidence method accounts for “overdiagnosis” of abnormalities that are not clinically important, e.g. prostate cancer.
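The two ways of estimating screening sensitivity can be written out as follows. This is only a sketch: the case counts and incidence rates are hypothetical numbers chosen for illustration, and the function names are not from the course material.

```python
def sensitivity_detection_method(screen_detected, interval_cases):
    """Cases found by screening / (screen-detected + interval cases)."""
    return screen_detected / (screen_detected + interval_cases)


def sensitivity_incidence_method(incidence_unscreened, interval_incidence_screened):
    """(Incidence among unscreened - interval incidence among screened)
    divided by incidence among unscreened."""
    return ((incidence_unscreened - interval_incidence_screened)
            / incidence_unscreened)


# Hypothetical: 80 screen-detected cases, 20 interval cases
detection = sensitivity_detection_method(80, 20)

# Hypothetical rates per 1000 person-years: 2.0 unscreened, 0.5 interval
incidence = sensitivity_incidence_method(2.0, 0.5)

print(f"Detection method: {detection:.0%}")
print(f"Incidence method: {incidence:.0%}")
```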

Diagnostic Tests and Screening-- Slide 42
Biases in performance of screening tests (Does screening lead to better survival?)
1. Lead time bias
The earlier in its natural history an ultimately fatal disease is detected, the longer will be the survival from the time of diagnosis, even if there is no difference in treatment effect. e.g.

Disease develops --(2 years)--> Detectable by screening --(3 years)--> Clinical symptoms --(5 years)--> Death

If 2 persons A+B develop the same disease at the same age but person A is diagnosed by screening, person A will live 3 more years than person B from time of diagnosis, even if neither is treated, though the chronological survival is equivalent.
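The arithmetic behind the lead time example can be made explicit. This short sketch uses the timeline from the slide, measured in years since the disease develops; the variable names are illustrative.

```python
# Timeline from the slide, in years since the disease develops
DETECTABLE_BY_SCREENING = 2
CLINICAL_SYMPTOMS = DETECTABLE_BY_SCREENING + 3   # year 5
DEATH = CLINICAL_SYMPTOMS + 5                     # year 10

# Person A is diagnosed by screening, person B when symptoms appear;
# neither is treated, so both die at the same point on the timeline.
survival_from_diagnosis_a = DEATH - DETECTABLE_BY_SCREENING
survival_from_diagnosis_b = DEATH - CLINICAL_SYMPTOMS
lead_time = survival_from_diagnosis_a - survival_from_diagnosis_b

print(f"Survival after diagnosis, person A: {survival_from_diagnosis_a} years")
print(f"Survival after diagnosis, person B: {survival_from_diagnosis_b} years")
print(f"Apparent survival gain (lead time): {lead_time} years")
```

The apparent gain equals the interval between screen detectability and symptoms, even though the date of death is unchanged.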

Diagnostic Tests and Screening-- Slide 43 2. Length bias The probability of detecting a disease during its preclinical period is proportional to the length of that period, which is inversely proportional to the rate of disease progression. Hence cases diagnosed by screening may be “destined” for a more favourable evolution, regardless of treatment.

Diagnostic Tests and Screening-- Slide 44 3. Overdiagnosis bias (a variant of length bias; courtesy of Dr. W. Black) Screening may detect disease that would never have become clinically detectable, e.g. remains stable or regresses spontaneously. It may also detect disease that would not have contributed to the patient’s death e.g. competing mortality risks among smokers with early-stage lung cancer, or men with early-stage prostate cancer detected by PSA screening.

Diagnostic Tests and Screening-- Slide 45
4. Compliance bias
Persons who comply with a screening intervention may be healthier, on average, and have healthier behaviours than non-compliers. They are also likely to be healthier than an unscreened “control group,” which implicitly includes a mixture of persons who would and would not have complied, had they been offered screening. This leads to biases in observational (non-randomized) studies, and with analyses limited to “compliers” within randomized trials. Relevance of “intent to screen” analyses.
