RELIABILITY OF DISEASE CLASSIFICATION
Nigel Paneth

TERMINOLOGY
Reliability is analogous to precision; validity is analogous to accuracy.
Reliability is how consistently an observer classifies the same individual under different circumstances.
Validity is how well a given test reflects another test of known greater accuracy.

RELIABILITY AND VALIDITY
Reliability includes:
- assessments by the same observer at different times (INTRA-OBSERVER RELIABILITY)
- assessments by different observers at the same time (INTER-OBSERVER RELIABILITY)
Reliability assumes that all tests or observers are equal; validity assumes that there is a gold standard to which a test or observer can be compared.

ASSESSING RELIABILITY
How do we assess reliability? One way is simply to look at percent agreement: the proportion of all diagnoses classified the same way by two observers.

EXAMPLE OF PERCENT AGREEMENT
Two physicians are each given the same set of 100 X-rays to read independently and asked to judge whether pneumonia is present or absent. When both sets of diagnoses are tallied, it is found that 95% of the diagnoses are the same.
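
As an illustration, here is a minimal Python sketch of the percent-agreement calculation, using a small set of invented diagnoses (the data are hypothetical, for illustration only):

```python
# Hypothetical paired diagnoses from two physicians reading the same films.
md1 = ["yes", "no", "no", "yes", "no"]   # MD #1's readings
md2 = ["yes", "no", "yes", "yes", "no"]  # MD #2's readings

# Percent agreement: proportion of cases classified the same way by both.
agree = sum(a == b for a, b in zip(md1, md2))
print(f"Percent agreement: {agree / len(md1):.0%}")  # 80%
```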

IS PERCENT AGREEMENT GOOD ENOUGH? Do these two physicians exhibit high diagnostic reliability? Can there be 95% agreement between two observers without really having good reliability?

Compare the two tables below:

Table 1                MD#1
                    Yes    No
    MD#2    Yes       1     3
            No        2    94

Table 2                MD#1
                    Yes    No
    MD#2    Yes      43     3
            No        2    52

In both instances, the physicians agree 95% of the time. Are the two physicians equally reliable in the two tables?
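
The same point can be checked in code. A small Python sketch, with the two tables entered as nested lists, confirms that both yield 95% agreement:

```python
# Each table: [[yes/yes, yes/no], [no/yes, no/no]] counts.
table1 = [[1, 3], [2, 94]]
table2 = [[43, 3], [2, 52]]

def percent_agreement(t):
    # Agreement cells are the diagonal; divide by the total N.
    return (t[0][0] + t[1][1]) / sum(sum(row) for row in t)

print(percent_agreement(table1), percent_agreement(table2))  # 0.95 0.95
```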

What is the essential difference between the two tables? The problem arises from the ease of agreement on common events (e.g. not having pneumonia in the first table). So a measure of agreement should take into account the “ease” of agreement due to chance alone.

USE OF THE KAPPA STATISTIC TO ASSESS RELIABILITY
Kappa is a widely used test of inter- or intra-observer agreement (or reliability) that corrects for chance agreement.

KAPPA VARIES FROM +1 TO -1
+1 means that the two observers are perfectly reliable: they classify everyone exactly the same way.
0 means there is no relationship at all between the two observers' classifications, beyond the agreement that would be expected by chance.
-1 means the two observers classify exactly the opposite of each other: if one observer says yes, the other always says no.

GUIDE TO USE OF KAPPAS IN EPIDEMIOLOGY AND MEDICINE
Kappa > .80 is considered excellent
Kappa .60-.80 is considered good
Kappa .40-.60 is considered fair
Kappa < .40 is considered poor

1st WAY TO CALCULATE KAPPA
1. Calculate observed agreement (the number of observations on which the two observers agree, divided by the total number of observations). In both Table 1 and Table 2 it is 95%.
2. Calculate expected agreement (chance agreement) based on the marginal totals.

Table 1's marginal totals are:

OBSERVED               MD#1
                    Yes    No   Total
    MD#2    Yes       1     3       4
            No        2    94      96
            Total     3    97     100

How do we calculate the N expected by chance in each cell? We assume that each cell should reflect the marginal distributions, i.e. the proportion of yes and no answers within the four-fold table should be the same as in the marginal totals.

OBSERVED               MD#1
                    Yes    No   Total
    MD#2    Yes       1     3       4
            No        2    94      96
            Total     3    97     100

EXPECTED               MD#1
                    Yes    No   Total
    MD#2    Yes       ?     ?       4
            No        ?     ?      96
            Total     3    97     100

To do this, we take the proportions of answers in either the column marginal totals (3% and 97%, yes and no respectively, for MD #1) or the row marginal totals (4% and 96%, yes and no respectively, for MD #2), and apply one set of proportions to the other marginal total. For example, 96% of the row totals are in the "No" category. Therefore, by chance, 96% of MD #1's 97 "No's" should also fall in MD #2's "No" row: 96% of 97 is 93.12.

EXPECTED               MD#1
                    Yes    No   Total
    MD#2    Yes       ?     ?       4
            No        ?  93.12     96
            Total     3    97     100

By subtraction, all the other cells fill in automatically, and each yes/no distribution reflects the marginal distribution. Any cell could have been used to make the calculation, because once one cell is specified in a 2x2 table with fixed marginal totals, all other cells are also specified.

EXPECTED               MD#1
                    Yes    No   Total
    MD#2    Yes     0.12  3.88      4
            No      2.88 93.12     96
            Total     3    97     100

Now you can see that, just by the operation of chance, 93.24 of the 100 observations should have been agreed on by the two observers (0.12 + 93.12 = 93.24).

EXPECTED               MD#1
                    Yes    No   Total
    MD#2    Yes     0.12  3.88      4
            No      2.88 93.12     96
            Total     3    97     100
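
For readers who prefer code, here is a minimal Python sketch of the expected-agreement calculation for Table 1, using the row-total x column-total / N rule described above:

```python
observed = [[1, 3],   # MD#2 Yes: (MD#1 Yes, MD#1 No)
            [2, 94]]  # MD#2 No

n = sum(sum(row) for row in observed)              # 100
row_totals = [sum(row) for row in observed]        # [4, 96]
col_totals = [sum(col) for col in zip(*observed)]  # [3, 97]

# Expected cell count = row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)  # [[0.12, 3.88], [2.88, 93.12]]

# Chance agreement = sum of the diagonal expected cells / N.
print((expected[0][0] + expected[1][1]) / n)  # ~0.9324
```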

Let's now compare the actual agreement with the expected agreement. Expected agreement falls 6.76% short of perfect agreement of 100% (100 - 93.24). Actual agreement falls 5.0% short of perfect agreement (100 - 95). So our two observers were 1.76% better than chance; but if they had agreed perfectly, they would have been 6.76% better than chance. So they are really only about one quarter of the way from chance to perfect agreement (1.76/6.76).

Below is the formula for calculating kappa from expected agreement:

Kappa = (Observed agreement - Expected agreement) / (1 - Expected agreement)

      = (95% - 93.24%) / (100% - 93.24%) = 1.76% / 6.76% = 0.26
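
A one-function Python sketch of this formula, checked against the Table 1 numbers:

```python
def kappa(observed_agreement, expected_agreement):
    # Both arguments are proportions between 0 and 1.
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)

print(round(kappa(0.95, 0.9324), 2))  # 0.26
```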

How good is a kappa of 0.26?
Kappa > .80 is considered excellent
Kappa .60-.80 is considered good
Kappa .40-.60 is considered fair
Kappa < .40 is considered poor
By this guide, a kappa of 0.26 is poor.

In the second example, the observed agreement was also 95%, but the marginal totals were very different.

ACTUAL                 MD#1
                    Yes    No   Total
    MD#2    Yes      43     3      46
            No        2    52      54
            Total    45    55     100

Using the same procedure as before, we calculate the expected N in any one cell, based on the marginal totals. For example, the lower right cell is 54% of 55, which is 29.7.

ACTUAL                 MD#1
                    Yes    No   Total
    MD#2    Yes      43     3      46
            No        2    52      54
            Total    45    55     100

And, by subtraction, the other expected cells are as below. The cells that indicate agreement (the diagonal, marked with asterisks) add up to 50.4%.

EXPECTED               MD#1
                    Yes    No   Total
    MD#2    Yes    20.7*  25.3     46
            No     24.3  29.7*     54
            Total    45    55     100

Enter the two agreements into the formula:

Kappa = (Observed agreement - Expected agreement) / (1 - Expected agreement)

      = (95% - 50.4%) / (100% - 50.4%) = 44.6% / 49.6% = 0.90

In this example, the observers have the same percent agreement, but now they are much better than chance. A kappa of 0.90 is considered excellent.
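
The same arithmetic in a two-line, self-contained Python check (values taken from the slide above):

```python
observed, expected = 0.95, 0.504
print(round((observed - expected) / (1 - expected), 2))  # 0.9
```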

A 2nd WAY TO CALCULATE THE KAPPA STATISTIC

Kappa = 2(AD - BC) / (N1 x N4 + N2 x N3)

where A, B, C, D are the cells and the Ns are the marginal totals, labeled thus:

                       MD#1
                    Yes    No
    MD#2    Yes       A     B      N1
            No        C     D      N2
                     N3    N4   total

Look again at the two tables compared earlier.

For Table 1: 2(1 x 94 - 3 x 2) / (4 x 97 + 96 x 3) = 176 / 676 = .26

For Table 2: 2(43 x 52 - 3 x 2) / (46 x 55 + 54 x 45) = 4460 / 4960 = .90
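
A short Python sketch of this shortcut formula, verifying both worked examples (the cell labels A, B, C, D follow the table on the previous slide):

```python
def kappa_2x2(a, b, c, d):
    # Marginal totals: rows N1, N2; columns N3, N4.
    n1, n2 = a + b, c + d
    n3, n4 = a + c, b + d
    return 2 * (a * d - b * c) / (n1 * n4 + n2 * n3)

print(round(kappa_2x2(1, 3, 2, 94), 2))   # 0.26 (Table 1)
print(round(kappa_2x2(43, 3, 2, 52), 2))  # 0.9  (Table 2)
```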

Note the parallels between:
THE ODDS RATIO
THE CHI-SQUARE STATISTIC
THE KAPPA STATISTIC
Note that the cross-products of the four-fold table, and their relation to the marginal totals, are central to all three expressions.