1/26/09 1 Community Health Assessment in Small Populations: Tools for Working With “Small Numbers” Region 2 Quarterly Meeting January 26, 2009
1/26/09 New Mexico Department of Health 2
Outline
Description of the Problem
–Random variation
–Survey samples versus complete count datasets
–Observed events versus underlying risk
Statistical Tools
–Confidence intervals
–Combining data
–SMR
Small Numbers: The Problem
Random Variation
Exercise:
–Select a sample
–Calculate the median age
State of New Mexico median age (1): 36.0
Why are they different?
(1) American Community Survey, U.S. Census Bureau. Downloaded on 1/21/09 from
Random Variation and Sample Size
What if we had a sample of New Mexico residents that was:
–Randomly selected
–n=5,000
Would it better match the state Census Bureau estimate?
Size Matters
The larger sample helps to “cancel out” the effects of random variation.
–Some sample subjects are older than the median.
–Some sample subjects are younger than the median.
–As you increase the number of sample subjects, the differences cancel out, and the sample median gets closer to the population median.
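The "cancel out" effect can be illustrated with a small simulation. This is only a sketch: the population of ages below is randomly generated and hypothetical (it is not Census Bureau data), and the function name `sample_median_spread` is made up for this example. It draws many random samples of a given size and reports the lowest and highest sample median seen.

```python
import random
import statistics


def sample_median_spread(population, n, trials=200, seed=0):
    """Draw `trials` random samples of size n; return (min, max) of the sample medians."""
    rng = random.Random(seed)
    medians = [statistics.median(rng.sample(population, n)) for _ in range(trials)]
    return min(medians), max(medians)


# Hypothetical population of ages (uniform 0-90); NOT real New Mexico data.
_rng = random.Random(42)
population = [_rng.randint(0, 90) for _ in range(100_000)]

true_median = statistics.median(population)
small = sample_median_spread(population, n=10)
large = sample_median_spread(population, n=5_000)

print("population median:", true_median)
print("range of medians, n=10:   ", small)
print("range of medians, n=5,000:", large)
```

With the larger sample size, the sample medians should cluster much more tightly around the population median than they do with n=10.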
Reliability and Validity
The term "accuracy" is often used in relation to validity, while the term "precision" is used to describe reliability.
Numerator vs. Denominator
A large sample size means we have a large denominator, but the numerator also matters.
Some methods use a “Poisson distribution,” which considers ONLY the numerator size when assessing precision.
If we have only 1 event in one year and 2 the next year, the addition of a single event doubles the rate of occurrence.
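One way to see why the numerator matters: under a Poisson assumption, the standard error of a count of d events is √d, so the relative standard error is √d / d = 1/√d and depends only on the number of events. The sketch below is illustrative; the function name `poisson_rse` is an assumption for this example and does not come from the slides.

```python
import math


def poisson_rse(events):
    """Relative standard error of a count under a Poisson assumption: sqrt(d) / d."""
    if events <= 0:
        raise ValueError("events must be a positive count")
    return 1.0 / math.sqrt(events)


# Relative standard error (as a percentage) for a few numerator sizes.
for d in (1, 2, 20, 100):
    print(d, "events ->", round(poisson_rse(d) * 100, 1), "% relative standard error")
```

With 1 or 2 events the relative standard error is on the order of the estimate itself, while at 100 events it is 10%.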
Random Variation and Complete Count Datasets
What are some “complete count” datasets?
How do we use them for community health assessment?
Summary of the Problem
Measurements are subject to sampling variability, also known as random error.
Even complete count datasets are subject to random error because we use them as a reflection of the underlying disease risk or rate.
Summary of the Problem
A larger sample (denominator, population size) helps to “cancel out” the effects of random variation.
Size matters, in both the numerator and the denominator.
A measure that is relatively free from the effects of random variation is called “precise,” “reliable,” or “stable.” Those terms are synonymous.
Small Numbers: Statistical Tools
Tool #1. Confidence Intervals
Use confidence intervals to help you decide whether the rate is stable.
Confidence intervals won’t solve the problem, but they will provide information to help you interpret the rates.
The stability of an observed rate is important when comparing areas or assessing whether disease risk has increased or decreased.
Calculation of 95% C.I.
The 95% confidence interval is calculated as the estimate ± 1.96 x the standard error of the estimate (s.e.).
s.e. is calculated as √(pq/n)
So the 95% C.I. is ± 1.96√(pq/n)
[Figure: The “Normal” Distribution]
[Figure: Poisson Distribution]
Calculation of 95% C.I.
p stands for “probability.” It is the rate without the multiplier (e.g., 100,000 for deaths).
q is the complement of the probability (1 minus p).
In Union County, there were 2 diabetes deaths among the population of 4,470, for a probability of 0.000447 (about 45 in 100,000).
Calculation of 95% C.I.
Formula: ± 1.96√(pq/n)
p = 0.000447, q = 0.999553, n = 4,470
(pq)/n = 0.000447 / 4,470 = 0.0000001
√(pq/n) = 0.000316
1.96√(pq/n) = 1.96 x 0.000316 = 0.00062
Then we need to add the multiplier back in, so the confidence interval is:
± 100,000 x 0.00062 = ± 62
Calculation of 95% C.I.
The diabetes death rate was 44.7 per 100,000.
The confidence interval statistic is applied both above and below the rate.
C.I. LL (lower limit) is: 44.7 - 62 = -17.3, and since we cannot have a negative rate, we’ll call it “0”
C.I. UL (upper limit) is: 44.7 + 62 = 106.7
The diabetes death rate for Union County in 2006 was 44.7 per 100,000 (95% C.I., 0 to 106.7)
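The Union County calculation above can be sketched in a few lines of Python. This is a minimal illustration of the normal-approximation interval described on these slides (1.96 x √(pq/n), with a negative lower limit set to 0); the function name `rate_ci` and its parameter names are assumptions made for this example.

```python
import math


def rate_ci(events, population, multiplier=100_000, z=1.96):
    """Return (rate, lower, upper) per `multiplier`, using z * sqrt(p*q/n)."""
    p = events / population          # probability (rate without the multiplier)
    q = 1 - p                        # complement of the probability
    half_width = z * math.sqrt(p * q / population) * multiplier
    rate = p * multiplier
    lower = max(0.0, rate - half_width)   # a rate cannot be negative
    upper = rate + half_width
    return rate, lower, upper


# Union County, 2006: 2 diabetes deaths among a population of 4,470.
rate, lower, upper = rate_ci(events=2, population=4_470)
print(f"{rate:.1f} per 100,000 (95% C.I., {lower:.1f} to {upper:.1f})")
```

Passing `z=1.65` instead of the default gives the 90% interval.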
Confidence Interval Factoids
The confidence interval may be thought of as the range of probable true values for a statistic.
The confidence interval is an indication of the precision (stability, reliability) of the estimate.
A confidence interval is typically expressed as a symmetric value (e.g., "plus or minus 5%"). But for percentages, when the point estimate is close to 0% or 100%, a confidence interval with an asymmetric shape can be used.
More Confidence Interval Factoids
The 95% confidence interval (calculated as 1.96 times the standard error of a statistic) indicates the range of values within which the statistic would fall 95% of the time if the researcher were to calculate the statistic from an infinite number of samples of the same size drawn from the same base population.
Unless otherwise stated, a confidence interval will be the "95% confidence interval."
More Confidence Interval Factoids
The 90% confidence interval, also commonly used, is calculated as 1.65 times the standard error of the estimate.
To calculate a confidence interval when the number of health events = 0, you may use 0 as the lower confidence limit, and for the upper confidence limit, assume a count of 3 health events in the same population.
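The zero-event rule can be sketched as follows. This example reads the rule as "the upper limit is the rate you would get from 3 events in the same population," which is an interpretation of the slide text; the function name is hypothetical, and the population of 4,470 is borrowed from the Union County example purely for illustration.

```python
def zero_event_upper_limit(population, multiplier=100_000, assumed_events=3):
    """Upper confidence limit for a rate when 0 events were observed.

    Assumes the upper limit equals the rate implied by `assumed_events`
    events in the same population (the lower limit is taken to be 0).
    """
    return assumed_events / population * multiplier


# Illustration: 0 events observed in a population of 4,470.
upper = zero_event_upper_limit(4_470)
print(f"0 per 100,000 (95% C.I., 0 to {upper:.1f})")
```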
Tool #2. Combine Data
Combine years
Combine geographic areas (e.g., use the regional estimate rather than the county estimate)
Use a broader age group
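To see why combining years helps, the sketch below compares the 95% confidence interval half-width for a single year with the half-width for eight years combined. The yearly death counts are hypothetical and invented for this example; the population of 4,470 is taken from the Union County example and is assumed, for simplicity, to be the same every year.

```python
import math


def ci_half_width(events, person_years, multiplier=100_000, z=1.96):
    """Half-width of the normal-approximation C.I.: z * sqrt(p*q/n) * multiplier."""
    p = events / person_years
    return z * math.sqrt(p * (1 - p) / person_years) * multiplier


# Hypothetical yearly counts of deaths (NOT real data), constant population.
yearly_events = [2, 1, 3, 2, 1, 2, 2, 1]
population = 4_470

single_year = ci_half_width(yearly_events[0], population)
combined = ci_half_width(sum(yearly_events), population * len(yearly_events))

print(f"single year:    +/- {single_year:.1f} per 100,000")
print(f"8 years pooled: +/- {combined:.1f} per 100,000")
```

The pooled rate is an average over the whole period, so the gain in stability comes at the cost of being less specific about any one year.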
Interpretation of Diabetes Deaths in Union County
Union County’s diabetes death rate ( ) was higher than the overall state rate, but was not statistically significantly higher.
In other words, the Union County rate was “marginally higher” than the New Mexico state rate.
Was it higher than Santa Fe County’s rate?
Differences Between Two Rates
Statistical significance of a change in a rate from time 1 to time 2
Statistical significance of the difference between two rates in one time period (e.g., Union County versus Santa Fe County).
**Test of Proportions**
Test of Proportions
Proportion 1: Union County diabetes death rate: 41.3/100,000 = 0.000413
Proportion 2: Santa Fe County diabetes death rate: 20.4/100,000 = 0.000204
Difference between the two proportions: 0.000413 - 0.000204 = 0.000209
Test of Proportions (cont’d)
The difference between the two rates (0.000209) must be considered in the context of the standard error of the difference between two rates (pooled standard error), computed as:
If the difference between the two rates, 0.000209, is greater than 1.96 x s.e. diff, then the difference is considered statistically significant.
Bruning, J.L., and Kintz, B.L. (1977) Computational Handbook of Statistics. Scott, Foresman and Company: London.
Calculation of s.e. diff
Union County:
–p1=0.000413
–q1=0.999587
–n1=33,929
Santa Fe County:
–p2=0.000204
–q2=0.999796
–n2=1,092,565
p=proportion, q=(1-p), n is the person-years at risk, or the sum of the population counts across all eight years.
Evaluation of the Difference
Union County: 41.3/100,000 = 0.000413
Santa Fe County: 20.4/100,000 = 0.000204
Difference: 0.000413 - 0.000204 = 0.000209
s.e. diff =
1.96 * s.e. diff =
Is 0.000209 greater than 1.96 * s.e. diff? No.
Union County’s rate is greater than Santa Fe County’s rate, but the difference is NOT statistically significant.
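The comparison can be sketched in Python. The exact s.e. diff formula did not survive in this copy of the slides, so the formula below, √(p1q1/n1 + p2q2/n2), is an assumption: it is one common form of the standard error of a difference between two proportions, it uses exactly the inputs listed above (p, q, and n for each county), and with those inputs it arrives at the same "not significant" conclusion. The function name `se_diff` is made up for this example.

```python
import math


def se_diff(p1, n1, p2, n2):
    """Standard error of the difference between two proportions.

    ASSUMED form: sqrt(p1*q1/n1 + p2*q2/n2). The original slide formula
    is not available, so this may differ from the one the author used.
    """
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Inputs from the slides: rates per 100,000 and person-years at risk.
p1, n1 = 41.3 / 100_000, 33_929       # Union County
p2, n2 = 20.4 / 100_000, 1_092_565    # Santa Fe County

diff = p1 - p2
se = se_diff(p1, n1, p2, n2)
significant = abs(diff) > 1.96 * se

print(f"difference      = {diff:.6f}")
print(f"s.e. diff       = {se:.6f}")
print(f"1.96 x s.e.diff = {1.96 * se:.6f}")
print("statistically significant?", significant)
```

Note how the much smaller Union County denominator dominates the standard error: nearly all of the uncertainty in the difference comes from the smaller county.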
Tool #3. SMR and ISR
Standardized Mortality (or Morbidity) Ratio (SMR)
–Compares the observed number of deaths (or health events) with the number one would EXPECT, based on …
  The age- and sex-specific rates in a standard population (e.g., New Mexico rate)
  The age and sex distribution of the index area.
Indirectly Standardized Rates
–Use SMR to perform age adjustment when the number of cases is fewer than 20.
Standardized Mortality Ratio
The all-cause death rate in New Mexico in 2006 was deaths per 100,000 population.
All other things being equal, we should expect the same death rate in Union County.
Standardized Mortality Ratio
BUT all other things are NOT equal.
–In 2006, the percentage of the population over age 65 was 18.9% in Union County, compared with 12.3% statewide.
In an older population, we would expect a higher death rate.
Standardized Mortality Ratio
And Union County’s death rate is higher: deaths per 100,000.
IF we adjust the New Mexico death rate to account for Union County’s older population, THAT is how many deaths we should EXPECT.
Standardized Mortality Ratio for 2006 Union County, All-cause Mortality
[Table: expected deaths = (Rate x Pop) / 100,000; SMR = Observed / Expected]
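The expected-deaths calculation, (Rate x Pop) / 100,000 summed over age groups, can be sketched as below. The age groups, the standard-population rates, and the local population breakdown are all hypothetical values invented for this example (the slide's table did not survive); only the observed count of 61 and the expected count of 47.7 used at the end come from the slides.

```python
def expected_deaths(standard_rates, local_population, multiplier=100_000):
    """Sum over age groups of (standard rate x local population) / multiplier."""
    return sum(
        standard_rates[group] * local_population[group] / multiplier
        for group in local_population
    )


def smr(observed, expected):
    """Standardized Mortality Ratio: observed / expected."""
    return observed / expected


# HYPOTHETICAL inputs, for illustration only (not New Mexico data).
standard_rates = {"0-44": 150.0, "45-64": 700.0, "65+": 4500.0}  # per 100,000
local_population = {"0-44": 2_300, "45-64": 1_300, "65+": 870}

expected = expected_deaths(standard_rates, local_population)
print(f"expected deaths (hypothetical inputs): {expected:.1f}")

# Figures reported on the slides for Union County, 2006.
print(f"SMR from slide figures: {smr(61, 47.7):.2f}")
```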
SMR, Union County
An SMR <1.0 indicates less-than-expected mortality.
An SMR >1.0 indicates greater-than-expected mortality (also known as “excess mortality”).
Union County’s SMR was 1.28, so the county had excess mortality in 2006.
Was it significantly more than expected?
Indirect Age-Standardization
You should not use direct age adjustment when there are fewer than 20 (some say 25) health events.
If you multiply the New Mexico crude rate by the Union County SMR, you get the indirectly age-adjusted rate for Union County.
–Union Co. crude all-cause death rate:
–NM crude all-cause death rate:
–Union County SMR: 1.28
–Union County indirectly age-standardized rate: (still higher than the state rate, but the effects of Union County’s age distribution have been removed).
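The multiplication described above is short enough to show directly. The crude rates are missing from this copy of the slides, so the state crude rate of 800 per 100,000 used below is a hypothetical placeholder, not the actual New Mexico figure; only the SMR of 1.28 comes from the slides.

```python
def indirectly_standardized_rate(smr, standard_crude_rate):
    """Indirectly age-standardized rate = SMR x crude rate of the standard population."""
    return smr * standard_crude_rate


# HYPOTHETICAL state crude rate per 100,000 (the real value is not in the source).
hypothetical_state_crude_rate = 800.0
union_county_smr = 1.28  # from the slides

isr = indirectly_standardized_rate(union_county_smr, hypothetical_state_crude_rate)
print(f"indirectly standardized rate: {isr:.1f} per 100,000")
```

Because the SMR is greater than 1.0, the indirectly standardized rate is always higher than the state crude rate it is multiplied by, whatever that rate turns out to be.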
Confidence Interval for SMR
Observed deaths: 61 (# deaths from Vital Records data)
Expected deaths: 47.7 (# expected from SMR calculation)
SMR: 1.28 (observed / expected)
StdErr for SMR: 0.16 (SQRT(observed) / expected)
95% Confidence Interval: ± 0.32 (1.96 x StdErr)
Significance Test
Does the 95% confidence interval include 1.0?
–If "yes" -> not significant
–If "no" -> statistically significant
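The SMR confidence interval and significance check can be sketched as follows, using the figures on this slide (61 observed, 47.7 expected) and the standard error formula given there, SQRT(observed) / expected. The function name `smr_ci` is an assumption for this example.

```python
import math


def smr_ci(observed, expected, z=1.96):
    """Return (smr, lower, upper) using StdErr = sqrt(observed) / expected."""
    ratio = observed / expected
    std_err = math.sqrt(observed) / expected
    half_width = z * std_err
    return ratio, ratio - half_width, ratio + half_width


ratio, lower, upper = smr_ci(observed=61, expected=47.7)

# Significance rule from the slide: significant only if the interval excludes 1.0.
significant = not (lower <= 1.0 <= upper)

print(f"SMR = {ratio:.2f} (95% C.I., {lower:.2f} to {upper:.2f})")
print("statistically significant?", significant)
```

With these figures the interval runs from just under 1.0 to about 1.6, so it includes 1.0 and, by the rule on the slide, the excess mortality is not statistically significant.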
Summary: Statistical Tools
Use confidence intervals to assess the stability of a rate.
Use C.I. to see if your local rate is significantly different from the state rate.
A statistic called a “Test of Proportions” uses the “pooled standard error” to test whether two local rates are significantly different.
Summary: Statistical Tools
Combine data to improve the stability of your rate.
–Combine persons (e.g., broader age group)
–Combine place (larger area)
–Combine time (more years)
Summary: Statistical Tools
Use the Standardized Mortality (Morbidity) Ratio (SMR) to compare a local rate to a standard population (e.g., state overall).
The SMR “expected” can be used for indirect age-adjustment when the number of health events is fewer than 20, or if the age-specific death rates are not known.
Thanks!
Lois M. Haggard, PhD
Community Health Assessment Program, NMDOH