© 2010 Jones and Bartlett Publishers, LLC
Chapter 4: Design Strategies and Statistical Methods Used in Descriptive Epidemiology
What is a study design?
- The program that directs the researcher along the path of systematically collecting, analyzing, and interpreting data
- It allows for descriptive assessment of events and for statistical inference concerning relationships between exposure and disease, and it defines the domain for generalizing the results
Objective: Define descriptive epidemiology
- A means of organizing, summarizing, and describing epidemiologic data by person, place, and time
- Descriptive statistics can take various forms, including tables, graphs, and numerical summary measures
- Applying statistical methods makes it possible to describe the public health problem effectively
Why is descriptive epidemiology helpful?
- Provides information about a disease or condition
- Provides clues to identify a new disease or adverse health effect
- Identifies the extent of the public health problem
- Obtains a description of the public health problem that can be easily communicated
- Identifies the population at greatest risk
- Assists in planning and resource allocation
- Identifies avenues for future research
Objective: Describe uses, strengths, and limitations of descriptive study designs
Four types of descriptive studies:
1. Ecologic studies
2. Case reports
3. Case series
4. Cross-sectional surveys
Ecologic study
- Involves aggregated data on the population level
- Subject to the ecologic fallacy: an association observed at the group level may not hold for individuals
Figure: Correlation between eating 5 or more servings of fruits and vegetables per day and being overweight (United States and US territories, 2007)
Case reports and case series
- A case report involves a profile of a single individual
- A case series involves a small group of patients with a similar diagnosis
- Both provide evidence for larger-scale studies (hypothesis generating)
Cross-sectional survey (sometimes called prevalence survey)
- Conducted over a short period of time (usually a few days or weeks)
- The unit of analysis is the individual
- There is no follow-up period
Cross-sectional survey: Strengths
- Can be used to study several associations at once
- Can be conducted over a short period of time
- Produces prevalence data
- Biases due to observation (recall and interviewer bias) and loss to follow-up do not exist
- Can provide evidence of the need for an analytic epidemiologic study
Cross-sectional survey: Weaknesses
- Unable to establish the sequence of events
- Infeasible for studying rare conditions
- Potentially influenced by response bias
Serial surveys
Cross-sectional surveys that are routinely conducted, for example:
- U.S. Census
- Behavioral Risk Factor Surveillance System
- National Health Interview Survey
- National Hospital Discharge Survey
Figure: Percentages of overweight and obese adults in the United States, beginning in 1990. From the CDC's BRFSS annual survey.
- China: 1.32 billion (19.84%)
- India: 1.13 billion (16.96%)
- United States: 4.56%
- Indonesia: 3.47%
- Brazil: 2.8%
- Pakistan: 163 million (2.44%)
- Bangladesh: 2.38%
- Nigeria: 148 million (2.22%)
- Russia: 142 million (2.13%)
- Japan: 1.92%
- Mexico: 1.6%
- Philippines: 88.7 million (1.33%)
- Vietnam: 87.4 million (1.31%)
- Germany: 82.2 million (1.23%)
- Ethiopia: 77.1 million (1.16%)
Approximately 4.3 billion people live in these 15 countries, representing roughly two-thirds of the world's population.
Monthly World Population
Date         Population       Increase from previous month
7/1/2007     6,600,411,051
8/1/2007     6,606,949,106    6,538,055
9/1/2007     6,613,487,162    6,538,056
10/1/2007    6,619,814,313    6,327,151
11/1/2007    6,626,352,369    6,538,056
12/1/2007    6,632,679,520    6,327,151
1/1/2008     6,639,217,576    6,538,056
2/1/2008     6,645,755,632    6,538,056
3/1/2008     6,651,871,878    6,116,246
4/1/2008     6,658,409,934    6,538,056
5/1/2008     6,664,737,085    6,327,151
6/1/2008     6,671,275,141    6,538,056
7/1/2008     6,677,602,292    6,327,151
Objective: Define the four general types of data
- Nominal data (dichotomous or binary)
- Ordinal data
- Discrete data
- Continuous data
Types of data
- Nominal: categorical, with unordered categories. Two levels (dichotomous), e.g., sex, disease (yes, no). More than two levels (multichotomous), e.g., race, marital status, education status.
- Ordinal: categorical, with informative ordering, e.g., preference rating (agree, neutral, disagree).
- Discrete: quantitative, integers, e.g., number of cases.
- Continuous: quantitative, values on a continuum, e.g., dose of ionizing radiation.
Objective: Define ratio, proportion, and rate
- Ratios, proportions, and rates are commonly used measures for describing dichotomous data
- The general formula for a ratio, proportion, or rate is (x/y) × 10^n
- 10^n is called the rate base, with typical values of n = 0, 1, ..., 5
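The general formula can be sketched in a few lines of Python. This is only an illustration: the function name `scaled_measure` and the counts (25 events in a population of 50,000) are hypothetical and do not come from the slides.

```python
def scaled_measure(x: float, y: float, n: int = 0) -> float:
    """Return (x / y) * 10**n, where 10**n is the rate base."""
    if y == 0:
        raise ValueError("denominator y must be nonzero")
    return (x / y) * 10 ** n


# Hypothetical counts, for illustration only.
events, population = 25, 50_000

as_fraction = scaled_measure(events, population)        # rate base 10**0 = 1
per_100k = scaled_measure(events, population, n=5)      # rate base 10**5 = 100,000

print(f"{as_fraction:.4f} per person, or {per_100k:.1f} per 100,000")
```

Choosing a larger rate base simply rescales the same quantity so that small fractions are easier to read and compare.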
Ratio
- In a ratio, the values of x and y are independent, such that the values of x are not contained in y
- The rate base for a ratio is typically 1
Proportion
- In a proportion, x is contained in y
- A proportion is typically expressed as a percentage, such that the rate base is 100
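A small sketch may help contrast a ratio with a proportion. The case counts below (30 male cases, 20 female cases) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical case counts, for illustration only.
male_cases = 30
female_cases = 20
total_cases = male_cases + female_cases

# Ratio: x (male cases) is NOT contained in y (female cases); rate base 1.
male_to_female_ratio = male_cases / female_cases

# Proportion: x (male cases) IS contained in y (all cases); rate base 100.
percent_male = male_cases / total_cases * 100

print(f"Male-to-female case ratio: {male_to_female_ratio:.1f}")
print(f"Percentage of cases that are male: {percent_male:.0f}%")
```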
Rate
- A rate may be thought of as a proportion with the addition that it represents the number of health-related states or events in a population over a specified time period
Rate equations
Cumulative incidence rate (attack rate)
- Tends to describe diseases or events that affect a larger proportion of the population than the conventional incidence rate does
Objective: Distinguish between crude and age-adjusted rates
- The crude rate of an outcome is calculated without any restrictions, such as by age or sex, on who is counted in the numerator or denominator
- Crude rates are of limited use for comparisons between subgroups of the population or over time because of potential confounding influences, such as differences in the age distribution between groups
Example of the importance of age-adjustment
- In 2002, the crude mortality rate in Florida was 1,096 per 100,000 compared with 579 per 100,000 in Utah
- The crude mortality rate ratio is 1.9, meaning the rate in Florida is 1.9 times that in Utah (90% higher)
- However, the age distribution differs considerably between Florida and Utah. In Florida, 6.3% of the population is under five years of age and 16.7% is 65 years and older. The corresponding percentages in Utah are 9.8% and 8.5%.
Example of the importance of age-adjustment (continued)
- Using the direct method of age-adjustment based on the 2000 US standard population yielded rates of 762 per 100,000 in Florida and 782 per 100,000 in Utah
- Thus, after adjusting for differences in the age distribution, the rate in Florida is 0.97 times that in Utah
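The Florida and Utah comparison can be checked with a short calculation. The four rates are the ones quoted in the example; the variable names are illustrative.

```python
# Rates per 100,000, as quoted in the Florida/Utah example (2002).
crude_florida, crude_utah = 1096, 579
adjusted_florida, adjusted_utah = 762, 782

crude_rate_ratio = crude_florida / crude_utah
adjusted_rate_ratio = adjusted_florida / adjusted_utah

print(f"Crude rate ratio (Florida/Utah):        {crude_rate_ratio:.2f}")
print(f"Age-adjusted rate ratio (Florida/Utah): {adjusted_rate_ratio:.2f}")
```

The crude ratio is well above 1, while the age-adjusted ratio is slightly below 1, which suggests that most of the crude difference reflects Florida's older population rather than higher age-specific mortality.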
Figure: US crude and age-adjusted (to the 2000 US standard population) rates for all-cause mortality and all malignant cancers according to year. Data from the National Cancer Institute.
Two methods for calculating age-adjusted rates
- Direct
- Indirect
Table: Age-specific and overall all malignant cancer incidence rates among males and females. Columns: Age; Counts, Population, and Rate per 100,000 for males; Counts, Population, and Rate per 100,000 for females.
Direct method
- Suppose that we want to know the rate for females assuming they had the same age distribution as males
- To do this, we multiply the age-specific female cancer rates by the age-specific population values for males to get the expected number of cases for females in each age group, assuming they had the same age distribution as males
- These expected counts are then summed and divided by the total male population
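The three steps of the direct method can be sketched as a small function. This is a minimal illustration under stated assumptions: the function name `direct_adjusted_rate` and all of the numbers are hypothetical, since the slide's own table values are not reproduced here.

```python
def direct_adjusted_rate(age_specific_rates, standard_population):
    """Directly age-adjusted rate.

    age_specific_rates: rates (per person) in the group being adjusted
    standard_population: population counts of the standard group,
                         in the same age-group order
    """
    if len(age_specific_rates) != len(standard_population):
        raise ValueError("both inputs must cover the same age groups")
    # Expected cases per age group if the standard population
    # experienced the other group's age-specific rates.
    expected = [r * p for r, p in zip(age_specific_rates, standard_population)]
    # Sum the expected counts and divide by the total standard population.
    return sum(expected) / sum(standard_population)


# Hypothetical values, for illustration only.
female_rates = [0.001, 0.005, 0.020]      # age-specific rates per person
male_population = [5000, 3000, 2000]      # standard population by age group

adjusted = direct_adjusted_rate(female_rates, male_population)
print(f"Age-adjusted rate: {adjusted * 100_000:.0f} per 100,000")
```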
Direct method
- The direct method yields the malignant cancer rate for females age-adjusted to the male population
- The crude rate for males is 1.09 times that for females (9% higher)
- After age adjustment, that is, if females had the same age distribution as males, malignant cancer incidence would be 28% higher for males than for females, as opposed to the 9% higher found using crude rates
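Example 2
- Attack rate in each age group = number of deaths / population
- Population A: age groups with populations of 1,000, 4,000, and 6,000 (attack rate .020 in the last group); total population 11,000, overall attack rate .0146
- Population B: age groups with populations of 5,000, 2,000, and 500 (attack rate .020 in the last group); total population 7,500, overall attack rate .0187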
Example 2 Continued
Expected deaths = Population A × attack rate of Population B, by age group:
Population A    Attack rate, Pop. B    Expected
1,000           .024                   24
4,000           .005                   20
6,000           .020                   120
Total 11,000                           164
- Age-adjusted rate: 164/11,000 = .0149
- Crude rate ratio: .0146/.0187 = .78 (22% lower in population A)
- Adjusted rate ratio: .0146/.0149 = .98 (2% lower in population A)
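The arithmetic in Example 2 can be reproduced with the figures given on the slides: the Population A age-group sizes (1,000, 4,000, 6,000), the Population B attack rates (.024, .005, .020), and the two crude rates (.0146 and .0187). The variable names are illustrative.

```python
# Figures from Example 2.
population_a = [1000, 4000, 6000]        # Population A by age group (standard)
attack_rate_b = [0.024, 0.005, 0.020]    # Population B age-specific attack rates
crude_rate_a = 0.0146
crude_rate_b = 0.0187

# Expected deaths if Population A experienced Population B's rates.
expected = [pop * rate for pop, rate in zip(population_a, attack_rate_b)]
adjusted_rate_b = sum(expected) / sum(population_a)

crude_rate_ratio = crude_rate_a / crude_rate_b
adjusted_rate_ratio = crude_rate_a / adjusted_rate_b

print(f"Expected deaths: {sum(expected):.0f}")
print(f"Rate for B adjusted to A's age distribution: {adjusted_rate_b:.4f}")
print(f"Crude rate ratio (A/B):    {crude_rate_ratio:.2f}")
print(f"Adjusted rate ratio (A/B): {adjusted_rate_ratio:.2f}")
```

Population A's own crude rate serves as its adjusted rate here because Population A is the standard.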
Objective: Define the standardized morbidity (or mortality) ratio
- In situations where age-specific rates are unstable because of small numbers, or some are simply missing, age-adjustment is still possible using the indirect method
Standardized morbidity (or mortality) ratio (SMR)
- SMR = Observed / Expected
Interpretation of the SMR
- SMR = 1: The health-related states or events observed were the same as expected from the age-specific rates in the standard population
- SMR > 1: More health-related states or events were observed than expected from the age-specific rates in the standard population
- SMR < 1: Fewer health-related states or events were observed than expected from the age-specific rates in the standard population
Example of SMR
- Suppose that some or all of the female age-specific counts are unavailable, but that the total count is available
- Further suppose that the age-specific rates for males can be calculated
- Now multiply the age-specific rates in the male (standard) population by the age-specific female population values to obtain the expected number of all malignant cancer cases in each age group (see the following table)
Table: Data for calculating the age-adjusted malignant cancer rate for females using the indirect method. Columns: Age; Male rate per 100,000; Female population; Expected counts.
Example of SMR
- Sum the expected counts to obtain the total number of expected malignant cancers in the comparison population
- The SMR is the observed total divided by this expected total
- This ratio indicates that about 25% fewer malignant cancer cases were observed in females than expected from the age-specific rates of males
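The indirect method can be sketched as follows. All names and numbers here are hypothetical assumptions made for illustration; they are not the values from the slide's table.

```python
def expected_counts(standard_rates_per_100k, comparison_population):
    """Expected cases per age group: standard rate x comparison population."""
    if len(standard_rates_per_100k) != len(comparison_population):
        raise ValueError("both inputs must cover the same age groups")
    return [
        rate / 100_000 * pop
        for rate, pop in zip(standard_rates_per_100k, comparison_population)
    ]


def smr(observed_total, expected_total):
    """Standardized morbidity (or mortality) ratio: observed / expected."""
    return observed_total / expected_total


# Hypothetical values, for illustration only.
male_rates_per_100k = [50, 400, 1500]           # standard population rates
female_population = [200_000, 100_000, 50_000]  # comparison population
observed_female_cases = 1000                    # only the total is known

expected = expected_counts(male_rates_per_100k, female_population)
ratio = smr(observed_female_cases, sum(expected))
print(f"Expected cases: {sum(expected):.0f}, SMR: {ratio:.2f}")
```

Only the total observed count is needed for the comparison population, which is why the indirect method still works when age-specific counts are missing or unstable.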
Example 2 - Indirect Method
- Population A (standard): age groups with populations of 1,000, 2,000, and 3,000; attack rate = number of deaths / population in each group (.030 in the last group)
- Population B: age-specific numbers of deaths are not available; only the total is known
Example 2 Continued
- Expected deaths in each age group = Population B × attack rate of Population A
- SMR = Observed/Expected = 95/73 = 1.3
- The ratio indicates 30% more deaths than expected, based on the age-specific rates of population A (the standard population)
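The final step of this example is a single division. The observed (95) and expected (73) totals are taken from the slide; the variable names are illustrative.

```python
observed_deaths = 95   # total deaths observed in population B
expected_deaths = 73   # total expected from population A's age-specific rates

smr = observed_deaths / expected_deaths
percent_excess = (smr - 1) * 100

print(f"SMR = {smr:.1f} ({percent_excess:.0f}% more deaths than expected)")
```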
Objective: Be familiar with tables, graphs, and numerical methods for describing epidemiologic data
Tables
- Line listing
- Frequency distribution
Graphs
- Bar chart, pie chart
- Histogram
- Epidemic curve
- Box plot
- Two-way (or bivariate) scatter plot
- Spot map
- Area map
- Line graph
Figure: Breast cancer incidence rates among white women in Utah (by LDS status) and SEER (without Utah), by year of diagnosis
Table: Prevalence of Selected Invasive Cancer Sites According to Their Incidence and Relative Survival Rate Combinations
- Columns (5-year relative survival rate, RSR): Poor (<30%), Medium (30-80%), Good (>80%)
- Low incidence (<20): Stomach, Cervix, NHL, Melanomas-Skin
- Medium incidence: Lung & Bronchus, Colorectal
- High incidence (100+): Breast (female), Prostate
What cancer is more common?
- For U.S. Whites, compare the female breast cancer rate with the male prostate cancer rate, each expressed per 100,000
- Breast cancer in women
- Prostate cancer in men
Numerical methods
Measures of central tendency
- Mean
- Median
- Mode
Measures of dispersion
- Range
- Inter-quartile range
- Variance
- Standard deviation
- Coefficient of variation
- Empirical rule
- Chebyshev's inequality
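Most of the measures listed above can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical data set; the quartiles come from `statistics.quantiles` with its default method, and other software may use a slightly different quartile convention.

```python
import statistics

# Hypothetical data, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of dispersion
data_range = max(values) - min(values)
q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
iqr = q3 - q1
variance = statistics.variance(values)          # sample variance (n - 1)
std_dev = statistics.stdev(values)              # sample standard deviation
cv = std_dev / mean                             # coefficient of variation

# Chebyshev's inequality: at least 1 - 1/k**2 of the values lie
# within k standard deviations of the mean (for any k > 1).
k = 2
chebyshev_bound = 1 - 1 / k**2
within_k_sd = sum(abs(v - mean) <= k * std_dev for v in values) / len(values)

print(f"mean={mean}, median={median}, mode={mode}, range={data_range}")
print(f"IQR={iqr:.2f}, variance={variance:.2f}, SD={std_dev:.2f}, CV={cv:.2f}")
print(f"Within {k} SD: {within_k_sd:.0%} (Chebyshev guarantees at least {chebyshev_bound:.0%})")
```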
Objective: Be familiar with measures for evaluating the strength of the association between variables
For discrete and continuous variables
- Correlation coefficient (denoted by r)
- Coefficient of determination (denoted by r^2)
- Spearman's rank correlation coefficient
- Slope coefficient based on regression analysis
- Slope coefficient based on multiple regression analysis
For nominal and ordinal variables
- Spearman's rank correlation coefficient
- Slope coefficient based on logistic regression analysis
- Slope coefficient based on multiple logistic regression analysis
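For two quantitative variables, the correlation coefficient, the coefficient of determination, and the simple regression slope can all be derived from the same sums of squares. The sketch below writes them out by hand on hypothetical data; it covers only the Pearson correlation and the simple (one-predictor) least-squares slope, not the Spearman or logistic measures.

```python
import math

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of squares and cross-products about the means.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient
r_squared = r ** 2                  # coefficient of determination
slope = s_xy / s_xx                 # least-squares slope of y on x

print(f"r = {r:.3f}, r^2 = {r_squared:.2f}, slope = {slope:.2f}")
```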
Other measures of association
- In analytic epidemiologic studies, the risk ratio (also called relative risk) and the odds ratio are commonly used to measure association, as will be discussed in a later chapter