Vital Event Data Release Scoring Criteria
5 June 2005
NAPHSIS Meeting, Cincinnati, OH
Mark Flotow
Illinois Center for Health Statistics, IDPH
Quantifying Experience
The purpose of our Vital Event Data Release Scoring Criteria is to establish clear guidelines for determining whether tabulated data can be released to any requester.
- The criteria ARE for providing safeguards and assurances toward maintaining the confidentiality of the persons who comprise the data (a re-identification issue).
- The criteria are NOT for determining whether vital event counts are too small for calculating stable or reliable point estimates or rates (a statistical issue).
Bruce Cohen (MA) suggested that the following would be reasonable criteria and goals for a tabular data release system:
1) Protect confidentiality of individuals
2) Be simple and clear
3) Be programmable electronically
4) Be sensitive and flexible
Regarding maintaining confidentiality, where does your level of comfort begin to drop off appreciably in the following data request example?

Number of births in Chicago (or your jurisdiction’s most populous city)...
- By five-year age group of mother...
- Who delivered a low birthweight baby...
- And had no or only third trimester prenatal care...
- Cross-tabulated by four race categories...
- By census tract...
- For June-August 2003.
Data Release Guidelines in IDPH

Data Request -> Confidential data? -> DRRC*

What constitutes confidential data?
- data that contain direct individual identifiers
- aggregated data that could lead to re-identification of individuals indirectly

What is NOT confidential data?
- public use data sets (e.g., sterilized files)
- aggregated data that could not lead to re-identification of individuals

* DRRC = Data Release and Research Committee
The purpose of the Vital Event Data Release Scoring Criteria is to aid the determination of:
- confidential data (DRRC’s purview)
- data that require a minimum cell size or partial suppression* when releasing, to protect confidentiality
- data that can be released with no restrictions on cell size

* complementary suppression is currently used
The Scoring Release Criteria are based on the following principles:
- the greater the degree of cross-tabulation of vital event variables, the greater the likelihood of re-identification
- the more categories or detail in any one vital event variable, the more it contributes to re-identification
- some vital event variables can lead toward re-identification to a greater degree than others
- the greater the aggregation of time and geography, the lesser the likelihood of re-identification
The Scoring Release Criteria are generally based on the following research:

Dr. Tiefu Shen’s (of IDPH) “uniqueness” SAS program, now used as a NAACCR standard*

This is a “brute force” program that checks all the combinations of variables at an individual record level to determine the uniqueness of each combination. This is a test of potential confidentiality breach.

* Please see [...]
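To make the "brute force" idea concrete, here is a minimal Python sketch of a uniqueness check. It is not Dr. Shen's SAS program and makes no claim to match it: the function name `unique_proportions`, the variable names, and the toy records are all hypothetical, invented only for illustration. It assumes "unique" means that a record's values on a given combination of variables occur exactly once in the file.

```python
from collections import Counter
from itertools import combinations


def unique_proportions(records, variables):
    """For every combination of variables, return the share of records
    whose values on that combination occur exactly once in the file."""
    results = {}
    if not records:
        return results
    for size in range(1, len(variables) + 1):
        for combo in combinations(variables, size):
            # Count how many records share each value pattern on this combination.
            counts = Counter(tuple(rec[v] for v in combo) for rec in records)
            # A pattern seen once corresponds to exactly one (unique) record.
            n_unique = sum(1 for c in counts.values() if c == 1)
            results[combo] = n_unique / len(records)
    return results


# Hypothetical toy records (assumption: not real vital event data).
records = [
    {"sex": "F", "age_group": "20-24", "race": "A"},
    {"sex": "F", "age_group": "20-24", "race": "B"},
    {"sex": "M", "age_group": "25-29", "race": "A"},
    {"sex": "M", "age_group": "20-24", "race": "A"},
]
proportions = unique_proportions(records, ["sex", "age_group", "race"])

for combo, share in proportions.items():
    print(" x ".join(combo), f"{share:.2f}")
```

In this toy file no record is unique on sex alone, half are unique on sex by age group, and all are unique once race is added, which is the pattern the scoring criteria are meant to anticipate.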
The Scoring Release Criteria are generally based on the following research (continued):

In general, the proportion of unique records increases as the number of variables in a cross-tabulation or combination increases.

Among cancer data files, a variable of five-year age groups contributes to uniqueness more than race category and much more than sex.
Score Values for Vital Event Data Release

Variable            Characteristic                  Score
Sex                                                 +1
Age                 >10-year age groups             [...]
                    [...]-year age groups           [...]
                    [...]-year age groups           [...]
                    [...]-year age groups           +7
Race group          any                             +3
Hispanic ethnicity  yes or no                       +2
                    detailed ethnicity              +3
Cause of death      1,000+ deaths for geography     [...]
                    100-999 deaths                  +3
                    <100 deaths                     +5
Score Values (continued)

Variable            Characteristic                              Score
Geography           Illinois, Chicago, Cook County              -5
                    CA, city, or county > 20,000 pop.            0
                    CA, city, or county of 20,000 pop. or less  +5
Data year           5 years aggregated                          [...]
                    [...] years aggregated                       0
                    1 year (e.g., 2001)                         +3
                    quarter                                     +5
Other variables     < 5 groups or categories                    [...]
                    [...] groups                                [...]
                    [...] groups                                +7
Release Scoring Criteria Process: for the resulting most detailed cross-tabulation, add up the scores based on the point values. If the score is...
- < 9: data are releasable, with no minimum cell size
- 9-11: discuss with supervisor whether the data are okay as is or need to have categories aggregated
- 12+: cell sizes must be 12 or more before releasing data; otherwise, small-size cells’ data must be suppressed
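The thresholds above are simple enough to program, in line with the goal that the criteria can be applied electronically. The following is a minimal Python sketch, assuming whole-number scores; the function names `total_score` and `release_action` and the wording of the returned strings are hypothetical, chosen only for illustration.

```python
def total_score(component_scores):
    """Add up the point values for the most detailed cross-tabulation."""
    return sum(component_scores)


def release_action(score):
    """Map a total score to the action given by the three score bands."""
    if score < 9:
        return "release; no minimum cell size"
    if score <= 11:
        return "discuss with supervisor"
    return "cells must be 12 or more; suppress smaller cells"


# Boundary totals, to show where each band begins and ends.
for score in (8, 9, 11, 12):
    print(score, "->", release_action(score))
```

A request is scored once, on its most detailed cross-tabulation, so `total_score` would be called with one point value per variable in that tabulation.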
Example 1
Births by 12 birthweight groups for Chicago for 1995
Scoring = +7 (other variables) -5 (geography) +3 (data year) = 5
Action: data can be released, as is, regardless of cell size.
Example 2
Teen suicides by race categories for Chicago Community Areas (CAs) for [...] (combined)
Scoring = +3 (age) +3 (race) +5 (geography) +0 (data year) = 11
Action: discuss with supervisor; data for CAs with fewer than 12 suicides would likely be suppressed.
Revisiting our level of comfort example, here’s where ICHS’s level drops off...

Number of births in Chicago...
- By five-year age group of mother (age): okay to release
- Who delivered a low birthweight baby (other variables): still okay
- And had no or only third trimester prenatal care (other variables): “discuss with supervisor”
- Cross-tabulated by four race categories (race group): suppress small cell data
- By census tract (geography): suppress small cell data
- For June-Aug (data year): suppress small cell data
Total score = +5 (age) +3 (other var.) +3 (other var.) +3 (race) +5 (geography) +5 (data year) = 24

Possible alternatives:
- Chicago only and 2 years of data = [...] = 9
- no age of mother and 5 years of data = [...] = 9
- make two separate requests:
  a) births by census tract for June-Aug = [...] = 10
  b) Chicago births by LBW by prenatal care by race categories = [...] = 9
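The comfort-level walk-through can be checked by accumulating the component scores in the order the request is built up. The sketch below is illustrative only: the step labels and the `release_action` helper are hypothetical names, the point values are the ones listed in the total score, and, as in that total, geography is scored once at the census-tract level.

```python
from itertools import accumulate


def release_action(score):
    """Action for a running total, using the <9, 9-11, and 12+ bands."""
    if score < 9:
        return "okay to release"
    if score <= 11:
        return "discuss with supervisor"
    return "suppress small cell data"


# (request element, point value) in the order the request adds detail.
steps = [
    ("five-year age group of mother", 5),
    ("low birthweight", 3),
    ("prenatal care", 3),
    ("four race categories", 3),
    ("census tract", 5),
    ("June-Aug (quarter)", 5),
]

# Running totals after each element is added to the cross-tabulation.
running = list(accumulate(points for _, points in steps))

for (label, points), total in zip(steps, running):
    print(f"{label:32s} {points:+d} -> {total:2d}  {release_action(total)}")
```

The running totals are 5, 8, 11, 14, 19, and 24, so the action changes to "discuss with supervisor" at the prenatal care step and to suppression once race is added.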
These Scoring Release Criteria are your starting point. They are designed to be flexible; experiment and compare them with how you release vital event tabulations now.
- Add scores for variables that are frequently requested.
- Change variable characteristics to reflect your commonly used categories or selections (e.g., levels of geography).
- Consider changing the thresholds to better meet your (agency’s) “comfort levels.”
Contact information:
Mark Flotow
Illinois Center for Health Statistics
Illinois Department of Public Health
525 West Jefferson Street
Springfield, IL
Telephone: (217)
Fax: (217)