Public Aggregate Reporting – DHCS Business Reports Overview Linette Scott, MD, MPH Chief Medical Information Officer, DHCS July 1, 2015
Public Aggregate Reporting for DHCS Business Reports (PAR-DBR)
- HIPAA Standard for De-identification
- Overview of the PAR-DBR
- Steps of the PAR-DBR
HIPAA Standard for De-identification
Health Insurance Portability and Accountability Act (HIPAA)
- HIPAA standard for de-identification of protected health information: “Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information”
- DHCS is a HIPAA Covered Entity
http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html
HIPAA De-identification Standard Two methods described in the standard: Safe Harbor 18 identifiers of the individual or of relatives, employers, or household members of the individual must be removed In the context of other publicly available information Expert Determination
HIPAA Safe Harbor
- Names
- All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:
  - The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and
  - The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000
- All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
HIPAA Safe Harbor Cont.
- Telephone numbers
- Fax numbers
- Email addresses
- Social security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers, including license plate numbers
- Device identifiers and serial numbers
- Web Universal Resource Locators (URLs)
- Internet Protocol (IP) addresses
- Biometric identifiers, including finger and voice prints
- Full-face photographs and any comparable images
- Any other unique identifying number, characteristic, or code
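The ZIP code and age rules in the Safe Harbor list can be illustrated with a minimal sketch. The function names and the `zip3_population` lookup are hypothetical; real population counts would have to come from current Census Bureau data, and this sketch does not cover the other identifiers.

```python
def safe_harbor_zip3(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Return the initial three digits of a ZIP code, or "000" when the area
    formed by all ZIP codes sharing those digits has 20,000 or fewer people."""
    zip3 = zip_code[:3]
    # Assumption: a prefix missing from the lookup is treated as a small area.
    if zip3_population.get(zip3, 0) > 20_000:
        return zip3
    return "000"


def safe_harbor_age(age: int) -> str:
    """Report ages over 89 as the single category "90+"."""
    return "90+" if age > 89 else str(age)


# Hypothetical population counts by three-digit ZIP prefix.
example_population = {"958": 500_000, "961": 15_000}
print(safe_harbor_zip3("95814", example_population))
print(safe_harbor_zip3("96101", example_population))
print(safe_harbor_age(93))
```

Treating unknown prefixes as small areas is a conservative choice made for this sketch, not something stated in the standard.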
What Usually Leads to Expert Determination?
- Time: the time period is less than a year
- Geography: less than statewide
- Other: rare diagnosis, specific combinations of variables
Expert Determination
- Apply statistical or scientific principles
- Very small risk that the anticipated recipient could identify an individual
- Document the methods and results of the analysis that justify such determination
http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#standard
Overview of the PAR-DBR
Purpose of the PAR-DBR
- Establish guidelines to be used for reports and documents generated by DHCS programs for public release that include data (tables, charts, graphics)
- Create consistency in the analysis and presentation of data in reports and documents
- Protect confidentiality of personal data held by DHCS
- Comply with laws that govern data release
Public Aggregate Reporting DHCS Business Reports
These Guidelines provide a method and process for Expert Determination
Key Steps:
- Evaluate the Numerator
- Evaluate the Denominator
- Use the Publication Scoring Criteria
- Suppress data that has higher risk
- Departmental document review processes
Public Aggregate Reporting http://www.dhcs.ca.gov/dataandstats/statistics/Documents/3_1_Population_Distribution_Age_Gender.pdf
A Table with Suppressed Data http://www.dhcs.ca.gov/dataandstats/statistics/Documents/RASD_Issue_Brief_MC_Births.pdf
Steps of the PAR-DBR
Defining Table Cell
The cells in the table are the boxes that have values in them, as opposed to the row and column headers.

        # of Medi-Cal Members (in thousands)
Year    Fee For Service    Managed Care
2012    2,775              4,853
2011    3,067              4,527

Table cell: any of the four values in the body of the table
http://www.dhcs.ca.gov/dataandstats/statistics/Documents/1_6_Annual_Historic_Trend.pdf - Data in the Table
Defining Numerator & Denominator
- Numerator: number of events with the characteristics of the given row and column
- Denominator: the population from which the events arise

        # of Medi-Cal Members (in thousands)
Year    Fee For Service    Managed Care
2012    2,775              4,853
2011    3,067              4,527

Example for 2012:
- Numerator: # of Medi-Cal Members in Fee For Service (in thousands) = 2,775
- Denominator: # of Medi-Cal Members in 2012 (in thousands) = 7,628
http://www.dhcs.ca.gov/dataandstats/statistics/Documents/1_6_Annual_Historic_Trend.pdf - Data in the Table
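The numerator and denominator in the example above can be reproduced with a short sketch. It assumes the denominator for a year is the sum of the two columns shown, which matches the 7,628 figure on the slide. The dictionary layout and function names are illustrative only.

```python
# Values from the table above, in thousands of Medi-Cal members.
members = {
    2012: {"fee_for_service": 2_775, "managed_care": 4_853},
    2011: {"fee_for_service": 3_067, "managed_care": 4_527},
}


def numerator(year: int, column: str) -> int:
    """Count of events for one table cell (one row and one column)."""
    return members[year][column]


def denominator(year: int) -> int:
    """Population from which the events arise: here, all members in the year."""
    return sum(members[year].values())


print(numerator(2012, "fee_for_service"))  # 2775
print(denominator(2012))                   # 7628
```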
A stepwise decision tree to assess aggregate data for de-identification Serves as a tool and guideline for the Expert Determination
Reporting Assessment Decision Tree Steps 1 & 2
- A minimum cell size is set for the numerator
- A minimum value is set for the denominator
- DHCS has identified a minimum value of 11 for the numerator and a minimum value of 20,000 for the denominator
- If both minimums are met, the data in the table cell can be released; otherwise, proceed to Step 3
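Steps 1 and 2 amount to a pair of threshold checks, sketched below. The constant and function names are hypothetical, and the sketch assumes both minimums are inclusive (a numerator of exactly 11 and a denominator of exactly 20,000 pass).

```python
MIN_NUMERATOR = 11        # minimum cell size identified by DHCS
MIN_DENOMINATOR = 20_000  # minimum population identified by DHCS


def passes_steps_1_and_2(numerator: int, denominator: int) -> bool:
    """True when the cell can be released without going on to Step 3."""
    return numerator >= MIN_NUMERATOR and denominator >= MIN_DENOMINATOR


# A cell of 2,775 thousand members out of 7,628 thousand clears both minimums.
print(passes_steps_1_and_2(2_775_000, 7_628_000))  # True
# A cell of 8 events does not, so it would move on to Step 3.
print(passes_steps_1_and_2(8, 7_628_000))          # False
```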
Step 3 – Apply Publication Scoring Criteria to assess risk
Step 4 – Suppress Small Cells and Complementary Cells if score is greater than 12
Common Public Reporting Variables
- Variable: a symbol standing in for an unknown numerical value in an equation
- Common variables in health data aggregation:
  - Age
  - Sex
  - Race
  - Ethnicity
  - Time
  - Geography (State, County, Medical Service Study Area, ZIP Code)
  - Diagnosis/Condition
  - Provider (Type, Specialty, Location)
Variables - Ranges
- A given variable may have different ranges assigned to it
- Ranges assigned to the variable may be defined many ways
- Example: Age Groupings
  - 0-10, 11-20, 21-30, etc. (years old): provides equal groupings, commonly used, may not apply to a particular program
  - 0-4, 5-11, 12-18, 19-21 (years old): correlates to school environments (pre-school, elementary, junior high/high school, post-school)
Publication Scoring Criteria – Step 3

Variable             Characteristics                    Score
Sex                  Male or Female                     +1
Age Range            >10-year age range                 +2
                     6-10 year age range                +3
                     3-5 year age range                 +5
                     1-2 year age range                 +7
Race Group           White, Asian, Black
Detailed Race
Hispanic Ethnicity   yes or no
Detailed ethnicity
Language Spoken      English, Spanish, Other Language
Publication Scoring Criteria – Cont.

Variable             Characteristics                                  Score
Events (Numerator)   1000+ events in a specified population           +2
                     100-999 events                                   +3
                     11-99 events                                     +5
                     <11 events                                       +8
Geography            State or geography with population >2,000,000    -5
                     Population 560,001 - 2,000,000                   -3
                     Population 20,001 - 560,000
                     Population ≤ 20,000
Data Year            5 years aggregated
                     2-4 years aggregated
                     1 year (e.g., 2001)
                     Bi-Annual                                        +4
                     Quarterly
                     Monthly                                          +7
Compiling the Score
- The Release Scoring Criteria approximately quantify two re-identification risks:
  - size of potential population
  - variable specificity
- Add the score assigned to each variable characteristic
- If the score is more than 12, cell sizes must be 11 or more before releasing data
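Compiling the score is a simple sum, which the following sketch illustrates. The `SCORES` table holds only the values that are legible on the two scoring slides; characteristics whose scores are not shown there are deliberately left out and raise a `KeyError`. The key names and function names are hypothetical.

```python
# Scores legible on the scoring slides; rows without a visible score are omitted.
SCORES = {
    "sex": {"male or female": 1},
    "age_range": {">10 years": 2, "6-10 years": 3, "3-5 years": 5, "1-2 years": 7},
    "events": {"1000+": 2, "100-999": 3, "11-99": 5, "<11": 8},
    "geography": {"population >2,000,000": -5, "population 560,001-2,000,000": -3},
    "data_year": {"bi-annual": 4, "monthly": 7},
}

SCORE_THRESHOLD = 12


def publication_score(characteristics: dict[str, str]) -> int:
    """Add the score assigned to each variable characteristic in the table."""
    return sum(SCORES[variable][value] for variable, value in characteristics.items())


def needs_minimum_cell_size(score: int) -> bool:
    """A score above 12 means cells must be 11 or more before release."""
    return score > SCORE_THRESHOLD


table = {
    "sex": "male or female",                # +1
    "age_range": "1-2 years",               # +7
    "events": "11-99",                      # +5
    "geography": "population >2,000,000",   # -5
    "data_year": "monthly",                 # +7
}
score = publication_score(table)
print(score, needs_minimum_cell_size(score))  # 15 True
```

In this example, switching from 1-2 year age ranges to ranges wider than 10 years lowers the score to 10, which is under the threshold.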
Suppression – Step 4
- Complementary Cells
  - Any time a suppressed cell could be calculated from row or column totals, additional cells in that row and/or column will also need to be suppressed
  - The total value of the cells suppressed should be 11 or more
- Additional Aggregation Examples
  - Extend the time period included
  - Group regional geography
- Goal: large enough groups that suppression is no longer necessary
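Complementary suppression can be sketched for the simplest case: a single row whose total is published. This is an illustration under stated assumptions, not the DHCS procedure. It suppresses every cell below 11, then suppresses the next-smallest cells until at least two cells are hidden and their combined value is 11 or more. A full table would also need the same check down each column. The function name is hypothetical.

```python
from typing import Optional

MIN_CELL = 11


def suppress_row(cells: list[int]) -> list[Optional[int]]:
    """Return the row with suppressed cells replaced by None."""
    suppressed = {i for i, value in enumerate(cells) if value < MIN_CELL}
    if suppressed:
        # Candidates for complementary suppression, smallest value first.
        remaining = sorted(
            (i for i in range(len(cells)) if i not in suppressed),
            key=lambda i: cells[i],
        )
        # One hidden cell can be recovered from the row total, and a hidden
        # total under 11 is still small, so keep suppressing in either case.
        while remaining and (
            len(suppressed) < 2 or sum(cells[i] for i in suppressed) < MIN_CELL
        ):
            suppressed.add(remaining.pop(0))
    return [None if i in suppressed else value for i, value in enumerate(cells)]


print(suppress_row([3, 25, 40, 120]))  # [None, None, 40, 120]
print(suppress_row([3, 4, 50, 60]))    # [None, None, None, 60]
```

Choosing the smallest remaining cell as the complementary cell keeps the loss of published information low, but other choices are possible.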
Aggregate Variables - Examples

Higher Aggregation
- Age Groups: 0-21, 22-64, 65 and older
- Ethnicity: Hispanic or Not Hispanic
- Geography: Medical Service Study Areas (542 in CA)
- Time: three years
- Diagnosis: Diagnosis Related Groups (DRGs), Episode Grouper

Lower Aggregation
- Age Groups: one year increments (1, 2, 3, etc.)
- Race: White, Black, American Indian, Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, Native Hawaiian, Guamanian, Samoan, Other
- Geography: ZIP Code (2,591 in CA)
- Time: monthly
- Diagnosis: Specific ICD-9 Codes
Aggregation
- To achieve a minimum number in the given cell, results are combined over the associated variable:
  - Geographic areas,
  - Multiple years, or
  - Subgroups (e.g., age groups)
- The number of variables in a table will affect the amount of aggregation necessary
- For example, if the results for a 5 year age group (ages 1-5 years of age) do not yield an adequate number of cases, then the age group is extended to cover more ages (1-10 years of age)
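The age-group example above can be sketched as merging adjacent groups until each one reaches the minimum count. The function name, the tuple layout, and the counts are hypothetical, and the minimum of 11 is borrowed from the numerator rule.

```python
def merge_age_groups(
    groups: list[tuple[int, int, int]], minimum: int = 11
) -> list[tuple[int, int, int]]:
    """Merge adjacent (low_age, high_age, count) groups, sorted by age,
    until every group has at least `minimum` cases where possible."""
    merged: list[tuple[int, int, int]] = []
    for low, high, count in groups:
        if merged and merged[-1][2] < minimum:
            # The previous group is still too small: extend it to cover this one.
            prev_low, _, prev_count = merged.pop()
            merged.append((prev_low, high, prev_count + count))
        else:
            merged.append((low, high, count))
    # A small final group has no later neighbor, so fold it into the one before.
    if len(merged) > 1 and merged[-1][2] < minimum:
        _, high, count = merged.pop()
        prev_low, _, prev_count = merged.pop()
        merged.append((prev_low, high, prev_count + count))
    return merged


# Hypothetical counts: ages 1-5 have only 4 cases, so the group grows to 1-10.
counts = [(1, 5, 4), (6, 10, 9), (11, 15, 30), (16, 20, 6)]
print(merge_age_groups(counts))  # [(1, 10, 13), (11, 20, 36)]
```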
Step 5 – Approval Processes Program Management Expert Determination Office of Legal Services Privacy Team Office of Public Affairs
Public Aggregate Reporting for DHCS Business Reports A multi-step process supports public reporting Balances public reporting with protecting confidentiality Will continue to review and revise as the data landscape changes and matures DHCS is committed to supporting data publishing while being consistent with the HIPAA De-identification Standard
Thank you!