The Rocky Mountain Research Data Center
Advancing the Frontiers of Social Science: Opportunities and Challenges
Jani Little, Executive Director
Katie Genadek, Expected Administrator

The Rocky Mountain Research Data Center: A local and limitless data resource
Jani Little, Executive Director (jani.little@colorado.edu)
Phil Pendergast, Administrator (philip.m.pendergast@census.gov)
Molly Graber, Graduate Research Assistant (molly.graber@colorado.edu)
https://www.colorado.edu/rocky-mountain-research-data-center/

The Rocky Mountain Federal Statistical Research Data Center (RMRDC)
What is a Federal Statistical Research Data Center (FSRDC)?
--RMRDC is one of 30 FSRDCs around the U.S.
--A secure computing lab where restricted data, collected by federal statistical agencies, can be accessed FOR STATISTICAL PURPOSES ONLY
--FSRDCs are managed by an on-site Census employee, the administrator, who guides researchers on proposal development, enforces security guidelines, and serves as liaison with the research community.
9 in 2009 and 24 in 2016
RMRDC Consortium Members
UC Denver (Anschutz and Main Campus)

Advantages to Faculty, Grad Students, and Affiliated Researchers:
--Free access to RMRDC services and secure laboratory
--NCHS data fees paid for the first three proposals before July 1, 2020
--Researchers with continued use are expected to write grant proposals and include lab fees
Advantages of FSRDC Research:
--Microdata not available publicly
    firms and establishments
    children
--Variables not available in public versions of data sets (e.g., low level geography, precise income, occupation, race, and ethnicity)
--Full censuses and samples microdata (Decennial Census, ACS, CPS)
    better estimates of rare events, small populations, small areas
Requirements for Any FSRDC Project:
--Research projects must undergo a formal approval process with the agency that owns the data, e.g., Census, NCHS, AHRQ
--Researchers must go through a background investigation that qualifies them for “Special Sworn Status (SSS)”, which makes them unpaid Census Bureau employees.
--Results must be formally reviewed for disclosure violations before they leave the secure facility.

Currently 260 active projects; 50% are Census
Useful Data Sets for Occupation and Health
Census
--Demographic surveys: https://www.census.gov/ces/dataproducts/demographicdata.html
--Longitudinal Employer-Household Dynamics (LEHD): https://www.census.gov/ces/dataproducts/lehddata.html
    Employment History Files (EHF)
    Employer Characteristics File (ECF)
    Individual Characteristics File (ICF)
Useful Data Sets for Occupation and Health
Bureau of Labor Statistics (BLS)
--Census of Fatal Occupational Injuries (CFOI)
    Population information on fatal occupational injuries occurring in the U.S.
    Employer: Basic establishment characteristics, industry, name and address(?)
    Employee: Employee status (wage/salary/self-employed), occupation, demographics
    Injury: Date, incident details (activity performed, cause and nature of injury)
--Survey of Occupational Injuries and Illnesses (SOII)
    Nationally representative information on establishment-level injuries
    Similar employer characteristics, including name and address of establishment
    Similar injury characteristics, including time missed, detailed illness and care records
Useful Websites for Health Data
--NCHS Data: https://www.cdc.gov/rdc/b1datatype/dt122.htm
--Restricted AHRQ Data: https://meps.ahrq.gov/mepsweb/data_stats/onsite_datacenter.jsp
--NHIS Restricted Data: https://www.cdc.gov/rdc/b1datatype/Dt1225.htm
Useful Websites for Health Data
--NCHS Surveys Linked to NDI Mortality Data: https://www.cdc.gov/nchs/data-linkage/mortality.htm
--National Vital Statistics System: https://www.cdc.gov/nchs/nvss/dvs_data_release.htm
National Health and Nutrition Examination Survey (NHANES)
--Huge amount of health data potentially related to occupational/environmental exposure
    Diagnoses for many different cancer types and age of diagnosis
    Job-related asthma, tobacco exposure, sound exposure, mineral dust exposure, exhaust fumes
    Treatments for specific diagnoses: surgery, radiation treatment, medicines
--Geographic Identifiers (restricted data)
    Latitude/Longitude coordinates for household
    Census (2000, 2010) and postal geographies
--Occupation/Industry Codes (restricted data)
    4 digit codes for current and longest-held jobs
--Healthcare utilization
    Type of facility and how often is care utilized?
--Linked to National Death Index (1999-2014)
    ICD codes for cause of death including many related to occupation (e.g. fumes and vapors)
+ Family structure (restricted), demographic information, mental health, typical health behaviors (e.g. tobacco and alcohol use), etc.
National Vital Statistics System (NVSS)
--State and County geographic identifiers
--Birth, death, fetal death, linked birth/infant death microdata
--Exact dates of vital events
--Undergoes initial review by National Association for Public Health Statistics and Information Systems
--Basic demographic information
--Detailed causes of death
Geography matters
People’s social and environmental contexts impact individual outcomes. To fully understand how people’s health and wellbeing are affected by their environment, we need to draw the boundaries correctly. Some processes clearly follow administrative boundaries: variation in property tax rates, for example. Many, however, do not nest neatly into these. Where is the area affected by point polluters? Where are neighborhood boundaries in the social and physical landscape?
Data quality and denominators
In the arena of public health, drawing appropriate boundaries is part of what allows us to determine the correct denominators. Currently, tracts are often treated as a proxy for neighborhoods. In the case of survey data, drawing boundaries around naturally occurring clusters in social space directly improves data quality.
“Census tract 43.06 is the wealthiest in the group”
But the data are very uncertain
“Census tract 43.06 is the poorest in the group?”
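The reversal in the captions above can be illustrated with a small sketch. It assumes the tract estimates come from the ACS, whose margins of error are published at the 90 percent confidence level, so a standard error is the margin of error divided by 1.645. The tract labels reuse the one on the slide, but every income figure and margin of error below is hypothetical and chosen only to show the pattern.

```python
import math

Z90 = 1.645  # ACS margins of error correspond to a 90% confidence level


def interval(estimate, moe):
    """Return the (low, high) 90% confidence interval for an estimate."""
    return (estimate - moe, estimate + moe)


def significantly_different(est_a, moe_a, est_b, moe_b, z=Z90):
    """Test whether two independent estimates differ at the 90% level."""
    se_a = moe_a / Z90
    se_b = moe_b / Z90
    z_score = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z_score) > z


# Hypothetical median household incomes and margins of error (not real data).
tracts = {
    "43.06": (98000, 31000),
    "43.05": (91000, 12000),
    "44.01": (87000, 15000),
}

top_est, top_moe = tracts["43.06"]
for name, (est, moe) in tracts.items():
    low, high = interval(est, moe)
    differs = significantly_different(top_est, top_moe, est, moe)
    print(f"tract {name}: {est} (90% CI {low} to {high}), "
          f"differs from 43.06: {differs}")
```

With these made-up numbers, tract 43.06 has the highest point estimate, yet its interval reaches below the lower bound of every other tract, so the data cannot rule out that it is actually the poorest.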
Where do tracts come from?
“Census tracts are small, relatively permanent geographic entities within counties (or the statistical equivalents of counties) delineated by a committee of local data users. Generally, census tracts have between 2,500 and 8,000 residents and boundaries that follow visible features. When first established, census tracts are to be as homogeneous as possible with respect to population characteristics, economic status, and living conditions.”
–Geographic Areas Reference Manual, Bureau of the Census
The value of using RDC microdata
What patterns would emerge if we had a map of all households in the nation? How might our understanding of context, and its influence on health outcomes, change?
Master Address File (MAF): RDC microdata allow us to link demographic variables with the latitude and longitude of (almost) all households
New conceptualizations of neighborhoods
--Egocentric neighborhoods: how does context change with scale?
--Randomized geographic boundaries: how much do measures of context vary if we change the shape and size of enumeration units?
--Street-based geographies: what happens if we treat streets as meaningful units of social space, rather than boundaries?
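A minimal sketch of the egocentric idea, assuming only that each household has a latitude, a longitude, and a 0/1 group indicator. The helper names, the radii, and the synthetic household locations are all hypothetical; this is not the project's actual method and uses no restricted data. For each household it measures the share of neighbors within a given radius who belong to the group, then repeats the measurement at several scales.

```python
import math
import random
import statistics

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def egocentric_share(households, index, radius_km):
    """Share of group members among neighbors within radius_km of one household.

    households is a list of (lat, lon, group) tuples with group equal to 0 or 1.
    The household itself is excluded; returns None if it has no neighbors.
    """
    lat0, lon0, _ = households[index]
    neighbors = [
        group
        for i, (lat, lon, group) in enumerate(households)
        if i != index and haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]
    if not neighbors:
        return None
    return sum(neighbors) / len(neighbors)


# Synthetic households (hypothetical): group membership is more likely in the west.
random.seed(0)
households = []
for _ in range(200):
    lat = 40.0 + random.uniform(0, 0.05)
    lon = -105.3 + random.uniform(0, 0.05)
    group = 1 if random.random() < (0.8 if lon < -105.275 else 0.2) else 0
    households.append((lat, lon, group))

for radius in (0.25, 1.0, 4.0):
    shares = [egocentric_share(households, i, radius)
              for i in range(len(households))]
    shares = [s for s in shares if s is not None]
    print(f"radius {radius} km: mean share {statistics.mean(shares):.2f}, "
          f"std dev across households {statistics.pstdev(shares):.2f}")
```

Comparing the spread of the shares across radii is one simple way to ask how context changes with scale.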
Some preliminary findings
--There is significant congregation of ethnoracial groups at very small scales, a concentration that isn’t visible at larger spatial units.
--Context varies by income group, and patterns vary by city in a way that extends beyond the overall racial composition.
Standard Deviation of Individual Context: how much variation is there in local context?
The data available in the RDC, particularly the MAF, allow us to rethink how we define study areas and baselines. Based on some ongoing work, we cannot simply trust that existing units are the correct shape and scale to capture socioeconomic processes.
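One way to see why unit shape and scale matter is to hold the households fixed and move the boundaries. The sketch below is an illustration under stated assumptions, not the analysis behind these slides: households sit on a hypothetical planar grid measured in kilometers, the enumeration units are square cells, and the function names, cell sizes, and data are all invented. It records the context measured for one household as the grid's origin and cell size are randomized.

```python
import math
import random


def cell_id(x, y, size, ox, oy):
    """Identify the square grid cell containing point (x, y)."""
    return (math.floor((x - ox) / size), math.floor((y - oy) / size))


def context_in_cell(points, target_index, size, ox=0.0, oy=0.0):
    """Share of group members in the cell that contains the target household.

    points is a list of (x_km, y_km, group) tuples with group equal to 0 or 1.
    The target household counts toward its own cell, so the cell is never empty.
    """
    tx, ty, _ = points[target_index]
    target_cell = cell_id(tx, ty, size, ox, oy)
    members = [g for (x, y, g) in points
               if cell_id(x, y, size, ox, oy) == target_cell]
    return sum(members) / len(members)


# Synthetic households (hypothetical): the group clusters in the west of a 10 km square.
random.seed(1)
points = []
for _ in range(500):
    x, y = random.uniform(0, 10), random.uniform(0, 10)
    group = 1 if random.random() < (0.7 if x < 5 else 0.3) else 0
    points.append((x, y, group))

# Measure the same household's context under 200 randomized grids.
measured = []
for _ in range(200):
    size = random.choice([0.5, 1.0, 2.0, 4.0])
    ox, oy = random.uniform(0, size), random.uniform(0, size)
    measured.append(context_in_cell(points, 0, size, ox, oy))

print(f"household 0 context ranges from {min(measured):.2f} "
      f"to {max(measured):.2f} across randomized grids")
```

The household never moves, so any spread in the measured values comes entirely from where the boundaries happened to fall.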
I can help! molly.graber@colorado.edu