Published by Ashlynn Lucy Reeves. Modified over 9 years ago.
1
Making Sense of Census Data
Robert Matthews
University of Alabama at Birmingham
2
Introduction
- Our cohort consists of a 5% sample of the entire U.S. Medicare population from 1999-2006
- Zip+4 (9-digit) information is available for 99.9% of beneficiaries and providers
- The task was to link our cohort with the census data to obtain demographic variables: educational attainment, median household income, and total population
3
Hierarchical Relationships of Census Geographic Structures
Source: U.S. Census Bureau, Summary File 3 documentation
4
Summary Files: “Short Form”
- Summary File 1 (SF 1) – data from the Short Form questions
- Summary File 2 (SF 2) – data from the Short Form questions, repeated for 249 population groups
- Redistricting Data – used for congressional and state redistricting
5
Summary Files: “Long Form”
- The Long Form was asked of only a sample of the U.S. population (1 in 6 households)
- Summary File 3 (SF 3) – comprehensive results from the Long Form
- Summary File 4 (SF 4) – comprehensive results from the Long Form, repeated for 335 population groups
6
Summary File 3 components
- 53 sets of files
  - 50 U.S. States
  - District of Columbia (D.C.)
  - Puerto Rico
  - All states combined
- 53 x 77 = 4,081 files (each set holds 77 files: 1 Geographic Identifier file plus 76 data files)
7
Linking Census and Medicare data
- The Census Block Group is used to link Census and Medicare data
- The census block is a 4-character variable, and the block group is identified by the value in the first position
- We obtained a database of 66 million Zip+4 codes from Melissa Data so that we could get the census tract and block for each Zip+4 code
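The block-to-block-group rule on this slide can be sketched in a few lines. The presentation's own work was done in SAS; the Python below is only an illustration. The function names and the idea of a (state, county, tract, block group) linking key are assumptions, not something the slides specify.

```python
def block_group(block: str) -> str:
    """Return the block group: the first character of a 4-character block code."""
    code = block.strip()
    if len(code) != 4:
        raise ValueError(f"expected a 4-character census block, got {block!r}")
    return code[0]


def block_group_key(state: str, county: str, tract: str, block: str) -> tuple:
    """Build a hypothetical linking key that identifies one block group.

    The slides only say that tract and block are looked up for each Zip+4
    code; including state and county in the key is an assumption made here
    so that the key is unique across areas.
    """
    return (state, county, tract, block_group(block))


# Made-up block code, for illustration only.
print(block_group("2013"))  # prints 2
```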
8
Variables of interest
Tables mapped to files using the File Segmentation Table:

Variable description        Table   File
Educational attainment      P37     04
Median household income     P53     06
Total population            P1      01
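One way to keep this table-to-file mapping handy while extracting data is a small lookup. This is a sketch in Python rather than the SAS used in the original work; the dictionary holds only the three mappings listed on this slide, and `files_needed` is a hypothetical helper name.

```python
# Table-to-file mapping as listed on the slide (P37 -> 04, P53 -> 06, P1 -> 01).
TABLE_TO_FILE = {"P37": "04", "P53": "06", "P1": "01"}


def files_needed(tables):
    """Return the sorted, de-duplicated SF 3 file numbers holding the given tables."""
    missing = [t for t in tables if t not in TABLE_TO_FILE]
    if missing:
        raise KeyError(f"no file mapping recorded for: {missing}")
    return sorted({TABLE_TO_FILE[t] for t in tables})


print(files_needed(["P37", "P53", "P1"]))  # prints ['01', '04', '06']
```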
9
Summary File 3 components
- Geographic Identifier file (GeoID)
- 76 data files containing different sets of variables
- The GeoID file is linked to each of the 76 data files by a variable named LogRecNo
- The Summary Level must be selected from the GeoID file to extract the desired stratification level. This is used to identify the specific area being tabulated.
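The link described here amounts to a filter on Summary Level followed by a join on LogRecNo. The sketch below shows that logic in Python (the original work used SAS). It assumes the records have already been read into dictionaries; the key names `LOGRECNO` and `SUMLEV` and the default level `"150"` (State-County-Census Tract-Block Group) are illustrative choices.

```python
def select_summary_level(geo_rows, data_rows, summary_level="150"):
    """Join GeoID records to data-file records on LogRecNo for one summary level.

    geo_rows and data_rows are lists of dicts; the key names are assumptions.
    GeoID records at other summary levels, or without a matching data
    record, are skipped.
    """
    data_by_logrecno = {row["LOGRECNO"]: row for row in data_rows}
    joined = []
    for geo in geo_rows:
        if geo["SUMLEV"] != summary_level:
            continue
        data = data_by_logrecno.get(geo["LOGRECNO"])
        if data is None:
            continue
        # Combine the geographic identifiers with the tabulated variables.
        joined.append({**geo, **data})
    return joined
```

Filtering before joining keeps only the rows for the chosen stratification level, so the same data file can serve state, county, tract, or block group extracts.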
10
Summary Level Sequence Chart (partial listing)

Geographic component   Summary Level
00, 01-49, 52-95       040 State
00, 01, 43, 49         050 State-County
00                     060 State-County-County Subdivision
00                     070 State-County-County Subdivision-Place/Remainder
00                     080 State-County-County Subdivision-Place/Remainder-Census Tract
00                     085 State-County-County Subdivision-Place/Remainder-Census Tract-Urban/Rural
00                     090 State-County-County Subdivision-Place/Remainder-Census Tract-Urban/Rural-Block Group
00                     067 State [Puerto Rico Only]-County-County Subdivision-Subbarrio
00                     140 State-County-Census Tract
00                     144 State-County-Census Tract-American Indian Area/Alaska Native Area/Hawaiian Home Land
00                     150 State-County-Census Tract-Block Group
… more levels …
11
Subject Locator
- An index designed to quickly identify tables in the summary file for particular subjects or topics of interest
- Arranged alphabetically by name of subject
- Each row contains the type of entry and the relevant table number for the data source
12
Subject Locator Index (partial listing)

Subject description                                                     Table number
Median Income (dollars)
  Families                                                              P77
    by Family Type by Presence of Own Children Under 18 Years           PCT40
    by Presence of Own Children Under 18 Years                          PCT39
  Households                                                            P53
    by Age of Householder                                               P56
  Nonfamily Households                                                  P80
    by Sex of Householder by Living Alone by Age of Householder         PCT42
  Occupied Housing Units
    by Tenure                                                           HCT12
  Population 15 Years and Over With Income
    by Sex by Work Experience                                           PCT45
… more subjects …
13
Summary of steps for identifying variables and merging with cohort
1. Use Subject Locator to identify variables of interest and their corresponding table numbers
2. Use File Segmentation Table to identify specific data file(s) for each table number
3. Use the Summary Level Sequence Chart to locate the desired stratification level
4. Identify SAS input statements to read each file
5. Merge census variables with existing cohort data
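Step 5 can be pictured as a left join from the cohort to the census variables through the Zip+4 lookup. The Python sketch below is not the SAS code used in the original work. All field names (`zip4`, `state`, `county`, `tract`, `block`) and the shape of the lookup tables are assumptions made for illustration.

```python
def merge_cohort_with_census(cohort, zip4_to_geo, census_by_block_group):
    """Attach block-group census variables to each cohort record.

    cohort: list of dicts, each with a 'zip4' field (hypothetical name).
    zip4_to_geo: dict mapping a Zip+4 code to its state, county, tract,
        and 4-character block (hypothetical layout of the Zip+4 database).
    census_by_block_group: dict keyed by (state, county, tract, block group).
    Records with no match are kept unchanged, as in a left join.
    """
    merged = []
    for person in cohort:
        census = None
        geo = zip4_to_geo.get(person["zip4"])
        if geo is not None:
            # The block group is the first character of the 4-character block.
            key = (geo["state"], geo["county"], geo["tract"], geo["block"][0])
            census = census_by_block_group.get(key)
        merged.append({**person, **(census or {})})
    return merged
```

Keeping unmatched beneficiaries in the output makes it easy to count how many records failed to link, which matters when a small share of the cohort lacks Zip+4 information.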
14
Conclusion
- Daunting task due to the large volume of Census data and documentation
- Well organized into a manageable set of distinct components
- Flexibility comes at the cost of thousands of variables and data files
- The process of extracting variables from Census data becomes much easier once all the pieces are in place
15
Contact information
Robert Matthews
University of Alabama at Birmingham
Department of Epidemiology
1665 University Blvd. RPHB 517
Birmingham, AL 35294-0022
Email: rsm@uab.edu
Web: www.epi.soph.uab.edu/rsm/