ACS Public Use Microdata Samples DataFerrett SACOG Luz M Castillo Data Dissemination Specialist Los Angeles Regional Office U.S. Census Bureau
Outline Summary Data vs. Microdata Fundamentals of PUMS Data Geography and the PUMS Accessing PUMS Data Documentation and Guidance
Summary Data Versus Microdata. Summary data: premade or published tables; easy to get, even for small areas; limitation: fixed content. Microdata: dataset of individual responses to the questionnaire; enables custom tables and analyses; limitations: edits to protect privacy, can’t study small areas.
Summary Data Source: 2010 ACS 1-year Estimates. Table B04001. FIRST ANCESTRY REPORTED
Microdata Source: 2010 ACS 1-year PUMS file
Microdata in SAS Source: 2010 ACS 1-year PUMS file.
Fundamentals of PUMS Data
What are PUMS data? Public Use: anonymized, downloadable. Microdata: records of individual people. Sample: a representative sample of the population.
PUMS Overview: The PUMS sample is a subsample of ACS interviews, covering one percent of all US households. PUMS is a “weighted” sample, so weighting variables must be used in analysis. It is released as a set of two files (housing units and persons), available as SAS files, as CSV files, via DataFerrett, and from redistributors such as IPUMS.
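To illustrate why the weighting variables matter, here is a minimal sketch that compares a raw record count with a weighted estimate. The records are made-up toy data, and the helper name `weighted_count` is hypothetical; the column names follow the PUMS person file (PWGTP is the person weight, AGEP is age), which is an assumption about the file you download.

```python
# Toy person records (made-up values); PWGTP is assumed to be the person weight.
persons = [
    {"SERIALNO": "A1", "AGEP": 34, "PWGTP": 120},
    {"SERIALNO": "A1", "AGEP": 6, "PWGTP": 110},
    {"SERIALNO": "B2", "AGEP": 71, "PWGTP": 95},
    {"SERIALNO": "C3", "AGEP": 15, "PWGTP": 140},
]


def weighted_count(records, predicate, weight="PWGTP"):
    """Sum the weights of the records that satisfy the predicate."""
    return sum(r[weight] for r in records if predicate(r))


def is_child(record):
    return record["AGEP"] < 18


# Counting rows only says how many sample records matched.
unweighted_children = sum(1 for r in persons if is_child(r))
# Summing weights estimates how many people those records represent.
estimated_children = weighted_count(persons, is_child)

print(unweighted_children, estimated_children)
```

The unweighted figure describes the sample, not the population; only the weighted sum is an estimate of people.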
Why Use PUMS? The data are needed for a tabulation or a specific universe not supported by standard ACS tables (e.g., population groups by single year of age). Statistical analysis is required to understand relationships between economic, demographic, or housing variables (e.g., correlation analysis). New measures can be created using multiple variables or other people in the household (spouse’s occupation, same-sex couples, number of kids).
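As a sketch of building a new measure from other people in the household, the example below counts children per household from person records and attaches the count to the housing records. It assumes the two files share a household identifier (SERIALNO) and that the housing file carries a housing weight (WGTP); the data are made up, `NUM_KIDS` is a hypothetical variable name, and "kids" is taken here to mean people under 18.

```python
from collections import defaultdict

# Toy housing and person records (made-up values), linked by SERIALNO.
housing = [
    {"SERIALNO": "A1", "WGTP": 100},
    {"SERIALNO": "B2", "WGTP": 90},
]
persons = [
    {"SERIALNO": "A1", "AGEP": 34},
    {"SERIALNO": "A1", "AGEP": 6},
    {"SERIALNO": "A1", "AGEP": 3},
    {"SERIALNO": "B2", "AGEP": 71},
]

# Count people under 18 in each household.
kids_per_household = defaultdict(int)
for p in persons:
    if p["AGEP"] < 18:
        kids_per_household[p["SERIALNO"]] += 1

# Attach the new measure to each housing record (0 when no kids were found).
households = [
    dict(h, NUM_KIDS=kids_per_household.get(h["SERIALNO"], 0)) for h in housing
]

# Weighted estimate of households with at least one child.
households_with_kids = sum(h["WGTP"] for h in households if h["NUM_KIDS"] > 0)
print(households, households_with_kids)
```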
ACS PUMS Availability: Produced every year since 2000. Person-level files include about 250 variables. Housing unit files include about 200 variables. Includes people in housing units and group quarters. Includes many useful constructed variables (e.g., poverty status, subfamily identification). Includes collapsed codes for some variables (e.g., race, Hispanic origin, ancestry, place of birth, industry, occupation).
Person records in ACS PUMS (millions) / Person records in ACS complete data (millions) / Population represented
2001: 1.2, 285
2002: 287
2003: 290
2004: 293
2005: 2.9, 4.5, 296
2006: 3.0, 298
2007: 301
2008: 304
2009: 307
2010: 3.1, 309
2011: 5.0, 312
Types of PUMS Files Released: We release 3 new PUMS files every year. 1-year PUMS (example: 2015 1-year PUMS): October. 3-year PUMS (example: 2011-2013 3-year PUMS): discontinued after 2013. 5-year PUMS (example: 2011-2015 5-year PUMS): January. Most documentation is released one week prior to the data.
Modifications to Multiyear PUMS: Multiyear PUMS have the same cases and geography as their component 1-year files. How are multiyear PUMS different from single-year files? Weights are produced using the latest population estimate “vintages”, and coding schemes and dollar amounts are standardized. Why use the multiyear PUMS files? For studying small groups, where more cases are needed, and when the analysis also makes use of multiyear summary data.
Geography and the PUMS
Limited Geographic Detail: Geographic identifiers are region, division, state, and PUMA. PUMAs can be used to identify most cities of 100,000+ and many, but not all, metropolitan areas. PUMAs are combinations of adjacent counties and census tracts within states, and also divisions of geographic areas (counties/cities). PUMS is not designed for statistical analysis of small geographic areas.
Public Use Microdata Area (PUMA): Defined after each census by the states in coordination with the Census Bureau’s Geography Division. PUMAs were redefined for the 2012 PUMS files; forthcoming multiyear files will have dual PUMA vintages. Each PUMA is large enough to meet disclosure avoidance requirements: an area with a population of 100,000 or more. To determine population, housing, or land ratios, visit the Missouri State Data Center site. PUMAs are identified by a five-digit number, unique within each state.
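Because a PUMA number is unique only within its state, any analysis that spans states needs the state code alongside it. The sketch below shows one way to build a combined key; the helper `puma_key` is hypothetical, the two-digit codes are state FIPS codes (06 for California, 48 for Texas), and the PUMA number 00100 is used purely as an illustrative value.

```python
def puma_key(state_fips, puma):
    """Combine a 2-digit state FIPS code and a 5-digit PUMA number into one key."""
    return f"{int(state_fips):02d}{int(puma):05d}"


# The same PUMA number in two different states yields two different keys.
california = puma_key(6, 100)
texas = puma_key("48", "00100")
print(california, texas)
```

Zero-padding both parts keeps the key a fixed seven characters, so it sorts and joins consistently even when a file stores the codes as integers.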
Public Use Microdata Areas
PUMA Maps http://www.census.gov/geo/maps-data/maps/2010puma/st06_ca.html
PUMA Maps
2010 Census – PUMA Reference Map: Sacramento City (Central/Downtown & Midtown)
Accessing PUMS Data
American FactFinder
American FactFinder (cont’d)
American FactFinder (cont’d) Main benefit of accessing PUMS via AFF: Convenient access if comfortable with AFF from regular use of summary tables
Census Bureau FTP Site
Census Bureau FTP Site (cont’d) Main benefit of accessing PUMS via FTP: Complete listing of files by year and state
DataFerrett
DataFerrett (cont’d) Main benefits of accessing PUMS via DF: The menu-driven system doesn’t require knowledge of a stats package (e.g., SAS or SPSS). Variables can be downloaded individually.
Powerful Tabulation Capabilities Simple table layout that supports: Flexible design Frequencies and trends Spreadsheet math for robust analysis Complex nesting Hide columns/rows Applies weighting variables Fast results using large datasets Save as HTML, PDF & JPEG
Data Visualization: Highlight spreadsheet rows or columns to create maps and graphs.
What We’re Working On Calculating variances on-the-fly for microdata tabulations Calculating margins of error for custom summations of aggregate data Integrating Google maps with DataFerrett thematic maps
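As a rough illustration of what a variance calculation for a microdata tabulation involves, the sketch below applies the replicate-weight approach described in the ACS PUMS accuracy documentation: the standard error is the square root of 4/80 times the sum of squared differences between each of the 80 replicate estimates and the full-sample estimate. The replicate estimates here are made-up numbers, the function name is hypothetical, and this is not a description of how DataFerrett implements the feature.

```python
import math


def replicate_standard_error(full_estimate, replicate_estimates):
    """Standard error from 80 replicate estimates: sqrt(4/80 * sum of squared diffs)."""
    if len(replicate_estimates) != 80:
        raise ValueError("expected 80 replicate estimates")
    squared_diffs = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt(4 / 80 * squared_diffs)


# Made-up example: each replicate estimate differs from the full estimate by 10.
full = 1000
replicates = [full + (10 if r % 2 == 0 else -10) for r in range(80)]

se = replicate_standard_error(full, replicates)
moe_90 = 1.645 * se  # margin of error at the 90 percent confidence level
print(round(se, 1), round(moe_90, 1))
```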
Documentation and Guidance
PUMS Documentation: Subjects in the PUMS; Code Lists; PUMS Top Coded and Bottom Coded Values; PUMS Estimates for User Verification; Accuracy of the PUMS. http://www.census.gov/acs/www/data_documentation/pums_documentation/
PUMS Guidance: Compass Handbook on Using PUMS, a soup-to-nuts overview of getting and using the data: http://www.census.gov/acs/www/guidance_for_data_users/handbooks/ Training PPT on Using PUMS, an overview of PUMS basics: http://www.census.gov/acs/www/guidance_for_data_users/training_presentations/
Exercise 1 In Placer County, how many foreign-born individuals entered before 2000, between 2000 and 2009, and in 2010 or later?
Exercise 1 – Nativity and Year of Entry. Access: American Community Survey, 2015 1-Year Estimates PUMS. Select the Foreign Born and Year of Entry variables. Create a recode for Year of Entry. Select all PUMAs within Placer County. Create a table.
Go to www.census.gov. Type ‘DataFerrett’ in the Search Box.
Click ‘TheDataWeb – DataFerrett’
Launch DataFerrett
CAUTION: Do Not Navigate Away or Close This Window While DataFerrett is Loading
Enter Your Email Address and Click ‘Ok’
Click ‘Get Data Now’
American Community Survey with PUMS and Other Datasets
Select American Community Survey. Open Public Use Microdata Sample to view years. Select 2015. Click View Variables (drop down).
Click ‘Selectable Geographies’ and ‘Population’. Click ‘Search Variables’.
Click on ‘Variable Label’ to Alphabetize Column
Select ‘Nativity’. Hold control button down and select ‘Year of Entry (YOEP)’. Click ‘Browse/Select Highlighted Variable’ (Blue Button).
Check the ‘Select’ box next to ‘ACS Nativity’.
Highlight ‘ACS YOEP’. Check the ‘Select’ box next to ‘ACS YOEP Year of entry’. Click ‘OK’.
You have added 2 variables to your DataBasket. Click ‘OK’.
Double-click the ‘Selectable Geographies’ variable. Click ‘Browse/Select Highlighted Variables’ (Blue Button).
Select ‘Public Use Microdata Area’ from ‘Types of Geographies Available’. Highlight the PUMA code in the Hierarchies section and click ‘Use Hierarchy’.
Double click ‘California’ from ‘Select State of current residence’. Highlight ‘California’ in the middle box and click ‘Next Level’.
Note: All PUMAs in California are listed by county. Double click, or highlight and drag, the PUMA/s to the box on the far right. Click ‘Finish’.
Note: There are 3 variables in the DataBasket. Click on ‘Step2: DataBasket/Download/Make A Table’.
Highlight the ‘Year of Entry’ variable. Click ‘Recode Variable’ from the right side of the screen.
Rename ‘Recode1’ to ‘Year of Entry Recode’
Highlight the categories from ‘1921 to 1999’ and click ‘Recode’ button below
Highlight all of the categories from ‘2000 to 2009’ and click ‘Recode’ button below
Note: there are three categories for the new recoded variable. Change the ‘Label’ names by double clicking inside the cells. (Make sure to hit the Enter key when completed.)
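For readers working with a downloaded PUMS file rather than the DataFerrett menus, the same three-category recode can be sketched in a few lines. The function name and the category labels are hypothetical, the records are made-up, and the column names (YOEP for year of entry, PWGTP for the person weight) are assumptions about the person file.

```python
from collections import defaultdict


def recode_year_of_entry(year):
    """Collapse a year of entry into the three categories used in this exercise."""
    if year <= 1999:
        return "Before 2000"
    if year <= 2009:
        return "2000 to 2009"
    return "2010 or later"


# Toy foreign-born person records (made-up values).
records = [
    {"YOEP": 1985, "PWGTP": 70},
    {"YOEP": 1999, "PWGTP": 30},
    {"YOEP": 2004, "PWGTP": 55},
    {"YOEP": 2012, "PWGTP": 45},
]

# Weighted estimate for each recoded category.
estimates = defaultdict(int)
for r in records:
    estimates[recode_year_of_entry(r["YOEP"])] += r["PWGTP"]

print(dict(estimates))
```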
Note: ‘Year of Entry Recode’ is now listed. Click ‘Make a Table’.
Click ‘OK’
You Will Now Make A Nested Table Using the Variables
Drag the ‘Geog-101 PUMA’ to ‘C1,R2’
Drag ‘RECODE1 Year of Entry’ to ‘C2,R1’
Nest ‘Nativity’ variable by dropping it onto any of the ‘Year of Entry Labels’.
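The nested table built in the last three steps is, in effect, a weighted cross-tabulation with one cell per combination of PUMA, year-of-entry category, and nativity. A minimal sketch of that idea follows; the records, the PUMA labels "P1" and "P2", and the field name `ENTRY` are all made up for illustration, and PWGTP is assumed to be the person weight.

```python
from collections import defaultdict

# Toy records (made-up values); ENTRY holds an already recoded year-of-entry label.
persons = [
    {"PUMA": "P1", "NATIVITY": "Foreign born", "ENTRY": "Before 2000", "PWGTP": 80},
    {"PUMA": "P1", "NATIVITY": "Foreign born", "ENTRY": "2010 or later", "PWGTP": 60},
    {"PUMA": "P2", "NATIVITY": "Foreign born", "ENTRY": "Before 2000", "PWGTP": 50},
    {"PUMA": "P1", "NATIVITY": "Foreign born", "ENTRY": "Before 2000", "PWGTP": 40},
]

# One cell per (row, column, nested variable) combination, filled with summed weights.
table = defaultdict(int)
for p in persons:
    table[(p["PUMA"], p["ENTRY"], p["NATIVITY"])] += p["PWGTP"]

for cell in sorted(table):
    print(cell, table[cell])
```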
Click ‘GO Get Data’
From File, Click ‘Save As’
You Can Save to Your Desktop. Save File as Text Document – Comma Delimited (Excel).
Exercise 2 In Sacramento County, which age group under 50 has the highest estimated number of individuals with a disability?
Exercise 2 – Age and Disability. Access: American Community Survey, 2015 1-Year Estimates PUMS. Select the population with a disability. Create a recode for age disaggregation. Select all PUMAs within Sacramento County. Create a pivot table.
Go to the ‘Step1’ Tab and Click ‘Empty DataBasket’
Select American Community Survey. Open Public Use Microdata Sample to view years. Select 2015. Click View Variables (drop down).
Click ‘Selectable Geographies’ and ‘Population’. Click ‘Search Variables’.
Click ‘Variable Label’ to Alphabetize Column
Select ‘Age’. Hold control button down and select ‘Disability Recode’. Click ‘Browse/Select Highlighted Variables’ (Blue Button).
Check the ‘Select’ box next to ‘ACS AGEP’.
Highlight ‘ACS Disability Recode’. Check the ‘Select’ box next to ‘ACS DIS Disability Recode’. Click ‘OK’.
You have added 2 variables to your DataBasket. Click ‘OK’.
Note: 2 variables are selected in the DataBasket. Double-click the ‘Selectable Geographies’ variable. Click ‘Browse/Select Highlighted Variables’ (Blue Button).
Select ‘Public Use Microdata Area’ from ‘Types of Geographies Available’. Highlight the PUMA code in the Hierarchies section and click ‘Use Hierarchy’.
Double click ‘California’ from ‘Select State of current residence’. Highlight ‘California’ in the middle box and click ‘Next Level’.
Note: All PUMAs in California are listed by county. Double click, or highlight and drag, the PUMA/s to the box on the far right. Click ‘Finish’.
Note: There are 3 variables in the DataBasket. Click on ‘Step2: DataBasket/Download/Make A Table’.
Highlight the ‘Age’ variable. Click ‘Recode Variable’ from the right side of the screen.
Rename ‘Recode1’ to ‘Age Recode’
1. Change Range to ‘1 through 17’ and Click ‘Recode’. 2. Change Range to ‘18 through 19’ and Click ‘Recode’. 3. Change Range to ‘20 through 24’ and Click ‘Recode’.
Change the rest of the age groups and recode
Note: there are nine categories for the new recoded variable. Change the ‘Label’ names by double clicking inside the cell. (Make sure to hit the Enter key when completed.)
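A range-based recode like this one can be sketched as a lookup over (low, high) pairs. Only the three ranges spelled out in this exercise are included below; the remaining age groups are left for you to append, and ages outside the listed ranges are returned with a placeholder label. The names `AGE_RANGES` and `recode_age` are hypothetical.

```python
# Only the ranges stated in the exercise; append the remaining groups as needed.
AGE_RANGES = [(1, 17), (18, 19), (20, 24)]


def recode_age(age, ranges=AGE_RANGES):
    """Return the label of the first range containing the age, else a placeholder."""
    for low, high in ranges:
        if low <= age <= high:
            return f"{low} through {high}"
    return "Not recoded"


for age in (5, 18, 24, 40):
    print(age, recode_age(age))
```

Keeping the ranges in a list rather than in a chain of if-statements makes it easy to check that the groups do not overlap and that none are missing.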
Note: ‘Age Recode’ is now listed. Click ‘Make a Table’.
Click ‘OK’
You Will Now Make A Nested Table Using the Variables
Making a Pivot Table 1. Drag and Drop “Recode 1 Age” to C1, R2 2. Drag and Drop “GEOG-101” to C2, R1 3. Drag and Drop “Disability” above R1
Click ‘GO Get Data’
From ‘File’ drop-down, Click ‘Save As’
Save Your Table
Exercise 3 For each race group, which age group has the highest estimated number of males and females?
Exercise 3 – Sex by Race and Age. Access: American Community Survey, 2015 1-Year Estimates PUMS. Add more variables. Create a race, sex, and age disaggregation. Select all PUMAs within Sacramento County. Create a table, chart, and map.
Close Table and Click ‘Step1’ Tab
Select American Community Survey. Open Public Use Microdata Sample to view years. Select 2015. Click View Variables (drop down).
Click ‘Population’. Click ‘Search Variables’
Click on ‘Variable Label’ to Alphabetize Column
Select ‘RAC1P-Recoded Detailed Race Code’. Hold control button down and select ‘Sex’. Click ‘Browse/Select Highlighted Variables’ (Blue Button).
1. Click ‘Select ALL Variables’ 2. Click ‘OK’ 3. Confirm that you have modified 2 Variables by Clicking ‘OK’
Click ‘Step2’ Tab and Click ‘Make a Table’
1. Drag and Drop “RAC1P” to C1, R2 2. Drag and Drop “GEOG-101” to C2, R1 3. Drag and Drop “Recode1 Age” to C1, R2 (On top of ‘Total RAC1P’)
Click ‘GO Get Data’
To Create a Bar Chart or Map, Change the Variable Label and Highlight the estimates in that row, Click the Chart or Map Icons
Resources: Need Assistance? Data Dissemination Branch, Customer Liaison and Marketing Services Office, U.S. Census Bureau. (844) ASK-DATA (toll free). Census.askdata@census.gov. Luz.M.Castillo@census.gov. Cell: 818-515-3748