Merging census aggregate statistics with postal code-based microdata Laine Ruus University of Toronto. Data Library Service 2008-12-03,


Merging census aggregate statistics with postal code-based microdata Laine Ruus University of Toronto. Data Library Service , revised

This session will cover:
- Setting up the original file of postal codes
- Decisions the researcher has to make
- Extracting census geography from the Postal Code Conversion File (PCCF)
- Things to consider when merging the output from the CHASS PCCF interface with your file of postal codes

The process, briefly, is:
1. Extract census geography from the PCCF for the area covered by the survey postal codes.
2. Merge it with the original survey data, based on the postal codes.
3. Extract the required census variables (e.g. from profile files) together with the census geography ids.
4. Merge them with the survey file, which now includes census geography, by census geography ids.
There are a number of different ways of doing this. This is one of them.

What is the PCCF?
- Postal codes have no direct spatial existence.
- Postal codes represent where residents receive their mail, not where they live.
- The PCCF contains one record for each postal code-dissemination block pair, with all other census geographic codes and some Canada Post management variables for each dissemination block.
- The PCCF contains no census data.

We start with the researcher's file of postal codes. Often, the user's file of postal codes looks something like this, e.g. an Excel file with other variables that need to be preserved in the final output file. Note that all these records include a 'Centre' variable. We will use that information later.

First the file needs to be sorted by postal code, any errors fixed, and then sorted again…

Next, check for duplicate postal codes in the file. You just need to know they are there – you will see why later. Note, in this example, we have rural and urban postal codes as well as duplicate postal codes.
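The sorting, error checking, and duplicate check described above are done by hand in Excel on the slides. As a rough illustration only, the same checks could be sketched in Python (not the tool used in this session). The function names and the sample postal codes are made up for the example, and the pattern test only checks the letter-digit-letter digit-letter-digit shape, not whether a code actually exists.

```python
import re
from collections import Counter

# Shape of a Canadian postal code once spaces are removed: A1A1A1.
POSTAL_RE = re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$")

def normalize(code):
    """Upper-case a postal code and strip spaces and hyphens."""
    return re.sub(r"[\s-]", "", code).upper()

def check_postal_codes(codes):
    """Return (sorted cleaned codes, malformed originals, duplicated codes)."""
    cleaned = [normalize(c) for c in codes]
    malformed = [raw for raw, c in zip(codes, cleaned) if not POSTAL_RE.match(c)]
    counts = Counter(c for c in cleaned if POSTAL_RE.match(c))
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    return sorted(cleaned), malformed, duplicates

# Made-up sample: one duplicate, and one code with the letter O typed for a zero.
sample = ["b1a 2c3", "B0E1A0", "B1A2C3", "B1A 2CO"]
cleaned, malformed, duplicates = check_postal_codes(sample)
```

Here `malformed` lists the entries to fix by hand, and `duplicates` records which postal codes occur more than once, which matters when choosing the merge option later.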

Load the file into SPSS and save it as a system file, e.g. postal_codes.sav. SPSS can read an .xls file. Note the name of the postal code variable ('Postalcode'), its type (string), and its width (8).

Now some decisions have to be made:
Which census is closest to the time that the data were collected? (Here we assume 2006.)
- The date of survey collection determines which census year to use.
- The date of the census determines the date of the census geography. For example, a survey done in 2009 needs 2006 census geography in order to link 2006 census statistics, but a survey done in 1992 needs 1991 census geography.

Decisions (cont'd)
How much of Canada does the survey cover: urban areas only, or rural areas as well? (We have rural areas in this example.)
- 'B1A' through 'B5A' are urban FSAs; 'B0E', 'B0H', etc. are rural FSAs.
- If only urban areas are included in the file, the user can use census tract level statistics.
- If urban and rural areas are included, the user must use dissemination area or CSD level statistics, or even FSA level.
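The urban/rural check can be sketched in Python, relying on the convention that a forward sortation area (FSA) whose second character is 0 is rural, which matches the 'B0E' and 'B0H' examples above. This is a sketch only: the function names are hypothetical, and an all-urban file only makes census tracts a candidate, since tracts exist only for CMAs and some CAs.

```python
def is_rural_fsa(fsa):
    """An FSA (first three characters of a postal code) with 0 in second place is rural."""
    return len(fsa) == 3 and fsa[1] == "0"

def candidate_level(postal_codes):
    """Suggest 'CT' when every FSA is urban, otherwise 'DA'.

    'CT' is only a suggestion: census tract coverage still has to be
    confirmed for the areas in the file.
    """
    fsas = {code.replace(" ", "").upper()[:3] for code in postal_codes}
    return "DA" if any(is_rural_fsa(f) for f in fsas) else "CT"

# Made-up postal codes: the first list mixes urban and rural FSAs.
mixed_level = candidate_level(["B1A 2C3", "B0E 1A0"])
urban_level = candidate_level(["B1A 2C3", "B5A 1B2"])
```

With any rural FSA present, the sketch falls back to the dissemination area level; CSD or FSA level statistics would be the other options named above.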

Dissemination area (DA) level:
- Covers all Canada
- Available back to 1961 (computer-readable form only)
- Smallest population for which statistics are released by STC
- Most likely to be suppressed because of population size or data quality
- Most susceptible to distortion when aggregating to higher levels of geography

Census tract (CT) level:
- In 2006, available for 33 CMAs, and 15 (of 111) CAs only
- Available back to 1951 (in print) and 1971 (in computer-readable form)
- Less likely to be suppressed for reasons of population size or data quality
- Less susceptible to distortion due to random rounding

If you are working with earlier files that have no dauid, eauid, or ctuid variables, you can compute them:
- Eauid (pre 2001) = (prov*1,000,000) + (fed*1,000) + ea
- Dauid (post 1996) = (prov*1,000,000) + (cd*10,000) + da
- Ctuid = (CMACA*10,000) + ctname
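As a worked check of these formulas, here is a small Python sketch. It assumes the component codes are held as numbers and that the DA code is added (not multiplied), so that the result is the concatenated 8-digit id. The function names and the example codes are illustrative only.

```python
def eauid(prov, fed, ea):
    """Enumeration area id (pre 2001): 2-digit province, 3-digit FED, 3-digit EA."""
    return prov * 1_000_000 + fed * 1_000 + ea

def dauid(prov, cd, da):
    """Dissemination area id (2001 on): 2-digit province, 2-digit CD, 4-digit DA."""
    return prov * 1_000_000 + cd * 10_000 + da

def ctuid(cmaca, ctname):
    """Census tract id: 3-digit CMA/CA code followed by the tract name, e.g. 12.01."""
    # Rounded to two decimals because the tract name carries two decimal places.
    return round(cmaca * 10_000 + ctname, 2)

# Illustrative codes only, not taken from the session's data.
example_eauid = eauid(12, 3, 45)
example_dauid = dauid(12, 17, 123)
example_ctuid = ctuid(535, 12.01)
```

In SPSS the same arithmetic would go into COMPUTE statements; whatever tool is used, the computed id must end up with the same type and width as the id in the file it will be matched against.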

Once these decisions have been made, we know:
- which PCCF file to use, and
- which geographic identifiers to use (in this example, Dauid).
The CHASS Census Analyzer provides access to three postal code conversion files, containing 1996, 2001, and 2006 census geography respectively. Earlier versions (with 1981, 1986, and 1991 census geography) can be requested from UT/DLS, if they are not available from DLI.

Extracting census geography ids from the Postal code conversion file (PCCF)

Select geography by, e.g., FSAs, CDs, province, etc.

Select substantive fields and an output format. Do not forget to click the 'best record' option.

Save this file to your hard drive with a .sps extension, e.g. pccf_codes.sps.

Load SPSS (again, if it's not already loaded).
- Use Open/Data/Syntax and open pccf_codes.sps.
- You will need to delete any lines containing angle brackets at the beginning and end of the file.
- Make sure that the postal code variable has the same variable name, type, and size as the postal code variable in the postal_codes.sav file.
- In order to match the order of the postal codes in the postal_codes.sav file, sort the file on the postal code.
- Click on Run to create an SPSS system file, and save it as pccf.sav.

Still in SPSS, select Data/merge files/add variables to add the Dauid variable to the original postal_codes.sav file.

Because both files contain duplicates, we need to select the 'Both files provide cases' option. With no duplicates in the original file, select 'Non active file is keyed table'.

The resulting file contains a lot of postal code-dauid pairs that are not in the original postal_codes.sav file. They need to be deleted. Remember that all the records in the original file included a 'Centre' variable, coded 1, 2, or 3. Use Data/Select cases to filter out the records that are not in the original sample.
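The net effect of this merge and the Select cases filter can be illustrated with a short Python sketch. It uses a different technique from the SPSS menus: a keyed lookup from postal code to Dauid, so that only records from the original sample are ever produced. It assumes one 'best record' per postal code in the PCCF extract. The variable names follow the slides, but all values are made up.

```python
# Original sample: duplicate postal codes are allowed, every record has a Centre.
survey = [
    {"Postalcode": "B0E1A0", "Centre": 1},
    {"Postalcode": "B1A2C3", "Centre": 2},
    {"Postalcode": "B1A2C3", "Centre": 3},
]

# PCCF extract: covers the whole selected area, so it holds extra postal codes.
pccf = [
    {"Postalcode": "B0E1A0", "Dauid": 12150101},
    {"Postalcode": "B1A2C3", "Dauid": 12170123},
    {"Postalcode": "B1A9Z9", "Dauid": 12170456},  # not in the survey
]

def attach_dauid(survey_rows, pccf_rows):
    """Add Dauid to each survey record; unmatched postal codes get None."""
    lookup = {row["Postalcode"]: row["Dauid"] for row in pccf_rows}
    return [dict(row, Dauid=lookup.get(row["Postalcode"])) for row in survey_rows]

merged = attach_dauid(survey, pccf)
```

Every record in `merged` still carries its Centre value, and the PCCF-only postal code never enters the result. A Dauid of None would flag a postal code that the PCCF extract did not cover.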

We are now more than half-way. The file is currently sorted by postal code. For the next step, it needs to be sorted by dauid. Make a note of the variable name, type and size of the Dauid variable. Save the sorted file, under a new name, eg merge1.sav.

The CHASS Census Analyzer provides access to census profile files at several levels of geography (census subdivision is coming soon) and is included with your CHASS CANSIM subscription:

Using the same technique as before, select geography and subject matter from the 2006 dissemination area level profile file.
- Make sure you also select the Dauid identifier.
- Export format: SPSS
- Save the file with a .sps extension and a new name, e.g. cc06_income.sps.

Here I have retrieved the number of households, and average household income, as well as dauid and total population.

Run SPSS as before, to create a new system file. Sort by dauid, to make sure it is in the same order as the merge1.sav file. Make sure the dauid variable is the same type and size as in merge1.sav. Save the file with a new name, e.g. cc06_income.sav. Now we need to merge the merge1.sav file and the cc06_income.sav file, by dauid.

Again, there are many records from the census profile file which are not in the original sample. These records need to be removed.
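This second merge can be sketched the same way in Python, again as a keyed lookup, not as the SPSS procedure itself. The profile variable names (`population`, `households`, `avg_hh_income`) and all values are hypothetical stand-ins for whatever was exported from the profile file.

```python
# Survey records after the first merge (as in merge1.sav); values are made up.
merge1 = [
    {"Postalcode": "B1A2C3", "Centre": 2, "Dauid": 12170123},
    {"Postalcode": "B0E1A0", "Centre": 1, "Dauid": 12150101},
]

# Profile extract: one record per dissemination area in the selected geography.
profile = [
    {"Dauid": 12150101, "population": 520, "households": 210, "avg_hh_income": 48000},
    {"Dauid": 12170123, "population": 610, "households": 250, "avg_hh_income": 52000},
    {"Dauid": 12179999, "population": 480, "households": 190, "avg_hh_income": 45000},
]

def attach_profile(sample_rows, profile_rows, key="Dauid"):
    """Sort the sample by key and add the matching profile variables to each record."""
    by_key = {row[key]: row for row in profile_rows}
    # Records without an id sort last instead of raising an error.
    ordered = sorted(sample_rows, key=lambda r: (r[key] is None, r[key] or 0))
    return [{**by_key.get(row[key], {}), **row} for row in ordered]

final = attach_profile(merge1, profile)
```

Because the lookup starts from the sample records, dissemination areas that appear only in the profile file (12179999 here) are dropped without a separate filtering step.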

And at the end of this process, we have produced a file which contains:
- the variables from the original file
- the census geography that is the closest match to the postal codes in the original file
- census substantive variables from the profile file