Public Use Microdata File (PUMF)
DLI Ontario Training, Ryerson University, Dec. 13, 2007
Martine Grenier, Mokili Mbuluyo, Jean René Boudreau, Statistics Canada

Outline
1. Change factors
2. Scenarios: characteristics
3. Analytic content: additions and losses

1. Change Factors

- Improvement of the three files' analytic content for greater use at the national and international levels
- Greater accessibility of census data
- Data confidentiality constraints
  - File size
  - Limited geography
  - Age variable
  - Income variable
- Late release of PUMFs
  - Delay due to the heavy workload of selecting, certifying and deriving variables, and of quality control on the files

2. Scenario #1: Status Quo (2001)

Content
1. Sample size
  - Individuals: 800,000 records
  - Families: 310,000 records
  - Households and dwellings: 350,000 records
2. Geography
  - Provinces, Territories, CMAs
3. Variables
  - Variables extracted from the dissemination database
  - Large number of derived variables
  - Less detailed variables for Maritime provinces and Northern territories
  - Variables repeated in the 3 files

Reduction of disclosure risks
  - Substantial disclosure control by the microdata file review committee
  - Confidentiality rules applied separately to each file

Production time
  - 3 years, expected release in 2010?
  - Considerable amount of work for SM analysts to certify derived variables

2. Scenario #2: Single File

Content
1. File size
  - Single file: 800,000 records (individuals)
  - Some persons will represent a family or a household
2. Geography
  - Canada, 5 regions, 5 CMAs with a population of at least one million
3. Variables
  - Variables extracted from the dissemination database
  - Derived variables of complexity level 4 or which require the use of limited data

Reduction of disclosure risks
  - Eliminate values with a Canada frequency of less than 100,000
  - Collapse some or all age groups
  - Round off or generate noise in income components

Production time
  - Projected release: Summer 2009
  - Reduced certification

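The three disclosure-reduction steps listed for this scenario can be illustrated with a minimal Python sketch. It is not Statistics Canada's procedure: the field names (`weight`, `ethnic_origin`), the choice to recode rare values to "Other" rather than drop them, the 5-year age bands, the top-code at 85, the rounding base of 1,000 and the 5% noise range are all assumptions made for illustration. Only the 100,000 frequency threshold comes from the slide, and treating it as a weighted frequency is also an assumption.

```python
import random
from collections import Counter

MIN_FREQUENCY = 100_000  # Canada-level threshold quoted on the slide


def suppress_rare_values(records, field, weight_field="weight",
                         threshold=MIN_FREQUENCY, fill="Other"):
    """Recode values whose weighted national frequency is below the threshold.

    Assumption: frequencies are weighted counts and rare values are
    recoded to `fill` instead of being removed from the file.
    """
    totals = Counter()
    for rec in records:
        totals[rec[field]] += rec[weight_field]
    return [
        {**rec, field: rec[field] if totals[rec[field]] >= threshold else fill}
        for rec in records
    ]


def collapse_age(age, width=5, top_code=85):
    """Map a single year of age to a band label such as '30-34' or '85+'."""
    if age >= top_code:
        return f"{top_code}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def round_income(income, base=1000):
    """Round an income component to the nearest multiple of `base`."""
    return base * round(income / base)


def add_income_noise(income, rng, max_relative=0.05):
    """Perturb an income component by a random factor within +/- max_relative."""
    return round(income * (1 + rng.uniform(-max_relative, max_relative)))


# Hypothetical records, for illustration only.
sample = [
    {"ethnic_origin": "A", "weight": 60_000, "age": 32, "wages": 42_350},
    {"ethnic_origin": "A", "weight": 60_000, "age": 90, "wages": 18_700},
    {"ethnic_origin": "B", "weight": 50_000, "age": 4, "wages": 0},
]
protected = suppress_rare_values(sample, "ethnic_origin")
rng = random.Random(2006)  # fixed seed so the sketch is repeatable
for rec in protected:
    rec["age_group"] = collapse_age(rec.pop("age"))
    rec["wages_rounded"] = round_income(rec["wages"])
    rec["wages_noisy"] = add_income_noise(rec.pop("wages"), rng)
```

In practice a file would use either rounding or noise for a given income component; both are shown here only because the slide names both options.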
2. Scenario #3: Hierarchical File

Content
1. File size
  - Hierarchical file: 350,000 records on households
  - All families and persons are included and identified in the household (about 800,000 persons)
2. Geography
  - Canada, regions with a population of at least 2 million
3. Variables
  - 2B variables from the dissemination database
  - Derived variables of complexity level 4 or which require the use of limited data

Reduction of disclosure risks
  - Eliminate values with a Canada frequency of less than 100,000
  - Collapse age groups
  - Round off or generate noise in income components

Production time
  - Reduced certification
  - Projected release: Summer 2009

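To make the idea of a hierarchical file concrete, the following Python sketch shows one way households, families and persons could be nested, and how the person universe can be recovered from household records. The class and field names are hypothetical and do not reflect the actual PUMF record layout.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Person:
    person_id: int
    age_group: str


@dataclass
class Family:
    family_id: int
    persons: List[Person] = field(default_factory=list)


@dataclass
class Household:
    household_id: int
    families: List[Family] = field(default_factory=list)
    # Persons living in the household who belong to no family in it.
    non_family_persons: List[Person] = field(default_factory=list)


def flatten_persons(
    households: List[Household],
) -> Iterator[Tuple[int, Optional[int], Person]]:
    """Yield one (household_id, family_id or None, person) row per person."""
    for hh in households:
        for fam in hh.families:
            for person in fam.persons:
                yield hh.household_id, fam.family_id, person
        for person in hh.non_family_persons:
            yield hh.household_id, None, person


# Hypothetical example: one household with a two-person family and a lodger.
households = [
    Household(
        household_id=1,
        families=[Family(10, [Person(100, "30-34"), Person(101, "0-4")])],
        non_family_persons=[Person(102, "20-24")],
    )
]
person_rows = list(flatten_persons(households))
family_count = sum(len(hh.families) for hh in households)
```

Because every person carries a household and family identifier, the same file can be tabulated at the person, family or household level, which is what allows analysis across the three universes.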
3. Analytic Content: additions and losses

Size: 2.7% of the population

PUMF-2006 (Status Quo)
  Content
  - Independent samples of the three universes
  - Diverse geographies at the province and CMA levels
  - Families and households well represented
  - Repetition of variables between the 3 universes; complex derived variables
  - Analytic content limited to one universe at a time
  Production requirements
  - Certification and production projected for summer 2010
  Confidentiality
  - Suppression level higher than in 2001

PUMF-2006 (Single File)
  Content
  - Some people represent a family or a household
  - Geography limited to provinces and major CMAs (pop. 1 million)
  - Loss of information about families and households
  - Variables taken from the questionnaire so that users can create their own derived variables
  - Analytic content extended to the three universes
  Production requirements
  - Production projected for summer 2009
  Confidentiality
  - Suppression level lower than in 2001 (fewer geographies)

PUMF-2006 (Hierarchical File)
  Content
  - All families and persons in sampled households are included
  - Geography more limited, to regions
  - File representative of households; more varied content including all data
  - Variables taken from the questionnaire so that users can create their own derived variables
  - Analytic content extended to the three universes
  - Greater potential for analysis and international comparison
  Production requirements
  - Production projected for summer 2009
  Confidentiality
  - Same suppression level as in 2001 (fewer geographies)

Thank you!