1
Country report Germany
Workshop “Integrating Global Census Microdata”, Lisbon, 22 August 2007. Andrea Harausz
My name is Andrea Harausz. I work for the Research Data Centre of the Federal Statistical Office of Germany. First of all, I would like to thank the MPC for the invitation to the workshop and for the great opportunity to attend ISI 2007. I am very pleased to report here on our progress in creating public use microdata files for the IPUMS-Europe project.
2
Content: Microdata for the project “Integrated European Census Microdata” (IECM); Delivered metadata; 1971 census of the former German Democratic Republic (History, Characteristics and metadata, Preparation, Anonymisation); Outlook.
My presentation will cover the following topics. First, I would like to say a few words about the scope of the IPUMS project for our country. After that, I will present the metadata that we were able to provide to the project. The main emphasis of my presentation will be on the microdata of the 1971 census of the former GDR. Finally, I will give a short outlook on future work. 2007 August 22
3
Microdata for the IECM project
9 anonymized microdata files: 2 censuses of the Federal Republic of Germany (1970, 1987); 2 censuses of the former German Democratic Republic (1971, 1981); 5 microcensuses of the Federal Republic of Germany (1973, 1982, 1987, 1991, 2001).
Germany will contribute to the IPUMS-Europe project by creating nine public use microdata files for the following statistics: the last two censuses of the former GDR and of the FRG. In addition to the censuses, we will create public use files (PUFs) for five microcensuses of the FRG. Our microcensus is a 1% household sample survey, carried out every year, with a very broad range of variables.
4
Microdata for the IECM project
[Figure: timeline of the microdata files. Year labels: 1970, 1971, 1973, 1981, 1982, 1987, 1991, 2001. Decade labels: 1970s, 1980s, 1990s.]
This picture visualizes the distribution of the microdata files over time. The grey symbols show the population censuses, the yellow ones the microcensuses. The microdata files are distributed over 4 decades. All of these microdata were found in the statistical offices and archives in machine-readable format. This was not the case for the metadata: almost all documentation, if it survived at all, was available only as paper copy.
5
Available Metadata
[Table: rows are the metadata types (Enumeration forms, Instructions to enumerators, Codebooks for microdata, Occupation codes, Industry codes, Technical reports, Other); columns are the microdata files (surviving labels: GDR c81, mc73, mc82, mc87, mc91, mc01).]
In the rows of this table you can see the most important metadata. The columns show the microdata files to be contributed to the project. Recovering these documents was easy for the more recent microcensuses and the West German censuses, which were very accurately documented. But for some files, such as the older microcensuses and the 1971 GDR census, we had to search in several institutions, such as archives and regional statistical offices. Today we have nearly complete documentation for the microdata files.
6
Available Metadata
[Table: same rows (metadata types) and columns (microdata files) as on the previous slide, marking the documentation provided to the project.]
Here you can see the documentation we were able to provide to the project. We were not able to find the instructions to enumerators and the industry codes for the 1971 GDR census. We provided all our documentation in German. It is being translated into English at the MPC, which has already translated various documents.
7
Available Metadata
[Table: same rows (metadata types) and columns (microdata files) as on the previous slides, with the translated documents highlighted.]
They are highlighted here. After so much documentation, the MPC of course also wanted to have some microdata. We began with the anonymisation of the 1971 GDR microdata.
8
1971 census of the former GDR
History; Characteristics and metadata; Preparation of microdata; Anonymisation.
The 1971 census microdata of the GDR. First, I would like to say a few words about the history of the microdata of the 1971 GDR census.
9
1971 census of the former GDR
History
Central State Administration for Statistics of the GDR: legal successor is the Federal Statistical Office (FSO); backup and documentation of data; partial adaptation to the system of the 1987 census of the FRG; release to the Federal Archive.
Federal Archive: data protected by the data protection act; archive law allows availability of data after 60 years; special permission for the release of data for scientific purposes.
The censuses of the GDR were conducted by the Central State Administration for Statistics of the GDR. The microdata of the last two censuses, from 1971 and 1981, were kept in machine-readable format. After the reunification of East and West Germany, the Central State Administration ceased to exist. Its legal successor became the FSO, and from then on the census data were in the possession of the FSO. After documentation and backup, the data were partly adapted to the system of the 1987 census of the FRG. In 2000 the FSO released the data to the Federal Archive, where they are stored until they can be made available to the public. At the moment these data are protected by the data protection act and can be made available to the public 60 years after the census year. However, the archive law allows the owner of the data, which is the FSO, to release the data for scientific purposes.
10
1971 census of the former GDR
Characteristics and metadata
2 data files: person file (demography, income, education, employment etc.); dwelling and building file (state of repair, occupancy, equipment etc.). 16.4 million persons, 6.2 million households, 6 million dwellings.
Metadata: no codebooks at the FSO and the Federal Archive; archives of regional statistical offices in the former GDR states (field of study, occupation codes).
For this purpose we requested the microdata for the 1971 census from the Federal Archive. We received two files: a person file and a dwelling and building file. The person file contains information on demography, income, education, employment and household structure. The dwelling file provides information on the state of repair of the building, the occupancy status, and the equipment of the dwelling with heating, hot water supply and sanitary facilities. In total we have 16.4 million persons, 6.2 million households and 6 million dwellings in the data files. Finding metadata for this census was difficult. The FSO and the Federal Archive had almost no documentation on it, so we searched for documentation in libraries and in the archives of regional statistical offices in the former East German states. Once we had found enough documentation to understand the microdata, we began matching the two microdata files.
11
1971 census of the former GDR
Preparation of microdata: matching of data files
Matching by a unique combination of variables in both files; deletion of vacant dwellings and dwellings used for other purposes; verification of the matching method by comparing certain variables present in both files (number of principal residents in 1st household, number of children under age 17 in 1st household).
The files were matched by a unique combination of variables present in both files. We verified the matching method by comparing two variables present in both files: the number of principal residents in the 1st household and the number of children under age 17 in the 1st household. If the matching is successful, the values of these variables should be identical in both files. The first variable indicated a successful match; the second showed differences between the two files. These differences appeared only for households with children aged 17. We suppose that these children were counted as children under age 17 in one file and as children over age 17 in the other. We can therefore assume that the matching of the two data files was successful. After matching the two data files, we developed an anonymisation concept for the data.
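The matching and verification step described above can be sketched roughly as follows. This is only an illustration in Python: the key variables (`district`, `municipality`, `building`, `dwelling`) and the check variable `n_residents` are hypothetical names, since the talk does not say which variables were actually used.

```python
from collections import Counter

# Hypothetical key variables; the talk does not name the actual ones.
KEY_VARS = ("district", "municipality", "building", "dwelling")


def key_of(record):
    """Build the matching key from the key variables of one record."""
    return tuple(record[v] for v in KEY_VARS)


def match_files(persons, dwellings):
    """Attach dwelling records to person records via a unique key.

    Returns (matched, unmatched): matched is a list of (person, dwelling)
    pairs, unmatched is a list of person records without a dwelling.
    """
    counts = Counter(key_of(d) for d in dwellings)
    duplicates = [k for k, n in counts.items() if n > 1]
    if duplicates:
        raise ValueError(f"key is not unique in dwelling file: {duplicates[:3]}")
    by_key = {key_of(d): d for d in dwellings}
    matched, unmatched = [], []
    for p in persons:
        d = by_key.get(key_of(p))
        if d is None:
            unmatched.append(p)
        else:
            matched.append((p, d))
    return matched, unmatched


def agreement_rate(matched, check_var):
    """Share of matched pairs whose shared check variable has equal values."""
    if not matched:
        return 0.0
    same = sum(1 for p, d in matched if p[check_var] == d[check_var])
    return same / len(matched)
```

An agreement rate close to 1 for a variable present in both files suggests that the key links the right records; systematic disagreement confined to one subgroup, as with the 17-year-olds above, points to a definitional difference rather than a failed match.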
12
1971 census of the former GDR
Anonymisation
Time; drawing of a subsample; geographic detail; recoding; sorting of households. Fully anonymised microdata = Public Use File.
The concept is determined mainly by the following components. Time: the data are almost 40 years old, which makes the re-identification of individuals very difficult, as every household has changed its structure since then. The other components are drawing a subsample, reducing geographic detail, recoding variables and sorting households. Together, these measures lead to a fully anonymised microdata file, a so-called public use file.
13
1971 census of the former GDR
Sampling
Size: 25% household sample. Method: systematic random sampling; sorting of households by geographic variables; first household selected randomly from the first 4 cases; then selecting every 4th household.
As the first step of the anonymisation, we drew a 25% subsample. It is a household sample, in which persons in collective dwellings are treated as households. The sampling method was systematic random sampling: the first household was selected randomly, and then every 4th household was selected systematically.
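A minimal Python sketch of this sampling scheme, assuming the households are held in a list and that `sort_key` stands in for the geographic sort variables (both are illustrative assumptions, not the office's actual implementation):

```python
import random


def systematic_sample(households, interval=4, sort_key=None, rng=None):
    """Systematic random sample: random start, then every `interval`-th case.

    `sort_key` is a hypothetical stand-in for the geographic sort variables;
    if it is None, the given order is used as is.
    """
    rng = rng or random.Random()
    ordered = sorted(households, key=sort_key) if sort_key else list(households)
    # Pick the first household at random among the first `interval` cases.
    start = rng.randrange(interval)
    return ordered[start::interval]
```

With `interval=4` this yields a sample of about 25% of the households. Sorting by geography before sampling spreads the sample evenly across regions.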
14
1971 census of the former GDR
Geographic detail
State (NUTS 1); construction of a size-of-place variable. Categories: less than 2 000 and more. Berlin: less than and more.
In addition to drawing a subsample, we reduced the geographic detail to the NUTS 1 level, that is, the provinces or states. Bob asked us to construct a size-of-place variable. We constructed a variable named “classes of municipal size” with 6 categories. The goal was to create classes that each cover about 25% of the population and that each contain at least 3 municipalities per state.
15
1971 census of the former GDR
Recoding of variables
Principle: every value of a variable should have at least 3 observations in the original file. Recoded variables (top coding): age; floor space of rooms in dwelling; number of rooms in dwelling; number of secondary residents in household.
Regarding the recoding of variables, we determined that each value of a variable should have at least 3 observations in the original file. If a value has fewer than 3 observations, it has to be combined with another value of the same variable. The variables where recoding was necessary are age, floor space of rooms in the dwelling, number of rooms in the dwelling, and number of secondary residents in the household. For all of them we applied top coding.
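One way to derive a top-coding threshold from the "at least 3 observations" rule is sketched below in Python. The procedure (extending the top category downwards until it, and the value just below it, are no longer sparse) is an assumption made for illustration; the talk does not describe how the thresholds were actually chosen.

```python
from collections import Counter


def top_code_threshold(values, min_obs=3):
    """Find a threshold t so that the pooled category "t and over" has at
    least `min_obs` observations and the sparse upper tail is absorbed."""
    counts = Counter(values)
    distinct = sorted(counts)
    i = len(distinct) - 1
    top = counts[distinct[i]]
    # Extend the top category downwards while it is too small, or while
    # the value just below it is itself too sparse to stand alone.
    while i > 0 and (top < min_obs or counts[distinct[i - 1]] < min_obs):
        i -= 1
        top += counts[distinct[i]]
    return distinct[i]


def apply_top_code(values, threshold):
    """Replace every value above the threshold by the threshold itself."""
    return [min(v, threshold) for v in values]
```

Top coding only repairs sparse values in the upper tail. A sparse value in the middle of the distribution would need a different kind of recoding, such as merging neighbouring categories.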
16
1971 census of the former GDR
Sorting of households
Random sorting of building in state and dwelling in building; adding new running numbers for household in dwelling.
After the geographic variables had been removed, the microdata file was still sorted in a systematic regional order. To eliminate this additional information from the file, the buildings were sorted randomly within each state, the dwellings were sorted randomly within each building, and new running numbers were created for building in state, dwelling in building and household in dwelling.
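The reordering can be sketched in Python as follows, assuming each household record is a dict with the hypothetical fields `state`, `building`, `dwelling` and `household`. The field names and the data layout are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def reorder_and_renumber(households, rng=None):
    """Shuffle buildings within each state and dwellings within each
    building, then assign new running numbers that no longer reflect
    the original regional order."""
    rng = rng or random.Random()
    # state -> original building id -> original dwelling id -> households
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for h in households:
        tree[h["state"]][h["building"]][h["dwelling"]].append(h)

    out = []
    for state in sorted(tree):
        buildings = list(tree[state].values())
        rng.shuffle(buildings)
        for b_no, dwelling_map in enumerate(buildings, start=1):
            dwellings = list(dwelling_map.values())
            rng.shuffle(dwellings)
            for d_no, hh_list in enumerate(dwellings, start=1):
                for h_no, h in enumerate(hh_list, start=1):
                    # Copy the record and overwrite the old identifiers.
                    out.append({**h, "building": b_no, "dwelling": d_no,
                                "household": h_no})
    return out
```

Households that shared a dwelling or building before still do so afterwards; only the order and the identifiers change.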
17
1971 census of the former GDR
Public Use File
4.1 million persons (50,000 in collective dwellings); 1.6 million households; 104 variables.
After applying all these anonymisation measures to the microdata, we produced a Public Use File with 4.1 million persons, 1.6 million households and 104 variables.
18
Outlook
[Figure: timeline of the microdata files. Year labels: 1970, 1971, 1973, 1981, 1982, 1987, 1991, 2001. Decade labels: 1970s, 1980s, 1990s.]
Now that the 1971 census is finished, we plan to create the next public use files for the 1981 GDR census and afterwards for the 1987 microcensus.
19
Thank you! andrea.harausz@destatis.de www.forschungsdatenzentrum.de