1 Archiving Michael J. Levin Harvard Center for Population and Development Studies


2 Two types of “Archiving”
I. Data
II. Metadata

3 I. Data archiving
Every effort must be made to keep all versions of the data set.
Separate series of data sets need to be preserved for the pilot, the census itself, and the PES.
For the census, this data archiving needs to start with the output of the scanning or keying operation – the completely unedited data.

4 Why preserve the unedited data
The most important reason to keep unedited data is that they are closest to the respondents and therefore represent their “thoughts and feelings” before coding, editing, and tabulation operations.
As staff edit the data, they can refer back to this data set, as needed, to see what changes are being made, both at the individual level and, through frequency distributions, at the aggregate level.
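
The comparison described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (dictionaries with an `id` key), the item name `sex`, the invalid code `"9"`, and the function names `frequency_changes` and `record_changes` are all hypothetical and do not come from any census package.

```python
from collections import Counter


def frequency_changes(unedited, edited, item):
    """Aggregate level: values of `item` whose counts differ after editing."""
    before = Counter(rec[item] for rec in unedited)
    after = Counter(rec[item] for rec in edited)
    # A Counter returns 0 for values it has never seen, so values that
    # disappear or appear during editing are reported as well.
    return {
        value: (before[value], after[value])
        for value in sorted(set(before) | set(after), key=str)
        if before[value] != after[value]
    }


def record_changes(unedited, edited, item, key="id"):
    """Individual level: (key, original value, edited value) for changed records."""
    edited_by_key = {rec[key]: rec for rec in edited}
    return [
        (rec[key], rec[item], edited_by_key[rec[key]][item])
        for rec in unedited
        if rec[key] in edited_by_key
        and rec[item] != edited_by_key[rec[key]][item]
    ]


# Hypothetical data: record 3 carries an invalid code that the edit replaces.
unedited = [{"id": 1, "sex": "M"}, {"id": 2, "sex": "F"}, {"id": 3, "sex": "9"}]
edited = [{"id": 1, "sex": "M"}, {"id": 2, "sex": "F"}, {"id": 3, "sex": "F"}]

print(frequency_changes(unedited, edited, "sex"))
print(record_changes(unedited, edited, "sex"))
```

Neither check is possible unless the unedited file is still available next to the edited one.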

5 Another reason to keep the original, unedited data
As new demographic and other direct and indirect techniques are developed, they can be tested on these data.
Without the original data, techniques developed to alleviate systematic problems in these data, or in census data in general, cannot be tested as easily or, in some cases, at all.

6 Keeping original responses on the records
Original responses should be kept on the population and housing records as part of the editing process.
In this way, both original and edited responses are always available to staff and researchers.
For some items (e.g., fertility), intermediate values are also kept.

7 Flags
Countries use flags to indicate changes in individual items; these flags should be included on the final, archived data.
– a “no/yes” flag
– a more complicated scheme
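
One way to picture an item that carries its original response, its edited value, and a flag is the sketch below. The class and function names (`EditFlag`, `Item`, `edit_item`) and the flag codes are assumptions made for illustration; the slides do not specify any particular coding scheme.

```python
from dataclasses import dataclass
from enum import IntEnum


class EditFlag(IntEnum):
    # Hypothetical codes. A simple "no/yes" scheme uses only 0 and 1.
    UNCHANGED = 0
    CHANGED = 1
    # A more complicated scheme might record how the value was changed.
    IMPUTED = 2


@dataclass
class Item:
    original: str  # response as captured by scanning or keying
    edited: str    # value after the edit
    flag: EditFlag = EditFlag.UNCHANGED


def edit_item(original, new_value, flag=EditFlag.CHANGED):
    """Keep the original response and flag the item only if it really changed."""
    if new_value == original:
        return Item(original, new_value, EditFlag.UNCHANGED)
    return Item(original, new_value, flag)


age = edit_item("999", "35", flag=EditFlag.IMPUTED)
sex = edit_item("F", "F")
print(age)
print(sex)
```

Because `original` is stored next to `edited`, the flag can always be checked against the two values it summarizes.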

8 The final data set
The final data set should be named in a clear, unambiguous way for current and future staff.
A country may choose to have several “final” data sets.
For most purposes, neither the original data nor the flags are needed for daily work in the office, that is, answering user requests.

9 De facto and de jure data sets
Three groups:
(1) respondents resident in the household,
(2) visitors to the household, and
(3) persons usually resident in the household but away on the reference date.
So the de facto file would have the persons indicating (1) or (2), and the de jure data set would have those indicating (1) or (3).
And no “Universe” would need to be selected for these runs.
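
The split into two files can be sketched as follows. The codes 1, 2, and 3 follow the three groups on the slide; the field name `residence_status`, the constant names, and the function `split_population` are hypothetical.

```python
# Residence status codes, numbered as on the slide.
RESIDENT_PRESENT = 1  # resident in the household
VISITOR = 2           # visitor to the household
RESIDENT_ABSENT = 3   # usually resident, away on the reference date

DE_FACTO = {RESIDENT_PRESENT, VISITOR}
DE_JURE = {RESIDENT_PRESENT, RESIDENT_ABSENT}


def split_population(persons, status_field="residence_status"):
    """Return (de facto, de jure) lists of person records."""
    de_facto = [p for p in persons if p[status_field] in DE_FACTO]
    de_jure = [p for p in persons if p[status_field] in DE_JURE]
    return de_facto, de_jure


# Hypothetical person records.
persons = [
    {"id": 1, "residence_status": RESIDENT_PRESENT},
    {"id": 2, "residence_status": VISITOR},
    {"id": 3, "residence_status": RESIDENT_ABSENT},
]
de_facto, de_jure = split_population(persons)
print([p["id"] for p in de_facto])
print([p["id"] for p in de_jure])
```

Once the two files are written out separately, tabulation runs on either file need no further universe selection, which is the point the slide makes.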

10 II. Metadata
“Metadata” is basically “data about data” of any sort in any medium.
Metadata are the text, tables, charts, maps, and other images that describe what users want or need to know about the census or survey.
The users include individuals and groups.
The census metadata aid in clarifying and finding the actual data.

11 More on metadata
The metadata include the definitions of the items, their use, their interactions, information about the pretest and the post-enumeration survey, daily records of progress, weekly reports, monthly reports, and reports by activity.
The data-processing metadata include the structure of the data dictionary, the keying screens for keyed data and verifying screens for scanned data, structure and content edits, the tabulations, and dissemination plans and activities.
The metadata also include the procedural history of the census, described below.

12 The Procedural History
The procedural history is crucial to the complete success of a census.
Without it, even the best tables could become lost, as could the ability to make subsequent tables after the end of the initial processing.

13 Each step in the process
From the very beginning of the census operations.
“What did we do the last time?”
Each operation needs to be recorded:
– when it starts,
– when it ends,
– what is expected to be done,
– what is actually done,
– problems encountered, and
– knowledge gained.
Sometimes a form is created to allow for filling in the blanks as individual operations take place.
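
The fill-in-the-blanks form mentioned on this slide could be modelled as a small record type, sketched below. The class name `OperationRecord`, its field names, and the sample operation are assumptions for illustration only; they mirror the six things the slide says to record.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class OperationRecord:
    name: str
    started: date                       # when it starts
    ended: Optional[date] = None        # when it ends
    expected: str = ""                  # what is expected to be done
    actual: str = ""                    # what is actually done
    problems: List[str] = field(default_factory=list)
    knowledge_gained: List[str] = field(default_factory=list)

    def blanks(self):
        """Names of the fields that still have to be filled in."""
        missing = []
        if self.ended is None:
            missing.append("ended")
        if not self.expected:
            missing.append("expected")
        if not self.actual:
            missing.append("actual")
        return missing


# Hypothetical operation, recorded while it is still under way.
keying = OperationRecord(
    name="Keying",
    started=date(2010, 5, 3),
    expected="Key all questionnaire batches",
)
print(keying.blanks())

keying.ended = date(2010, 6, 18)
keying.actual = "All batches keyed"
keying.problems.append("Some batches arrived without control sheets")
print(keying.blanks())
```

Problems and knowledge gained are left as open lists because an operation may legitimately have none to report.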

14 Dedicated staff
A group of staff (or, in very small operations, a single staff member) should be assigned to collect, for each operation, the:
– questionnaires,
– forms and manuals,
– dictionaries and screens,
– edits and tabulations, and
– metadata.
These various pieces of information need to be put in a database or umbrella directory (like the TRS) and indexed for easy access both during the census and subsequently.

15 Documenting table series
Include:
– the item or items
– definitions
– how the question was asked
– how the information derived from this question is used for planning and policy formation
– limitations of the data item or items
Compatibility with other censuses and surveys is also helpful to users.

16 Finally!
All metadata, including
– publicity announcements (both on paper and electronic) and
– notes, memos, e-mails, and so forth
need to be saved and organized by date and topic.
It is only by being able to see the scope and flow of work that the best planning can be done.