Original data. (various) Reformat data: structural issues; draw sample; confidentiality. (general tools) Data dictionary. (txt/pdf)

Presentation transcript:

Original data. (various)
Reformat data: structural issues; draw sample; confidentiality. (general tools)
Data dictionary. (txt/pdf)
Enumeration forms and instructions. (pdf)
Sample designs, census info, etc. (pdf)
Collect metadata for input variables: codes; labels (original language); labels (English); frequencies. (Excel, with Perl)
Convert to editable files: translate into English; standardized layout; standardized formatting. (Word)
Assemble codes, labels, and frequencies from source variables for harmonized translation tables. (automated)
Collect relevant enumeration text for harmonized variables. (automated)
Create files for public delivery. (pdf & generated HTML)
Create translation tables: recoding matrix. (Excel)
Variable descriptions: definition of variable; universe; comparability; general & detailed. (Word)
Project-wide control files: countries; samples; variables. (Excel)
Create IPUMSI data: creation (Java); reporting (Java); testing (SPSS); extraction (Java).
IPUMSI web site. (Java & HTML)
Export IPUMSI metadata for use by major MPC programs. (transfer responsibility to IT)
Stage labels: Original materials; Prepare samples; Integration; Create IPUMSI.
Collate sample information. (Word, tagged)
Collect codes, labels, and frequencies for ALL input variables. (automated)
Tag enumeration text to link it specifically with input variables.
Create translation tables: clean-up recoding only; virtually no special programming. (Excel)
Variable descriptions: basic definition of variable; universe; cross references to enumeration text. (Word)
Create source variables.
Data improvements: allocation; logical editing; pointers.
Scripts for special programming. (text)
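The step "collect codes, labels, and frequencies for ALL input variables" can be illustrated with a small sketch. This is not the project's own tooling (the slide only says the step is automated); the function name, the label dictionary, and the sample values are hypothetical.

```python
from collections import Counter

def variable_frequencies(values, labels):
    """Return (code, label, count) rows for one input variable, sorted by code.

    `values` is the column of raw codes; `labels` maps code -> label.
    Codes with no label are kept and flagged so they can be reviewed.
    """
    counts = Counter(values)
    return [
        (code, labels.get(code, "(unlabeled)"), n)
        for code, n in sorted(counts.items())
    ]

# Hypothetical marital-status column and labels, for illustration only.
rows = variable_frequencies([1, 2, 2, 3, 2], {1: "single", 2: "married"})
for code, label, n in rows:
    print(code, label, n)
```

Keeping unlabeled codes in the output, rather than dropping them, makes undocumented values visible before a translation table is written.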

User experience of IPUMSI web site
Documentation: Integrated variable list. Integrated variable description. Sample designs, etc. Enumeration files in their entirety. Codes and labels, with frequencies. Source variable list. Source variable description (Java assembles tagged enumeration text). Translation table. Special programming. Source variable metadata: frequencies, labels, and original-language labels. Enumeration text specific to the variable. (assembled by Java)
Data: Select samples. Select variables: integrated (general or detailed); source. Select features: case selection; household aggregation; attached characteristics. Download extract: data; syntax; enhanced codebook.
Registration: more rigorous vetting; more automated registration processes.
Access: user preferences; registration expires (1 yr).

How a case gets from the manuscript census into the IPUMS. An example from the 1860 census: John C. Breckinridge of Kentucky. Vice President of the U.S.; Secretary of War, C.S.A.; later charged with treason, fled to Cuba.

Original enumeration form from the 1860 U.S. Census

Data entry screen in Minnesota (ca. 1997)

Household and person record ready for checking (ca. 1999)

Coding dictionary for the occupation variable (ca. 2000)

Checked and coded data, ready for release (ca. 2001). Fields labeled on the slide: Year, Industry, Page, Wealth, Age, Relationship, Occupation.

Enumeration form: original file

Variable labels file

Data file: before reformatting

Data file: after reformatting

Reformat Rectangular Sample (Brazil 1980)
Person records only; household data duplicated on person records.
Input: every person record (head, spouse, child) carries its own copy of the geography and housing fields.
Output: one geography/housing record per household, followed by that household's person records (head, spouse, child).
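A minimal sketch of this reformatting, assuming each rectangular row carries a household identifier (called `hhid` here, a hypothetical field name) and that rows are already sorted by household. The record layout and field names are illustrative, not the project's actual file format.

```python
def rectangular_to_hierarchical(rows, household_fields):
    """Turn person rows with duplicated household data into H/P records.

    Assumes rows are sorted so that members of a household are adjacent,
    and that `hhid` (hypothetical name) identifies the household.
    """
    out = []
    current = None
    for row in rows:
        if row["hhid"] != current:
            # First person of a new household: emit the household record once.
            current = row["hhid"]
            household = {"hhid": current}
            household.update({f: row[f] for f in household_fields})
            out.append(("H", household))
        # Person record keeps hhid plus everything that is not household data.
        person = {k: v for k, v in row.items() if k not in household_fields}
        out.append(("P", person))
    return out

# Hypothetical input: two households, household data repeated on each person.
rows = [
    {"hhid": 1, "geo": "A", "housing": "owned", "relate": "head"},
    {"hhid": 1, "geo": "A", "housing": "owned", "relate": "child"},
    {"hhid": 2, "geo": "B", "housing": "rented", "relate": "head"},
    {"hhid": 2, "geo": "B", "housing": "rented", "relate": "spouse"},
    {"hhid": 2, "geo": "B", "housing": "rented", "relate": "child"},
]
records = rectangular_to_hierarchical(rows, ["geo", "housing"])
print([kind for kind, _ in records])
```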

Reformat Dwelling-Household-Person Sample (Chile 1992)
Separate dwelling and household records.
Input: dwelling records, household records, and person records (head, spouse, child) as separate record types.
Output: each household begins with a combined dwelling-household record, followed by its person records (head, spouse, child).

Reformat Dwelling-Person Sample (Colombia 1993)
Multi-household dwellings; no separate household record.
Input: dwelling 001: head, spouse, child, head; dwelling 002: head, child.
Output: household: head, spouse, child; household: head; household: head, child.
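The example on this slide is consistent with starting a new household each time a head appears within a dwelling. The sketch below implements that reading; it is an assumption drawn from the diagram, not a documented rule, and the function name is hypothetical.

```python
def split_households(dwelling_persons):
    """Split one dwelling's ordered person list into households.

    Assumption: a person with relationship "head" opens a new household,
    and everyone listed after belongs to it until the next head.
    """
    households = []
    for relate in dwelling_persons:
        if relate == "head" or not households:
            households.append([])
        households[-1].append(relate)
    return households

# The two dwellings shown on the slide.
print(split_households(["head", "spouse", "child", "head"]))
print(split_households(["head", "child"]))
```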

Merge Separate Household and Person Files (Brazil 2000)
Household file: serial 001 geog & housing; serial 002 geog & housing; serial 003 geog & housing.
Person file: serial 001 head; serial 001 spouse; serial 002 head; serial 002 child; serial 003 head.
Merged: serial 001 household, head, spouse; serial 002 household, head, child; serial 003 household, head.
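A minimal sketch of the merge, using the serial numbers from the slide. The dictionary layout and the "H"/"P" record tags are hypothetical; only the idea of interleaving the two files on the shared serial number comes from the slide.

```python
def merge_files(household_rows, person_rows):
    """Interleave household and person records on the shared serial number."""
    persons_by_serial = {}
    for person in person_rows:
        persons_by_serial.setdefault(person["serial"], []).append(person)

    merged = []
    for household in sorted(household_rows, key=lambda r: r["serial"]):
        merged.append(("H", household))
        # Persons keep their original order within the household.
        for person in persons_by_serial.get(household["serial"], []):
            merged.append(("P", person))
    return merged

household_file = [{"serial": s, "data": "geog & housing"} for s in ("001", "002", "003")]
person_file = [
    {"serial": "001", "relate": "head"},
    {"serial": "001", "relate": "spouse"},
    {"serial": "002", "relate": "head"},
    {"serial": "002", "relate": "child"},
    {"serial": "003", "relate": "head"},
]
merged = merge_files(household_file, person_file)
print([(kind, rec["serial"]) for kind, rec in merged])
```

A production version would also report person records whose serial has no household record; this sketch silently leaves them out.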

Reformat Individual-level Data (Mexico 1960)
Individuals only; not organized in households.
Input: individual records combining geography, person, and housing fields.
Output: household and person records.

Enumeration form: editable file, in English

Variable description

Sample design

IPUMS "Pointer" Variables (Colombia 1985) (Simple household)
Pointer columns: Spouse's location, Mother's location, Father's location.

Pernum  Relate  Age  Sex     Marst    Chborn
1       head    46   male    married  n/a
2       spouse  44   female  married  3
3       aunt    77   female  widow    7
4       child   15   female  single   0
5       child   13   female  single   n/a
6       child   11   male    single   n/a

IPUMS "Pointer" Variables (Colombia 1985) (Complex household)
Pointer columns: Spouse's location, Father's location, Mother's location.

Pernum  Relationship   Age  Sex     Marst      Chborn
1       head           53   female  separated  6
2       child          28   male    single     n/a
3       child          22   male    single     n/a
4       child          21   male    single     n/a
5       child          25   female  married    2
6       child-in-law   28   male    married    n/a
7       grandchild     3    male    single     n/a
8       grandchild     1    male    single     n/a
9       non-relative   32   female  separated  2
10      non-relative   10   male    single     n/a
11      non-relative   5    female  single     n/a
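A pointer variable stores, for each person, the person number (Pernum) of a related household member, or 0 when none is present. The sketch below covers only the simplest case, linking a head to a single spouse, and is applied to the simple household above. It is an illustrative assumption, not the project's linking rules, which would need further logic for cases such as the married child and child-in-law in the complex household.

```python
def spouse_locations(persons):
    """Return {pernum: spouse's pernum or 0} using a head-spouse rule only.

    Assumption: a link is made only when the household has exactly one
    person coded "head" and exactly one coded "spouse".
    """
    heads = [p["pernum"] for p in persons if p["relate"] == "head"]
    spouses = [p["pernum"] for p in persons if p["relate"] == "spouse"]
    sploc = {p["pernum"]: 0 for p in persons}
    if len(heads) == 1 and len(spouses) == 1:
        sploc[heads[0]] = spouses[0]
        sploc[spouses[0]] = heads[0]
    return sploc

# Simple household from the slide (Pernum and Relate columns only).
simple_household = [
    {"pernum": 1, "relate": "head"},
    {"pernum": 2, "relate": "spouse"},
    {"pernum": 3, "relate": "aunt"},
    {"pernum": 4, "relate": "child"},
    {"pernum": 5, "relate": "child"},
    {"pernum": 6, "relate": "child"},
]
print(spouse_locations(simple_household))
```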

Project control file: variables

Translation table

Translation Matrix – Marital Status How we integrate variables across countries and time

Translation Matrix – Marital Status location of data in the original samples

Translation Matrix – Marital Status marital codes used in the 1973 Colombian census

Translation Matrix – Marital Status different original codes for “widowed” across the censuses

Translation Matrix – Marital Status final IPUMS coding scheme for marital status
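The translation matrix can be thought of as one lookup table per sample, mapping that sample's original codes to the final harmonized codes. The sketch below shows the idea; the sample names, source codes, and harmonized codes are all hypothetical and do not reproduce the actual IPUMS coding scheme.

```python
# Hypothetical harmonized scheme (not the real IPUMS codes).
HARMONIZED_LABELS = {
    1: "single",
    2: "married",
    3: "separated/divorced",
    4: "widowed",
    9: "unknown",
}

# One column of the matrix per sample: original code -> harmonized code.
# Note that "widowed" has a different original code in each sample.
TRANSLATION = {
    "sample_a": {1: 2, 2: 1, 3: 4, 4: 3},
    "sample_b": {"S": 1, "M": 2, "D": 3, "W": 4},
}

def harmonize(sample, source_code, missing=9):
    """Recode one original value; unmapped codes fall back to `missing`."""
    return TRANSLATION[sample].get(source_code, missing)

print(HARMONIZED_LABELS[harmonize("sample_a", 3)])
print(HARMONIZED_LABELS[harmonize("sample_b", "W")])
```

Keeping the recoding in a table rather than in program logic matches the slides' point that integration is mostly clean-up recoding with little special programming.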

Source variable translation table

Tagged enumeration form