Research data workflow: Practice in Slovenian Social Science Data Archives
SERSCIDA WP4 – WORKSHOP, Ljubljana, September 2013

SIP, AIP, DIP
- Submission Information Package (SIP)
- Archival Information Package (AIP)
- Dissemination Information Package (DIP)
[Diagram labels: SIP, AIP (long-term preservation), DIP]

Recommended formats – input
Questionnaire
  Recommended format:
  - Rich Text Format (*.rtf)
  - structured metadata record of questionnaire (*.xml) by DDI or CAI programme (*.bmi)
  Other acceptable formats:
  - other text formats (*.docx, *.txt, etc.)
  - *.pdf or other graphical formats
  - printed version
Data material (data file)
  Recommended format:
  - SPSS (*.por, *.sav)
  - plain text data, ASCII (*.txt) + structured text or mark-up file containing metadata information (variable names, labels, categories, question text)
  Other acceptable formats:
  - other statistical packages
  - tables (*.xlsx etc.)
  - databases
Textual material (study description, codebook, interviewer instructions, speech to respondents, copies of research reports)
  Recommended format:
  - Rich Text Format (*.rtf)
  Other acceptable formats:
  - printed version
  - *.pdf or other graphical formats
  - other text formats (*.docx, *.txt, etc.)

Recommended formats – distribution
- STUDY DESCRIPTION: DDI structured XML
- DATA FILE: ASCII + XML, distributed in formats that can be exported from Nesstar
- OTHER TEXTUAL MATERIAL: PDF

Recommended formats – archiving
- DATA FILE: ASCII (*.txt) + XML with DDI file and data description

Recommended formats – archiving
- QUESTIONNAIRE, TEXT MATERIAL: original (any format) + distribution files (PDF)
- STUDY DESCRIPTION: DDI structured XML

Licence Agreement
Free:
- to Share: to copy, distribute and transmit the work
- to Remix: to adapt the work
- to make commercial use of the work
Under the following conditions:
- Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Free:
- to Share: to copy, distribute and transmit the work
- to Remix: to adapt the work
Under the following conditions:
- Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
- Noncommercial: You may not use this work for commercial purposes.

Naming files and versioning
File name format: StudyID_MaterialType_Language_Version_Subversion.FileFormat
Example: sutr1006_p1_sl_v1_r2.txt
URN format: URN:SI:UNI-LJ-FDV:ADP:StudyID_MaterialType_Language_Version
Example: URN:SI:UNI-LJ-FDV:ADP:sutr1006_p1_sl_v1
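
The naming convention above can be sketched in Python as a small parser and builder. This is only an illustration, not the archive's own tooling: the names ArchiveFile, parse_file_name, build_file_name and build_urn are hypothetical, and the "v" and "r" prefixes for version and subversion are inferred from the single example on the slide.

```python
import re
from typing import NamedTuple

URN_PREFIX = "URN:SI:UNI-LJ-FDV:ADP"  # prefix as shown on the slide


class ArchiveFile(NamedTuple):
    study_id: str
    material_type: str
    language: str
    version: int
    subversion: int
    file_format: str


# Assumption: "v" marks the version and "r" the subversion, as in the example.
FILE_NAME_RE = re.compile(
    r"^(?P<study_id>[^_]+)_(?P<material_type>[^_]+)_(?P<language>[^_]+)"
    r"_v(?P<version>\d+)_r(?P<subversion>\d+)\.(?P<file_format>[^.]+)$"
)


def parse_file_name(name: str) -> ArchiveFile:
    """Split a file name into its components, or raise ValueError."""
    match = FILE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised file name: {name!r}")
    parts = match.groupdict()
    return ArchiveFile(
        parts["study_id"], parts["material_type"], parts["language"],
        int(parts["version"]), int(parts["subversion"]), parts["file_format"],
    )


def build_file_name(f: ArchiveFile) -> str:
    """Assemble the file name from its components."""
    return (f"{f.study_id}_{f.material_type}_{f.language}"
            f"_v{f.version}_r{f.subversion}.{f.file_format}")


def build_urn(f: ArchiveFile) -> str:
    """Assemble the URN; subversion and file extension are not part of it."""
    return f"{URN_PREFIX}:{f.study_id}_{f.material_type}_{f.language}_v{f.version}"
```

Parsing sutr1006_p1_sl_v1_r2.txt and passing the result to build_urn reproduces the URN example from the slide.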

Managing workflow
Project tracking software
Task for every study, with 29 subtasks covering:
- general part with correspondence
- managing deposited materials
- preparing data file
- preparing study description
- publishing

Cleaning data and documentation
- Frequencies check
- Variable names, values
- Missing values
- Recode
- Weight
- Anonymisation
- Cumulative dataset

Anonymisation
Sebastian Kočar, Expert Assistant in Social Science Data Archives
SERSCIDA WP4 – WORKSHOP, Ljubljana, September 2013

Anonymisation in the archives: types
- basic anonymisation of mostly academic research datasets
- anonymisation of Eurostat files
- anonymisation of official statistics Public Use Files (PUF)

Basic anonymisation of distributed microdata in archives
- deleting variables: direct identifiers (telephone numbers, addresses etc.) are removed.
- recoding indirect identifiers (while still allowing serious researchers to receive datasets with the indirect identifiers non-recoded). Recoding includes removing values and bracketing, i.e. combining the categories of a variable.

Anonymisation of Eurostat files (the case of the Eurostat Labour Force Survey)
- deleting variables: indirect identifiers and unneeded variables are removed (municipality, wave number etc.)
- bracketing: age, marital status, education, years of residence, age of establishment of residence, duration of search for employment, professional status, country & nationality
- classification: income figures are not given; respondents are divided into classes based on their income
- aggregation: economic activity and occupation values are aggregated at 1-digit level
- top-coding: restricting the upper range of a variable (number of hours worked)
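
A minimal Python sketch of deleting variables, bracketing, aggregation and top-coding on a single record is given here for illustration. The age brackets, the 80-hour cap, the field names and the function names are all hypothetical; the slide does not state the actual Eurostat LFS thresholds.

```python
from bisect import bisect_right

# Hypothetical bracket edges and cap; the real LFS criteria are not on the slide.
AGE_EDGES = [15, 25, 35, 45, 55, 65]
AGE_LABELS = ["0-14", "15-24", "25-34", "35-44", "45-54", "55-64", "65+"]
HOURS_CAP = 80


def bracket(value, edges, labels):
    """Bracketing: map a numeric value to the label of its interval."""
    return labels[bisect_right(edges, value)]


def top_code(value, cap):
    """Top-coding: values above the cap are replaced by the cap."""
    return min(value, cap)


def aggregate_code(code, digits=1):
    """Aggregation: keep only the leading digits of a hierarchical code."""
    return code[:digits]


def anonymise(record):
    """Apply the four steps to one record (a dict) and return a new dict."""
    out = dict(record)
    out.pop("municipality", None)  # deleting variables
    out["age"] = bracket(record["age"], AGE_EDGES, AGE_LABELS)
    out["hours_worked"] = top_code(record["hours_worked"], HOURS_CAP)
    out["occupation"] = aggregate_code(record["occupation"])
    return out
```

For example, a record with age 37, 95 hours worked and occupation code "2411" would come out with age "35-44", 80 hours and occupation "2", and without the municipality field.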

Anonymisation of official statistics Public Use Files for distribution in archives
- anonymisation software: μArgus, R (sdcMicro, bethel, sampling packages), Cornell anonymisation toolkit, synthetic data generators
- anonymisation techniques: data reduction techniques (global recoding, local suppression etc.), data perturbation techniques (micro-aggregation, PRAM etc.), sampling, generating synthetic microdata
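
To illustrate one of the perturbation techniques listed above, here is a minimal sketch of micro-aggregation on a single numeric variable. It uses a simple fixed-size grouping of sorted values and is not the algorithm implemented in sdcMicro or μArgus; the function name and the choice of k are assumptions.

```python
def microaggregate(values, k=3):
    """Replace each value by the mean of its group of at least k
    neighbouring values in sorted order (univariate, fixed-size groups)."""
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:
            # Fewer than k values would be left over: merge them into this group.
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result
```

Every published value is then shared by at least k records, while the overall total of the variable is preserved.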

Anonymisation – a case study
PUF prepared in cooperation with SORS Sector for General Methodology and Standards
- anonymisation procedure which follows Eurostat LFS anonymisation criteria (in SPSS)
- calculating individual and global risk (R – sdcMicro)
- calculating strata allocation, based on individual risk averages by strata (R – bethel)
- stratified sampling, based on the inclusion probability of a certain case (R – sampling – samplecube)
- sample weights recalculation
LFS 2010 PUF distributed in August 2013
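
The risk and reweighting steps can be sketched in Python under strong simplifying assumptions. Individual risk is approximated here as one over the summed sampling weights of all records sharing the same key-variable combination, which is a naive stand-in for the model-based estimate in sdcMicro. The function and field names are hypothetical, and the sampling step itself is omitted.

```python
from collections import defaultdict


def individual_risks(records, keys, weight="weight"):
    """Naive risk per record: 1 / (sum of sampling weights in its key cell)."""
    cell_weight = defaultdict(float)
    for r in records:
        cell_weight[tuple(r[k] for k in keys)] += r[weight]
    return [1.0 / cell_weight[tuple(r[k] for k in keys)] for r in records]


def global_risk(risks):
    """Global risk taken as the mean of the individual risks."""
    return sum(risks) / len(risks)


def reweight(records, inclusion_probs, weight="weight"):
    """For records kept in the subsample, divide the original weight by the
    probability with which the record was included in the subsample."""
    return [
        dict(r, **{weight: r[weight] / p})
        for r, p in zip(records, inclusion_probs)
    ]
```

Dividing by the inclusion probability keeps weighted totals unbiased after subsampling, which is the usual reason for recalculating weights.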