Data format translation and migration Future possibilities Alasdair Crockett, Data Standards Manager UK Data Archive.



Past problems and future solutions
- Past/existing problems: ‘skeletons in the back catalogue’. The UKDA and other long-standing archives hold old studies in column binary or other legacy formats that are difficult, time-consuming and occasionally practically impossible to process or migrate.
- Future solutions: to ensure that we don’t store up similar problems (with vastly increased amounts of data) in 20 or 30 years’ time.
- This talk covers future solutions.

When does data format translation occur?
- To enable data processing (validation, etc.): from ingest format to “processing format” (SPSS in the case of the UK Data Archive).
- To ensure long-term preservation: from processing format to preservation format(s), these being SPSS portable and tab-delimited text (with data dictionary) in the case of the UK Data Archive.
- To achieve user-friendly dissemination: from preservation or processing format to the dissemination format of the user’s choice, e.g. STATA, SAS or EXCEL, in addition to the ubiquitous SPSS.
- Migration: when previously mainstream formats become obscure or new formats are requested by users.

What are the potential problems of data format conversion?
At time of processing:
- Rounding/truncation of numeric data
- Truncation of textual data
- Differences in handling “internal” metadata (differential label lengths, missing value handling, etc.)
- Corruption of specially formatted variables (especially date/time variables)
- Embedded special characters (line feeds, carriage returns, tabs, etc.)
Migration:
- All the above, plus the added problem of dealing with out-of-date, unfamiliar and/or inaccessible formats (e.g. column binary)
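Several of the processing-time problems listed above can be checked for mechanically before export. The following is a minimal sketch only, assuming a deliberately naive tab-delimited writer; the function names and the `float_digits` and `text_width` limits are hypothetical and do not describe any UKDA tool.

```python
def naive_export(rows, float_digits=2, text_width=8):
    """Write rows as tab-delimited text with fixed numeric precision and text width."""
    lines = []
    for row in rows:
        cells = []
        for value in row:
            if isinstance(value, float):
                cells.append(f"{value:.{float_digits}f}")   # may round
            else:
                cells.append(str(value)[:text_width])       # may truncate
        lines.append("\t".join(cells))
    return "\n".join(lines)


def naive_import(text):
    """Read tab-delimited text back into lists of string cells."""
    return [line.split("\t") for line in text.split("\n")]


def find_losses(rows, float_digits=2, text_width=8):
    """Return (row, column, problem) tuples for values the naive export would damage."""
    problems = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if isinstance(value, float):
                if float(f"{value:.{float_digits}f}") != value:
                    problems.append((r, c, "rounding"))
            else:
                text = str(value)
                if len(text) > text_width:
                    problems.append((r, c, "truncation"))
                if any(ch in text for ch in "\t\r\n"):
                    problems.append((r, c, "embedded special character"))
    return problems


rows = [
    [1.23456, "short"],
    [2.5, "a very long response"],
    [3.0, "tab\there"],
]
for problem in find_losses(rows):
    print(problem)
```

The last row also shows why embedded tabs matter: after `naive_import(naive_export(rows))` it comes back with three cells instead of two, so every later column would be misaligned.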

The ‘Data Curation Initiative’: an XML standard and conversion utilities for survey data
The ‘Data Curation Initiative’ (DCI) consists of:
XML standard:
- Open standard for sharing and preserving datasets
- Implemented as an XML Schema
- Stores all attributes of a survey dataset: labels, missing value definitions, variable-level notes, etc.
Conversion software:
- From proprietary formats to DCI (with no data loss)
- From DCI to proprietary formats (text file + command file)
- File- and variable-level metadata import/export to the DDI XML schema
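To illustrate what storing all attributes of a variable in XML might involve, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The slides do not show the DCI schema, so every element and attribute name below (`variable`, `valueLabels`, `missingValues`, `note`) is an assumption made for illustration, not the actual DCI vocabulary.

```python
import xml.etree.ElementTree as ET


def variable_to_xml(name, label, value_labels, missing_values, note=None):
    """Build a hypothetical XML element holding one variable's metadata."""
    var = ET.Element("variable", {"name": name})
    ET.SubElement(var, "label").text = label
    values = ET.SubElement(var, "valueLabels")
    for code, text in value_labels.items():
        ET.SubElement(values, "value", {"code": str(code)}).text = text
    missing = ET.SubElement(var, "missingValues")
    for code in missing_values:
        ET.SubElement(missing, "missing", {"code": str(code)})
    if note:
        ET.SubElement(var, "note").text = note
    return var


def xml_to_variable(var):
    """Read the same hypothetical element back into a plain dictionary."""
    return {
        "name": var.get("name"),
        "label": var.findtext("label"),
        "value_labels": {v.get("code"): v.text for v in var.find("valueLabels")},
        "missing_values": [m.get("code") for m in var.find("missingValues")],
        "note": var.findtext("note"),
    }


element = variable_to_xml(
    "q1",
    "Does the respondent own their home?",
    {1: "Yes", 2: "No", -9: "Refused"},
    [-9],
    note="Asked of household heads only",
)
serialised = ET.tostring(element, encoding="unicode")
print(serialised)
print(xml_to_variable(ET.fromstring(serialised)))
```

Note that codes come back as strings after the round trip; a real schema would need to record each variable's type so that numeric codes can be restored exactly.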

The key functions of the data curation software
[Diagram: SPSS, STATA, SAS, EXCEL and ACCESS files are converted to data in DCI XML format, which is in turn exported to SPSS, STATA, SAS, EXCEL and ACCESS.]
- Data preservation: the DCI XML file provides the preservation master copy of the dataset.
- On export: a text file gives a variable-level report of any data or metadata loss.
- Resource discovery (metadata export): file- and variable-level metadata are exported to a DDI XML file.
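A variable-level loss report of the kind described could, in outline, compare each variable's metadata against the limits of the target format. The sketch below assumes this approach; the `TargetLimits` class and the limit values used in the example are hypothetical and are not the documented limits of any real package.

```python
from dataclasses import dataclass


@dataclass
class TargetLimits:
    """Hypothetical metadata limits of an export format."""
    max_variable_label: int
    max_value_label: int
    max_missing_values: int


def loss_report(variables, limits):
    """Return one report line per metadata item the target format cannot hold in full."""
    lines = []
    for var in variables:
        name = var["name"]
        if len(var["label"]) > limits.max_variable_label:
            lines.append(
                f"{name}: variable label truncated to "
                f"{limits.max_variable_label} characters"
            )
        for code, text in var["value_labels"].items():
            if len(text) > limits.max_value_label:
                lines.append(
                    f"{name}: value label for code {code} truncated to "
                    f"{limits.max_value_label} characters"
                )
        if len(var["missing_values"]) > limits.max_missing_values:
            lines.append(
                f"{name}: only {limits.max_missing_values} of "
                f"{len(var['missing_values'])} missing values kept"
            )
    return lines


variables = [
    {
        "name": "q1",
        "label": "Does the respondent own their home?",
        "value_labels": {1: "Yes", 2: "No", -9: "Refused to answer"},
        "missing_values": [-9, -8, -7, -1],
    },
    {"name": "age", "label": "Age", "value_labels": {}, "missing_values": [-9]},
]
illustrative_limits = TargetLimits(
    max_variable_label=20, max_value_label=10, max_missing_values=3
)
print("\n".join(loss_report(variables, illustrative_limits)))
```

Writing the returned lines to a text file alongside the exported data would give users a record of exactly what the target format could not represent.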

Migration strategy
An approach such as the Data Curation Initiative allows either:
- Traditional migration strategies: systematic migration of the whole collection on the preservation server
- Migration on request: the preservation version remains the same, but on-the-fly export utilities are updated to cater for new versions/formats as they become popular
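Migration on request can be pictured as a registry of export functions sitting in front of an unchanging master copy. The following is a rough sketch under that assumption; the registry, the decorator and the two toy exporters are hypothetical, and a plain dictionary stands in for the preservation master.

```python
EXPORTERS = {}


def exporter(format_name):
    """Register an export function under a format name."""
    def register(func):
        EXPORTERS[format_name] = func
        return func
    return register


@exporter("tab")
def export_tab(dataset):
    """Export as tab-delimited text with a header row."""
    rows = [dataset["variables"]] + dataset["rows"]
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


@exporter("csv")
def export_csv(dataset):
    """Export as comma-separated text (no quoting; illustration only)."""
    rows = [dataset["variables"]] + dataset["rows"]
    return "\n".join(",".join(str(cell) for cell in row) for row in rows)


def export_on_request(master, format_name):
    """Produce the requested format on the fly without modifying the master copy."""
    try:
        func = EXPORTERS[format_name]
    except KeyError:
        raise ValueError(f"no exporter registered for format {format_name!r}")
    return func(master)


master = {"variables": ["id", "age"], "rows": [[1, 34], [2, 51]]}
print(export_on_request(master, "tab"))
```

Supporting a newly popular format then means registering one more exporter; the preserved master is never rewritten, in contrast to systematic migration of the whole collection.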

Doesn’t the DDI do this?
- Not so far.
- Could build onto the DDI; in any case the DCI will populate the variable level of the DDI.
- Some advantages to keeping ‘data’ and ‘metadata’ separate:
  - A single XML file could become enormous and slow to parse
  - Allows communities who don’t use the DDI to use the DCI (and vice versa)