Representing and utilizing DDI in relational databases A new DDI best practices working paper Ingo Barkow, Senior researcher, Leibniz Institute for Educational.

Presentation transcript:

Representing and utilizing DDI in relational databases
A new DDI best practices working paper
Ingo Barkow, Senior researcher, Leibniz Institute for Educational Research and Educational Information (DIPF)
David Schiller, Senior researcher, Institute for Employment Research (IAB)

Agenda
- Contributors
- Introduction
- Pros and cons of DDI in relational database systems
- Modeling DDI in relational databases
- Advanced cases
- Ensuring application compatibility
- An outlook to the future
- Q&A

Contributors
- The idea for this paper was formed at a workshop on mapping DDI to relational databases in Frankfurt/Main in April 2011.
- Contributors are:
  - Alerk Amin, CentERdata
  - Ingo Barkow, Leibniz Institute for Educational Research and Educational Information (DIPF)
  - Stefan Kramer, Cornell Institute for Social and Economic Research (CISER)
  - David Schiller, Institute for Employment Research (IAB)
  - Jeremy Williams, Cornell Institute for Social and Economic Research (CISER)
- Thanks to Jeremy Iverson (Colectica), Sanda Ionescu (University of Michigan) and Johanna Vompras (University of Bielefeld) for additional input.
Göteborg, | Barkow & Schiller | 3rd European DDI Users Group Meeting (EDDI) 2011 Representing and utilizing DDI in relational databases

Introduction
- Modern research needs good documentation for:
  - reuse of data
  - data merging
  - international comparison of datasets
- DDI seems to be the most promising solution for standardized metadata documentation.
- But DDI needs to be used in practice, not only developed.

Introduction
- DDI must therefore be easy to implement and future-proof with respect to developments in data storage and data analysis.
- Relational databases are a widely used and flexible solution for data storage.
- Combining DDI with the capabilities of relational database systems will promote both data storage for scientific research and the DDI standard itself.
- This presentation and the underlying paper outline the advantages and disadvantages of representing DDI in relational databases as an alternative to an XML structure.

DDI in RDBs – pros and cons
Pros of relational databases with regard to DDI:
- The structure is well suited to rectangular files (e.g. SPSS or Stata).
- Metadata and microdata are easier to combine because they share the same storage structure (e.g. through referential integrity).
- It is a very common structure with a high degree of optimization (e.g. indexes, file groups, stored procedures).
- Multiple studies can be stored in one database system, which gives more opportunity for harmonization between studies.
- The internal model is independent of the DDI version; the import and export processes can be adapted to each individual version.
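The referential-integrity point can be made concrete with a small sketch. It is not taken from the working paper, which deliberately gives no reference model; the table and column names (study, variable, observation) are hypothetical, and SQLite stands in for whatever engine an agency actually uses.

```python
import sqlite3

# Hypothetical minimal schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE variable (              -- metadata: one row per documented variable
    variable_id INTEGER PRIMARY KEY,
    study_id    INTEGER NOT NULL REFERENCES study(study_id),
    name        TEXT NOT NULL,
    label       TEXT
);
CREATE TABLE observation (           -- microdata: one row per case and variable
    case_id     INTEGER NOT NULL,
    variable_id INTEGER NOT NULL REFERENCES variable(variable_id),
    value       TEXT,
    PRIMARY KEY (case_id, variable_id)
);
""")

conn.execute("INSERT INTO study VALUES (1, 'Example study')")
conn.execute("INSERT INTO variable VALUES (10, 1, 'age', 'Age of respondent')")
conn.execute("INSERT INTO observation VALUES (1, 10, '42')")


def insert_orphan(connection):
    """Try to store a data value for a variable that has no metadata row."""
    try:
        connection.execute("INSERT INTO observation VALUES (2, 999, '7')")
        return True
    except sqlite3.IntegrityError:
        return False


def labelled_values(connection):
    """Join microdata with its metadata in a single query."""
    return connection.execute("""
        SELECT s.title, v.label, o.value
        FROM observation AS o
        JOIN variable AS v ON v.variable_id = o.variable_id
        JOIN study AS s ON s.study_id = v.study_id
    """).fetchall()


orphan_accepted = insert_orphan(conn)
```

Because the data values reference the variable table, the database itself rejects a value that has no documented variable, and one join returns values together with their labels and study titles.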

DDI in RDBs – pros and cons
Pros of XML structures with regard to DDI:
- XML is native to DDI, so there are no compatibility issues (e.g. unknown nodes do not necessarily have to be processed).
- A hierarchical structure is difficult to model in relational databases.
- The full set of DDI leads to a very complex relational database with long response times due to complex joins (although most DDI-XML implementations use only a subset).
- DDI-XML is easier to validate against the DDI schema.
- An interesting approach is a hybrid relational database with XML acceleration or processing (e.g. enterprise databases such as SQL Server or Oracle).

Modelling DDI in RDBs
- The paper does not include a model relational database using DDI or direct implementation examples, because too many surrounding factors prevent a complete model, e.g.:
  - Database engine (e.g. MySQL, Oracle, SQL Server)
  - Agency requirements (e.g. DDI elements needed)
  - Programming environment (e.g. PHP, Java, C#/.NET)
  - Previous database knowledge or structures within the agency
  - Old data which has to be migrated
- The paper is therefore designed as a best-practice guidebook derived from the experiences of the respective agencies.

Modelling DDI in RDBs
The paper includes design best practices for the following topics:
- DDI elements
- XML hierarchy
- References
- Recursive structures
- Substitution groups
- Controlled vocabularies
- Database IDs
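As an illustration of the "recursive structures" topic, one common relational technique is an adjacency list (each row points at its parent) queried with a recursive common table expression. This is a sketch under assumptions, not the paper's recommendation: the item_group table and its labels are invented for the example, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: each group optionally points at its parent group.
CREATE TABLE item_group (
    group_id  INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES item_group(group_id),
    label     TEXT NOT NULL
);
INSERT INTO item_group VALUES (1, NULL, 'Questionnaire');
INSERT INTO item_group VALUES (2, 1, 'Demographics');
INSERT INTO item_group VALUES (3, 2, 'Household');
INSERT INTO item_group VALUES (4, 1, 'Education');
""")


def descendants(connection, root_id):
    """Return (label, depth) for the root group and everything nested below it."""
    return connection.execute("""
        WITH RECURSIVE tree(group_id, label, depth) AS (
            SELECT group_id, label, 0 FROM item_group WHERE group_id = ?
            UNION ALL
            SELECT g.group_id, g.label, t.depth + 1
            FROM item_group AS g
            JOIN tree AS t ON g.parent_id = t.group_id
        )
        SELECT label, depth FROM tree ORDER BY depth, group_id
    """, (root_id,)).fetchall()
```

The query walks any nesting depth without a fixed number of joins, which is the main difficulty when a hierarchical XML structure is flattened into tables.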

Advanced Cases
- Versioning (including late-bound references) can be established in a relational database in the following ways:
  - An array of triggers on the relevant tables
  - Managed code / external programming
  - Data warehouse technology (slowly changing dimensions)
- Modelling schemes which include another scheme:
  - Model the relational database very similarly to the DDI-XML structure
  - "Resolve" all included schemes and store only the "complete" version
- Two ways to support multiple languages:
  - Exporting translations into XLIFF files (XML translation standard)
  - Direct injection from tables into DDI-XML files while exporting
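The trigger-based versioning option might look roughly like the sketch below. It assumes a hypothetical variable table with a version column and a variable_history table, and it uses SQLite trigger syntax; other engines differ in the details, and the paper does not prescribe this layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: the current state plus an archive of superseded versions.
CREATE TABLE variable (
    variable_id INTEGER PRIMARY KEY,
    label       TEXT NOT NULL,
    version     INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE variable_history (
    variable_id INTEGER NOT NULL,
    label       TEXT NOT NULL,
    version     INTEGER NOT NULL,
    PRIMARY KEY (variable_id, version)
);
-- Whenever a label changes, archive the old row and bump the version number.
CREATE TRIGGER variable_versioning
AFTER UPDATE OF label ON variable
BEGIN
    INSERT INTO variable_history
        VALUES (OLD.variable_id, OLD.label, OLD.version);
    UPDATE variable
        SET version = OLD.version + 1
        WHERE variable_id = OLD.variable_id;
END;
""")

conn.execute("INSERT INTO variable (variable_id, label) VALUES (1, 'Age')")
conn.execute("UPDATE variable SET label = 'Age in years' WHERE variable_id = 1")

# A late-bound reference stores only variable_id and therefore sees the newest version.
current = conn.execute(
    "SELECT label, version FROM variable WHERE variable_id = 1"
).fetchone()
# An early-bound reference would store (variable_id, version) and read the archive.
history = conn.execute(
    "SELECT label, version FROM variable_history ORDER BY version"
).fetchall()
```

In this layout a reference that carries only the identifier always resolves to the latest row, while a reference that also carries a version number can be answered from the history table.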

Advanced Cases
Unknown or external elements in DDI can be handled in several ways, e.g.:
- The RDB covers the full set of DDI, so the problem does not occur.
- Unknown elements are discarded while importing the DDI-XML structure.
- The RDB buffers unknown elements as strings or native XML (the ideal solution in this case would be a database which can handle XML natively).
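The buffering option can be sketched as follows. The XML here is simplified and made up (real DDI instances use namespaces and a much larger element set), the KNOWN mapping and the unknown_xml column are hypothetical, and a plain text column stands in for a native XML column type.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified, invented XML: not a valid DDI instance.
SAMPLE = """<Variable id="v1">
  <Name>age</Name>
  <Label>Age of respondent</Label>
  <VendorExtension tool="example">keep me</VendorExtension>
</Variable>"""

# Hypothetical mapping from element names the database models to column names.
KNOWN = {"Name": "name", "Label": "label"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variable ("
    "variable_id TEXT PRIMARY KEY, name TEXT, label TEXT, unknown_xml TEXT)"
)


def import_variable(connection, xml_text):
    """Store known children in columns and buffer everything else as an XML string."""
    root = ET.fromstring(xml_text)
    values = {"name": None, "label": None}
    leftovers = []
    for child in root:
        column = KNOWN.get(child.tag)
        if column is not None:
            values[column] = child.text
        else:
            child.tail = None  # drop the whitespace that followed the element
            leftovers.append(ET.tostring(child, encoding="unicode"))
    connection.execute(
        "INSERT INTO variable VALUES (?, ?, ?, ?)",
        (root.get("id"), values["name"], values["label"], "".join(leftovers)),
    )


def export_variable(connection, variable_id):
    """Rebuild the element, re-attaching the buffered unknown children unchanged."""
    name, label, unknown_xml = connection.execute(
        "SELECT name, label, unknown_xml FROM variable WHERE variable_id = ?",
        (variable_id,),
    ).fetchone()
    root = ET.Element("Variable", {"id": variable_id})
    ET.SubElement(root, "Name").text = name
    ET.SubElement(root, "Label").text = label
    # A wrapper element lets several buffered fragments be parsed from one string.
    for extra in ET.fromstring(f"<wrapper>{unknown_xml}</wrapper>"):
        root.append(extra)
    return root


import_variable(conn, SAMPLE)
exported = export_variable(conn, "v1")
```

The unknown element survives the import and export round trip even though the database has no column for it, at the cost of not being queryable with ordinary SQL.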

Ensuring application compatibility
- The DDI-XML import and export mechanism can be improved by using DDI Profiles.
- The topic is important for all DDI-related exchange processes (e.g. also between DDI-XML databases).
- A DDI Profile is a collection of XPaths that describe the objects within DDI that are either used or not used for particular purposes.
- Use of a DDI Profile is not mandatory, but when one is used, it should be referenced in all of the DDI instances that conform to it.
- The paper includes an XML example of this structure.
- The structure is very useful for communication between applications, both between and within agencies.
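The idea of a profile as a set of XPaths marked "used" or "not used" can be illustrated with a small check. This sketch does not reproduce the DDI Profile XML format from the paper: the profile is a plain Python dictionary, the paths and the sample instance are invented, and only the limited XPath subset supported by Python's ElementTree is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: each path maps to True (used) or False (not used).
PROFILE = {
    "./Variable/Name": True,
    "./Variable/Label": True,
    "./Variable/Concept": False,
}

# Simplified, invented instance: not a valid DDI document.
INSTANCE = """<Study>
  <Variable><Name>age</Name><Label>Age</Label></Variable>
  <Variable><Name>sex</Name><Concept>Gender</Concept></Variable>
</Study>"""


def unsupported_paths(xml_text, profile):
    """Return the paths marked as not used that nevertheless occur in the instance."""
    root = ET.fromstring(xml_text)
    return sorted(
        path for path, used in profile.items()
        if not used and root.findall(path)
    )


def supported_paths_present(xml_text, profile):
    """Return the paths marked as used that actually occur in the instance."""
    root = ET.fromstring(xml_text)
    return sorted(
        path for path, used in profile.items()
        if used and root.findall(path)
    )
```

An importing application could run such a check before loading an instance, so that content it cannot store is reported rather than silently dropped.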

An outlook to the future
- DDI does not need to rely on a particular technical representation; it is valuable as an abstract model, as previous experience shows:
  - DDI 2 (up to 2.5) was modeled as a DTD.
  - DDI 3 (all versions) is modeled as XSD.
  - Many agencies support DDI as an import and export model, but internally use something different (e.g. relational databases or other repositories).
- Idea: the manifestation can be in different representations such as UML or RDF.
- Advantage: a technical representation can be generated from the abstract model.
- Maybe a possible preparation for "DDI 4"?

The working paper
- The paper was released on Friday, December 2nd, 2011 on the DDI website as part of the working paper series.
- Please download it here: AndUtilizingDDIInRelationalDatabases.pdf
- DOI:
- We would welcome reviews, comments or other scientific discussion.

Any questions?