Modernizing the Data Documentation Initiative (DDI-4) Dan Gillman, Bureau of Labor Statistics Arofan Gregory, Open Data Foundation WICS, 5-7 May 2015.

Slides:



Advertisements
Similar presentations
Status on the Mapping of Metadata Standards
Advertisements

DDI Specification: Current Status and Outlook Wendy Thomas Arofan Gregory NADDI 2013.
Foundational Objects. Areas of coverage Technical objects Foundational objects Lessons learned from review of Use Case content Simple Study Simple Questionnaire.
RDF Schemata (with apologies to the W3C, the plural is not ‘schemas’) CSCI 7818 – Web Technologies 14 November 2001 Van Lepthien.
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
Inside View of DDI Version 3.0: Structural Reform Group Report Presented to IASSIST 25 May 2005 Edinburgh Scotland UK.
1 ISO – Metadata Next Generation International consensus being built on structured metadata within a broader Geomatics Standard under ISO Technical.
DDI 3.0 Conceptual Model Chris Nelson. Why Have a Model Non syntactic representation of the business domain Useful for identifying common constructs –Identification,
Codebook Centric to Life-Cycle Centric In the beginning….
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
Reducing Metadata Objects Dan Gillman November 14, 2014.
Framework for Model Creation and Generation of Representations DDI Lifecycle Moving Forward.
The Data Cube Vocabulary: Statistics in the Web of Linked Data Arofan Gregory Open Data Foundation WICS, Geneva, 5-7 May 2015.
10 December, 2013 Katrin Heinze, Bundesbank CEN/WS XBRL CWA1: DPM Meta model CWA1Page 1.
IPUMS to IHSN: Leveraging structured metadata for discovering multi-national census and survey data Wendy L. Thomas 4 th Conference of the European Survey.
Vocabulary Services “Huuh - what is it good for…” (in WDTS anyway…) 4 th September 2009 Jonathan Yu CSIRO Land and Water.
Background Data validation, a critical issue for the E.S.S.
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
GSIM Stakeholder Interview Feedback HLG-BAS Secretariat January 2012.
 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. Towards Translating between XML and WSML based on mappings between.
Practical RDF Chapter 1. RDF: An Introduction
DDI Lifecycle: Moving Forward Outcome of the Recent Workshop in Dagstuhl Joachim Wackerow.
Generic Statistical Information Model (GSIM) Thérèse Lalor and Steven Vale United Nations Economic Commission for Europe (UNECE)
DDI: Capturing metadata throughout the research process for preservation and discovery Wendy Thomas NADDI 2012 University of Kansas.
The Semantic Web Service Shuying Wang Outline Semantic Web vision Core technologies XML, RDF, Ontology, Agent… Web services DAML-S.
3 rd Annual European DDI Users Group Meeting, 5-6 December 2011 The Ongoing Work for a Technical Vocabulary of DDI and SDMX Terms Marco Pellegrino Eurostat.
DDI-RDF Discovery Vocabulary A Metadata Vocabulary for Documenting Research and Survey Data Linked Data on the Web (LDOW 2013) Thomas Bosch.
4 April 2007METIS Work Session1 Metadata Standards and Their Support of Data Management Needs Daniel W. Gillman Bureau of Labor Statistics Paul Johanis.
Introduction to MDA (Model Driven Architecture) CYT.
RDF and OWL Developing Semantic Web Services by H. Peter Alesso and Craig F. Smith CMPT 455/826 - Week 6, Day Sept-Dec 2009 – w6d21.
SDMX Standards Relationships to ISO/IEC 11179/CMR Arofan Gregory Chris Nelson Joint UNECE/Eurostat/OECD workshop on statistical metadata (METIS): Geneva.
1 Ontology-based Semantic Annotatoin of Process Template for Reuse Yun Lin, Darijus Strasunskas Depart. Of Computer and Information Science Norwegian Univ.
DDI-RDF Leveraging the DDI Model for the Linked Data Web.
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
Secure Epidemiology Research Platform (SERPent) Kick Start Meeting - April 15 th, 2010 Pascal Heus
Common Terminology Services 2 CTS 2 Submission Team Status Update HL7 Vocabulary Working Group May 17, 2011.
Oreste Signore- Quality/1 Amman, December 2006 Standards for quality of cultural websites Ministerial NEtwoRk for Valorising Activities in digitisation.
User Profiling using Semantic Web Group members: Ashwin Somaiah Asha Stephen Charlie Sudharshan Reddy.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
DDI Discovery: An Overview of Current RDF Vocabularies Arofan Gregory Metadata Technologies NA Joachim Wackerow GESIS.
Introduction to DDI Moving Forward Dagstuhl Seminar October 22-26, 2012.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Eurostat SDMX and Global Standardisation Marco Pellegrino Eurostat, Statistical Office of the European Union Bangkok,
2.An overview of SDMX (What is SDMX? Part I) 1 Edward Cook Eurostat Unit B5: “Central data and metadata services” SDMX Basics course, October 2015.
1 SDMX Global Conference September 2015 SDMX into the future VTL (Validation and Transformation Language) A new technical standard for enhancing.
The Model-Driven DDI Approach Arofan Gregory, Jon Johnson, Flavio Rizzolo, Marcel Hebing.
The Data Documentation Initiative (DDI) Fostering Community Engagement and Adoption Breakout 9 RDA Sixth Plenary, Paris Mary Vardigan, ICPSR, University.
DDI Specifications Recent Developments Wendy Thomas Joachim Wackerow Technical Committee EDDI Copenhagen.
Aim: “to support the enhancement and implementation of the standards needed for the modernisation of statistical production and services”
Wendy Thomas – DDI Technical Implementation Committee (TIC)
Strategic Priorities for DDI Spring 2013 Mary Vardigan Director, DDI Alliance METIS -- Geneva, Switzerland May 6, 2013.
Welcome To the DDI Model Review and Sprint! Schloss Dagstuhl October 19-23, 2015.
DDI Moving Forward: Sprint 1 Schloss Dagstuhl October 28-November 1, 2013.
Development of Methodology Model. Coverage Part I – Development of the Model from Weighting (SDI model) through current refinements in March 2016 Part.
Writing a HOWTO Guide for DDI An approach for getting started.
Metadata Schema Registries: background and context MEG Registry Workshop, Bath, 21 January 2003 Rachel Heery UKOLN, University of Bath Bath, BA2 7AY UKOLN.
DDI and GSIM – Impacts, Context, and Future Possibilities
DDI - A Metadata Standard for the Community
Next-Generation DDI: Building Model-Based DDI 4
SISAI STATISTICAL INFORMATION SYSTEMS ARCHITECTURE AND INTEGRATION
How can DDI make the most of RDF?
2. An overview of SDMX (What is SDMX? Part I)
2. An overview of SDMX (What is SDMX? Part I)
Metadata Framework as the basis for Metadata-driven Architecture
Managing Private and Public Views of DDI Metadata Repositories
Semantic Statistics DDI Lifecycle: Moving Forward Outcome of the Recent Workshops in Dagstuhl Joachim Wackerow.
Generic Statistical Information Model (GSIM)
Introducing the Data Documentation Initiative
DDI and GSIM – Impacts, Context, and Future Possibilities
Presentation transcript:

Modernizing the Data Documentation Initiative (DDI-4) Dan Gillman, Bureau of Labor Statistics Arofan Gregory, Open Data Foundation WICS, 5-7 May 2015

Background DDI Alliance building a model-driven specification Goals – Extend Lifecycle with new content – Expand domain coverage of DDI – Render the specification in new ways – Reduce complexity of DDI – Develop DDI more sustainably

Activity: Events 2012 – Foundational Dagstuhl Seminar 2013 – Sprint 1 at Dagstuhl 2013 – Sprint 2 at EDDI in Paris 2014 – Sprint 3 at NADDI in Vancouver 2014 – Sprint 4 at IASSIST in Toronto 2014 – Sprint 5 at Dagstuhl (included citation work) 2014 – Sprint 6 at EDDI in London 2015 – Sprint 7 at IASSIST in Minneapolis

DDI Alliance Work Products DDI-Codebook – DDI-Codebook is an XML structure for describing codebooks (or data dictionaries), for a single activity (called a study), statistical or scientific. DDI-Lifecycle – DDI-Lifecycle (also expressed in XML) expands on the coverage of a single study to include the data lifecycle and can describe several waves of data collection, and even ad hoc collections of data sets grouped for the purposes of comparison. It is very useful when dealing with serial data collection as is often seen in data production within statistical offices. Long-standing research projects are similar. RDF Vocabularies – The RDF Vocabularies for the Semantic Web currently cover three areas: one for describing statistical classifications, one for describing data sets for the purposes of discovery on the Web, and one for describing the structures of individual data sets. These vocabularies are based on the DDI-Codebook and DDI-Lifecycle XML structures, but are identical to neither. They are designed to be used in combination, and as such are non-duplicative. Controlled Vocabularies - The DDI Controlled Vocabularies are recommended sets of terms and definitions. They are used to describe various types of activities or artefacts which exist within the DDI- Codebook, DDI-Lifecycle, and DDI-RDF Discovery structures.

Why a Model-Driven Standard? There are many different ways of implementing DDI: – As XML – As RDF – In application languages (Java, C#, etc.) – In relational databases – In other types of databases/languages (JSON, Hadoop, etc.) A model is “neutral” to any of these A model can be expressed as any of these This is current best practice for standards development – Makes the standard easier to understand and implement – Emphasizes complete documentation of the classes in the model

Content Coverage Commitment to cover everything in DDI-Lifecycle Expansion to include: – Processing model – Qualitative data – Methodology and data management planning (study inception) – Broader range of quantitative data capture activities Register data Bio-markers Event data

Library Packages and Functional View Library Package – a group of classes within the Library related by a theme. The Class Library for DDI 4 encompasses the entire DDI 4.0 model, but without any specific schemas or vocabularies for Functional Views. Classes contain primitives and extended primitives and are the building blocks used to construct the Functional Views. Functional View – a set of classes that are needed to perform a specific task. It primarily consists of a set of references to specific versions of classes. Functional Views are the method used to restrict the portions of the model that are used, and as such they function very much like DDI profiles in DDI 3.*.

Library and Functional Views DDI 4 Library Functional Views References RDF Vocabulary XML Schema

Current Review: Laying the Building Blocks The content of the first review focuses on the building blocks needed to provide the foundation for more complex coverage Primitives and Complex Data Types – A consistent means of describing multi-language string – Commonly uses constructs such as Label, Description, and Definition Standard structures – Base for Identifiable classes – How to create collections (groups of classes) – Process model

Conceptual Base of Data Concepts, Objects, and Enumeration – How concepts and objects are described – How a concept is enumerated in relationship to a specific object Conceptual Variable, Represented Variable, and Instance Variable – Moving from abstraction to application – Important for supporting longitudinal studies and intended comparability

“Simple to Complex” We start with the basics – Everyone needs to describe things for understanding – There are basic requirement needed to understand data We move to the complex – Some need to describe additional detail – Some need to prescribe activities in a machine actionable way By building up from same basic structures we recognize the commonality of data from a wide array of sources

Roadmap Out for Review Content for Release Q Agent Agent (Library Package). Agent (View), ComplexDataTypes (Library Package), Conceptual (Library Package), Core Process (Library Package), Discovery (Library Package), Discovery (View), Identification (Library Package), Primitives (Library Package), Representations (LibraryPackage), Utility (Library Package) ConceptualCore ProcessDiscovery Q ClassificationClassification (View), Simple Instrument (View), Simple Data Description (View), Data Capture (Library Package), HistoricalProcess (LibraryPackage), PrescriptiveProcess (LibraryPackage)Simple InstrumentSimple Data Description Q Methodology, Simple Codebook, Enhanced Citation, Complex Instrument, Complex Data DescriptionSimple CodebookEnhanced CitationComplex InstrumentComplex Data Description As supported Data Management PlanData Management Plan, Comparison/Harmonization, Study Inception, Data Collection and Processing, Collection ManagementComparison/HarmonizationStudy InceptionData Collection and ProcessingCollection Management

Developing Functional Views Functional Views reflect the needs of sets of related Use Cases – Basic (or Simple) Codebook – Questionnaire Design – Classification Management – Managing Organizations, Individuals, and other Agents Working groups are developed around sets of Use Cases Working groups determine what metadata is needed – Select from existing classes for inclusion – Draft new classes or extension of an existing class to meet the needs of the Use Case

Modelers Group and Technical Committee Members of the Modelers Group work with the groups designing Functional Views to assist in design and identification of appropriate classes Modelers Group is responsible for – Developing modeling rules – Ensuring smooth integration into the model as a whole – Adherence to modeling rules – Resolving issues between and within functional groups Technical Committee ensures cohesion with earlier DDI specification and prepares the materials provided by the Modelers Group for review – Manages the review process and disposition of comments

Production Process The model is a simple profile of the UML notation for formal models – It has complete documentation and definition of all classes – It shows all properties and relationships of classes Once complete, the model uses automated transformations to – Create end-user documentation – Create XML schemas – Create OWL declarations for RDF vocabularies

View Content modellers define views, but create needed new objects. Formal modellers create Packages in the library.

The DDI Modeling Approach Powerful, generic modeling “patterns” are used to ensure consistency of modeling – Process model (for historical and prescriptive models, instrument flows) – Collection model (for all types of grouping) – Customization meta-model (for all types of self- describing metadata) Generic models are refined for specific uses Everything in the model must be “real” – Absence of packaging, unlike DDI 3.*

DDI 4 XML The style of DDI XML will be very different – Less deeply nested – Only one place for any given type of “class” (XML element) to be found in any given XML schema – Very consistent due to auto-generation from the model Each Functional View is its own XML document There will be an “uber-view” – an XML schema for the entire library The XML bindings from the model are now being tested in earnest as of Q1 Review – Very likely to need some adjustments

DDI 4 RDF Vocabularies Each functional view will be packaged as an RDF vocabulary described using OWL All classes, relationships, and properties will follow the DDI 4 model exactly – Some classes/relationships/properties will be mapped to common pre-existing RDF vocabularies (FoaF, DCAT, etc.) As for the XML schemas, the binding to RDF is only now being seriously tested, and may need adjustment (as of Q1 Review)

Other Technical Goodness Standard web services interfaces for DDI applications – For generic querying and retrieval of DDI metadata in XML form New possibilities for customized metadata – Will be described and exchanged using meta-model structures (similar to SDMX and GSIM) – “Custom” metadata can be made interoperable Strong alignment of model with other relevant models/vocabularies – Generic Statistical Information Model (GSIM)

Thanks!