DDI - A Metadata Standard for the Community

Mary Vardigan, ICPSR (Inter-university Consortium for Political and Social Research)
Joachim Wackerow, GESIS (Leibniz Institute for the Social Sciences)
1st North American DDI User Conference, April 2-3, 2013, Lawrence, Kansas

Today’s Presentation: “State of the DDI”
- What is DDI?
- Benefits and Principles
- Barriers and Challenges
- DDI of the Future
- How You Can Help

What is DDI?
(Diagram labels: Reality; Survey/Register; Documentation; Structured Metadata in DDI; Application using parts of DDI)

A Project of and for the Community
- Developed by a self-sustaining membership Alliance, recently reconfigured with new Bylaws
- Two major development lines: DDI Codebook and DDI Lifecycle
- Metadata for both human and machine consumption
- Additional specifications: controlled vocabularies, and RDF vocabularies for use with Linked Data

DDI Alliance members, who have a voice in shaping and developing the standards, include data archives, data producers, data distributors, research centers, national statistical institutes, and software developers.
DDI Codebook reflects the content of a traditional archival codebook and encompasses high-level study information, detailed information on variables, and support for aggregate data and basic geography.
DDI Lifecycle takes a broader approach, documenting the full life cycle of research data, with an emphasis on metadata reuse from the inception of a study through data publication and beyond.
Human consumption: descriptive metadata permit human researchers to find, understand, and assess the quality of the data.
Machine consumption: structural and administrative metadata in the DDI specification are intended to enable software processes to read, manipulate, and exchange data files.
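
To make the "machine consumption" point concrete, here is a minimal Python sketch that reads study-level and variable-level metadata from a small codebook-style XML fragment. The element names (codeBook, stdyDscr, dataDscr, var, labl) follow DDI Codebook naming, but the fragment is a simplified, hypothetical example: it omits the XML namespace and most required elements, and the survey content is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment in the style of DDI Codebook.
# Assumption: no namespace, and far fewer elements than a valid instance needs.
CODEBOOK_XML = """
<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Example Household Survey</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="AGE"><labl>Age of respondent</labl></var>
    <var name="SEX"><labl>Sex of respondent</labl></var>
  </dataDscr>
</codeBook>
"""


def study_title(xml_text):
    """Return the study title (high-level study information)."""
    root = ET.fromstring(xml_text)
    return root.findtext("stdyDscr/citation/titlStmt/titl")


def variable_labels(xml_text):
    """Map each variable name to its label (variable-level information)."""
    root = ET.fromstring(xml_text)
    return {var.get("name"): var.findtext("labl") for var in root.iter("var")}


if __name__ == "__main__":
    print(study_title(CODEBOOK_XML))
    for name, label in variable_labels(CODEBOOK_XML).items():
        print(name, "->", label)
```

The same structured file can feed a human-readable codebook and a software process such as a statistical setup script, which is the dual use described above.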

Primary Benefits
- Rich content (currently over 800 items)
- Metadata reuse across the life cycle
- Machine-actionability
- Data management and curation: metadata serve as the bridge between the data producer and the researcher, and good metadata are important for ensuring long-term access to data
- Support for longitudinal data and comparison: focus on comparison, both by design and after the fact (harmonization); Generic Longitudinal Business Process Model (GLBPM)
- Support for preservation and platform-independent software: DDI is expressed in the mark-up language XML, which is stored in the system-independent encoding standard Unicode
- Support for a global network: globally unique identifiers for major metadata items enable metadata sharing in a network, for example classifications held in a repository
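
The reuse-by-reference idea behind the last benefit can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than the DDI specification: the `ItemId` and `Repository` classes are hypothetical, and the URN layout (agency, item, version) is only loosely modeled on how versioned, agency-scoped identifiers are commonly written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemId:
    """Hypothetical globally unique identifier: agency + item + version."""
    agency: str
    item: str
    version: str

    def urn(self):
        # Assumed URN layout, for illustration only.
        return f"urn:ddi:{self.agency}:{self.item}:{self.version}"


class Repository:
    """Toy metadata repository that stores each item once, keyed by its identifier."""

    def __init__(self):
        self._items = {}

    def register(self, item_id, content):
        if item_id in self._items:
            raise ValueError(f"{item_id.urn()} is already registered")
        self._items[item_id] = content

    def resolve(self, item_id):
        return self._items[item_id]


repo = Repository()
sex_codes = ItemId("example.org", "SexCodeList", "1.0.0")
repo.register(sex_codes, {"1": "Male", "2": "Female"})

# Two survey waves reuse the same classification by reference, not by copy.
wave_1 = {"variable": "SEX", "codes_ref": sex_codes}
wave_2 = {"variable": "SEX", "codes_ref": sex_codes}

if __name__ == "__main__":
    print(sex_codes.urn())
    print(repo.resolve(wave_1["codes_ref"]) is repo.resolve(wave_2["codes_ref"]))
```

Because both waves point at one identified item, a consumer can tell that the two variables share exactly the same code list, which is what makes comparison and sharing across a network tractable.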

Metadata Reuse

Machine-Actionability

Longitudinal Data and Comparison
Ingo Barkow, William Block, Jay Greenfield, Arofan Gregory, Marcel Hebing, Larry Hoyle, Wolfgang Zenk-Möltgen. “Generic Longitudinal Business Process Model.” http://dx.doi.org/10.3886/DDILongitudinal05

Paradigm Shift
- Using structured metadata in standardized form is a paradigm shift for social science research infrastructure
- We might compare this to the change in the 1990s, when people began to understand the value of linked documents

Distributed Approach
- Shift from centralized maintenance and control of metadata by the researcher or archive to a distributed approach
- It is more efficient and easier to capture information about an event at the time of its occurrence than after the fact

Barriers and Challenges
- Complexity
  - The complexity is not a specific property of the standard but of the area that the standard describes.
  - DDI Lifecycle is intended for machine processing and thus must be approached from the perspective of a technologist.
  - There is no explicit data model; it is only implicit in the XML Schemas.
  - Documentation should address different audiences: study designer, data producer, software designer, software programmer.
- Low level of researcher buy-in
  - The primary goals are data and analyses, not metadata.
  - Metadata are often perceived as “friction” in the research process.
  - High-quality metadata in an accessible form could be the “lubrication” of the research process.
- Lack of tools
  - No easy-to-use standard software is available, comparable to Word for text processing.
  - Several tools are developed for use in only one institution.
  - Established and emerging tools: Nesstar as data dissemination portal, Colectica as metadata editor, QDDS as Blaise add-on for CAI, StatTransfer for transformation of metadata between statistical packages.
  - Software has to address the complexity of the field.
  - The market for metadata tools is small. Rapid development by big commercial software companies is unlikely.
- Changes in workflow
  - “After-the-fact” documentation requires substantial resources and typically leads to a considerable amount of information loss and sparsely documented data.
  - Typical workflow patterns need to change so that documentation is captured at the source, when the event happens.
- Access to metadata
  - Free access to metadata is currently not common practice.
  - Institutions need to create policies to provide metadata for public reuse. Appropriate licenses are required.
  - Standardized technical interfaces that give programs access to metadata repositories, such as question banks, are also required.

Measuring Success
- Adoption around the world
- Projects range from the General Social Survey in the U.S., to the Research Data Centres in Canada, to Statistics New Zealand
- Through IHSN, DDI is now being used in over 70 countries
- DDI is a centerpiece of the proposed CESSDA-ERIC (European Research Infrastructure Consortium)

Global DDI

Moving Forward: RDF Vocabularies for Semantic Web
- DDI-RDF Discovery Vocabulary: for publishing metadata about datasets into the Web of Linked Data; based on DDI Codebook and DDI Lifecycle
- XKOS: RDF vocabulary for describing statistical classifications, which is an extension of the popular SKOS (Simple Knowledge Organization System) vocabulary
- Publication expected in second half of 2013
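
As a rough illustration of what "an extension of SKOS" can look like, the Python sketch below writes a two-level classification as plain (subject, predicate, object) triples. The SKOS terms (Concept, prefLabel, broader, member) are standard. The XKOS namespace and the XKOS term names (ClassificationLevel, depth) are assumptions here and should be checked against the published vocabulary; the `http://example.org/` classification itself is made up.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Assumption: namespace and term names for XKOS; verify against the published vocabulary.
XKOS = "http://rdf-vocabulary.ddialliance.org/xkos#"
EX = "http://example.org/classification/"  # hypothetical classification

TRIPLES = [
    # Plain SKOS: two concepts in a broader/narrower hierarchy.
    (EX + "A", RDF_TYPE, SKOS + "Concept"),
    (EX + "A", SKOS + "prefLabel", "Agriculture"),
    (EX + "A01", RDF_TYPE, SKOS + "Concept"),
    (EX + "A01", SKOS + "prefLabel", "Crop production"),
    (EX + "A01", SKOS + "broader", EX + "A"),
    # Extension (assumed XKOS terms): an explicit classification level.
    (EX + "level1", RDF_TYPE, XKOS + "ClassificationLevel"),
    (EX + "level1", XKOS + "depth", 1),
    (EX + "level1", SKOS + "member", EX + "A"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


if __name__ == "__main__":
    print(objects(EX + "A01", SKOS + "broader"))
    print(objects(EX + "level1", SKOS + "member"))
```

The point of the sketch is the layering: generic SKOS tools can still read the concept hierarchy, while classification-aware tools can additionally use the level information.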

RDF Discovery Vocabulary; XKOS; SKOS (Simple Knowledge Organization System)

DDI of the Future
- Robust and persistent data model (for the metadata), with extension possibilities and a variety of technical expressions
- Complete data life cycle coverage
- Broadened focus for new research domains
- Simpler specification that is easier to understand and use, including better documentation

Why Now?
- Experiences in developing and using the initial DDI Lifecycle structure
- Lack of a data model proved limiting in:
  - Further improvement of the development line 3.*
  - Development of sustainable software based on different versions of DDI
- Pressure for changes from several directions at once:
  - New content from substantive working groups
  - Being approached by new communities

Specific Changes Envisioned I
- Abstraction of data capture/collection/source to handle different types of data
- New content on sampling, survey implementation, weighting, and paradata
- New content developed by the Qualitative Working Group
- Framework for data and metadata quality
- Framework for access to data and metadata

The current data collection module is questionnaire-centric. Other types of data include register data and data in the natural and health sciences (i.e., from technical devices or from laboratory analyses).

Specific Changes Envisioned II
- Process (work flow) description across the data life cycle, including support for automation and replication
- Integration with GSBPM/GSIM, SDMX, CDISC, Triple-S
- Disclosure review and remediation
- Data management planning
- Development of standard queries and/or interface specifications (such as REST), needed to enable interoperable services

High-level Design Goals I
Coming out of the Dagstuhl 2012 workshop:
- Interoperability and Standards: The model is optimized to facilitate interoperability with other relevant standards.
- Simplicity
- User Driven: User perspectives inform the model to ensure that it meets the needs of the international DDI user community.
- Terminology: The model uses clear terminology and, when possible, uses existing terms and definitions.
- Iterative Development: The model is developed iteratively, bringing in a range of views from the user community.
- Documentation: The model includes and is supplemented by robust and accessible documentation.
- Lifecycle Orientation: The model supports the full research data lifecycle and the statistical production process, facilitating replication and the scientific method.

High-level Design Goals II
- Reuse and Exchange: The model supports the reuse, exchange, and sharing of data and metadata within and among institutions.
- Modularity: The model is modular and these modules can be used independently.
- Stability: The model is stable and new versions are developed in a controlled manner.
- Extensibility: The model has a common core and is extensible.
- Tool Independence: The model is not dependent on any specific IT setting or tool.
- Innovation: The model supports both current and new ways of documenting, producing, and using data and leverages modern technologies.
- Actionable Metadata: The model provides actionable metadata that can be used to drive production and data collection processes.

DDI in Relation to GSIM and Other Standards

Alignment with GSIM Simplified view of GSIM Objects

DDI Development Lines
- DDI Codebook: DDI 2.1 (DTD), DDI 2.5 (XML Schema)
- DDI Lifecycle: DDI 3.1, 3.2 (XML Schema)
- DDI X (model-based), with bindings as:
  - XML Schema
  - RDF
  - Data as a Service, Service Oriented Architecture (SOA), Web services, REST
  - Recommendations for schemas for database management systems

Time Line Moving Forward
- New work initiated at Dagstuhl workshops on “Moving Forward” in Autumn 2012
- Planned to continue in October 2013
- Ongoing virtual work involving the community in progress; roadmap developed yesterday
- User stories and core being defined
- Mapping among DDI, SDMX, GSIM under way
- First draft of model expected by end of 2014

You Can Contribute!
- Please join us on a working group
- Participate in technical work, including modeling
- Join the DDI Alliance!