Introduction to DDI, Mogens Grosen Nielsen, Statistics Denmark


Introduction to DDI
Mogens Grosen Nielsen, Statistics Denmark, mgn@dst.dk
Alessio Cardacino, ISTAT, alcardac@istat.it
ESTP Training Course "Information standards and technologies for describing, exchanging and disseminating data and metadata", Rome, 19–22 June 2018
THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

Agenda
Part 1: Objectives and program; metadata on statistical information and processes; vision and strategy; users, principles and architecture; introduction to DDI.
Part 2: DDI use cases: DDI used for questionnaires; DDI used to describe a unit dataset; DDI used for cubes; DDI used for editing and quality reporting.

The Vision
Statistical information must help users in the "turbulent information sea". Metadata about content and quality must a) help users in their knowledge processes and b) give users precise information about our products. International standards and standard software must enable a) cost-efficient solutions with few resources, b) sustainable long-term solutions and c) a common terminology.

Strategy on quality and metadata
Fulfil user needs, comply with quality requirements and increase efficiency. Principles: a) metadata integrated into the GSBPM, b) reuse of metadata, c) metadata used actively. Standards: GSBPM, GSIM, DDI, SDMX/SIMS.

Reusable and active metadata
Active use and reuse of metadata requires an improved understanding of the role of metadata in relation to users, metadata in relation to production processes, and metadata terminology.

Users of a Statistical Metadata System

Business Principles – Code of Practice and Quality Assurance Framework
Institutional environment: P1 professional independence, P2 mandate for data collection, P3 adequacy of resources, P4 quality commitment, P5 statistical confidentiality, P6 impartiality and objectivity.
Statistical processes: P7 sound methodology, P8 appropriate statistical procedures, P9 non-excessive burden on respondents, P10 cost effectiveness.
Statistical output: P11 relevance, P12 accuracy and reliability, P13 timeliness and punctuality, P14 coherence and comparability, P15 accessibility and clarity.

Principle 7. Sound methodology
Indicator 7.1: The overall methodological framework used for European Statistics follows European and other international standards, guidelines and good practices. Standard methodological document: the methodological framework and the procedures for implementing statistical processes are integrated into a standard methodological document and periodically reviewed. Explanation of divergence from international recommendations: divergence from existing European and international methodological recommendations is explained and justified.

Principle 7. Sound methodology
Indicator 7.2: Procedures are in place to ensure that standard concepts, definitions and classifications are consistently applied throughout the statistical authority. Concepts, definitions and classifications are defined by the statistical authority, are applied in accordance with European and/or national legislation and are documented. A methodological infrastructure is in place.

Principle 7. Sound methodology
Indicator 7.4: Detailed concordance exists between national classification systems and the corresponding European systems. Consistency of national classifications: national classifications are consistent with the corresponding European classification systems. Correspondence tables: correspondence tables are documented and kept up to date; explanatory notes or comments are made available to the public.

Principle 10. Cost effectiveness
Indicator 10.4: Statistical authorities promote and implement standardized solutions that increase effectiveness and efficiency. Standardization programmes and procedures for statistical processes. A strategy to adopt or develop standards: there is a strategy to adopt or develop standards in various fields, e.g. quality management, process modeling, software development, software tools, project management and document management.

Principle 15. Accessibility and Clarity
Indicator 15.1: Statistics and the corresponding metadata are presented, and archived, in a form that facilitates proper interpretation and meaningful comparisons. Dissemination policy; consultations of users about dissemination; training courses for writing interpretations and press releases; a policy for archiving statistics and metadata.

Principle 15. Accessibility and Clarity
Indicator 15.5: Metadata are documented according to standardized metadata systems. Dissemination of statistical results and metadata; metadata linked to the statistical product; accordance of metadata with European standards; metadata independent of the format of publication; procedures to update and publish metadata; ability to clarify metadata issues; training courses for staff on metadata.

Selected business principles on metadata
Reuse: reuse metadata where possible, for statistical integration as well as for efficiency reasons.
Statistical business process model: manage metadata with a focus on the overall statistical business process model (GSBPM).
Active metadata: metadata-driven production ensures metadata are up to date.

Business goals (for the on-going project)
General purpose: to support the modernization and integration of work at EU and national level through the use of GSBPM, GSIM, SDMX and DDI.
Specific objectives: improve and standardise work; an improved metadata system through the use of GSBPM, GSIM, DDI and SDMX; improved exchange of statistical documentation with the EU.

Solution concept [diagram]: METADATA: metadata portal at Statistics Denmark. INTERNAL METADATA DISSEMINATION: intranet (internal metadata portal), integration of metadata in applications, quality reporting to Eurostat. METADATA DISSEMINATION: research portal, editing and use of metadata (subject matter), editing and use of metadata (customer and research service), integration in the CMS (dst.dk).

Enterprise architecture: users, business processes, applications and technology

Simplified definition of statistical metadata (from the SDMX glossary)
Reference metadata: conceptual metadata (e.g. the definition of income), methodological and processing metadata (e.g. a description of data processing), quality metadata (e.g. availability).
Structural metadata: metadata that act as identifiers and descriptors of the data (e.g. names of variables, datasets, etc.).
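As a rough illustration of this distinction, the Python sketch below shows what structural and reference metadata might look like for a hypothetical income dataset. The dataset, variable and code-list names are invented for the example; this is an illustration of the definitions above, not an SDMX or DDI structure.

```python
# Illustrative sketch only: the split between structural and reference
# metadata, following the simplified SDMX-glossary definitions above.
# All names below are invented for the example.

structural_metadata = {
    "dataset": "PERSON_INCOME_2018",                 # identifier of the dataset
    "variables": ["PERSON_ID", "SEX", "INCOME"],     # names of the columns
    "codelist_SEX": {"1": "Male", "2": "Female"},    # codes used in the data
}

reference_metadata = {
    "concept_definition": "Income: total yearly income before tax.",
    "processing": "Data are edited and imputed using register sources.",
    "quality_availability": "Published annually, 6 months after the reference year.",
}
```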

Information objects in GSIM

Selected information objects from GSIM (from "Standardisation of Variables and Concept Systems in European Social Statistics")

Selected information objects from GSIM [diagram of the relationships between CONCEPT, VARIABLE, REPRESENTED VARIABLE, INSTANCE VARIABLE, CATEGORY LIST, CATEGORY, CODE LIST, CODE ELEMENT, VALUE DOMAIN, POPULATION, UNIT TYPE, IDENTIFICATION COMPONENT, MEASURE COMPONENT, LOGICAL RECORD, UNIT DATASET, UNIT DATA STRUCTURE and REGISTER]

Introduction to DDI

DDI: Data Documentation Initiative
What is it? A documentation standard, expressed as an open XML standard, with many years of experience including use in NSIs.
Advantages: a common language and understanding; integration of concepts, variables, classifications and quality; works for both questionnaire-based and register-based statistics; the model is currently used in Australia, New Zealand, Canada, etc. (together with SDMX); tools are available.

Why DDI
Reusability in the definition of metadata; referenced metadata. Support for: metadata banks (questions, variables, code lists, concepts, ...); statistical metadata-driven processes; the survey lifecycle; statistical information discovery and documentation; a multilingual approach to documenting metadata.

Statistics and DDI in 60 seconds [diagram: a Study uses Survey Instruments made up of Questions, which are measures of Concepts about Universes]

Statistics and DDI in 60 seconds [diagram, continued, relating Questions, Responses, Categories/Codes, Numbers, Variables, Dimensions, Measures and Attributes, N-Cubes and Data Files]

History
The concept of DDI and the definition of needs grew out of the data archival community; established in 1995. Members: social science data archives (US, Canada, Europe) and statistical data producers (including the US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada). February 2003: formation of the DDI Alliance, a membership-based alliance with formalized development procedures.

DDI-C and DDI-L
DDI has two development lines: DDI Codebook (DDI-C) and DDI Lifecycle (DDI-L). Both lines will continue to be improved. DDI-C focuses just on single-study codebook structures; DDI-L focuses on a more inclusive lifecycle model and support for machine actionability.

Early DDI: characteristics of DDI-C
Focuses on the static object of a codebook. Designed for limited uses: end-user data discovery via the variable or high-level study identification (bibliographic); the only heavily structured content relates to information used to drive statistical analysis. Coverage is focused on a single study, a single data file, simple survey and aggregate data files. The variable contains the majority of the information (question, categories, data typing, physical storage information, statistics).

Limitations in DDI-C
Treated as an "add-on" to the data collection process; the focus is on the data end product and end users (static). Limited tools for creation or exploitation. The variable must exist before metadata can be created. Producers are hesitant to take up DDI creation because it is a cost and does not support their development or collection process.

DDI-L: Designed for Modern Metadata Systems
DDI-L was designed to meet a broad set of requirements typical of modern practices for metadata management and use. These practices involve: centralization of metadata systems (registries, repositories); emphasis on reuse of metadata for consistency and quality; leveraging metadata assets using "metadata-driven" systems and processes.

DDI-L model (from the DDI Alliance)

Types of metadata in DDI-L
Metadata types: Concepts ("terms"); Studies ("surveys", "collections", "data sets", "samples", "censuses", "trials", "experiments", etc.); Variables – instance, represented and conceptual ("data elements", "columns"); Codes and categories ("classifications", "codelists"); Universes ("populations", "samples"); N-Cubes ("cubes", "matrices"); Data files ("data sets", "databases"). For questionnaires: Survey instruments ("questionnaire", "form"); Questions ("observations"); Responses.

Identification, versioning and maintainability
Identification and versioning are a prerequisite for active use and reuse of metadata. DDI (and SDMX) follow ISO 11179: all items have a globally unique identifier composed of 1) an agency identifier, 2) an item identifier and 3) an item version. E.g. a code list at Statistics Denmark has agency 'dk.dst', a GUID (Globally Unique Identifier) and a version.
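As a rough sketch of this identification pattern, the Python fragment below models the agency/identifier/version triple and renders it as a URN-like reference string. The class name, the example GUID and the exact URN layout are illustrative assumptions, not the normative DDI URN syntax.

```python
from dataclasses import dataclass

# Minimal sketch of ISO 11179-style identification as used by DDI (and SDMX):
# every item carries an agency, an item identifier and a version.

@dataclass(frozen=True)
class DdiIdentifier:
    agency: str    # e.g. "dk.dst" for Statistics Denmark
    item_id: str   # a GUID for the item
    version: str   # e.g. "1.0.0"

    def urn(self) -> str:
        # Illustrative rendering of the triple as a single reference string.
        return f"urn:ddi:{self.agency}:{self.item_id}:{self.version}"

codelist_id = DdiIdentifier("dk.dst", "8e0c2a4f-1b7d-4c39-9f3e-2d5a6b7c8d90", "1.0.0")
print(codelist_id.urn())
```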

DDI in Colectica - a Glance

Use cases
Use case study 1: how to build metadata for a simple questionnaire. Use case study 2: how to build metadata for a unit dataset. Use case study 3: how to build metadata and data for an aggregated dataset using an N-Cube. Use case study 4: how metadata can be used to support work on quality reporting.

Questions and discussion

Use case 1: metadata for implementing a questionnaire
Credits: input from a meeting at Eurostat in July 2014 and parts of presentations by Bryan Fitzpatrick and by Colectica.

Why use metadata for questionnaires?
Define metadata once. Generate documentation (PDF, Word, HTML). Populate CAI systems: out of the box for Blaise, CASES, CSPro, REDCap and queXML; custom systems are possible with add-ins.

Using DDI Metadata for Questionnaires
DDI has metadata for questions. A simple question goes in a Question Item ("What is your age in years?"). A complex question goes in a Multiple Question Item ("Did you do paid work last week? Full time or part time? How many hours?"). A Multiple Question Item can contain Question Items or other Multiple Question Items.

Using DDI Metadata for Questionnaires
Questions can link to one or more Concepts to indicate what the question is seeking to cover (age, sex, country, income, occupation, ...) and perhaps to qualify what is being covered (e.g. non-farm income, tertiary qualifications).

Using DDI Metadata for Questionnaires
Questions have: a Name (just a multi-lingual name, not used in questionnaires); Text (the question that is asked; it can be conditional, multi-lingual and formatted, and can even mix languages); and a Question Intent (some elaboration of what is being sought; multi-lingual, formatted).

Using DDI Metadata for Questionnaires
Questions have Response Domains: what sort of answer is expected or valid. A Numeric domain can specify integer or decimal, valid formats and ranges, etc. A Text domain can specify format and length. A Category domain is a valid list of multi-lingual values (not really very much use). A Code domain is a valid list of multi-lingual values with codes, i.e. a classification.
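To make the question structures on the last few slides concrete, here is a minimal Python sketch of Question Items, concept links and response domains. The class and field names mirror the DDI terminology but are illustrative only; this is a conceptual model, not the DDI-Lifecycle XML schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

@dataclass
class NumericDomain:
    decimal: bool = False           # integer or decimal
    low: Optional[float] = None     # valid range, if any
    high: Optional[float] = None

@dataclass
class CodeDomain:
    codelist: Dict[str, str] = field(default_factory=dict)  # code -> label

@dataclass
class QuestionItem:
    name: str                       # internal multi-lingual name, not shown to respondents
    text: Dict[str, str]            # language -> question text
    intent: str = ""                # elaboration of what is being sought
    concepts: List[str] = field(default_factory=list)       # references to Concepts
    response_domain: Union[NumericDomain, CodeDomain, None] = None

@dataclass
class MultipleQuestionItem:
    name: str
    sub_questions: List[Union["QuestionItem", "MultipleQuestionItem"]]

# A simple question with a numeric response domain and a concept link.
age = QuestionItem(
    name="AGE",
    text={"en": "What is your age in years?"},
    concepts=["Age"],
    response_domain=NumericDomain(decimal=False, low=0, high=120),
)
```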

Using DDI Metadata for Questionnaires
Questions do not go directly into a questionnaire. DDI calls a questionnaire an Instrument. Questions constitute a library available for use, a "question bank". Questions are selected and assembled into an Instrument; the assembling of questions is done with Control Constructs. An Instrument identifies a single Control Construct that builds the questionnaire.

Control constructs
Control Constructs are the critical component in building a questionnaire: they select the questions; they control the flow of the questions (branching and looping); they insert non-question text ("Now I want to ask you about other people in the household"); they can compute values; and they link to Interviewer Instructions, either structured DDI Interviewer Instructions or unstructured external interviewer-instruction material.

Control constructs
There are several types of Control Constructs. A Question Construct selects a Question Item or Multiple Question Item. A Sequence selects a sequence of other control constructs of any type. An If-Then-Else defines an If condition with optional (multiple) ElseIf clauses and an optional Else clause; each condition selects a single Control Construct to include.

Control constructs
Further types of Control Constructs: Loop, Repeat-Until and Repeat-While (e.g. to loop over the people in a household); Statement Item, which inserts non-question, multi-lingual text (conditional, formatted); and Computation Item, a calculation in some language whose result is assigned to a Variable.

Instrument
An Instrument identifies a single Control Construct to assemble the questionnaire, probably a Sequence construct. Instruments can have multiple software specifications, basically just identifying the "software" used with the instrument. Colectica can generate code for Blaise, REDCap, etc.
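A minimal Python sketch of how Control Constructs might assemble questions from the question bank into an Instrument, following the slides above. The class names follow the DDI terminology, but the field names, the condition syntax and the example questionnaire are illustrative assumptions, not the DDI XML schema or a Colectica API.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class QuestionConstruct:
    question_name: str              # reference to a QuestionItem in the question bank

@dataclass
class StatementItem:
    text: str                       # non-question text shown to the respondent

@dataclass
class IfThenElse:
    condition: str                  # illustrative condition syntax, e.g. "AGE >= 15"
    then: "ControlConstruct"
    otherwise: "ControlConstruct | None" = None

@dataclass
class Sequence:
    constructs: List["ControlConstruct"] = field(default_factory=list)

ControlConstruct = Union[QuestionConstruct, StatementItem, IfThenElse, Sequence]

@dataclass
class Instrument:
    name: str
    root: Sequence                  # the single top-level Control Construct

# A tiny, hypothetical questionnaire assembled from constructs.
instrument = Instrument(
    name="LabourForceSurvey",
    root=Sequence([
        StatementItem("Now I want to ask you about other people in the household"),
        QuestionConstruct("AGE"),
        IfThenElse("AGE >= 15", then=QuestionConstruct("PAID_WORK_LAST_WEEK")),
    ]),
)
```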

Interviewer instructions
A formal DDI metadata type: organised, structured instructions (formatted, multi-lingual text; may be conditional). May link to external, non-DDI material, e.g. PDF or Word documents.

Questionnaire template (from UNECE)

DDI modelling in practice: study unit (from UNECE)

DDI questionnaire modelling in practice: resource package (from UNECE)

DDI questionnaire modelling in practice: module and submodule (from UNECE)

DDI questionnaire modelling in practice: statements (Comment, Instruction) (from UNECE)

DDI questionnaire modelling in practice: statements (Help, Warning) (from UNECE)

DDI modelling in practice: statements (conditional statement) (from UNECE)

DDI modelling in practice: questions with a single response domain (from UNECE)

DDI modelling in practice: questions with a multiple response domain (from UNECE)

DDI modelling in practice: questions with a single choice (from UNECE)

Steps for creating and publishing a questionnaire
1) Create a check-out and go to the metadata package. 2) Define concepts (e.g. gender, age, education level and school type). 3) Define categories and codes (used as response domains). 4) Create questions and insert references to the response domains. 5) Create the instrument and insert the defined questions in a simple sequence. 6) Connect the questions to concepts. 7) Generate documentation (for the survey designer, etc.). 8) Show in the portal. 9) Publish the survey: paper form, Blaise, etc.

Use case study 2: metadata for a unit dataset
Credits: input from presentations by Colectica at the European DDI conference, Copenhagen, 2015.

Variable cascade in GSIM, DDI and Colectica [diagram: ConceptualVariable, RepresentedVariable, Variable]
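A minimal Python sketch of the variable cascade, under the assumption (from the GSIM terminology) that the final level is the instance variable, which DDI and Colectica simply call Variable. Field names and example values are illustrative only.

```python
from dataclasses import dataclass

# Sketch of the cascade: a ConceptualVariable is refined into a
# RepresentedVariable (adding a value domain) and then into an
# InstanceVariable (adding a concrete dataset column).

@dataclass
class ConceptualVariable:
    concept: str          # e.g. "Income"
    unit_type: str        # e.g. "Person"

@dataclass
class RepresentedVariable:
    conceptual: ConceptualVariable
    value_domain: str     # e.g. "Non-negative integer, DKK"

@dataclass
class InstanceVariable:   # called simply "Variable" in DDI/Colectica
    represented: RepresentedVariable
    dataset: str          # the concrete data file or register it appears in
    column_name: str      # e.g. "INCOME_2018"

income_cv = ConceptualVariable("Income", "Person")
income_rv = RepresentedVariable(income_cv, "Non-negative integer, DKK")
income_iv = InstanceVariable(income_rv, "PERSON_INCOME_2018", "INCOME_2018")
```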

Selected elements from DDI

Logical record
A Logical Record consists of a sequence of Variables that groups data values for a purpose. Data from a questionnaire go into one or more Logical Records. Logical Records can be linked, e.g. Households and Persons. Logical Records are independent of any storage or stored format.

Physical Instance
A Physical Instance holds information about the actual data sets produced; it links to Physical Structures, Record Layouts and Logical Records and provides central management of data from a collection. The Physical Instance is used to manage data.
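The sketch below (Python, with illustrative field names and an invented file path) shows the split described on these two slides: Logical Records as ordered lists of variable references, independent of storage, and a Physical Instance pointing at the file that actually holds the data.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LogicalRecord:
    name: str
    variables: List[str]                  # ordered variable references, storage-independent

@dataclass
class PhysicalInstance:
    file_uri: str                         # where the data set actually lives
    logical_records: List[LogicalRecord]  # the records stored in that file

households = LogicalRecord("Household", ["HH_ID", "REGION", "HH_SIZE"])
persons = LogicalRecord("Person", ["PERSON_ID", "HH_ID", "SEX", "AGE"])  # linked via HH_ID

survey_data = PhysicalInstance(
    file_uri="file:///data/household_survey_2018.csv",
    logical_records=[households, persons],
)
```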

Simple classifications and code lists
DDI holds classifications as linked Code Schemes and Category Schemes. A Category Scheme is a flat list of Categories with multi-lingual names and descriptions (e.g. country names, occupation names). A Code Scheme selects Categories from Category Schemes, assigns a Code (not multi-lingual) to each, and may specify a hierarchy. A Code Scheme may select Categories from multiple Category Schemes, and multiple Code Schemes may select the same Categories.
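A minimal Python sketch of the linked Category Scheme / Code Scheme model just described: a flat, multi-lingual list of categories, plus a code scheme that selects categories and assigns non-translated codes. The class names follow the slide; the country example and code values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Category:
    name: Dict[str, str]                 # language -> label (multi-lingual)

@dataclass
class CategoryScheme:
    categories: List[Category]           # flat list of categories

@dataclass
class CodeScheme:
    codes: Dict[str, Category]           # code (not multi-lingual) -> selected category

countries = CategoryScheme([
    Category({"en": "Denmark", "da": "Danmark"}),
    Category({"en": "Italy", "da": "Italien"}),
])

# One possible code scheme over the same categories; another code scheme
# could select the same categories with different codes.
iso_codes = CodeScheme({
    "DK": countries.categories[0],
    "IT": countries.categories[1],
})
```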

GSIM-compliant classification
The GSIM classification model was drawn from the terminology of the Neuchâtel model. In 2012 the first GSIM model including classifications was released (version 1.0); in December 2013 a version 1.1 update to GSIM was released. The Neuchâtel model is now an annex to the GSIM model and is released with it.

Codebook example

Codebook example

Study description

Study Description: NESSTAR Publisher

Study Description: NESSTAR Publisher

Study Description: NESSTAR Publisher

File Description: Variable groups

File Description: Variable groups - NESSTAR Publisher

Variable description

Variable description NESSTAR Publisher

Use case 3: DDI used for cubes
Credits: input from presentations by Colectica at the European DDI conference, London, 2014.

3 Dimensional NCube

2 Dimensional NCube

Properties of an aggregate
Dimensions, Measures and Attributes. Footnotes can be appended to the aggregate and attached to the overall structure, to individual cells or to groups of cells.

NCubes and variables
An NCube is a reusable definition of an aggregate structure. Dimensions: an ordered list of Variable references. Measures: a list of measures for each intersection of the Dimensions, each with a Variable reference and a type (count, %, mean, etc.). Attributes: attributes that are applicable to the reusable NCube definition.
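A minimal Python sketch of the NCube structure described above: an ordered list of dimension variables, a list of measures, and attributes (such as footnotes) attached to the whole cube or to individual cells. Names, types and the example content are illustrative assumptions, not the DDI schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class NCube:
    dimensions: List[str]                          # ordered variable references
    measures: List[str]                            # e.g. ["count", "mean_income"]
    cube_attributes: Dict[str, str] = field(default_factory=dict)        # whole-cube footnotes
    cell_attributes: Dict[Tuple[str, ...], str] = field(default_factory=dict)  # per-cell footnotes

# A hypothetical 3-dimensional population cube.
population_cube = NCube(
    dimensions=["SEX", "AGE_GROUP", "REGION"],
    measures=["count"],
    cube_attributes={"footnote": "Preliminary figures."},
    cell_attributes={("2", "65+", "DK01"): "Cell suppressed for confidentiality."},
)
```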

Use case 4: exchange of reference metadata

Single Integrated Metadata Structure (SIMS) and reporting formats: ESMS and ESQRS

Single Integrated Metadata Structure (SIMS) and reporting formats: ESMS and ESQRS
ESMS: European SDMX Metadata Structure, oriented towards users. ESQRS: European Standard for Quality Report Structure.

Where do I find more information about DDI?
DDI Alliance (www.ddialliance.org): Specification (user guide, technical documentation guide, online field documentation and more, for both DDI-L and DDI-C); Tools (search for tools by purpose, DDI version and availability); Training (use cases, glossary, etc.).
Colectica support (www.colectica.com/support): information about Colectica tools, how to manage content in Colectica Designer, etc.