1
Introduction to DDI
Mogens Grosen Nielsen, Statistics Denmark; Alessio Cardacino, ISTAT
ESTP Training Course "Information standards and technologies for describing, exchanging and disseminating data and metadata", Rome, 19–22 June 2018
THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION
2
Agenda
Part 1:
- Objectives and program
- Metadata on statistical information and processes
- Vision and strategy
- Users, principles and architecture
- Introduction to DDI
Part 2, DDI use cases:
- DDI used for questionnaires
- DDI used to describe a unit dataset
- DDI used for cubes
- DDI used for editing and reporting quality
3
The Vision
Statistical information must help users in the "turbulent information-sea".
Metadata about content and quality must a) help users in their knowledge processes and b) give users precise information about our products.
International standards and standard software must enable a) cost-efficient solutions with few resources, b) sustainable long-term solutions and c) a common terminology.
4
Strategy on quality and metadata
Fulfil user needs, comply with quality requirements and increase efficiency.
Principles: a) metadata integrated into GSBPM, b) reuse of metadata, c) metadata used actively.
Standards: GSBPM, GSIM, DDI, SDMX/SIMS.
5
Reusable and active metadata
Active use and reuse of metadata requires an improved understanding of the role of:
- metadata in relation to users
- metadata in relation to production processes
- metadata terminology
6
Users of a Statistical Metadata System
7
Business Principles – Code of Practice and Quality Assurance Framework
Institutional environment: P1 professional independence, P2 mandate for data collection, P3 adequacy of resources, P4 quality commitment, P5 statistical confidentiality, P6 impartiality and objectivity.
Statistical procedures: P7 sound methodology, P8 appropriate statistical procedures, P9 non-excessive burden on respondents, P10 cost effectiveness.
Statistical results: P11 relevance, P12 accuracy and reliability, P13 timeliness and punctuality, P14 coherence and comparability, P15 accessibility and clarity.
8
Principle 7. Sound methodology
Indicator 7.1: The overall methodological framework used for European Statistics follows European and other international standards, guidelines, and good practices.
- Standard methodological document: the methodological framework and the procedures for implementing statistical processes are integrated into a standard methodological document and periodically reviewed.
- Explanation of divergence from international recommendations: divergences from existing European and international methodological recommendations are explained and justified.
9
Principle 7. Sound methodology
Indicator 7.2: Procedures are in place to ensure that standard concepts, definitions and classifications are consistently applied throughout the statistical authority.
- Concepts, definitions and classifications are defined by the statistical authority, are applied in accordance with European and/or national legislation, and are documented.
- A methodological infrastructure.
10
Principle 7. Sound methodology
Indicator 7.4: Detailed concordance exists between national classification systems and the corresponding European systems.
- Consistency of national classifications: national classifications are consistent with the corresponding European classification systems.
- Correspondence tables: correspondence tables are documented and kept up to date. Explanatory notes or comments are made available to the public.
11
Principle 10 Cost effectiveness
Indicator 10.4: Statistical authorities promote and implement standardized solutions that increase effectiveness and efficiency.
- Standardization programmes and procedures for statistical processes.
- A strategy to adopt or develop standards: there is a strategy to adopt or develop standards in various fields, e.g. quality management, process modeling, software development, software tools, project management and document management.
12
Principle 15. Accessibility and Clarity
Indicator: Statistics and the corresponding metadata are presented, and archived, in a form that facilitates proper interpretation and meaningful comparisons.
- Dissemination policy
- Consultations of users about dissemination
- Training courses for writing interpretations and press releases
- A policy for archiving statistics and metadata
13
Principle 15. Accessibility and Clarity
Indicator: Metadata are documented according to standardized metadata systems.
- Dissemination of statistical results and metadata
- Metadata linked to the statistical product
- Accordance of metadata with European standards
- Metadata independent of the format of publication
- Procedures to update and publish metadata
- Ability to clarify metadata issues
- Training courses for staff on metadata
14
Selected business principles on metadata
- Reuse: reuse metadata where possible, for statistical integration as well as for efficiency reasons
- Statistical business process model: manage metadata with a focus on the overall statistical business process model (GSBPM)
- Active metadata: metadata-driven production ensures that metadata are up to date
15
Business goals (for the on-going project)
General purpose: to support the modernization and integration of work at EU and national level through the use of GSBPM, GSIM, SDMX and DDI.
Specific objectives:
- Improve and standardise work
- Improved metadata system through the use of GSBPM, GSIM, DDI and SDMX
- Improved exchange of statistical documentation with the EU
16
Solution concept
[Diagram labels: internal metadata: metadata portal at Statistics Denmark, intranet (internal metadata portal), edit and use metadata (subject matter), edit and use metadata (customer and research service); metadata dissemination: research portal, integration in the CMS (dst.dk), integration of metadata in applications, quality reporting to Eurostat.]
17
Enterprise architecture: users, business processes, applications and technology
18
Simplified definition of statistical metadata (from SDMX glossary)
Reference metadata:
- Conceptual metadata (e.g. the definition of income)
- Methodological and processing metadata (e.g. a description of data processing)
- Quality metadata (e.g. availability)
Structural metadata:
- Metadata that act as identifiers and descriptors of the data (e.g. names of variables, datasets etc.)
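As a minimal illustration of the distinction, the sketch below (Python; the item names and the example income definition are invented for the example, not taken from SDMX or DDI) tags metadata items as either reference or structural:

```python
from dataclasses import dataclass
from enum import Enum

class MetadataKind(Enum):
    REFERENCE = "reference"      # conceptual, methodological/processing, quality
    STRUCTURAL = "structural"    # identifiers and descriptors of the data

@dataclass
class MetadataItem:
    name: str
    kind: MetadataKind
    text: str

items = [
    MetadataItem("income_definition", MetadataKind.REFERENCE,
                 "Income covers wages, salaries and transfers before tax."),
    MetadataItem("INCOME", MetadataKind.STRUCTURAL,
                 "Variable name in a unit dataset."),
]
```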
19
Information objects in GSIM
20
Selected information objects from GSIM*)
*) From Standardisation of Variables and Concept Systems in European Social Statistics
21
Selected information objects from GSIM
[Diagram of selected GSIM information objects and their relations: Concept (with a conceptual domain), Variable, Category List and Category, Population and Unit Type (e.g. Person), Value Domain, Represented Variable, Code List and Code Element, Identification Component, Measure Component, Instance Variable, Logical Record, Unit Data Structure, Unit Dataset and Register. A Variable is defined by a Concept, a Population and a Unit Type; a Represented Variable's representation is described by a Code List; Instance Variables appear as components of a Unit Data Structure that describes the Logical Records of a Unit Dataset held in a Register.]
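As a rough illustration of how these objects hang together, here is a minimal Python sketch; the class and field names follow the slide's terms, but the structure is simplified and is not the normative GSIM or DDI model:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Category:
    label: str                      # e.g. "Male"

@dataclass
class CodeItem:
    code: str                       # e.g. "1"
    category: Category

@dataclass
class CodeList:
    name: str
    codes: List[CodeItem] = field(default_factory=list)

@dataclass
class Concept:
    name: str                       # e.g. "Sex"

@dataclass
class RepresentedVariable:
    concept: Concept
    code_list: CodeList             # the value domain / representation

@dataclass
class InstanceVariable:
    name: str                       # column name in a unit dataset
    represented: RepresentedVariable
```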
22
Introduction to DDI
23
DDI: Data Documentation Initiative
What is it?
- A documentation standard, expressed as an open XML standard
- Many years of experience, including use in NSIs
Advantages:
- Common language and understanding
- Integration of concepts, variables, classifications and quality
- Suitable for both questionnaire-based and register-based statistics
- Model currently used in Australia, New Zealand, Canada etc. (together with SDMX)
- Tools available
24
Why DDI?
- Reusability in the definition of metadata
- Referenced metadata
Support for:
- metadata banks (questions, variables, code lists, concepts, ...)
- metadata-driven statistical processes
- the survey lifecycle
- statistical information discovery and documentation
- a multilingual approach to documenting metadata
25
Statistics and DDI in 60 seconds
[Diagram: a Study uses Survey Instruments made up of Questions, which are measures about Concepts and relate to Universes.]
26
Statistics and DDI in 60 seconds
[Diagram continued: Questions collect Responses with values of Categories/Codes or Numbers; Responses result in Variables, which are used for Data Files and as the Dimensions, Measures and Attributes of N-Cubes.]
27
History
- The concept of DDI and the definition of needs grew out of the data archival community
- Established in 1995
- Members: social science data archives (US, Canada, Europe) and statistical data producers (including the US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada)
- February 2003: formation of the DDI Alliance, a membership-based alliance with formalized development procedures
28
DDI-C and DDI-L: DDI has two development lines
- DDI Codebook (DDI-C)
- DDI Lifecycle (DDI-L)
Both lines will continue to be improved:
- DDI-C focuses on single-study codebook structures
- DDI-L focuses on a more inclusive lifecycle model and support for machine actionability
29
Early DDI: Characteristics of DDI-C
- Focuses on the static object of a codebook
- Designed for limited uses: end-user data discovery via the variable or high-level (bibliographic) study identification
- The only heavily structured content relates to information used to drive statistical analysis
- Coverage is focused on a single study, a single data file, simple survey and aggregate data files
- The Variable holds the majority of the information (question, categories, data typing, physical storage information, statistics)
30
Limitations of DDI-C
- Treated as an "add-on" to the data collection process
- The focus is on the data end product and end users (static)
- Limited tools for creation or exploitation
- The Variable must exist before metadata can be created
- Producers are hesitant to take up DDI creation because it is a cost and does not support their development or collection process
31
DDI-L: Designed for Modern Metadata Systems
DDI-L was designed to meet a broad set of requirements typical of modern practices for metadata management and use. These practices involve:
- centralization of metadata systems (registries, repositories)
- emphasis on reuse of metadata for consistency and quality
- leveraging metadata assets using "metadata-driven" systems and processes
32
DDI-L model (from the DDI Alliance)
33
Types of metadata in DDI-L
Metadata types:
- Concepts ("terms")
- Studies ("surveys", "collections", "data sets", "samples", "censuses", "trials", "experiments", etc.)
- Variables, at instance, represented and conceptual level ("data elements", "columns")
- Codes and categories ("classifications", "codelists")
- Universes ("populations", "samples")
- N-Cubes ("cubes", "matrices")
- Data files ("data sets", "databases")
For questionnaires:
- Survey instruments ("questionnaire", "form")
- Questions ("observations")
- Responses
34
Identification, versioning and maintainability
Identification and versioning are a prerequisite for active use and reuse of metadata.
DDI (and SDMX) follow ISO 11179: all items have a globally unique identifier composed of 1) an agency identifier, 2) an item identifier and 3) an item version.
E.g. a code list has the agency 'dk.dst', a GUID (Globally Unique Identifier) and a version.
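A small sketch of the three-part identifier (agency, item identifier, item version); the Python code and the string format below are illustrative shorthand, not the normative DDI URN syntax:

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class ItemIdentifier:
    agency: str      # e.g. "dk.dst" for Statistics Denmark
    item_id: str     # a GUID, stable across versions
    version: int     # incremented on every change

    def as_string(self) -> str:
        return f"{self.agency}:{self.item_id}:{self.version}"

codelist_id = ItemIdentifier(agency="dk.dst",
                             item_id=str(uuid.uuid4()),
                             version=1)
print(codelist_id.as_string())   # dk.dst:<guid>:1
```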
35
DDI in Colectica - at a glance
36
Use cases
- Use case study 1: how to build metadata for a simple questionnaire
- Use case study 2: how to build metadata for a unit dataset
- Use case study 3: how to build metadata and data for an aggregated dataset using an N-Cube
- Use case study 4: how metadata can be used to support work on quality reporting
37
Questions and discussion
38
Use case 1: metadata for implementing a questionnaire
Credits: input from a meeting at Eurostat in July 2014 and from presentations by Bryan Fitzpatrick and by Colectica
39
Why use metadata for questionnaires?
Define metadata once, then:
- Generate documentation: PDF, Word, HTML
- Populate CAI systems
  - Out of the box: Blaise, CASES, CSPro, REDCap, queXML
  - Custom systems: possible with add-ins
40
Using DDI Metadata for Questionnaires
DDI has metadata for questions:
- A simple question goes in a Question Item, e.g. "What is your age in years?"
- A complex question goes in a Multiple Question Item, e.g. "Did you do paid work last week? Full time or part time? How many hours?"
- A Multiple Question Item can contain Question Items or other Multiple Question Items
41
Using DDI Metadata for Questionnaires
Questions can link to one or more Concepts:
- to indicate what the question is seeking to cover, e.g. Age, Sex, Country, Income, Occupation, ...
- perhaps to qualify what is being covered, e.g. non-farm income, tertiary qualifications
42
Using DDI Metadata for Questionnaires
Questions have:
- Name: just a multi-lingual name, not used in questionnaires
- Text: the question that is asked; can be conditional, multi-lingual, formatted, and can even mix languages
- Question Intent: some elaboration of what is being sought; multi-lingual, formatted
43
Using DDI Metadata for Questionnaires
Questions have Response Domains: what sort of answer is expected or valid.
- Numeric domain: can specify integer or decimal, valid formats and ranges, etc.
- Text domain: can specify format, length
- Category domain: a valid list of multi-lingual values (not really of much use)
- Code domain: a valid list of multi-lingual values with codes, i.e. a classification
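To make the question/response-domain pairing concrete, here is an illustrative Python sketch; the class names echo the DDI terms above, but this is a simplified model, not the DDI 3.2 schema, and the question names are invented:

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class NumericDomain:
    decimal: bool = False
    low: Optional[float] = None
    high: Optional[float] = None

@dataclass
class TextDomain:
    max_length: Optional[int] = None

@dataclass
class CodeDomain:
    code_list_name: str        # reference to a Code Scheme / classification

ResponseDomain = Union[NumericDomain, TextDomain, CodeDomain]

@dataclass
class QuestionItem:
    name: str                  # internal name (multi-lingual in real DDI)
    text: str                  # the question text actually asked
    domain: ResponseDomain

# Two questions from the slides, using different response domains
age = QuestionItem("AGE", "What is your age in years?",
                   NumericDomain(decimal=False, low=0, high=120))
sex = QuestionItem("SEX", "What is your sex?", CodeDomain("SexCodes"))
```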
44
Using DDI Metadata for Questionnaires
Questions do not go directly into a questionnaire:
- DDI calls a questionnaire an Instrument
- Questions constitute a library available for use, a "Question Bank"
- Questions are selected and assembled into an Instrument
- The assembling of questions is done with Control Constructs
- An Instrument identifies a single Control Construct that builds the questionnaire
45
Control constructs
Control Constructs are the critical component in building a questionnaire:
- they select the questions
- they control the flow of the questions (branching and looping)
- they insert non-question text, e.g. "Now I want to ask you about other people in the household"
- they can compute values
- they link to Interviewer Instructions: structured DDI Interviewer Instructions or unstructured external interviewer instruction material
46
Control constructs: several types of Control Constructs
- Question Construct: selects a Question Item or Multiple Question Item
- Sequence: selects a sequence of other control constructs of any type
- If-Then-Else: defines an If condition with optional (multiple) ElseIf clauses and an optional Else clause; each condition selects a single Control Construct to include
47
Control constructs: several types of Control Constructs (continued)
- Loop, Repeat-Until, Repeat-While: e.g. to loop over people in a household
- Statement Item: inserts non-question multi-lingual text (conditional, formatted)
- Computation Item: a calculation in some language whose result is assigned to a Variable
48
Instrument
- Identifies a single Control Construct to assemble the questionnaire, probably a Sequence construct
- Instruments can have multiple Software specifications, basically just identifying the "software" used with the instrument
- Colectica can generate code for Blaise, REDCap etc.
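The sketch below shows, in the same illustrative Python style, how control constructs could assemble an instrument from a question bank; the construct types mirror those listed above, and the question names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class QuestionConstruct:
    question_name: str                 # reference into the question bank

@dataclass
class StatementItem:
    text: str                          # non-question text shown to the respondent

@dataclass
class IfThenElse:
    condition: str                     # e.g. "AGE >= 15"
    then: "Construct"
    otherwise: Optional["Construct"] = None

@dataclass
class Sequence:
    constructs: List["Construct"] = field(default_factory=list)

Construct = Union[QuestionConstruct, StatementItem, IfThenElse, Sequence]

@dataclass
class Instrument:
    name: str
    root: Sequence                     # an Instrument points at one top-level construct

survey = Instrument("DemoSurvey", Sequence([
    QuestionConstruct("AGE"),
    IfThenElse("AGE >= 15", then=QuestionConstruct("PAID_WORK_LAST_WEEK")),
    StatementItem("Now I want to ask you about other people in the household."),
]))
```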
49
Interviewer instructions
- A formal DDI metadata type
- Organised, structured instructions: formatted multi-lingual text, may be conditional
- May link to external, non-DDI material, e.g. PDF or Word documents
51
Questionnaire template
*from UNECE
52
DDI modelling in practice: study unit
*from UNECE
53
DDI questionnaire modelling in practice: resource package
*from UNECE
54
DDI questionnaire modelling in practice: module and submodule
*from UNECE
55
DDI questionnaire modelling in practice: statements
Comment Instruction *from UNECE
56
DDI questionnaire modelling in practice: statements
Help Warning *from UNECE
57
DDI modelling in practice: statements
Conditional statement *from UNECE
58
DDI modelling in practice: questions with a single response domain
*from UNECE
59
DDI modelling in practice: questions with a multiple response domain
*from UNECE
60
DDI modelling in practice: questions with a single choice
*from UNECE
61
Steps for creating and publishing a questionnaire
1. Create a check-out and go to the metadata package
2. Define concepts (e.g. Gender, Age, Education level and School type)
3. Define categories and codes (used as response domains)
4. Create questions and insert references to the response domains
5. Create an instrument and insert the defined questions in a simple sequence
6. Connect questions to concepts
7. Generate documentation (for the survey designer etc.)
8. Show in the portal
9. Publish the survey: paper form, Blaise etc.
62
Use case study 2: metadata for unit dataset
Credits: input from presentations by Colectica at the European DDI conference, Copenhagen, 2015
63
Variable cascade in GSIM, DDI and Colectica
[Diagram: the variable cascade ConceptualVariable -> RepresentedVariable -> Variable; the GSIM Instance Variable corresponds to the Variable in DDI and Colectica.]
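A short sketch of the reuse the cascade enables: one RepresentedVariable (concept plus representation) backing InstanceVariables in two different datasets. This is illustrative Python; the dataset names, column names and code list reference are invented:

```python
from dataclasses import dataclass

@dataclass
class ConceptualVariable:
    concept: str               # e.g. "Sex" of the unit type Person

@dataclass
class RepresentedVariable:
    conceptual: ConceptualVariable
    representation: str        # e.g. a reference to the code list "SexCodes"

@dataclass
class InstanceVariable:        # "Variable" in DDI / Colectica
    dataset: str               # the unit dataset the column lives in
    column_name: str
    represented: RepresentedVariable

sex = RepresentedVariable(ConceptualVariable("Sex"), "SexCodes")
lfs_sex = InstanceVariable("LFS_2018", "SEX", sex)    # reused here ...
ses_sex = InstanceVariable("SES_2018", "KOEN", sex)   # ... and here
```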
64
Selected elements from DDI
65
Logical record
- A Logical Record consists of a sequence of Variables that groups data values for a purpose
- Data from a questionnaire go into one or more Logical Records
- Logical Records can be linked, e.g. Households and Persons
- Logical Records are independent of any storage or stored format
66
Physical Instance: holds information about the actual data sets produced
- Links to Physical Structures, Record Layouts and Logical Records
- Provides central management of data from a collection
- The Physical Instance is used to manage data
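An illustrative sketch of linked Logical Records managed through a Physical Instance (Python; the file path, record layout string and variable names are invented, and the model is much simpler than DDI-L):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicalRecord:
    name: str                                             # e.g. "Household" or "Person"
    variables: List[str] = field(default_factory=list)    # ordered variable names
    linked_to: List[str] = field(default_factory=list)    # names of related records

@dataclass
class PhysicalInstance:
    file_uri: str                                         # where the data file lives
    record_layout: str                                    # e.g. "CSV, comma-delimited"
    logical_records: List[LogicalRecord] = field(default_factory=list)

household = LogicalRecord("Household", ["HH_ID", "REGION"], linked_to=["Person"])
person = LogicalRecord("Person", ["HH_ID", "PERSON_NO", "AGE", "SEX"],
                       linked_to=["Household"])
lfs_file = PhysicalInstance("file:///data/lfs_2018.csv", "CSV, comma-delimited",
                            [household, person])
```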
67
Simple classifications and code lists
DDI holds classifications as linked Code Schemes and Category Schemes:
- A Category Scheme is a list of Categories: a flat list of multi-lingual names and descriptions, e.g. country names, occupation names, etc.
- A Code Scheme selects Categories from Category Schemes, assigns a Code (not multi-lingual) to each, and may specify a hierarchy
- A Code Scheme may select Categories from multiple Category Schemes
- Multiple Code Schemes may select the same Categories
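A sketch of the Category Scheme / Code Scheme split, with a Code Scheme selecting categories and assigning codes (illustrative Python; the region labels and codes are invented, and a real DDI Code Scheme can draw on several Category Schemes):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class CategoryScheme:
    name: str
    categories: Dict[str, str]            # category id -> label (one language shown)

@dataclass
class Code:
    value: str                            # the code itself, e.g. "01"
    category_id: str                      # which Category it selects
    parent: Optional[str] = None          # optional hierarchy (parent code value)

@dataclass
class CodeScheme:
    name: str
    category_scheme: CategoryScheme       # simplified: one scheme referenced here
    codes: List[Code] = field(default_factory=list)

regions = CategoryScheme("Regions", {"cap": "Capital region", "jut": "Jutland"})
region_codes = CodeScheme("RegionCodes", regions, [
    Code("01", "cap"),
    Code("02", "jut"),
])
```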
68
GSIM compliant classification
- The GSIM classification model was drawn from the terminology in the Neuchâtel model
- In 2012, the first GSIM model including classifications was released (version 1.0)
- In December 2013, a version 1.1 update to GSIM was released
- The Neuchâtel model is now an annex to the GSIM model and is released with it
69
Codebook example
70
Codebook example
71
Study description
72
Study Description: NESSTAR Publisher
73
Study Description: NESSTAR Publisher
74
Study Description: NESSTAR Publisher
75
File Description: Variables groups
76
File Description: Variables groups - NESSTAR Publisher
77
Variable description
78
Variable description NESSTAR Publisher
79
Use case 3: DDI used for cubes
Credits: input from presentations by Colectica from European DDI conference, London, 2014
80
3 Dimensional NCube
81
2 Dimensional NCube
82
Properties of an aggregate
- Dimensions
- Measures
- Attributes
Footnotes can be appended to the aggregate, attached to the overall structure, to individual cells or to groups of cells.
83
NCubes and variables
An NCube is a re-usable definition of an aggregate structure:
- Dimensions: an ordered list of Variable references
- Measures: a list of measures for each intersection of the Dimensions, each with a Variable reference and a type (count, %, mean, etc.)
- Attributes: attributes that are applicable to the re-usable NCube definition
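An illustrative sketch of an NCube definition with dimensions, measures and cell-level footnotes (Python; the cube, variable and code values are invented, and the real DDI NCube module is considerably richer):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Measure:
    variable: str                         # Variable reference
    kind: str                             # "count", "percent", "mean", ...

@dataclass
class NCube:
    name: str
    dimensions: List[str]                 # ordered list of Variable references
    measures: List[Measure]               # one or more measures per cell
    footnotes: Dict[Tuple[str, ...], str] = field(default_factory=dict)

pop_cube = NCube(
    name="PopulationBySexAndRegion",
    dimensions=["SEX", "REGION"],
    measures=[Measure("PERSONS", "count")],
)
# Footnote attached to one cell (SEX code "2", REGION code "01")
pop_cube.footnotes[("2", "01")] = "Provisional figure."
```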
84
Use case 4: exchange of reference metadata
85
Single Integrated Metadata Structure (SIMS) and reporting formats: ESMS and ESQRS
86
Single Integrated Metadata Structure (SIMS) and reporting formats: ESMS and ESQRS
ESMS: European SDMX Metadata Structure, oriented towards users.
ESQRS: European Standard for Quality Report Structure.
87
Where do I find more information about DDI?
DDI Alliance website:
- Specification: find the user guide, the technical documentation guide, online field documentation and more (both for DDI-L and DDI-C)
- Tools: find tools by searching on purpose, DDI version and availability
- Training: find use cases, a glossary etc.
Colectica support site:
- Find information about Colectica tools, how to manage content in Colectica Designer etc.