Wendy Thomas November 28-29, 2012


DDI TRAINING Workshop Wendy Thomas November 28-29, 2012

Overview of Workshop – Day 1 DDI Use Cases Identification, Versioning, and Referencing Modules (structural overview) Questionnaire content and layout Concepts, Variables, Logical Record, Physical Store Use of DDI within a research process Use of DDI within an archival/management system

Overview of Workshop – Day 2 SND Issue areas / information and discussion Geography and DDI DDI 3.1 changes and the future of DDI-L Tools and resources The status of DDI-RDF

Credits Unspecified slides - Wendy Thomas (MPC) DDI in 60 Seconds – Arofan Gregory, (ODaF) OAIS diagram - Herve L’Hours (UKDA) Remainder of Slides (source indicator in upper left): The slides were developed for several DDI workshops at IASSIST conferences and at GESIS training in Dagstuhl/Germany Major contributors Wendy Thomas, Minnesota Population Center Arofan Gregory, Open Data Foundation Further contributors Joachim Wackerow, GESIS – Leibniz Institute for the Social Sciences Pascal Heus, Open Data Foundation Attribute: http://creativecommons.org/licenses/by-sa/3.0/legalcode

S01 License Details on next slide.

S01 License (cont.) Ask about doing the XML tutorial at this point and then fit into today’s schedule if needed On-line available at: http://creativecommons.org/licenses/by-sa/3.0/ This is a human-readable summary of the Legal Code at: http://creativecommons.org/licenses/by-sa/3.0/legalcode

S03 DDI-L Lifecycle Model Metadata Reuse

Learn DDI-L in 60 Seconds

A Study uses Survey Instruments, which are made up of Questions, and measures Concepts about Universes. Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

Questions collect Responses, resulting in Data Files made up of Variables with values of Categories/Codes or Numbers.

That’s Pretty Much It.

Studies Concepts Variables Categories Concepts Codes Summary Statistics Physical Location

DDI-L Use Cases S06 Learning DDI: Pack S06 Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

Archival Ingestion and Metadata Value-Add This use case concerns how DDI 3 can support the ingest and migration functions of data archives and data libraries.

Supports automation of processing if good DDI metadata is captured upstream. Provides a neutral format for data migration as analysis packages are versioned. Can package data and metadata for preservation purposes – populate other standard formats. Provides good format & foundation for value-added metadata by archive. Diagram labels: <DDI 3> [Full metadata set] (?) + Microdata/Aggregates; Data Archive / Data Library; Ingest Processing; <DDI 3> [Full or additional metadata]; Archival events; Preservation Systems; Dissemination Systems

S06 <g:LocalHoldingPackage> <s:StudyUnit> with full content OR <g:Group> <s:StudyUnit> new value added content <a:Archive> <a:LifeCycleEvents> capture ingest processing events

Data Dissemination/Data Discovery This use case concerns how DDI-L can support the discovery and dissemination of data.

Rich metadata supports auto-generation of websites, packages of specific, related materials, and other delivery formats and applications. Can add archival events metadata. Diagram labels: <DDI-L> [Full metadata set] + Microdata/Aggregates; Codebooks; Websites; Databases, repositories; Research Data Centers; Data-Specific Info Access Systems; Registries; Catalogues; Question/Concept/Variable Banks

Store as separate resources: <c:ConceptScheme> <c:UniverseScheme> <c:GeographicStructureScheme> <c:GeographicLocationScheme> <d:QuestionScheme> <d:ControlConstructScheme> <l:VariableScheme> <l:CategoryScheme> <l:CodeScheme> <p:PhysicalStructureScheme> <p:RecordLayoutScheme> <a:OrganizationScheme> <s:StudyUnit> [descriptive content]. Use content to feed a different registry structure.

Question/Concept/Variable Banks This use case describes how DDI 3 can support question, concept, and variable banks. These are often termed “registries” or “metadata repositories” because they contain only metadata – links to the data are optional, but provide implied comparability. The focus is metadata reuse.

Because DDI has links, each type of bank functions in a modular, complementary way. Question Bank: <DDI 3> Questions, Flow Logic, Codings. Variable Bank: <DDI 3> Variables, Categories, Codes. Concept Bank: <DDI 3> Concepts. Each bank is connected to Users and Applications. Supports but does not require ISO 11179.

<g:ResourcePackage> Question Bank <d:QuestionScheme> <d:ControlConstructScheme> Variable Bank <l:CategoryScheme> <l:CodeScheme> <l:VariableScheme> Concept Bank <c:ConceptScheme>

Questionnaire Generation, Data Collection, and Processing This use case concerns how DDI 3 can support the creation of various types of questionnaires/CAI, and the collection and processing of raw data into microdata.

Types of Metadata: Concepts (conceptual module); Universe (conceptual module); Questions (datacollection module); Flow Logic (datacollection module); Variables (logicalproduct module); Categories/Codes (logicalproduct module); Coding (datacollection module). DDI captures the content – XML allows for each application to do its own presentation. Diagram labels: Paper Questionnaire; Online Survey Instrument; Final CAI Instrument; Raw Data; Microdata; <DDI 3> Concepts, Universes, Questions, Flow Logic; <DDI 3> Variables, Coding; <DDI 3> Categories, Codes; Physical Data Product; Physical Instance

S06 studyunit.xsd conceptualcomponent.xsd datacollection.xsd logicalproduct.xsd physicaldataproduct.xsd physicalinstance.xsd Previous structure PLUS <l:LogicalProduct> <l:DataRelationship> <l:VariableScheme> <p:PhysicalDataProduct> <p:PhysicalStructureScheme> <p:RecordLayoutScheme> <pi:PhysicalInstance>

DDI For Use within a Research Project This use case concerns how DDI-L can support various functions within a research project, from the conception of the study through collection and publication of the resulting data.

Diagram labels: Principal Investigator, Research Staff, Collaborators; <DDI-L> Variables, Physical Stores; <DDI-L> Questions, Instrument; <DDI-L> Concepts, Universe, Methods, Purpose, People/Orgs; <DDI-L> Funding, Revisions; <DDI-L> Data Collection, Data Processing; funding ($ € £); Data Archive/Repository; Submitted Proposal; Publication; Presentations

Version 1.0.0 Preparing the proposal for funding <s:StudyUnit> <s:Abstract> <s:Purpose> <r:FundingInformation> <c:ConceptualComponents> <c:Concepts> <c:Universe> <d:DataCollection> <d:Methodology> <d:QuestionScheme> <d:ControlConstructScheme> <l:LogicalProduct> <l:DataRelationship> <l:CategoryScheme> <l:CodeScheme> <l:VariableScheme> <p:PhysicalDataProduct> <pi:PhysicalInstance> <a:Archive> <a:OrganizationScheme>

Version 1.0.0 Preparing the proposal for funding Version 1.1.0 Entering funding information and revising/versioning earlier content <s:StudyUnit> <s:Abstract> <s:Purpose> <r:FundingInformation> <c:ConceptualComponents> <c:Concepts> <c:Universe> <d:DataCollection> <d:Methodology> <d:QuestionScheme> <d:ControlConstructScheme> <l:LogicalProduct> <l:DataRelationship> <l:CategoryScheme> <l:CodeScheme> <l:VariableScheme> <p:PhysicalDataProduct> <pi:PhysicalInstance> <a:Archive> <a:OrganizationScheme>

Version 1.0.0 Preparing the proposal for funding Version 1.1.0 Entering funding information and revising/versioning earlier content Version 2.0.0 Preparing for data collection <s:StudyUnit> <s:Abstract> <s:Purpose> <r:FundingInformation> <c:ConceptualComponents> <c:Concepts> <c:Universe> <d:DataCollection> <d:Methodology> <d:QuestionScheme> <d:ControlConstructScheme> <l:LogicalProduct> <l:DataRelationship> <l:CategoryScheme> <l:CodeScheme> <l:VariableScheme> <p:PhysicalDataProduct> <pi:PhysicalInstance> <a:Archive> <a:OrganizationScheme>

Version 1.0.0 Preparing the proposal for funding Version 1.1.0 Entering funding information and revising/versioning earlier content Version 2.0.0 Preparing for data collection Version 3.0.0 Completing the study and preparing the data <s:StudyUnit> <s:Abstract> <s:Purpose> <r:FundingInformation> <c:ConceptualComponents> <c:Concepts> <c:Universe> <d:DataCollection> <d:Methodology> <d:QuestionScheme> <d:ControlConstructScheme> <l:LogicalProduct> <l:DataRelationship> <l:CategoryScheme> <l:CodeScheme> <l:VariableScheme> <p:PhysicalDataProduct> <pi:PhysicalInstance> <a:Archive> <a:OrganizationScheme>

Metadata Mining for Comparison, etc. This use case concerns how collections of DDI-L metadata can act as a resource to be explored, providing further insight into the comparability and other features of a collection of data to help researchers identify data sets for re-use.

Types of Metadata: Universe (conceptualcomponent module), Concept (conceptualcomponent module), Question (datacollection module), Variable (logicalproduct module). Diagram labels: Questions, Variable, Metadata Repositories/Registries, Concepts, Universe, <DDI-L> Instances, <DDI-L> Comparison, Questions, Categories, Codes, Variables, Universe, Concepts, Recodes, Harmonizations, Data Sets

Register/Administrative Data This use case concerns how DDI-L can support the retrieval, organization, presentation, and dissemination of register data.

Generation Instruction (data collection module) Lifecycle Events (Archive module) Other Data Collection Register/ Administrative Data Store Query/ Request Processing (Data Collection module) Register Admin. Data File Variables, Categories, Codes, Concepts, Etc. Integrated Data Set Comparison/mapping (Comparison module) [Lifecycle continues normally]

<cm:Comparison> <s:StudyUnitReference> <s:StudyUnit> <g:Group> <cm:Comparison> <s:StudyUnitReference> <s:StudyUnit> <d:DataCollection> <d:Methodology> <d:ProcessEvent> <l:LogicalProduct> <l:DataRelationship> <l:VariableScheme> <p:PhysicalDataProduct> <pi:PhysicalInstance> Emphasis is on the process of collection. May include NCube Logical Product. If data is obtained from multiple studies, Group and comparison may be used.

Implementing GSBPM Content This use case concerns the use of DDI as an underlying model within GSBPM and how DDI can be used to implement the model

The Generic Statistical Business Process Model (GSBPM). The METIS group is a part of UN/ECE which addresses metadata issues for national statistical agencies (and other producers of official statistics). This community uses both SDMX and DDI. They have produced a reference model of the statistical production process. The DDI 3 Lifecycle Model was a major input. GSBPM has a much greater level of detail.

S20

Getting into the details Some technical basics Identification, Versioning, and Reference Overall structures for organizing and packaging metadata Modules and Schemes Data capture Questionnaire structure Data description and storage Concepts, Variables, Records, Data files (physical stores) View from the bottom up

Identification, Versioning and Reference

S08 Rationale Because several organizations are involved in the creation of a set of metadata throughout the lifecycle flow: rules for maintenance, versioning, and identification must be universal; reference to other organizations’ metadata is necessary for re-use – and very common.

S08 Maintenance Rules A maintenance agency is identified by a reserved code based on its domain name (similar to its website and e-mail). There is a register of DDI agency identifiers which we will look at later in the course. Maintenance agencies own the objects they maintain. Only they are allowed to change or version the objects. Other organizations may reference external items in their own schemes, but may not change those items. You can make a copy which you change and maintain, but once you do that, you own it!

S08 Versioning Rules If a “published” object changes in any way, its version changes This will change the version of any containing maintainable object Typically, objects grow and are versioned as they move through the lifecycle Versionables inherit their agency from the maintainable object they live in at the time of origin

Versioning: Changes
ConceptScheme X v 1.0.0: Concept A v 1.0.0, Concept B v 1.0.0, Concept C v 1.0.0
ConceptScheme X v 1.1.0: Concept A v 1.1.0, Concept B v 1.0.0, Concept C v 1.1.0; Add: Concept D v 1.0.0
ConceptScheme X v 2.0.0: Concept A v 1.2.0, Concept B v 1.0.0, Concept C v 1.2.0, Concept D v 1.1.0; Add: Concept E v 1.0.0
ConceptScheme X v 3.0.0: Concept D v 1.1.0, Concept E v 1.0.0
The diagram connects these items with links labelled "references".
Note: You can also reference entire schemes and make additions
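The versioning rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Versioned`, `Scheme`, and `bump_minor` are hypothetical and not part of DDI, and bumping the minor number for every change is an assumed policy. The slide shows both minor (1.0.0 to 1.1.0) and major (1.1.0 to 2.0.0) scheme changes without stating how to choose between them.

```python
from dataclasses import dataclass, field


def bump_minor(version: str) -> str:
    """Return the version with its minor number incremented (assumed policy)."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


@dataclass
class Versioned:
    """A versionable item such as a Concept (hypothetical model)."""
    id: str
    version: str = "1.0.0"


@dataclass
class Scheme:
    """A maintainable container such as a ConceptScheme (hypothetical model)."""
    id: str
    version: str = "1.0.0"
    items: dict = field(default_factory=dict)

    def add(self, item: Versioned) -> None:
        # Adding content changes the published scheme, so its version changes.
        self.items[item.id] = item
        self.version = bump_minor(self.version)

    def revise(self, item_id: str) -> None:
        # A change to a contained item versions the item AND its container.
        item = self.items[item_id]
        item.version = bump_minor(item.version)
        self.version = bump_minor(self.version)


scheme = Scheme("X", items={name: Versioned(name) for name in "ABC"})
scheme.revise("A")
print(scheme.version, scheme.items["A"].version, scheme.items["B"].version)
```

Here Concept B keeps its version because it did not change. In practice several changes are usually published together as one new scheme version, as on the slide, rather than one bump per change.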

S08 Identifiable Rules Identifiers are assigned to each identifiable object, and are unique within their maintainable parent Identifiable objects inherit their version from their containing versionable parent (if any) at their time of origin Identifiable objects inherit their maintaining agency from the maintainable object they live in at the time of origin

Maintainable, Versionable, and Identifiable DDI 3 places an emphasis on re-use. This creates lots of inclusion by reference! This raises the issue of managing change over time. The Maintainable, Versionable, and Identifiable scheme in DDI was created to help deal with these issues. An identifiable object is something which can be referenced, because it has an ID. A versionable object is something which can be referenced, and which can change over time – it is assigned a version number. A maintainable object is something which is maintained by a specified agency, and which is versionable and can be referenced – it is given a maintenance agency.

Basic Element Types Differences from DDI 1/2 --Every element is NOT identifiable --Many individual elements or complex elements may be versioned --A number of complex elements can be separately maintained

S08 DDI 3.1 Identifiers There are two ways to provide identification for a DDI 3 object: using a set of XML fields, or using a specially-structured URN. The structured URN approach is preferred. URNs are a very common way of assigning a universal, public identifier to information on the Internet. However, they require explicit statement of agency, version, and ID information in DDI 3. Providing element fields in DDI 3 allows for much information to be defaulted: agency can be inherited from parent element; version can be inherited or defaulted to “1.0.0”.
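The defaulting that element fields allow might look like the following minimal sketch. The `Element` class and both helper functions are hypothetical names introduced for illustration. The sketch models agency inheritance from the parent and the "1.0.0" version default; it does not model version inheritance, which the slide also mentions.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_VERSION = "1.0.0"  # default stated on the slide


@dataclass
class Element:
    """Hypothetical stand-in for a DDI element identified by XML fields."""
    id: str
    agency: Optional[str] = None
    version: Optional[str] = None
    parent: Optional["Element"] = None


def effective_agency(element: Element) -> str:
    """Walk up the parents until an explicitly stated agency is found."""
    current: Optional[Element] = element
    while current is not None:
        if current.agency is not None:
            return current.agency
        current = current.parent
    raise ValueError(f"no agency stated for {element.id} or any parent")


def effective_version(element: Element) -> str:
    """Use the stated version, else fall back to the default."""
    return element.version if element.version is not None else DEFAULT_VERSION


scheme = Element("VarSch01", agency="us.mpc", version="1.4.0")
variable = Element("V1", parent=scheme)  # states neither agency nor version
print(effective_agency(variable), effective_version(variable))
```

A URN cannot rely on this kind of lookup, which is why it has to spell out agency, ID, and version for every level.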

Parts of the Identification Series
Identifiable Element (example: Variable)
Identifier:
ID: V1
Identifying Agency: us.mpc
Version: 1.1.0 [default is 1.0.0]
Version Date: 2007-02-10
Version Responsibility: Wendy Thomas
Version Rationale: Spelling correction
UserID
Object Source

S08 URN Detailed Example
urn="urn:ddi:us.mpc:VariableScheme.VarSch01.1.4.0:Variable.V1.1.1.0"
urn: This is a URN
ddi: From DDI
us.mpc: The scheme agency is us.mpc
VariableScheme: In a variable scheme
VarSch01: With identifier VarSch01
1.4.0: Version 1.4.0
Variable: For a variable
V1: Variable ID is V1
1.1.0: Version 1.1.0
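The parts of the example URN can be pulled apart mechanically. The following is a rough sketch, not a reference parser: `DdiUrn` and `parse_ddi_urn` are hypothetical names, and the pattern assumes the two-level shape shown on the slide (a maintainable scheme followed by one element), three-part versions, and IDs that contain no dots or colons.

```python
import re
from typing import NamedTuple


class DdiUrn(NamedTuple):
    agency: str
    scheme_type: str
    scheme_id: str
    scheme_version: str
    element_type: str
    element_id: str
    element_version: str


# <Type>.<ID>.<major>.<minor>.<patch>; assumes IDs contain no dots or colons.
_PART = r"([A-Za-z]+)\.([^.:]+)\.(\d+\.\d+\.\d+)"
_URN = re.compile(rf"^urn:ddi:([^:]+):{_PART}:{_PART}$")


def parse_ddi_urn(urn: str) -> DdiUrn:
    """Split a slide-style DDI 3.1 URN into its seven components."""
    match = _URN.match(urn)
    if match is None:
        raise ValueError(f"not a recognised DDI 3.1 element URN: {urn!r}")
    return DdiUrn(*match.groups())


parsed = parse_ddi_urn(
    "urn:ddi:us.mpc:VariableScheme.VarSch01.1.4.0:Variable.V1.1.1.0"
)
print(parsed)
```

Splitting on colons first matters because the agency (`us.mpc`) itself contains a dot.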

Referencing When referencing an object, you must provide: The maintenance agency The identifier The version Often, these are inherited from a maintainable object This is part of their identification

S08 DDI References References in DDI may be within a single instance or across instances. Metadata can be re-packaged into many different groups and instances. “Internal” references are made to objects in the same instance. “External” references are made to objects in other DDI instances. Identifiers must provide: the containing maintainable (a module or a scheme): agency, ID, and version; the identifiable/versionable object ID (and version if versionable). Like identifiers, DDI references may be made using URNs or element fields.

Reference Examples Internal
<VariableReference isReference="true" isExternal="false" lateBound="false">
  <Scheme isReference="true" isExternal="false" lateBound="false">
    <ID>VarSch01</ID>
    <IdentifyingAgency>us.mpc</IdentifyingAgency>
    <Version>1.4.0</Version>
  </Scheme>
  <ID>V1</ID>
  <Version>1.1.0</Version>
</VariableReference>

S08 Reference Examples External
<VariableReference isReference="true" isExternal="true" lateBound="false">
  <urn>urn:ddi:us.mpc:VariableScheme.VarSch01.1.4.0:Variable.V1.1.1.0</urn>
</VariableReference>
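A field-based reference like the internal example can be generated with Python's standard `xml.etree.ElementTree`. This is a sketch only: `build_reference` is a hypothetical helper, and it omits namespaces just as the slide does, so its output is not a schema-valid DDI fragment.

```python
import xml.etree.ElementTree as ET


def build_reference(name, scheme_id, agency, scheme_version, item_id, item_version):
    """Build a field-based internal reference shaped like the slide example."""
    flags = {"isReference": "true", "isExternal": "false", "lateBound": "false"}
    ref = ET.Element(name, flags)
    # The containing maintainable: ID, agency, and version.
    scheme = ET.SubElement(ref, "Scheme", flags)
    ET.SubElement(scheme, "ID").text = scheme_id
    ET.SubElement(scheme, "IdentifyingAgency").text = agency
    ET.SubElement(scheme, "Version").text = scheme_version
    # The referenced object itself: ID and version.
    ET.SubElement(ref, "ID").text = item_id
    ET.SubElement(ref, "Version").text = item_version
    return ref


ref = build_reference("VariableReference", "VarSch01", "us.mpc", "1.4.0", "V1", "1.1.0")
print(ET.tostring(ref, encoding="unicode"))
```

The same six values are exactly what the external URN form encodes in a single string.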

DDI XML Schemas and Main Structures Learning DDI: Pack S09 Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

DDI-L Main Structures and Concepts XML Schemas DDI Modules DDI Schemes DDI Profiles A Simple Example

XML Schemas, DDI Modules, and DDI Schemes <file>.xsd XML Schemas DDI Modules May Correspond DDI Schemes May Contain Correspond to a stage in the lifecycle

XML Schemas: archive, comparative, conceptualcomponent, datacollection, dataset, dcelements, DDIprofile, ddi-xhtml11, ddi-xhtml11-model-1, ddi-xhtml11-modules-1, group, inline_ncube_recordlayout, instance, logicalproduct, ncube_recordlayout, physicaldataproduct, physicalinstance, proprietary_record_layout, reusable, simpledc20021212, studyunit, tabular_ncube_recordlayout, xml. The ddi-xhtml11 schemas are a set of XML schemas to support XHTML.

Reminder: DDI Modules and Schemes DDI has two important structures: “Modules” “Schemes” A module is a package of metadata corresponding to a stage of the lifecycle or a specific structural function A scheme is a list of reusable metadata items of a specific type Many DDI modules contain DDI schemes

XML Schemas, DDI Modules, and DDI Schemes Instance Study Unit Physical Instance DDI Profile Comparative Data Collection Logical Product Physical Data Structure Archive Conceptual Component Reusable Ncube Inline ncube Tabular ncube Proprietary Dataset

XML Schemas, DDI Modules, and DDI Schemes Instance Study Unit Physical Instance DDI Profile Comparative Data Collection Question Scheme Control Construct Scheme Interviewer Instruction Scheme Logical Product Category Scheme Code Scheme Variable Scheme NCube Scheme Physical Data Structure Physical Structure Scheme Record Layout Scheme Archive Organization Scheme Conceptual Component Concept Scheme Universe Scheme Geographic Structure Scheme Geographic Location Scheme Reusable Ncube Inline ncube Tabular ncube Proprietary Dataset

S09 Why Schemes? You could ask “Why do we have all these annoying schemes in DDI?” There is a simple answer: reuse! DDI-L supports the concept of metadata registries (e.g., question banks, variable banks) DDI-L also needs to show specifically where something is reused Including metadata by reference helps avoid error and confusion Reuse is explicit

Packaging structures

DDI Instance: Citation; Coverage; Other Material / Notes; Translation Information; Study Unit; Group; Local Holding Package (3.1); Resource Package

Study Unit Citation / Series Statement Abstract / Purpose Coverage / Universe / Analysis Unit / Kind of Data Other Material / Notes Funding Information / Embargo Conceptual Components Data Collection Logical Product Physical Data Product Archive DDI Profile Physical Instance

Group Citation / Series Statement Abstract / Purpose Coverage / Universe Other Material / Notes Funding Information / Embargo Conceptual Components Data Collection Logical Product Physical Data Product DDI Profile Sub Group Study Unit Comparison Archive

Resource Package Citation / Series Statement Abstract / Purpose Coverage / Universe Other Material / Notes Funding Information / Embargo Any Scheme: Organization Concept Universe Geographic Structure Geographic Location Question Interviewer Instruction Control Construct Category Code Variable NCube Physical Structure Record Layout Any module EXCEPT Study Unit, Group Or Local Holding Package

S04 Local Holding Package (3.1 and later) Citation / Series Statement Abstract / Purpose Coverage / Universe Other Material / Notes Funding Information / Embargo Local Added Content: [This contains all content available in a Study Unit whose source is the local archive.] Depository Study Unit OR Group Reference: [A reference to the stored version of the deposited study unit.]

Study Unit
Identification: mapped to Dublin Core; basic Dublin Core is included as an option
Coverage: Topical, Temporal, Spatial. Geographic coverage mapped to FGDC / ISO 19115 (bounding box, spatial object, polygon description of levels and identifiers)
Conceptual Components: Universe, Concept, Representation (optional replication). Universe Scheme, Concept Scheme; link of concept, universe, representation through Variable; also allows storage as an ISO/IEC 11179 compliant registry
Purpose, Abstract, Proposal, Funding

S04 Archive An archive is whatever organization or individual has current control over the metadata. Contains persistent lifecycle events. Contains archive specific information: local identification, local access constraints.

Data Collection Methodology Question Scheme Instrument Response domain Instrument using Control Construct Scheme Coding Instructions question to raw data raw data to public file Interviewer Instructions Question and Response Domain designed to support question banks Question Scheme is a maintainable object Organization and flow of questions into Instrument Used to drive systems like CASES and Blaise Coding Instructions Reuse by Questions, Variables, and comparison

Logical Product Category Schemes Coding Schemes Variables NCubes Variable and NCube Groups Data Relationships Categories are used as both question response domains and by code schemes Codes are used as both question response domains and variable representations Link representations to concepts and universes through references Built from variables (dimensions and attributes) Map directly to SDMX structures More generalized to accommodate legacy data

Physical storage Physical Data Structure Physical Instance Links to Data Relationships Links to Variable or NCube Coordinate Description of physical storage structure in-line, fixed, delimited or proprietary Physical Instance One-to-one relationship with a data file Coverage constraints Variable and category statistics

Group Resource Package Allows packaging of any maintainable item as a resource item Group Up-front design of groups – allows inheritance Ad hoc (“after-the-fact”) groups – explicit comparison using comparison maps for Universe, Concept, Question, Variable, Category, and Code Local Holding Package Allows attachment of local information to a deposited study without changing the version of the study unit itself

DDI Lifecycle Model and Related Modules Groups and Resource Packages are a means of publishing any portion or combination of sections of the life cycle Local Holding Package Physical Data Product Logical Product Study Unit Data Collection Physical Instance Archive

Building from Component Parts UniverseScheme CategoryScheme NCube Scheme CodeScheme ConceptScheme Variable Scheme QuestionScheme RecordLayout Scheme [Physical Location] ControlConstructScheme Instrument LogicalRecord PhysicalInstance

Study Unit Example: Schematic Conceptual component Logical product Physical data product Concepts Variables Record Layout Universes Codes Physical instance Categories Data collection Category Stats Questions

S09 DDI’s “Meta-Module” One module is unlike all of the others in DDI – the DDI Profile This is a “meta-module” – it talks about how the DDI-L is being used by a specific application or organization

S09 DDI Profiles The DDI Profile module lets you describe which fields you use in your institution’s flavor of DDI. It is useful for performing machine validation of received instances. It is useful documentation for human users. You provide a set of information for each element allowed in a complete DDI instance: if it is used or not used; if optional fields (per the XML schema) are required. Provides the ability to describe DDI Templates: Element AlternateName, Description and Instructions; required, default, fixed values.

<pr:DDIProfile xmlns:pr="ddi:profile:3_1" id="DDIProfileSTUDYNO">
  <pr:XPathVersion>1.0</pr:XPathVersion>
  <pr:DDINamespace>3.1</pr:DDINamespace>
  <pr:XMLPrefixMap>
    <pr:XMLPrefix>s</pr:XMLPrefix>
    <pr:XMLNamespace>ddi:studyunit:3_1</pr:XMLNamespace>
  </pr:XMLPrefixMap>
  <pr:Used path="/DDIInstance/VersionResponsibility"/>
  <pr:Used path="/DDIInstance/Citation/Title"/>
  <pr:Used path="/DDIInstance/Citation/Creator" required="true">
    <pr:AlternateName>Author</pr:AlternateName>
  </pr:Used>
  <pr:Used path="/DDIInstance/StudyUnit/Citation/Title"/>
  .....
  <pr:NotUsed path="/DDIInstance/StudyUnit/FundingInformation"/>
</pr:DDIProfile>
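Machine validation against a profile could be approximated as below. This is a simplified sketch under loud assumptions: `find_path` and `check_profile` are hypothetical helpers, the instance is namespace-free, and only plain child paths of the form `/A/B/C` are handled, whereas a real profile declares XPath 1.0 expressions with namespace prefixes.

```python
import xml.etree.ElementTree as ET


def find_path(root, path):
    """Return elements matching an absolute, namespace-free path like /A/B/C."""
    steps = path.strip("/").split("/")
    if not steps or steps[0] != root.tag:
        return []
    if len(steps) == 1:
        return [root]
    return root.findall("/".join(steps[1:]))


def check_profile(instance_root, required_paths, not_used_paths):
    """List violations: required paths that are absent, NotUsed paths present."""
    problems = []
    for path in required_paths:
        if not find_path(instance_root, path):
            problems.append(f"missing required element: {path}")
    for path in not_used_paths:
        if find_path(instance_root, path):
            problems.append(f"element not allowed by profile: {path}")
    return problems


instance = ET.fromstring(
    "<DDIInstance><Citation><Title>T</Title></Citation>"
    "<StudyUnit><FundingInformation/></StudyUnit></DDIInstance>"
)
problems = check_profile(
    instance,
    required_paths=["/DDIInstance/Citation/Creator"],
    not_used_paths=["/DDIInstance/StudyUnit/FundingInformation"],
)
print(problems)
```

The two paths in the usage example are taken from the profile on the slide: `Creator` is marked `required="true"` and `FundingInformation` is `NotUsed`.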

Content details Questionnaire content and design Breaking up content into its component parts Separating processes that occur at different points in the lifecycle Sharing common components between different points and objects within the lifecycle Data Dictionary basics Conceptual components Variables Organization of variables into records Physical data stores A quick look from the bottom up

Questions and Instruments DDI 3 separates the questions which make up a survey instrument from the survey instrument itself Questions can be re-used! There are several different types of question text Many of these are the normal string types found throughout DDI 3

Questionnaires. Questions: Question Text, Response Domains. Statements: Pre- and post-question text. Instructions: Routing information, Explanatory materials. Question Flow.

Simple Questionnaire Please answer the following: 1. Sex (1) Male (2) Female 2. Are you 18 years or older? (0) Yes (1) No (Go to Question 4) 3. How old are you? ______ 4. Who do you live with? __________________ 5. What type of school do you attend? (1) Public school (2) Private school (3) Do not attend school

Simple Questionnaire Please answer the following: 1. Sex (1) Male (2) Female 2. Are you 18 years or older? (0) Yes (1) No (Go to Question 4) 3. How old are you? ______ 4. Who do you live with? __________________ 5. What type of school do you attend? (1) Public school (2) Private school (3) Do not attend school Questions

Simple Questionnaire Please answer the following: 1. Sex (1) Male (2) Female 2. Are you 18 years or older? (0) Yes (1) No (Go to Question 4) 3. How old are you? ______ 4. Who do you live with? __________________ 5. What type of school do you attend? (1) Public school (2) Private school (3) Do not attend school Questions Response Domains Code Numeric Text

Representing Response Domains There are many types of response domains Many questions have categories/codes as answers Textual responses are common Numeric responses are common Other response domains are also available in DDI 3 (time, mixed responses)

Category and Code Domains
- Use CategoryDomain when NO codes are provided for the category response: [ ] Yes [ ] No
- Use CodeDomain when codes are provided on the questionnaire itself: 1. Yes 2. No

Category Schemes and Code Schemes
- Use the same structure as variables
- Create the category scheme or schemes first (do not duplicate categories)
- Create the code schemes using the categories
- A category can be in more than one code scheme
- A category can have different codes in each code scheme
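The separation of categories from codes can be sketched with plain dictionaries. This is an illustration only, not DDI's XML structure: the identifiers (`C_YES`, `code_scheme_a`, and so on) are hypothetical.

```python
# Categories are defined once; code schemes only reference them by ID.
categories = {"C_YES": "Yes", "C_NO": "No", "C_DK": "Don't know"}

# Two code schemes reuse the same categories with different codes.
code_scheme_a = {"1": "C_YES", "2": "C_NO"}
code_scheme_b = {"0": "C_YES", "1": "C_NO", "9": "C_DK"}


def label_for(code_scheme, code):
    """Resolve a code to its category label through the shared category scheme."""
    return categories[code_scheme[code]]
```

Because "Yes" exists once, correcting its label in `categories` updates every code scheme that references it.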

Numeric and Text Domains
- A Numeric Domain provides information on the range of acceptable numbers that can be entered as a response
- Text domains generally indicate the maximum length of the response and can limit allowed content using a regular expression
- Additional specialized domains such as DateTime are also available
- The Structured Mixed Response domain allows for multiple response domains and statements within a single question, when multiple response types are required
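To make the constraints concrete, here is a small sketch of how a data-entry tool might check responses against a numeric range, a maximum text length, and a regular expression. The class and method names are hypothetical and do not mirror DDI element names exactly.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumericDomain:
    low: float
    high: float

    def accepts(self, response: str) -> bool:
        """True when the response parses as a number inside [low, high]."""
        try:
            value = float(response)
        except ValueError:
            return False
        return self.low <= value <= self.high


@dataclass
class TextDomain:
    max_length: int
    pattern: Optional[str] = None  # optional regular expression

    def accepts(self, response: str) -> bool:
        """True when the response fits the length and (if given) the pattern."""
        if len(response) > self.max_length:
            return False
        return self.pattern is None or re.fullmatch(self.pattern, response) is not None
```

For example, `NumericDomain(0, 17)` could describe question 3 of the simple questionnaire if only respondents under 18 answer it; that range is an assumption, not something the slides state.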


Simple Questionnaire, annotated with its components
- Questions: items 1 to 5
- Response Domains: Code, Numeric, Text
- Statements: "Please answer the following:"
- Instructions: "(Go to Question 4)"
- Flow: Skip Q3

[Flow diagram: Statement 1, Question 1, Question 2, then the decision "Is Q2 = 0 (yes)?" with Yes and No branches, one of which leads to Question 3]

Approach to Survey Analysis
- Identify:
  - Question Text
  - Statements
  - Instructions or informative materials
  - Response Domains (by type)
- Determine the universe structure and concepts
- Walk through the flow logic

Completing Question Items
- Create CodeSchemes, reusing common categories
- Determine the range for NumericDomains
- Determine the maximum length of TextDomains
- Write up control constructs (easiest is to list all QuestionConstructs and all Statement Items)

Example: Reusing Categories
The full list of all categories becomes a shorter list of reusable categories. Categories listed on the slide:
- Yes
- No
- Don’t know
- Yes, always
- Sometimes
- Some do, some don’t
- Not to my knowledge
- Never – I don’t let them
- Never – I don’t have a television

Flow Logic
- Master Sequence
  - Every instrument has one top-level sequence
  - Question and statement order
- Routing: IfThenElse (see next slide)
  - After Statement 2 (all respondents read this)
  - After Q2: Else goes to a statement
  - After Q5: Else goes back to a sequence

[Routing diagram labels: SI 1, Q1, SI 2, IfThenElse 1 (Then / Else to end), Q2, IfThenElse 2 (Then / Else to SI 3)]

Example: Master Sequence
Statement 1
Question 1
Statement 2
IfThenElse 1
  Then Sequence 1
    Question 2
    IfThenElse 2
      Then Sequence 2
        Question 3, Question 4, IfThenElse 3, Question 8, Statement 4 [Then Sequence 3 (Question 6, Question 7)]
      Else Statement 3
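The control constructs above (sequences, question constructs, statements, IfThenElse) behave like a small tree that is walked once per respondent. The sketch below models the simple questionnaire that way. The class names are hypothetical simplifications, and the routing condition (ask Q3 only when Q2 = 0) is an assumption for illustration, since the flattened slide does not make the branch assignment unambiguous.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Question:
    name: str


@dataclass
class Statement:
    text: str


@dataclass
class Sequence:
    members: List[object] = field(default_factory=list)


@dataclass
class IfThenElse:
    condition: Callable[[Dict[str, int]], bool]
    then: object
    otherwise: Optional[object] = None


def walk(construct, answers, path=None):
    """Return the ordered questions/statements one respondent encounters."""
    path = [] if path is None else path
    if isinstance(construct, Question):
        path.append(construct.name)
    elif isinstance(construct, Statement):
        path.append(f"[{construct.text}]")
    elif isinstance(construct, Sequence):
        for member in construct.members:
            walk(member, answers, path)
    elif isinstance(construct, IfThenElse):
        branch = construct.then if construct.condition(answers) else construct.otherwise
        if branch is not None:
            walk(branch, answers, path)
    return path


# Master sequence for the simple questionnaire (routing condition assumed).
master = Sequence([
    Statement("Please answer the following"),
    Question("Q1"),
    Question("Q2"),
    IfThenElse(lambda a: a["Q2"] == 0, then=Question("Q3")),
    Question("Q4"),
    Question("Q5"),
])
```

Calling `walk(master, {"Q2": 1})` yields a path without Q3, which is the "Skip Q3" flow in this assumed reading.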

Process Items
- General Coding Instructions
  - Missing data (left as blanks)
  - Suppression of confidential information such as name or address
- Generation Instructions: the creation of new variables whose values are programmatically populated (mostly from existing variables)
  - Recodes
  - Review of text answers where items listed as free text result in more than one nominal-level variable
    - Create a variable for each item with 0 = no, 1 = yes
    - Or a count of the number of different items provided by a respondent
  - Aggregation, etc.
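The free-text recode described above can be sketched as follows. This is an illustrative assumption of how such a Generation Instruction might be implemented: the item list, the comma-separated answer format, and the function name `recode_multi` are all hypothetical.

```python
def recode_multi(text, items):
    """Turn one free-text answer into 0/1 indicator variables plus a count.

    Assumes the respondent separated the listed items with commas.
    """
    mentioned = {part.strip().lower() for part in text.split(",") if part.strip()}
    indicators = {item: int(item in mentioned) for item in items}
    indicators["n_items"] = len(mentioned)  # count of different items provided
    return indicators


# Hypothetical items of interest for "Who do you live with?"
ITEMS = ["mother", "father", "sibling"]
row = recode_multi("Mother, sibling, grandmother", ITEMS)
```

Here `row` holds one indicator per listed item and the number of distinct items the respondent gave, including items outside the list.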

Conceptual Components Conceptual components are defined early in the study process. They are the who, what, where, and when of the study.

Difference Between Conceptual Components and Coverage
- Conceptual Components: for use by the study, organization, community
- Spatial Coverage: high level search and links to geographic systems
- Topical Coverage: high level search and links to broader world of knowledge
- Temporal Coverage: high level search

Concepts A concept may be structured or unstructured and consists of a Name, a Label, and a Description. A description is needed if you want to support comparison. Concepts are what questions and variables are designed to measure and are normally assigned by the study (organization or investigator).

Universe This is the universe of the study, which can combine the who, what, when, and where of the data. Census top level universe: “The population and households within Kenya in 2010”. Sub-universes: Households, Population, Males, Population between 15 and 64 years of age, …

Universe Structure Hierarchical Makes clear that “Owner Occupied Housing Units” are part of the broader universe “Housing Units” Can be generated from the flow logic of a questionnaire Referenced by variables and question constructs Provides implicit comparability when 2 items reference the same universe

[Universe hierarchy diagram. Labels, from broadest to narrowest: Population and Housing Units in Kenya in 2010; Housing Units; Population; Males; Persons 15 years and Older; Males, 15 years of age and older in Kenya in 2010. Variable A carries a Universe Reference to "Males, 15 years of age and older in Kenya in 2010".]
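One way to see how a universe reference supports comparison is to store the hierarchy as a graph and look up the broader universes of any item. The parent links below are an assumption about what the diagram shows (in particular, that the narrowest universe sits under both "Males" and "Persons 15 years and Older"); the dictionary layout and function name are hypothetical.

```python
TOP = "Population and Housing Units in Kenya in 2010"
TARGET = "Males, 15 years of age and older in Kenya in 2010"

# universe -> list of broader (parent) universes; links are assumed
PARENTS = {
    TOP: [],
    "Housing Units": [TOP],
    "Population": [TOP],
    "Males": ["Population"],
    "Persons 15 years and Older": ["Population"],
    TARGET: ["Males", "Persons 15 years and Older"],
}


def broader_universes(universe):
    """Return every universe that contains the given one, directly or indirectly."""
    seen = set()
    stack = list(PARENTS[universe])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


# Variable A references the narrowest universe.
variable_a_universe = TARGET
```

Two variables that reference the same universe are implicitly comparable on that dimension; `broader_universes` shows which wider populations a variable's cases belong to.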

New in 3.2: Data Element (ISO/IEC 11179-1)
[Diagram labels: Universe, Concept, Data Element Concept, Variable OR Question Construct, Data Element, Variable Representation, Question Response Domain]
Source: International Standard ISO/IEC 11179-1: Information technology – Specification and standardization of data elements – Part 1: Framework for the specification and standardization of data elements (Technologies de l’information – Spécification et normalisation des éléments de données – Partie 1: Cadre pour la spécification et la normalisation des éléments de données). First edition 1999-12-01 (p26) http://metadata-standards.org/11179-1/ISO-IEC_11179-1_1999_IS_E.pdf

Variables Variables are created as a result of data processing, either from questions or other data collection/harvesting activities.

General Variable Components VariableName, Label and Description Links to Concept, Universe, Question, and Embargo information Provides Analysis and Response Unit Provides basic information on its role: isTemporal isGeographic isWeight Describes Representation

Representation Detailed description of the role of the variable References related weights (standard and variable) References all instructions regarding coding and imputation Describes concatenated values Additivity and aggregation method Value representation Specific Missing Value description (proposed DDI 3.2) Can be used in combination with any representation type

Value Representation
Provides the following elements/attributes to all representation types:
- classification level (“nominal”, “ordinal”, “interval”, “ratio”, “continuous”)
- blankIsMissingValue (“true”, “false”)
- missingValue (expressed as an array of values)
The last 2 may be replaced in 3.2 by a missing values representation section.
Is represented by one of four representation types (numeric, text, code, date time). Additional types are under development (e.g., scales).

Code Representation Code schemes link category labels and content to a code used in the data file. Codes can be numeric or text. Hierarchies are described by level, completeness, and relationship of items contained in a level.

Code Scheme Options Use in its entirety Use only specified levels Use only most discrete items (higher levels are treated as group labels) Use only the specified codes or code range

<l:CodeScheme id="CS_1">
  <l:CategorySchemeReference>
    <r:ID>CatScheme_1</r:ID>
  </l:CategorySchemeReference>
  <l:HierarchyType>Irregular</l:HierarchyType>
  <l:Level levelNumber="1">
    <l:Name>2 digit code</l:Name>
  </l:Level>
  <l:Level levelNumber="2">
    <l:Name>4 digit code</l:Name>
  </l:Level>
  .....

  <l:Code isDiscrete="false" levelNumber="1">
    <l:CategoryReference><r:ID>C_1</r:ID></l:CategoryReference>
    <l:Value>10</l:Value>
    <l:Code isDiscrete="true" levelNumber="2">
      <l:CategoryReference><r:ID>C_2</r:ID></l:CategoryReference>
      <l:Value>1010</l:Value>
    </l:Code>
    <l:Code isDiscrete="true" levelNumber="2">
      <l:CategoryReference><r:ID>C_3</r:ID></l:CategoryReference>
      <l:Value>1020</l:Value>
    </l:Code>
  </l:Code>
  <l:Code isDiscrete="true" levelNumber="1">
    <l:CategoryReference><r:ID>C_4</r:ID></l:CategoryReference>
    <l:Value>20</l:Value>
  </l:Code>
</l:CodeScheme>
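The code scheme options listed earlier (use specified levels, or only the most discrete items) amount to filtering on the `levelNumber` and `isDiscrete` attributes. A minimal sketch, with the four codes from the example flattened into a list; the dictionary keys and the function name `select_codes` are hypothetical.

```python
# The four codes from the example above, flattened for illustration.
CODES = [
    {"value": "10", "category": "C_1", "level": 1, "discrete": False},
    {"value": "1010", "category": "C_2", "level": 2, "discrete": True},
    {"value": "1020", "category": "C_3", "level": 2, "discrete": True},
    {"value": "20", "category": "C_4", "level": 1, "discrete": True},
]


def select_codes(codes, level=None, discrete_only=False):
    """Return code values restricted to one level and/or to discrete items."""
    chosen = []
    for code in codes:
        if level is not None and code["level"] != level:
            continue
        if discrete_only and not code["discrete"]:
            continue
        chosen.append(code["value"])
    return chosen
```

With `discrete_only=True`, code 10 drops out and acts only as a group label for 1010 and 1020.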

Numeric Use for variables where numeric response is self explanatory (e.g., age in years) Continuous or discrete Specific valid levels or ranges Missing value codes can be identified Data is intended to be analyzed as numbers

Text
- Data is intended to be analyzed as text
- Content can be any text
- Geographic codes may be numbers but are analyzed as text or string (leading zeros used)
- Constrain length
- Constrain content with a regular expression
- Example: a US ZIP Code is text, 5 characters long, numeric characters 0-9 only

Date Time Allows specification of format Allows statistical software to handle appropriately

Data Relationship S14 Learning DDI: Pack S14 Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

What we’re covering
- How Data Relationship provides the link between the physical record storage and its logical (intellectual) content
- How Variables and NCubes are grouped into Logical Records
- How Logical Records define complex file relationships

Understanding Data Relationships Data files can be described as following a structure What are the record types? What variables make up each record type? How do I know which record type I have? How can I find a unique record of a specific type? How are records related? DDI provides the information to automate processing of the data files themselves

Logical vs. Physical
- Every data file has one or more “logical records” (a record of analysis rather than a physical record)
- The logical description separates the support provided in the variable content from the physical structure
- DDI provides both human readable and machine actionable information to support programming
- Minimal information is REQUIRED even for single record type simple files
- The LogicalRecord ID is the link between the physical store of the data and the logical description of its content

Data Relationship
- Logical Record
  - Assigns an ID to the logical record
  - Provides information on the logical record type
  - Identifies support for breaking the logical record into 2 or more physical segments in a storage structure
  - Explains unique case identification
  - Provides the content of the logical record (Variables and NCubes)
- Record Relationship
  - Provides links or “keys” between logical record types

Logical Record Identification Description hasLocator [boolean] and Variable Value Reference To a variable that declares the record type Support for Multiple Segments Specifies variable for this information Case Identification Options for identifying a unique case within a record type Variables OR NCubes in record and variableQuantity [integer]

Logical Record Minimum Requirements Identification Description hasLocator [boolean] and Variable Value Reference Support for Multiple Segments Case Identification Variables OR NCubes in record and variableQuantity [integer]

Record Type Locator
EXAMPLE 1:
- Household Record: variable rectype = “H”
- Person Record: variable rectype = “P”
EXAMPLE 2:
- Record Type A: variable chariter = [blank]
- Record Type B: variable chariter ≥ ‘000’
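Example 1 can be turned into a short routine that sorts the lines of a mixed file by record type. The sketch assumes the `rectype` variable occupies the first character of every line; that position, the sample lines, and the function name are hypothetical.

```python
from collections import defaultdict

# Value of the locator variable -> record type, as in Example 1.
RECORD_TYPES = {"H": "Household", "P": "Person"}


def split_records(lines):
    """Group the lines of a hierarchical file by their record type."""
    grouped = defaultdict(list)
    for line in lines:
        rectype = line[0]  # assumed position of the locator variable
        grouped[RECORD_TYPES[rectype]].append(line)
    return dict(grouped)


# Made-up sample lines for illustration.
SAMPLE = ["H0001 3", "P0001 01 34", "P0001 02 36", "H0002 1", "P0002 01 71"]
by_type = split_records(SAMPLE)
```

This is the kind of processing the record type locator lets software automate without a human reading the codebook.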

Case Identification
- Simple case examples:
  - Case Number
  - Survey form number
  - Any single variable holding a unique number within a record type
- Complex case identification:
  - Concatenated keys
  - Conditional concatenated keys

Complex Files and Record Relationships
- Complex files consist of more than one record type stored in one or more files
- The description contains the complete Logical Record description for each record type
- Provides information on the relationship between records
- Provides the link(s) to other records
- Provides the link(s) between waves

RecordRelationship
- A pairwise relationship of a source and target record
- Describes the relationship:
  - Source and target record
  - Type of relationship (=, >, <, ≠, ≤, ≥)
- Notice that the case identification of a record type is frequently used as a key for the relationship link

Logical Record Structure
- Data File: Persons (Person ID, Age, Gender, Household ID)
- Data File: Households (Household Type, Household ID, Household Income, Housing Unit Type)
Note: this is a logical relationship; the fact that the records are in two files instead of one is unimportant.
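The record relationship between the two record types is an equality on Household ID. A minimal sketch of using that key to attach household information to each person; the field names and values are made up for illustration.

```python
# Made-up records following the two structures above.
households = [
    {"household_id": 1, "household_type": "family", "household_income": 52000},
    {"household_id": 2, "household_type": "single", "household_income": 31000},
]
persons = [
    {"person_id": 10, "age": 34, "household_id": 1},
    {"person_id": 11, "age": 36, "household_id": 1},
    {"person_id": 12, "age": 71, "household_id": 2},
]


def link(person_records, household_records):
    """Join each person to its household using the '=' relationship on household_id."""
    by_id = {h["household_id"]: h for h in household_records}
    return [{**p, **by_id[p["household_id"]]} for p in person_records]


linked = link(persons, households)
```

The same join works whether the two record types live in one physical file or two, which is the point of describing the relationship logically.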

Describing Data Storage To describe how data is stored, DDI-L separates the storage structures from the file actually containing the data The storage structures are reused The storage structure is called a physical data product The data files are called physical instances

[Diagram labels: Study Unit; Data Collection; Logical Data File; Physical Structure 1; Physical Structure 2; Physical Instance (full file); Physical Instance (subset of records)]

Linkages Step 1
- Define and identify the LogicalRecord within Data Relationship in the Logical Product
- PhysicalDataProduct – Physical Structure
  - Format
  - Default values
  - Link to LogicalRecord
  - Declaration of its physical segments (in terms of its storage in this structure)

Linkages Step 2
- PhysicalDataProduct – Record Layout
  - Link from RecordLayout to PhysicalRecordSegment
  - Link from DataItem to a Variable or NCube description and to the physical location of the data in the data file
- PhysicalInstance
  - Link to the RecordLayout(s) found in the file
  - Link to the actual file of data

Summary of linkages (diagram)
- Logical Product: LogicalRecord, Variables
- PhysicalDataProduct
  - PhysicalStructure: REF: LogicalProduct; defines PhysicalRecordSegments
  - RecordLayout: REF: PhysicalRecordSegment; DataItem REF: Variable, gives physical location in record
- Physical Instance: REF: RecordLayout; REF: Data File; Summary Statistics REF: Variables
- Data File
Cardinalities shown: 1..n, n..n, 1..1. Note on the Data File link: technically a 1..n, but the additional data files must be the equivalent of an identical backup copy.
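The chain of references can be followed in code: the physical instance names a record layout, each data item in the layout names a variable and a position, and the variable supplies the meaning. The sketch below uses plain dictionaries with hypothetical IDs, a hypothetical file name, and a fixed-width layout chosen for illustration.

```python
# Logical side: variable ID -> variable name (hypothetical IDs).
variables = {"V1": "age", "V2": "sex"}

# Physical data product: where each variable sits in a fixed-width record.
record_layouts = {
    "RL_1": [
        {"variable": "V1", "start": 0, "width": 3},
        {"variable": "V2", "start": 3, "width": 1},
    ]
}

# Physical instance: which layout applies to which file (file name is made up).
physical_instance = {"record_layout": "RL_1", "data_file": "survey.dat"}


def parse_record(line, instance):
    """Follow instance -> layout -> data item -> variable to decode one record."""
    layout = record_layouts[instance["record_layout"]]
    return {
        variables[item["variable"]]: line[item["start"]:item["start"] + item["width"]].strip()
        for item in layout
    }


decoded = parse_record(" 34F", physical_instance)
```

Storing the same variables in a different format only requires another record layout; the variable descriptions stay untouched.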

Complexity
- May seem like a lot of referencing and indirection for a simple example
- Structure is designed to handle much more complex structures in a consistent manner
- For example, health interview surveys may have records for multiple person types, incidence or event records, biomarkers, and relationship or situational change records stored and linked in many different ways
- Same structure handles all levels of complexity

Describing the Physical Store
- Link to a LogicalRecord
- Different structures to describe different storage formats (ASCII, internal, proprietary, etc.)
  - We use XML Schema substitutions
- Information on relational links between record types stored in one or many data files (physical relationship)
- Links to Variables and NCube cells

Physical Description: Physical Data Product
- Can describe any number of physical stores of data
- Describes the gross record layout
  - Reference to a LogicalRecord
  - Information on the use of multiple physical segments to store the data in the LogicalRecord
  - Provides default values for various data typing information
- Describes the record layout in detail
  - Links to a GrossRecordStructure
  - Provides a detailed link between a specific variable and its physical storage location

PhysicalDataProduct
- PhysicalStructureScheme
  - Reference to LogicalRecord
  - GrossRecordStructure: identifies PhysicalSegments
- RecordLayoutScheme (uses XML Schema “substitution groups”)
  - Reference to PhysicalStructure
  - BaseRecordLayout
  - Alternates for BaseRecordLayout: ncube_recordlayout, inline_ncube_recordlayout, tabular_ncube_recordlayout, proprietary, DataSet

PhysicalDataProduct: NCube record layouts
- ncube_recordlayout: allows for a record per aggregation case containing multiple NCubes listed in a fixed or comma-delimited layout (used by the example)
- inline_ncube_recordlayout: allows the data to be listed as a table in-line in the PhysicalDataProduct
- tabular_ncube_recordlayout: describes a 2-dimensional tabular layout as used by spreadsheets

Proprietary Record Layout
- Used for describing data files for proprietary software packages
  - Statistical packages (SPSS, SAS, etc.)
  - Relational databases (Oracle, SQL Server, etc.)
- Uses a “handle” (DataItemAddress) to define the variable location, instead of a known location within the file
  - The files are typically binary, so a positional or delimited location does not work
  - Examples: variable name, column name
- Allows for proprietary datatypes, outputs, and properties
  - These describe software-specific parameters that can be defined by the user, according to the software package they use

Data Set
- Allows for capturing the data in a DDI-specific XML format, as part of the DDI file
- Useful for archival storage of the data, where the data and metadata live in the same file/package
- Useful for feeding temporary data files to visualization packages/Web services
  - Usually subsets of the full data file
  - Many visualization packages expect data in XML format
  - Web services demand that the communications are performed in an XML format
- This is a very verbose way of expressing the data: files get much larger!

Physical Instance S16 Learning DDI: Pack S16 Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

Files of Data Data files are represented in the DDI metadata with a module called a physical instance. This is just a metadata object which represents the existence of a physical file. It also carries summary and category statistics because these change from data file to data file.

Physical Instance
- Has a one-to-one relationship with a physical file of data (plus a back-up if one exists)
- Allows for full record subsets of a large data set using record selection
- Houses summary and category statistics that are specific to a particular file
- Note that these can be in-line, referenced in another physical instance, or referenced as a separate data file (with complete logical product, physical data structure, and physical instance)

Record of a Physical Instance
- Link to a physical storage structure
- Specifics of the range of records in the file
  - Record type selection
  - Geographic selection
  - Topical selection
- Summary statistics
- Identification and location of the actual data file

Record Subsets
- PhysicalRecordSegment: 79 record segments with each segment in its own file (US 2000 Census SF3)
- Geography: use SpatialCoverage to limit to a single country (Eurobarometer – Germany)
- Date/Time: use TemporalCoverage to limit to a single year (General Social Survey – 1998)
- Topic: use TopicalCoverage to limit to a single topical definition (female cases only)

NHGIS Processing: the NHGIS project separates physical data files by geographic type
- One file per state (Alabama, Alaska, Arizona, Arkansas) with all geographic levels (State, County, Place, Tract)
- One file per geographic level (State, County, Place, Tract) with all states (Alabama, Alaska, Arizona, Arkansas)

PhysicalInstance
- Identification
- Reference to RecordLayout(s)
- ID/location of Data File
- Data Fingerprint
- Coverage limitations
- GrossFileStructure [check sums and processing info]
- Summary Statistics [Variables]
- Category Statistics [allows for single level filters]
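Fingerprints, check sums, and category statistics are all values computed from one specific file, which is why they sit in the physical instance. A small sketch of computing them; the choice of SHA-256 and the function names are assumptions, since the slides do not name an algorithm.

```python
import hashlib
from collections import Counter


def fingerprint(data: bytes) -> str:
    """Return a hex digest identifying the exact bytes of a data file (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def category_statistics(values):
    """Return the frequency of each code found in one variable of one file."""
    return dict(Counter(values))


# Made-up column of codes for the variable "sex" in one file.
sex_codes = ["1", "2", "2", "1", "2"]
frequencies = category_statistics(sex_codes)
```

A subset file built from the same layout would reuse the logical and physical descriptions but carry its own fingerprint and frequencies.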

S17 From the Bottom Up This section summarizes what we have learned starting from the data item and working up to the full metadata description Learning DDI: Pack S17 Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

DDI-L from the data item up
DDI-L breaks down a data file into three major components:
- The LogicalProduct describes the data dictionary
- The PhysicalDataProduct describes the file structure
- The PhysicalInstance describes an actual instance of the file

DDI-L from the data item up: PhysicalInstance
- The PhysicalInstance identifies the file (name, path/URI), holds statistics (#recs, #vars, freq, min, max, etc.) and other applicable proprietary info
- The PhysicalInstance refers to a record layout in the PhysicalDataProduct (the same data can be stored in different formats/locations, or the same record can contain different data)
- Contents: DataFileIdentification, GrossFileStructure, Statistics, ProprietaryInfo

DDI-L from the data item up: PhysicalDataProduct
- The PhysicalDataProduct describes the PhysicalStructure of the file (what the data components are) and its RecordLayout (variable location, formatting, etc.)
- The PhysicalStructure refers to a logical record in the LogicalProduct (the same set of variables can be stored in different ways)
- The same structure can be used by multiple layouts
- Different layouts are used to describe text and proprietary files
- Contents: PhysicalStructureScheme/PhysicalStructure; RecordLayoutScheme/RecordLayout OR RecordLayoutScheme/ProprietaryRecordLayout

DDI-L from the data item up: LogicalProduct
- The LogicalProduct results from earlier lifecycle stages (this information is not available in traditional data files)
- The LogicalProduct describes the data dictionary: Variables (name, label, formats, etc.), the Codes and Categories (classifications), as well as the Logical Record for storage
- The Data Relationship can describe complex hierarchical structures and indexes
- Contents: CategoryScheme/Category, CodeScheme/Code, VariableScheme/Variable, DataRelationship/LogicalRecord

DDI-L from the data item up: DDIInstance/StudyUnit
- The XML is contained by a StudyUnit wrapped by a DDIInstance (Abstract, Coverage, Purpose, …)
- The ConceptualComponents module describes the concepts and universes: ConceptScheme/Concept, UniverseScheme/Universe
- The DataCollection module captures the questionnaire and survey instrument: QuestionScheme/Question, ControlConstruct, Instruction, Instrument, …

DDI-L from the data item up: the full structure
- DDIInstance/StudyUnit: Abstract, Coverage, Purpose, …
  - ConceptualComponents: ConceptScheme/Concept, UniverseScheme/Universe
  - DataCollection: QuestionScheme/Question, ControlConstruct, Instruction, Instrument, …
  - LogicalProduct: CategoryScheme/Category, CodeScheme/Code, VariableScheme/Variable, DataRelationship/LogicalRecord
  - PhysicalDataProduct: PhysicalStructureScheme/PhysicalStructure, RecordLayoutScheme/RecordLayout OR RecordLayoutScheme/ProprietaryRecordLayout
  - PhysicalInstance: DataFileIdentification, GrossFileStructure, Statistics, ProprietaryInfo

DDI in context
- Managing research
  - Individual research
  - Large research projects: longitudinal, multi-researcher
- Managing digital resources

Managing research

Individual researchers Tools – using the software they know Clarifying what metadata needs to be captured for future preservation and discovery Building/locating metadata resources that support comparison Getting metadata from individual researchers is not a new problem – DDI can’t solve it but can provide some direction

The Longitudinal Version of GSBPM In 2011, at a Dagstuhl workshop on longitudinal metadata, a modification of the GSBPM was developed to describe data production for large on-going research projects. This work is still under development but may result in a more detailed lifecycle model for DDI moving forward.

Note the similarity to the DDI Combined Lifecycle Model and the top level of the GSBPM


Upstream Metadata Capture Because there is support throughout the lifecycle, you can capture the metadata as it occurs It is re-useable throughout the lifecycle It is versionable as it is modified across the lifecycle It supports production at each stage of the lifecycle It moves into and out of the software tools used at each stage

Metadata Driven Data Capture
- Questions can be organized into survey instruments documenting flow logic and dynamic wording
  - This metadata can be used to create control programs for Blaise, CASES, CSPro and other CAI systems
- Generation Instructions can drive data capture from registry sources and/or inform data processing post capture

Reuse of Metadata
- You can reuse many types of metadata, benefitting from the work of others
  - Concepts
  - Variables
  - Categories and codes
  - Geography
  - Questions
- Promotes interoperability and standardization across organizations
- Can capture (and re-use) common cross-walks

Managing digital resources

Management of Data and Metadata
Managing metadata:
- Capture: goal is to capture at point of origin
- Reuse: reduce burden, reduce error, comparison
- Quality control: reuse, replication, paradata
- Preservation: metadata in a non-proprietary format
- Provenance: how the data was created
- Processing: metadata driven processing
- Discovery and access
- Analysis support and information
Digital objects:
- Data as a unique object: without metadata it’s just a number

Data/Metadata Management Activities
- Data Capture: determining what is to be collected from whom and how
- Data Processing: cleaning, normalizing, aggregating, harmonizing, creation of data products
- Process evaluation and revision: quality control, process improvement, evaluation and analysis
- Data Discovery: finding data, accessing data
- Preservation: short term and archival
- Administrative tracking: who has control, where in the process

Data/Metadata Management Downside: There’s a lot more to manage Greater depth than many other digital objects Greater detail that can be leveraged for discovery, access, and application Costly to translate into a standard format Upside: We’ve been managing digital data for over 40 years No need to reinvent the wheel DDI as a metadata structure is not an “all or nothing” approach DDI uptake has moved out of the archives and is moving into the production process

Working within a Library/Archive System Actionable and informational metadata What do you need to “do” with the metadata? Discovery How deep do you want to go? How integrated do you want the results to be? Visualization / Manipulation Analysis Preservation / Archive

Archive/Data Discovery and Delivery Data and Metadata are generally received from external organizations Focus is on moving data and metadata to a preservation format and supporting discovery and delivery tools Management of ingest process (process management) “Value Added” material

Archive/Data Discovery and Delivery Capturing full content Machine actionable Information for discovery Retaining links to other materials, collections and grouping Added value metadata from archive Variable, question, and data element groups related to subject and keyword access Linking to a common geography description Linking to an overall organization description Tracking archival management activities and processes

Working with producers/researchers How much can you influence depositors? Ingest tools that result in DDI metadata Provision of reusable materials (schemes) or controlled vocabularies metadata management tools Training What can be pushed back to long term depositors? Resource package material? Metadata of deposited data so that only differences are reported? Tools to manage change over time?

General use statements


Management of Information, Data, and Metadata
An organization can manage its organizational information, metadata, and data within repositories, using DDI 3 to transfer information into and out of the system to support:
- Controlled development and use of concepts, questions, variables, and other core metadata
- Development of data collection and capture processes
- Quality control operations
- Development of data access and analysis systems

DAY 2

DDI-C and DDI-L DDI has 2 development lines DDI Codebook (DDI-C) DDI Lifecycle (DDI-L) Both lines will continue to be improved DDI-C focusing just on single study codebook structures DDI-L focusing on a more inclusive lifecycle model and support for machine actionability

Background
- Concept of DDI and definition of needs grew out of the data archival community
- Established in 1995 as a grant funded project initiated and organized by ICPSR
- Members:
  - Social Science Data Archives (US, Canada, Europe)
  - Statistical data producers (including US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada)
- February 2003: formation of the DDI Alliance
  - Membership based alliance
  - Formalized development procedures

Early DDI: Characteristics of DDI-C
- Focuses on the static object of a codebook
- Designed for limited uses
- End user data discovery via the variable or high level study identification (bibliographic)
- Only heavily structured content relates to information used to drive statistical analysis
- Coverage is focused on single study, single data file, simple survey and aggregate data files
- Variable contains majority of information (question, categories, data typing, physical storage information, statistics)

Limitations of these Characteristics Treated as an “add on” to the data collection process. Focus is on the data end product and end users (static). Limited tools for creation or exploitation. The Variable must exist before metadata can be created. Producers hesitant to take up DDI creation because it is a cost and does not support their development or collection process.

Origins of the DDI Alliance DDI-C was developed by an informal network of individuals from the social science community and official statistics Funding was through grants It was decided that a more formal organization would help to drive the development of the standard forward Many new features were requested The DDI Alliance was born to facilitate the development in a consistent and on-going fashion

DDI Alliance Structure DDI-L specifications are created by committees drawn from among the member organizations Some outside experts are invited to attend The Steering Committee governs the organization The Expert Committee votes to approve all published work One representative per member organization The Technical Implementation Committee (TIC) creates the technical work products (XML schemas, UML models, documentation, etc.) Working Groups are short term groups working on future DDI topical content (i.e., Survey Design & Implementation) Tools Catalog Group describing tools and software to work with DDI Web Site Maintenance Group

S03 Moving from DDI-C to DDI-L DDI Alliance members wished to support current DDI-C users and will continue to support this specification. The limitations of DDI-C needed to be addressed in order to move the standard forward to a broader audience and user base. Requirements for DDI-L came out of the original committee as well as the broader data archive community. The development of the first wave of software for DDI-C raised additional requirements.

Requirements for DDI-L Improve and expand the machine-actionable aspects of the DDI to support programming and software systems. Support CAI instruments through expanded description of the questionnaire (content and question flow). Support the description of data series (longitudinal surveys, panel studies, recurring waves, etc.). Support comparison, in particular comparison by design but also comparison after the fact (harmonization). Improve support for describing complex data files (record and file linkages). Provide improved support for geographic content to facilitate linking to geographic files (shape files, boundary files, etc.).

S03 DDI Lifecycle Model: Metadata Reuse

Relationship to Other Standards: Archival Dublin Core: basic bibliographic citation information; basic holdings and format information. METS: upper level descriptive information for managing digital objects; provides specified structures for domain specific metadata. OAIS: reference model for the archival lifecycle. PREMIS: supports and documents the digital preservation process.

Relationship to Other Standards: Non-Archival ISO 19115 (Geography): metadata structure for describing geographic feature files such as shape, boundary, or map image files and their associated attributes. ISO/IEC 11179: international standard for representing metadata in a Metadata Registry; consists of a hierarchy of “concepts” with associated properties for each concept. ISO 17369 (SDMX): exchange of statistical information (time series/indicators); supports metadata capture as well as implementation of registries.

S05 Mining the Archive With metadata about relationships and structural similarities You can automatically identify potentially comparable data sets You can navigate the archive’s contents at a high level You have much better detail at a low level across divergent data sets

S20 Metadata Coverage [Chart of coverage areas by standard] Coverage areas: [Packaging], Citation, Geographic Coverage, Temporal Coverage, Topical Coverage, Structure information, Physical storage description, Variable (name, label, categories, format), Source information, Methodology, Detailed description of data, Processing, Relationships, Life-cycle events, Management information, Tabulation/aggregation. Standards shown: Dublin Core, ISO/IEC 11179, ISO 19115, Statistical Packages, METS, PREMIS, SDMX, DDI.

S03 Moving from DDI 1/2 to DDI 3 DDI Alliance members wished to support current DDI 1/2 users and will continue to support this specification The limitations of DDI 1/2 needed to be addressed in order to move the standard forward to a broader audience and user base Requirements for DDI 3 came out of the original committee as well as the broader data archive community The development of the first wave of software for DDI 1/2 raised additional requirements

S03 Requirements for 3.0 Improve and expand the machine-actionable aspects of the DDI to support programming and software systems. Support CAI instruments through expanded description of the questionnaire (content and question flow). Support the description of data series (longitudinal surveys, panel studies, recurring waves, etc.). Support comparison, in particular comparison by design but also comparison after the fact (harmonization). Improve support for describing complex data files (record and file linkages). Provide improved support for geographic content to facilitate linking to geographic files (shape files, boundary files, etc.).

DDI 1 / 2 Document Description Citation of the codebook document Guide to the codebook Document status Source for the document Study Description Citation for the study Study Information Methodology Data Accessibility Other Study Material File Description File Text (record and relationship information) Location Map (required for nCubes optional for microdata) Data Description Variable Group and nCube Group Variable (variable specification, physical location, question, & statistics) nCube Other Material

S03 Our Initial Thinking… The metadata payload from DDI 1/2 was re-organized to cover these areas. We are starting to break these out based on a Data Life Cycle Model, keeping in mind not only those sections that are highly related to each other, but also looking at those sections that stay consistent even when the data itself is moved to a different physical storage structure, or when it moves from one archive to another, or even when it encounters various methods of obtaining access to the data. There is obviously a lot more detail behind this, but what we are trying to provide with this model is a base for discussion within the Expert Committee and DDI community. All of this is a prelude to the creation of Version 3.0.

[Diagram: DDI 1/2 sections regrouped along the Data Life Cycle Model. Labels: Study Citation, Document Source, Study Information, Data Accessibility, Study Methodology, Questions, Variable specification, nCubes, Variable & nCube Groups, File Text, Location Map, Physical Location, Statistics, Other Material]

S03 [Diagram label: Wrapper] For later parts of the lifecycle, metadata is reused heavily from earlier Modules. The discovery and analysis itself creates data and metadata, re-used in future cycles.

S03 Realizations Many different organizations and individuals are involved throughout this process. This places an emphasis on versioning and exchange between different systems. There is potentially a huge amount of metadata reuse throughout an iterative cycle. We needed to make the metadata as reusable as possible. Every organization acts as an “archive” (that is, a maintainer and disseminator) at some point in the lifecycle. When we say “archive” in DDI 3, it refers to this function.

S03 DDI 3 and the Data Life Cycle A survey is not a static process: it dynamically evolves across time and involves many agencies/individuals. DDI 1/2 is about archiving; DDI 3 covers the entire “life cycle”. DDI 3 focuses on metadata reuse (minimizes redundancies/discrepancies, supports comparison). Also supports multilingual content, grouping, geography, and others. DDI 3 is extensible.

S03 Approach Shift from the codebook centric model of early versions of DDI to a lifecycle model, providing metadata support from data study conception through analysis and repurposing of data. Shift from an XML Document Type Definition (DTD) to an XML Schema model to support the lifecycle model, reuse of content and increased controls to support programming needs. Redefine a “single DDI instance” to include a “simple instance” similar to DDI 1/2, which covered a single study, and “complex instances” covering groups of related studies. Allow a single study description to contain multiple data products (for example, a microdata file and aggregate products created from the same data collection). Incorporate the requested functionality in the first published edition.

S03 Development of DDI 3 2004: Acceptance of a new DDI paradigm (lifecycle model; shift from the codebook centric / variable centric model to capturing the lifecycle of data; agreement on expanded areas of coverage). 2005: Presentation of schema structure; focus on points of metadata creation and reuse. 2006: Presentation of first complete 3.0 model; internal and public review. 2007: Vote to move to Candidate Version; establishment of a set of use cases to test application and implementation. 2008: DDI 3.0 published in April. 2009: DDI 3.1 approved for publication in May and published in October; bugs and feature corrections identified during the first year of use, some of which were backward incompatible.

S03 DDI 3.2 Currently working on DDI 3.2 which will address bug and feature corrections Publication for review in 2011 Noted areas of correction: Broader support for controlled vocabularies Clarification of record relationship Clarification of ID and URN structures Missing value declarations Expanded Response Domain/Representation options

S05 Change DDI 3 is a major change from DDI 1/2 in terms of content and structure. Let's step back and look at: basic differences between DDI 1/2 and DDI 3; applications for DDI 1/2 and DDI 3; differences that allow DDI 3 to do more; how these differences provide support for better management of information, data, and metadata.

Differences Between DDI 1/2 and 3
DDI 1/2: Codebook based; Format XML DTD; After-the-fact; Static; Metadata replicated; Simple study; Limited physical storage options
DDI 3: Lifecycle based; Format XML Schema; Point of occurrence; Dynamic; Metadata reused; Simple study, series, grouping, inter-study comparison; Unlimited physical storage options

DDI 1/2 Applications Simple survey capture High level study description with variable information for stand alone studies Descriptions of basic nCubes (individual statistical tables) Replicating the contents of a codebook including the data dictionary Collection management beyond bibliographic records

S05 DDI 3 Applications Describing a series of studies such as a longitudinal survey or cross-cultural survey Capturing comparative information between studies Sharing and reusing metadata outside the context of a specific study Capturing data in the XML Capturing process steps from conception of study through data capture to data dissemination and use Capturing lifecycle information as it occurs, and in a way that can inform and drive production Management of data and metadata within an organization for internal use or external access

Why can DDI 3 do more? It is machine-actionable – not just documentary It’s more complex with a tighter structure It manages metadata objects through a structured identification and reference system that allows sharing between organizations It has greater support for related standards Reuse of metadata within the lifecycle of a study and between studies

Reuse Across the Lifecycle This basic metadata is reused across the lifecycle Responses may use the same categories and codes which the variables use Multiple waves of a study may re-use concepts, questions, responses, variables, categories, codes, survey instruments, etc. from earlier waves

S05 Reuse by Reference When a piece of metadata is re-used, a reference can be made to the original In order to reference the original, you must be able to identify it You also must be able to publish it, so it is visible (and can be referenced) It is published to the user community – those users who are allowed access

S05 Change over Time Metadata items change over time, as they move through the data lifecycle This is especially true of longitudinal/repeat cross-sectional studies This produces different versions of the metadata The metadata versions have to be maintained as they change over time If you reference an item, it should not change: you reference a specific version of the metadata item

DDI Support for Metadata Reuse DDI allows for metadata items to be identifiable They have unique IDs They can be re-used by referencing those IDs DDI allows for metadata items to be published The items are published in resource packages Metadata items are maintainable They live in “schemes” (lists of items of a single type) or in “modules” (metadata for a specific purpose or stage of the lifecycle) All maintainable metadata has a known owner or agency Maintainable metadata may be versionable Versions reflect changes over time The versionable metadata has a version number

[Diagram: Variable ID=“X” is published in a Resource Package. Study A uses it through Ref=“Variable X”; Study B re-uses the same variable by reference, also through Ref=“Variable X”.]

[Diagram: Variable Scheme ID=“123”, Agency=“GESIS”, contains Variable ID=“X”. The variable changes over time: Version=“1.0”, then Version=“1.1”, then Version=“2.0”.]
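
The identification, versioning, and reference ideas on the last few slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DDI schema: the class names `VersionedItem`, `Reference`, and `Scheme` and the labels are hypothetical, and the IDs mirror the diagram (scheme "123", agency "GESIS", variable "X").

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VersionedItem:
    """One version of a metadata item (hypothetical model, not the DDI schema)."""
    item_id: str
    version: str
    label: str


@dataclass(frozen=True)
class Reference:
    """A reference names the agency, the item, and one specific version."""
    agency: str
    item_id: str
    version: str


@dataclass
class Scheme:
    """A maintainable list of items of a single type, owned by an agency."""
    scheme_id: str
    agency: str
    items: dict = field(default_factory=dict)  # (item_id, version) -> item

    def publish(self, item: VersionedItem) -> Reference:
        key = (item.item_id, item.version)
        if key in self.items:
            # A published version never changes; a change needs a new version.
            raise ValueError(f"{key} is already published")
        self.items[key] = item
        return Reference(self.agency, item.item_id, item.version)

    def resolve(self, ref: Reference) -> VersionedItem:
        if ref.agency != self.agency:
            raise KeyError(f"unknown agency {ref.agency!r}")
        return self.items[(ref.item_id, ref.version)]


scheme = Scheme(scheme_id="123", agency="GESIS")
ref_v1 = scheme.publish(VersionedItem("X", "1.0", "Age"))
ref_v2 = scheme.publish(VersionedItem("X", "2.0", "Age at last birthday"))

# Study A and Study B both hold ref_v1; publishing 2.0 does not alter it.
study_a = {"name": "Study A", "variables": [ref_v1]}
study_b = {"name": "Study B", "variables": [ref_v1]}
print(scheme.resolve(study_b["variables"][0]).label)
```

Because a reference always carries a version, Study B keeps resolving to version 1.0 until someone deliberately updates the reference.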

Upstream Metadata Capture Because there is support throughout the lifecycle, you can capture the metadata as it occurs It is re-useable throughout the lifecycle It is versionable as it is modified across the lifecycle It supports production at each stage of the lifecycle It moves into and out of the software tools used at each stage

S05 Reuse of Metadata You can reuse many types of metadata, benefitting from the work of others Concepts Variables Categories and codes Geography Questions Promotes interoperability and standardization across organizations Can capture (and re-use) common cross-walks

S05 Virtual Data When researchers use data, they often combine variables from several sources This can be viewed as a “virtual” data set The re-coding and processing can be captured as useful metadata The researcher’s data set can be re-created from this metadata Comparability of data from several sources can be expressed
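
A rough sketch of the "virtual data set" idea: if the recoding steps are stored as metadata, the derived data set can be rebuilt from the sources at any time. The recipe layout, the variable names `age` and `age_group`, and the function `apply_recipe` are all hypothetical and chosen only for illustration.

```python
# The recipe is plain data: it documents the processing and can be replayed.
recipe = [
    {
        "target": "age_group",
        "source": "age",
        # (low, high, label) ranges, both ends inclusive
        "recode": [(0, 17, "minor"), (18, 120, "adult")],
    },
]


def apply_recipe(records, recipe):
    """Re-create the derived data set by replaying the recorded recodes."""
    derived = []
    for record in records:
        row = dict(record)  # leave the source record untouched
        for step in recipe:
            value = record[step["source"]]
            for low, high, label in step["recode"]:
                if low <= value <= high:
                    row[step["target"]] = label
                    break
        derived.append(row)
    return derived


source_records = [{"age": 12}, {"age": 40}]
print(apply_recipe(source_records, recipe))
```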

DDI - Codebook

Nesstar – HANDS ON

Data Collection/Processing Data collection in the lifecycle Representing question text Questions and questionnaires Representing response domains Processing collected data

Production Evaluate current processes: What is done? Who does it? How is it done (existing software, processes)? Where do sections of DDI 3 fit into the process? Where does metadata first come into existence? What metadata can be reused? What sections of metadata can be “produced” directly from existing metadata? Benefits: time/cost savings; consistency.

Internal Consistency Standards within an organization or community Concept schemes Question schemes Coding schemes Interoperability between different proprietary software systems allows forward flexibility for software decisions allows specialized software for sub-processes

S11 Data Collection / Production End use is no longer the only focus. Major selling point of DDI 3 to production organizations is its ability to “inform and drive the process”. Metadata content is reused in DDI 3, so capturing it early is an advantage to the producer. Metadata captured early can drive the production process.

S11 Metadata-Driven Processing: An Example [Diagram labels: DDI 3 Question Bank, Survey Design Tool, CAI Tool, Survey Documentation, Generated from DDI 3. Interviewer: “What is your socio-economic status?” Respondent: “I’m very, very wealthy!”] This replaces older processes where surveys/CAI were created by hand, and documented after-the-fact.
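
To make the idea concrete, here is a small sketch in which one set of question metadata drives both the interview routing and the documentation. It does not generate Blaise, CASES, or CSPro code; the dictionary layout, the `ask_if` key, and the functions `route` and `document` are hypothetical.

```python
# Question metadata: text, response codes, and a simple flow condition.
questions = [
    {"id": "Q1", "text": "Are you currently employed?",
     "codes": {1: "Yes", 2: "No"}, "ask_if": None},
    {"id": "Q2", "text": "How many hours do you work per week?",
     "codes": None, "ask_if": ("Q1", 1)},
    {"id": "Q3", "text": "What is your socio-economic status?",
     "codes": {1: "Low", 2: "Middle", 3: "High"}, "ask_if": None},
]


def route(questions, answers):
    """Return the ids of the questions to ask, in order, for these answers."""
    asked = []
    for q in questions:
        condition = q["ask_if"]
        if condition is None or answers.get(condition[0]) == condition[1]:
            asked.append(q["id"])
    return asked


def document(questions):
    """Produce documentation lines from the same metadata."""
    lines = []
    for q in questions:
        line = f'{q["id"]}: {q["text"]}'
        if q["ask_if"]:
            source, value = q["ask_if"]
            line += f" [asked if {source} = {value}]"
        lines.append(line)
    return lines


print(route(questions, {"Q1": 2}))
print("\n".join(document(questions)))
```

Since the instrument and its documentation come from the same source, they cannot drift apart the way hand-written documentation can.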

Capturing and Reusing Metadata Whether captured at inception or created after-the-fact, some sections must be completed before other sections can be completed. The capture of metadata at point of inception in a non-proprietary structure that can be transferred out of and into process software provides incentive for metadata creation during the life cycle of the data.

S11 Metadata Flow DDI is built on the life cycle of the data, and some information naturally occurs earlier than other information. Reuse of and reference to certain types of information such as universe, concepts, categories, and coding schemes prescribe a creation order.

S11 [Diagram: metadata creation order in Steps 1 to 7, with some items optional. Labels: Universe Scheme, Concept, Organization / Individual, StudyUnit, Citation, Coverage, Category, Coding, Question, Variable, Processing Event (coding), Data Relationships, NCube, Record Structure, Remaining Logical Product items, Remaining Physical Data Product Items, Physical Instance, Archive / Group / etc., Instrument, Control Construct Scheme]
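
A creation order that follows from references can be computed with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The dependency list is an illustrative reading of the slide, not a normative statement of which DDI items reference which.

```python
from graphlib import TopologicalSorter

# Each key lists the items it references, which must therefore exist first.
# These dependencies are assumptions made for illustration.
depends_on = {
    "Universe": [],
    "Concept": [],
    "Category": [],
    "Coding": ["Category"],
    "Question": ["Universe", "Concept", "Coding"],
    "Variable": ["Universe", "Concept", "Coding", "Question"],
    "Record Structure": ["Variable"],
    "Physical Instance": ["Record Structure"],
}

# static_order() yields every item after all of the items it depends on.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```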

S11 Questions to Variables [Diagram labels: REGISTRY; DDI passed between tools] Question Development Software: identifying Universe and Concepts; building or importing Question Text and Response Domains. Instrument Development Software / CAI: organizing questions and flow logic; capturing raw response data and process data. Data Processing Software: data cleaning and verification; recoding and/or deriving new data elements using existing or new categories or coding schemes.

Geographic Structure Level: Code, Name, coverage limitation, description, and either a Parent or a Geographic Layer. Parent: reference to a single parent geography; this is used to describe single hierarchies. Geographic Layer: references multiple base levels where multiple hierarchies are layered to create a resulting polygon.

S10 STATE County

S10 COUNTY County Subdivision Census Tract Place

Hierarchies and Layers Levels: State (040), County (050), County Subdivision (060), Census Tract (140), Place (160). Layered example: portion of a Census Tract within a County Subdivision within a Place. Layer References: 140, 060, 160.

S10 Geographic Location Level description and/or a reference to the level description in the Geographic Structure Reference to the variable containing the identifier of the geographic location Description of a specific geographic location: Code Name Geographic time Bounding Polygon Excluding Polygon

Structure and Location STRUCTURE: Level: 040. Name: State. Description: U.S. State or state equivalent including Legal Territories and the District of Columbia. Parent: 010 [country]. LOCATION: Level Reference: 040. Variable Reference: STATEFP. Name: Minnesota. Code Value: 27. Geographic Time: Start: 1857, End: 9999. Bounding Polygon or Shape File Reference: for each boundary over time.
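
The structure/location split can be modelled roughly as follows. This is a sketch, not the DDI geography schema: the classes `Level` and `Location`, the function `ancestors`, and the level code "L01" for the layered level are hypothetical, while the numeric codes and the Minnesota values are taken from the slides. Parents for levels 060, 140, and 160 are left unset here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level:
    code: str
    name: str
    parent: Optional[str] = None  # single hierarchy: one parent level
    layers: tuple = ()            # OR: several base levels layered together


levels = {lv.code: lv for lv in [
    Level("010", "Country"),
    Level("040", "State", parent="010"),
    Level("050", "County", parent="040"),
    Level("060", "County Subdivision"),
    Level("140", "Census Tract"),
    Level("160", "Place"),
    # "L01" is a made-up code; the slide gives none for the layered level.
    Level("L01", "Census Tract part within County Subdivision within Place",
          layers=("140", "060", "160")),
]}


@dataclass(frozen=True)
class Location:
    level: str     # reference to a Level code
    variable: str  # variable holding the location identifier
    name: str
    code: str
    start: int     # geographic time
    end: int


def ancestors(code):
    """Walk the single-parent chain upward from a level."""
    chain = []
    parent = levels[code].parent
    while parent is not None:
        chain.append(parent)
        parent = levels[parent].parent
    return chain


minnesota = Location(level="040", variable="STATEFP", name="Minnesota",
                     code="27", start=1857, end=9999)
print(ancestors(minnesota.level))  # ['010']
print([levels[c].name for c in levels["L01"].layers])
```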

DDI Basics (continued) Study level information (continued) Data capture Questions, question flow Collection and processing events Variables Data dictionary contents Record relationships Physical storage Statistics From the bottom up Grouping and comparison

Comparison There are two types of comparison in DDI 3: comparison by design and ad-hoc (after-the-fact) comparison. Comparison by design can be expressed using the grouping and inheritance mechanism. Ad-hoc comparison can be described using the comparison module. The comparison module is also useful for describing harmonization when performing case selection activities.

S18 Data Comparison To compare data from different studies (or even waves of the same study) we use the metadata. The metadata explains which things are comparable in data sets. When we compare two variables, they are comparable if they have the same set of properties: they measure the same concept for the same high-level universe, and have the same representation (categories/codes, etc.). For example, two variables measuring “Age” are comparable if they have the same concept (e.g., age at last birthday) for the same top-level universe (e.g., people, as opposed to houses), and express their value using the same representation (e.g., an integer from 0-99). They may be comparable if the only difference is their representation (e.g., one uses 5-year age cohorts and the other uses integers), but this requires a mapping.
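
The comparability rule on this slide can be written as a small check. This is a sketch under assumptions: the `Variable` class, its string-valued fields, and the three result labels are hypothetical simplifications of what DDI actually records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    concept: str
    universe: str
    representation: str


def comparability(a: Variable, b: Variable) -> str:
    """Apply the slide's rule: same concept, universe, and representation."""
    if a.concept != b.concept or a.universe != b.universe:
        return "not comparable"
    if a.representation == b.representation:
        return "comparable"
    # Only the representation differs, so a mapping is required.
    return "comparable with mapping"


age_1 = Variable("AGE", "age at last birthday", "people", "integer 0-99")
age_2 = Variable("V12", "age at last birthday", "people", "integer 0-99")
age_3 = Variable("AGEGRP", "age at last birthday", "people", "5-year cohorts")

print(comparability(age_1, age_2))  # comparable
print(comparability(age_1, age_3))  # comparable with mapping
```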

DDI Support for Comparison For data which is completely the same, DDI provides a way of showing comparability: Grouping These things are comparable “by design” This typically includes longitudinal/repeat cross-sectional studies For data which may be comparable, DDI allows for a statement of what the comparable metadata items are: the Comparison module The Comparison module provides the mappings between similar items (“ad-hoc” comparison) Mappings are always context-dependent (e.g., they are sufficient for the purposes of particular research, and are only assertions about the equivalence of the metadata items)
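
An ad-hoc mapping of the kind the Comparison module describes might look like this for the age example. The dictionary layout, the variable names, and the function `to_cohort` are hypothetical; the point is only that a mapping pairs two items and states the context in which the equivalence is asserted.

```python
def to_cohort(age: int) -> str:
    """Map an integer age to a 5-year cohort label such as '35-39'."""
    low = (age // 5) * 5
    return f"{low}-{low + 4}"


# The mapping is an assertion that holds for a stated purpose only.
comparison_map = {
    "source": "AGE (integer 0-99)",
    "target": "AGEGRP (5-year cohorts)",
    "context": "sufficient for cohort-level analysis only",
    "mapping": to_cohort,
}

print(comparison_map["mapping"](37))  # 35-39
```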

S18 Comparability The comparability of a question or variable can be complex. You must look at all components. For example, with a question you need to look at: Question text Response domain structure Type of response domain Valid content, category, and coding schemes The following table looks at levels of comparability for a question with a coded response domain More than one comparability “map” may be needed to accurately describe comparability of a complex component

Detail of question comparability [Table: a Comparison Map for Question X rates the Textual Content of Main Body, the Category, and the Code Scheme as Same, Similar, or Different]

Tools and resources

S20 Tools/Projects DDI-L has only been an official standard since April 2008 Despite this, many tools are being developed Some useful tools already exist Some tools are available, others are projects which would be willing to share code (or partner) as the basis for further development The list may not be complete IASSIST has a DDI Tools panel every year – see online presentations There is an online tools database at the DDI Alliance site

Tools/Projects (cont.) Nesstar (developed by Norwegian Social Science Data Services) Commercial product supporting DDI 1.*/2.* (Editor is free.) Provides an editing interface, visualization/tabulation, and server-to-server data exchange. Nesstar editor is used by the IHSN Metadata Toolkit, which adds publishing functionality for HTML, PDF, and CD-ROMs. Useful for migration to DDI-L.

Tools/Projects (cont.) DDI Foundation Tools Program Joint initiative by several organizations to develop open-source tools for DDI-L. Includes DeXT (UKDA) and GESIS-developed tools for transformations to and from DDI 1.0 – 3.0 and statistical packages (SAS, SPSS, Stata). Provides a utilities package for Java development, including validation, XML beans, URN resolution. Now developing a suite of tools for editing DDI-L instances based on a common application framework (work is led out of the Danish Data Archive).

Tools/Projects (cont.) Canadian RDC Network Producing DDI-L-based tools for many DDI use cases Editing Migration from DDI-Codebook Registries Repositories Metadata mining All tools will be open-source when completed (over next 2 years) Some available now on request

Tools/Projects (cont.) Colectica (by Algenta): Commercial tool supporting survey instrument creation, and other editing functions of DDI. Has a repository component. Has Web and PDF publishing functionality. Supports DDI-C, DDI-L, Blaise, CASES, and CSPro files. DDI 3.1 is the native file format. CSPro: Is currently developing support for DDI-L. Already supports DDI-C. Free product.

Tools/Projects (Cont.) Space-Time Research: Has DDI-C and DDI-L support in their line of products (SuperCross, SuperWeb, etc.), for loading micro-data into their proprietary databases. Commercial tool providing point-and-click functionality for tabulation of microdata. Support for SDMX expression of tabulations. Uses SDMX RESTful Web services (sort of…). Questasy: Based on an online documentation tool for the LISS panel study at CentERdata. Willing to partner to productize the code base. Database-driven application using PHP and other easy Web development technologies.

Tools/Projects (Cont.) Exanda Online tabulation system based on DDI-L Intended to be released as open source, but no committed delivery date Uses freely available software components (Flex, Apache Cocoon, etc.) QDDS Documentation system for questionnaires developed by GESIS - Leibniz Institute for the Social Sciences Uses DDI-C, plans for supporting DDI-L in future Freely available, but not open source

Tools/Projects (Cont.) University of Tokyo Producing a multi-lingual DDI editor English-language interface not yet available (2012/13?) Will be open-source Stat Transfer Has implemented support for going to/from statistical packages to DDI 3.1

Tools/Projects (Cont.) Blaise Has support for exporting DDI-L descriptions of surveys Developed at University of Michigan (ISR - SRO) Various (GESIS, University of Kansas, etc.) Code for exporting DDI from statistical packages (SAS, SPSS) Generally available free if you know who to ask

DDI Resources DDI Alliance Site http://www.ddialliance.org General link to all resources/news. Link to SourceForge for standards distributions. Link to prototype page (good for examples). There is a DDI newsletter you can subscribe to. Tools/Resources Page http://tools.ddialliance.org Best place for tools, slides, and resources.

DDI Resources (cont.) Mailing Lists www.icpsr.umich.edu/mailman/admin/ All of the lists starting with “DDI” are related to DDI topics. General list. List for each sub-committee. Not all groups are active. User list is the best general place. Open Data Foundation Site www.opendatafoundation.org White papers, other resources/tools.

DDI Resources (cont.) DDI Agency Registry http://tools.ddialliance.org/?lvl1=community&lvl2=agencyid Sign up for unique global agency identifier – helps provide interoperability between organizations Currently deploying permanent registry International Household Survey Network http://surveynetwork.org DDI-C-based toolkit available for developing countries (some free tools) Catalog of surveys, many documented in DDI (NADA) – open source

Best Practices (available at DDI Alliance website) Implementation and Governance Work flows - Data Discovery and Dissemination: User Perspective Work flows - Archival Ingest and Metadata Enhancement Work flows for Metadata Creation Regarding Recoding, Aggregation and Other Data Processing Activities Controlled Vocabularies Creating a DDI Profile DDI 3.0 Schemes Versioning and Publication DDI as Content for Registries Management of DDI 3.0 Unique Identifiers DDI 3.0 URNs and Entity Resolution High-Level Architectural Model for DDI Applications

Use Cases (available at DDI Alliance website) Questasy: Documenting and Disseminating Longitudinal Data Online Using DDI 3 Building a Modular DDI 3 Editor Using DDI 3 for Comparison Extracting Metadata From the Data Analysis Workflow Questionnaire Management and DDI: The QDDS Case Grouping of Survey Series Using DDI 3 An Archive's Perspective on DDI 3

DDI Events IASSIST www.iassistdata.org Not an official DDI event, but many DDI-related presentations and meetings. DDI Alliance Expert Committee meets before or after the conference every year. 38th Meeting in Washington DC was hosted by NORC, June 2012. 39th Meeting in Köln, Germany, hosted by GESIS - Leibniz Institute for the Social Sciences. DDI Workshops are often given the day before the meeting. Annual meetings rotate: US, Canada, US, outside North America, and so on.

DDI Events (cont.) European DDI User’s Group: 3rd Meeting was last December in Gothenburg, Sweden. 4th Meeting will be in Bergen, Norway, December 2012, preceded by a DDI Implementers workshop. North American User Group now being formed. GESIS-Sponsored Autumn Events: Schloss Dagstuhl workshops. Open Data Foundation meetings: Spring meeting in Europe; Winter meeting in the US; DDI is a major topic of discussion.