Introduction to DDI 3.0 Sanda Ionescu ICPSR CESSDA Expert Seminar, September 2007.


DDI Version 3.0 Radically different. More complex… (…but certainly doable!) Brings important benefits.

Workshop Schedule
14:30 – 15:10  Overview (40)
15:10 – 15:35  Structure and Technical Mechanisms (25)
15:35 – 15:45  Break (10)
15:45 – 16:10  Study Unit – Modules Content (25)
16:10 – 16:30  Variable Markup Example (20)
16:30 – 16:40  Break (10)
16:40 – 17:10  Grouping – Modules Content and Examples (30)
17:10 – 17:30  Getting Started (20)

DDI 3.0 Overview

DDI Background Development History 1995 – A grant-funded project initiated and organized by ICPSR proposes to create a new standard for documenting social science data, to replace OSIRIS tagged codebooks. First drafts used SGML, then converted to Web-friendly XML – DDI Version 1.0 published as a mainly document- and codebook-centric standard.

DDI Background Development History 2003 – DDI Version 2.0 published with extended scope: –Aggregate data coverage (based on matrix structure) –Additional geographic representation to assist geographic search systems and GIS users Versions 1.0 through 2.1 (latest published) are backwards compatible, and based on the same structure.

DDI Background Development History February 2003 – Formation of the DDI Alliance, a self-sustaining membership organization whose members have a voice in the development of the DDI specification.

DDI Background Development History Version 3.0:
- Planning and Development
- November 2006: Internal Review
- February 2007: Public Review
- July 2007: Candidate Draft Release

Benefits of using DDI as an XML-based standard Interoperability: –Enables seamless exchange and reuse by other systems. Repurposing: –Provides a core document from which different types of outputs can be generated. Value-added documentation: –Tagging carries “intelligence” in the document by describing content. Enhanced Data Discovery: –Increases precision and granularity of searches. Support for Data Analysis: –Variables description is accepted as input by online analysis systems. Multiple presentation formats: –ASCII – text; PDF; HTML; RTF. Preservation-friendly: –Non-proprietary format.

Why DDI 3.0? DDI 3.0 presents new features in response to: Perceived needs of: -Data users -Data producers -Data archivists/librarians Developments in documenting and archiving data Advances in XML technology

DDI 3.0 and the Data Life Cycle Model DDI Versions 1/2 were codebook-centric: Closely followed the structure of traditional print codebooks. Captured data documentation at a single, “frozen” point in time – archiving.

DDI 3.0 and the Data Life Cycle Model Version 3.0 is Life Cycle oriented: -Designed to cover all stages in the life cycle of a data collection: pre-production, production, post-production, secondary use.

Life Cycle Coverage in DDI 3.0 Planning for the Study: Proposal / Design Study Purpose / Outline Concepts Study Population Author(s) Funding Sources Version 3.1 Survey / Sample Design Pre-testing

Life Cycle Coverage in DDI 3.0 Proposal becomes reality… Data Collection methodology: sampling, time, etc. Instrument characteristics Questionnaire Data cleaning, weighting, coding, etc.

Life Cycle Coverage in DDI 3.0 Publishing the data… Intellectual content: Variables, Categories, Codes. Physical representation: Data format, Record structure, Statistics.

Life Cycle Coverage in DDI 3.0 Archiving / (Re)Distributing the data collection… Processing checks Holdings, availability and access conditions

Life Cycle Coverage in DDI 3.0 DDI becomes “visible” to the outside world… DDI Instance: Pulls together all life cycle stages Acquires its own identity as an object Becomes a tool for data discovery and analysis

Life Cycle Coverage in DDI 3.0 Secondary use of data – new conceptual framework… New DDI Instance: New Purpose New Logical Product New Physical Description of Data

DDI 3.0 and the Data Life Cycle Model Advantages of Life Cycle orientation: Allows capture and preservation of metadata generated by different agents at different points in time. Facilitates tracking changes and updates in both data and documentation.

DDI 3.0 and the Data Life Cycle Model Advantages of Life Cycle orientation: Enables investigators, data collectors and producers to document their work directly in DDI, thus increasing the metadata’s visibility and usability. Benefits data users, who need information from the full data life cycle for optimal discovery, evaluation, interpretation, and re-use of data resources.

New / Extended Functionalities in DDI 3.0: Questionnaire Versions 1/2: -No instrument coverage. -Question text only as part of variable description. -No documentation for question flow / conditions. Version 3.0: -Full description of instrument as a separate entity. -Documents specific use of questions: flow, conditions, loops. -Compatible with Computer Assisted Interviewing software.

New / Extended Functionalities in DDI 3.0: Complex Data Versions 1/2: -Inadequate representation of complex / hierarchical data Version 3.0: -Detailed documentation for complex / hierarchical data Logical structure of records Record Types and Relationships Relevant variables: key-link, case identification, record type locator Physical layout of records Single “hierarchical” file for all records, multiple rectangular files, relational database, etc.

New / Extended Functionalities in DDI 3.0: Aggregate Data Versions 1/2: -Initially designed for microdata only -Aggregate data section added in V 2.1 to support limited representation (Census-type data, delimited files) Version 3.0: -Adds support for tabular, spreadsheet-type, representation of aggregate data -Aggregate data transport option: cell content may be included inline with the data item description

New / Extended Functionalities in DDI 3.0: Data Transport Versions 1/2: -None Version 3.0: -In-line inclusion enabled for both aggregate data and microdata

New / Extended Functionalities in DDI 3.0: Longitudinal / Time Series / Cross-national Data Comparability Versions 1/2: -None Version 3.0: -Grouping structure documents studies related on one or several dimensions (time, geography, language, etc.) as well as their comparability

New / Extended Functionalities in DDI 3.0: Increased Multilingual Support Versions 1/2: -Limited Version 3.0: -Support for multiple language use and translations. Example: the German label “Geburtsjahr” alongside its English translation “Year of Birth”.

DDI 3.0 Specification: Schema-based Versions 1/2: -DTD-based Version 3.0: -Schema-based: Data typing supports machine actionability Use of namespaces supports -Modularity -Extensibility and reuse -Alignment with / use of other standards

DDI 3.0 Specification: Machine-actionable Versions 1/2: -Machine-readable Version 3.0: -Machine-actionable: 1. Data typing: increased use of controlled vocabularies and standard codes 2. Larger set of required elements Predictable content = a more consistent base for programming

DDI 3.0: Modular Structure Version 1/2: -Single file, hierarchical design Version 3.0: -Modular design: - Facilitates reuse - Facilitates versioning and maintenance - Supports life cycle model - Allows flexibility in organizing the DDI Instance - Supports grouping and comparing studies - Supports creation of metadata registries

DDI 3.0: Alignment with other metadata standards Versions 1/2: -MARC, Dublin Core (bibliographic standards) Version 3.0: -MARC, DC, but also… -SDMX (Statistical Data and Metadata Exchange) -ISO (Metadata Registries) -FGDC (Digital Geospatial Metadata) - ISO (Geographic Information Metadata)

DDI 1/2 or DDI 3.0? DDI 3.0 will not supersede DDI 2.1. Both versions will –coexist –continue to be maintained –be used according to specific needs. Not all DDI 1/2 markup will have to be migrated to Version 3.0.

DDI 3.0 Structure and Mechanisms

DDI 3.0 – Modular Structure Building blocks of DDI 3.0: » Modules » Schemes

DDI 3.0 – Modular Structure Modules: Document different aspects of a study, or group of studies, following the data through their life cycle (Conceptual Components, Data Collection, Logical Product, Physical Instance, etc.) Schemes: Include collections of sibling “objects” that are traditionally components of a variable description: Concepts, Universes, Questions, Variable Labels and Names, Categories, Codes.

DDI 3.0 – Modular Structure Modules: Can live independently (have their own schemas) or connected to one another within a hierarchical structure. Schemes: Can live semi-independently (need a higher- level wrapper as they do not have their own schemas) or in-line within a Study Unit or Group module.

DDI 3.0 – Modular Structure DDI 3.0 model = a multi-branched hierarchy
Module level: [Diagram: DDI Instance at the top, containing Resource Package, Group, and Study Unit; Groups contain Subgroups and Study Units; a Study Unit contains Conceptual Components, Data Collection, and Archive (with Organizations).]

DDI 3.0 – Modular Structure DDI 3.0 model = a multi-branched hierarchy
Within modules: [Diagram: Data Collection contains Question Scheme (Question Item, Question Item), Processing (Weighting, Coding), and Methodology (Sampling, Time Method).]

DDI 3.0 – Modular Structure Relationships are established through:
- In-line inclusion (relational order is explicit)
- Referencing, internal or external (relational order is implicit)

DDI 3.0 – Structural mechanisms Enable modular design and help actualize its benefits. Inheritance Referencing Identification

DDI 3.0: Inheritance Inheritance is based on the hierarchical structure of the model. In DDI 3.0 a number of elements are reused at different levels of the hierarchy. When the same element is present at multiple levels, lower levels inherit content from the upper levels, and only need to specify differences (=local overrides).

DDI 3.0 Inheritance Example Instance: Coverage: Spatial: 50 US states -Study Unit A – no Spatial Coverage defined = will be inherited from Instance -Study Unit B – Coverage: Spatial: 48 coterminous states = supersedes definition in Instance
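The override rule in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not DDI tooling: the function name `effective_coverage` and the variable names are assumptions, while the coverage values come from the slide.

```python
from typing import Optional


def effective_coverage(inherited: str, local_override: Optional[str]) -> str:
    """Return the local value when one is declared, otherwise the inherited one."""
    return local_override if local_override is not None else inherited


# Values taken from the slide; the variable names are illustrative only.
instance_spatial = "50 US states"
study_unit_a = effective_coverage(instance_spatial, None)
study_unit_b = effective_coverage(instance_spatial, "48 coterminous states")

print(study_unit_a)
print(study_unit_b)
```

Study Unit A ends up with the Instance-level coverage, while Study Unit B keeps its own definition.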

DDI 3.0: Referencing DDI 3.0 modular structure is dependent upon creating relationships by reference. Referencing implies bringing up the content of a DDI object within, or in association with, another object, by specifying its Unique Identifier. Identifiers are the key links between DDI objects.

DDI 3.0: Referencing Example
Data Collection Module: Question Scheme:
- Question: ID: “Q1”; Text: “How many days in the past week did you watch the national network news on TV?”
Conceptual Components Module: Concept Scheme:
- Concept: ID: “C1”; Description: “Exposure to national TV news”
Logical Product Module: Variable Scheme:
- Variable: ID: “V1”; Name: V; Label: Days past week watch natl news on TV; Question Reference: ID: “Q1”; Concept Reference: ID: “C1”
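A minimal Python sketch of how such references might be resolved. The dictionaries are hypothetical stand-ins for the three modules, keyed by object ID. The IDs and texts come from the example, while `resolve_variable` and the field names are assumptions made for illustration.

```python
# Hypothetical in-memory stand-ins for three DDI modules, keyed by object ID.
question_scheme = {
    "Q1": "How many days in the past week did you watch the national network news on TV?"
}
concept_scheme = {"C1": "Exposure to national TV news"}
variable_scheme = {
    "V1": {
        "label": "Days past week watch natl news on TV",
        "question_ref": "Q1",
        "concept_ref": "C1",
    }
}


def resolve_variable(variable_id: str) -> dict:
    """Follow the variable's references and return its full description."""
    variable = variable_scheme[variable_id]
    return {
        "id": variable_id,
        "label": variable["label"],
        "question_text": question_scheme[variable["question_ref"]],
        "concept": concept_scheme[variable["concept_ref"]],
    }


resolved = resolve_variable("V1")
print(resolved["question_text"])
print(resolved["concept"])
```

The question text and concept description are stored once and pulled in by ID, so the variable itself carries only identifiers.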

DDI 3.0: Identification Consistency in building and using identifiers is needed for: –Proper functioning of reference systems, enabling a smooth exchange and reuse of existing metadata. –Machine-actionability of DDI instances, allowing them to serve as a basis for running programs and processes.

DDI 3.0: Identification Element types used in the Identification system: Maintainable Versionable Identifiable All elements

DDI 3.0: Identification Element Types Non-identified elements: –Require context, which is provided by containing parents. Example: codes within code schemes –Are not reusable. Example: variable and category statistics

DDI 3.0: Identification Element Types Identifiables –Carry their own ID –May be referenced / reused –Cannot be versioned or maintained, except as part of a complex parent element (Example: Variable – a change implies a new version of the entire scheme).

DDI 3.0: Identification Element Types Versionables –Carry their own ID –Carry their own Version: content changes are important to note (Example: Concept – may be independently versioned within a scheme).

DDI 3.0: Identification Element Types Maintainables –Are higher level DDI objects –Are both identifiable and versionable –Can also be published and maintained as separate entities (Example: all modules, schemes, comparison maps)

DDI 3.0: Identification Structure Maintainable elements: –URN and / or ID + Identifying Agency + Versioning Information: Version Version Date Version Responsibility Version Rationale Versionable elements: –URN and / or ID + Versioning Information Identifiable elements: –URN and / or ID

DDI 3.0: Identification Structure
Non-specified Identification information is inherited from the levels above.
Example 1: Inheritance is assumed
- Maintainable: Variable Scheme: ID: VarScheme_A; Identifying Agency: ICPSR; Version: 1.0
- Identifiable: Variable: ID: V1; [Identifying Agency]; [Version]
Example 2: Inheritance is applied by default
- Maintainable: Logical Product: ID: LogicalProd_Y; Identifying Agency: ICPSR; Version: 1.0
- Maintainable: Variable Scheme: ID: VarScheme_A; Identifying Agency: [ ]; Version: [ ]
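One way to picture this inheritance is to walk the chain of containing objects from the outside in. The Python sketch below does that under stated assumptions: objects are plain dictionaries, and `resolve_identification` is a hypothetical helper, not part of any DDI library. The IDs, agency, and version come from the two examples.

```python
from typing import Optional


def resolve_identification(chain: list) -> dict:
    """Walk from the outermost object to the innermost, carrying down
    any agency or version that a lower level leaves unspecified."""
    agency: Optional[str] = None
    version: Optional[str] = None
    for obj in chain:
        agency = obj.get("agency") or agency
        version = obj.get("version") or version
    return {"id": chain[-1]["id"], "agency": agency, "version": version}


logical_product = {"id": "LogicalProd_Y", "agency": "ICPSR", "version": "1.0"}
variable_scheme = {"id": "VarScheme_A"}  # agency and version left blank
variable = {"id": "V1"}                  # identifiable: carries only an ID

resolved = resolve_identification([logical_product, variable_scheme, variable])
print(resolved)
```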

DDI 3.0: Identification Structure: IDs Uniqueness of Identifiers is necessary for both internal and external referencing: 1) All IDs MUST be unique within a maintainable 2) All maintainables MUST have unique IDs across an Agency
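Both uniqueness rules reduce to the same check at different scopes. A small Python sketch follows; the ID lists are hypothetical sample data and `duplicate_ids` is an illustrative helper.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return the IDs that occur more than once, in sorted order."""
    return sorted(name for name, count in Counter(ids).items() if count > 1)


# Rule 1: IDs inside one maintainable (hypothetical variable scheme).
ids_in_scheme = ["V1", "V2", "V3", "V2"]
# Rule 2: maintainable IDs published by one agency (hypothetical list).
maintainables_at_agency = ["Instance_A", "Instance_B", "VarScheme_A"]

print(duplicate_ids(ids_in_scheme))
print(duplicate_ids(maintainables_at_agency))
```

An empty result means the rule holds for that scope. Any ID returned would need to be renamed before referencing can work reliably.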

DDI 3.0: Identification Structure: Creating unique Identifiers A DDI Instance may include multiple maintainables at different hierarchical levels: Instance (maintainable) – unique ID within Identifying Agency Study Unit (maintainable) – unique ID within Identifying Agency Logical Product (maintainable) – unique ID within Identifying Agency Variable Scheme (maintainable) – unique ID within Identifying Agency

DDI 3.0: Identification Structure: Creating Unique Identifiers
Markup:
- Instance_A (unique at ICPSR): StudyUnit_1, Logical Product_1, VariableScheme_1, Variable_1
- Instance_B (unique at ICPSR): StudyUnit_1, Logical Product_1, VariableScheme_1, Variable_1
Post-markup: Variable ID:
- Instance_AStudyUnit_1LogicalProduct_1VariableScheme_1Variable_1
- Instance_BStudyUnit_1LogicalProduct_1VariableScheme_1Variable_1
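The post-markup identifiers in this example look like a plain concatenation of the IDs along the containment path. The Python sketch below reproduces that under this assumption; `post_markup_id` is a hypothetical name, and whether production tools build IDs this way is not stated in the slides.

```python
def post_markup_id(path):
    """Join the IDs of the containing objects and the element itself.
    Spaces are dropped so that "Logical Product_1" becomes "LogicalProduct_1"."""
    return "".join(part.replace(" ", "") for part in path)


path_a = ["Instance_A", "StudyUnit_1", "Logical Product_1", "VariableScheme_1", "Variable_1"]
path_b = ["Instance_B"] + path_a[1:]

id_a = post_markup_id(path_a)
id_b = post_markup_id(path_b)
print(id_a)
print(id_b)
```

The two variables share every local ID, yet the resulting identifiers differ because the Instance IDs are unique within the agency.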

DDI 3.0: Identification Structure: URNs Have a fixed structure and MUST include object ID, Identifying Agency, and Version. For versionable and identifiable elements, the containing maintainable is specified. Take precedence when both a URN and the Identification sequence are used for the same object. May be constructed post-markup from the Identification sequence.

DDI 3.0: Identification: URN Structure Examples:
Maintainables: urn:ddi:3.0:StudyUnit:ddialliance.org:StudyUnit_ID:1.0
Versionables: urn:ddi:3.0:ConceptScheme:ddialliance.org:ConceptScheme_ID:1.0:Concept:Concept_ID:2.1
Identifiables: urn:ddi:3.0:VariableScheme:ddialliance.org:VariableScheme_ID:1.0:Variable:Variable_ID
URN components: Object name, Identifying Agency, Object ID, Object Version
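The three URN patterns can be assembled mechanically from the Identification sequence. The Python sketch below follows the pattern visible in the examples only. It is not a validated implementation of the DDI URN grammar, and the function names are assumptions.

```python
from typing import Optional


def maintainable_urn(object_name: str, agency: str, object_id: str,
                     version: str, ddi_version: str = "3.0") -> str:
    """urn:ddi:<ddi version>:<object name>:<agency>:<object ID>:<version>"""
    return ":".join(["urn", "ddi", ddi_version, object_name, agency, object_id, version])


def contained_urn(parent_urn: str, object_name: str, object_id: str,
                  version: Optional[str] = None) -> str:
    """Append a versionable (with version) or identifiable (without) element
    to the URN of its containing maintainable."""
    parts = [parent_urn, object_name, object_id]
    if version is not None:
        parts.append(version)
    return ":".join(parts)


study_unit = maintainable_urn("StudyUnit", "ddialliance.org", "StudyUnit_ID", "1.0")
concept_scheme = maintainable_urn("ConceptScheme", "ddialliance.org", "ConceptScheme_ID", "1.0")
concept = contained_urn(concept_scheme, "Concept", "Concept_ID", "2.1")
variable_scheme = maintainable_urn("VariableScheme", "ddialliance.org", "VariableScheme_ID", "1.0")
variable = contained_urn(variable_scheme, "Variable", "Variable_ID")

print(study_unit)
print(concept)
print(variable)
```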

DDI 3.0: Referencing Reference structure: URN, and/or: [Referenced object’s] ID + Identifying Agency + Version + [Containing] Module ID + [Containing] Scheme ID

DDI 3.0: Reuse of Information Referencing Mechanisms for REUSE Inheritance Reuse of Information: 1.Facilitates development of documentation throughout the study life cycle 2.Promotes interoperability and standardization across organizations 3.Saves markup time and effort 4.Reduces the risk of human entry error 5.Provides a basic level of implicit comparability

DDI 3.0 Modules Content, Markup Examples

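DDI Version 3.0 Modules -- Structural Overview --
[Diagram: DDI Instance contains Study Unit, Group, and Resource Package; a Group contains Study Units and Subgroups; a Subgroup contains Study Units and further (Sub)Groups; a Study Unit contains Concepts, Data Coll., Logical Pr., etc…]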

Other “specialized” DDI 3.0 modules Aggregate Data: –NCube Logical Product –Inline NCube Record Layout –NCube Record Layout –Tabular NCube Record Layout Inline Microdata: –Dataset User-specific Markup Templates: –DDI Profile

DDI Version 3.0 Modules -- Structural Overview --
[Diagram: DDI Instance contains Study Unit and Group.
Study Unit: Conceptual Component, Data Collection, Logical Product, Physical Data Product, Physical Instance, Archive (Organizations).
Group: Conceptual Component, Data Collection, Logical Product, Archive, Study Unit, Group, Comparative.]

DDI 3.0 Modules used to mark up a simple study

DDI 3.0 modules for documenting a single, survey-type study
[Reusable] [XHTML]
Instance
- Study Unit
  - Conceptual Component
  - Data Collection
  - Logical Product
  - Physical Data Product
  - Physical Instance
  - Archive
    - Organizations

DDI Instance -- wrapper for all modules -- Identification –URN –Identification Sequence –Name Citation … (+ optional DC Elements) Coverage –Topical –Spatial –Temporal Group (module) – repeatable Resource Package (module) - repeatable Study Unit (module) - repeatable Other Material(s) Note(s) Translation Information

Coverage in DDI 3.0
Study: American National Election Study (ANES), 2004
Topical Coverage:
- Subject: Historical and Contemporary Electoral Processes
- Keyword: Electoral campaigns; Political attitudes; Political participation
Spatial Coverage:
- Description: United States
- Top level: nation
- Lowest level: congressional district
Temporal Coverage:
- Date: 2004

Study Unit -- documents a single “study” -- Identification, Other Material(s), Note(s) Citation Abstract Universe Reference Funding Information Purpose Coverage Analysis Unit Embargo Conceptual Component (module) Data Collection (module) Logical Product (module) Physical Data Product (module) Physical Instance (module) Archive (module) –Organizations (module)

Conceptual Component -- lists concepts and universes -- Identification, Other Material(s), Notes Coverage Concept Scheme… or Reference to External Scheme –Vocabulary – describes vocabulary used –Concept Label Description Similar Concept –Difference –Concept Group Concept Reference (nestable) Universe Scheme … or Reference to External Scheme –Universe Human Readable Machine Readable Subuniverse –Subuniverse

Data Collection
Identification, Other Material(s), Note(s)
Coverage
Methodology
- Time Method
- Sampling
Collection Event
- Data Collector
- Data Source
- Collection Date(s)
- Mode of data collection
Question Scheme – lists actual questions
Instrument – documents question flow, conditions
Processing Event
- Control and cleaning operations
- Weighting
- Data Appraisal Information
- Coding

Logical Product -- documents intellectual content of data -- Identification, Other Material(s), Note(s) Coverage Category Scheme … or Reference to external category scheme –Category Label Derivation (if applicable) Definition Code Scheme … or Reference to external code scheme –Category Scheme Reference –Hierarchy Type –Level (in the hierarchy) –Code Category Reference Value Code (nestable) Variable Scheme … or Reference to external variable scheme
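The split between Category Scheme (labels) and Code Scheme (values that reference categories) can be pictured with two lookup tables. In the Python sketch below, all IDs, labels, and the `describe` helper are hypothetical. The labels assume that DK and RF stand for "Don't know" and "Refused".

```python
# Hypothetical category scheme: category ID -> label and missing-data flag.
categories = {
    "CAT_7": {"label": "Seven days", "missing": False},
    "CAT_DK": {"label": "Don't know", "missing": True},
    "CAT_RF": {"label": "Refused", "missing": True},
}

# Hypothetical code scheme: stored value -> category reference.
codes = {7: "CAT_7", 8: "CAT_DK", 9: "CAT_RF"}


def describe(value: int):
    """Look up a stored value's label and missing flag via its category reference."""
    category = categories[codes[value]]
    return category["label"], category["missing"]


print(describe(7))
print(describe(8))
```

Because codes only point at categories, the same category labels could be reused by another code scheme that assigns different values.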

Logical Product Variable Scheme: Variable Variable … or Reference to an externally documented variable –Identification Name –Label –Definition –Universe Reference –Concept Reference –Question Reference –Embargo Reference –Response Unit –Analysis Unit –Representation Imputation Derivation Coding Instructions Value Representation: »Text »Date / Time »Numeric »Code

Logical Product Variable Scheme: Variable Group Variable Group: –Type –Label –Definition –Universe Reference –Concept Reference –Variable Reference (lists variables in the group) –Variable Group Reference (allows nesting of groups) Variable Group Reference (use for externally documented Variable Group)

Physical Data Product -- Describes Physical Layout of Data -- Identification, Other Material(s), Note(s) Logical Product Reference Gross Record Structure: –Records Per Case –Variable Quantity –Logical Record Reference –Physical Record Reference Related Logical Records Record Layout: –Data Item –Variable Reference –Physical Location –Value Location »StartPosition »Width Dataset (module)
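A StartPosition and Width pair is enough to cut one data item out of a fixed-width record. The Python sketch below assumes 1-based start positions; the layout, variable names, and record are invented for illustration.

```python
def extract_value(record: str, start_position: int, width: int) -> str:
    """Slice one data item out of a fixed-width record.
    Assumes StartPosition is 1-based; adjust if the layout counts from 0."""
    begin = start_position - 1
    return record[begin:begin + width]


# Hypothetical record layout: variable name -> (StartPosition, Width).
layout = {"CASEID": (1, 4), "DAYS_TV": (5, 1), "YEAR": (6, 4)}
record = "000152004"

values = {
    name: extract_value(record, start, width)
    for name, (start, width) in layout.items()
}
print(values)
```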

Physical Instance -- Documents a specific data file --- Identification, Other Material(s), Note(s) Citation Coverage Physical Data Product Reference Data File Identification –Location –URI Gross File Structure –Creation Software –Case Quantity –Overall Record Count Statistics –Logical Product Reference –Variable Statistics Variable Reference Total Responses Summary Statistics Category Statistics »Value »Statistic
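The Variable Statistics block pairs each category value with a statistic such as a frequency. The Python sketch below shows how such figures might be derived from a data file before being written into the Physical Instance. The responses are made-up sample data, and treating codes 8 and 9 as missing is an assumption based on the DK/RF codes used in the survey variable example.

```python
from collections import Counter

# Hypothetical responses for one variable (codes 0-7 = days, 8 = DK, 9 = RF).
responses = [0, 3, 7, 7, 8, 2, 7, 9, 3]
missing_codes = {8, 9}

category_statistics = Counter(responses)  # value -> frequency
total_responses = len(responses)
valid_responses = sum(1 for r in responses if r not in missing_codes)

for value in sorted(category_statistics):
    print(value, category_statistics[value])
print(total_responses, valid_responses)
```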

Archive Identification, Other Material(s), Note(s) Archive Specific –Item Location Call Number URI Format Media Availability Status –Access Confidentiality Statement Access Permission Restrictions Citation Requirement Deposit Requirement Access Conditions Disclaimer Contact –Funding Information Life Cycle Information –Event Type Date Agency Description Organizations (module)

Organizations Identification Organization –URL –Individual Individual –Organization –Title –Language Role –Entity Reference –Organization Reference –Individual Reference –Description –Period Relation –Organization Reference –Individual Reference –Description –Period Name Description Location Telephone Relation

DDI 3.0 Markup Example A Survey Variable

Version 2.1 vs. Version 3.0 Example: A survey variable ASCII codebook:

Version 2.1 vs. Version 3.0 Example: A survey variable in Version 2.1 Data Description: Variable

Version 2.1 vs. Version 3.0 Example: A survey variable in Version 2.1 name=“V043015”

Version 2.1 vs. Version 3.0 Example: A survey variable in Version 3.0 Logical Product: Variable Scheme Data Collection: Question Scheme Logical Product: Code Scheme Logical Product: Category Scheme Conceptual Component: Concept Scheme Universe Scheme Physical Instance: Statistics

Version 2.1 vs. Version 3.0 Example: A survey variable in Version 3.0 Logical Product Variable Scheme: ID Variable: ID Data Collection: Question Scheme: ID Question: ID Logical Product: Code Scheme: ID Code Logical Product: Category Scheme: ID Category: ID Physical Instance: Statistics: Variable Statistic Category Statistics Conceptual Component Concept Scheme: Concept: ID Universe Scheme: (Sub)Universe: ID

DDI 3.0 Markup: A Survey Variable Concept Concept: Attention to Presidential Campaign on National TV Conceptual Component: Concept Scheme: Concept

DDI 3.0 Markup: A Survey Variable Concept

DDI 3.0 Markup: A Survey Variable Universe Conceptual Component: Universe Scheme: (Sub)Universe (A7: How many days in the PAST WEEK did you watch the NATIONAL network news on TV? 0-7; 8=DK; 9=RF)

DDI 3.0 Markup: A Survey Variable Universe

DDI 3.0 Markup: A Survey Variable Question ID, Question Text Data Collection: Question Scheme: Question Item

DDI 3.0 Markup: A Survey Variable Question ID, Question Text Other Response Domains:

DDI 3.0 Markup: A Survey Variable Variable name, label, type of physical representation Logical Product: Variable Scheme: Variable

DDI 3.0 Markup: A Survey Variable Variable name, label, type of physical representation Other types of Representation:

DDI 3.0 Markup: A Survey Variable Category labels, missing data information Logical Product: Category Scheme: Category

DDI 3.0 Markup: A Survey Variable Category labels, missing data information missing=“true”

DDI 3.0 Markup: A Survey Variable Category Values Logical Product: Code Scheme: Code

DDI 3.0 Markup: A Survey Variable Category Values

DDI 3.0 Markup: A Survey Variable Statistics Physical Instance: Statistics Variable Statistics: Category Statistic

DDI 3.0 Markup: A Survey Variable Statistics

DDI 3.0 Markup: A Survey Variable Logical Product Module

DDI 3.0 Markup: Modules used in a full variable description
Conceptual Component: Concept, Universe
Data Collection: Question
Logical Product: Values, Value Labels, Variable name, Variable label
Physical Instance: Statistics
Physical Data Product: Location
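
A full variable description is therefore assembled by following references from the variable into several modules. The sketch below illustrates that idea in Python rather than in DDI XML. The dictionaries are toy stand-ins for the modules, and every ID and placeholder string is invented for illustration; only the concept text and the variable name V043015 come from the example above.

```python
# Toy stand-ins for DDI 3.0 modules. All IDs (C1, U1, Q1, ...) are invented.
modules = {
    "ConceptualComponent": {
        "C1": "Attention to Presidential Campaign on National TV",
        "U1": "(sub-universe text goes here)",
    },
    "DataCollection": {"Q1": "(question text goes here)"},
    "LogicalProduct": {
        "CatS1": {"CAT_DK": "Don't know"},  # value labels live in a category scheme
        "CodeS1": {8: "CAT_DK"},            # values point at categories by ID
    },
    "PhysicalInstance": {"Stat1": {8: 12}},  # made-up frequency for code 8
}

# The variable itself stores only its name and (module, id) references.
variable = {
    "name": "V043015",
    "refs": {
        "concept": ("ConceptualComponent", "C1"),
        "universe": ("ConceptualComponent", "U1"),
        "question": ("DataCollection", "Q1"),
        "codes": ("LogicalProduct", "CodeS1"),
        "categories": ("LogicalProduct", "CatS1"),
        "statistics": ("PhysicalInstance", "Stat1"),
    },
}


def describe(variable, modules):
    """Follow each (module, id) reference and return a flat description."""
    resolved = {"name": variable["name"]}
    for field, (module, item_id) in variable["refs"].items():
        resolved[field] = modules[module][item_id]
    return resolved


description = describe(variable, modules)
print(description["concept"])
```

Nothing is copied into the variable: changing the concept text in the Conceptual Component stand-in changes what every referencing variable reports.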

DDI 3.0 Modular Approach: Advantages
Modules and schemes can be independently maintained.
Pieces of information can be reused without being repeated.
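
Reuse without repetition can be sketched as two variables that point at one shared scheme instead of each carrying its own copy. This is an illustration in Python, not DDI markup. The scheme ID and both variable names are hypothetical, and for brevity the sketch folds category labels and code values into a single `Code` record, whereas DDI 3.0 keeps them in separate schemes. The values follow the 0-7, 8=DK, 9=RF pattern shown for question A7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    value: int
    label: str
    missing: bool = False


# One shared scheme, stored once. "CS_DAYS" is an invented ID.
CODE_SCHEMES = {
    "CS_DAYS": tuple(Code(n, str(n)) for n in range(8))
    + (Code(8, "DK", missing=True), Code(9, "RF", missing=True)),
}


@dataclass(frozen=True)
class Variable:
    name: str
    code_scheme_ref: str  # a reference, not a copy of the codes

    def codes(self):
        return CODE_SCHEMES[self.code_scheme_ref]


# Hypothetical variables that share the same response codes.
national = Variable("NEWS_NATIONAL", "CS_DAYS")
local = Variable("NEWS_LOCAL", "CS_DAYS")

print(national.codes() is local.codes())
```

Both variables resolve to the same stored object, so a correction to a label is made in one place only.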

DDI 3.0 Modular Approach: Reusing information

Variable Markup in Version 2 -- carries redundant information --

Variable Markup in Version 3.0 Modular Approach: Reusing Information

DDI 3.0 Grouping

DDI 3.0: Groups Entirely new feature in DDI 3.0. Designed to document and compare related studies.

Group -- documents “families” of studies --
Identification, Other Material(s), Note(s)
Citation, Abstract, Universe, Funding Information, Purpose, Coverage, Universe Reference
Conceptual Component (module)
Data Collection (module)
Logical Product (module)
Archive (module)
–Organizations (module)
Study Unit (module)
Group (module)
Comparative (module)

DDI 3.0 Grouping Attributes
A set of mandatory attributes indicates the nature of the relationships among group members.
Group parameters:
–Time
–Instrument
–Panel (population of respondents)
–Geography
–Datasets
–Language

DDI 3.0 Grouping Attributes Example

DDI 3.0: Types of Groups
Groups of studies may be:
–Formal (“by design”): Designed to be compared (longitudinal, time-series, or cross-national studies). Documented and compared through the use of Inheritance.
–Informal (“ad-hoc”): The decision to group and compare is taken post-production, or “after the fact”. Comparability is documented in the Comparative module.

Formal Groups: Inheritance Example 1: Time-series: Same questions repeated over time, same resulting variables.
Group (Studies A-C)
–Temporal Coverage (G_1)
–Data Collection: Question Scheme
–Logical Product: Variable Scheme
Study A
–Temporal Coverage: 1991 (Replace Ref: G_1)
–Physical Data Product
–Physical Instance: Statistics
Study B
–Temporal Coverage: 1992 (Replace Ref: G_1)
–Physical Data Product
–Physical Instance: Statistics
Study C
–Temporal Coverage: 1993 (Replace Ref: G_1)
–Physical Data Product
–Physical Instance

Formal Groups: Inheritance Attributes “Add”, “Replace”, “Delete”.
In a complex grouping structure inheritance paths may become quite intricate.
ID attributes ADD, REPLACE and DELETE are introduced to resolve potential inheritance ambiguities:
–ADD = [empty] -> flags element as a new addition.
–REPLACE = “ReferenceType” -> referenced element is being replaced at the lower level (“local override”).
–DELETE = “ReferenceType” -> referenced element is being deleted at the lower level.
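
The effect of ADD, REPLACE and DELETE on what a group member ends up with can be sketched as a small resolver. This is a simplified Python model, not the DDI 3.0 processing rules: the function name and the tuple format are hypothetical, and the sketch keys a replacement by the ID of the element it replaces, whereas in DDI the replacing element carries its own ID plus a reference. The group-level coverage value "1991-1993" is assumed from the three study years in Example 1.

```python
def apply_inheritance(inherited, local_ops):
    """Return the effective metadata of a group member.

    inherited: dict mapping element ID -> content received from the parent.
    local_ops: list of (action, element_id, content) tuples declared locally.
    """
    effective = dict(inherited)  # work on a copy; the parent stays untouched
    for action, element_id, content in local_ops:
        if action == "add":
            if element_id in effective:
                raise ValueError(f"{element_id} is already inherited; use replace")
            effective[element_id] = content
        elif action == "replace":
            if element_id not in effective:
                raise KeyError(f"nothing to replace for {element_id}")
            effective[element_id] = content
        elif action == "delete":
            if element_id not in effective:
                raise KeyError(f"nothing to delete for {element_id}")
            del effective[element_id]
        else:
            raise ValueError(f"unknown action: {action}")
    return effective


# Group level, as in Example 1. The value for G_1 is an assumption.
group = {
    "G_1": "Temporal Coverage: 1991-1993",
    "QS": "Question Scheme",
    "VS": "Variable Scheme",
}

# Study A overrides only the temporal coverage and inherits the rest.
study_a = apply_inheritance(group, [("replace", "G_1", "Temporal Coverage: 1991")])
print(study_a)
```

Rejecting an ADD for an ID that is already inherited, or a REPLACE/DELETE for one that is not, is one way to surface the ambiguities the attributes are meant to prevent.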

Formal Groups: Inheritance Example 2: Time-series: Same core questions repeated over time, different topical modules added to each iteration.
Group (Studies A-C)
–Data Collection: Core Questions (Q1-Q50)
–Logical Product: Core Variables (V1-V50)
Study A: Topical Module “Health Status”
–Data Collection: ADD: Questions (Q51A-Q80A)
–Logical Product: ADD: Variables (V51A-V80A)
Study B: Topical Module “Gun Control”
–Data Collection: ADD: Questions (Q51B-Q80B)
–Logical Product: ADD: Variables (V51B-V80B)
etc…

Formal Groups: Inheritance Example 3: Any group by design: some questions are not asked in some iterations.
Group (Studies A-E)
–Data Collection: All Questions (Q1-Q100)
–Logical Product: All Variables (V1-V100)
Study A
Study B
–Data Collection: DELETE: Question Q55
–Logical Product: DELETE: Variable V55
Group (Studies C-E)
–Data Collection: DELETE: Questions Q60-Q69
–Logical Product: DELETE: Variables V60-V69
–Study C, Study D, Study E
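
Examples 2 and 3 can be worked through with plain sets of question IDs. The sketch below is a simplified Python model in which a child's effective set is the parent's set, minus its deletions, plus its additions. The helper names `ids` and `inherit` are hypothetical.

```python
def ids(prefix, first, last, suffix=""):
    """Build a set of IDs such as {'Q1', ..., 'Q50'} or {'Q51A', ..., 'Q80A'}."""
    return {f"{prefix}{n}{suffix}" for n in range(first, last + 1)}


def inherit(parent, add=frozenset(), delete=frozenset()):
    """Effective ID set of a child: parent's items, minus deletions, plus additions."""
    not_inherited = set(delete) - set(parent)
    if not_inherited:
        raise KeyError(f"cannot delete items that were not inherited: {sorted(not_inherited)}")
    return (set(parent) - set(delete)) | set(add)


# Example 2: core questions plus one topical module per study.
core = ids("Q", 1, 50)
study_a = inherit(core, add=ids("Q", 51, 80, "A"))  # "Health Status"
study_b = inherit(core, add=ids("Q", 51, 80, "B"))  # "Gun Control"

# Example 3: deletions at study level and at subgroup level.
all_questions = ids("Q", 1, 100)
study_b_ex3 = inherit(all_questions, delete={"Q55"})
subgroup_c_e = inherit(all_questions, delete=ids("Q", 60, 69))
study_c = inherit(subgroup_c_e)  # Study C takes the subgroup's view unchanged

print(len(study_a), len(study_b_ex3), len(study_c))
```

Study C still contains Q55, because that deletion was declared only for Study B, while Q60-Q69 are absent for every study under the C-E subgroup.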

Formal Groups: Inheritance Example 4 (SOEP, Germany): Longitudinal: Same variables, with a different name each year. (No name) ADD: Name only

Formal Groups: Inheritance Example 5 (SOEP, Germany): Longitudinal: In 2002 the variable “Income” changes currency from DM to Euro: a change in question wording. (No question) ADD: Question only

Formal Groups: Inheritance Example 5 (SOEP, Germany) continued: These variables also change names every year…

Formal Groups: Inheritance Example 5 (SOEP, Germany) – the final picture: information is inherited down the hierarchy.

Inheritance in Formal Groups
Simplification of DDI Instances: common metadata is only entered once.
More efficient means of documentation: for new additions, only differences need to be specified.
Relational information embedded in the inheritance structure: comparison becomes machine-actionable.

Comparative -- documents comparability in ad-hoc groups --
Identification, Note(s)
Comparison Description (human-readable)
Concept Map
–Source Scheme Reference
–Target Scheme Reference
–Item Map: Source Item, Target Item, Map Type, Difference
Variable Map
Question Map
Category Map
Code Map
Universe Map
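
The shape of a comparison map (two scheme references plus a list of item-level pairings) can be sketched as a small data structure. This is a Python illustration of the outline above, not the DDI 3.0 schema: the class names, the scheme and item IDs, and the map type values "similar" and "identical" are all invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemMap:
    source_item: str
    target_item: str
    map_type: str         # illustrative vocabulary, not taken from DDI
    difference: str = ""  # free-text note on how the items differ


@dataclass
class SchemeMap:
    source_scheme: str
    target_scheme: str
    item_maps: list = field(default_factory=list)

    def targets_for(self, source_item):
        """Return every item map whose source is the given item."""
        return [m for m in self.item_maps if m.source_item == source_item]


# A hypothetical variable map between two studies grouped after the fact.
variable_map = SchemeMap(
    source_scheme="VarScheme_StudyX",
    target_scheme="VarScheme_StudyY",
    item_maps=[
        ItemMap("V10", "INC02", "similar", "currency differs"),
        ItemMap("V11", "AGE", "identical"),
    ],
)

for m in variable_map.targets_for("V10"):
    print(m.target_item, m.map_type, m.difference)
```

The same structure would serve for concept, question, category, code and universe maps, since the outline gives them the same role.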

DDI 3.0 Using the Comparative Module Instructions on how to use the Comparative Module and build comparison maps: “DDI 3.0 User Guide”, pp

Producing DDI 3.0 markup Getting started

DDI 3.0: Tools projects
DDI Toolkit: Core library for developing open source tools
Version 1/2 to Version 3.0 converters
DDI 3.0 URN resolution tool
DDI 3.0 validation tool
Version 3.0 stylesheets with display and editing layers
Grouping tool
Concept management tool
Registry applications

Producing DDI 3.0 markup -- Getting started --
Software to assist in document creation:
DeXtris:
–XML browser
–Converts DDI 1/2 to DDI 3.0

DDI 3.0 Tools: Using Dextris

Producing DDI 3.0 markup -- Getting started -- Software to assist in document creation: SPSS system to DDI 3.0 converter: (See description and link on DDI 3.0 Proof of Concept page)

Producing DDI 3.0 markup -- Getting started --
XML editors
oXygen:
–Create new DDI instance
–Edit/update DDI instance
–Validate DDI instance
–View schemas

DDI 3.0: Viewing Schemas in oXygen

Producing DDI 3.0 markup -- Getting started --
Other tools to assist in producing DDI 3.0 markup:
DDI “core” template
Version 3.0 documentation:
–Module descriptions
–Field level documentation
–DDI Help Center

Producing DDI 3.0 markup -- Using multiple modules -- Resource: “Getting Started with DDI 3.0”

DDI Version 3.0 Displaying Markup
Stylesheets:
Basic:
–Web presentation in XHTML
Enhanced:
–Adds graphics for presenting frequencies
–Automated calculation of valid percentages
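
A valid percentage is a category's share of the non-missing responses. The sketch below shows that calculation in Python under stated assumptions. It is not the stylesheet's own code, which is not shown in these slides; the function name and the frequency counts are made up. The missing codes 8 and 9 follow the 8=DK, 9=RF convention from the A7 example.

```python
def valid_percentages(frequencies, missing_codes):
    """Percentage of each non-missing code among all non-missing responses.

    frequencies: dict mapping code value -> count.
    missing_codes: set of code values flagged as missing data.
    """
    valid = {code: n for code, n in frequencies.items() if code not in missing_codes}
    total = sum(valid.values())
    if total == 0:
        # No valid responses at all: report zeros rather than dividing by zero.
        return {code: 0.0 for code in valid}
    return {code: 100.0 * n / total for code, n in valid.items()}


# Made-up counts: 40 valid responses plus 10 missing (DK and RF).
freqs = {0: 10, 1: 30, 8: 5, 9: 5}
result = valid_percentages(freqs, missing_codes={8, 9})
print(result)  # {0: 25.0, 1: 75.0}
```

The missing categories are dropped from both the numerator and the denominator, which is what distinguishes a valid percentage from a raw percentage.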

DDI Version 3.0 Questions? Comments? Sanda Ionescu: DDI Users Listserv:

The End