Name and organization Have you worked with DDI before? (2 or 3) If not, are you familiar with XML? What kind of CAI systems do you use? Goals for today
Introduction DDI 3 Background XML Background How DDI 3 documents survey instruments Creating DDI3 and Documentation Manual Markup Using functionality from CAI systems Custom development Colectica Discussion Questions and discussion Additional documentation activities
Data Documentation Initiative DDI3 Background
Background Concept of DDI and definition of needs grew out of the data archival community Established in 1995 as a grant funded project initiated and organized by ICPSR Members: –Social Science Data Archives (US, Canada, Europe) –Statistical data producers (including US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada) February 2003 – Formation of DDI Alliance –Membership based alliance –Formalized development procedures Copyright © 2008 GESIS
Origins of the DDI Alliance Versions 1.* and 2.* were developed by an informal network of individuals from the social science community and official statistics –Funding was through grants It was decided that a more formal organization would help to drive the development of the standard forward –Many new features were requested –The DDI Alliance was born to facilitate the development in a consistent and ongoing fashion
Requirements for 3.0 Improve and expand the machine-actionable aspects of the DDI to support programming and software systems Support CAI instruments through expanded description of the questionnaire (content and question flow) Support the description of data series (longitudinal surveys, panel studies, recurring waves, etc.) Support comparison, in particular comparison by design but also comparison after the fact (harmonization) Improve support for describing complex data files (record and file linkages) Provide improved support for geographic content to facilitate linking to geographic files (shape files, boundary files, etc.)
DDI 3.0 and the Data Life Cycle A survey is not a static process: it evolves dynamically over time and involves many agencies/individuals DDI 2.x is about archiving; DDI 3.0 covers the entire “life cycle” 3.0 focuses on metadata reuse (minimizes redundancies/discrepancies, supports comparison) Also supports multilingual content, grouping, geography, and others 3.0 is extensible
Development of DDI – Acceptance of a new DDI paradigm –Lifecycle model –Shift from the codebook centric / variable centric model to capturing the lifecycle of data –Agreement on expanded areas of coverage 2005 –Presentation of schema structure –Focus on points of metadata creation and reuse 2006 – Presentation of first complete 3.0 model – Internal and public review 2007 – Vote to move to Candidate Version – Establishment of a set of use cases to test application and implementation 2008 – April: DDI 3.0 published
XML: Extensible Markup Language Designed to transport and store data
XML Schemas, DDI Modules, and DDI Schemes Instance Study Unit Physical Instance DDI Profile Comparative Data Collection Logical Product Physical Data Structure Archive Conceptual Component Reusable Ncube Inline ncube Tabular ncube Proprietary Dataset
XML Schemas, DDI Modules, and DDI Schemes Instance Study Unit Physical Instance DDI Profile Comparative Data Collection Question Scheme Control Construct Scheme Interviewer Instruction Scheme Logical Product Category Scheme Code Scheme Variable Scheme NCube Scheme Physical Data Structure Physical Structure Scheme Record Layout Scheme Archive Organization Scheme Conceptual Component Concept Scheme Universe Scheme Geographic Structure Scheme Geographic Location Scheme Reusable Ncube Inline ncube Tabular ncube Proprietary Dataset
Maintainable Schemes Category Scheme Code Scheme Concept Scheme Control Construct Scheme Geographic Structure Scheme Geographic Location Scheme Interviewer Instruction Scheme Question Scheme NCube Scheme Organization Scheme Physical Structure Scheme Record Layout Scheme Universe Scheme Variable Scheme Packages of reusable metadata maintained by a single agency
Designed to Support Registries A “Registry” is a catalog of metadata resources Resource package –Structure to publish non-study-specific materials for reuse Extracting specified types of information into schemes –Universe, Concept, Category, Code, Question, Instrument, Variable, etc. Allowing for either internal or external references –Can include other schemes by reference and select only desired items Providing Comparison Mapping –Target can be external harmonized structure
Data Collection Methodology Question Scheme –Question –Response domain Instrument –using Control Construct Scheme Coding Instructions –question to raw data –raw data to public file Interviewer Instructions Question and Response Domain designed to support question banks – Question Scheme is a maintainable object Organization and flow of questions into Instrument – Used to drive systems like CASES and Blaise Coding Instructions – Reuse by Questions, Variables, and comparison
QuestionItem in DDI
QuestionItem
Opening tag & identification QuestionText NumericDomain
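The three parts of a QuestionItem named above can be sketched with Python's standard library. This is a simplified illustration, not schema-valid DDI 3: namespaces are omitted, and the `id`, `type`, `low`, and `high` attributes as well as the sample question are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_question_item(question_id, text, low, high):
    """Build a simplified QuestionItem element (no DDI namespaces)."""
    # Opening tag and identification: the id attribute is an assumed, simplified form.
    item = ET.Element("QuestionItem", {"id": question_id})

    # QuestionText: the wording shown to the respondent.
    ET.SubElement(item, "QuestionText").text = text

    # NumericDomain: the response domain; type/low/high are hypothetical attributes.
    ET.SubElement(
        item,
        "NumericDomain",
        {"type": "Integer", "low": str(low), "high": str(high)},
    )
    return item


item = build_question_item("Q_AGE", "How old are you?", 0, 120)
print(ET.tostring(item, encoding="unicode"))
```

The real DDI 3 schema qualifies each element with a module namespace and nests identification and text in further child elements, so treat the output only as a picture of the overall shape.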
In a QuestionScheme
ControlConstructScheme with QuestionConstructs
An Instrument
Those all go in a DataCollection element
The DataCollection element goes in a StudyUnit, which goes in a DDIInstance or ResourcePackage
Create QuestionScheme and QuestionItems
Create ControlConstructScheme Add QuestionReferences
Add control flow items to ControlConstructScheme Include a main Sequence element
Create the Instrument Element Add the main ControlConstructReference
Create the DDIInstance element Create the StudyUnit element Create the DataCollection element Add the QuestionScheme, ControlConstructScheme, and Instrument to the DataCollection element
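The manual markup steps above can be sketched as one short Python script. It is a minimal illustration under stated assumptions: namespaces are left out, references are written as plain text IDs, and all identifiers (`QS_1`, `CCS_1`, `SEQ_MAIN`, `INSTR_1`) and sample questions are hypothetical. The wrapper elements are created first here only because that is the convenient order for `ElementTree`; the resulting nesting is the same.

```python
import xml.etree.ElementTree as ET


def build_instance(questions):
    """Build a simplified DDIInstance from (question_id, question_text) pairs."""
    # Wrappers: DDIInstance > StudyUnit > DataCollection.
    instance = ET.Element("DDIInstance")
    study_unit = ET.SubElement(instance, "StudyUnit")
    data_collection = ET.SubElement(study_unit, "DataCollection")

    # QuestionScheme holding one QuestionItem per question.
    question_scheme = ET.SubElement(data_collection, "QuestionScheme", {"id": "QS_1"})
    for question_id, text in questions:
        item = ET.SubElement(question_scheme, "QuestionItem", {"id": question_id})
        ET.SubElement(item, "QuestionText").text = text

    # ControlConstructScheme with one QuestionConstruct per question,
    # each pointing back at its question through a QuestionReference.
    cc_scheme = ET.SubElement(data_collection, "ControlConstructScheme", {"id": "CCS_1"})
    construct_ids = []
    for question_id, _text in questions:
        construct_id = f"QC_{question_id}"
        construct = ET.SubElement(cc_scheme, "QuestionConstruct", {"id": construct_id})
        ET.SubElement(construct, "QuestionReference").text = question_id
        construct_ids.append(construct_id)

    # Main Sequence listing the constructs in questionnaire order.
    sequence = ET.SubElement(cc_scheme, "Sequence", {"id": "SEQ_MAIN"})
    for construct_id in construct_ids:
        ET.SubElement(sequence, "ControlConstructReference").text = construct_id

    # Instrument pointing at the main Sequence.
    instrument = ET.SubElement(data_collection, "Instrument", {"id": "INSTR_1"})
    ET.SubElement(instrument, "ControlConstructReference").text = "SEQ_MAIN"
    return instance


instance = build_instance(
    [("Q1", "What is your age?"), ("Q2", "How many people live in your household?")]
)
print(ET.tostring(instance, encoding="unicode"))
```

Other control flow items would sit next to the `Sequence` inside the ControlConstructScheme and be referenced from it in the same way.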
Check the XML document against the DDI schemas to see if we got it right.
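Full validation against the DDI schemas needs an XSD-aware validator and the published schema files, which are not reproduced here. As a lightweight pre-check before that step, the sketch below only confirms that a document is well-formed and that every reference resolves. It assumes the same simplified, namespace-free layout with plain text IDs used for illustration, and the sample document deliberately contains one dangling reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document: QC_Q2 refers to a question that does not exist.
SAMPLE = """
<DDIInstance><StudyUnit><DataCollection>
  <QuestionScheme id="QS_1">
    <QuestionItem id="Q1"><QuestionText>What is your age?</QuestionText></QuestionItem>
  </QuestionScheme>
  <ControlConstructScheme id="CCS_1">
    <QuestionConstruct id="QC_Q1"><QuestionReference>Q1</QuestionReference></QuestionConstruct>
    <QuestionConstruct id="QC_Q2"><QuestionReference>Q2</QuestionReference></QuestionConstruct>
    <Sequence id="SEQ_MAIN">
      <ControlConstructReference>QC_Q1</ControlConstructReference>
      <ControlConstructReference>QC_Q2</ControlConstructReference>
    </Sequence>
  </ControlConstructScheme>
  <Instrument id="INSTR_1">
    <ControlConstructReference>SEQ_MAIN</ControlConstructReference>
  </Instrument>
</DataCollection></StudyUnit></DDIInstance>
"""


def find_dangling_references(root):
    """Return a message for every reference whose target id is missing."""
    question_ids = {q.get("id") for q in root.iter("QuestionItem")}
    # Every direct child of a ControlConstructScheme counts as a control construct.
    construct_ids = {
        construct.get("id")
        for scheme in root.iter("ControlConstructScheme")
        for construct in scheme
    }
    problems = []
    for ref in root.iter("QuestionReference"):
        if ref.text not in question_ids:
            problems.append(f"QuestionReference '{ref.text}' has no matching QuestionItem")
    for ref in root.iter("ControlConstructReference"):
        if ref.text not in construct_ids:
            problems.append(f"ControlConstructReference '{ref.text}' has no matching construct")
    return problems


# ET.fromstring raises ParseError if the document is not well-formed XML.
root = ET.fromstring(SAMPLE)
problems = find_dangling_references(root)
for problem in problems:
    print(problem)
```

A check like this complements schema validation rather than replacing it: it says nothing about required elements, element order, or data types.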
We have DDI, now we need documentation
Custom Development MQDS Colectica
Michigan Questionnaire Documentation System (MQDS) Sue Ellen Hansen Nicole Kirgis
What Does MQDS Do? Facilitates automated documentation and harmonization of Blaise survey instruments and datasets – Extracts survey question metadata – Standardized format
Survey Question Metadata Question universe Variable name and label Question text Question variable text (fills) Data type Code values and code text Skip instructions etc.
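One way to picture the metadata listed above is as a single record per question. The sketch below is an illustration only: the class, its field names, and the sample values are hypothetical and do not reflect the actual MQDS or Blaise data structures.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionMetadata:
    """Hypothetical record holding the metadata extracted for one survey question."""

    variable_name: str
    variable_label: str
    question_text: str
    universe: str = ""                                   # who is asked the question
    fills: list[str] = field(default_factory=list)       # question variable text
    data_type: str = ""
    codes: dict[str, str] = field(default_factory=dict)  # code value -> code text
    skip_instruction: str = ""


record = QuestionMetadata(
    variable_name="EMPLOYED",
    variable_label="Currently employed",
    question_text="Is ^NAME currently employed?",
    universe="Household members aged 16 or older",
    fills=["^NAME"],
    data_type="enumerated",
    codes={"1": "Yes", "5": "No"},
    skip_instruction="If No, skip to the next section",
)
print(record.variable_name, record.codes)
```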
MQDS Version 1 Extracted metadata from Blaise data model as XML tagged data Provided user interface for selection of – Blaise files – Instrument questions and sections – Types of metadata to extract – Languages to display – Style sheet for generation of instrument documentation or codebook
Using MQDS V1 XML: Codebook in Five Languages National Latino and Asian American Study
MQDS Version 1 Limitations – XML not DDI-compliant DDI Version 2 did not have XML tags for all metadata provided by Blaise Did not provide easy means of adding XML tags without becoming noncompliant – XML files for complex surveys can be very large (text files) Entire files had to be processed in computer memory Limited ability to fully automate documentation
DDI Version 3 Released April 2008 Focus on complete data lifecycle –going beyond the codebook
DDI Version 3 Included extensions proposed by DDI working group on instrument design Persistent Content of Question: –Question text (static; dynamic or variable) –Multiple-part question –Response domain (open; set categories; special types such as date, time, etc.) –Definitional text Use of Question in Instrument: –Order and routing (sequence / skip patterns; loops) –Universe –Analysis unit –Instructions
MQDS Version 3 Joint SRC and ICPSR venture Goals: – Address version 2 limitations Process Blaise instrument of any size – Exploit new elements and validate to the recently released DDI version 3 standard – Move from processing XML metadata in memory to streaming metadata to a relational database
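The move from in-memory processing to streaming into a relational database can be sketched as follows. This is not the MQDS implementation: it swaps in Python's `iterparse` and an in-memory SQLite database for self-containment, and the `question` table, the simplified element names, and the sample data are assumptions made for illustration.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def stream_questions_to_db(xml_source, connection):
    """Insert each QuestionItem into the database as soon as it has been parsed."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS question (id TEXT PRIMARY KEY, text TEXT)"
    )
    count = 0
    # iterparse yields elements incrementally instead of requiring a full tree up front.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "QuestionItem":
            connection.execute(
                "INSERT INTO question (id, text) VALUES (?, ?)",
                (elem.get("id"), elem.findtext("QuestionText", default="")),
            )
            count += 1
            # Discard the item's content once it has been stored.
            elem.clear()
    connection.commit()
    return count


# Hypothetical sample input standing in for a large metadata file.
sample = io.BytesIO(
    b"<QuestionScheme>"
    b"<QuestionItem id='Q1'><QuestionText>What is your age?</QuestionText></QuestionItem>"
    b"<QuestionItem id='Q2'><QuestionText>Are you employed?</QuestionText></QuestionItem>"
    b"</QuestionScheme>"
)
connection = sqlite3.connect(":memory:")
inserted = stream_questions_to_db(sample, connection)
rows = connection.execute("SELECT id, text FROM question ORDER BY id").fetchall()
print(inserted, rows)
```

Because each question is written out and cleared as it is read, memory use depends on the size of a single item rather than the size of the whole file, which is the property the Version 3 goal describes.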
MQDS Version 3 Relational Database: Import, Export, Transform 1. Import –User specifies input files (location, file type, etc.) –Blaise Datamodel (BMI), Blaise Database (BDB), other file types (e.g. SAS, SPSS, etc.) –Loaded into the relational database (SQL Server / SQL Server Express; database connection settings) 2. Export –XML (DDI 3) –User specifies output files (location, language/locale, XML output options, etc.) 3. Transform –Codebook, Questionnaire –User specifies stylesheet selection criteria, type of output desired (html, rtf, pdf), etc. DDI 3 elements not in *.bmi
MQDS Version 3 Relational database – DDI compliant standardized tables – Flexibility for SRC and ICPSR to add extensions that meet their specific organizational needs – Allows Automated documentation of any Blaise survey instrument Importing and documenting data produced by other software Lower cost development of other tools that facilitate editing and disseminating data
MQDS V3 Prototype: Exporting Language XML
MQDS Development Expect to release Summer 2009 Working out a distribution plan for Blaise users