Managing the Metadata Lifecycle
The Future of DDI at GESIS and ICPSR
Peter Granda, ICPSR
Meinhard Moschner, GESIS
Mary Vardigan, ICPSR
Joachim Wackerow, GESIS
Wolfgang Zenk-Möltgen, GESIS
Research Data Life Cycle
Collection, Concept, Processing, Distribution, Discovery, Analysis, Archiving, Repurposing
Current Uses of DDI
DDI 2 is used for many different purposes by many archival institutions, e.g., metadata records for data catalogs, export to Web-based information systems such as Nesstar, long-term preservation, and PDF codebooks
GESIS and ICPSR are developing procedures and systems to extend the use of DDI in their institutions
DDI 3 Expands in Scope
To date, use has been mainly limited to the Distribution and Archiving stages of the data life cycle
DDI 3 enables use of new elements and structures to extend markup to other stages of the life cycle, both earlier and later
Emphasis is on projects and tasks already in process at each institution
DDI 3 Use at GESIS
Structured Comments – Processing
Translation of EVS Questionnaire – Collection
Supporting Enhanced Publications – Analysis
Continuity Guides: Trends by Concepts – Concept, Discovery, Repurposing
Extracting structured information in current workflow
Example: building derived variables by SPSS
SPSS setups contain commands and comments
Necessary steps for using SPSS setups as information source for DDI
–Improving comments for automated extraction: formalize layout, add keywords from a list
–Extraction of structured comments and related commands by custom tool
–Transformation of this information into DDI 3 fragments
Extracting structured information in current workflow
***v* Variables/DerivedVariables
* DESCRIPTION
*   This section is on derived variables;
***.
***v* DerivedVariables/w101_new
* NAME
*   w101_new
* DESCRIPTION
*   w101_new is a derived variable from w101;
*   It has the original value from w101
*   when w102 is equal 1
*   otherwise it has the value 5;
* USED VARIABLES
*   w101, w102
* SOURCE
**.
compute w101_new = 5.
if ( w102 = 1 ) w101_new = w101.
**
* VERSION
*
* AUTHOR
*   Achim Wackerow
*
*
***.
[Diagram: SPSS; Extractor; Result: Report (HTML), DDI 3 fragments (GenerationInstruction: Description, Command)]
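The extraction and transformation steps could be sketched roughly as follows. This is only an illustration, not the GESIS custom tool: the parsing rules are inferred from the comment layout on the slide, the function names `extract_blocks` and `to_fragment` are hypothetical, and the XML output is a simplified, namespace-free stand-in for a real DDI 3 fragment that only reuses the element names shown on the slide (GenerationInstruction, Description, Command).

```python
import xml.etree.ElementTree as ET

# Structured comment block in the layout shown on the slide.
SETUP = """\
***v* DerivedVariables/w101_new
* NAME
*   w101_new
* DESCRIPTION
*   w101_new is a derived variable from w101;
*   It has the original value from w101
*   when w102 is equal 1
*   otherwise it has the value 5;
* USED VARIABLES
*   w101, w102
* SOURCE
**.
compute w101_new = 5.
if ( w102 = 1 ) w101_new = w101.
**
* VERSION
*
* AUTHOR
*   Achim Wackerow
*
***.
"""

# Keyword list assumed from the slide; a real list would be agreed in advance.
KEYWORDS = {"NAME", "DESCRIPTION", "USED VARIABLES", "SOURCE", "VERSION", "AUTHOR"}


def extract_blocks(text):
    """Collect keyword sections and SPSS commands from each structured comment."""
    blocks, current, key, in_source = [], None, None, False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("***v*"):      # header line opens a block
            current = {"header": stripped[5:].strip()}
            key, in_source = None, False
        elif current is None:                 # ignore anything outside a block
            continue
        elif stripped == "***.":              # terminator closes the block
            blocks.append(current)
            current = None
        elif in_source:
            if stripped == "**":              # comment resumes after the commands
                in_source, key = False, None
            else:
                current.setdefault("SOURCE", []).append(stripped)
        elif stripped == "**.":               # comment ends, SPSS commands follow
            in_source = True
        elif stripped.startswith("*"):
            body = stripped[1:].strip()
            if body in KEYWORDS:
                key = body
                current.setdefault(key, [])
            elif key and body:
                current[key].append(body)
    return blocks


def to_fragment(block):
    """Build a simplified GenerationInstruction element (not schema-valid DDI)."""
    gi = ET.Element("GenerationInstruction")
    ET.SubElement(gi, "Description").text = " ".join(block.get("DESCRIPTION", []))
    ET.SubElement(gi, "Command").text = "\n".join(block.get("SOURCE", []))
    return gi


blocks = extract_blocks(SETUP)
fragment = to_fragment(blocks[0])
print(ET.tostring(fragment, encoding="unicode"))
```

Keeping the commands between the `**.` and `**` markers is what lets the description and the executable SPSS syntax travel together into one fragment.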
Translation of EVS Questionnaire DSDM
Supporting Enhanced Publications
Publications with References to Data: DDI 3.1
URN contains: Agency, Object, Version
Resolving a reference (URN) from a publication
–find agency (gesis.de.ddi) at the DDI Alliance, return resolver address
–find object, return URL of documentation and/or data
–request document, return document
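The resolution chain can be illustrated with a small in-memory sketch. Everything beyond the agency name `gesis.de.ddi` is an assumption made for illustration: the URN layout `urn:ddi:<agency>:<object>:<version>` is a simplification and not the full DDI 3.1 URN syntax, the registry and index dictionaries stand in for the DDI Alliance lookup and the agency's resolver, and the `example.org` addresses and the object name `ExampleStudy` are placeholders.

```python
from typing import NamedTuple


class DdiUrn(NamedTuple):
    agency: str
    object_id: str
    version: str


def parse_urn(urn: str) -> DdiUrn:
    """Split a simplified URN of the form urn:ddi:<agency>:<object>:<version>."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "ddi":
        raise ValueError(f"not a (simplified) DDI URN: {urn!r}")
    return DdiUrn(*parts[2:])


# Step 1 stand-in: agency name -> resolver address (hypothetical address).
AGENCY_REGISTRY = {
    "gesis.de.ddi": "https://resolver.example.org/gesis",
}

# Step 2 stand-in: (resolver, object, version) -> URL of documentation and/or data.
OBJECT_INDEX = {
    ("https://resolver.example.org/gesis", "ExampleStudy", "1.0.0"):
        "https://data.example.org/docs/ExampleStudy/1.0.0",
}


def resolve(urn: str) -> str:
    """Follow the chain: find agency, then find object, then return its URL."""
    ref = parse_urn(urn)
    resolver = AGENCY_REGISTRY[ref.agency]                       # find agency
    return OBJECT_INDEX[(resolver, ref.object_id, ref.version)]  # find object


url = resolve("urn:ddi:gesis.de.ddi:ExampleStudy:1.0.0")
print(url)
```

The final step on the slide, requesting and returning the document, would be an ordinary HTTP request against the returned URL and is left out here.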
DSDM DDI 3 EPE Simple Export Wizard 1.2.0
Grouping Trends
Continuity guides in different contexts
–Synoptical question / variable lists
–Documentation of changes in question wording / answer scales
Systematic organization by conceptual categories
–CodebookExplorer tool (relational DB)
–Publication as HTML links on variable level in ZACAT
Taking advantage of DDI 3 in the future
–Defining the standard and comparison
–Qualifying relations (e.g. q-text modified, scale modified, …)
Continuity guides
[Screenshot labels: literal question text over time; conceptual categories; deviations in answer categories]
Trends by concepts
[Screenshot labels: conceptual categories; trend variables by study; Country 1; Country 2]
[Diagram: comparison of study units against an ex-post standard in DDI 3]
STUDY UNIT 1 … n: DataCollection (… "Have you …?" …), LogicalProduct ("often" … Cat1 4 …)
GROUP STUDY UNIT 8-14: DataCollection …, LogicalProduct …
GROUP STUDY UNIT 15-x: DataCollection …, LogicalProduct …
DDI 3 RESOURCE "Ex-post Standard": Universe, Concept, Data Collection ("Do you …?" … CODS1), Logical Product ("often" … CATS1, Cat1 1 …)
Comparison map: Equivalency, Relationship Description
–Question text: modified
–Values: different (generation instruction: scale reversed)
–Label: identical
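One way to think about such qualified relations is as small records produced by comparing a study item with the standard item. The sketch below is an assumption-laden illustration, not DDI 3 markup: the `Relation` class, the `compare` and `scale_reversed` helpers, and all category labels other than "often" are invented for the example. Only the qualifiers (modified, different, scale reversed, identical) and the codes 1 and 4 for "often" come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    aspect: str       # "question text", "values" or "label"
    relation: str     # "identical", "modified" or "different"
    note: str = ""    # optional qualifier, e.g. "scale reversed"


def scale_reversed(standard, study):
    """True if both scales share their labels and the codes run in opposite order."""
    if not standard or set(standard) != set(study):
        return False
    lo, hi = min(standard.values()), max(standard.values())
    return all(study[label] == lo + hi - code for label, code in standard.items())


def compare(standard, study):
    """Qualify how a study item relates to the standard item."""
    relations = [Relation(
        "question text",
        "identical" if standard["question"] == study["question"] else "modified",
    )]
    s_cats, v_cats = standard["categories"], study["categories"]
    if s_cats == v_cats:
        relations.append(Relation("values", "identical"))
    elif scale_reversed(s_cats, v_cats):
        relations.append(Relation("values", "different", "scale reversed"))
    else:
        relations.append(Relation("values", "different"))
    relations.append(Relation(
        "label", "identical" if set(s_cats) == set(v_cats) else "modified"))
    return relations


# Hypothetical items; only "often" = 1 (standard) and "often" = 4 (study) are from the slide.
standard_item = {
    "question": "Do you …?",
    "categories": {"often": 1, "sometimes": 2, "rarely": 3, "never": 4},
}
study_item = {
    "question": "Have you …?",
    "categories": {"often": 4, "sometimes": 3, "rarely": 2, "never": 1},
}

relations = compare(standard_item, study_item)
for r in relations:
    print(r.aspect, r.relation, r.note)
```

For the two example items this reproduces the qualifiers shown in the comparison map: question text modified, values different with the scale reversed, label identical.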
DDI 3 Use at ICPSR
Information collected from data producers in pre-collection phase – Concept
Metadata output from CAI applications – Data Collection
Processor's dashboard – Data Processing
Metadata mining: new faceted search tool to facilitate discovery through more precise searching – Data Discovery
Relational database for comparison and harmonization across studies – Repurposing
SMDS Metadata Modules
DDI as backbone for structured metadata
[Diagram labels: Collection, Concept, Processing, Distribution, Discovery, Analysis, Repurposing; SIP, AIP, DIP; OAIS Archive; CAI Tools; MQDS etc.; Information extracted from SPSS etc.; Custom Tools (e.g. forms-based); Statistical packages; Online Analysis; Search engines; Distribution Packages; Web information system; Data / Documents outside of DDI]
SIP
–A combination of this information forms a traditional SIP.
–Information from each life cycle stage, sent to the archive, can be understood as a dynamic SIP.
–Self-archiving by web forms can be offered for the different stages.
Core structure
–The structured metadata combined with data forms the core of the archive.
–It would be organised so that metadata can be reused and information can be ingested and distributed dynamically.
–The core structure is geared toward efficient processing and reuse of metadata.
AIP
–An AIP must be specially built, because the metadata may contain only references to other reused metadata.
–An AIP should include everything belonging to one study. DDI can also be the main structure of the AIP. Data can be inline in DDI.
–An AIP would exist beside the core structure in the archive.
–An easy roundtrip should be possible between the core structure and the AIP.
–The purpose of the AIP is comparable to PDF/A, where all fonts are included.
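The point that an AIP has to be specially built can be made concrete with a minimal sketch: the core structure stores items once and links them by reference, while the AIP is a self-contained copy in which every reference has been replaced by the referenced item. The store layout, the `refs` key, the item identifiers, and the function `build_aip` are all hypothetical and chosen only for illustration.

```python
def build_aip(study_id, store):
    """Return a self-contained copy of a study with all references inlined."""
    def inline(item_id, seen):
        if item_id in seen:
            raise ValueError(f"circular reference at {item_id!r}")
        item = store[item_id]
        # Copy the item's own fields, then embed every referenced item in full.
        resolved = {k: v for k, v in item.items() if k != "refs"}
        resolved["id"] = item_id
        resolved["items"] = [inline(ref, seen | {item_id})
                             for ref in item.get("refs", [])]
        return resolved

    return inline(study_id, frozenset())


# Hypothetical core structure: two studies reuse the same question item.
core_store = {
    "study-1": {"title": "Study 1", "refs": ["question-a", "variable-x"]},
    "study-2": {"title": "New study", "refs": ["question-a"]},
    "question-a": {"text": "Do you …?", "refs": ["concept-c"]},
    "variable-x": {"label": "w101_new"},
    "concept-c": {"name": "Standard concept"},
}

aip = build_aip("study-1", core_store)
print(aip)
```

Because each embedded item keeps its identifier, the references could in principle be restored from the AIP, which is the kind of roundtrip between core structure and AIP that the slide asks for.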
DDI-based archive as collection of reusable components
Metadata in DDI is structured in small items which can be identified and maintained by one or more institutions
These parts can be
–the basis for comparison and metadata mining (discovery of new relationships)
–a candidate for reuse in other studies or new studies (like standard questions or variables)
[Diagram labels: Study 1 (study-specific information, items for reuse); New study; Repository of reusable components (standard concepts, standard questions, standard variables, harmonized information, controlled vocabularies)]
Issues for Discussion
Advantages and disadvantages of seeking to capture additional metadata throughout the data life cycle
How much information to make available to funding agencies, data producers, and secondary users?
Rules for structured documentation and delivery of items to archives for preservation
An overall DDI tool to capture and curate all metadata and data – the Holy Grail?