DDI: Capturing metadata throughout the research process for preservation and discovery Wendy Thomas NADDI 2012 University of Kansas.

Slides:



Advertisements
Similar presentations
Workshop on Metadata Standards and Best Practices November th, 2007 Session 4 The Data Documentation Initiative Technical Overview Pascal Heus Open.
Advertisements

Status on the Mapping of Metadata Standards
Foundational Objects. Areas of coverage Technical objects Foundational objects Lessons learned from review of Use Case content Simple Study Simple Questionnaire.
Making the Case for Metadata at SRS-NSF National Science Foundation Division of Science Resources Statistics Jeri Mulrow, Geetha Srinivasarao, and John.
Data and Metadata Management: A Business Perspective Wendy Thomas – Minnesota Population Center Marcel Hebing – German Socio-Economic Panel Study (SOEP),
Wendy Thomas Minnesota Population Center NADDI 2014.
Meta Dater Metadata Management and Production System for surveys in Empirical Socio-economic Research A Project funded by EU under the 5 th Framework Programme.
StatCat Building a Statistical Data Finder ssrs.yale.edu/statcat Steven Citron-Pousty Ann Green Julie Linden Yale University.
[Meta]-Data Management using DDI
Reusable!? Or why DDI 3.0 contains a recycling bin.
1 ISO – Metadata Next Generation International consensus being built on structured metadata within a broader Geomatics Standard under ISO Technical.
DDI 3.0 Conceptual Model Chris Nelson. Why Have a Model Non syntactic representation of the business domain Useful for identifying common constructs –Identification,
Präsentationstitel IAB-ITM Find the right tags in DDI IASSIST 2009, 27th-30th Mai 2009 IAB-ITM Finding the Right Tags in DDI 3.0: A Beginner's Experience.
Codebook Centric to Life-Cycle Centric In the beginning….
Managing the Metadata Lifecycle The Future of DDI at GESIS and ICPSR Peter Granda, ICPSR Meinhard Moschner, GESIS Mary Vardigan, ICPSR Joachim Wackerow,
 Name and organization  Have you worked with DDI before? (2 or 3)  If not, are you familiar with XML?  What kind of CAI systems do you use?  Goals.
Modernizing the Data Documentation Initiative (DDI-4) Dan Gillman, Bureau of Labor Statistics Arofan Gregory, Open Data Foundation WICS, 5-7 May 2015.
Course on DDI 3: Putting DDI to Work for You 8 December 2010 Wendy Thomas, Minnesota Population Center 2 nd Annual European DDI Users Group Meeting Learning.
ISO Standards: Status, Tools, Implementations, and Training Standards/David Danko.
IPUMS to IHSN: Leveraging structured metadata for discovering multi-national census and survey data Wendy L. Thomas 4 th Conference of the European Survey.
ISO as the metadata standard for Statistics South Africa
Data Documentation Initiative (DDI): Goals and Benefits Mary Vardigan Director, DDI Alliance.
ESCWA SDMX Workshop Session: Role in the Statistical Lifecycle and Relationship with DDI (Data Documentation Initiative)
Overview of DDI Arofan Gregory METIS October 5-7, 2011.
WP.5 - DDI-SDMX Integration E.S.S. cross-cutting project on Information Models and Standards Marco Pellegrino, Denis Grofils Eurostat METIS Work Session6-8.
Case Studies: Statistics Canada (WP 11) Alice Born Statistics UNECE Workshop on Statistical Metadata.
Survey Data Management and Combined use of DDI and SDMX DDI and SDMX use case Labor Force Statistics.
Metadata: An Overview Katie Dunn Technology & Metadata Librarian
Just Enough DDI 3 for the “DDI: Managing Metadata for Longitudinal Data — Best Practices”
3 rd Annual European DDI Users Group Meeting, 5-6 December 2011 The Ongoing Work for a Technical Vocabulary of DDI and SDMX Terms Marco Pellegrino Eurostat.
DDI 3.0 Overview Sanda Ionescu, ICPSR. DDI Background Development History 1995 – A grant-funded project initiated and organized by ICPSR proposes to create.
DDI-RDF Discovery Vocabulary A Metadata Vocabulary for Documenting Research and Survey Data Linked Data on the Web (LDOW 2013) Thomas Bosch.
4 April 2007METIS Work Session1 Metadata Standards and Their Support of Data Management Needs Daniel W. Gillman Bureau of Labor Statistics Paul Johanis.
North American Profile: Partnership across borders. Sharon Shin, Metadata Coordinator, Federal Geographic Data Committee Raphael Sussman; Manager, Lands.
Chuck Humphrey Data Library Co-ordinator University of Alberta May 16, Capitalising on Metadata Tool development plans IASSIST 2007.
CountryData Technologies for Data Exchange SDMX Information Model: An Introduction.
SDMX Standards Relationships to ISO/IEC 11179/CMR Arofan Gregory Chris Nelson Joint UNECE/Eurostat/OECD workshop on statistical metadata (METIS): Geneva.
Technical Overview of SDMX and DDI : Describing Microdata Arofan Gregory Metadata Technology.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Leveraging the DDI Model for Linked Statistical Data in the Social, Behavioural, and Economic Sciences DC Thomas Bosch GESIS – Leibniz.
DDI-RDF Leveraging the DDI Model for the Linked Data Web.
Metadata Models in Survey Computing Some Results of MetaNet – WG 2 METIS 2004, Geneva W. Grossmann University of Vienna.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
United Nations Regional Seminar on Census Data Archiving for Africa, Addis Ababa, Ethiopia, September, 2011 Documentation and Cataloguing in Data.
Metadata Management and Tools August 1, 2013 Data Curation Course.
Lifecycle Metadata for Digital Objects November 1, 2004 Descriptive Metadata: “Modeling the World”
Secure Epidemiology Research Platform (SERPent) Kick Start Meeting - April 15 th, 2010 Pascal Heus
DDI and the Lifecycle of Longitudinal Surveys Larry Hoyle, IPSR, Univ. of Kansas Joachim Wackerow, GESIS - Leibniz Institute for the Social Sciences.
Survey Data Management and the Combined Use of DDI and SDMX Arofan Gregory Chris Nelson Metadata Technology Eurostat, June
Eurostat SDMX and Global Standardisation Marco Pellegrino Eurostat, Statistical Office of the European Union Bangkok,
Archiving microdata Standards and good practices United Nations Statistics Commission New York, February 26, 2009 Olivier Dupriez World Bank, Development.
General concepts: DDI Irena Vipavc Brvar, ADP SEEDS Kick-off meeting, Lausanne, May 2015.
Presented By Margaret Hellen Atiro Uganda Bureau of Statistics at the United Nations Regional Seminar on Census Data Archiving 20 – 23 Sep 2011, Addis.
TIC Updates EDDI 2010 Wendy Thomas – 6 Dec Schedule and Process Changes Production schedule is moving to: – Summer / Winter release schedule January.
Ingest – Acquisition and deposit Irena Vipavc Brvar ADP SEEDS Workshop I Belgrade, October.
>> Metadata What is it, and what could it be? EU Twinning Project Activity E.2 26 May 2013.
Geospatial metadata Prof. Wenwen Li School of Geographical Sciences and Urban Planning 5644 Coor Hall
Metadata models to support the statistical cycle: IMDB
Publishing DDI-Related Topics Advantages and Challenges of Creating Publications Joachim Wackerow EDDI16 - 8th Annual European DDI User Conference Cologne,
DDI and GSIM – Impacts, Context, and Future Possibilities
Data Management: Documentation & Metadata
Data and Metadata Management: A Business Perspective
Wendy Thomas NADDI 2012 University of Kansas
Enhancing ICPSR metadata with DDI-Lifecycle
Updates on the XSLT stylesheets for DDI
in the data production process
Arofan Gregory METIS October 5-7, 2011
Proposal of a Geographic Metadata Profile for WISE
The role of metadata in census data dissemination
DDI and GSIM – Impacts, Context, and Future Possibilities
Presentation transcript:

DDI: Capturing metadata throughout the research process for preservation and discovery Wendy Thomas NADDI 2012 University of Kansas

License Details on next slide. S012

License (cont.) On-line available at: This is a human-readable summary of the Legal Code at: S013

Credits Some of these slides were developed for DDI workshops at IASSIST conferences and at GESIS training in Dagstuhl/Germany Major contributors – Wendy Thomas, Minnesota Population Center – Arofan Gregory, Open Data Foundation Further contributors – Joachim Wackerow, GESIS – Leibniz Institute for the Social Sciences – Pascal Heus, Open Data Foundation S014

Outline Introductions History of DDI in terms of development lines DDI Codebook (DDI-C) – Capturing metadata within DDI-C (NESSTAR) – Structural features of DDI-C and implications for use DDI Lifecycle (DDI-L) – Capturing metadata within DDI-L (DdiEditor) – Structural features of DDI-L and implications for use DDI resources

Introductions Who are you? What does your organization do? – Data collection – Data production – Training – User access – Preservation What is the scale of your operations? W. Thomas - EDDI2011

HISTORY OF DDI IN TERMS OF DEVELOPMENT LINES

Background Concept of DDI and definition of needs grew out of the data archival community Established in 1995 as a grant funded project initiated and organized by ICPSR Members: – Social Science Data Archives (US, Canada, Europe) – Statistical data producers (including US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada) February 2003 – Formation of DDI Alliance – Membership based alliance – Formalized development procedures S028

Early DDI: Characteristics of DDI 1/2 Focuses on the static object of a codebook Designed for limited uses – End user data discovery via the variable or high level study identification (bibliographic) – Only heavily structured content relates to information used to drive statistical analysis Coverage is focused on single study, single data file, simple survey and aggregate data files Variable contains majority of information (question, categories, data typing, physical storage information, statistics) S029

Limitations of these Characteristics Treated as an “add on” to the data collection process Focus is on the data end product and end users (static) Limited tools for creation or exploitation The Variable must exist before metadata can be created Producers hesitant to take up DDI creation because it is a cost and does not support their development or collection process S0210

DDI Development Version Version Version Version Version Version Version 3.2 [2013] Version Version [Version 4 20??] Version CodebookLifecycle [Version ??] [Version ??]

Desired Areas of Coverage Simple survey Aggregate data Complex data file structures Series (linkages between studies) Programming and software support Support for GIS users Support for CAI systems Support comparability (by-design, harmonization) Codebook Version 1 Simple survey Option for some programming and software support Version 2 Aggregate data Some support for GIS users Lifecycle Version 3 Simple survey Aggregate data Programming and software support GIS support CAI support Complex data files Series Comparability

Technical difference between Codebook and Lifecycle structures Codebook – Codebook based – Format originally XML DTD (2.5 XML Schema) – After-the-fact – Static – Metadata replicated – Simple study – Limited physical storage options Lifecycle – Lifecycle based – Format XML Schema – Point of occurrence – Dynamic – Metadata reused – Simple study, series, grouping, inter-study comparison – Unlimited physical storage options S05-3 Learning DDI: Pack S05 Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

DDI CODEBOOK

2.5 Feature Enhancement Added sections covering – Study authorization – Study budget – Ex-post study evaluation – Collector training – Instrument development Expanded detail – Sample procedure to include sample frame and target sample – Response rate – Typing of data appraisal – Detail for data processing and coding instruction – Allows for citation and persistent identifier for individual data files

Compatibility Enhancements Ability to capture and retain DDI-Lifecycle identification information (agency, id format, version information) Designation of a single note as a master and capture all the related objects it should be attached to Use of XHTML for structured content Ability to capture DDI-Lifecycle representation or response domain type explicitly Capture full range of ISO date types Added new sections to DDI-Lifecycle (v.3.2) to reflect added information fields in DDI-Codebook (v.2.5)

Conventionalize the use of Controlled Vocabularies Declare the use of a specific controlled vocabulary Define the object (element or attribute) that uses the controlled vocabulary Define the valid controlled vocabulary value of a non-valid legacy entry

DDI-Codebook Applications Simple survey capture High level study description with variable information for stand alone studies Descriptions of basic nCubes (aggregate / statistical tables) Replicating the contents of a codebook including the data dictionary Collection management beyond bibliographic records S05-4 Learning DDI: Pack S05 Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

Continued use of DDI-Codebook Current users New users whose needs are met by DDI- Codebook Software availability – NESSTAR – IHSN Microdata Toolkit – NADA Catalog – World Bank Open Data program (Microdata Catalog)

S03 20 DDI-Codebook Document Description Citation of the codebook document Guide to the codebook Document status Source for the document Study Description Citation for the study Study Information Methodology Data Accessibility Other Study Material File Description File Text (record and relationship information) Location Map (required for nCubes optional for microdata) Data Description Variable Group and nCube Group Variable (variable specification, physical location, question, & statistics) nCube Other Material

DDI Codebook Looking into NESSTAR

DDI 3 IN 60 SECONDS Introduction to DDI

Study Concepts Concepts measures Survey Instruments using Questions made up of Universes about Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

Questions Responses collect resulting in Data Files Variables made up of Categories/ Codes, Numbers with values of Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

Study Concepts Concepts measures Survey Instruments using Questions made up of Universes about Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

Questions Responses collect resulting in Data Files Variables made up of Categories/ Codes, Numbers with values of Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

DDI LIFECYCLE

Change DDI-L is a major change from DDI-C in terms of content and structure. Lets step back and look at: – Basic differences between DDI-C and DDI-L – Applications for DDI-C and DDI-L – Differences that allow DDI-L to do more – How these differences provide support for better management of information, data, and metadata S0528

DDI-L Applications Describing a series of studies such as a longitudinal survey or cross-cultural survey Capturing comparative information between studies Sharing and reusing metadata outside the context of a specific study Capturing data in the XML Capturing process steps from conception of study through data capture to data dissemination and use Capturing lifecycle information as it occurs, and in a way that can inform and drive production Management of data and metadata within an organization for internal use or external access S0529

Why can DDI-L do more? It is machine-actionable – not just documentary It’s more complex with a tighter structure It manages metadata objects through a structured identification and reference system that allows sharing between organizations It has greater support for related standards Reuse of metadata within the lifecycle of a study and between studies S0530

Reuse Across the Lifecycle This basic metadata is reused across the lifecycle – Responses may use the same categories and codes which the variables use – Multiple waves of a study may re-use concepts, questions, responses, variables, categories, codes, survey instruments, etc. from earlier waves S0531

Reuse by Reference When a piece of metadata is re-used, a reference can be made to the original In order to reference the original, you must be able to identify it You also must be able to publish it, so it is visible (and can be referenced) – It is published to the user community – those users who are allowed access S0532

Change over Time Metadata items change over time, as they move through the data lifecycle – This is especially true of longitudinal/repeat cross- sectional studies This produces different versions of the metadata The metadata versions have to be maintained as they change over time – If you reference an item, it should not change: you reference a specific version of the metadata item S0533

DDI Support for Metadata Reuse DDI allows for metadata items to be identifiable – They have unique IDs – They can be re-used by referencing those IDs DDI allows for metadata items to be published – The items are published in resource packages Metadata items are maintainable – They live in “schemes” (lists of items of a single type) or in “modules” (metadata for a specific purpose or stage of the lifecycle) – All maintainable metadata has a known owner or agency Maintainable metadata may be versionable – Versions reflect changes over time – The versionable metadata has a version number S0534

Study A uses Variable ID=“X” Resource Package published in Study B re-uses by reference Ref= “Variable X” uses Variable ID=“X” uses Study B Ref= “Variable X” S0535

Management of Information, Data, and Metadata An organization can manage its organizational information, metadata, and data within repositories using DDI-L to transfer information into and out of the system to support: – Controlled development and use of concepts, questions, variables, and other core metadata – Development of data collection and capture processes – Support quality control operations – Develop data access and analysis systems S0536

Variable ID=“X” Version=“1.0” Variable ID=“X” Version=“1.1” Variable ID=“X” Version=“2.0” changes over time Variable Scheme ID=“123” Agency=“GESIS” contained in S0537

Looking inside the DdiEditor

Upstream Metadata Capture Because there is support throughout the lifecycle, you can capture the metadata as it occurs It is re-useable throughout the lifecycle – It is versionable as it is modified across the lifecycle It supports production at each stage of the lifecycle – It moves into and out of the software tools used at each stage S0539

Metadata Driven Data Capture Questions can be organized into survey instruments documenting flow logic and dynamic wording – This metadata can be used to create control programs for Blaise, CASES, CSPro and other CAI systems Generation Instructions can drive data capture from registry sources and/or inform data processing post capture S0540

Reuse of Metadata You can reuse many types of metadata, benefitting from the work of others – Concepts – Variables – Categories and codes – Geography – Questions Promotes interoperability and standardization across organizations Can capture (and re-use) common cross-walks S0541

Virtual Data When researchers use data, they often combine variables from several sources – This can be viewed as a “virtual” data set – The re-coding and processing can be captured as useful metadata – The researcher’s data set can be re-created from this metadata – Comparability of data from several sources can be expressed S0542

Mining the Archive With metadata about relationships and structural similarities – You can automatically identify potentially comparable data sets – You can navigate the archive’s contents at a high level – You have much better detail at a low level across divergent data sets S0543

Citation / Series Statement Abstract / Purpose Coverage / Universe / Analysis Unit / Kind of Data Other Material / Notes Funding Information / Embargo Conceptual Components Data Collection Logical Product Physical Data Product Physical Instance ArchiveDDI Profile Study Unit S0444

Group Conceptual Components Data Collection Logical Product Physical Data Product Sub Group Archive DDI Profile Citation / Series Statement Abstract / Purpose Coverage / Universe Other Material / Notes Funding Information / Embargo Study UnitComparison S0445

DDI 3 Lifecycle Model and Related Modules Study Unit Data Collection Logical Product Physical Data Product Physical Instance Archive Groups and Resource Packages are a means of publishing any portion or combination of sections of the life cycle Local Holding Package S0446

Study Unit – Identification – Coverage Topical Temporal Spatial – Conceptual Components Universe Concept Representation (optional replication) – Purpose, Abstract, Proposal, Funding Identification is mapped to Dublin Core and basic Dublin Core is included as an option Geographic coverage mapped to FGDC / ISO – bounding box – spatial object – polygon description of levels and identifiers Universe Scheme, Concept Scheme – link of concept, universe, representation through Variable – also allows storage as a ISO/IEC compliant registry S0447

Data Collection Methodology Question Scheme – Question – Response domain Instrument – using Control Construct Scheme Coding Instructions – question to raw data – raw data to public file Interviewer Instructions Question and Response Domain designed to support question banks – Question Scheme is a maintainable object Organization and flow of questions into Instrument – Used to drive systems like CASES and Blaise Coding Instructions – Reuse by Questions, Variables, and comparison S0448

Logical Product Category Schemes Coding Schemes Variables NCubes Variable and NCube Groups Data Relationships Categories are used as both question response domains and by code schemes Codes are used as both question response domains and variable representations Link representations to concepts and universes through references Built from variables (dimensions and attributes) – Map directly to SDMX structures – More generalized to accommodate legacy data S0449

Physical storage Physical Data Structure – Links to Data Relationships – Links to Variable or NCube Coordinate – Description of physical storage structure in-line, fixed, delimited or proprietary Physical Instance – One-to-one relationship with a data file – Coverage constraints – Variable and category statistics S0450

Archive An archive is whatever organization or individual has current control over the metadata Contains persistent lifecycle events Contains archive specific information – local identification – local access constraints S0451

Building from Component Parts UniverseScheme ConceptScheme CategoryScheme CodeScheme QuestionScheme Instrument Variable Scheme NCube Scheme ControlConstructScheme LogicalRecord RecordLayout Scheme [Physical Location] PhysicalInstance S0452

Group Resource Package – Allows packaging of any maintainable item as a resource item Group – Up-front design of groups – allows inheritance – Ad hoc (“after-the-fact”) groups – explicit comparison using comparison maps for Universe, Concept, Question, Variable, Category, and Code Local Holding Package – Allows attachment of local information to a deposited study without changing the version of the study unit itself S0453

3.1 Local Holding Package Depository Study Unit OR Group Reference: [A reference to the stored version of the deposited study unit.] Local Added Content: [This contains all content available in a Study Unit whose source is the local archive.] Citation / Series Statement Abstract / Purpose Coverage / Universe Other Material / Notes Funding Information / Embargo S0454

Support for OAIS content SIP – Study level material – Variable and Question content – Methodology and process AIP – Lifecycle events (Archive processing) – Value added – Provenance DIP – Selections of all of the above

DDI Resources – Specifications – Resources Tools Best Practices Use cases – DDI Users list