Wendy Thomas Minnesota Population Center NADDI 2014.

Coverage
• Problem statement
  - Why are there problems with interoperability with external search, storage, and delivery systems?
• Minnesota Population Center situation
  - Legacy model, increased requirements for interconnectedness, and internal needs
• Approach and progress

Problem Statement
• System differences
  - Convergence of 3 primary systems for managing information
  - Content coverage, organization, and entry point
• Differences in content standards
  - Can have a different primary focus and purpose
  - Content coverage, organization, and entry point
  - Depth of searchable content
• Combining contents with systems
  - Ingest expectations
  - Delivery expectations

A little historical background

Library Perspective
• Libraries are collections of individual objects selected and organized by topical content
  - Descriptions (metadata) are traditionally held external to the object and are designed to support discovery via title, author, topical, temporal, and geographic coverage
  - Collections are fluid (libraries accession and deaccession objects)
  - When objects became electronic with searchable content, descriptions were linked to OR bundled with the object to allow “keyword” searching of the object itself
  - Descriptions are “high level” and “generic” (i.e., they describe the object overall and support description of a wide range of object types)

Archives Perspective
• In general, archives consist of records that have been selected for permanent or long-term preservation on grounds of their enduring cultural, historical, or evidentiary value. Archival records are normally unpublished and almost always unique, unlike books or magazines for which many identical copies exist.

Archives cont.
• Archive metadata
  - Normally separate from the object/record itself
  - Focuses on relationships between records, particularly in terms of organizational source, time, and the processes that created them (provenance)
  - Preservation is a key provision (archives ingest and preserve)
  - Queries often focus on relationships within the collection rather than on a “piece of information”; descriptive records support this via the use of fonds, series, file, and item descriptions

Information Technology Perspective
• Focus on storing, retrieving, manipulating, and communicating information
  - Storage is electronic (an object and/or description can be stored)
  - Retrieval is based on unique addresses discovered by searching:
    ○ Structured indexed content
    ○ All electronic content
    ○ Following chains of relationships (explicit or virtual)
  - Optimization occurs around speed of delivery and accuracy of the delivered content

Implications
• Each external system we interact with comes out of the perspective of a different primary system, prioritizing some aspects over others
• Each has integrated other perspectives into its system approach to varying degrees

Content differences: There’s metadata and then there’s METADATA
• metadata
  - Bibliographic+ metadata consists of the high-level discovery elements common to a broad range of objects. Think Dublin Core, OAI-ORE, MARC, etc.
• METADATA
  - Content metadata varies by discipline or group of disciplines. It carries the detailed information required to accurately determine the fit of data for a specific use and how to access a datum within a data object.

Bibliographic+ metadata
• Carries standard title, author, publisher, identifier, distributor information
• Provides structured coverage information (temporal, topical, spatial)
• May provide unstructured topical searching by leveraging access to content metadata through keyword searching of some or all text content
• Bibliographic metadata is associated with an object or aggregation of objects

Examples of bibliographic+ metadata
• Dublin Core – the basics
• MARC, DMARC, other bibliographic record standards
• METS – a means of wrapping a common structure of bibliographic metadata with the content metadata and objects (Digital Library Federation)
• OAI-ORE – a structure that adds the archival perspective of aggregations and flexible resource mapping (Open Archives Initiative)

METADATA
• Content metadata is designed for specific purposes including but not limited to
  - Supporting deep topical discovery
  - Describing how to access a single datum within the object
  - Determining fitness of data to a specific use
  - Informing users of quality and facilitating use
  - Capturing process and provenance information
  - Driving production
  - Supporting comparison, analysis, and repurposing
  - …and more

Examples of content metadata
• EML – Ecological Metadata Language
  - Resource module containing information describing dataset, literature, protocol, and software resources
• FGDC – Federal Geographic Data Committee
  - Information on identification (bibliographic), data quality, organization of data, spatial reference, entity and attributes, distribution, and metadata reference
• DDI – Data Documentation Initiative
  - Study, conceptual framework, data collection/capture, methodology, data processing, logical content of the data, physical storage, summary statistics, archival management, lifecycle events, comparison, groups, reusable metadata, source data, collections of data, etc.

Common features
• Provides high-level metadata with detailed, coverage-relevant metadata
• Binds metadata and data within the metadata or through explicit external links
• Perspective is generally data file centric
• Common stated purpose is to support discovery and access

Combining the content with systems
• Ingest expectations: there is an assumption
  - That because we all cover the basic metadata, it is organized in similar ways
  - That metadata has related data
  - That the focus of the metadata is the data file/set
• Delivery expectations
  - All over the board

Comparison of purposes: DDI-L and FGDC
• DDI-L
  - The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data life cycle. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving.
• FGDC
  - The standard was developed from the perspective of defining the information required by a prospective user to determine the availability of a set of geospatial data; to determine the fitness of the set of geospatial data for an intended use; to determine the means of accessing the set of geospatial data; and to successfully transfer the set of geospatial data.
  - Source: Federal Geographic Data Committee. FGDC-STD Content standard for digital geospatial metadata (revised June 1998). Federal Geographic Data Committee. Washington, D.C. (pg. iv)

DDI-Lifecycle
• Pushed the focus from a data file firmly onto the Study, defining the StudyUnit as a coordinated data capture process
  - A one-time data capture through one or more instruments
  - A single wave or data capture cycle of a repeated study
• Allowed grouping of Study Units into series or other relationships

• DDI-L does not come SOLELY from a discovery perspective
• It’s no longer “data file” focused
So…
• When we interact with external systems that use a Library/IT discovery/access-based approach, it’s difficult to know what the primary access point is

Resulting issues with various systems
• METS
  - What is the primary entry point?
• da|ra
  - If the “data file” is the primary object, what about derivatives?
  - What about multiple forms of primary content metadata?
• DataONE
  - Where do we store the relational information for OAI-ORE (Resource Maps, Aggregation, etc.)?
  - How can we support scraping multi-relational descriptive metadata out of DDI content?

MPC Metadata Systems
• Microdata storage and access system (IPUMS and related systems)
• Aggregate data storage and access system (NHGIS)
• Integration of access systems (TerraPop)
• Specialized access systems for some microdata projects (IHS, ATUS,...)

The MPC as a hybrid institution
• Are we a research center?
  - Modify (integrate and harmonize) rather than collect data
  - Provide the data infrastructure for other people’s research
• Are we an archive?
  - Archival responsibility for our products
  - Archival responsibility for selected source data
• Are we a service center?
  - Provide support for proposal development and implementation
  - Forum for discussion

Current Data Metadata Structure
• Data is held in ASCII fixed format files
• Metadata is held in multiple formats
  - Standardized MPC Data Base (microdata and aggregate data)
    ○ Runs the dissemination and access system
  - Structured text documents
    ○ Study level information used in user interface
  - Physical and digital images of related materials and original metadata
  - Provenance and Process notes…varied
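
Because the data sits in fixed-format ASCII files, the variable-level metadata is what makes a record readable at all. The following is a minimal sketch of that dependency, not the MPC's actual system: the variable names, start columns, and widths in `LAYOUT` are invented for illustration, as is the sample record.

```python
# Hypothetical variable layout: (name, 1-based start column, width).
# These names and positions are invented for illustration only.
LAYOUT = [
    ("YEAR", 1, 4),
    ("SERIAL", 5, 6),
    ("AGE", 11, 3),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width ASCII record into a dict using layout metadata."""
    record = {}
    for name, start, width in layout:
        # Convert the 1-based start column to a 0-based slice.
        begin = start - 1
        record[name] = line[begin:begin + width].strip()
    return record


# An invented 13-character record matching the layout above.
sample = "2010000042 37"
print(parse_record(sample))
```

Values are kept as strings here; deciding which variables are numeric or coded would itself be driven by the content metadata.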

[Diagram: components of the current MPC metadata structure]
• MPC Database: Content Metadata [Variables, Summary Statistics, Spatial, Temporal]
• Content for Interface: Study Level information, Methodology, Questions, Comparability
• Process Metadata
• Related Documents: Physical, PDF
• Data Access System
• Catalog: Dublin Core

Current level of standards compliance
• Dublin Core
  - Use an extended version of Dublin Core Terms to describe related documents and data files
• DDI-Codebook
  - Original input structure for aggregate data systems
  - Output structure for microdata products
  - (Metadata databases could be mapped to DDI Lifecycle presumably without loss)

Model Selection
• Currently going with an integrated model using PREMIS, DDI, ISO 19115, and Dublin Core
• Working on developing a profile of objects from each that will be supported within the MPC (required/optional) and how they relate to each other
• Determine mapping to external metadata structures we need to interact with

The Issues
• Identify gaps in metadata and determine how to fill them
• Involve individuals in resolving metadata capture issues on a process-by-process basis
• Minimize time requirements on research staff for analysis activities and process changes
• Relay a sense of the larger picture (why metadata is captured and how it is used) without overwhelming individuals
• Develop a means of instituting these practices early in the project proposal stage for future projects

Specific Requirements
• Produce specific “flavors” of DDI to meet needs of DDI-based systems (World Bank, other NADA systems)
• Generate and store different required subject headings
• Organize profiles of DDI 3.2 to serve different functions
  - Publication
  - Internal management of specified content

Initial decisions
• Continue to maintain internal systems
  - Move more content to database
• Define current system as the delivery system and explore what is needed for a processing/archival layer(s)
• Publish DDI 3.2 for archiving and dissemination purposes
• Publish other dissemination formats from DDI 3.2 (leveraging DDI 3.2 to X mapping activities)
• Use DDI 3.2 (4) to inform the content and structure of processing/archival layer(s)

Additional recommendations
• Clearly differentiate harmonized content from sample-specific content
• Add a collection management layer to:
  - Capture cross-collection relationships
  - Facilitate interface with external systems
  - Integrate non-DDI related objects (50,000 documents related to census activities from around the world)
• Generate publication profiles and processes to meet external needs

Sharing perspective
• Our original approach was based on how we wanted to manage metadata internally
• Viewed DDI-L as a base output from which high-level records or DDI-C could be created for external distribution
• We are currently working with 5 different organizations that want to provide access to our collections
• Everyone wants something different

External catalog
• IHSN has a specific format of NESSTAR’s DDI-C for individual samples
• da|ra wants a fuller DDI bibliographic record based on the study
• DataONE wants an OAI-ORE resource map based on the data file
• All have their locally supported search subjects

What I want
• To make sure all the metadata regarding our data files can be expressed in a DDI 3.2 instance
• Leverage the more detailed bibliographic information structure of DDI
• Maintain a set of bibliographic information (extended Dublin Core), covering all of our holdings (DDI and non-DDI), to serve as a source for generating records based on external profile requirements

Collection management
• Create extended Dublin Core records for non-DDI material
• Create collection level records that can serve as OAI-ORE Aggregations
• Automatically generate the subject headings for external systems based on our internal subject headings
• Capture all relationships between records in a way that supports a variety of objects being considered “top level” objects
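
Generating external subject headings from internal ones amounts to maintaining a crosswalk per external system. The sketch below shows one possible shape for that step. Every heading, the system keys (`ihsn`, `dara`), and the function name are hypothetical; none of them is taken from the MPC's vocabularies or from the external catalogs.

```python
# Hypothetical crosswalk from internal subject headings to each external
# system's vocabulary. All headings and system keys are invented.
CROSSWALK = {
    "ihsn": {
        "Fertility": "Demography: fertility",
        "Migration": "Demography: migration",
    },
    "dara": {
        "Fertility": "Population studies",
        "Migration": "Population studies",
    },
}


def external_subjects(internal_subjects, system):
    """Map internal headings to one system's headings.

    Duplicates are dropped while keeping first-seen order. Headings with no
    mapping are returned separately so they can be reviewed, not lost.
    """
    mapping = CROSSWALK[system]
    mapped, unmapped = [], []
    for heading in internal_subjects:
        target = mapping.get(heading)
        if target is None:
            unmapped.append(heading)
        elif target not in mapped:
            mapped.append(target)
    return mapped, unmapped


print(external_subjects(["Fertility", "Migration", "Housing"], "dara"))
```

Returning the unmapped headings alongside the mapped ones is a design choice for this sketch: it makes gaps in the crosswalk visible each time a record is generated.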

Dublin Core Extensions
• Add MPC type codes that allow for selection of specific elements when creating a profile of metadata for a specific external system
• Addition of more specified OWL and OAI-ORE predicates for linking
• Addition of specialized links between a data file and its primary metadata
• Content to support the consistent generation of RDF URN identifiers
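
One way to read the type-code idea is as a tag on each element that says which external profiles it belongs to. The sketch below illustrates that reading only. The `dcterms:` element names are real Dublin Core terms, but the type codes (`ihsn`, `dara`, `dataone`), the record values, and `build_profile` are assumptions made for the example.

```python
# An extended Dublin Core record held as (element, value, type_codes) triples.
# The type codes and all values are hypothetical.
RECORD = [
    ("dcterms:title", "Example census sample", {"ihsn", "dara", "dataone"}),
    ("dcterms:creator", "Example producer", {"ihsn", "dara"}),
    ("dcterms:spatial", "Exampleland", {"dara"}),
    ("dcterms:isPartOf", "urn:example:collection:1", {"dataone"}),
]


def build_profile(record, system):
    """Select only the elements whose type codes include the target system."""
    return [(element, value) for element, value, codes in record if system in codes]


for system in ("ihsn", "dara", "dataone"):
    print(system, build_profile(RECORD, system))
```

The same source record yields a different element set per system, which matches the situation where every external catalog wants something different.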

DDI content
• Study level metadata
  - Bibliographic, spatial, concepts, coverage
  - Related data files (Physical Instance)
  - Instruments (Questionnaires)
  - Other Materials (bibliographic information)
    ○ Codes
    ○ Spatial metadata
• Group level metadata
  - Bibliographic, spatial, concepts, coverage
• Resource Packages
  - Bibliographic, coverage

I need to be able to “scrape” the following information from the DDI:
• Record for each object within a DDI Study Unit and Resource Package
• Record for each “collection”
• Links between records to support flexible aggregations
• Specialized subject headings generated from local subject content
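
A rough sketch of the scraping step follows. It assumes a deliberately simplified XML stand-in: the element and attribute names are loosely modeled on DDI-L objects but are NOT schema-valid DDI 3.2, namespaces are omitted, and the `hasPart` link label is a placeholder for whatever predicate is eventually chosen.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a DDI instance. This is NOT schema-valid DDI 3.2;
# element and attribute names are illustrative and namespaces are omitted.
SAMPLE = """
<StudyUnit agency="example.org" id="su1" version="1.0">
  <Title>Example study</Title>
  <PhysicalInstance agency="example.org" id="pi1" version="1.0">
    <Title>Example data file</Title>
  </PhysicalInstance>
  <Instrument agency="example.org" id="in1" version="1.0">
    <Title>Example questionnaire</Title>
  </Instrument>
</StudyUnit>
"""


def scrape(xml_text):
    """Return one flat record per identified object plus parent-child links."""
    root = ET.fromstring(xml_text.strip())
    records, links = [], []

    def key(elem):
        # Identification triple in the spirit of (Agency, ID, Version).
        return (elem.get("agency"), elem.get("id"), elem.get("version"))

    def walk(elem, parent_key):
        if elem.get("id") is not None:
            records.append(
                {"key": key(elem), "type": elem.tag, "title": elem.findtext("Title")}
            )
            if parent_key is not None:
                # "hasPart" is a placeholder link label, not a chosen predicate.
                links.append((parent_key, "hasPart", key(elem)))
            parent_key = key(elem)
        for child in elem:
            walk(child, parent_key)

    walk(root, None)
    return records, links


records, links = scrape(SAMPLE)
print(records)
print(links)
```

Keeping the links as separate triples, rather than nesting records, is what lets any record act as a “top level” object when an aggregation is built.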

Return metadata to DDI
• When objects are deposited in da|ra a DOI is generated and needs to be noted in the DDI
• When objects are deposited in DataONE an identifier is generated and needs to be noted in the DDI
• When a DDI instance (DDI-L or DDI-C) is generated the object is stored and the specific DDI identification (Agency, ID, Version) needs to be noted in the DDI store as a product
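
The round trip described above can be pictured as a small registry keyed by the DDI identification triple. This is only an illustrative sketch: `IdentifierRegistry`, its method names, and the identifiers used (including the DOI-like string) are all made up, and a real implementation would write the values back into the DDI store instead of holding them in memory.

```python
from collections import defaultdict


class IdentifierRegistry:
    """Hypothetical in-memory record of identifiers returned by external systems."""

    def __init__(self):
        # (agency, id, version) -> list of (system, external identifier)
        self._by_object = defaultdict(list)

    def note(self, agency, object_id, version, system, external_id):
        """Record that an external system assigned an identifier to an object."""
        self._by_object[(agency, object_id, version)].append((system, external_id))

    def lookup(self, agency, object_id, version):
        """Return all external identifiers noted for one versioned object."""
        return list(self._by_object.get((agency, object_id, version), []))


registry = IdentifierRegistry()
# Invented identifiers for illustration only.
registry.note("example.org", "su1", "1.0", "dara", "10.0000/example-doi")
registry.note("example.org", "su1", "1.0", "dataone", "urn:example:dataone:1")
print(registry.lookup("example.org", "su1", "1.0"))
```

Keying on the version as well as the ID means an identifier issued for one version of an object is not silently attached to a later version.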

Possible areas of enhancement
• Making the internal use of Dublin Core extensible in terms of adding DDI and/or Local type attributes
• Capturing more specific relational information (OAI-ORE Resource Maps, DataONE link to specific metadata for a data file)
• Improved access control
• Provenance management

Questions