W. Christopher Lenhardt


Appraisal and Selection of Scientific Data for the Long-Term Archive: A Case Study
Robert R. Downs, Senior Digital Archivist; SEDAC Archives Manager
Robert S. Chen, Director & Senior Research Scientist; SEDAC Manager; CODATA Secretary-General
W. Christopher Lenhardt, Associate Director, Information Services; Deputy Director, SEDAC
Center for International Earth Science Information Network (CIESIN)
Socioeconomic Data and Applications Center (SEDAC)
The Earth Institute at Columbia University
IASSIST 2007: Building Global Knowledge Communities with Open Data
McGill University, Montréal, Québec, Canada
16 May 2007

Need for Management and Long-Term Preservation of Scientific Digital Data
- Scientific communities use digital data to advance knowledge
  - Scientific digital data are increasingly integral to scientific progress
  - Current software is used to create, render, analyze, and share data in digital form with others
  - Print cannot offer analytical capabilities of software (e.g., GIS)
  - Data volumes are growing exponentially and are becoming increasingly complex
- Scientific digital data are at risk if not properly preserved and curated
  - Data must be correct, complete, documented, and described for a range of potential future users
  - Information systems technology evolution is inevitable and is not always anticipated in a timely manner
  - Digital and optical media deterioration occurs even under ideal conditions
  - Need to support future requirements for discovery, access, and use

Cost Assumptions for Long-Term Archiving of Scientific Digital Data
- Over time, many more data sets will be archived and curated
- Economies of scale will reduce costs of curation, especially for large, relatively homogeneous data sets
- Nevertheless, each additional data set to be curated does increase the cost of operating the long-term archive at the margin
- For now, cost drivers are personnel costs; technology costs are stable or decreasing
- Future budget limitations could put curated data at risk of loss, or at least lead to reductions in desirable levels of accessibility and support
- Long-term archiving requires a commitment to incur continuous costs in the future and therefore requires careful analysis of what ought to be included, i.e., selection and appraisal

Benefits of Appraisal and Selection of Scientific Data for Accession to the LTA
Appraisal and selection ensure that:
- The long-term archive (LTA) will contain quality data resources that have been appraised as possessing enduring value
- The high quality of scientific data resources can help justify archival costs during times of constrained budgets
- Limited long-term archive resources can be focused on the most important scientific data rather than spread across all data
- Plans for providing services for each data set can take into consideration potential future needs and use
- Potential future value of data can be identified and documented during the appraisal process
- Preparation of scientific data for appraisal and selection provides added value to data
- Requirements and priorities for planning long-term archive infrastructure and services take into account differences in data and their potential use

Ideal Qualities for Appraisal and Selection of Scientific Data for Long-Term Archiving
- Documented appraisal process
- Ongoing review and improvement of appraisal criteria and process
- Defined categories and choices for decisions
- Community-based selection criteria
- Efficient process for appraisal and selection
- Diversified stakeholder representation on selection committee

The SEDAC LTA: A Case Study for Appraisal and Selection of Scientific Data
- SEDAC: The Socioeconomic Data and Applications Center is operated for NASA by the Center for International Earth Science Information Network (CIESIN), a unit of the Earth Institute at Columbia University
- SEDAC Active Archive: Publicly accessible online scientific data products and services relevant to the needs of the interdisciplinary community interested in human dimensions of the environment
- SEDAC Long-Term Archive: An LTA established in collaboration with the Columbia University library system to archive and curate selected data from the SEDAC Active Archive
- Mission: The SEDAC Long-Term Archive acquires, preserves, and maintains the content of selected high-quality data, data products, documentation, and services relevant to human dimensions of global change in a digital form to support the discovery, access, and use of archived resources by scientific, educational, and decision-making communities for at least the next 50 years.

Older SEDAC Data Need a Long-Term Home
Version (pub)    GPW v1 (1995)    GPW v2 (2000)    GPW v3 (2005)
Estimates for    1994             1990, 1995       1990, 1995, 2000
Input units      19,000           127,000          ~375,000
More than 180 citations of GPW versions 1 and 2
http://sedac.ciesin.columbia.edu/gpw/

Selection and Appraisal of SEDAC Resources for Accession into the SEDAC LTA
Draft SEDAC Long-Term Archive Management and Operations Plan (September 23, 2005):
The SEDAC Long-Term Archive (LTA) Board receives nominations from the SEDAC Lead Project Scientist for SEDAC resources to be submitted to the LTA and evaluates each resource for potential accession. The Board will consider the Selection Criteria for Submission of SEDAC Resources to the LTA to appraise each nominated resource for submission to the LTA and to identify the level of service and the retention schedule to be assigned to the resource. Nominations not containing sufficient information for appraisal shall be returned to the SEDAC Lead Project Scientist for insufficient evidence. Rejections shall be returned with an explanation of the deficiencies and the criteria that the nomination did not meet.

SEDAC LTA Board Representation
- LTA Board established with representation from SEDAC, the Earth Institute, and the Columbia University Libraries:
  - SEDAC Project Scientist
  - SEDAC Systems Engineer
  - SEDAC Archives Manager (serves as Chair)
  - Two representatives designated by Earth Institute
  - Two representatives designated by Columbia University Libraries
- If SEDAC discontinues operations at Columbia University:
  - CIESIN will designate a replacement for one SEDAC position
  - Columbia University Library will appoint replacements for the other two positions, including the chair

Decision Practice for Appraisal and Selection of Scientific Data for SEDAC LTA
- SEDAC User Working Group (UWG) reviews and approves data for SEDAC Active Archive dissemination
  - SEDAC scientist identifies candidate scientific data and nominates data to the Lead SEDAC Project Scientist for dissemination
  - Data described and presented to UWG with recommendation
  - SEDAC UWG approves data for dissemination by SEDAC Active Archive
- SEDAC Active Archive data recommended for LTA
  - SEDAC Active Archive data considered for transfer to LTA
  - Plan for data is described with rationale to justify recommendation
  - Nominated data recommended to UWG for transfer to the LTA
- SEDAC LTA Board appraises and selects data for LTA
  - Data set recommended with proposed services to LTA Board
  - LTA Board appraises data and reviews service recommendations
  - LTA Board approves data for accession to the LTA
  - LTA Board approves preservation and service levels for data

Decision Path for Submission of Scientific Data to the SEDAC LTA
1. SEDAC Scientists review and identify scientific data
2. SEDAC Lead Project Scientist recommends plan for dissemination
3. SEDAC User Working Group approves plan for dissemination
4. SEDAC Lead Project Scientist recommends plan to transfer data to LTA
5. SEDAC User Working Group accepts plan to transfer data to LTA
6. SEDAC Lead Project Scientist recommends plan for submission of data to LTA
7. SEDAC LTA Board approves plan for submission of data to LTA
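The decision path can be sketched as an ordered sequence of steps that begins with the scientists' review and ends with LTA Board approval. The following is a minimal illustration only: the actor and action strings paraphrase the slide, while the list structure and the `next_step` helper are hypothetical and not part of any SEDAC system.

```python
# Hypothetical sketch: the data structure and helper are assumptions;
# only the actors and actions come from the slide.
DECISION_PATH = [
    ("SEDAC Scientists", "review and identify scientific data"),
    ("SEDAC Lead Project Scientist", "recommend plan for dissemination"),
    ("SEDAC User Working Group", "approve plan for dissemination"),
    ("SEDAC Lead Project Scientist", "recommend plan to transfer data to LTA"),
    ("SEDAC User Working Group", "accept plan to transfer data to LTA"),
    ("SEDAC Lead Project Scientist", "recommend plan for submission of data to LTA"),
    ("SEDAC LTA Board", "approve plan for submission of data to LTA"),
]


def next_step(completed):
    """Return the (actor, action) pair that follows `completed` finished
    steps, or None once every step on the path has been completed."""
    if not 0 <= completed <= len(DECISION_PATH):
        raise ValueError("completed must be between 0 and the number of steps")
    if completed == len(DECISION_PATH):
        return None
    return DECISION_PATH[completed]


if __name__ == "__main__":
    actor, action = next_step(2)
    print(f"Next: {actor} must {action}")
```

Keeping the steps in a single ordered list makes it explicit that no step can be reached without the preceding recommendation or approval.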

Development of Criteria for Selection of Data for the SEDAC LTA
- Reviewed literature, conducted research on requirements for digital preservation of scientific data, and participated in workshops on scientific data stewardship and digital preservation
- Reviewed existing policies and appraisal criteria:
  - Library collections development
  - Records management and appraisal criteria
  - Traditional archives, scientific data centers, digital archives
- LTA Board reviewed and revised drafts
  - Broad perspectives from diverse experiences represented
- Ongoing review by the LTA Board for current relevance and applicability to appraisal practice

Summary of Current Selection Criteria for Accession to SEDAC Long-Term Archive
- Scientific or Historical Value: citation, research, and educational use as published in refereed scientific publications/reports from a recognized committee of scientists
- Potential Usability and Use: evidence of usability, usefulness, and sufficient usage by the community interested in human dimensions of the environment. Adequate evidence indicates that the potential for future use justifies the costs of long-term archiving
- Uniqueness of Data (non-redundant stewardship): not being preserved in any form in another archive and at risk of loss if not accessioned into the Long-Term Archive
- Relevance to LTA Mission: currently endorsed or approved by the community interested in human interactions in the environment. For the short term, relevance includes content germane to the SEDAC mission and SEDAC strategic plan
- Documented for Accessibility: completeness and correctness of documentation to facilitate future discovery, access, and use
- Technological Accessibility (feasibility): received in a format meeting the technical criteria for the Service Level designated for the resource
- Legality and Confidentiality: unrestricted permissions for preservation and future dissemination. No information that is confidential or prohibited from dissemination
- Non-Replicability: data replication not feasible, excessively costly, or prohibitive
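Because rejected nominations are returned with the criteria they did not meet, an appraisal can be recorded as a checklist over the eight criteria. The sketch below is illustrative only: the criterion names come from the slide, but the dictionary representation and the `unmet_criteria` helper are assumptions, and the sketch deliberately does not decide acceptance, which rests with the LTA Board.

```python
# Criterion names are taken from the slide; everything else is a
# hypothetical representation for illustration.
CRITERIA = (
    "Scientific or Historical Value",
    "Potential Usability and Use",
    "Uniqueness of Data",
    "Relevance to LTA Mission",
    "Documented for Accessibility",
    "Technological Accessibility",
    "Legality and Confidentiality",
    "Non-Replicability",
)


def unmet_criteria(appraisal):
    """Return, in slide order, the criteria that the appraisal does not
    mark as met. Criteria missing from the appraisal count as unmet."""
    unknown = set(appraisal) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return [name for name in CRITERIA if not appraisal.get(name, False)]


if __name__ == "__main__":
    appraisal = {name: True for name in CRITERIA}
    appraisal["Documented for Accessibility"] = False
    print(unmet_criteria(appraisal))
```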

Current Services Assigned for SEDAC LTA Data
- Preservation Services
  - Preserve Content in Original Formats
  - Preserve and Maintain Content in Supported Formats
- Dissemination Services
  - Restricted Dissemination
  - Public Dissemination

Current Levels of Services for Preservation of Data in the SEDAC LTA
- Preserve Content in Original Formats: Content is maintained in Original Formats on accessible system for the specified retention period.
- Preserve and Maintain Content in Supported Formats: Content received in Supported Formats is maintained on accessible system and is migrated to current Supported Formats.
- Supported Formats:
  - ASCII Text (txt, xml, html)
  - Comma Separated Values (csv)
  - Image Files (png, jpg, tif, gif)
  - Portable Document Format (pdf)
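As a rough illustration of the Supported Formats list, the sketch below flags files whose extensions fall outside it. The extension set is taken from the slide; the function name is hypothetical, and checking extensions is only an assumed stand-in for real format validation, which would inspect file contents.

```python
from pathlib import Path

# Extensions listed on the slide as Supported Formats.
SUPPORTED_EXTENSIONS = {
    "txt", "xml", "html",        # ASCII Text
    "csv",                       # Comma Separated Values
    "png", "jpg", "tif", "gif",  # Image Files
    "pdf",                       # Portable Document Format
}


def unsupported_files(filenames):
    """Return the file names whose final extension is not in the
    Supported Formats list (case-insensitive), in input order."""
    result = []
    for name in filenames:
        extension = Path(name).suffix.lstrip(".").lower()
        if extension not in SUPPORTED_EXTENSIONS:
            result.append(name)
    return result


if __name__ == "__main__":
    print(unsupported_files(["data.csv", "map.TIF", "notes.doc", "README"]))
```

A submission with no flagged files would be a candidate for the Supported Formats service; anything flagged could still be preserved in its original format.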

Current Levels of Services for Dissemination of Data in the SEDAC LTA
- Restricted Dissemination:
  - The resource and its Dissemination Information Package (DIP) are not accessible by the public.
  - The discovery metadata for the resource is included in the LTA restricted access catalog.
  - Access to the restricted resource is granted in compliance with the restrictions specified for the resource.
  - Limited user support is provided for a restricted resource that is authored by SEDAC in compliance with the restrictions specified for the resource.
  - The use of restricted resources and services is evaluated and reported.
- Public Dissemination:
  - The Dissemination Information Package (DIP) for the resource is freely accessible by the public in digital form.
  - The discovery metadata for the resource is included in the LTA public access catalog.
  - Contact information for user support is provided on the LTA public access catalog.
  - Responses are provided for legitimate requests to correct publicly disseminated documentation or access capabilities.
  - Answers or referrals are provided for scientific and technical questions about publicly disseminated resources.
  - Changes are described for publicly disseminated resources or services.
  - The use of publicly disseminated resources and services is evaluated and reported.

Preservation and Dissemination Services for SEDAC Data Approved for Accession to the LTA
Columns: Data Set Title; Publication Date; Preservation Service; Dissemination Service
- Environmental Treaties and Resource Indicators (ENTRI) - The Update of the Treaty Status Data; 1998; Preserve Content in Original Formats; Public Dissemination
- HALOPH: A Data Base of Salt Tolerant Plants of the World; 1989
- Environmental Subset of Collection of Multilateral Conventions at the Fletcher School of Law and Diplomacy; 1992; Preserve and Maintain Content in Supported Formats
- Freedom in the World (1995-1996); 1995, 1996
- World Resources 1996-97; 1996
- World Resources 1998-99: A Guide to the Global Environment: Environmental Change and Human Health (Data Tables)
- Gridded Population of the World (GPW) Version 1; 1995

Candidate Data for SEDAC Long-Term Archive
Candidate data sets recommended for accession to the LTA by the SEDAC Project Scientist and currently being prepared for review by the LTA Board:
- Gridded Population of the World (GPW) Version 2
- Gridded Population of the World, Version 2: Ancillary Data

Overview of Process for Transferring Selected Data to the SEDAC LTA
- Selected data sets are transferred from the SEDAC Active Archive to the SEDAC Long-Term Archive (LTA).
- The SEDAC LTA accessions, preserves, and disseminates each selected data set in accordance with the preservation and dissemination services approved for that data set.
- The SEDAC Active Archive deaccessions each data set that has been accessioned into the SEDAC LTA once that data set has been disseminated by the LTA.

SEDAC Data Repository Organization
- SEDAC Digital Object Repository
  - SEDAC Active Archive Data and Information Products
    - Public Access to Data and Information
    - Restricted Access to Data and Information
  - SEDAC Long-Term Archive Data and Information Products
    - Public Access to Data and Information
    - Restricted Access to Data and Information
- Active Archive is for near-term dissemination with high levels of service. Primary users are discipline-specific scientists.
- Long-Term Archive is for the 50-100 year preservation time-frame with different expectations for levels of service.

Conclusions: Appraisal and Selection Process Contributed to Adoption of an OAIS Compliant Digital Repository System
- Consistent with Open Archival Information System (OAIS) Framework
  - Meeting Responsibilities for the Long-Term Preservation of Data for Future Access and Use
- Prompted Prototype Implementation of Fedora Open Source Digital Repository System
  - Submission Information Packages (SIPs) Prepared for Data Approved for Submission to the LTA
  - Unique Persistent Identifiers (PIDs) Generated for each Digital Object (LTA Data Set)
  - Digital Object Contains Content and Metadata Datastreams for an OAIS Compliant Archival Information Package (AIP)
  - Changes to Digital Objects Stored as New Versions
  - Ingest and Management of Various Data Types
  - Web-Based Dissemination of Content and Metadata (Dublin Core and FGDC CSDGM)
  - Search Supported by Resource Indexing
  - Objects Assigned Behavior Definition and Dissemination Methods
  - Ingest, Store, and Export in Extensible Markup Language (XML)
  - Collection and Object Relationship Management Using Resource Description Framework (RDF) Graphs
- Prototype Assessment Revealed Need to Implement VITAL Product From VTLS
  - VITAL is based on Fedora and includes integrated features and support
  - Web-Based Discovery and Access for Public Browsing, and Simple, Advanced, and Full-Text Search Capabilities
  - Web-Based Administration and Content Manager Client for Staff
  - VITAL Batch Ingest Utility
  - VTLS Automated Loading and Electronic-submission Tool (VALET) Enables Workflow, Author Submission, Cataloging, and Review
  - Lightweight Directory Access Protocol (LDAP) Authentication Server
  - JSTOR/Harvard Object Validation Environment (JHOVE)
  - Handle system server assigns unique identifiers that are resolved to URLs
  - Search Retrieve Web / URL (SRW/SRU) and Z39.50 services
  - Generation of SHA-1 Fixity Signatures on Ingest for Integrity Validation
  - Synchronized Failover System for Contingent System and Data Recovery
- Also Reviewing Requirements to Adopt PREMIS and GML
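The SHA-1 fixity signatures mentioned among the repository features can be illustrated with Python's standard `hashlib` module. This is a generic sketch of fixity checking, not the VITAL or Fedora implementation; the function names and the chunk size are assumptions chosen for illustration.

```python
import hashlib


def sha1_fixity(path, chunk_size=65536):
    """Compute the SHA-1 digest of a file as a hexadecimal string,
    reading in chunks so that large data files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_digest):
    """Return True when the file's current SHA-1 digest matches the
    digest recorded at ingest (comparison ignores letter case)."""
    return sha1_fixity(path) == recorded_digest.lower()
```

Recording the digest at ingest and recomputing it later detects any change to the stored bytes, which is the purpose of integrity validation.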

SEDAC Long-Term Archive http://sedac.ciesin.columbia.edu/lta/