Designing Flexible Workflow for Upstream Participation of the Scientific Data Community
Robert R. Downs and Robert S. Chen

Presentation transcript:

Designing Flexible Workflow for Upstream Participation of the Scientific Data Community
Robert R. Downs and Robert S. Chen
NASA Socioeconomic Data and Applications Center (SEDAC)
Center for International Earth Science Information Network (CIESIN)
The Earth Institute, Columbia University
Prepared for presentation to the IASSIST 2010 Meeting, June 3, 2010, Cornell University, Ithaca, NY

Scientific Data are at Risk if not Archived
Replication, comparison, new, and future uses of existing data require scientific data stewardship
– Data must be identifiable, discoverable, accessible, usable, and recoverable
Data Preservation requires preparation
– Datasets need to be complete, documented, and described, and must contain permissions for their use
Stewardship of data often decreases after completion of the project that produced the data
– Some data are neglected if not archived soon after creation

Saving Scientific Data For Use By Others
Scientific data repositories can provide capabilities to submit data for archiving
– Scientist or team member submits data online
A data submission system could assist data producers in preparing and describing their data for archiving
– Data preparation prior to project completion
Capabilities for data submission must balance the need for comprehensive information about the data with the practicalities of what data producers are willing and able to provide
– Easy tools to deposit and describe data

Designing a Data Submission System
Identify Trusted Repository Requirements for Submission
Categorize Submission Services
Define Functions for Submission Services
Create Workflow for Data Submission and Review
Model Scientific Data Submission and Workflow
Review of Successful Submissions
Recommendations for Submission Services

Identifying Requirements for Submission System
Reviewed requirements for trustworthy archives and digital repositories in relevant standards and documents
– Consultative Committee for Space Data Systems (CCSDS) (2002) Reference Model for an Open Archival Information System (OAIS). Adopted as ISO 14721:2003
– CCSDS (2004) Producer-Archive Interface Methodology Abstract Standard. Adopted as ISO 20652:2006
– CCSDS. Audit and Certification of Trustworthy Digital Repositories: Draft Recommended Practice R-1 Red Book, Issue 1. (July 2009).
Initially Utilized TRAC document
– Online Computer Library Center (OCLC) and Center for Research Libraries (CRL) (2007) Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC), Version 1.0.
Identified and categorized pre-ingest requirements from TRAC
– Requirements relevant to submission and workflow prior to ingest
Identified pre-ingest requirements from R-1 (Draft ISO Standard)
– Related and additional submission and pre-ingest workflow requirements

Communication Requirements Identified From TRAC Document
A3.5 Repository has policies and procedures to ensure that feedback from producers and users is sought and addressed over time.
A3.7 Repository commits to transparency and accountability in all actions supporting the operation and management of the repository, especially those that affect the preservation of digital content over time.
B1.4 Repository’s ingest process verifies each submitted object (i.e., SIP) for completeness and correctness as specified in B1.2.
B1.6 Repository provides producer/depositor with appropriate responses at predefined points during the ingest processes.
B1.7 Repository can demonstrate when preservation responsibility is formally accepted for the contents of the submitted data objects (i.e., SIPs).
B1.8 Repository has contemporaneous records of actions and administration processes that are relevant to preservation (Ingest: content acquisition).
* Source: Online Computer Library Center (OCLC) and Center for Research Libraries (CRL). (2007). Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC), Version 1.0. OCLC & CRL. February Available:

Authentication Requirements Identified From Draft Recommended Practice (CCSDS R-1)*
3.3.4 The repository shall commit to transparency and accountability in all actions supporting the operation and management of the repository that affect the preservation of digital content over time.
The repository shall have mechanisms to appropriately verify the identity of the Producer of all materials.
The repository shall follow policies and procedures that enable the dissemination of digital objects that are traceable to the originals, with evidence supporting their authenticity.
* Source: Consultative Committee for Space Data Systems (CCSDS) Audit and Certification of Trustworthy Digital Repositories: Draft Recommended Practice. Red Book, Issue R-1 (July 2009). Available:

Digital Repository Services for Web-Based Data Submission
Authentication
– Verify identity of data producer or representative for each submission session
Data Deposit
– Gather and deposit data and documentation
Data Description
– Describe data for preservation, discovery, and use
Submission Agreement
– Establish agreement between the producer and repository
Communication
– Confirm submission, request information if needed, and notify upon ingest
Review and Approval
– Review submission information package and approve for ingest
Transformations
– Transform descriptive information and actions into metadata standards for ingest
Source: Downs & Chen (2009) Earth and Space Science Informatics Workshop.

Workflow for Web-Based Submission of Scientific Data
1. Secure authenticated login by authorized data producer or representative
– Multiple sessions may be needed to assemble submission information
2. Deposit and describe data and documentation files
– Automate and encourage descriptions for each file
3. Describe scientific data set
– Encourage unique title and offer selectable choices when possible
4. Grant permissions for data set
– Offer choices based on data type, organization, and collection
5. Submit data set
– Provide capabilities to review and modify entire package before submission
6. Notify submitter and archivist that submission was completed
– Notifications include contact information for subsequent communication
7. Review submission for completeness and correctness
– Apply appraisal criteria for collection to which data set was submitted
– Contact producer regarding questions or need for additional information
8. Approve data set for ingest to digital repository
– Notify submitter that submission has been approved for ingest into digital repository
9. Transform descriptions and actions into metadata for ingest to digital repository
– Descriptive information is converted into XML metadata and ingested into digital repository
Source: Downs & Chen (2009) Earth and Space Science Informatics Workshop.
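The nine workflow steps can be read as a small state machine. The following Python sketch is only an illustration under stated assumptions: the `Stage` names, the rule that a producer may move freely among the preparation steps across sessions, and the loop back from review to deposit when more information is needed are all hypothetical choices, not a description of the SEDAC system.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names, one per workflow step (1-9)."""
    LOGGED_IN = 1
    FILES_DEPOSITED = 2
    DATASET_DESCRIBED = 3
    PERMISSIONS_GRANTED = 4
    SUBMITTED = 5
    NOTIFIED = 6
    REVIEWED = 7
    APPROVED = 8
    INGESTED = 9


# Steps 1-4 are prepared by the producer, possibly over several sessions.
PRODUCER_STAGES = {
    Stage.LOGGED_IN,
    Stage.FILES_DEPOSITED,
    Stage.DATASET_DESCRIBED,
    Stage.PERMISSIONS_GRANTED,
}


def allowed_next(current):
    """Return the set of stages reachable from `current` in this sketch."""
    if current in PRODUCER_STAGES:
        # Assumption: the producer may revisit any preparation step,
        # but may submit only once permissions have been granted.
        reachable = PRODUCER_STAGES - {current}
        if current is Stage.PERMISSIONS_GRANTED:
            reachable = reachable | {Stage.SUBMITTED}
        return reachable
    if current is Stage.REVIEWED:
        # Assumption: the reviewer either approves the package or
        # returns it to the producer for additional information.
        return {Stage.APPROVED, Stage.FILES_DEPOSITED}
    if current is Stage.INGESTED:
        return set()  # Final stage: nothing follows ingest.
    # All remaining stages simply proceed to the next numbered step.
    return {Stage(current.value + 1)}


def advance(current, target):
    """Move to `target` if the transition is allowed, else raise ValueError."""
    if target not in allowed_next(current):
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

Encoding the transitions in one function keeps the rule "review and modify before submission, review and approve before ingest" in a single place that the web forms and the reviewer tools could both consult.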

Model for Web-Based Data Submission and Workflow
[Diagram. Actors: Data Producer, Data Reviewer. Components:]
– Authentication: Login for One or More Sessions
– Data Deposit: Provide Files and Descriptions
– Data Description: Describe Data Set
– Submission Agreement: Grant Intellectual Property Rights
– Communication: Notifications and Requests
– Review and Approval: Appraise and Approve Submission Information Package
– Transformation: Transform Values to XML Metadata
– Ingest: Archival Information Package In Digital Repository
Derived from Downs & Chen (2009) Earth and Space Science Informatics Workshop
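The Transformation component, which turns submitted values into XML metadata, might look roughly like the sketch below. It uses Python's standard `xml.etree.ElementTree`; the `dataset` root element and the field names are hypothetical and do not follow any particular metadata standard.

```python
import xml.etree.ElementTree as ET


def to_xml_metadata(values):
    """Serialize a flat dict of descriptive values as a small XML record.

    Keys must be valid XML element names. The 'dataset' root element is
    an illustrative assumption, not part of a real metadata schema.
    """
    root = ET.Element("dataset")
    for name, value in values.items():
        child = ET.SubElement(root, name)
        child.text = str(value)  # ElementTree escapes &, <, > on output.
    return ET.tostring(root, encoding="unicode")


record = to_xml_metadata({"title": "Example Data Set", "version": 1})
print(record)
```

A production transformation would map each field onto the elements of the metadata standard chosen for ingest rather than reusing the form field names directly.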

Review of Successful Data Submissions
Resources Reviewed:
– Legacy Data Submission Process
– Forms Used in Legacy Submission Process
– Descriptions of Submitted Data
– Data Collections
– Cyberinfrastructure and physical facilities
– Initial Prototype of Submission System

Support for Successful Submission
Affordances identified to address challenges for online submission of data:
– Enable Timely Preparation of Submissions
– Facilitate Authentication of Submitter
– Elicit Information to Contact Submitter
– Invite Complete Documentation
– Foster Composition of Data Descriptions
– Provide Choices to Describe Data
– Request Non-Restrictive Permissions

Enable Timely Preparation of Submissions
Challenge: Data submitted before creation is complete or a long time after creation can be incorrect or incomplete
– Previous asynchronous capabilities enabled assembly of submissions locally prior to submission.
– Submissions prior to completion can result in an addendum to replace missing or incorrect files.
– Submissions long after completion can result in delays for scheduling dissemination.
Recommendation: Encourage producers to submit data as soon as the data have been created by enabling multiple sessions for producers to prepare and submit data.

Facilitate Authentication of Submitter
Challenge: Identification of the data submitter is needed to ensure that the data producer is being represented
– Previous physical and submission capabilities enabled verification of the identity of the data provider.
– Submissions received from non-authorized individuals might not contain the correct or complete data.
– The data producer or their representative can provide rights for archiving and using the data.
Recommendation: Establish capabilities and procedures to allow data producers and their representatives to receive a username and password that can be used to log in to the data submission system when submitting data.

Elicit Information to Contact Submitter
Challenge: Submitters need to be contacted to resolve issues with submission.
Recommendation: Request or generate the complete name and address of the individual who submits the data.
– Automatically populate contact information fields upon login and request verification.
– Online form to request contact information: complete name and address
– Obtain additional contact information: institution, mailing address, telephone number

Invite Complete Documentation
Challenge: Data require documentation to facilitate understanding about the data and their applicability
– Data must be understood by those not familiar with the study.
Recommendation: Request submission of documents describing the data, their creation, and measures used.
– Methodology document (who, why, what, where, when, and how the data were obtained)
– Variable definitions and specification (location) of values (codebook)
– Descriptions of instruments, measures, and units of measurement
– Explanations of caveats, assumptions, additions, corrections
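A submission form could flag which of the requested document types are still missing before the package is submitted. The Python sketch below is a minimal illustration; the category keys and the `missing_documents` helper are hypothetical names chosen to mirror the four kinds of documentation listed above.

```python
# Hypothetical keys for the four kinds of documentation requested above.
REQUESTED_DOCUMENTS = ("methodology", "codebook", "instruments", "caveats")


def missing_documents(provided):
    """Return requested document types with no uploaded files.

    `provided` maps a document type to a list of uploaded file names.
    """
    return [kind for kind in REQUESTED_DOCUMENTS if not provided.get(kind)]


uploads = {"methodology": ["methods.pdf"], "codebook": []}
print(missing_documents(uploads))
```

Reporting the gaps as a reminder rather than a hard block would fit the stated goal of inviting, not forcing, complete documentation.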

Foster Composition of Data Descriptions: Title
Challenge: The relevance of a data set cannot always be determined from the title.
Recommendation: Provide guidance for describing the data within the title to enable discovery and to differentiate the data set from other data.
Considerations for inclusion within title:
– Purpose: Characteristic measured
– Measure: Instrument
– Location: Geographical aspects measured or political (country, state, county, city, etc.)
– Temporal Aspects: Date or range of dates when data were collected or measured
– Version: Sequential version identifier or date of release
Examples: Indicators of Coastal Water Quality: Change in Chlorophyll-a Concentration , Alaska-Argentina National Footprint Accounts, 2006 Edition, Footprint and Biocapacity by major land type by nation,
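One way a submission form might apply the title considerations is to assemble a suggested title from separately entered parts, which the producer can then edit. The sketch below is illustrative only: the `compose_title` function, the comma-separated layout, and the sample values are assumptions, not the presenters' method.

```python
def compose_title(purpose, measure="", location="", temporal="", version=""):
    """Join the non-empty title parts, in the order listed above, with commas."""
    parts = (purpose, measure, location, temporal, version)
    cleaned = [part.strip() for part in parts if part and part.strip()]
    return ", ".join(cleaned)


# All values below are made up for illustration.
suggested = compose_title(
    "Coastal Water Quality",
    location="Example Bay",
    temporal="2001-2005",
    version="Version 2",
)
print(suggested)
```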

Provide Choices to Describe Data
Challenge: Identifying terms to describe data can be time consuming
Recommendation: Provide choices from groups of controlled vocabularies to describe data
– Examples of terminology for consideration:
  ISO 19115:2003 Geographic Information – Metadata Topic Categories
  Semantic Web for Earth and Environmental Terminology (SWEET) See

Selecting Terms from Controlled Vocabulary: ISO Topic Categories
Source: Downs & Chen (2009) Earth and Space Science Informatics Workshop.

Request Non-Restrictive Permissions
Challenge: Intellectual property rights must be obtained to enable the use of data by the archive and by others.
– Unknown rights to data can restrict data stewardship and use
– Limiting the rights to data can prevent some uses of the data
Recommendation: Avoid legal terms in request for data producer to grant rights, with limited restrictions, if possible.
– Simple form with choices to be clicked, based on affiliation of submitter and type of resource
  Creative Commons License (Attribution)
  Additional data sharing options
  Public Domain (Created by Government employee)

Summary: Capabilities for Upstream Submission and Workflow
Requirements are applicable to social science and natural science data and to interdisciplinary data
Potential risk when engaging data producers early
– not knowing which data are important to preserve (but capturing more information should improve selection and appraisal)
Benefits of obtaining data through robust workflow prior to the end of the project that collects the data
– higher quality metadata, including provenance information
– reduced risk of not getting minimum metadata (e.g., when authors move on to other projects)
– lower costs overall (data are submitted when ready)
– ability to follow up with producers

References
Consultative Committee for Space Data Systems (2004) Producer-Archive Interface Methodology Abstract Standard. (CCSDS B-1). Also: Space data and information transfer systems – Producer-archive interface – Methodology abstract standard (ISO 20652:2006). Available:
Consultative Committee for Space Data Systems (2002) Reference Model for an Open Archival Information System (OAIS). Also: Space data and information transfer systems – Open archival information system – Reference model (ISO 14721:2003). Available:
Consultative Committee for Space Data Systems (CCSDS) Audit and Certification of Trustworthy Digital Repositories: Draft Recommended Practice. Red Book, Issue R-1 (July 2009). Available:
Downs RR, Chen RS (2009) Designing Submission Services for a Trustworthy Digital Repository of Interdisciplinary Scientific Data. Earth and Space Science Informatics Workshop: Developing the Next Generation of Earth and Space Science Informatics. August 3-5, University of Maryland, Baltimore County. Available:
Downs RR, Chen RS (2010) Designing Submission and Workflow Services for Preserving Interdisciplinary Scientific Data. Earth Science Informatics. Available:
Nestor Working Group, Trusted Repositories - Certification (2006) Catalogue of Criteria for Trusted Digital Repositories, Version 1. Available:
Online Computer Library Center (OCLC) and Center for Research Libraries (CRL) (2007) Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC), Version 1.0. OCLC & CRL. February Available:
The Digital Curation Centre (DCC) and Digital Preservation Europe (DPE) (2007) Digital Repository Audit Method Based on Risk Assessment (DRAMBORA). Available: