1
Active Data Curation in Libraries: Issues and Challenges
ASEE ELD Presentation, June 27, 2011
William H. Mischo & Mary C. Schlembach
2
Active Data Curation
–Curation is the active use of data. It is a lifecycle process.
–Curation requires discipline-specific knowledge and experience.
–Domain-dependent curation rules and preservation actions must be merged into the scientific workflow processes.
–We need to automate data ingest, descriptive metadata creation, preservation, and digital object relationships.
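As a rough illustration of what automating ingest and descriptive metadata creation could look like, the sketch below records a checksum and a few descriptive fields for a file at ingest time and later re-checks the checksum. The function names and record fields are hypothetical, chosen for illustration only; a production system would map them to a schema such as Dublin Core or MODS.

```python
import hashlib
import os
from datetime import datetime, timezone


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_record(path, creator, discipline):
    """Build a minimal descriptive record for one file at ingest time.

    All field names here are assumptions, not a published schema.
    """
    return {
        "title": os.path.basename(path),
        "creator": creator,
        "discipline": discipline,
        "size_bytes": os.path.getsize(path),
        "sha256": sha256_of(path),
        "ingested": datetime.now(timezone.utc).isoformat(),
    }


def fixity_ok(path, record):
    """Re-compute the checksum and compare it with the stored value."""
    return sha256_of(path) == record["sha256"]
```

A periodic job could call `fixity_ok` on every stored file and flag any object whose checksum no longer matches its ingest record.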
3
The Grainger Library Active Data Curation Lifecycle Elements (diagram; element labels listed below)
–Scientific Workflow
–Knowledge Creation Tools
–Fedora/Hydra Trusted Digital Repository (OAIS compliant)
–Metadata Management: METS, PREMIS, MODS, DC, XSLT
–Preservation Actions
–Appraisal and Selection
–Migration and Emulation Tools
–Use, Reuse, Repurposing Tools
–Access Mechanisms and E-Scholarship Services, GrIPs
–SIP packages
–DIP packages
–AIPs, OAI-ORE
–CI-3, CI-5 Responses
–Ingest scripts: fixity, integrity, authentication, transformation
–Curation Rule Engine: operates on metadata and content objects. It is domain dependent and can be invoked explicitly, but it is also automated based on system trigger events.
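The two ways of invoking a curation rule engine, explicitly and through system trigger events, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Grainger Library implementation: the class names, the event string, and the example rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    name: str
    domain: str                      # discipline the rule applies to
    trigger: str                     # system event that fires the rule
    action: Callable[[dict], str]    # preservation action on one object


@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def invoke(self, name: str, obj: dict) -> str:
        """Explicit invocation: run one named rule on one object."""
        for rule in self.rules:
            if rule.name == name:
                return rule.action(obj)
        raise KeyError(name)

    def on_event(self, event: str, obj: dict) -> List[str]:
        """Automated invocation: run every rule whose trigger and domain match."""
        return [
            rule.action(obj)
            for rule in self.rules
            if rule.trigger == event and rule.domain == obj.get("domain")
        ]


# Hypothetical rule: queue a format migration for biophysics objects.
engine = RuleEngine()
engine.register(
    Rule(
        name="migrate-trajectory",
        domain="biophysics",
        trigger="format-obsolete",
        action=lambda obj: f"queued migration for {obj['id']}",
    )
)
```

Keeping the domain on the rule rather than in the engine is one way to let each discipline contribute its own rules without changing the engine itself.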
4
Say What?
–What is the role of the library? The engineering librarian? The campus? The subject discipline?
–Libraries are creating content asset preservation systems. Trusted Digital Repositories.
–Fedora/Hydra/Archivematica at the UIUC Library.
–Role for the science/engineering library: connecting data to literature.
–The knowledge creation process and libraries.
–GrIPs (Group Information Profiles).
–NSF Data Management Plans.
6
What Data Should Be Curated?
Defining data curation:
–DataNet projects: Data Conservancy (Hopkins), DataONE (New Mexico).
–Purdue profiles.
–Raw data and processed data.
We surveyed several groups in specific disciplines:
–Atmospheric Sciences (experimental data)
–Biophysics (simulation data)
7
Atmospheric Science: Experimental Data
Five levels and two data streams:
–Level 1: raw voltages from an instrument
–Level 2: calibrated data derived from raw voltages
–Level 3: image products displaying the data
–Level 4: derived parameters, statistics, etc. from calibrated data
–Level 5: analysis of Level 4 data that winds up in papers, publications, etc.
Two other necessary data streams: ancillary instrument information and metadata.
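To make the chain from one level to the next concrete, the sketch below models the five levels as an enumeration and walks a few sample values from Level 1 to Level 4. The linear calibration (a gain and an offset, standing in for the ancillary instrument information) and the choice of statistics are assumptions for illustration; the surveyed groups' actual processing is not described in the slides.

```python
from enum import IntEnum
from statistics import mean


class Level(IntEnum):
    RAW_VOLTAGES = 1
    CALIBRATED = 2
    IMAGE_PRODUCTS = 3
    DERIVED_PARAMETERS = 4
    PUBLISHED_ANALYSIS = 5


def calibrate(raw_voltages, gain, offset):
    """Level 1 -> Level 2, using a hypothetical linear calibration."""
    return [gain * v + offset for v in raw_voltages]


def derive_statistics(calibrated):
    """Level 2 -> Level 4: simple summary statistics."""
    return {
        "mean": mean(calibrated),
        "min": min(calibrated),
        "max": max(calibrated),
    }


raw = [1.0, 2.0, 3.0]                      # made-up Level 1 values
level2 = calibrate(raw, gain=2.0, offset=0.5)
level4 = derive_statistics(level2)
```

The example shows why the ancillary stream matters for curation: without the gain and offset, Level 2 cannot be regenerated from the archived Level 1 data.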
8
Biophysics: Simulation Data
Modeling of molecular interactions at the atomic level. Three levels:
–Level 1: raw data from a simulation run: positions and velocities of particles; the software is widely used.
–Level 2: various extracts of the raw run data for subsets of particles.
–Level 3: visualization files (movies, images); analysis products generated from the visualization data for publication.
Also necessary are input parameters (starting coordinates, etc.) and other metadata.
9
Data Management Plan
The Data Management Plan (DMP) is a new mandatory NSF supplementary document for all research proposals.
–http://www.nsf.gov/bfa/dias/policy/dmp.jsp
Each directorate, including the Engineering Directorate (ENG), is providing specific directions and required elements.
The ENG document: http://nsf.gov/eng/general/ENG_DMP_Policy.pdf
10
Data Management Plan
The digital data to be archived includes analyzed data (typically the data that will go into articles and papers) and the metadata that defines the data that was generated.
For Engineering Directorate grants, raw data from sensors or other instruments is not required to be archived.
11
Data Management Plan
The DMP has a maximum of two pages and will not count against the 15-page limit for proposals.
The UIUC Grainger Library has prepared an overview document and a template for DMPs, and is working on a wizard.
As part of the NSF Ethics CORE Digital Library, we are working on an RCR (Responsible Conduct of Research) requirement database and wizard.