Published by Denis Weaver. Modified over 9 years ago.
1
Data Sharing, Small Science, and Institutional Repositories
Melissa H. Cragin & Carole L. Palmer, Center for Informatics Research in Science and Scholarship, Graduate School of Library and Information Science, University of Illinois
Jacob R. Carlson & Michael Witt, Purdue University Libraries
2
A view from the Institutional Repository
- Advancing university-based cyberinfrastructure depends on our understanding of how to support data practices and needs.
- Sharing is at the heart of success: collecting, storing, and making use of data can only come after the means for sharing are in place.
- We cannot collect and curate all data, particularly in a way that facilitates effective re-use.
- We will need to work with researchers to develop selection and appraisal guidelines, and data services.
3
Data Curation Profiles Project
Project focus: which data are researchers willing to share, when, and with whom?
Objectives:
- derive requirements for managing data sets in IRs
- develop policies for archiving and access
- identify librarian roles & skill sets for supporting data management, sharing & curation
Fields: Biochemistry, Biology, Civil Engineering, Electrical Engineering, Food Sciences, Earth and Atmospheric Sciences, Soil Science, Anthropology, Geology, Plant Sciences, Kinesiology, Speech and Hearing
4
Methods
- Institutional Review Board approval for Human Subjects Research
- increasingly focused, materials-based interviews
- Pre-interview Worksheet
- Requirements Worksheet
- “data set” samples
- Data Curation Profiles: http://www.datacurationprofiles.org/
5
“Faculty of the Environment” Data Needs Project
Collaborators: Bryan Heidorn, Michelle Wander, U of I Environmental Council
6
Smallish Science
- single PI (often)
- often dependent on graduate students
- ad hoc data management systems
- idiosyncratic sharing practices
- “success” dependent on using one’s own data
But…
- may be working at community level
- may be producing all digital data
- may be conducting “data-driven” science
- may be producing very large data sets
7
Data Characteristics

| | Crystallography | Geology |
|---|---|---|
| Type | 1. “Raw data”: most information rich, long-term value for re-use. … 4. “CIF file”: crystallography exchange; most commonly shared data type | 1. “Reduced spreadsheet”: table with average values for multiple observations; most often requested by others |
| Format | 1. Binary data: image. 4. Crystallographic Information File: text (field-wide standard for numerical data) | 1. Excel spreadsheet |
| Size | 1. Each image or “frame” is ¼ to 1 MB; a set is approx. 2,400 frames = approx. 1 GB. 4. > 500 KB | 1. Spreadsheet size: under 1 MB |
| Intellectual Property/Data Owners | Service model: provide a service to chemists by solving crystal structures. Ownership of the data is ambiguous and requires negotiation before data “hand-off” | Depends on source of funding: governmental and private grants, gov. institutions, industry. Ownership of and rights to the data range from full to very limited; some long-term “embargoes” |
| Accessibility | Field-wide repositories. Many journals require deposit of CIF files. OAI-PMH tools becoming available for CIF files | Difficult and ad hoc. Well-known researchers receive direct requests for data, often based on publications |

Profiling complexities & differences
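As a rough check on the crystallography size figures above, the arithmetic can be sketched in a few lines. This assumes the slide's "Mb" and "Gb" mean megabytes and gigabytes and that 1 GB = 1024 MB; the function and constant names are illustrative only and do not come from the presentation.

```python
# Figures taken from the slide: approx. 2,400 frames per set, 1/4 to 1 MB per frame.
# Assumption: "Mb"/"Gb" on the slide mean megabytes/gigabytes, with 1 GB = 1024 MB.
FRAMES_PER_SET = 2400
FRAME_MB_MIN = 0.25
FRAME_MB_MAX = 1.0


def set_size_gb(frames, mb_per_frame):
    """Return the total size of a frame set in GB."""
    return frames * mb_per_frame / 1024


low = set_size_gb(FRAMES_PER_SET, FRAME_MB_MIN)
high = set_size_gb(FRAMES_PER_SET, FRAME_MB_MAX)
print(f"{low:.2f} GB to {high:.2f} GB per set")
```

Under these assumptions a full set of raw frames lands in the range of roughly half a gigabyte to a little over two gigabytes, which is consistent with the slide's "approx. 1 GB" and far larger than the CIF file that is most commonly shared.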
8
Findings
Distinguishing exchange from open sharing
- exchange: sharing amongst collaborators is a primary concern, often with significant barriers
- (more) open access: limited by need for control and reward system, but also
Sharing with wider “publics” is conditioned by both data management pressures and personal experience
- the “known person – cost” algorithm
- incidents of misuse
What is most easily or willingly shared is not always the data that has the most re-use value
9
Examples of what, and when

| Field | Specific Research Area | Form to be shared | Formats | Type of data set | Size | Shared when? |
|---|---|---|---|---|---|---|
| Atmospheric science | severe weather modeling | compressed output of the model | Vis5D | 1 file / dataset | 10-100 MB | 4-6 month embargo |
| Agronomy | water quality, drainage, and plant growth | cleaned and reviewed sensor and hand-collected sample data | .xls | approx. 100 files | ~1 MB each, up to 20 MB | After publication |
| Geology | rock, water and microbes | averaged sensor and hand-collected sample data; photographs | .xls; jpg | 1 file; images | < 1 MB | After publication |
| Civil Engineering | traffic movement | cleaned and normalized sensor data | MySQL (postgresql) | 1 database | approx. 1000 K/day | 1 month to 1 year embargo |
10
Implications for Institutional Repositories
- embargo services are a *must* (~66%, 14/20)
- clear, explicit data citation information in IR records
- disconnect: application of metadata standards highly important, but many unaware of existing standards
- preservation services are needed to support re-use: 11/19 participants said their data would be useful for more than 10 years.
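To make the embargo requirement concrete, here is a minimal sketch of how a repository might decide whether a deposited data set can be released yet. It is not drawn from the project or from any IR platform: the function names, the month-based embargo length, and the sample dates are all hypothetical, chosen to mirror the embargo periods reported by participants (for example, 4-6 months, or 1 month to 1 year).

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month
    (e.g. 31 January + 1 month gives the last day of February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def is_released(deposited, embargo_months, today):
    """Return True once the embargo period has fully elapsed."""
    return today >= add_months(deposited, embargo_months)


# Hypothetical example: a data set deposited on 15 August 2009
# under a 6-month embargo.
deposited = date(2009, 8, 15)
release_date = add_months(deposited, 6)
print(release_date.isoformat())
print(is_released(deposited, 6, date(2010, 1, 1)))
```

A real service would also need to handle "after publication" conditions, which depend on an external event rather than a fixed period, so a date calculation like this covers only part of the embargo cases in the examples table.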
11
Leadership Opportunities for Libraries
Supporting the science process
- data exchange infrastructure
- support for data management planning
- data literacy instruction: integral to scientific information work
Broader implications for academic institutions
12
Thank you
This research is supported by the Institute of Museum and Library Services (IMLS), grant # LG-06-070032-07.
D. Scott Brandt, PI
Co-PIs: M. Witt & J. Carlson (Purdue) and C. Palmer & S. Shreeves (UIUC)
RAs: D. Leiter (Purdue) and M. Kogan (UIUC)