Data Sharing Practices: Implications for Curation and Re-use

Carole L. Palmer & Tiffany Chao
Center for Informatics Research in Science & Scholarship
Graduate School of Library & Information Science
University of Illinois at Urbana-Champaign

GSLIS Research Showcase
30 March 2012

Data Practices team

Team members: Carole Palmer, Tiffany Chao, Nic Weber, Karen Baker, Andrea Thomer

  small science
  complex, heterogeneous data
  implications for data curation
  value for re-use across disciplines

Comparative analysis of researchers in the earth and life sciences.
Qualitative analysis of worksheets and interviews conducted with scientists.
Investigation of data production and use in relation to curation needs, cultures of sharing, and re-use potential.

Curation Profiles Project: What can be shared when?

Field: Agronomy
  Specific research area: water quality, drainage, and plant growth
  Form to be shared: cleaned, reviewed
  Type of data set: sensor; hand-collected samples
  Formats: .xls
  Size: approx. 100 files, ~1 MB each, up to 20 MB
  Shared when?: After publication

Field: Geology
  Specific research area: rock, water and microbes
  Form to be shared: averaged
  Type of data set: sensor; hand-collected samples; photographs
  Formats: .xls; jpg
  Size: 1 file; images < 1 MB
  Shared when?: After publication

Field: Civil Engineering
  Specific research area: traffic movement
  Form to be shared: cleaned, normalized
  Type of data set: sensor
  Formats: MySQL, PostgreSQL
  Size: 1 database; approx. K/day
  Shared when?: 1 month to 1 year embargo

Production vs. reuse / wholes and parts

Fields: Geobiology, Volcanology, Soil ecology, Sensor science

Data unit
  Geobiology: Time series (site specific): spreadsheets, microscopy images, annotated digital "field photos"
  Volcanology: Rock profile: physical rock, thin section, chemical analysis, photographs, field notes
  Soil ecology: Database: multiple abiotic soil measures, associated metadata
  Sensor science: Database: soil data, sensor data

User communities
  Geobiology, Geology, Chemistry, Microbiology, U.S. Park Service, Geology – igneous petrology, Geophysics, Geochemistry, Biochemistry, Earthworm ecology, Network Science, Computer Science

Sharing conventions
  by request, no repository, mostly post-pub, some unpublished, by request, no repository, public resource collection, reference data, industry, limits – customization, "vertical" dev.

Far from collective, shared data infrastructure

Curation of functional groupings:
Exposing data very different from supplying by request.
Complex mis-use concerns:
  Misinterpretation – presumed problems
  Misappropriation – actual premature re-use
  Disregard of good faith practices – how used, what referenced

Scholarly record of data collected and analyzed
Unit for long-term preservation
Organization for retrieval
Raw material for future research