Arizona Astronomical Data Hub. AAS 227: Dark/Orphaned Data. P. Bryan Heidorn, University of Arizona. January 2016
Thesis: Large projects have well-planned data stores. Large amounts of data remain uncurated: orphan data. Much of that data is currently largely invisible: dark data. This data should be curated professionally in collaboration with scientists. There is a need for long-lived institutions.
Power Law of Science Data: f(x) = a x^k + o(x^k). [Figure: data volume versus science projects and initiatives, with f(x) = a x^k + o(x^k) restricted to x < 0.20.]
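The power law on this slide can be explored numerically. The following is a minimal sketch, not the author's analysis: it assumes x is the rank of a project by data volume, keeps only the leading term a x^k (the o(x^k) remainder is ignored), and uses illustrative values for a, k, and the number of projects. The function names are hypothetical.

```python
def power_law_volumes(n_projects, a=1.0, k=-1.0):
    """Volumes for projects ranked 1..n under f(x) = a * x**k.

    Assumption: x is the project's rank by data volume, and only the
    leading term of the power law is modelled.
    """
    return [a * rank ** k for rank in range(1, n_projects + 1)]


def head_share(volumes, head_fraction=0.20):
    """Fraction of the total volume held by the largest projects.

    `head_fraction` is the share of projects counted as the "head";
    everything else is the long tail.
    """
    ordered = sorted(volumes, reverse=True)
    head_count = max(1, round(len(ordered) * head_fraction))
    return sum(ordered[:head_count]) / sum(ordered)


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the talk.
    volumes = power_law_volumes(n_projects=1000, a=1.0, k=-1.0)
    share = head_share(volumes)
    print(f"Head (top 20% of projects): {share:.1%} of total volume")
    print(f"Long tail (remaining 80%):  {1 - share:.1%} of total volume")
```

Changing k shows how strongly the split between the head and the long tail depends on the exponent, which is the quantity one would need to estimate from real funding or volume data.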
Does NSF’s Data Follow the Power Law? I do not know, but if $1 = X bytes…
Dark data is data that we know is or was there but cannot see. Image: Hubble Space Telescope composite showing a "ring" of dark matter in the galaxy cluster Cl
Software Infrastructure for Sustained Innovation: Christine Borgman, UCLA; Ian Foster, University of Chicago; Bryan Heidorn, University of Arizona; Tom Howe, University of Washington; Carl Kesselman, University of Southern California
Cyberinfrastructure Vision: “The anticipated growth in both the production and repurposing of digital data raises complex issues not only of scale and heterogeneity, but also of stewardship, curation and long-term access.” NSF Cyberinfrastructure Vision for 21st Century Discovery, Chapter 3
Recognition of need for data curation: “Recommendation 6: The NSF, working in partnership with collection managers and the community at large, should act to develop and mature the career path for data scientists and to ensure that the research enterprise includes a sufficient number of high-quality data scientists.” Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, Recommendations
Interagency Working Group on Digital Data: recognition of the importance of information; recognition of the need for education; new work roles within traditional institutions
AADH Workshop July 2015 28 Astronomers, software developers, librarians, AAS, VPR and School of Information
Accelerate for Success Partnership: School of Information; Department of Astronomy and Steward Observatory; iPlant Collaborative; Library; AAS
AADH Broad Objectives: refine mission, science and education use cases; prevalence of orphaned data; take advantage of iPlant/CyVerse, Library and School of Information infrastructure and longevity; obtain community buy-in and manage expectations; establish short- and long-term funding
AADH Y1 Goals: develop a science advisory board to help guide and assist the project staff; collect data from AAS publications by University of Arizona researchers between 2005 and 2015 (2500 articles in AAS Journals); 1086 papers with author affiliation of the National Optical Astronomy Observatory; 343 journal articles from Arizona State University authors
Develop data/software catalog; adopt (meta-)data formats; write policy documents for curators and authors; ingest selected data sets; develop discovery tool (e.g., WWT); create educational material; hold follow-on data/software carpentry workshops
The iPlant CyVerse Collaborative. Discovery Environment: use hundreds of apps and manage data in a simple web interface. Bisque: image analysis environment. Atmosphere: a custom cloud-based scientific analysis platform, or use a ready-made one for your area of scientific interest. Data Store: store, manage, access, and share all the data related to your research.
Overcoming Barriers: reduce pain of metadata; reduce pain of data format; discourage bad behavior; reward good behavior
From repositories to collaborative space
Also… We are hiring a faculty member in Data Science (assistant professor, tenure-eligible), and also an Astronomy postdoc.