The Astrolabe Project: Identifying and Curating Astronomical ‘Dark Data’ through Development of Cyberinfrastructure Resources
Gretchen Stahlman, PhD Candidate, University of Arizona School of Information
Library and Information Services in Astronomy (LISA) VIII, June 7, 2017
Astrolabe
Astrolabe is a new data repository and computational environment being created at the University of Arizona (UA).
Partners include:
- UA School of Information
- UA Department of Astronomy and Steward Observatory
- UA University Libraries
- CyVerse (formerly the iPlant Collaborative)
- American Astronomical Society (AAS)
Astrolabe has been funded by:
- UA Office for Research & Discovery (now RDI)
- National Science Foundation (NSF), Division of Advanced Cyberinfrastructure (ACI)
The “Astrolabe” Project
Astrolabe's mission is to:
- Collect, preserve, and disseminate research data
- Provide tools for data analysis and data sharing
- Expose research data that would otherwise remain hidden
Archive Management
[Figures. Image credits: Digital Curation Centre, www.dcc.ac.uk; http://data-archive.ac.uk/create-manage/life-cycle]
Lifecycle Preservation and Access
Curation “is the active management and appraisal of digital information over its entire life cycle” (Pennock, 2007).
Curation requires in-depth knowledge of both the data and the communities that create and use them.
Resources must be developed to support the publication of, and links between, research AND data.
“Dark Data” in the Long Tail
Large projects have well-planned data stores, while large amounts of data from smaller projects remain uncurated (Heidorn, 2008).
“Like dark matter, this dark data on the basis of volume may be more important than that which can be easily seen” (p. 281).
Long-tail data require institutions, practices, and policies that make these data useful to researchers.
Long Tail Distribution in Astronomy
The “Top 20%”
Curating the Long Tail with Astrolabe
General Properties of “Dark Data”
“Dark data” are typically:
- Heterogeneous
- Generated through unique procedures
- Curated by individual scientists
- Not maintained
- Obscured or protected
- Seldom reused
- Currently unnoticed
Astronomical Data
Common data types:
- Sky images
- Light curves
- Spectroscopy
- Catalogs
Common data format:
- FITS (Flexible Image Transport System)
Culture of open access
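FITS is the common format named above, so a short illustration of its header layout may help readers from outside astronomy. The sketch below is a minimal, standard-library-only reader for a FITS header. It assumes the standard layout: 80-character ASCII "cards", with the keyword in columns 1-8 and the value indicator "= " in columns 9-10, packed into 2880-byte blocks and terminated by an END card. The function names are hypothetical. The sketch ignores long-string continuation cards and HIERARCH keywords. In practice, a maintained library such as astropy.io.fits would be the safer choice.

```python
CARD_LEN = 80  # bytes per header card; 36 cards fill one 2880-byte FITS block


def _card_value(field: str) -> str:
    """Extract the value from columns 11-80 of a 'KEYWORD = value' card."""
    field = field.strip()
    if field.startswith("'"):
        # Character string: ends at the first single quote that is not doubled ('').
        chars, i = [], 1
        while i < len(field):
            if field[i] == "'":
                if field[i + 1:i + 2] == "'":  # doubled quote is a literal quote
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(field[i])
            i += 1
        return "".join(chars).rstrip()  # trailing blanks in FITS strings are not significant
    # Logical, integer, or floating-point value: drop any "/ comment".
    return field.split("/", 1)[0].strip()


def parse_fits_header(data: bytes) -> dict:
    """Return {keyword: raw value string} for the first header in `data` (hypothetical helper)."""
    header = {}
    for offset in range(0, len(data), CARD_LEN):
        card = data[offset:offset + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            return header
        if card[8:10] == "= ":  # value indicator; COMMENT/HISTORY cards lack it and are skipped
            header[keyword] = _card_value(card[10:])
    raise ValueError("no END card found; truncated or not a FITS header")
```

Values are returned as raw strings, so a caller decides whether "16" should become an integer or "T" a boolean. This keeps the sketch small and avoids guessing types.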
American Astronomical Society (AAS)
- Key professional society for astronomers in the US
- Hosts two major conferences each year
- Non-profit organization
- Publishes four major journals:
  - The Astronomical Journal (AJ)
  - The Astrophysical Journal (ApJ)
  - The Astrophysical Journal Letters (ApJL)
  - The Astrophysical Journal Supplement Series (ApJS)
CyVerse.org
- Discovery Environment: use hundreds of apps and manage data in a simple web interface
- Atmosphere: create a custom cloud-based scientific analysis platform, or use a ready-made one for your area of scientific interest
- Data Store: store, manage, access, and share all the data related to your research
CyVerse Services
Astrolabe Organizational Model
July 2015 Workshop Outcomes
- Identify the mission and clear science use cases
- Take advantage of CyVerse cyberinfrastructure and the longevity of the University of Arizona
- Obtain community buy-in and manage expectations
- Focus on “low-hanging fruit,” such as data not curated elsewhere and data underlying figures in journal articles
- Develop a follow-on workshop for additional feedback
July 2016 Workshop Outcomes
- Physical format of dark data (e.g., historical data stored on tapes)
- Data archived on author websites (which are typically not long-lived)
- LSST (Large Synoptic Survey Telescope) time-domain and serendipitous-data use cases (follow-up of LSST observations, and discovery through historical data)
- Searching the literature for references to dark data (indicative text, broken links, etc.)
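Searching the literature for indicative text could start as simple pattern matching over article full text. The sketch below assumes the plain text is already available. Several parts are illustrative assumptions, not Astrolabe's actual method:
- the phrase list;
- the function name find_dark_data_hints;
- the heuristic that URLs containing "/~" point to personal, and likely short-lived, author pages.

Checking whether the extracted links are broken would require a separate network step, which is omitted here.

```python
import re

# Hypothetical phrase list: wording that often signals data not deposited in an archive.
INDICATIVE_PATTERNS = [
    r"available (?:up)?on (?:reasonable )?request",
    r"available from the (?:first |corresponding )?authors?",
    r"(?:data|tables?) (?:are|is) available at",
]
# Match http(s) URLs up to whitespace or common closing punctuation.
URL_RE = re.compile(r"https?://[^\s)\]>,;]+")


def find_dark_data_hints(text: str) -> dict:
    """Return indicative phrases, all URLs, and URLs that look like personal pages."""
    phrases = [
        m.group(0)
        for pattern in INDICATIVE_PATTERNS
        for m in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]
    # Drop a sentence-ending period that the URL pattern swallowed.
    urls = [u.rstrip(".") for u in URL_RE.findall(text)]
    # Assumed heuristic: "~user" paths are typically personal web space.
    personal_urls = [u for u in urls if "/~" in u]
    return {"phrases": phrases, "urls": urls, "personal_urls": personal_urls}
```

A hit is only a lead for the team's manual review, not proof that data are at risk. Many "available upon request" datasets may in fact be archived elsewhere.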
Astrolabe Timeline
2013: AAS Strategy Meeting
2015: Workshop #1 in Tucson, funded by a UA Start for Success seed grant
2016:
- UA Accelerate for Success grant awarded for a one-year pilot; collaborators include the iSchool, Steward Observatory, UA Libraries, CyVerse, and AAS
- Changed the name from Arizona Astronomical Data Hub (AADH) to Astrolabe to reflect a focus beyond Arizona
- Established a Board of Directors
- Held Workshop #2, focusing on specifying requirements for the Astrolabe system
- NSF ACI grant awarded to develop WorldWide Telescope as the Astrolabe front end and visualization tool, with the idea that this approach could scale to other repositories
WorldWide Telescope (WWT)
[Figure: screenshot of the WWT HTML5 web client, worldwidetelescope.org]
Current Status of Astrolabe: 2017 Activities and Objectives
- Searching for uncurated or “at-risk” data by mining the literature, and by contacting individual authors based on our team’s review of particular types of publications
- Contracted a developer to accomplish the objectives of the recently awarded NSF grant (award #1642446); additional developers will be hired
- Preparing funding proposals for system development, including a project to create protocols for migrating data from obsolete media into Astrolabe
- Collaborating with CyVerse to develop and optimize interfaces, apps, metadata templates and indexing, cone search, and Virtual Observatory (VO) support
- Installed Montage to convert FITS images to JPEG and then to TOAST (the tiled sky projection used by WWT)
- Designing a website as an interface to the CyVerse Data Store to facilitate data deposition and reuse
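A cone search, one of the CyVerse collaboration items listed above, returns every catalog source within a given angular radius of a sky position. The IVOA Simple Cone Search protocol, for instance, expresses a query as RA, DEC, and a search radius SR in decimal degrees. The sketch below is a minimal in-memory version that uses the haversine formula for great-circle separation. The Source type and the function names are hypothetical. A production service would use a spatial index (e.g., HEALPix or a database extension) rather than a linear scan.

```python
import math
from typing import List, NamedTuple, Sequence


class Source(NamedTuple):
    name: str
    ra_deg: float   # right ascension, degrees
    dec_deg: float  # declination, degrees


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation between two sky positions (haversine formula), in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp guards against rounding pushing h slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(sources: Sequence[Source], ra: float, dec: float, sr: float) -> List[Source]:
    """Return sources within `sr` degrees of (ra, dec), nearest first (linear scan)."""
    hits = []
    for s in sources:
        sep = angular_separation_deg(ra, dec, s.ra_deg, s.dec_deg)
        if sep <= sr:
            hits.append((sep, s))
    hits.sort(key=lambda pair: pair[0])  # sort by separation only
    return [s for _, s in hits]
```

The haversine form stays numerically well behaved for the small radii typical of cone searches. Because separation is measured on the sphere, the RA wraparound at 0°/360° needs no special case.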
Our Team
Principal Investigators:
- Bryan Heidorn, PhD, UA School of Information
- Dennis Zaritsky, PhD, UA Department of Astronomy
AAS Affiliate:
- Julie Steffen, AAS Director of Publishing
WWT Developer:
- Jonathan Fay, AAS Contractor and Microsoft Software Engineer
Postdoctoral Researcher:
- Huanian Zhang, PhD, UA Department of Astronomy
Graduate Research Associate:
- Gretchen Stahlman, UA School of Information
Astrolabe Advisory Board Members:
- Robert Hanisch (NIST)
- Chris Lintott (Oxford/AAS)
- Barbara Kern (U of Chicago)
- Julie Steffen (AAS)
- Frank Timmes (AZ State/AAS)
- Benjamin Weiner (Steward/UA)
- Edwin Henneken (ADS)
- Henry “Trae” Winter, Astrolabe Advisory Board Chair (CfA)
Thank you!
http://astrolabe.arizona.edu
This material is based upon work supported by the National Science Foundation under Grant No. 1642446.