1
Open Data in Astronomy Sky Surveys
Ashley E. Sands
Christine L. Borgman

From CHI paper: These sites offer comparisons in rationales and policy interpretations of open data, which are shaped by their differing scientific objectives. While policy rationales and implementations shape infrastructures for scientific data, these rationales are also shaped by pre-existing infrastructure (i.e., policy and science shape each other).

Both surveys planned for data openness from the beginning of their projects, but what does that mean? This talk shows how definitions of data openness can differ even among very similar projects within one scientific field: optical astronomy sky surveys.
2
Research Methods
Document analysis
  Public and private documents and artifacts
  Official and unofficial versions of scientific practice
Ethnography
  Observing activities on site
  Embedded for days or months at a time
Interviews
  Questions based on our research themes
  Compare multiple sites over time
3
Sloan Digital Sky Survey (SDSS-I/II)
Survey from
Telescope in New Mexico, USA
Terabytes of data total
Scientific focus
  Galaxy formation
  Data reuse in other sub-fields
Alfred P. Sloan Foundation is largest funder
  Tens of millions of dollars

The Sloan Digital Sky Survey (SDSS) is one of the most ground-breaking surveys in the history of astronomy. The survey covered over a quarter of the night sky with high quality photometric and spectroscopic imaging. The first phase of the SDSS project (SDSS-I) was followed by a second phase (SDSS-II), and subsequent related projects continue today. The SDSS data are openly available to astronomers and the general public through data releases. The SDSS-I/II collaboration has issued its final data release. The collection to be curated, which I am discussing today, constituted about 130 Terabytes of astronomical observations.

The SDSS dataset is significant in terms of scope, quality, public access, and variety of uses and users. SDSS was the first ground-based astronomy survey to ensure prompt public release of data, with only a short proprietary period to clean the data and prepare them for release. Most science papers employing SDSS data are written by end-users unaffiliated with the official collaboration. Many collaborative telescope projects now emulate the open data practices pioneered by SDSS.

Image: Messier 51, The Whirlpool Galaxy
4
Large Synoptic Survey Telescope (LSST)
Survey from (expected)
Telescope in Chile
15 Terabytes of data per night
Science focus
  Galaxy formation, evolution
  Near-earth objects
  Milky Way
National Science Foundation is primary funder
  Over 1 Billion dollars (expected)

The LSST plans to make a map of the sky every three to four days, resulting in an estimated 15 terabytes of data collected each night over the course of ten years of observations. While the SDSS had a short proprietary period for data processing, the LSST expects to release data immediately, without a proprietary period for the project's investigators.

However, funding requirements do constrain the policies and practices of data release. To ensure that the approximately one-billion-dollar project is fully funded, the LSST uses data access to entice institutions and countries to contribute resources to the project. While SDSS data were released over the web and made open internationally, LSST expects to offer different levels of data access, determined by each country's partnership level. Given the scale of investment necessary for infrastructure, data collection, processing, maintenance, and sustaining access to the resulting datasets, creative funding models may be crucial for future projects on the scale of these sky surveys.
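To give a rough sense of the difference in scale between the two surveys, the figures quoted in the notes (about 130 terabytes for the SDSS-I/II collection, an estimated 15 terabytes per night over ten years for LSST) can be combined in a back-of-envelope calculation. This is only a sketch: the number of observing nights per year is not stated in the talk, so the value of 300 used below is an illustrative assumption, and the function names are hypothetical.

```python
import math

# Figures quoted in the talk
SDSS_TOTAL_TB = 130          # approximate size of the SDSS-I/II collection
LSST_TB_PER_NIGHT = 15       # estimated LSST nightly data volume
LSST_SURVEY_YEARS = 10       # planned length of LSST observations

# ASSUMPTION (not from the talk): usable observing nights per year
ASSUMED_NIGHTS_PER_YEAR = 300


def lsst_total_tb(nights_per_year: int = ASSUMED_NIGHTS_PER_YEAR) -> int:
    """Estimated raw LSST volume in terabytes over the whole survey."""
    return LSST_TB_PER_NIGHT * nights_per_year * LSST_SURVEY_YEARS


def nights_to_match_sdss() -> int:
    """Whole LSST nights needed to reach the SDSS-I/II collection size."""
    return math.ceil(SDSS_TOTAL_TB / LSST_TB_PER_NIGHT)


if __name__ == "__main__":
    total = lsst_total_tb()
    print(f"LSST estimate: {total} TB (about {total / 1000:.0f} PB)")
    print(f"Ratio to SDSS-I/II: about {total / SDSS_TOTAL_TB:.0f}x")
    print(f"Nights to match SDSS-I/II: {nights_to_match_sdss()}")
```

Under these assumptions, LSST would collect in under two weeks roughly what the SDSS-I/II collection holds in total. Changing the assumed nights per year shifts the ten-year estimate but not that basic contrast.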
5
Project Timelines

[Figure: timeline spanning 1985 to 2030 in five-year steps. Phase labels for SDSS: Concept & Funding Applications, Design & Construction, Commissioning, Operations SDSS-I & II, Operations SDSS-III & IV. Phase labels for LSST: Concept & Funding Applications, Design & Development, Construction & Commissioning, Operations.]

The sky surveys are on different schedules. LSST conceptualization began ten years after that of SDSS. Note the 45-year time span: these are long and overlapping projects.
6
Characteristics of Openness
What are considered to be data
Why data are shared
When data are made available
To whom data are made available
How long data are made available

Despite both being sky surveys, and with many of the SDSS team members now leading LSST, there are differences in the ways that "open data" was interpreted in the two groups.

These are the five factors you will go through with each of the two sites. There is one slide for each point on this slide, in order. This then matches slide #11, the comparison slide.

Image: Astronomer at work
7
What are data?

[Diagram labels: SDSS, LSST; Observations (Images, Spectra, Catalogs); Software and Code]

SDSS AND LSST ARE DIFFERENT: OBSERVATIONAL DATA ARE THE SAME, BUT PROMINENCE AND SHARING OF CODE ARE DIFFERENT.

Image: SDSS SkyServer interface example, classic.sdss.org
8
Why are data shared?
Improve scientific efficiency
Increase feedback and iterative testing
Meet funder requirements
Encourage amateur astronomers and citizen scientists

CB to AES: This is an aggregation of what we learned from both, right? Is that what you mean by "the same"?
YES. THESE ARE THE REASONS WHY ASTRONOMERS IN LSST AND IN SDSS SHARE DATA. SDSS AND LSST ARE THE SAME.

FROM SIG CHI (written just about SDSS, but true of both projects): Motivations for data openness in SDSS and LSST

We identified four primary motivations for opening up SDSS data. First, the collaboration mentioned benefits that we describe as improving the efficiency of the science [61:3]. As with many kinds of science, making SDSS images and spectra available means that the data do not need to be collected again for most kinds of research until a new wave of telescope or imaging capabilities arrives. Telescope time saved on repetitive observations can be used to increase the importance and usefulness of the scientific information collected.

A second kind of motivation is what we refer to as quality-related [74,76]. Open dissemination of the SDSS data is useful to the project because it increases the number of astronomers working with the data and software, and thus the amount and diversity of helpful feedback the collaboration receives on ways to improve the dataset. For LSST, the same is true of making the SOFTWARE open source: they hope to have feedback all along.

A third motivation for data openness, which we learned from our interviewees, was that of ensuring continued funding from the NSF. In particular, in order to ensure distribution of the public funds, the SDSS team released the Early Data Release (EDR) [58] as an act of good faith to the NSF.

Finally, the SDSS community identified some benefits of making data open to the public for educational and research purposes [3]. Amateur involvement in astronomy has an extremely rich history and has been critical for many discoveries of new objects [17], much more so than in the majority of other scientific disciplines [67]. A sophisticated infrastructure has emerged over the decades to support and integrate amateur observations into the body of astronomy knowledge [43]. SDSS very much regarded itself as part of this tradition, and also anticipated that members of the public might be able to contribute to astronomy through the use of SDSS data.

Image: Astronomer manipulating data for analysis (2012 photo)
9
When are data made available?
Table columns: SDSS | LSST planned
Release dataset annually, accessible through user-friendly online interface (SDSS: YES)
Release transient event data within 60 seconds of observation (SDSS: --)
Provide storage, software, computing platform, APIs for analysis and code development

SDSS AND LSST ARE DIFFERENT.

SDSS: Data releases occur approximately every year. This gives the team time to process the data so that end-users can use them. While there is technically a proprietary period, it is rarely used because processing generally takes about the full period. The proprietary period thus gave team members early access mainly to the raw data, and perhaps only days or weeks of early access to the completed data release.

LSST: Two main layers of release.

Level 1: Within one minute of the initial data being captured, they plan to issue "alerts". These will announce an occurrence in the sky and encourage individuals to follow up using other telescopes. Alerts are nearly immediate, and there is no proprietary period. There is only minor processing to take the captured photons and convert them into something that can be categorized and distributed.

Level 2: Very similar to the SDSS data releases. About every year there will be a full "release". These are highly processed.

"Level 3": Essentially APIs plus computing and storage space that researchers can use to navigate the LSST data. It is more an infrastructure to enable research, and less a "data product" in its own right.
10
To whom are data made available?
"SDSS data have been released to the scientific community and the general public…"
"Scientists in the US and Chile, LSST's International Affiliates, and the general public are invited to share in this voyage of discovery"

CB to AES: The image shows only LSST, so it is not apparent how the two differ. Can you make paired images or otherwise show the match?

SDSS AND LSST ARE DIFFERENT.

SDSS data are made available online for free, to all levels of expertise and to any country around the world.

LSST, however, is still trying to finish collecting all of the funds necessary for operations. To entice new donors, they use data access as a carrot for universities and countries to "buy in" to the project. The current plan is for the United States and Chile (where the telescope is located) to have complete access to the data, while other countries will need to develop individual agreements with the project. The details are still being determined as to how security standards will be implemented and how an individual's country will be established. Will they require individual logins that include the user's country? Will they block IP addresses outside of the approved countries? These decisions have yet to be made.
11
How long are data made available?
"By altering the traditional interactions between a telescope, its data and communities of astronomers, Sloan [SDSS] … is indeed a legacy to be celebrated." Kennicutt Jr, R. C. (2007). Astronomy: Sloan at five. Nature, 450(7169), 488–489.
"The ultimate deliverable of LSST is not the telescope, nor the instruments; it is the fully reduced data." Juric, M. (2014, March). LSST data management: Data products and software stack overview. Presented at the Joint DES-LSST Workshop, Fermilab.

SDSS AND LSST ARE THE SAME. Both promise availability only until the end of the current period of data collection. Since SDSS is further along, its data have already been sustained beyond that point. We do not know whether the same will hold for LSST, since the staff are currently encouraged to focus only on the current round of funding: Construction.

FROM CHI: Once the data have been released, astronomers around the world use the data for their scientific objectives, which may necessitate further refinement and processing. Such data products, derived from work conducted outside of the SDSS collaboration, can include catalogs that combine SDSS data with other sources of data. The resulting derived data products have been locally processed by individuals and small groups and tend to be stored on university computer networks or personal computers, with archival and sharing practices that are local and ad hoc. SDSS project documents did not specify preservation and access for derived and hybrid data products produced by end-user astronomers; these products therefore do not follow a standardized openness, sharing, or preservation policy [4,20].

While both teams expect the data to remain relevant for decades, the SDSS and LSST grant proposals did not include requests for funding for long-term access and preservation. This may be a consequence of the way the grants are structured and organized. Others in both the SDSS and LSST have said, "well, we made the data available, so we've done what we promised". These individuals implied that once the data were made available, end-users and other institutions would see the value of the data and care for them moving forward.

SDSS and LSST collaboration teams manage data as part of the projects. However, the goal of these surveys is not data per se, but to further scientific knowledge. Scientists are the end users of these data. Datasets retrieved from SDSS may be used alone or in combination with other datasets; similar models of science are expected for LSST. "Derived" datasets result from further processing by scientific users. While the SDSS and LSST projects were designed for data sharing and management, individuals and small groups of researchers are rarely able to provide long-term access to their derived datasets (Sands, 2016).

SDSS data continue to be used for scientific investigations years after they were initially collected. These data are also used to design and calibrate the next generation of astronomy sky surveys, including the LSST.
12
Open data in SDSS and LSST
Similar
  Why data are shared
  How long data are made available
Different
  What are data
  When data are made available
  To whom data are made available

CB to AES: I can talk over this, but what is not clear is that SDSS is over and LSST has not yet begun. The comparison is between current practice and planned practices. What's a better way to make that temporal transition more apparent?
PERHAPS ADD IN THE SLIDE THAT IS CURRENTLY #14 SOMEWHERE, EITHER HERE OR AT THE START, TO BRIEFLY POINT OUT THE TEMPORAL DISTINCTION?

Although both projects are optical sky surveys, and thus come from a very specific subset of science, we found more differences than similarities in why and how data are shared. This is a clear example of the complexity of data management, curation, and sustainability in science. Policy-makers should neither underestimate nor conflate the differences between these communities.
13
Acknowledgements
UCLA Center for Knowledge Infrastructures
  Milena Golshan, Irene Pasquetto, Bernie Randles, Peter Darch
Research subjects
  Sloan Digital Sky Survey
  Large Synoptic Survey Telescope
Funding: Alfred P. Sloan Foundation
  The Transformation of Knowledge, Culture, and Practice in Data-Driven Science. # C.L. Borgman, PI; S. Traweek, Co-PI.
knowledgeinfrastructures.gseis.ucla.edu/
@UCLA_KI
14
Extra slides after this one
15
Openness for Valuable Astronomy Data
"... [The SDSS] imaging archive should be a world resource for many decades of the next century" (Margon, 1998, SDSS Article).
"The LSST data products will change every field of astronomical study…" (2014, NSF Proposal).

Open access to the SDSS data has been a critical part of ensuring the success of the project, serving practical and scientific goals, including the ability to secure further funding. The LSST plans to generate open data, but under a different policy framework for how, when, and to whom the data are accessible.

While open sky survey data are often used in the course of scientific research, the resulting derived datasets are rarely made open. Concepts of open data vary widely between the SDSS team, the LSST team, and individual sky survey data users (Pasquetto, Sands, Darch, & Borgman, 2016). To reach the full potential of discovery available in the era of data-intensive science, digital data must be continuously managed over extended periods of time.

"The ultimate deliverable of LSST is not the telescope, nor the instruments; it is the fully reduced data. All science will be [sic] come from survey catalogs and images" (Juric, 2014, sec. 2).
16
Sky Survey Differences in Scale
Example of differing scientific scale:
SDSS: The astronomy survey, originally intended to provide quantitative data for the study of galaxies, has proven beneficial to nearly every subfield of astronomy.
LSST: Intends to provide data for many fields from the beginning. The 2009 "Science Book" has 11 different chapters, each covering a different subset of astronomy research (solar system, supernovae, weak lensing, transients, etc.).