LSST: Preparing for the Data Avalanche through Partitioning, Parallelization, and Provenance
Kirk Borne (Perot Systems Corporation / NASA GSFC and George Mason University / LSST)


LSST: Preparing for the Data Avalanche through Partitioning, Parallelization, and Provenance
Kirk Borne (Perot Systems Corporation / NASA GSFC and George Mason University / LSST)

The LSST research and development effort is funded in part by the National Science Foundation under Scientific Program Order No. 9 (AST ) through Cooperative Agreement AST. Additional funding comes from private donations, in-kind support at Department of Energy laboratories, and other LSSTC Institutional Members: National Optical Astronomy Observatory; Research Corporation; The University of Arizona; University of Washington; Brookhaven National Laboratory; Harvard-Smithsonian Center for Astrophysics; Johns Hopkins University; Las Cumbres Observatory, Inc.; Lawrence Livermore National Laboratory; Stanford Linear Accelerator Center; Stanford University; The Pennsylvania State University; University of California, Davis; University of Illinois at Urbana-Champaign.

ABSTRACT: The Large Synoptic Survey Telescope (LSST) project will produce 30 terabytes of data daily for 10 years, resulting in a 65-petabyte final image data archive and a 70-petabyte final catalog (metadata) database. The telescope will begin science operations in 2014 at Cerro Pachón in Chile. It will operate with the largest camera in use in astronomical research: 3 gigapixels, covering 10 square degrees, roughly 3000 times the coverage of one Hubble Space Telescope image. One co-located pair of 6-gigabyte sky images is acquired, processed, and ingested every 40 seconds. Within 60 seconds, notification alerts for all objects that are dynamically changing (in time or location) are sent to astronomers around the world; we expect roughly 100,000 such events every night. Each spot on the available sky will be re-imaged in pairs approximately every 3 days, resulting in about 2000 images per sky location after 10 years of operations.
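As a back-of-envelope sanity check of the rates quoted above (the 10-hour night length is our assumption, not stated in the text; the stated 30 TB/day evidently includes processed products beyond the raw pixel stream):

```python
# Back-of-envelope arithmetic for the figures in the abstract.
# ASSUMPTION: a 10-hour observing night (not stated in the source text).
SECONDS_PER_NIGHT = 10 * 3600
CADENCE_S = 40                # one co-located 6-GB image pair every 40 seconds
PAIR_GB = 6
DAILY_TB = 30                 # stated total daily data volume
DAYS = 10 * 365               # 10 years of operations

pairs_per_night = SECONDS_PER_NIGHT // CADENCE_S     # image pairs acquired per night
raw_image_tb = pairs_per_night * PAIR_GB / 1000      # raw pixels alone, in TB
total_archive_pb = DAILY_TB * DAYS / 1000            # cumulative volume at 30 TB/day

print(pairs_per_night)        # 900 pairs per night
print(raw_image_tb)           # ~5.4 TB/night of raw pixels
print(total_archive_pb)       # ~109.5 PB over the survey
```

The ~110 PB cumulative figure is the same order of magnitude as the stated 65 PB image archive plus 70 PB catalog database.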
The processing, ingest, storage, replication, query, access, archival, and retrieval functions for this dynamic data avalanche are currently being designed and developed by the LSST Data Management (DM) team, under contract from the NSF. Key challenges to success include: processing this enormous data volume; real-time database updates and alert generation; the dynamic nature of every entry in the object database; the complexity of the processing and schema; the requirements for high availability and fast access; spatial-plus-temporal indexing of all database entries; and the creation and maintenance of multiple versions and data releases.

To address these challenges, the LSST DM team is implementing solutions that include database partitioning, parallelization, and provenance (generation and tracking). The prototype LSST database schema currently has over 100 tables, including catalogs for sources, objects, moving objects, image metadata, calibration and configuration metadata, and provenance. Techniques for managing this database must satisfy intensive scaling and performance requirements; they include data and index partitioning, query partitioning, parallel ingest, replication of hot data, horizontal scaling, and automated fail-over.

In the area of provenance, the LSST database will capture all information needed to reproduce any result ever published. Provenance-related data include: telescope/camera instrument configuration; software configuration (software versions, policies used); and hardware setup (configuration of the nodes used to run LSST software). Provenance is very dynamic, in the sense that the metadata to be captured change frequently, so the schema must be flexible enough to accommodate this. In our current design, over 30% of the schema is dedicated to provenance.
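The spatial partitioning idea above can be sketched minimally: assign every catalog row a chunk id derived from its sky position, so that data, indexes, and queries can all be routed by chunk. The stripe size and function below are illustrative assumptions, not the actual LSST partitioning scheme.

```python
# Hypothetical spatial partitioner: divides the sky into fixed-size
# declination bands and RA stripes, and maps (ra, dec) in degrees to an
# integer chunk id. Illustrative only, not the real LSST scheme.
def chunk_id(ra: float, dec: float, stripe_deg: float = 10.0) -> int:
    """Return the partition id for a sky position given in degrees."""
    dec_band = int((dec + 90.0) // stripe_deg)      # band 0 at the south pole
    n_ra_stripes = int(360.0 // stripe_deg)         # stripes per declination band
    ra_stripe = int((ra % 360.0) // stripe_deg)
    return dec_band * n_ra_stripes + ra_stripe
```

Because nearby sources land in the same chunk, a cone-search query need only touch a handful of partitions instead of scanning the full catalog, and parallel ingest can write disjoint chunks concurrently.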
Our philosophy is this: (1) minimize the impact of reconfiguration by avoiding tight coupling between data and provenance, correlating hardware and software configurations with science data via a single ProcessingHistory_id; and (2) minimize the volume of provenance information by grouping together objects with identical processing history.

The sample database schemas shown here cover Image, Calibration, and Provenance. Each illustrates the vast collection of system parameters used to track the state of the end-to-end data system: what was the state of the telescope and camera during an image exposure, what were the values of the numerous calibration parameters, and what were the state and version of all relevant hardware and software components in the pipeline during data processing and ingest.

Solution: track all of these system parameters (provenance) with a single (unique) Processing_ID.
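The grouping philosophy above can be sketched with a toy two-table schema: each science object carries one provenance foreign key, and a single ProcessingHistory row captures the entire configuration shared by every object produced under it. The table and column names are illustrative assumptions, not the real LSST schema.

```python
import sqlite3

# Toy sketch of provenance grouping via a single id.
# ASSUMPTION: table/column names are illustrative, not the LSST schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ProcessingHistory (
    procHistoryId INTEGER PRIMARY KEY,
    swVersion     TEXT,   -- pipeline software version
    policyId      TEXT,   -- configuration/policy used
    nodeConfig    TEXT    -- hardware setup of the processing nodes
);
CREATE TABLE Object (
    objectId      INTEGER PRIMARY KEY,
    ra            REAL,
    decl          REAL,
    procHistoryId INTEGER REFERENCES ProcessingHistory(procHistoryId)
);
""")

# One provenance row serves many objects with identical processing history,
# keeping provenance volume small.
con.execute("INSERT INTO ProcessingHistory VALUES (1, 'v3.2', 'nightly-default', '64-node-cluster-A')")
con.executemany("INSERT INTO Object VALUES (?, ?, ?, 1)",
                [(101, 10.5, -30.2), (102, 10.6, -30.1)])

# Reproducing a published result: one join recovers the full configuration
# under which any given object was produced.
row = con.execute("""
    SELECT p.swVersion, p.policyId
    FROM Object o
    JOIN ProcessingHistory p ON o.procHistoryId = p.procHistoryId
    WHERE o.objectId = ?
""", (101,)).fetchone()
print(row)   # ('v3.2', 'nightly-default')
```

A reconfiguration (new software version, new hardware) then means inserting one new ProcessingHistory row and pointing subsequently produced objects at it, rather than rewriting per-object metadata.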