What to Expect at the LSST Archive: The LSST Science Platform
Mario Juric, University of Washington
LSST Data Management Subsystem Scientist, for the LSST Data Management Team
LSST 2017 Project and Community Workshop, August 14-18, 2017

What Will LSST Deliver?
Level 1: A stream of ~10 million time-domain events per night, detected and transmitted to event distribution networks within 60 seconds of observation, and a catalog of orbits for ~6 million Solar System bodies.
Level 2: A catalog of ~37 billion objects (20B galaxies, 17B stars), ~7 trillion observations (“sources”), and ~30 trillion measurements (“forced sources”), produced annually and accessible through online databases, together with reduced single-epoch and deep co-added images.
Level 3: A system to make it easier to create user-produced, added-value data products (deep KBO/NEO catalogs, variable star classifications, shear maps, …).
For more details, see the “Data Products Definition Document”, http://ls.st/lse-163

How Do We Make Those Data Available? (Diagram: the LSST archive, the Internet, and LSST users, with the connecting piece marked “?”.) Dumping FITS tables onto an FTP site will not suffice in the 2020s…

Modeling LSST User Needs
A large majority of users are likely to begin by accessing the dataset through a web portal interface. They wish to become familiar with the LSST data set, and may query smaller subsets of data for “at home” analysis. Some users will wish to use the tools they’re accustomed to (e.g., TOPCAT, Aladin, AstroPy) to grab the data from the LSST archive.
Some fraction of our users may choose to continue their analysis by utilizing resources available to them at the DAC. This avoids the latency (and the local resources) associated with downloading large subsets of the LSST dataset. Their science cases may not require much computing, but are limited by storage, latency, or simply by having the right software prerequisites installed. They would benefit from a prepared, next-to-the-data analysis environment utilizing the 10% Level 3 allocation.
Use cases demanding larger resources may be able to acquire them at adjacent computing facilities (e.g., XSEDE). These users will benefit from connectivity to such resources.
Finally, for the most demanding use cases, rights-holders may utilize their own computing facilities to support larger-scale processing, or even stand up their own Data Access Centers. They need the ability to move, re-process, and/or re-serve large datasets.

Accessing LSST Data and Enabling LSST Science: The LSST Science Platform
(Architecture diagram: LSST users connect over the Internet to the Portal, JupyterLab, and Web APIs, which sit atop the Data Releases, Alert Streams, User Databases, User Files, User Computing, and Software Tools at the DAC.)
The LSST Science Platform is a set of integrated web applications and services deployed at the LSST Data Access Centers (DACs) through which the scientific community will access, visualize, and subset the data, and perform next-to-the-data analysis.

The LSST Science Platform Aspects: Portal, JupyterLab, Web APIs
The Science Platform exposes the underlying DAC services through three user-facing aspects: the Portal, the JupyterLab, and the Web APIs. Through these, we enable access to the Data Releases and Alert Streams, and support next-to-the-data analysis and Level 3 product creation using the computing resources available at the DAC.

LSST Portal: The Web Window into the LSST Archive
The Web Portal to the archive will enable browsing and visualization of the available datasets in ways users are accustomed to at archives such as IRSA, MAST, or the SDSS archive, with an added level of interactivity. Through the Portal, users will be able to view LSST images, request subsets of data (via simple forms or SQL queries), construct basic plots and visualizations, and generally explore the LSST dataset in a way that allows them to identify and access the (subsets of) data required by their science case. This will all be backed by a petascale-capable RDBMS.
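To give a flavor of what “request subsets of data via SQL queries” means, the sketch below shows the kind of cone-search request a user might paste into the Portal’s query form or submit programmatically. The table and column names (Object, objectId, ra, decl, psfMag_r) are hypothetical placeholders, not the actual LSST schema; the geometry functions are standard ADQL.

```python
# A minimal sketch of a cone-search subset request of the kind the Portal's
# SQL/ADQL query box might accept. Table and column names are hypothetical;
# CONTAINS/POINT/CIRCLE are standard ADQL 2.0 geometry functions.
cone_search = """
SELECT objectId, ra, decl, psfMag_r
FROM Object
WHERE CONTAINS(POINT('ICRS', ra, decl),
               CIRCLE('ICRS', 150.0, 2.2, 0.05)) = 1
  AND psfMag_r < 24.0
"""
print(cone_search)
```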

LSST Portal: The Web Window into the LSST Archive
We currently have an initial version of the Portal running at NCSA, built on the Firefly web science user interface (Wu et al. 2016, ADASS).
Datasets currently loaded: SDSS Stripe 82, AllWISE.
Coming soon: HSC (LSST-reprocessed), NEOWISE.

JupyterLab: Next-to-the-data Analysis
The tools exposed through the Web Portal will permit simple exploration, subsetting, and visualization of LSST data. They may not, however, be suitable for more complex data selection or analysis tasks. To enable that next level of next-to-the-data work, we plan to let users launch their own Jupyter notebooks on our computing resources at the DAC. These will have fast access to the LSST database and files, and will come with commonly used and useful tools preinstalled (e.g., AstroPy, the LSST data processing software stack). This service is similar in nature to efforts such as SciServer at JHU, or the JupyterHub deployment for DES at NCSA.
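To make the idea concrete, here is a minimal sketch of the kind of notebook cell this environment is meant to support, assuming the Gen2 Butler interface of the 2017-era LSST stack; the repository path and data ID below are hypothetical.

```python
# A sketch of a next-to-the-data notebook cell: read a calibrated exposure
# with the (Gen2) LSST stack Butler and inspect its pixels in the same session.
from lsst.daf.persistence import Butler  # 2017-era (Gen2) import path
import numpy as np

butler = Butler("/datasets/some_repo")             # hypothetical DAC-local repository
calexp = butler.get("calexp", visit=1234, ccd=42)  # hypothetical data ID

# Get a numpy view of the image pixels and do a quick sanity check.
pixels = calexp.getMaskedImage().getImage().getArray()
print(pixels.shape, np.median(pixels))
```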

JupyterLab: Next-to-the-data Analysis
YouTube demo of the LSST JupyterLab Aspect: http://ls.st/bgt

Web APIs: Integrating With Existing Tools
The backend platform services – such as access to databases, images, and other files – will be exposed through machine-accessible Web APIs. We have a preference for industry-standard and/or VO APIs (e.g., WebDAV, TAP, SIA): the goal is to support what’s broadly accepted within the community. This will allow the discoverability of LSST data products from within the Virtual Observatory, federation of the LSST data set with other archives, and the use of widely utilized tools (e.g., TOPCAT).
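Because these are standard VO protocols, any generic client should work against them. Below is a minimal sketch using the pyvo library; the endpoint URLs are hypothetical placeholders, and the table and column names in the query are illustrative rather than the actual LSST schema.

```python
# A sketch of programmatic access through standard VO protocols with pyvo.
# The endpoint URLs, table name, and column names are hypothetical.
import pyvo

# Tabular data via TAP (Table Access Protocol).
tap = pyvo.dal.TAPService("https://data.lsst.example/tap")
table = tap.search(
    "SELECT TOP 10 objectId, ra, decl FROM Object"
).to_table()  # returns an astropy Table
print(table)

# Image discovery via SIA (Simple Image Access), over a 0.05 deg region.
sia = pyvo.dal.SIAService("https://data.lsst.example/sia")
images = sia.search(pos=(150.0, 2.2), size=0.05)
print(len(images), "images found")
```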

Computing, Storage, and Database Resources
Computing, file storage, and personal databases (the “user workspace”) will be made available to support work via the Portal and within the notebooks. An important feature is that no matter how a user accesses the DAC (Portal, Notebook, or VO APIs), they always “see” the same workspace.

How big is the “LSST Science Cloud” (@ DR2)?
Computing: ~2,400 cores (~18 TFLOPs). File storage: ~4 PB. Database storage: ~3 PB.
This is shared by all users. We estimate the number of potential DAC users will not exceed 7,500 (relevant for file and database storage). Not all users will be accessing the computing cluster concurrently; we estimate on the order of ~100 at a time.
Though this is a relatively small cluster by 2020-era standards, it will be sufficient to enable preliminary end-user science analyses (working on catalogs and smaller numbers of images) and the creation of some added-value (Level 3) data products. Think of this as having your own server with a few TB of disk and database storage, right next to the LSST data, with a chance to use tens to hundreds of cores for analysis.
For larger endeavors (e.g., pixel-level reprocessing of the entire LSST dataset), users will want to use resources beyond the LSST DAC (more later).
These resources will be made available to the users of the U.S. Data Access Center. All DAC users will begin with some default (small) allocation, with more likely to be made available via a (TBD) proposal process.
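As a rough illustration of what “shared by all users” means, the back-of-envelope split below divides the DR2-era numbers evenly across the estimated user counts. The even split is purely an assumption for illustration; actual quotas will come from the default allocation plus the (TBD) proposal process.

```python
# Back-of-envelope per-user shares of the DR2-era DAC resources, assuming an
# even split across users (an illustrative assumption, not the allocation policy).
cores, file_pb, db_pb = 2400, 4, 3          # numbers from the slide above
potential_users, concurrent_users = 7500, 100

print(f"file storage per user : {file_pb * 1e6 / potential_users:.0f} GB")  # ~530 GB
print(f"database per user     : {db_pb * 1e6 / potential_users:.0f} GB")    # ~400 GB
print(f"cores per concurrent  : {cores / concurrent_users:.0f}")            # ~24 cores
```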

How We (Think We) Will Work with LSST Data
(Architecture diagram: LSST users connect over the Internet to the Portal, JupyterLab, and Web API aspects, backed by the Data Releases, Alert Streams, User Databases, User Files, User Computing, and Software Tools.)

Mapping the Platform to Model Users
Most users are likely to begin with the Web Portal, to become familiar with the LSST data set and query smaller subsets of data for “at home” analysis. Some may use the tools they’re accustomed to (e.g., TOPCAT, Aladin, AstroPy) to grab the data using LSST’s VO-compatible APIs.
Some users may choose to continue their analysis by utilizing resources available to them at the DAC. They’ll access these through Jupyter notebook-type remote interfaces, with access to a mid-sized computing cluster. It’s quite possible that a large fraction of end-user (“single PI”) science may be achievable this way.
Users who need larger resources may be able to apply for them at adjacent computing facilities. For example, U.S. computing is located in the National Petascale Computing Facility (NPCF) at the National Center for Supercomputing Applications (NCSA), and significant additional supercomputing is expected to be available at the same site (e.g., NPCF currently hosts the Blue Waters supercomputer).
Finally, rights-holders may utilize their own computing facilities to support larger-scale processing, or even stand up their own Data Access Centers. Since our software (pipelines, middleware, databases) is open source, they may re-use it to the extent possible.

Putting it all together: the LSST Science Platform
(Architecture diagram, as on the previous slides.)
For more details, see the “LSST Science Platform Vision Document”, http://ls.st/lsp

Backups

Level 3: Added-value Data Products
Level 3 data products are added-value products created by the community. These may enable science use cases not fully covered by what we’ll generate in Levels 1 and 2:
Custom processing of deep drilling fields
SNe photometry (e.g., CFHT-LS-type forward modeling)
Extremely crowded-field photometry (e.g., globular clusters)
Characterization of diffuse structures (e.g., ISM)
Custom measurement algorithms
Catalogs of SNe light echoes
The user computing and storage present in the LSST Science Platform are meant to enable next-to-the-data realization of use cases like these. Level 3 software and data products may be migrated to Level 2 (with the owners’ permission); this is one of the ways Level 2 products will evolve.