API Aspect of the Science Platform
Fritz Mueller, DAX T/CAM
August 16, 2017

APIs supporting the Science Platform
- Web APIs: VO standard protocols where possible; custom extensions where necessary, proposed back to IVOA when applicable
  - Catalog & other tabular data (dbserv)
  - Image data (imgserv)
  - Metadata (metaserv)
- Python objects (Data Butler)
- SQL database (Qserv)
LSST2017 - Tucson, AZ - August 14 - 18, 2017

Web APIs: Catalog & Other Tabular Data (dbserv)
- TAP (TAP 1.1, once it reaches Recommendation status)
- ADQL (ADQL 2.1, once it reaches Recommendation status)
- Return data formats: VOTable (1.3, per TAP 1.1); additionally JSON and SQLite
- gzip compression through HTTP
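
To make the TAP/ADQL bullet concrete, here is a minimal sketch of how a client might assemble a synchronous TAP request with gzip negotiated over HTTP. The base URL, schema, table, and column names are hypothetical (the slides do not give them), and `RESPONSEFORMAT` is the TAP 1.1 parameter name; TAP 1.0 services use `FORMAT` instead. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the slides do not give a dbserv URL.
TAP_BASE_URL = "https://example.org/dbserv/tap"

# Hypothetical schema, table, and column names, used only for illustration.
ADQL = (
    "SELECT TOP 10 ra, decl FROM hypothetical_schema.object "
    "WHERE CONTAINS(POINT('ICRS', ra, decl), "
    "CIRCLE('ICRS', 9.5, -1.0, 0.1)) = 1"
)


def build_tap_sync_request(adql, response_format="votable"):
    """Build (but do not send) a synchronous TAP query request."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "QUERY": adql,
        "RESPONSEFORMAT": response_format,
    }
    url = f"{TAP_BASE_URL}/sync?{urlencode(params)}"
    # Ask the server for a gzip-compressed response body.
    return Request(url, headers={"Accept-Encoding": "gzip"})


request = build_tap_sync_request(ADQL)
print(request.full_url)
```

Passing the result to `urllib.request.urlopen` would issue the query against a real service; whether a given service accepts a JSON or SQLite value for the response format is an assumption to check against that service's capabilities.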

Web APIs: Image Data (imgserv)
- Discovery: SIA (SIA V2); also TAP to database per IVOA ObsCore Data Model and CAOM2
- Retrieval: direct via SIA-returned URL when possible, or SODA
- Format: at least FITS; others could be added
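
As an illustration of the discovery step, the following sketch builds an SIA V2 query URL for a circular region, restricted to FITS. The base URL is hypothetical; `POS` and `FORMAT` are standard SIA V2 query parameters. The response would be a table of ObsCore records whose access URL column points at the image for direct retrieval.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not give an imgserv URL.
SIA_BASE_URL = "https://example.org/imgserv/sia"


def build_sia2_query_url(ra_deg, dec_deg, radius_deg,
                         image_format="application/fits"):
    """Build an SIA V2 discovery URL for images overlapping a circle."""
    params = {
        # SIA V2 circle syntax: "CIRCLE <lon> <lat> <radius>", in degrees.
        "POS": f"CIRCLE {ra_deg} {dec_deg} {radius_deg}",
        "FORMAT": image_format,
    }
    return f"{SIA_BASE_URL}/query?{urlencode(params)}"


url = build_sia2_query_url(9.5, -1.0, 0.1)
print(url)
```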

Web APIs: Metadata (metaserv)
- Catalog and image metadata
- IVOA ObsCore DM, VOResource/RegTAP, CAOM2 where possible
- Otherwise, TAP access to "native" metadata tables

Python Object API (Data Butler)
- Abstracts storage technology, layout, and data encoding via a generic key/value data identification API and plug-in architecture
- Rich, in-memory Python objects in and out
- How pipeline payloads deal with LSST data products
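
The idea of key/value data identification over pluggable storage can be sketched with a toy model. This is not the LSST Data Butler implementation or its API: `ToyButler`, `InMemoryStorage`, and the `calexp`/`visit`/`ccd` names are all hypothetical, chosen only to show how a caller can name a dataset by type and key/value pairs without knowing where or how it is stored.

```python
class InMemoryStorage:
    """Hypothetical storage plug-in that keeps objects in a dict."""

    def __init__(self):
        self._objects = {}

    def write(self, key, obj):
        self._objects[key] = obj

    def read(self, key):
        return self._objects[key]


class ToyButler:
    """Toy model: routes (dataset type, data ID) to a storage plug-in."""

    def __init__(self, default_storage):
        self._default = default_storage
        self._plugins = {}

    def register(self, dataset_type, storage):
        """Use a specific storage plug-in for one dataset type."""
        self._plugins[dataset_type] = storage

    def _storage_for(self, dataset_type):
        return self._plugins.get(dataset_type, self._default)

    @staticmethod
    def _key(dataset_type, data_id):
        # Sort the key/value pairs so argument order does not matter.
        return (dataset_type, tuple(sorted(data_id.items())))

    def put(self, obj, dataset_type, **data_id):
        key = self._key(dataset_type, data_id)
        self._storage_for(dataset_type).write(key, obj)

    def get(self, dataset_type, **data_id):
        key = self._key(dataset_type, data_id)
        return self._storage_for(dataset_type).read(key)


butler = ToyButler(InMemoryStorage())
butler.put({"pixels": [[0, 1], [2, 3]]}, "calexp", visit=1234, ccd=5)
image = butler.get("calexp", ccd=5, visit=1234)
print(image)
```

Swapping `InMemoryStorage` for a plug-in that reads files or an object store would leave the calling code unchanged, which is the point of the abstraction.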

SQL Database (Qserv)
- A custom MPP RDBMS front-end, developed at SLAC, to meet the scaling and performance requirements of LSST

SQL Database (Qserv), cont.
- Leverages existing OSS technologies on the back-end
  - Data shards hosted in MariaDB database instances
  - XRootD for scalable data-addressed message and data transport
- Spherical geometry w/ overlap
- Advanced shared-scan architecture to support high level of concurrency
- Optimized for immutable data use case
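
The notion of spherical partitioning with overlap can be illustrated with a deliberately simplified sketch. It is not Qserv's partitioning code: the stripe and chunk counts, the overlap radius, and the function names are assumptions, and the chunk width in RA is kept fixed rather than scaled with declination. Each row belongs to one home chunk and is additionally copied into the overlap of any neighbouring chunk whose border lies within the overlap distance, so near-neighbour joins can run without cross-node traffic.

```python
def chunk_id(ra_deg, dec_deg, num_stripes=18, chunks_per_stripe=36):
    """Map a sky position to a chunk: declination stripes split in RA."""
    stripe_height = 180.0 / num_stripes
    stripe = min(int((dec_deg + 90.0) / stripe_height), num_stripes - 1)
    chunk_width = 360.0 / chunks_per_stripe
    chunk = int((ra_deg % 360.0) / chunk_width) % chunks_per_stripe
    return stripe * chunks_per_stripe + chunk


def overlap_chunks(ra_deg, dec_deg, overlap_deg=0.1):
    """Return neighbouring chunks that should also hold a copy of the row.

    Simplification: probes the position shifted by +/- overlap_deg in RA
    and declination, ignoring the cos(dec) widening of RA near the poles.
    """
    home = chunk_id(ra_deg, dec_deg)
    neighbours = set()
    for d_ra in (-overlap_deg, 0.0, overlap_deg):
        for d_dec in (-overlap_deg, 0.0, overlap_deg):
            dec = max(-90.0, min(90.0, dec_deg + d_dec))
            neighbours.add(chunk_id(ra_deg + d_ra, dec))
    neighbours.discard(home)
    return sorted(neighbours)


print(chunk_id(5.0, 5.0))         # interior point: one home chunk
print(overlap_chunks(5.0, 5.0))   # no neighbour within the overlap
print(overlap_chunks(9.95, 5.0))  # close to an RA border
```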

Verification and Validation
- PDAC (Prototype Data Access Center):
  - Integration environment with rest of Science Platform
  - At non-trivial scale (~100 TB currently), scientifically valid data (SDSS Stripe82, IRSA AllWISE currently; NEOWISE & HSC upcoming)
  - Feedback from SUIT team on functionality of service endpoints
  - Feedback from small group of science users on performance, accuracy, usability, and appropriateness of design
  - First PDAC user report: DMTR-22

Verification and Validation, cont.
- Qserv Scaling and Performance Tests (LDM-552):
  - Graduated series of data-challenge style tests, on a glide-path toward project requirements for DR1
  - Most recent test report: DMTR-16

Status: Web APIs & Data Butler
- Preliminary implementations running in the PDAC, mostly custom interfaces, receiving feedback from SUIT team
- Current development focused on adding/extending VO protocol support
- Data Butler:
  - Basic implementation has seen heavy use by Science Pipelines group
  - Current development focused on architectural cleanup and implementation of additional backend storage/format plugins

Status: Qserv
- Three running instances (~30 physical nodes each) in continuous operation: a dedicated hardware cluster at NCSA, plus early adoption (2 clusters) at CC-IN2P3
- Scale testing with databases up to ~70 TB, expected to cross 100 TB this year
- On track to meet or exceed performance requirements
- Work in progress on data distribution/replication for data durability, improvement of auxiliary tooling (e.g. deployment, data ingest), and improved query management (async/status)

Selected Milestones

Capability | L2 Milestones | Cycle
---------- | ------------- | -----
VO web APIs; Ingest infrastructure revamp for HSC reprocessing | LDM-503-1, LDM-503-2 | Fall 2017
Transformed EFD Schema Design; Butler retrieves images from Data Backbone | LSST-1220, LDM-503-3, LDM-503-4 | Spring 2018
Services integrated with NCSA auth system; Provenance system; Qserv single-master fault-tolerance | LDM-503-7, LDM-503-9 | Fall 2018
User table import/export; Calibration database; Qserv multi-master fault-tolerance | LDM-503-11 | Fall 2019
Image regeneration; Next-to-database processing; Security audit | LDM-503-13 | Fall 2020