Download presentation
Presentation is loading. Please wait.
1
July 8, 2008SLAC Annual Program ReviewPage 1 LSST Data Management and Access Jacek Becla LSST Data Access & Database Technology Group Leader
2
July 8, 2008SLAC Annual Program ReviewPage 2 Databases and SLAC *Over a decade experience SLAC’s core competency (1 of 4): Ultra-large database management for users and collaborations distributed worldwide
3
July 8, 2008SLAC Annual Program ReviewPage 3 LSST Peta-scale Architecture & Analyses *O(100) PB system –55 PB pixel data –20+ PB derived products –Virtual data *All data public, accessed by –Professional astronomers –Amateur astronomers –General public Challenge & Opportunity
4
July 8, 2008SLAC Annual Program ReviewPage 4 Cutting-edge Features For Cutting-edge Science *Queryable, shareable user annotations *Complete provenance tracking *Flexible / extendable schema *Support for uncertainty / fuzzy joins *Seamless integration of catalogs with pixel data *Scalable, fast, fault-tolerant and cost-effective
5
July 8, 2008SLAC Annual Program ReviewPage 5 Query Complexity – Pushing the Limits *Example queries –Near neighbor searches for arbitrary regions –Complex time series analysis find all pairs of objects with similar time series *Simplified interfaces –Common languages (C++, python) –Likely through common tools (R, IDL, MATLAB)
6
July 8, 2008SLAC Annual Program ReviewPage 6 How? *Use shared-nothing MPP columnar-like Data Management System *Run on cloud / commodity hardware *Push computation to data *Support natively arrays & operations on arrays *Aggressively compress (lossless) *Share scans *Build provenance, lightweight uncertainty and other features into DMS
7
July 8, 2008SLAC Annual Program ReviewPage 7 1st XLDB Workshop *October 2007 at SLAC *Participation –Data-intensive science & industries, database researchers and vendors *Goals –Identify trends, bridge gaps *Very successful –Science – db research collaboration strongly encouraged
8
July 8, 2008SLAC Annual Program ReviewPage 8 SciDB Mini-Workshop *March 2008 in Asilomar *Participation –Database researchers + data-intensive science representatives (HEP, Astro, Bio, Remote Sensors, Fusion) *Goals –Discuss common science db-requirements –Stimulate database research *Very successful –Agreed to explore avenue of building new open-source science-oriented DBMS. Led by Michael Stonebraker and David DeWitt
9
July 8, 2008SLAC Annual Program ReviewPage 9 Why “sciDB” *Requirements novel, unlikely to be met by existing vendors –Arrays, spatial/temporal support, provenance, uncertainty, versioning *Large scale and complexity prohibits roll-your-own approach *Overlap increasing, including: –Science: astronomy, biology, photon science, physics, geoscience (geology, oceanography, atmospheric science, environmental science) –Commercial applications (R&D and non-R&D): remote sensing, resource extraction (oil, gas, minerals), medical imaging, pharmaceuticals, internet
10
July 8, 2008SLAC Annual Program ReviewPage 10 Open Source, Science-DBMS: Making It Real (1) *Science partners –Put up some resources, provide requirements, use cases, tests *CS Database Brain Trust –Design, direct building of the system, provide some resources *Industrial partners –Provide funding/resources, share experience *Company –Manage open source project, contribute engineering, provide support, services, PR
11
July 8, 2008SLAC Annual Program ReviewPage 11 Open Source, Science-DBMS: Making It Real (2) *Science partners –Initial partners: LSST/SLAC, PNNL, LLNL, FermiLab, UCSB Expecting to reach more labs/projects via SciDB Science Board –Use cases, requirements (all) –1 FTE (LSST), office space (SLAC) –Continuing to look for support from other labs, DoE and NSF *CS Database Brain Trust –Assembled *Industrial partners –Initial partners: eBay, Microsoft, Vertica –Strong interest at Amazon and Facebook *Company –New startup or Vertica initiative eBay's use cases & requirements very similar to LSST
12
July 8, 2008SLAC Annual Program ReviewPage 12 Open Source, Science-DBMS: Making It Real (3) *Funding available –eBay, Microsoft, VCs *Design in progress –Array data model –All requested features feasible *Beta expected 4Q'09 *XLDB2 planned for this fall (Sept 29/30) LSST Timescales: –R&D ends 4Q’10 –Construction begins 1Q’11 –First light in late 2014 or 2015 –Data taking for 10 years
13
July 8, 2008SLAC Annual Program ReviewPage 13 Summary *SLAC leads the design of the O(100) PB LSST Database & Data Access System *Open-source, science oriented DBMS is becoming a reality –Led by most influential database gurus –Designed by most experienced database engineers –In collaboration with big industrial partners *LSST DM system will enable unprecedented analyses in intuitive & cost-effective way –Will likely make big positive impact on complex scientific analytics and beyond
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.