LSST Data Management and Access. Jacek Becla, LSST Data Access & Database Technology Group Leader. SLAC Annual Program Review, July 8, 2008.

Presentation transcript:

Page 1: LSST Data Management and Access
Jacek Becla, LSST Data Access & Database Technology Group Leader
SLAC Annual Program Review, July 8, 2008

Page 2: Databases and SLAC
* Over a decade of experience
SLAC's core competency (1 of 4): ultra-large database management for users and collaborations distributed worldwide

Page 3: LSST Peta-scale Architecture & Analyses
* O(100) PB system
  – 55 PB pixel data
  – 20+ PB derived products
  – Virtual data
* All data public, accessed by
  – Professional astronomers
  – Amateur astronomers
  – General public
Challenge & Opportunity

Page 4: Cutting-edge Features For Cutting-edge Science
* Queryable, shareable user annotations
* Complete provenance tracking
* Flexible / extendable schema
* Support for uncertainty / fuzzy joins
* Seamless integration of catalogs with pixel data
* Scalable, fast, fault-tolerant and cost-effective

Page 5: Query Complexity – Pushing the Limits
* Example queries
  – Near neighbor searches for arbitrary regions
  – Complex time series analysis, e.g. find all pairs of objects with similar time series
* Simplified interfaces
  – Common languages (C++, Python)
  – Likely through common tools (R, IDL, MATLAB)
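As an illustration of the first example query, here is a minimal sketch of a near neighbor search restricted to a rectangular sky region. It is not the LSST implementation: the function names, the `(id, ra, dec)` tuple layout, and the toy catalog are all assumptions made for this example, and the brute-force pair loop is O(n²), which a peta-scale system would have to replace with spatial partitioning and indexing.

```python
import math
from itertools import combinations


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    a = (math.sin(d_dec / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def near_neighbors(objects, region, radius_deg):
    """Return id pairs of objects inside `region` that lie within `radius_deg`.

    objects: iterable of (id, ra_deg, dec_deg) tuples (hypothetical layout).
    region:  (ra_min, ra_max, dec_min, dec_max) in degrees; this simple box
             test does not handle regions that wrap around RA = 0/360.
    """
    ra_min, ra_max, dec_min, dec_max = region
    in_region = [o for o in objects
                 if ra_min <= o[1] <= ra_max and dec_min <= o[2] <= dec_max]
    pairs = []
    # Brute force over all pairs: fine for a toy catalog only.
    for (id_a, ra_a, dec_a), (id_b, ra_b, dec_b) in combinations(in_region, 2):
        if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
            pairs.append((id_a, id_b))
    return pairs


# Toy catalog: objects 1 and 2 are about 0.001 degrees apart.
catalog = [
    (1, 10.000, -5.000),
    (2, 10.001, -5.000),
    (3, 10.500, -5.200),
    (4, 50.000, 20.000),  # outside the query region
]
pairs = near_neighbors(catalog, (9.0, 11.0, -6.0, -4.0), radius_deg=0.01)
print(pairs)
```

The same pairwise pattern applies to the time series query if the distance function is swapped for a similarity measure over light curves; the cost of the all-pairs loop is what makes such queries hard at scale.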

Page 6: How?
* Use a shared-nothing, MPP, columnar-like Data Management System (DMS)
* Run on cloud / commodity hardware
* Push computation to data
* Natively support arrays and operations on arrays
* Aggressively compress (lossless)
* Share scans
* Build provenance, lightweight uncertainty and other features into the DMS
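Two of the ideas above, "push computation to data" and "share scans", can be sketched in a few lines. This is a toy single-process model under stated assumptions, not the design described in the slides: each inner list stands in for a partition held by one shared-nothing node, and the names `scan_partition`, `run_shared_scan`, and the `(init, step, merge)` query triple are hypothetical.

```python
def scan_partition(rows, queries):
    """Scan one node-local partition once, feeding each row to every query.

    queries maps a name to (init, step, merge). Because all queries share
    the single pass over `rows`, the partition is read only once.
    """
    partials = {name: init for name, (init, _step, _merge) in queries.items()}
    for row in rows:
        for name, (_init, step, _merge) in queries.items():
            partials[name] = step(partials[name], row)
    return partials


def run_shared_scan(partitions, queries):
    """Run every query over every partition and merge the partial results.

    Only the small partial aggregates leave a partition, never the rows
    themselves. A real system would scan partitions in parallel, one per
    node; here they are processed sequentially for simplicity.
    """
    results = {name: init for name, (init, _step, _merge) in queries.items()}
    for rows in partitions:
        partial = scan_partition(rows, queries)
        for name, (_init, _step, merge) in queries.items():
            results[name] = merge(results[name], partial[name])
    return results


# Three toy partitions of an object catalog (magnitudes are made up).
partitions = [
    [{"id": 1, "mag": 21.5}, {"id": 2, "mag": 18.2}],
    [{"id": 3, "mag": 24.0}],
    [{"id": 4, "mag": 17.9}, {"id": 5, "mag": 22.3}],
]

# Two concurrent queries sharing the same scan.
queries = {
    "count_bright": (0,
                     lambda acc, r: acc + (r["mag"] < 20.0),
                     lambda a, b: a + b),
    "faintest_mag": (float("-inf"),
                     lambda acc, r: max(acc, r["mag"]),
                     max),
}

results = run_shared_scan(partitions, queries)
print(results)
```

The point of the design is that adding a third query adds CPU work per row but no extra I/O, which matters when a full scan of the data takes hours.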

Page 7: 1st XLDB Workshop
* October 2007 at SLAC
* Participation
  – Data-intensive science & industries, database researchers and vendors
* Goals
  – Identify trends, bridge gaps
* Very successful
  – Collaboration between science and database research strongly encouraged

Page 8: SciDB Mini-Workshop
* March 2008 in Asilomar
* Participation
  – Database researchers + data-intensive science representatives (HEP, Astro, Bio, Remote Sensors, Fusion)
* Goals
  – Discuss common science database requirements
  – Stimulate database research
* Very successful
  – Agreed to explore building a new open-source, science-oriented DBMS, led by Michael Stonebraker and David DeWitt

Page 9: Why "sciDB"
* Requirements are novel and unlikely to be met by existing vendors
  – Arrays, spatial/temporal support, provenance, uncertainty, versioning
* Large scale and complexity prohibit a roll-your-own approach
* Overlap increasing, including:
  – Science: astronomy, biology, photon science, physics, geoscience (geology, oceanography, atmospheric science, environmental science)
  – Commercial applications (R&D and non-R&D): remote sensing, resource extraction (oil, gas, minerals), medical imaging, pharmaceuticals, internet

Page 10: Open Source, Science-DBMS: Making It Real (1)
* Science partners
  – Put up some resources, provide requirements, use cases, tests
* CS Database Brain Trust
  – Design, direct building of the system, provide some resources
* Industrial partners
  – Provide funding/resources, share experience
* Company
  – Manage open source project, contribute engineering, provide support, services, PR

Page 11: Open Source, Science-DBMS: Making It Real (2)
* Science partners
  – Initial partners: LSST/SLAC, PNNL, LLNL, Fermilab, UCSB
    Expecting to reach more labs/projects via the SciDB Science Board
  – Use cases, requirements (all)
  – 1 FTE (LSST), office space (SLAC)
  – Continuing to look for support from other labs, DOE and NSF
* CS Database Brain Trust
  – Assembled
* Industrial partners
  – Initial partners: eBay, Microsoft, Vertica
  – Strong interest at Amazon and Facebook
* Company
  – New startup or Vertica initiative
eBay's use cases & requirements are very similar to LSST's

Page 12: Open Source, Science-DBMS: Making It Real (3)
* Funding available
  – eBay, Microsoft, VCs
* Design in progress
  – Array data model
  – All requested features feasible
* Beta expected 4Q'09
* XLDB2 planned for this fall (Sept 29/30)
LSST Timescales:
  – R&D ends 4Q'10
  – Construction begins 1Q'11
  – First light in late 2014 or 2015
  – Data taking for 10 years

Page 13: Summary
* SLAC leads the design of the O(100) PB LSST Database & Data Access System
* An open-source, science-oriented DBMS is becoming a reality
  – Led by the most influential database gurus
  – Designed by the most experienced database engineers
  – In collaboration with big industrial partners
* The LSST DM system will enable unprecedented analyses in an intuitive & cost-effective way
  – Will likely make a big positive impact on complex scientific analytics and beyond