Published by Sherilyn Flowers. Modified over 9 years ago.
1
László Dobos 1,2, Tamás Budavári 2, Nolan Li 2, Alex Szalay 2, István Csabai 1
dobos@complex.elte.hu, budavari@jhu.edu
1 Eötvös Loránd University, Budapest, Hungary, Department of Physics of Complex Systems
2 The Johns Hopkins University, Baltimore, USA, Department of Physics & Astronomy
24th SSDBM conference, 25-27 June 2012, Chania, Crete, Greece
2
Astronomical catalogs in RDBMS
- o(100 million) objects
- o(1TB – 10TB) DB size
Cross-matching is done by coordinates
- RA, Dec
- Astrometric error
- Different sky coverage
- Different wavelength range
- Moving objects etc.
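Matching by coordinates comes down to comparing the angular separation of two (RA, Dec) positions with a tolerance set by the astrometric error. The following is a minimal sketch of that test, not the method used by the system described in the slides; the function names and the tolerance parameter are hypothetical.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle distance in arcseconds between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # The haversine form stays numerically stable for the very small
    # angles that are typical when comparing catalog positions.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def is_candidate_match(ra1, dec1, ra2, dec2, tolerance_arcsec):
    """True if the two positions lie within the given tolerance (hypothetical cut)."""
    return angular_separation_arcsec(ra1, dec1, ra2, dec2) <= tolerance_arcsec
```

A fixed cut like this ignores that each catalog has its own astrometric error, which is one reason a probabilistic formulation is attractive.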
3
Image panels: infrared (2MASS), visible (DSS), ultraviolet (Galex)
4
All data in RDBMS
- run computation inside the database
- use multiple servers and parallelize
- must be transparent for users
Astronomers "script" what they do
- multiple re-runs, tweak parameters etc.
- huge web forms: no-no
Use SQL to formulate the problem
- functions and language extensions to support astronomy
- extra syntax to describe the coordinate-based probabilistic join
- spatial constraints: celestial regions
5
    SELECT s.objId, g.objID, t.objID,
           s.ra, s.dec, g.ra, g.dec, t.ra, t.dec,
           x.ra, x.dec
    FROM SDSSDR7:Galaxies AS s
    CROSS JOIN Galex:Galaxies AS g
    CROSS JOIN TwoMASS:ExtendedSources AS t
    XMATCH BAYESIAN AS x
        MUST s ON POINT(s.cx, s.cy, s.cz), 0.1
        MUST g ON POINT(g.ra, g.dec), 0.2
        MAY t ON POINT(t.ra, t.dec), 0.5
        HAVING LIMIT 1e3
    REGION CIRCLE J2000 165.7, 0.3, 60

Slide annotations: Standard SQL (SELECT ... FROM), Probabilistic crossmatch (XMATCH clause), Spatial constraint (REGION clause)
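The query gives positions both as Cartesian unit vectors (cx, cy, cz) and as (ra, dec) pairs, and asks for a Bayesian match with a limit of 1e3. The sketch below shows the standard conversion between the two position forms and one possible Bayes factor for two catalogs. It rests on assumptions the slides do not confirm: that the numbers after each POINT are astrometric errors in arcseconds, that the limit is a Bayes-factor threshold, and that the two-catalog Gaussian formula of Budavári & Szalay (2008) applies. All function names are hypothetical.

```python
import math

ARCSEC_TO_RAD = math.radians(1.0 / 3600.0)

def radec_to_unit_vector(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a Cartesian unit vector (cx, cy, cz)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def bayes_factor_two_catalogs(separation_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Assumed two-catalog Bayes factor for circular Gaussian position errors.

    B = 2 / (s1^2 + s2^2) * exp(-psi^2 / (2 * (s1^2 + s2^2))),
    with the separation psi and the errors s1, s2 expressed in radians.
    """
    psi = separation_arcsec * ARCSEC_TO_RAD
    var_sum = ((sigma1_arcsec * ARCSEC_TO_RAD) ** 2
               + (sigma2_arcsec * ARCSEC_TO_RAD) ** 2)
    return 2.0 / var_sum * math.exp(-psi ** 2 / (2.0 * var_sum))
```

Under these assumptions, a pair with errors 0.1 and 0.2 would pass a 1e3 threshold at a separation of 0.3 arcseconds and fail it at 5 arcseconds.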
6
Custom SQL query -> Parsing -> Spatial partitioning -> Workflow of many traditional SQL queries -> Job queue -> Parallel execution
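The pipeline above turns one custom query into many ordinary SQL queries that run in parallel. The sketch below illustrates the idea with declination zones and a thread pool. The zone scheme, the table and column names, and the `execute` callback are all assumptions made for illustration; the slides do not describe the actual partitioning.

```python
from concurrent.futures import ThreadPoolExecutor

def declination_zones(dec_min, dec_max, n_zones):
    """Split [dec_min, dec_max) into n_zones equal half-open intervals."""
    step = (dec_max - dec_min) / n_zones
    return [(dec_min + i * step, dec_min + (i + 1) * step) for i in range(n_zones)]

def build_partition_queries(table, zones):
    """Emit one plain SQL query per zone (hypothetical columns objID, ra, dec)."""
    # Half-open intervals: an object exactly at the upper edge of the last
    # zone is excluded. A real system would also add a buffer margin so
    # that matches straddling a zone boundary are not lost.
    return [
        f"SELECT objID, ra, dec FROM {table} WHERE dec >= {lo} AND dec < {hi}"
        for lo, hi in zones
    ]

def run_all(queries, execute, max_workers=4):
    """Run every query through the caller-supplied `execute` function in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input queries.
        return list(pool.map(execute, queries))
```

For example, `run_all(build_partition_queries("Galaxies", declination_zones(-1.0, 1.0, 4)), execute)` would issue four zone queries through whatever `execute` function the caller provides.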
7
SQL is declarative
- everything that can be expressed can be executed
- extensions must be executable in any case
Query optimization is hard
Design language with easy optimization in mind
- constrain at the level of the grammar
- custom clauses instead of complex WHERE clause logic
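A dedicated clause with a fixed shape is easy to turn into structured data, which is what makes it easier to optimize than free-form WHERE logic. As a rough illustration, the sketch below pulls the MUST/MAY terms out of the XMATCH clause of the example query with a regular expression. This is not the parser used by the system, which the slides describe as generated from a grammar; the class, the field names, and the reading of the trailing number as a positional error are assumptions.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class XMatchTerm:
    inclusion: str   # "MUST" or "MAY"
    alias: str       # table alias from the FROM clause
    columns: tuple   # coordinate columns inside POINT(...)
    error: float     # trailing number, assumed to be a positional error

_TERM = re.compile(
    r"\b(MUST|MAY)\s+(\w+)\s+ON\s+POINT\(([^)]*)\)\s*,\s*"
    r"(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)

def parse_xmatch_terms(clause):
    """Return the MUST/MAY terms of an XMATCH clause in order of appearance."""
    terms = []
    for inclusion, alias, coords, error in _TERM.findall(clause):
        columns = tuple(c.strip() for c in coords.split(","))
        terms.append(XMatchTerm(inclusion, alias, columns, float(error)))
    return terms

EXAMPLE_CLAUSE = (
    "XMATCH BAYESIAN AS x "
    "MUST s ON POINT(s.cx, s.cy, s.cz), 0.1 "
    "MUST g ON POINT(g.ra, g.dec), 0.2 "
    "MAY t ON POINT(t.ra, t.dec), 0.5 "
    "HAVING LIMIT 1e3"
)
```

Because the clause shape is fixed by the grammar, a planner can see at once which catalogs are required and which are optional, without analyzing arbitrary predicates.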
8
Architecture diagram components: SkyQuery Web Interface, Job Scheduler, Cluster Registry, Graywulf Database Server Cluster, MyDB, Internet, Remote Virtual Observatory Data Source
Diagram labels: "SDSS × 2MASS = ?", XMATCH query, SQL queries
9
Registry
- complete description of the server cluster
- from machine group to disk volumes
- contains all info for optimal database allocation
Management tools
- allocate, resize, copy, mirror etc. databases
- monitor cluster status
Scheduler
- co-location aware query execution
- jobs implemented as workflows
- .NET Workflow Foundation (WF4)
- parallel execution out-of-the-box
- extensive logging, persistence, retry logic etc.
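Co-location aware scheduling means sending a job to a server that already holds every database the job touches. The sketch below shows one simple way this could work, assuming a registry that maps each server to the set of databases it hosts and a table of current load. The data layout, the names, and the tie-breaking rule are hypothetical and are not taken from the actual registry or scheduler.

```python
def pick_server(registry, required_dbs, load):
    """Choose the least-loaded server that hosts all required databases.

    registry: dict mapping server name -> set of database names it hosts
    required_dbs: iterable of database names the job needs
    load: dict mapping server name -> number of running jobs
    Returns None when no single server holds everything.
    """
    needed = set(required_dbs)
    candidates = [server for server, dbs in registry.items() if needed <= dbs]
    if not candidates:
        return None
    # Prefer the lowest load; break ties by name so the choice is deterministic.
    return min(candidates, key=lambda server: (load.get(server, 0), server))
```

When no server qualifies, a real scheduler would have to copy or mirror data first, which is the kind of operation the management tools listed above provide.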
10
SQL parser generator
- supports grammar inheritance
- easy to add custom extensions to plain SQL
Metadata tools
- Tag SQL scripts with metadata
- Make it accessible from web interface
- Extract provenance information from user queries
User web interface
- write and submit queries
- access to own database (MyDB)
Jobs
- Crossmatch workflow, etc.
11
Current system: focus on astronomy/crossmatch
Implement spatial constraints
Extend to a generic framework + API
- mirroring, sharding of datasets
- query partitioning
- limited distributed joins
- transparent access to remote datasets
- smart caching of remote data / query results