Sky Query: A distributed query engine for astronomy

Presentation transcript:

Sky Query: A distributed query engine for astronomy
László Dobos (1), Tamás Budavári (2), Alex Szalay (2), István Csabai (1)
(1) Eötvös Loránd University, Hungary; (2) Johns Hopkins University, Baltimore

The multiwavelength sky: infrared (2MASS), visible (DSS), ultraviolet (GALEX).

Crossmatching: Astronomical catalogs are stored in an RDBMS, with on the order of 100 million objects and 1 TB to 10 TB of database size. Matching is done by coordinates (RA, Dec). Complications: astrometric error, different sky coverage, different wavelength range, moving objects, etc.
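Matching by coordinates comes down to comparing the angular separation of two (RA, Dec) positions against a tolerance set by the astrometric error. A minimal Python sketch of that test follows; the function names and the arcsecond tolerance are illustrative assumptions, not part of Sky Query.

```python
import math


def radec_to_unit(ra_deg, dec_deg):
    """Convert (RA, Dec) in degrees to a Cartesian unit vector (cx, cy, cz)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds between two positions given in degrees.

    Uses atan2(|a x b|, a . b), which stays accurate for the very small
    angles that are typical in crossmatching.
    """
    a = radec_to_unit(ra1, dec1)
    b = radec_to_unit(ra2, dec2)
    dot = sum(x * y for x, y in zip(a, b))
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    norm = math.sqrt(sum(c * c for c in cross))
    return math.degrees(math.atan2(norm, dot)) * 3600.0


def is_match(ra1, dec1, ra2, dec2, tol_arcsec):
    """Hypothetical hard-cut match: True if the positions lie within tol_arcsec."""
    return separation_arcsec(ra1, dec1, ra2, dec2) <= tol_arcsec
```

A fixed tolerance like this is the simplest possible criterion; it ignores per-catalog error differences, which is what a probabilistic match is meant to address.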

Crossmatching on demand: Crossmatch any number of catalogs. All combinations cannot be precomputed (maybe catalog pairs?). The user can specify the list of catalogs to match, the region of interest, and priors for non-coordinate-based matching.

Problem description: Astronomers "script" what they do, with multiple re-runs, tweaked parameters, etc., so huge web forms are a no-go. All data is in an RDBMS: run the computation inside the database, use multiple servers and parallelize, and keep this transparent for users. Problem description in SQL: functions and language extensions to support astronomy, syntax to formulate the coordinate-based probabilistic join, and spatial constraints (celestial regions).

Sample SQL query (annotated on the slide as standard SQL, probabilistic crossmatch, and spatial constraint):

    -- Standard SQL
    SELECT s.objId, g.objID, t.objID,
           s.ra, s.dec, g.ra, g.dec, t.ra, t.dec, x.ra, x.dec
    FROM SDSSDR7:Galaxies AS s
    CROSS JOIN Galex:Galaxies AS g
    CROSS JOIN TwoMASS:ExtendedSources AS t
    -- Probabilistic crossmatch
    XMATCH BAYESIAN AS x
        MUST s ON POINT(s.cx, s.cy, s.cz), 0.1
        MUST g ON POINT(g.ra, g.dec), 0.2
        MAY t ON POINT(t.ra, t.dec), 0.5
        HAVING LIMIT 1e3
    -- Spatial constraint
    REGION CIRCLE J2000 165.7, 0.3, 60
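The slide does not spell out what XMATCH BAYESIAN computes. One published form of a coordinate-based Bayes factor for two catalogs with circular Gaussian positional errors (Budavári & Szalay 2008) is B = 2 / (s1^2 + s2^2) * exp(-psi^2 / (2 (s1^2 + s2^2))), with the separation psi and the errors s1, s2 in radians. The Python sketch below evaluates that expression. It assumes, without confirmation from the slides, that the numbers following POINT(...) are positional uncertainties in arcseconds and that LIMIT 1e3 is a threshold on the Bayes factor; it covers only the two-catalog case and is not Sky Query's implementation.

```python
import math

# One arcsecond expressed in radians.
ARCSEC = math.radians(1.0 / 3600.0)


def bayes_factor_pair(sep_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Two-catalog Bayes factor for a pair of detections.

    sep_arcsec: angular separation of the two detections (arcseconds).
    sigma1_arcsec, sigma2_arcsec: assumed circular Gaussian positional
    uncertainties of the two catalogs (arcseconds).
    """
    s2 = (sigma1_arcsec * ARCSEC) ** 2 + (sigma2_arcsec * ARCSEC) ** 2
    psi = sep_arcsec * ARCSEC
    return 2.0 / s2 * math.exp(-psi * psi / (2.0 * s2))


def passes_limit(sep_arcsec, sigma1_arcsec, sigma2_arcsec, limit=1e3):
    """Hypothetical acceptance rule: keep the pair if its Bayes factor reaches limit."""
    return bayes_factor_pair(sep_arcsec, sigma1_arcsec, sigma2_arcsec) >= limit
```

Under these assumptions a pair with `sigma1_arcsec=0.1` and `sigma2_arcsec=0.2` is accepted at zero separation and rejected at a separation of several arcseconds, because the Bayes factor falls off as a Gaussian in the separation.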

Zone algorithms: Pure SQL, so it can leverage the query optimizer of SQL Server. Divide the sphere into zones; the ZoneID is a very simple hash on declination. Indexes built on ZoneID and right ascension allow very quick pre-filtering of match candidates. Parallelizes very well on multi-core machines. [Gray, Szalay & Nieto-Santisteban 2006, The Zones Algorithm for Finding Points-Near-a-Point or Cross-Matching Spatial Datasets]
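The zone idea can be sketched outside the database as well. The Python below is an in-memory illustration, not the SQL Server implementation from the talk: a dictionary keyed by zone number with rows sorted by right ascension stands in for the index on (ZoneID, RA), and `bisect` stands in for the index range scan. The zone formula floor((dec + 90) / zone height), the tuple layout, and all function names are assumptions made for this sketch; RA is assumed to lie in [0, 360).

```python
import bisect
import math
from collections import defaultdict


def zone_id(dec_deg, zone_height_deg):
    """Simple hash on declination: index of the horizontal stripe containing dec."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def build_zone_index(catalog, zone_height_deg):
    """catalog: iterable of (obj_id, ra_deg, dec_deg).
    Returns {zone: list of (ra, dec, obj_id) sorted by ra}."""
    zones = defaultdict(list)
    for obj_id, ra, dec in catalog:
        zones[zone_id(dec, zone_height_deg)].append((ra, dec, obj_id))
    for rows in zones.values():
        rows.sort()
    return zones


def separation_deg(ra1, dec1, ra2, dec2):
    """Haversine angular separation in degrees."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def ra_half_width_deg(dec_deg, radius_deg):
    """Largest RA offset reachable within radius_deg of a point at dec_deg."""
    if abs(dec_deg) + radius_deg >= 90.0:
        return 180.0  # search circle touches a pole: scan every RA
    x = math.sin(math.radians(radius_deg)) / math.cos(math.radians(dec_deg))
    return math.degrees(math.asin(min(1.0, x))) + 1e-9  # pad against rounding


def crossmatch(cat_a, cat_b, radius_deg, zone_height_deg):
    """Return (id_a, id_b) pairs whose separation is at most radius_deg."""
    index = build_zone_index(cat_b, zone_height_deg)
    matches = []
    for id_a, ra, dec in cat_a:
        half = ra_half_width_deg(dec, radius_deg)
        lo, hi = ra - half, ra + half
        # Split the RA window in two when it wraps around 0/360 degrees.
        if half >= 180.0:
            intervals = [(0.0, 360.0)]
        elif lo < 0.0:
            intervals = [(0.0, hi), (lo + 360.0, 360.0)]
        elif hi > 360.0:
            intervals = [(lo, 360.0), (0.0, hi - 360.0)]
        else:
            intervals = [(lo, hi)]
        z_lo = zone_id(max(dec - radius_deg, -90.0), zone_height_deg)
        z_hi = zone_id(min(dec + radius_deg, 90.0), zone_height_deg)
        for z in range(z_lo, z_hi + 1):
            rows = index.get(z, [])
            for w_lo, w_hi in intervals:
                # Coarse pre-filter: range scan on RA within the zone.
                start = bisect.bisect_left(rows, (w_lo,))
                end = bisect.bisect_right(rows, (w_hi, float("inf")))
                for ra_b, dec_b, id_b in rows[start:end]:
                    # Exact test on the surviving candidates.
                    if separation_deg(ra, dec, ra_b, dec_b) <= radius_deg:
                        matches.append((id_a, id_b))
    return matches
```

The zone and RA windows only narrow the candidate set; the exact separation test decides each match. Because each zone can be processed independently, the outer loop is also the natural unit for parallel execution.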