LSST, the Spatial Cross-Match Challenge


LSST, the Spatial Cross-Match Challenge
Maria Nieto-Santisteban, Alexander Szalay, Ani Thakar (The Johns Hopkins University)
Jim Gray (Microsoft Research)

What is Cross-Matching?
Identify point(s) in A with point(s) in B.
  Cones: find points near one point. Distances range from a few arcseconds to a few degrees.
  Neighborhood: find points near points. Distances range from a few arcseconds to a very few arcminutes.
Decide whether those points share more than just their position.

Points: a cone search looks around a single position, whereas a neighbor search is an all-to-all match with an arbitrary distance.
Radius: cone searches may range from small to very big. All-to-all matches tend to use small radii; otherwise we face combinatorial explosion. It is not a matter of radius alone but of density, that is, the number of objects.
Whether it is better to use cones or neighbors may depend on the ratio between the cardinalities and on the relative dispersion.
Maria A. Nieto-Santisteban / ADASS 2006

Zones
Bin the data
  ZoneID = floor((dec + 90.0) / zoneHeight)
Place the data close together on disk
  Cluster index on (ZoneID, RA)
  A trick is required to handle the RA wrap-around at (360, 0)
Efficient
  Cones
  Neighbors (especially)
Useful
  Partition the data
  Distribute the workload
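The ZoneID formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `zone_id` and the choice to pass the zone height in arcseconds are assumptions made here. Multiplying by 3600 and dividing by the height in arcseconds is mathematically the same as dividing by `zoneHeight` in degrees, and it keeps the arithmetic exact for whole-arcsecond heights.

```python
import math


def zone_id(dec_deg: float, zone_height_arcsec: float = 8.0) -> int:
    """Zone index for a declination in degrees, counted up from the south pole.

    Equivalent to floor((dec + 90.0) / zoneHeight) with zoneHeight in degrees.
    The name and the arcsecond parameter are illustrative assumptions.
    """
    return math.floor((dec_deg + 90.0) * 3600.0 / zone_height_arcsec)


if __name__ == "__main__":
    for dec in (-90.0, -45.0, 0.0, 45.0):
        print(dec, zone_id(dec))
```

Because the zone number depends only on declination, sorting rows by (ZoneID, RA) places objects that are close on the sky close together in the sort order, which is what the clustered index exploits.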

Zone Table

ObjID  ZoneID*  RA     Dec    CX  CY  CZ  …
1               0.0    -90.0
2      20250    180.0
3               181.0
4      40500    360.0  +90.0

* Using a zone height of 8 arcsec in this example

Declination vs. Zone, RA
[Figure: two panels, "Order by Dec" (left) and "Order by RA within Zone" (right)]

This slide shows two ways of ordering ("clustering") catalog data on disk: by DECLINATION or by (ZONE, RA).

(WHAT DO THE NUMBERS AND THEIR LOCATIONS MEAN?) The numbers in the figures are indexes that indicate the order on the disk. The positions of the indexes in the figure indicate the location of the object in a simulated LSST image of the Galactic center. The LSST image extends 3 degrees to the right and up off the top of the screen.

(WHAT PROBLEM ARE WE TRYING TO SOLVE?) Now imagine we want to find neighbors within 8 arcsec of the object in blue.

(WHAT IS THE WORST APPROACH?) The worst approach would be to calculate the distance from the blue object to all 2.5 million sources in the image. This would be very expensive because only 7 out of the 2.5 million distances would be within 8 arcsec.

(WHAT ABOUT ORDERING BY DECLINATION? FEWER CALCULATIONS.) We can do much better by using DECLINATION as a cluster index, as in the figure on the left. The indexes increase from bottom to top in the figure. There are big jumps between neighbors because stars off the right edge have intermediate declinations. Using the declination index, we limit the search range to +/-8 arcsec, as shown by the horizontal red lines. Then we only have to calculate distances for about 8000 sources, which is much better than 2.5 million.

(WHAT ABOUT ORDERING BY ZONE AND RA? THE FEWEST CALCULATIONS.) We can do even better by using ZONE and RA as a cluster index, as in the figure on the right. Zone index increases in coarse bins from the bottom of the figure to the top. Within each zone, index increases monotonically from left to right. See how the indexes go 1, 2, 3, 4, … in the bottom zone. Then 3925, 3926, … in the next zone. And so on. Using the cluster index, we limit the search to a (different) narrow range in right ascension for each zone. We only have to calculate distances for 8 objects, which is much better than 8000 and much, much better than 2.5 million.

(DISK I/O AND SEEK TIME ARE ALSO BETTER WITH ZONES.) The other key point is that with a zone index, data for the neighbors are concentrated in a few disk blocks. With the DECLINATION index, data for the neighbors are spread over many disk blocks, with only one neighbor per disk block. In other words, the ZONE approach requires less CPU, less physical I/O, and less seeking by the disk.
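The (ZONE, RA) ordering described in these notes can be imitated in memory with a sorted list and binary search. The sketch below is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the RA half-width `alpha` uses a simple radius / cos(|dec| + radius) widening chosen here for brevity, and the code ignores the RA wrap-around at 0/360 and assumes |dec| + radius stays below 90 degrees.

```python
import bisect
import math

ZONE_HEIGHT_ARCSEC = 8.0  # example zone height quoted in the slides


def zone_id(dec_deg):
    # floor((dec + 90) / zoneHeight), with the height expressed in arcseconds
    return math.floor((dec_deg + 90.0) * 3600.0 / ZONE_HEIGHT_ARCSEC)


def build_index(points):
    """Sort (ra, dec) pairs by (zone, ra), mimicking a clustered index."""
    return sorted((zone_id(dec), ra, dec) for ra, dec in points)


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (spherical law of cosines)."""
    a1, d1, a2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    c = (math.sin(d1) * math.sin(d2)
         + math.cos(d1) * math.cos(d2) * math.cos(a1 - a2))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def neighbors(index, ra, dec, radius_deg):
    """Return catalog points within radius_deg of (ra, dec).

    Only zones overlapping dec +/- radius are visited, and inside each zone
    only the RA window [ra - alpha, ra + alpha] is scanned.
    """
    # Assumed, conservative RA half-width; not valid at the poles.
    alpha = radius_deg / math.cos(math.radians(abs(dec) + radius_deg))
    found = []
    for z in range(zone_id(dec - radius_deg), zone_id(dec + radius_deg) + 1):
        lo = bisect.bisect_left(index, (z, ra - alpha, -math.inf))
        hi = bisect.bisect_right(index, (z, ra + alpha, math.inf))
        for _, ra2, dec2 in index[lo:hi]:
            # Exact distance test on the few surviving candidates
            if separation_deg(ra, dec, ra2, dec2) <= radius_deg:
                found.append((ra2, dec2))
    return found


if __name__ == "__main__":
    catalog = [(10.0, 0.0), (10.001, 0.001), (10.01, 0.0),
               (50.0, 0.0), (10.0, 0.003)]
    index = build_index(catalog)
    print(neighbors(index, 10.0, 0.0, 8.0 / 3600.0))
```

The query point itself is returned when it is part of the catalog. The expensive distance calculation runs only on the handful of rows inside the per-zone RA window, which is the saving the notes describe when going from 2.5 million to 8 candidates.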

“Circular” Regions Near the Poles
$d = \cos^{-1}\{\sin(\delta_1)\sin(\delta_2) + \cos(\delta_1)\cos(\delta_2)\cos(\alpha_1 - \alpha_2)\}$
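The distance on this slide is the standard great-circle separation, with α as right ascension and δ as declination. A short sketch shows why a fixed RA window cannot stand in for a circle near the poles: two points 180 degrees apart in RA are only 0.2 degrees apart on the sky at declination 89.9. The function name is an assumption made for this illustration, and the clamp before `acos` is a guard against rounding that the slide does not mention.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two (ra, dec) points in degrees."""
    a1, d1, a2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_d = (math.sin(d1) * math.sin(d2)
             + math.cos(d1) * math.cos(d2) * math.cos(a1 - a2))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


if __name__ == "__main__":
    # Same 180-degree RA difference, very different distances on the sky
    print(angular_separation_deg(0.0, 89.9, 180.0, 89.9))
    print(angular_separation_deg(0.0, 0.0, 180.0, 0.0))
```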

SQL CrossNeighbors

    SELECT *
    FROM prObj1 z1
    JOIN zoneZone ZZ ON ZZ.zoneID1 = z1.zoneID
    JOIN prObj2 z2 ON ZZ.zoneID2 = z2.zoneID
    WHERE z2.ra BETWEEN z1.ra - ZZ.alpha AND z1.ra + ZZ.alpha
      AND z2.dec BETWEEN z1.dec - @r AND z1.dec + @r
      AND (z1.cx*z2.cx + z1.cy*z2.cy + z1.cz*z2.cz) > cos(radians(@r))
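The last predicate of the query compares the dot product of two unit vectors (the CX, CY, CZ columns) with the cosine of the search radius, which avoids an inverse cosine per candidate pair. A minimal Python sketch of that test follows. The function names are hypothetical, and the unit-vector convention used here (x toward RA 0, z toward the north pole) is the usual one but is an assumption about how the CX, CY, CZ columns were filled.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector (cx, cy, cz) for a sky position in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def within_radius(p1, p2, radius_deg):
    """True when p1 and p2, given as (ra, dec), are closer than radius_deg.

    Mirrors the SQL test: cx1*cx2 + cy1*cy2 + cz1*cz2 > cos(radians(r)).
    """
    v1, v2 = unit_vector(*p1), unit_vector(*p2)
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot > math.cos(math.radians(radius_deg))


if __name__ == "__main__":
    r = 8.0 / 3600.0  # 8 arcsec expressed in degrees
    print(within_radius((10.0, 0.0), (10.001, 0.001), r))
    print(within_radius((10.0, 0.0), (10.01, 0.0), r))
```

Storing the unit vector with each row means the exact test costs three multiplications and a comparison, with `cos(radians(@r))` evaluated once per query.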

Number of Rows in LSST Catalogs

                        Single Exposure  Single Night  End of Survey
Objects                 N/A                            5×10^10
Variable Objects        10^5             10^8          3×10^8
Source Detections       3×10^6           3×10^9        8×10^12
DIA Source Detections   (10^5)           (10^8)        3×10^11

Data access will require good data organization.
Data are partitioned and placed according to their position in the sky.

Sources: Single Night = DR20 / 3000, because there are 300 nights/year * 10 years = 3000 nights in the survey.
Sources: Single Exposure = Single Night / 900, because there are 900 exposures/night. There are 10 hours = 36000 seconds/night. Each exposure takes 40 seconds (15+3+15+5 [+2]), so there are 36000/40 = 900 exposures/night.
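The scaling from end-of-survey counts down to a single night and a single exposure is plain arithmetic and can be checked directly. The variable names below are made up for this sketch; the inputs (8×10^12 source detections, 300 nights per year for 10 years, 10-hour nights, 40-second exposures) are the figures quoted on the slide.

```python
# Figures quoted on the slide
end_of_survey_sources = 8e12          # source detections at end of survey
nights_in_survey = 300 * 10           # 300 nights/year for 10 years
seconds_per_night = 10 * 3600         # 10 observing hours
seconds_per_exposure = 15 + 3 + 15 + 5 + 2

exposures_per_night = seconds_per_night // seconds_per_exposure
sources_per_night = end_of_survey_sources / nights_in_survey
sources_per_exposure = sources_per_night / exposures_per_night

if __name__ == "__main__":
    print(f"exposures per night:  {exposures_per_night}")
    print(f"sources per night:    {sources_per_night:.2e}")
    print(f"sources per exposure: {sources_per_exposure:.2e}")
```

The results round to the 3×10^9 and 3×10^6 entries in the Source Detections row.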

LSST Cross-Match’s challenges
Issue alerts within 60 seconds
  Challenge: heavily time constrained
Nightly pipeline @ archive
  Challenge: database consistency
Deep processing
  Challenge: volume of data to process, association complexity
User queries
  Challenge: many users, many types of users, many types of queries, a lot of data to look through

Alert Processing
1. Start alert clock when 2nd exposure ends
   3 second readout while slewing to next field
2. Calibrate images (dark subtract, flat field)
   201 CCDs = 3.2 Gpixel
3. Difference image analysis
   Identify and extract variable sources
4. Cross-match with object catalog
   Distinguish known variables and new objects

Point: the time to do the cross-match and issue the alert is a small fraction of the 60-second budget.

Alerts Data Flow
[Figure: flow diagram. DIA Sources are tested in turn at three decision nodes, "Known Variable?", "Known Object?" and "Known Mover?", against the Variable Catalog (125K), the Deep Catalog (6M) and Moving Objects (?). Counts shown along the flow: 128K, 3K, 1K. Each branch ends in "Alert?".]
Alerts trigger on DIA sources
Variables are a small fraction of all objects
Examples: Cataclysmic Variable, Supernova, Gamma Ray Burst
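The flow on this slide is a cascade: each catalog is consulted only for the sources that every earlier catalog failed to match. The sketch below illustrates that control flow under simplifying assumptions. The function `cascade_match`, the one-dimensional toy positions and the brute-force matcher are all hypothetical; a real pipeline would match on sky position with a zone index rather than scanning each catalog.

```python
def cascade_match(dia_sources, catalogs, is_match):
    """Match sources against catalogs in order; each stage sees only leftovers.

    catalogs is an ordered list of (name, entries) pairs.
    Returns (matched, unmatched), where matched maps name -> list of sources.
    """
    remaining = list(dia_sources)
    matched = {}
    for name, entries in catalogs:
        # Brute-force scan, used here only to keep the sketch short
        hits = [s for s in remaining if any(is_match(s, e) for e in entries)]
        matched[name] = hits
        remaining = [s for s in remaining if s not in hits]
    return matched, remaining


if __name__ == "__main__":
    # Toy 1-D "positions"; the tolerance stands in for the match distance
    def close(a, b):
        return abs(a - b) <= 0.001

    catalogs = [
        ("known variable", [1.0, 2.0]),
        ("known object", [1.0, 2.0, 3.0, 4.0]),
        ("known mover", [9.0]),
    ]
    dia = [1.0001, 3.0002, 9.0001, 7.5]
    matched, unmatched = cascade_match(dia, catalogs, close)
    print(matched)
    print(unmatched)
```

Putting the smallest catalog first means most sources are resolved cheaply, and the largest catalog is searched only for the small remainder.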

Alert Simulation for Galactic Center
Extrapolate USNO-B
LSST FOV of 10 deg^2
6 million stars (DR20)
126 K variable stars
128 K DIA sources
  3200 new variables
  1000 un-matched: moving objects, new objects, transients (GRB)
Match distance = 1 arcsec

Alert Cross-Match Performance
We can detect alerts in 40 seconds on my desktop computer.
Partition by FOVs

Summary
Cone search != Neighbors
Zones efficiently index and “join” spatial data
  e.g., SDSS DR5 vs. 2MASS in 80 minutes
Zones are convenient for partitioning data
Simulated an LSST FOV in the Galactic Center
  Cross-match catalogs smallest to largest
  Finds possible alerts in 40 sec on a desktop