Efficient Catalog Matching with Dropout Detection


Efficient Catalog Matching with Dropout Detection Dongwei Fan

Motivation
The Zones Algorithm is one of the fastest crossmatching methods.
It is blind to the catalogs' sky coverage information.
The goal is to include the sky coverage information and make the crossmatch faster.
4/21/2019

Outline
Crossmatch with Footprints/Sky Coverage information
Introduction
How to do Dropout Detection
Crossmatch in Arbitrary Region

Crossmatching with Zones
The celestial sphere is divided into zones by declination. Every object and every zone has a ZoneID:
$\mathrm{ZoneID} = \lfloor (\delta + 90^\circ) / h \rfloor$, where $\delta$ is the declination and $h$ is the height of a zone.
Every catalog is indexed by ZoneID and R.A.
Crossmatching only happens among objects in nearby zones, which is why the Zones Algorithm is so fast.
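The zone assignment can be illustrated with a minimal Python sketch. The function name `zone_id` and the floor-based rounding are assumptions made for illustration; the slides only give the formula, not an implementation.

```python
import math


def zone_id(dec_deg: float, zone_height_deg: float) -> int:
    """Map a declination in degrees to a zone index.

    Assumes zones are numbered upward from the south pole (Dec = -90),
    so zone 0 starts at Dec = -90 and each zone is zone_height_deg tall.
    """
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


# Example with half-degree zones (an illustrative height only).
equator_zone = zone_id(0.0, 0.5)
```

Objects whose zone indices differ by more than the search radius divided by the zone height can never match, which is what lets the database skip most of the sky.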

How to show the footprint on Zones
Use intervals to approximate the footprint, and save the intervals in a data table.
Zones are narrow, e.g. 7.1 arcsec in height, so the approximation is very close to the footprint.
This step takes longer, but it is a one-time operation.

Intersect footprints by their intervals
Intersection intervals are also stored in an interval data table.
This step has to be done for every pair of catalogs.
It completes very quickly in the database.

Outline
Crossmatch with Footprints/Sky Coverage information
Introduction
How to do Dropout Detection
Crossmatch in Arbitrary Region

Step by Step
Step 1: Zones Definition
Step 2: Approximate Footprint by intervals
Step 3: Footprint Intersection
Step 4: Index Catalogs & create ZoneZone
Step 5: Crossmatch with sky coverage intersection

Step 1: Zone Definition
Create the table ZoneDef (Dec -90° to 0°, 0° to 90°).
Calculate the Alpha for each zone.
Calculate the BinaryRegion for all zones.
Cut each zone into two parts (RA 0°-180° and 180°-360°).
After Step 1 we have a ZoneDef table.

Handle the wrap-around problem
With the Spherical Library the boundary points are easy to calculate, but we cannot determine which one is on the left.
The two ranges in the figure could be (30, 330) or (330, 360)∪(0, 30).
Select Min(Ra), Max(Ra): both return (30, 330). Which one is the real range?
If it is (30, 330), the purple line, that is fine. But what if it is (330, 360)∪(0, 30)?
So cut the zone into (0, 180) and (180, 360), then intersect each part with the region separately.
Then select Min(Ra), Max(Ra):
For the purple line this returns (30, 180), (180, 330), which can be merged.
For the blue line this returns (330, 360), (0, 30).
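The effect of cutting the zone in two can be sketched in Python. This is an illustration only: the function `ra_ranges` and its inputs are hypothetical, and the boundary points are assumed to have already been computed separately for each half of the zone by some region library.

```python
def ra_ranges(boundary_lo, boundary_hi):
    """Turn per-half boundary points into unambiguous RA ranges.

    boundary_lo: RA boundary points of (region ∩ zone half 0..180), may be empty.
    boundary_hi: RA boundary points of (region ∩ zone half 180..360), may be empty.
    Taking min/max inside one half is safe because a half never wraps around.
    """
    ranges = []
    if boundary_lo:
        ranges.append((min(boundary_lo), max(boundary_lo)))
    if boundary_hi:
        ranges.append((min(boundary_hi), max(boundary_hi)))
    # Two pieces that meet exactly at RA = 180 are one continuous range.
    if len(ranges) == 2 and ranges[0][1] == 180 and ranges[1][0] == 180:
        return [(ranges[0][0], ranges[1][1])]
    return ranges


purple = ra_ranges([30, 180], [180, 330])   # one range spanning RA = 180
blue = ra_ranges([0, 30], [330, 360])       # two pieces on either side of RA = 0
```

The same four boundary values (30 and 330, plus the cut at 180 or the seam at 0/360) now produce different results for the two cases, which a single Min/Max over the whole zone could not distinguish.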

Step 2: Approximate Footprint
Get the footprint of a catalog.
Compute the region intersection between the footprint and the zones.
Calculate the outline and get the intervals in each zone (be careful of the bulge).
Merge intervals in the same zone if they intersect.

How to calculate the outline

How to get the outline with bulge

How to merge intervals in a zone
Structure of Table

T-SQL to merge intervals -1

T-SQL to merge intervals -2

T-SQL to merge intervals -3
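The merge logic can be sketched outside the database in Python. This is not the T-SQL shown on the slides; it is a minimal illustration under two assumptions: rows have the layout (zone_id, ra_min, ra_max), and intervals that merely touch are merged as well.

```python
def merge_intervals(rows):
    """Merge overlapping RA intervals within each zone.

    rows: iterable of (zone_id, ra_min, ra_max) tuples in any order.
    Returns a list sorted by (zone_id, ra_min) with no overlaps inside a zone.
    """
    merged = []
    for zone, lo, hi in sorted(rows):
        if merged and merged[-1][0] == zone and lo <= merged[-1][2]:
            # Overlaps (or touches) the previous interval in the same zone:
            # extend the previous interval instead of adding a new row.
            prev_zone, prev_lo, prev_hi = merged[-1]
            merged[-1] = (prev_zone, prev_lo, max(prev_hi, hi))
        else:
            merged.append((zone, lo, hi))
    return merged


example = merge_intervals([
    (1, 10, 20), (1, 15, 30), (1, 40, 50), (2, 5, 6), (1, 30, 35),
])
```

Sorting first means each interval only has to be compared with the most recent merged one, so the whole pass is linear after the sort.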

Step 3: Intersect Footprint
Intersect the interval approximations of the footprints.
Assume: GALEX interval approximation (red), SDSS interval approximation (black).
Handle the wrap-around problem: if RaMax+Alpha>360, insert (RaMin-360, RaMax-360).
After Step 3 we get what we want, the new common sky coverage of the two catalogs:
dbo.OutlineIntr(
    ZoneId int,
    RaMin float,
    RaMax float,
    primary key(ZoneId, RaMin, RaMax)
)

How to intersect intervals
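One way to intersect two interval approximations zone by zone is a two-pointer sweep, sketched here in Python. The function names and the dictionary layout (zone id mapped to a sorted, non-overlapping list of (ra_min, ra_max) pairs) are assumptions for illustration, not the database implementation from the talk.

```python
def intersect_intervals(a, b):
    """Intersect two sorted, non-overlapping interval lists from the same zone."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def intersect_footprints(fa, fb):
    """fa, fb: dict mapping zone_id -> interval list. Returns the common coverage."""
    result = {}
    for zone in fa.keys() & fb.keys():
        common = intersect_intervals(fa[zone], fb[zone])
        if common:
            result[zone] = common
    return result


galex = {1: [(0, 10), (20, 30)], 2: [(5, 15)]}
sdss = {1: [(5, 25)], 3: [(0, 1)]}
overlap = intersect_footprints(galex, sdss)
```

Zones present in only one footprint drop out immediately, so only zones covered by both catalogs contribute intervals to the result.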

Step 4: Index Catalogs
Exactly the same as in the Zones Algorithm.
Create a ZoneID for each object and create an index on (zoneid, ra, objid).
Create a ZoneZone table that tells the database to search only objects in nearby zones (by Jim Gray):
Create Table ZoneZone(
    zone1 int,   -- current object's zoneid
    zone2 int,   -- which zones should be scanned
    alpha float,
    primary key(zone1, zone2)
)
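How the ZoneZone rows might be generated can be sketched in Python. Everything here is an assumption for illustration: the slides do not show how alpha is computed, so the sketch uses the exact RA half-width of a small circle of radius theta (tan α = sin θ / sqrt(cos(δ−θ)·cos(δ+θ))), a pole cutoff of 89.9°, and evaluates alpha at the edge of zone1 farthest from the equator.

```python
import math


def alpha(theta_deg, dec_deg):
    """RA half-width (degrees) that covers a search radius theta at declination dec."""
    if abs(dec_deg) + theta_deg > 89.9:
        return 180.0  # near a pole the search circle can span all right ascensions
    sin_theta = math.sin(math.radians(theta_deg))
    denom = math.sqrt(abs(
        math.cos(math.radians(dec_deg - theta_deg))
        * math.cos(math.radians(dec_deg + theta_deg))
    ))
    return math.degrees(abs(math.atan(sin_theta / denom)))


def build_zone_zone(theta_deg, zone_height_deg):
    """Return (zone1, zone2, alpha) rows for all zone pairs within theta of each other."""
    n_zones = math.ceil(180.0 / zone_height_deg)
    reach = math.ceil(theta_deg / zone_height_deg)  # how many zones up/down to scan
    rows = []
    for zone1 in range(n_zones):
        dec_lo = zone1 * zone_height_deg - 90.0
        dec_hi = dec_lo + zone_height_deg
        worst_dec = min(max(abs(dec_lo), abs(dec_hi)), 90.0)
        a = alpha(theta_deg, worst_dec)
        first = max(0, zone1 - reach)
        last = min(n_zones - 1, zone1 + reach)
        for zone2 in range(first, last + 1):
            rows.append((zone1, zone2, a))
    return rows


rows = build_zone_zone(theta_deg=1.0, zone_height_deg=10.0)
```

A join against such a table restricts each object to a handful of neighbouring zones and an RA window of ±alpha, which is what the (zoneid, ra, objid) index is designed to serve.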

Step 5: Crossmatch with sky coverage intersection

SQL Implementation
Original Zone Match

SQL Implementation
Original Zone Match
Zone Match in footprint intersection

Performance
Tested on thumper.pha.jhu.edu with the SDSS DR6 (230.4 million objects) and GALEX GR3 AIS (54.9 million objects) catalogs.
Their overlap on the sky is 3689 square degrees.

Outline
Zone Match in two (or more) Footprints/Sky Coverage
Introduction
How to do Dropout Detection
Crossmatch in Arbitrary Region

Dropout identification
For a catalog (C1), we can determine:
1. which objects are inside the intersection (I)
2. which objects were matched (M) with the other catalog (C2)
C1 ⊇ I ⊇ M
I - M = IM: objects inside the intersection that were not matched.
IM could be the objects that cannot be found by C2. IM are the dropouts of C2.
That would mean these objects are fainter in C2 than in C1.
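The set relationship can be illustrated with a short Python sketch. The function name `find_dropouts` and the use of plain object-id collections are assumptions; the talk performs this step in the database.

```python
def find_dropouts(inside_ids, matched_ids):
    """Return IM = I - M: objects inside the common coverage with no counterpart.

    inside_ids:  ids of C1 objects that fall inside the footprint intersection (I).
    matched_ids: ids of C1 objects that were matched to an object in C2 (M).
    """
    inside = set(inside_ids)
    matched = set(matched_ids)
    if not matched <= inside:
        # The relation C1 ⊇ I ⊇ M requires every match to lie inside the intersection.
        raise ValueError("matched objects must lie inside the intersection")
    return inside - matched


dropouts = find_dropouts(inside_ids=[1, 2, 3, 4], matched_ids=[2, 4])
```

Restricting I to the footprint intersection is the important part: without it, every C1 object outside C2's coverage would be reported as a dropout even though C2 never observed that part of the sky.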

Code

Performance
A total of 4.2 million SDSS dropouts from GALEX, which lie inside the SDSS coverage but have no associated counterparts, were found in just 29 seconds.

Outline
Zone Match in two (or more) Footprints/Sky Coverage
Introduction
How to do Dropout Detection
Crossmatch in Arbitrary Region

Crossmatch in Arbitrary Region
Step 1: Create the arbitrary region we want.
Step 2: Approximate the region by intervals.
Then we can:
Select objects in the region (to be implemented as a plugin for my software, FITS Manager).
Even crossmatch objects in the intervals.
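Selecting objects from an interval approximation can be sketched in Python. This is an illustration under stated assumptions: the region is given as a dictionary from zone id to a sorted, non-overlapping list of (ra_min, ra_max) pairs, objects are (objid, ra, dec) tuples, and the names `in_region` and `select_objects` are hypothetical.

```python
import bisect
import math


def in_region(ra, dec, intervals_by_zone, zone_height_deg):
    """Return True if (ra, dec) falls inside one of the region's intervals."""
    zone = int(math.floor((dec + 90.0) / zone_height_deg))
    intervals = intervals_by_zone.get(zone, [])
    # Find the last interval whose ra_min is <= ra, then check its upper bound.
    k = bisect.bisect_right(intervals, (ra, float("inf"))) - 1
    return k >= 0 and intervals[k][0] <= ra <= intervals[k][1]


def select_objects(objects, intervals_by_zone, zone_height_deg):
    """Return the ids of all (objid, ra, dec) objects that lie inside the region."""
    return [
        objid
        for objid, ra, dec in objects
        if in_region(ra, dec, intervals_by_zone, zone_height_deg)
    ]


region = {90: [(10.0, 20.0), (30.0, 40.0)]}      # one zone just above the equator
catalog = [(1, 15.0, 0.5), (2, 25.0, 0.5), (3, 30.0, 0.5), (4, 15.0, 1.5)]
selected = select_objects(catalog, region, zone_height_deg=1.0)
```

Because the lookup needs only the object's zone and R.A., it can reuse the same (zoneid, ra) ordering that the crossmatch relies on.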

Summary
Matching is fast with the Zones Algorithm, but it is blind to sky coverage.
Sky coverage is typically handled separately and needs separate indices, which makes combining the two slow.
Here we combine the two using the same index:
Faster matching for partial overlaps.
Increased performance for on-the-fly runs.
This will be part of SkyQuery.

News
A big project from China-VO was approved. The total amount will be about 1.5M US$ from two funding sources.
It is an Astronomical Resource Planning platform for the telescopes in China.
Tasks such as observation time application, proposal review, data archiving, data release and sharing, and scientific discovery collection will be done (or required to be done) on the platform.
The platform will provide proposal submission, data access, limited data processing and data mining, and possibly more functions, to astronomers.
Achievements from the IVOA and VO partners will be adopted heavily by the platform.

Thanks