Spatial Searches in the ODM
slide 2 Common Spatial Questions
Points in region queries
  1. Find all objects in this region
  2. Find all “good” objects (not in masked areas)
  3. Is this point in any of the regions?
Region in region
  4. Find regions near this region and their area
  5. Find all objects with error boxes intersecting region
  6. What is the common part of these regions?
Various statistical operations
  7. Find the object counts over a given region list
  8. Cross-match these two catalogs in the region
slide 3 Sky Coordinates of Points
Many different coordinate systems
  - Equatorial, Galactic, Ecliptic, Supergalactic
Longitude-latitude constraints
Searches often use a mix of different coordinate systems
  - gb>40 and dec between 10 and 20
Problem: coordinate singularities, transformations
How can one describe constraints in an easy, uniform fashion?
How can one perform fast database queries in an easy fashion?
  - Fast: indexes
  - Easy: simple query expressions
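Note: one way to sidestep the singularities and the mix of coordinate systems is to work with Cartesian unit vectors, where every latitude cut becomes a dot product with a pole. The sketch below expresses the example constraint "gb>40 and dec between 10 and 20" that way. The function names are hypothetical, and the North Galactic Pole position is the standard J2000 value written from memory, so treat it as an assumption to verify.

```python
import math

def radec_to_xyz(ra_deg, dec_deg):
    """Unit vector for equatorial (ra, dec) given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Poles expressed in the equatorial frame.
EQ_POLE = (0.0, 0.0, 1.0)
# Assumed J2000 position of the North Galactic Pole.
GAL_POLE = radec_to_xyz(192.85948, 27.12825)

def satisfies(p):
    """gb > 40 and dec between 10 and 20, as dot-product tests on unit vector p."""
    sin_b = dot(p, GAL_POLE)    # sine of Galactic latitude
    sin_dec = dot(p, EQ_POLE)   # sine of declination
    return (sin_b > math.sin(math.radians(40.0))
            and math.sin(math.radians(10.0)) <= sin_dec <= math.sin(math.radians(20.0)))

# Usage: a point 12 degrees from the Galactic pole, at dec = 15.
print(satisfies(radec_to_xyz(192.85948, 15.0)))
```

No trigonometry is evaluated per object beyond the initial conversion, and nothing special happens at the poles of either system.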
slide 4 Describing Regions
Spacetime metadata for the VO (Arnold Rots)
Includes definitions of
  - Constraint: single small or great circle
  - Convex: intersection of constraints
  - Region: union of convexes
Support both angles and Cartesian descriptions
Constructors for CIRCLE, RECTANGLE, POLYGON, CONVEX HULL
Boolean algebra (INTERSECTION, UNION, DIFF)
Proper language to describe the abstract regions
Similar to GIS, but much better suited for astronomy
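Note: the Constraint / Convex / Region hierarchy can be sketched as a small data structure. This is an illustration under stated assumptions, not the library described in the talk: the class and function names are hypothetical, a constraint is taken to be the halfspace n·p >= c on the unit sphere, and only the CIRCLE constructor plus UNION and INTERSECTION are shown.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float, float]

def unit(ra_deg: float, dec_deg: float) -> Vec:
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

@dataclass(frozen=True)
class Constraint:
    """Halfspace n.p >= c: a great circle if c == 0, otherwise a small circle."""
    n: Vec
    c: float

    def contains(self, p: Vec) -> bool:
        return sum(a * b for a, b in zip(self.n, p)) >= self.c

@dataclass
class Convex:
    """Intersection of constraints."""
    constraints: List[Constraint]

    def contains(self, p: Vec) -> bool:
        return all(k.contains(p) for k in self.constraints)

@dataclass
class Region:
    """Union of convexes."""
    convexes: List[Convex]

    def contains(self, p: Vec) -> bool:
        return any(cv.contains(p) for cv in self.convexes)

    def union(self, other: "Region") -> "Region":
        return Region(self.convexes + other.convexes)

    def intersection(self, other: "Region") -> "Region":
        # Distribute: (A1 u A2) n (B1 u B2) = union of all Ai n Bj.
        return Region([Convex(a.constraints + b.constraints)
                       for a in self.convexes for b in other.convexes])

def circle(ra_deg: float, dec_deg: float, radius_deg: float) -> Region:
    """CIRCLE constructor: a single small-circle constraint."""
    return Region([Convex([Constraint(unit(ra_deg, dec_deg),
                                      math.cos(math.radians(radius_deg)))])])

c1, c2 = circle(10.0, 0.0, 2.0), circle(12.0, 0.0, 2.0)
both, either = c1.intersection(c2), c1.union(c2)
```

Because intersection only concatenates constraint lists and union only concatenates convex lists, the algebra stays closed over the same three types.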
slide 5 Things Can Get Complex
slide 6 We Do Spatial 3 Ways
Hierarchical Triangular Mesh (extension to SQL)
  - Uses table-valued functions
  - Acts as a new “spatial access method”
Zones: fits SQL well
  - Surprisingly simple & good
3D Constraints: a novel idea
  - Algebra on regions, can be implemented in pure SQL
slide 7 PS1 Footprint Using the projection cell definitions as centers for tessellation (T. Budavari)
slide 8 CrossMatch: Zone Approach
Divide space into declination zones
Objects ordered by zoneid, ra (on the sphere, a wrap-around margin is needed)
Point search: look in neighboring zones within ~ (ra ± Δ) bounding box
All inside the relational engine
  - Avoids “impedance mismatch”
  - Can “batch” comparisons
  - Automatically parallel
Details in Maria’s thesis
[Figure labels: r, ra-zoneMax, zoneMax, x, ra ± Δ]
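Note: the zone search can be sketched outside the database as follows. This is a minimal illustration, not the SQL implementation referred to on the slide. Assumptions: zones are numbered floor((dec + 90) / height), the ra half-width is taken as Δ = arcsin(sin r / cos dec), and wrap-around at ra = 0/360 is handled by splitting the ra window rather than by duplicating objects into a margin. All names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict

def zone_id(dec_deg, height_deg):
    return int(math.floor((dec_deg + 90.0) / height_deg))

def unit(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def sep_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees, computed from the chord length."""
    chord = math.dist(unit(ra1, dec1), unit(ra2, dec2))
    return math.degrees(2.0 * math.asin(min(1.0, chord / 2.0)))

def build_zones(objects, height_deg):
    """objects: iterable of (objid, ra, dec). Rows are sorted by ra inside each zone."""
    zones = defaultdict(list)
    for objid, ra, dec in objects:
        zones[zone_id(dec, height_deg)].append((ra % 360.0, dec, objid))
    for rows in zones.values():
        rows.sort()
    return zones

def search(zones, height_deg, ra, dec, r):
    """Return sorted ids of objects within r degrees of (ra, dec)."""
    ra %= 360.0
    if abs(dec) + r >= 90.0:
        delta = 180.0  # circle touches a pole: scan every ra
    else:
        delta = math.degrees(math.asin(math.sin(math.radians(r)) / math.cos(math.radians(dec))))
    lo, hi = ra - delta, ra + delta
    windows = [(max(lo, 0.0), min(hi, 360.0))]
    if lo < 0.0:
        windows.append((lo + 360.0, 360.0))  # wrapped part below ra = 0
    if hi > 360.0:
        windows.append((0.0, hi - 360.0))    # wrapped part above ra = 360
    hits = set()
    z_lo = zone_id(max(dec - r, -90.0), height_deg)
    z_hi = zone_id(min(dec + r, 90.0), height_deg)
    for z in range(z_lo, z_hi + 1):
        rows = zones.get(z, [])
        for w_lo, w_hi in windows:
            i = bisect_left(rows, (w_lo,))
            j = bisect_right(rows, (w_hi, float("inf")))
            for o_ra, o_dec, objid in rows[i:j]:
                if sep_deg(ra, dec, o_ra, o_dec) <= r:  # exact test inside the box
                    hits.add(objid)
    return sorted(hits)

catalog = [("a", 10.0, 5.0), ("b", 10.001, 5.0005), ("c", 359.9995, 0.0),
           ("d", 10.0, 6.0), ("e", 0.0005, 0.0003)]
zones = build_zones(catalog, 0.01)
```

The bounding box only prunes candidates; the exact distance test decides membership, which mirrors the "~ (ra ± Δ) bounding box" wording on the slide.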
slide 9 Indexing Using Quadtrees
Cover the sky with hierarchical pixels
  - COBE – start with a cube
Hierarchical Triangular Mesh (HTM) uses trixels
  - Samet, Fekete
Start with an octahedron, and split each triangle into 4 children, down to 20 levels deep
Smallest triangles are 0.3” (arcseconds)
Each trixel has a unique htmID
[Figure: trixels labeled 2,0  2,1  2,2  2,3, with children 2,3,0  2,3,1  2,3,2  2,3,3]
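Note: the recursive subdivision can be sketched in a few lines. The vertex ordering, the root ids 8..15 and the child numbering below follow the commonly described HTM layout as recalled from memory; treat them as assumptions, since ids produced here are not guaranteed to match any particular HTM library. The function names are hypothetical.

```python
import math

# Octahedron vertices and the eight root trixels (ids 8..15, assumed layout).
V = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
     (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0)]
ROOTS = [(8, (1, 5, 2)), (9, (2, 5, 3)), (10, (3, 5, 4)), (11, (4, 5, 1)),
         (12, (1, 0, 4)), (13, (4, 0, 3)), (14, (3, 0, 2)), (15, (2, 0, 1))]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def midpoint(a, b):
    """Midpoint of a and b pushed back onto the unit sphere."""
    m = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(dot(m, m))
    return (m[0] / n, m[1] / n, m[2] / n)

def insideness(p, tri):
    """Smallest of the three edge tests; non-negative when p is inside tri."""
    a, b, c = tri
    return min(dot(cross(a, b), p), dot(cross(b, c), p), dot(cross(c, a), p))

def htm_id(p, depth=20):
    """Id of the depth-level trixel containing direction p (any nonzero 3-vector)."""
    root_id, idx = max(ROOTS, key=lambda r: insideness(p, tuple(V[i] for i in r[1])))
    tri = tuple(V[i] for i in idx)
    hid = root_id
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
        kids = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        # Pick the child that contains p (most-inside child on boundary ties).
        n = max(range(4), key=lambda k: insideness(p, kids[k]))
        hid, tri = hid * 4 + n, kids[n]  # append two bits per level
    return hid

p = (0.01, 0.02, 1.0)  # a direction close to the north pole
```

Each level appends two bits to a 4-bit root id, so a 20-deep id occupies 4 + 2 × 20 = 44 bits, and the id of a parent is always a bit prefix of the ids of its children.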
slide 10 Space-Filling Curve
Triangles correspond to ranges
All points inside the triangle are inside the range.
[Figure labels: [0.12,0.13) split into [0.120,0.121), [0.121,0.122), [0.122,0.123), [0.123,0.130); also [0.122,0.130) and [0.120,0.121)]
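Note: the figure labels read as base-4 fractions, where appending a digit to 0.12 gives the children 0.120 to 0.123, all of which fall in [0.12, 0.13). With integer ids the same idea is a bit shift. The sketch below assumes ids that gain two bits per level and a leaf depth of 20; the function names are hypothetical.

```python
def children(trixel_id):
    """Ids of the four children: append one base-4 digit (two bits)."""
    return [trixel_id * 4 + n for n in range(4)]

def trixel_range(trixel_id, depth, leaf_depth=20):
    """Half-open range [lo, hi) of leaf-level ids covered by a trixel at `depth`."""
    shift = 2 * (leaf_depth - depth)
    return trixel_id << shift, (trixel_id + 1) << shift

# Usage: a depth-2 trixel and the leaf-level ids it covers.
parent = 0b1111_01_10            # root 15, then children 1 and 2 (illustrative id)
lo, hi = trixel_range(parent, 2)
kid_ranges = [trixel_range(k, 3) for k in children(parent)]
```

The four child ranges tile the parent range end to end, which is why any region covered by a handful of trixels turns into a handful of id ranges.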
slide 11 SQL HTM Extension
Every object has a 20-deep htmID (44 bits)
Clustered index on htmID
Table-valued functions for spatial joins
  - Given a region definition, routine returns up to 10 ranges of covering triangles
  - Spatial query is mapped to ~10 range queries
Current implementation rewritten in C#
  - Excellent performance, little calling overhead
Three layers
  - General geometry library
  - HTM kernel
  - IO (parsing + SQL interface)
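Note: the reason a clustered index makes this fast is that each covering range becomes one contiguous scan. The sketch below uses a sorted Python list as a stand-in for the clustered index and two binary searches per range. The ids and ranges are made-up illustrative numbers, not real htmIDs, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def range_scan(sorted_ids, start, end):
    """Ids with start <= id <= end, found by two binary searches."""
    return sorted_ids[bisect_left(sorted_ids, start):bisect_right(sorted_ids, end)]

def cover_search(sorted_ids, cover):
    """Run one range scan per (start, end) pair of the cover and concatenate."""
    out = []
    for start, end in cover:
        out.extend(range_scan(sorted_ids, start, end))
    return out

ids = sorted([5, 17, 18, 42, 43, 44, 90, 120])   # stand-in for the clustered index
cover = [(16, 20), (43, 50), (100, 110)]         # stand-in for the covering ranges
print(cover_search(ids, cover))
```

The cost is a few logarithmic lookups plus the rows actually returned, independent of how many objects lie outside the ranges.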
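slide 12 Writing Spatial SQL
(Parts of this slide were lost in extraction; "..." marks the missing text.)
    -- region description is contained ...
    ... TABLE (htmStart bigint, htmEnd bigint)
    ... SELECT * from ...
    --
    -- c is the halfspace offset used in the test below
    ... TABLE (convexId bigint, x float, y float, z float, c float)
    ... SELECT ...
    --
    SELECT o.ra, o.dec, 1 as flag, o.objid
    FROM (SELECT objID as objid, cx, cy, cz, ra, [dec]
          FROM Objects q JOIN ... AS c
            ON q.htmID between c.htmStart and c.htmEnd
         ) AS o
    WHERE NOT EXISTS (
        -- any halfspace that the object violates
        SELECT p.convexId FROM ... AS p
        WHERE (o.cx*p.x + o.cy*p.y + o.cz*p.z < p.c)
        GROUP BY p.convexId
    )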
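Note: the two-stage pattern of the query (coarse join on htmID ranges, then an exact halfspace test) can be tried end to end with SQLite from Python. This is a toy under stated assumptions: the table names `cover` and `halfspaces` are hypothetical, the htmID values are placeholder keys rather than real HTM ids, and the region is a single convex (a cap with z >= 0.9).

```python
import sqlite3

def run_demo():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Objects (objID INTEGER, htmID INTEGER, cx REAL, cy REAL, cz REAL);
        CREATE INDEX ix_objects_htm ON Objects (htmID);
        CREATE TABLE cover (htmStart INTEGER, htmEnd INTEGER);
        CREATE TABLE halfspaces (convexId INTEGER, x REAL, y REAL, z REAL, c REAL);
    """)
    con.executemany("INSERT INTO Objects VALUES (?, ?, ?, ?, ?)", [
        (1, 100, 0.0, 0.0, 1.0),   # inside the cap and inside the cover
        (2, 101, 0.6, 0.0, 0.8),   # inside the cover, but fails the exact test
        (3, 500, 1.0, 0.0, 0.0),   # outside the cover: never examined exactly
    ])
    con.execute("INSERT INTO cover VALUES (100, 101)")
    con.execute("INSERT INTO halfspaces VALUES (1, 0.0, 0.0, 1.0, 0.9)")
    rows = con.execute("""
        SELECT o.objID
        FROM Objects AS o
        JOIN cover AS c ON o.htmID BETWEEN c.htmStart AND c.htmEnd
        WHERE NOT EXISTS (
            SELECT 1 FROM halfspaces AS p
            WHERE o.cx * p.x + o.cy * p.y + o.cz * p.z < p.c
        )
        ORDER BY o.objID
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

result = run_demo()
print(result)
```

Object 2 shows why the second stage is needed: covering trixels overlap the region boundary, so the range join alone returns false positives that the dot-product test removes.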
slide 13 Status
All three libraries extensively tested
Zones used for Maria’s thesis, plus various papers
New HTM code in production use since July on SDSS
Same code also used by STScI HLA, Galex
Systematic regression tests developed
Footprints computed for all major surveys
Complex mask computations done on SDSS
Loading: zones used for bulk crossmatch
Ad hoc queries: use HTM-based search functions
Excellent performance