Download presentation
Presentation is loading. Please wait.
Published byJesus Watkins Modified over 11 years ago
1
Spatial (or N-Dimensional) Search in a Relational World Jim Gray, Microsoft Alex Szalay, Johns Hopkins U.
2
Equations Define Subspaces For (x,y) above the line ax+by > c Reverse the space by -ax + -by > -c Intersect 3 half-spaces: a 1 x + b 1 y > c 1 a 2 x + b 2 y > c 2 a 3 x + b 3 y > c 3 x y x=c/a y=c/b ax + by = c x y
3
Domain is Union of Convex Hulls Simple volumes are unions of convex hulls. Higher order curves also work Complex volumes have holes and their holes have holes. (that is harder). Not a convex hull +
4
Now in Relational Terms create table HalfSpace ( domainID int not null -- domain name foreign key references Domain(domainID), convexID int not null,-- grouping a set of ½ spaces halfSpaceID int identity(),-- a particular ½ space x float not null, -- the (a,b,..) parameters y float not null, -- defining the ½ space z float not null, cfloat not null, -- the constant (c above) primary key (domainID, convexID, halfSpaceID) (x,y,z) inside a convex if it is inside all lines of the convex (x,y,z) inside a convex if it is NOT OUTSIDE ANY line of the convex select convexID-- return the convex hulls from HalfSpace-- from the constraints where @x * x + @y * y + @x * z < l -- point outside the line? group by all convexID-- consider all the lines of a convexID having count(*) = 0 -- count outside == 0
5
The Algebra is Simple (Boolean) @domainID = spDomainNew (@type varchar(16), @comment varchar(8000)) @convexID = spDomainNewConvex (@domainID int) @halfSpaceID = spDomainNewConvexConstraint (@domainID int, @convexID int, @x float, @y float, @z float, @l float) @returnCode = spDomainDrop(@domainID) select * from fDomainsContainPoint(@x float, @y float, @z float) Once constructed they can be manipulated with the Boolean operations. @domainID = spDomainOr (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainAnd (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainNot (@domainID1 int, @type varchar(16), @comment varchar(8000))
6
What! No Bounding Box? Bounding box limits search. A subset of the convex hulls. If query runs at 3M halfspace/sec then no need for bounding box, unless you have more than 10,000 lines. But, if you have a lot of half-spaces then bounding box is good.
7
A Different Problem Table-valued function find points near a point –Select * from fGetNearbyEq(ra,dec,r) Use Hierarchical Triangular Mesh www.sdss.jhu.edu/htm/ www.sdss.jhu.edu/htm/ –Space filling curve, bounding triangles… –Standard approach 13 ms/call… So 70 objects/second. Too slow, so precompute neighbors: Materialized view. At 70 objects/sec it takes 6 months to compute a billion objects.
8
Zone Based Spatial Join Divide space into zones Key points by Zone, offset (on the sphere this need wrap-around margin.) Point search look in a few zones at a limited offset: ra ± r a bounding box that has 1-π/4 false positives All inside the relational engine Avoids impedance mismatch Can batch all-all comparisons 33x faster and parallel 6 days, not 6 months! r ra-zoneMax (r 2 +(ra-zoneMax) 2 ) cos(radians(zoneMax)) zoneMax x Ra ± x
9
In SQL: points near point select o1.objID -- find objects from zone o1 -- in the zoned table where o1.zoneID between -- where zone # floor((@dec-@r)/@zoneHeight) and-- overlaps the circle floor((@dec+@r)/@zoneHeight) and o1.ra between @ra - @r and @ra + @r-- quick filter on ra and o1.dec between @dec-@r and @dec+@r -- quick filter on dec and ( (sqrt( power(o1.cx-@cx,2)+power(o1.cy-@cy,2)+power(o1.cz-@cz,2)))) < @r -- careful filter on distance Eliminates the ~ 21% = 1-π/4 False positives Bounding box
10
Do the following for @deltaZone in {-1, 0, 1} example ignores some spherical geometry details in full paper insert neighbors-- insert one zone's neighbors select o1.objID as objID, -- object pairs o2.objID as NeighborObjID,.. other fields elided from zone o1 join zone o2 -- force a nested loop on o1.zoneID-@deltaZone = o2.zoneID -- using zone number and ra and o2.ra between o1.ra - @r and o1.ra + @r -- elided 1/cos(@dec) where -- elided margin logic, see paper. and o2.dec between o1.dec-@r and o1.dec+@r -- quick filter on dec and sqrt(power(o1.x-o2.x,2)+power(o1.y-o2.y,2)+power(o1.z-o2.z,2)) < @r -- careful filter on distance
11
Summary SQL is a set oriented language You can express constraints as rows Then You –Can evaluate LOTS of predicates per second –Can do set algebra on the predicates. Benefits from SQL parallelism SQL == Prolog?
12
References Representing Polygon Areas and Testing Point-in-Polygon Containment in a Relational Database http://research.microsoft.com/~Gray/papers/Polygon.doc http://research.microsoft.com/~Gray/papers/Polygon.doc A Purely Relational Way of Computing Neighbors on a Sphere, http://research.microsoft.com/~Gray/papers/Neighbors.doc http://research.microsoft.com/~Gray/papers/Neighbors.doc
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.