Spatial in Lucene and Solr David Smiley Lucene/Solr search developer / consultant 2016-05 at Harvard CGA.

1 Spatial in Lucene and Solr David Smiley Lucene/Solr search developer / consultant 2016-05 at Harvard CGA

2 About David Smiley Software Engineer (16 years) Search (7 years) Java (full-stack), Web, Spatial Freelance search consultant / developer Expert Lucene/Solr search advice / training Expert Lucene/Solr development skills Apache Lucene / Solr committer & PMC, Eclipse Locationtech PMC Authored 1st book on Solr, updated twice Presents at conferences & meetups Taught several Solr classes, self-developed & LucidWorks

3 Agenda Search Background Spatial in Solr Features How-to Recent Lucene developments, Future

4 Search Technology Keyword search Text analysis: stemming, synonyms, tokenization, phonetics Relevance ordering Query-completion (Find As You Type) Query did-you-mean Highlighted snippets Faceting, for navigation & analytics Result Clustering Query operators like fuzzy match, and “near” operator Some major features…

5 Faceted Navigation & Analytics by example… Notice the counts Optionally start with a keyword search or filter Extremely useful feature supported by very few platforms: Solr, ElasticSearch, Sphinx, … (no DBs)

6 Search Platforms A search platform has search features plus others like: A query language Boolean logic, numerics & dates, regexp, standard sorting Joins & Grouping Configuration Horizontal scaling options Administration tools, incl. a UI Note: Crawlers (Web/file/content-repository) are sometimes separate A NoSQL solution of the search variety

7 Apache Lucene & Solr Lucene: Provides most of the search “technology” behind search, plus some non-search but important capabilities (e.g. dates & numbers) But it’s just a toolkit/library/framework Solr & ElasticSearch Adds everything else needed to have a search platform / server / NoSQL solution Add some more of its own search technology too (Lucene & ElasticSearch too)

8 Spatial in Solr

9 Geospatial Features Lucene/Solr can index text, numbers, dates, and spatial data Features: Index latitude & longitude coordinates or any X Y pairs Index polygons or other geometry Query by point-radius, rectangle, polygon, or other geometry Including “Within” vs “Intersects” vs “Contains” predicates 2d/flat Euclidean OR geodetic spherical world model Sort or relevancy-boost by distance to indexed points Heatmaps -- spatial grid faceting GeoJSON & WKT formats

10 Big Picture Different spatial field types to choose from Vary in what features they support Syntax can vary too  Vary in performance for different features Shapes (AKA geometry): Index a shape – put it in a document’s field Query by another shape The default relation predicate is “intersects” Spatial code lives in 4 places: Solr, Lucene (several modules), Spatial4j, JTS

11 How-to: Index Points (LatLonType)
Configuration: schema.xml
Index a point (JavaScript syntax, “lat,lon” format):
  {"id":"1", "point":"45.15,-93.85"}
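
As a minimal sketch of preparing such a document on the client side, the Python below builds the JSON shown on the slide. The helper name point_doc and its range checks are illustrative assumptions, not part of Solr; only the "lat,lon" string format and the field name "point" come from the slide.

```python
import json

def point_doc(doc_id, lat, lon, field="point"):
    """Build a document whose point field uses the "lat,lon" string format.

    The helper and its validation are hypothetical conveniences for this sketch.
    """
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be within [-90, 90]")
    if not -180 <= lon <= 180:
        raise ValueError("longitude must be within [-180, 180]")
    # Latitude comes first, then longitude, separated by a comma.
    return {"id": str(doc_id), field: f"{lat},{lon}"}

doc = point_doc(1, 45.15, -93.85)
print(json.dumps(doc))
```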

12 How-to: Index Polygons (RPT Type)
Configuration: schema.xml
Index a polygon (JavaScript syntax around WKT):
  {"id":"1", "geo_rpt": "POLYGON((30 10, 10 20, 20 40, 40 40, 30 10))"}
or any supported shape, even just points
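
The WKT string in the example above can be generated from a list of coordinates. The sketch below assumes a single outer ring given as (x, y) pairs, i.e. longitude first, and closes the ring if the caller did not. The function name polygon_wkt is hypothetical.

```python
def polygon_wkt(ring):
    """Format one outer ring of (x, y) pairs as a WKT POLYGON string.

    WKT lists x (longitude) before y (latitude), and the first and last
    vertices of a ring must be identical.
    """
    points = list(ring)
    if len(points) < 3:
        raise ValueError("a polygon ring needs at least three distinct vertices")
    if points[0] != points[-1]:
        points.append(points[0])  # close the ring
    coords = ", ".join(f"{x} {y}" for x, y in points)
    return f"POLYGON(({coords}))"

wkt = polygon_wkt([(30, 10), (10, 20), (20, 40), (40, 40)])
doc = {"id": "1", "geo_rpt": wkt}
print(doc)
```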

13 How-to: Search/filter
Search for documents intersecting a 5 kilometer circle at 45.15,-93.85:
  fq={!geofilt}&sfield=geo_rpt&pt=45.15,-93.85&d=5
Search for documents intersecting a lat-lon box (Range query style):
  fq=geo_rpt:[-90,-180 TO 90,180]
Search for documents intersecting a polygon (WKT syntax):
  fq=geo_rpt:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))) distErrPct=0"
Predicates: Intersects, Within, Contains, Disjoint
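
The three filter styles above can be assembled and URL-encoded programmatically. This is a sketch only: the helper names are made up for illustration, while the parameter names and query syntax are copied from the slide.

```python
from urllib.parse import urlencode

PREDICATES = {"Intersects", "Within", "Contains", "Disjoint"}

def geofilt_params(field, lat, lon, km):
    """Point-radius filter: parameters as they appear on the slide."""
    return {"fq": "{!geofilt}", "sfield": field, "pt": f"{lat},{lon}", "d": str(km)}

def box_fq(field, min_lat, min_lon, max_lat, max_lon):
    """Range-style box filter: lower-left corner TO upper-right corner."""
    return f"{field}:[{min_lat},{min_lon} TO {max_lat},{max_lon}]"

def predicate_fq(field, predicate, wkt, dist_err_pct=None):
    """Filter by an arbitrary WKT shape using one of the listed predicates."""
    if predicate not in PREDICATES:
        raise ValueError(f"unknown predicate: {predicate}")
    body = f"{predicate}({wkt})"
    if dist_err_pct is not None:
        body += f" distErrPct={dist_err_pct}"
    return f'{field}:"{body}"'

# urlencode escapes braces, commas, quotes and spaces for use in a URL.
circle_qs = urlencode(geofilt_params("geo_rpt", 45.15, -93.85, 5))
box = box_fq("geo_rpt", -90, -180, 90, 180)
poly = predicate_fq("geo_rpt", "Intersects",
                    "POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))", 0)
print(circle_qs)
print(urlencode({"fq": box}))
print(urlencode({"fq": poly}))
```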

14 GeoJSON examples (Solr 6.1)
Schema:
Index by GeoJSON (literal):
  {"type":"Point","coordinates":[1,2]}
Search by GeoJSON, return GeoJSON:
  /select?q={!field f=geo_rpt}Intersects({"type":"Point","coordinates":[1,2]})
  &wt=geojson&geojson.field=geo_rpt
  {"response":{"type":"FeatureCollection", "numFound":1,"start":0,"features":[
    {"type":"Feature", "geometry":{"type":"Point","coordinates":[1,2]},
     "properties":{... the normal solr doc fields here... }}] }}
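
A client consuming the wt=geojson response only needs to walk the FeatureCollection. The sketch below assumes the response shape shown on the slide; the sample string and the "id" property in it are stand-ins for real document fields, and the function name is hypothetical.

```python
import json

def feature_geometries(response_text):
    """Return the geometry object of every feature in a geojson-format response."""
    body = json.loads(response_text)
    collection = body["response"]
    if collection.get("type") != "FeatureCollection":
        raise ValueError("response is not a GeoJSON FeatureCollection")
    return [feature["geometry"] for feature in collection["features"]]

# Sample modeled on the slide; the "properties" content is invented for the demo.
sample = (
    '{"response":{"type":"FeatureCollection","numFound":1,"start":0,'
    '"features":[{"type":"Feature",'
    '"geometry":{"type":"Point","coordinates":[1,2]},'
    '"properties":{"id":"1"}}]}}'
)
geometries = feature_geometries(sample)
print(geometries)
```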

15 How-to: Distance Sort / Boost
Sort with geodist():
  &sort=geodist() asc &pt=45.15,-93.85 &sfield=myField
Relevancy boost (this example is RPT only; alternatives exist for LatLonType):
  &defType=edismax &boost=query($mysq) &mysq={!geofilt filter=false score=recipDistance pt=45.15,-93.85 d=5} &sfield=geo_rpt
Points-only
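
To sanity-check a distance sort outside the server, the same ordering can be approximated on the client. The sketch below uses the haversine great-circle formula on a sphere with a mean Earth radius of 6371.0088 km. Both the formula and the radius are assumptions of this example and are not claimed to match what geodist() computes internally.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def sort_by_distance(docs, lat, lon, field="point"):
    """Order documents by distance from (lat, lon); field holds a "lat,lon" string."""
    def distance(doc):
        doc_lat, doc_lon = (float(v) for v in doc[field].split(","))
        return haversine_km(lat, lon, doc_lat, doc_lon)
    return sorted(docs, key=distance)

docs = [
    {"id": "far", "point": "46.0,-93.85"},
    {"id": "here", "point": "45.15,-93.85"},
    {"id": "near", "point": "45.2,-93.85"},
]
ordered = [d["id"] for d in sort_by_distance(docs, 45.15, -93.85)]
print(ordered)
```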

16 How-to: Index Rects (BBoxField)
Configuration: schema.xml
Index a rectangle (JavaScript syntax around WKT):
  {"id":"1", "bbox":"ENVELOPE(-10, 20, 15, 10)"}
Note: minX, maxX, maxY, minY order
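
Because the ENVELOPE argument order differs from the usual (minX, minY, maxX, maxY) bounding-box convention, a small formatter helps avoid swapped values. This is a sketch; the function name envelope_wkt is hypothetical.

```python
def envelope_wkt(min_x, min_y, max_x, max_y):
    """Format a bounding box as ENVELOPE(minX, maxX, maxY, minY).

    Input follows the common (minX, minY, maxX, maxY) convention; the output
    reorders the values as noted on the slide.
    """
    if min_y > max_y:
        raise ValueError("min_y must not exceed max_y")
    # min_x > max_x is not rejected here: this sketch leaves open whether
    # such a box is meant to wrap around the dateline.
    return f"ENVELOPE({min_x}, {max_x}, {max_y}, {min_y})"

rect = envelope_wkt(min_x=-10, min_y=10, max_x=20, max_y=15)
doc = {"id": "1", "bbox": rect}
print(doc)
```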

17 How-to: Filter and sort by overlap
Use this syntax:
  &q={!field f=bbox score=overlapRatio}Intersects(ENVELOPE(-10, 20, 15, 10))
BBoxField has more precision than RPT Field and supports more predicates (e.g. Equals)
BBoxField only

18 Heatmaps: Spatial Grid Faceting Spatial density summary grid faceting, also useful for point-plotting search results Lucene & Solr APIs Scalable & fast usually… Usually rendered with a gradient radius -> See: http://spacemansteve.github.io/leaflet-solr-heatmap/example/index.html

19 How-to: Heatmaps
On an RPT field
Might customize prefixTree & worldBounds
Query:
  /select?facet=true
  &facet.heatmap=geo_rpt
  &facet.heatmap.geom=["-180 -90" TO "180 90"]
  &facet.heatmap.format=ints2D or png
  // Normal Solr response...
  "facet_counts":{... // facet response fields
  "facet_heatmaps":{
    "geo_rpt":[ "gridLevel",2, "columns",32, "rows",32, "minX",-180.0, "maxX",180.0, "minY",-90.0, "maxY",90.0, "counts_ints2D", [null, null, [0, 1,... ]]...
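
A client has to expand the ints2D response before rendering it. The sketch below reads the alternating name/value list shown above and treats a null row as a row of zeros. It assumes row 0 is the top (maxY) row and column 0 the leftmost (minX) column; that orientation and the helper names are assumptions of this example.

```python
def parse_heatmap(flat):
    """Turn the alternating name/value list into a dict with a dense 'grid'."""
    info = dict(zip(flat[0::2], flat[1::2]))
    rows, cols = info["rows"], info["columns"]
    raw = info.get("counts_ints2D") or []
    # A null row stands for a row in which every cell count is zero.
    grid = [list(row) if row is not None else [0] * cols for row in raw]
    while len(grid) < rows:  # pad if rows are missing entirely
        grid.append([0] * cols)
    info["grid"] = grid
    return info

def cell_bounds(info, row, col):
    """Return (minX, minY, maxX, maxY) of one cell, assuming row 0 is the top row."""
    width = (info["maxX"] - info["minX"]) / info["columns"]
    height = (info["maxY"] - info["minY"]) / info["rows"]
    min_x = info["minX"] + col * width
    max_y = info["maxY"] - row * height
    return (min_x, max_y - height, min_x + width, max_y)

# Tiny 2x2 sample in the same layout as the slide's response (values invented).
sample = ["gridLevel", 1, "columns", 2, "rows", 2,
          "minX", -180.0, "maxX", 180.0, "minY", -90.0, "maxY", 90.0,
          "counts_ints2D", [None, [0, 3]]]
heatmap = parse_heatmap(sample)
print(heatmap["grid"], cell_bounds(heatmap, 1, 1))
```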

20 New in Lucene Spatial (in 2015,2016; that which isn’t in Solr yet)

21 Geo3D: Shapes on a Sphere … or Ellipsoid of configurable axis Not a general 3D space geometry lib Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer Distance computations: Arc (angular or surface), Linear (straight-line), Normal
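
The idea of geocentric coordinates can be illustrated without Lucene. The sketch below maps latitude/longitude onto a unit sphere (not an ellipsoid, which is a simplification) and compares the arc distance with the straight-line chord between two points. It is independent of the Geo3D API, and every function name in it is hypothetical.

```python
import math

def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector (x, y, z)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def arc_distance(a, b):
    """Angular (surface) distance in radians between two unit vectors."""
    return math.acos(max(-1.0, min(1.0, dot(a, b))))

def linear_distance(a, b):
    """Straight-line chord length through the sphere."""
    return math.dist(a, b)

def within_arc(a, b, cos_radius):
    """Radius test needing only a dot product once cos(radius) is precomputed."""
    return dot(a, b) >= cos_radius

origin = to_xyz(0, 0)
quarter = to_xyz(0, 90)
print(arc_distance(origin, quarter), linear_distance(origin, quarter))
```

Once points are stored as vectors, the radius test in within_arc involves no trigonometric calls per point, which gives a sense of why vector-based formulations can be cheap to evaluate.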

22 2D Maps Distort Straight Lines A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
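
The distortion can be checked numerically by comparing the midpoint of the great-circle path with the midpoint of a straight line drawn on a flat lat/lon map. The sketch below assumes a spherical Earth and approximate city coordinates (Anchorage about 61.22, -149.90; Miami about 25.76, -80.19), which are assumptions of this example rather than values from the slides.

```python
import math

def great_circle_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint of the shortest path on a sphere, as (lat, lon) in degrees."""
    def xyz(lat_deg, lon_deg):
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        return (math.cos(lat) * math.cos(lon),
                math.cos(lat) * math.sin(lon),
                math.sin(lat))
    a, b = xyz(lat1, lon1), xyz(lat2, lon2)
    total = [p + q for p, q in zip(a, b)]
    norm = math.sqrt(sum(c * c for c in total))
    if norm == 0:
        raise ValueError("antipodal points have no unique midpoint")
    x, y, z = (c / norm for c in total)
    # Clamp z before asin to guard against rounding just outside [-1, 1].
    return (math.degrees(math.asin(max(-1.0, min(1.0, z)))),
            math.degrees(math.atan2(y, x)))

ANCHORAGE = (61.22, -149.90)  # approximate coordinates
MIAMI = (25.76, -80.19)       # approximate coordinates

sphere_mid = great_circle_midpoint(*ANCHORAGE, *MIAMI)
flat_mid = ((ANCHORAGE[0] + MIAMI[0]) / 2, (ANCHORAGE[1] + MIAMI[1]) / 2)
print("great-circle midpoint:", sphere_mid)
print("flat-map midpoint:    ", flat_mid)
```

The great-circle midpoint lies several degrees further north than the flat-map midpoint, which is the bowing effect the slide refers to.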

23 Geo3D, continued… Benefits Inherently more accurate than 2D projected spatial especially for big shapes or near poles Many computations are fast; no expensive trigonometry An alternative to JTS without the LGPL license (still) Has own Lucene module (spatial3d), thus jar file Maven groupId: org.apache.lucene, artifact: lucene-spatial3d Index/Search: Geo3DPoint & Geo3DDocValuesField Limited RPT & Spatial4j integration; see Geo3dShape No Solr integration yet; pending more Spatial4j integration

24 New Competing Spatial Fields GeoPointField, LatLonPoint, Geo3DPoint All of these: Naming is a challenge; don’t read into them too much Exist outside Lucene’s spatial-extras module Don’t use abstractions like SpatialStrategy or Spatial4j lib Worked on by various contributors Limited to indexed point data (not polygons, etc.) Note: in Lucene 4 & 5 there was one spatial module. In Lucene 6, that module was effectively renamed to “spatial-extras” with a new “spatial” module now, plus “spatial3d”.

25 New Fields continued… GeoPointField (in “spatial”) Supports distance sort/boost without a separate field Approximate grid index + docValues (2-phase iter impl) Geo3DPoint (in “spatial3d”) See Geo3D geometry slides earlier Uses new “BKD” PointValues index; 3 dimensions LatLonPoint (in “sandbox”) Most efficient Uses new “BKD” PointValues index; 2 dimensions

26 Performance http://home.apache.org/~mikemccand/geobench.html Summary: LatLonPoint is currently 2x faster than other 2 (changes often) LatLonPoint has smallest index if don’t also need dist. sorting If need that (i.e. need “docValues”), GeoPoint is smallest No sort perf comparison yet; Geo3D looks promising Comparison to RPT (in spatial-extras): RPT similar to GeoPoint in search performance RPT’s indexes are huge Remember: RPT supports index based heatmaps & non-point indexed shapes (and predicates), and custom shapes

27 Future The dust hasn’t settled in Lucene spatial land… lots of activity lately, lots of performance enhancements Need to add Solr adapters Some Solr spatial ease-of-use / consistency / better docs would be good Heatmap performance planned/funded Heatmap with stats (instead of counts) planned/funded
