Download presentation
Presentation is loading. Please wait.
Published byMartina Bridges Modified over 9 years ago
1
…and postgis & full text search & fuzzy comparisons
2
For manipulating 2D/3D spatial data Points, lines, and polygons formed from points and lines Can perform union, intersection, operations Can project shapes into 2D areas Has a 3D geometry type (relatively new) Can calculate accurate distances in meters Works with an open source server that allows folks to share geospatial data Command line interface Also supports some forms of raster data Provides spatial indices Has a notion of a geometric column
3
SELECT superhero.name FROM city, superhero WHERE ST_Contains(city.geom, superhero.geom) and city.name = 'Gotham'; SELECT AsBinary(the_geom) as wkb_geometry FROM river AS r, state AS s WHERE intersects(r.the_geom, s.the_geom)
4
Used for OSM (open street map) data and uses postgis Mapnik is an open source system for rendering maps Used to design maps Written in C++ It renders maps from postgis databases
5
But first, not to be confused with the Like operator Used % as the wild card Or with regular expressions for character string comparison
6
First, you index the words in a document and create an array of lexemes Second, specify a boolean phrase using and, or, not, and parens We typically don’t index “stop” words like and, or, the, etc. Dictionaries are used to find roots of related words, like dead and dying Thesauruses dictionaries are used to for recognition of domain-specific and similar words
7
A document is a text attribute in a row of a table Often we use part of a document or concatenate various parts of documents
8
Define stop words that should not be indexed Map synonyms to a single word. Map phrases to a single word using a thesaurus. Map different variations of a word to a canonical form
9
Uses a match operator - @@ Basic search consists of asking about the relationship to a vector of words to a given document, which is also a vector The vector can have and, or, etc. in it tsvector – document – normalized lexemes tsquery – query
10
SELECT title FROM pgweb WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('create & table') ORDER BY last_mod_date DESC LIMIT 10; select the ten most recent documents that contain create and table in the title or body Results can be ranked
11
soundex(text) returns text Converts a string to its Soundex code Based on pronunciation difference(text, text) returns int converts two strings to their Soundex codes and then reports the number of matching code positions 0 is a no match 4 is a full match Def: A phonetic coding system intended to suppress spelling variation and determining the relationship between two (similar) words
12
Levenshtein distance is a metric for evaluating the difference between two sequences, in particular, words E.g.: test=# SELECT levenshtein('GUMBO', 'GAMBOL'); E.g.: SELECT * FROM some_table WHERE levenshtein(code, 'AB123-lHdfj') <= 3 ORDER BY levenshtein(code, 'AB123-lHdfj') LIMIT 10 Used in particular, to detect nicknames
13
E.g., metaphone(text source, int max_output_length) returns text Similar to soundex Used to classify words according to their english pronunciation Apparently better for non-english languages, compared to soundex E.g.: SELECT * FROM users WHERE METAPHONE(users.first_name, 2) = METAPHONE('Willem', 2) should detect similarity to word William
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.