…and postgis & full text search & fuzzy comparisons.

…and postgis & full text search & fuzzy comparisons

 For manipulating 2D/3D spatial data  Points, lines, and polygons formed from points and lines  Can perform union, intersection, operations  Can project shapes into 2D areas  Has a 3D geometry type (relatively new) Can calculate accurate distances in meters Works with an open source server that allows folks to share geospatial data  Command line interface  Also supports some forms of raster data  Provides spatial indices  Has a notion of a geometric column

SELECT superhero.name FROM city, superhero WHERE ST_Contains(city.geom, superhero.geom) and city.name = 'Gotham'; SELECT AsBinary(the_geom) as wkb_geometry FROM river AS r, state AS s WHERE intersects(r.the_geom, s.the_geom)

 Used for OSM (open street map) data and uses postgis  Mapnik is an open source system for rendering maps  Used to design maps  Written in C++  It renders maps from postgis databases

 But first, not to be confused with the Like operator  Used % as the wild card  Or with regular expressions for character string comparison

 First, you index the words in a document and create an array of lexemes  Second, specify a boolean phrase using and, or, not, and parens  We typically don’t index “stop” words like and, or, the, etc.  Dictionaries are used to find roots of related words, like dead and dying  Thesauruses dictionaries are used to for recognition of domain-specific and similar words

 A document is a text attribute in a row of a table  Often we use part of a document or concatenate various parts of documents

 Define stop words that should not be indexed  Map synonyms to a single word.  Map phrases to a single word using a thesaurus.  Map different variations of a word to a canonical form

 Uses a match operator - @@  Basic search consists of asking about the relationship to a vector of words to a given document, which is also a vector  The vector can have and, or, etc. in it  tsvector – document – normalized lexemes  tsquery – query

SELECT title FROM pgweb WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('create & table') ORDER BY last_mod_date DESC LIMIT 10; select the ten most recent documents that contain create and table in the title or body Results can be ranked

 soundex(text) returns text  Converts a string to its Soundex code  Based on pronunciation  difference(text, text) returns int  converts two strings to their Soundex codes and then reports the number of matching code positions 0 is a no match 4 is a full match  Def: A phonetic coding system intended to suppress spelling variation and determining the relationship between two (similar) words

 Levenshtein distance is a metric for evaluating the difference between two sequences, in particular, words  E.g.: test=# SELECT levenshtein('GUMBO', 'GAMBOL');  E.g.: SELECT * FROM some_table WHERE levenshtein(code, 'AB123-lHdfj') <= 3 ORDER BY levenshtein(code, 'AB123-lHdfj') LIMIT 10  Used in particular, to detect nicknames

 E.g., metaphone(text source, int max_output_length) returns text  Similar to soundex  Used to classify words according to their english pronunciation  Apparently better for non-english languages, compared to soundex  E.g.: SELECT * FROM users WHERE METAPHONE(users.first_name, 2) = METAPHONE('Willem', 2) should detect similarity to word William

…and postgis & full text search & fuzzy comparisons.

Similar presentations

Presentation on theme: "…and postgis & full text search & fuzzy comparisons."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

…and postgis & full text search & fuzzy comparisons.

Similar presentations

Presentation on theme: "…and postgis & full text search & fuzzy comparisons."— Presentation transcript:

Similar presentations

About project

Feedback