Spatial Issues in DBGlobe Dieter Pfoser. Location Parameter in Services Entering the harbor (x,y position)… …triggers information request.

1 Spatial Issues in DBGlobe Dieter Pfoser

2 Location Parameter in Services
- Entering the harbor (x,y position)… …triggers information request

3 Spatial Data in DBGlobe
- Spatial information might be the predominant type of data for structuring information content
- PMOs contain spatially (and temporally) referenced data
- These data are distributed over a set of devices
- How can we relate all these data to one spatial location? → “What have we stored for this location?”
- This introduces space as the organizing criterion for data, i.e., a distinguished context

4 Spatial Data… (cont’d)
- Each PMO contains a set of positions that reference content
- The job of DBGlobe is now to find this content based on a given positional reference: Position → {PMO (id)} → content
- BUT!
  - Content is referenced by position as the only argument!
  - The question is how to introduce further filters that retrieve only relevant (interesting) content based on additional parameters

5 Distributed Indexes
- Using tree-based structures, a global index needs to be constructed and some portion of the index replicated in the CAS
- Given the set of locations for each PMO, one could compute a signature that the PMO communicates to a CAS (where it is further aggregated)
- This signature is used to potentially scan all PMOs for relevant spatial information

6 Bloom Filters: High Level Idea
- Everyone thinks they need to know exactly what everyone else has. “Give me a list of what you have.”
- Lists are long and unwieldy.
- Using Bloom filters, you can get small, approximate lists. “Give me information so I can figure out what you have.”

7 Bloom Filter Example
A Bloom Filter: To check an object’s name against a Bloom filter summary, the name is hashed with n different hash functions (here, n=3) and the bits corresponding to the results are checked.
[Figure: hash functions mapping a name onto a bit vector]

8 Bloom Filter
- Multiple hash functions are used to map values onto the bit vector
- Example: Web proxy cache sharing
  - URLs are hashed using MD5, a cryptographic message digest algorithm that hashes arbitrary-length strings to 128 bits
  - Hash functions are built by
    - first calculating the MD5 signature of a URL → 128 bits,
    - dividing the 128 bits into four 32-bit words, and finally
    - taking the modulus of each 32-bit word by the table size
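The MD5-based construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `md5_positions` and `BloomFilter` are hypothetical, and the big-endian byte order used to read each 32-bit word is a choice the slide does not specify.

```python
import hashlib


def md5_positions(key: str, table_size: int) -> list[int]:
    """Derive four bit positions from a single MD5 digest of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # 16 bytes = 128 bits
    # Split the digest into four 32-bit words (byte order is an assumption).
    words = [int.from_bytes(digest[i:i + 4], "big") for i in range(0, 16, 4)]
    # Reduce each word modulo the table size to get a bit index.
    return [w % table_size for w in words]


class BloomFilter:
    """Minimal Bloom filter using the four MD5-derived hash functions."""

    def __init__(self, table_size: int) -> None:
        self.table_size = table_size
        self.bits = [False] * table_size

    def add(self, key: str) -> None:
        for pos in md5_positions(key, self.table_size):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # True means "possibly present" (false positives can occur);
        # False means "definitely not present".
        return all(self.bits[pos] for pos in md5_positions(key, self.table_size))
```

An added key is always reported as possibly present; a key that was never added is usually, but not always, rejected.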

9 Spatial Hashing
- Alphanumeric hashing: string → hash value
- Spatial coordinates as string?
  - (Long/Lat) 23.123 deg. East, 38.01 deg. North
  - Equal to
  - 23.12 deg. East, 38.02 deg. North ???
- If the two coordinate pairs are hashed as strings, their hash values will not match (they will be totally different, given a good hash function such as MD5)
- Spatial data is different from alphanumeric data since its semantics have to be seen in the context of a reference system
- When matching hash values, a tolerance is needed to test for equality
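A small Python sketch makes the mismatch concrete. It hashes the two coordinate pairs from the slide as strings, then shows one possible way to introduce tolerance: snapping each position to a grid cell before hashing. The helper names and the 0.1-degree cell size are assumptions chosen for illustration, not part of the original material.

```python
import hashlib
import math


def string_hash(lon: float, lat: float) -> str:
    """Hash a coordinate pair as a plain string (no spatial tolerance)."""
    return hashlib.md5(f"{lon},{lat}".encode("utf-8")).hexdigest()


def grid_cell(lon: float, lat: float, cell_deg: float = 0.1) -> tuple[int, int]:
    """Snap a position to the index of the regular grid cell containing it."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


a = (23.123, 38.01)  # first position from the slide (lon, lat)
b = (23.12, 38.02)   # nearby second position

print(string_hash(*a) == string_hash(*b))  # string hashes differ
print(grid_cell(*a) == grid_cell(*b))      # both fall into the same cell
```

Snapping to cells is only a coarse form of tolerance: two positions that lie close together but on opposite sides of a cell boundary still end up in different cells.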

10 Spatial Subdivisions
- Regular subdivisions
- Occupation-based, e.g., adaptive k-d-tree

11 Spatial Subdivision
- Computing spatial subdivisions of space based on existing data
[Figure panels: Earthquakes 1964-83; Earthquakes 1964-92]

12 Spatial Hashing
- Linearize the spatial subdivisions using space-filling curves
- Space-filling curves as hash functions
  - Z-ordering (Peano curves)
  - Hilbert curves
  - …
- Example:
  - Hashing positions using the above space-filling curves
  - Determine the spatial subdivision the position falls into
  - Compute the respective linearization values for each of the space-filling curves (hash functions)
  - Take the modulus of each value by the size of the bit vector
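The example steps above can be sketched as follows, assuming a regular 2^order × 2^order grid over the whole longitude/latitude range as the spatial subdivision (the slides also mention data-driven subdivisions, which are not modeled here). All function names are hypothetical; the Hilbert index follows the widely used iterative bit-manipulation formulation.

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) value: interleave the bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def hilbert(x: int, y: int, order: int = 16) -> int:
    """Hilbert-curve index of cell (x, y) on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def cell_of(lon: float, lat: float, order: int = 16) -> tuple[int, int]:
    """Step 1: determine the grid cell a (lon, lat) position falls into."""
    n = 1 << order
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def spatial_positions(lon: float, lat: float, vector_size: int,
                      order: int = 16) -> list[int]:
    """Steps 2 and 3: linearize the cell with each curve, then take the modulus."""
    x, y = cell_of(lon, lat, order)
    return [z_order(x, y, order) % vector_size,
            hilbert(x, y, order) % vector_size]
```

With a coarse grid (for example `order=8`), the two nearby positions from the earlier Spatial Hashing slide fall into the same cell and therefore set the same bits.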

13 Overall Scenario
- PMOs containing spatial data communicate signatures to the CAS
- The CAS “ORs” the signatures and keeps track of associations
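One way to picture this scenario is the Python sketch below, which represents each signature as an integer bit vector. The `CAS` class, its methods, and the PMO identifiers are hypothetical names introduced for illustration; the slides do not prescribe any particular data structure.

```python
def signature(positions: list[int]) -> int:
    """Build a bit-vector signature (as an int) from a list of bit positions."""
    sig = 0
    for p in positions:
        sig |= 1 << p
    return sig


class CAS:
    """Keeps per-PMO signatures plus their bitwise OR (the aggregate)."""

    def __init__(self) -> None:
        self.signatures: dict[str, int] = {}
        self.aggregate = 0

    def register(self, pmo_id: str, sig: int) -> None:
        # Remember which PMO sent which signature, and OR it into the aggregate.
        self.signatures[pmo_id] = sig
        self.aggregate |= sig

    def candidates(self, query_mask: int) -> list[str]:
        # Quick reject: if the aggregate lacks a query bit, no PMO can match.
        if (self.aggregate & query_mask) != query_mask:
            return []
        # Otherwise check each PMO's own signature. Matches are only
        # candidates, since Bloom-style signatures allow false positives.
        return [pid for pid, sig in self.signatures.items()
                if (sig & query_mask) == query_mask]
```

Note that the aggregate can pass the quick check even when no single PMO matches, because the required bits may come from different PMOs; the per-PMO check catches this case.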

14 Questions
- Types of queries, e.g., range queries vs. “point” queries
- Spatial hash functions by using grids and space-filling curves
- Distinct type of data that deserves special treatment?
- Can stand as a single query parameter? Needs more context?

15 END

17 Lookup Problem
- Given a set S = {x_1, x_2, x_3, …, x_n} on a universe U, want to answer queries of the form: Is y ∈ S?
- Example: a set of URLs from the universe of all possible URL strings.
- Bloom filter provides an answer in
  - “Constant” time (time to hash).
  - Small amount of space.
  - But with some probability of being wrong.

18 Optimal Choice of Parameters
- Given m bits for the filter and n elements, choose the number k of hash functions
- Find the optimum at k = (ln 2) m/n by calculus
- [Plot annotation] m/n = 8: optimal k = 8 ln 2 ≈ 5.55
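The optimum can be checked numerically with a short Python sketch. The false-positive formula used here, (1 - e^(-kn/m))^k, is the standard approximation for Bloom filters; it does not appear on the slide and is included as an assumption. The function names are hypothetical.

```python
import math


def optimal_k(m: int, n: int) -> float:
    """Optimal number of hash functions: k = (ln 2) * m / n."""
    return math.log(2) * m / n


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Standard approximation of the false-positive probability."""
    return (1.0 - math.exp(-k * n / m)) ** k


m, n = 8000, 1000  # m/n = 8, as in the slide's example
k_opt = optimal_k(m, n)
print(f"optimal k = {k_opt:.3f}")
for k in (5, k_opt, 6):
    print(f"k = {k:.3f}: false-positive rate = {false_positive_rate(m, n, k):.5f}")
```

Since k must be an integer in practice, one would compare the neighbouring integers (here 5 and 6) and pick whichever gives the lower rate.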

19 Spatial Subdivision

