Spatial Issues in DBGlobe Dieter Pfoser. Location Parameter in Services Entering the harbor (x,y position)… …triggers information request.

Presentation transcript:

Spatial Issues in DBGlobe Dieter Pfoser

Location Parameter in Services
• Entering the harbor (x,y position)…
• …triggers information request

Spatial Data in DBGlobe
• Spatial information might be the predominant type of data for structuring information content
• PMOs contain spatially (and temporally) referenced data
• These data are distributed over a set of devices
• How can we relate all these data to one spatial location: “What have we stored for this location?”
• This introduces space as the organizing criterion for data, i.e., a distinguished context

Spatial Data… (cont’d)
• Each PMO contains a set of positions that reference content
• The job of DBGlobe is now to find this content based on a given positional reference: Position → {PMO (id)} → content
• BUT!
  – Content is referenced by position as the only argument!
  – The question is how to introduce further filters that retrieve only relevant (interesting) content based on additional parameters

Distributed Indexes
• With tree-based structures, a global index needs to be constructed and some portion of the index replicated in the CAS
• Given the set of locations for each PMO, one could compute a signature that the PMO communicates to a CAS (where it is further aggregated)
• This signature can then be used to scan all PMOs for potentially relevant spatial information

Bloom Filters: High Level Idea
• Everyone thinks they need to know exactly what everyone else has: “Give me a list of what you have.”
• Lists are long and unwieldy.
• Using Bloom filters, you can get small, approximate lists: “Give me information so I can figure out what you have.”

Bloom Filter Example
• A Bloom filter: to check an object’s name against a Bloom filter summary, the name is hashed with n different hash functions (here, n=3) and the bits corresponding to the results are checked.
• [Figure: bit vector and hash functions]
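The check described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions not taken from the slides: the class name `BloomFilter`, the 64-bit vector size, and the way the three hash functions are derived (SHA-256 over the name prefixed with the function index) are all hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch; the bit vector is stored in one Python int."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits      # size of the bit vector (assumed value)
        self.num_hashes = num_hashes  # "n" on the slide, here 3
        self.bits = 0

    def _positions(self, name):
        # Derive num_hashes positions by prefixing the name with the
        # function index before hashing (an assumed construction).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        # Set every bit the name hashes to.
        for pos in self._positions(name):
            self.bits |= 1 << pos

    def might_contain(self, name):
        # True if all bits are set: the name is possibly present.
        # False if any bit is clear: the name is definitely absent.
        return all((self.bits >> pos) & 1 for pos in self._positions(name))
```

A name that was added always passes the check; a name that was never added usually fails it, but may pass by coincidence (a false positive).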

Bloom Filter
• Multiple hash functions are used to map values onto the bit vector
• Example: Web proxy cache sharing
  – URLs are hashed using MD5, a cryptographic message digest algorithm that hashes arbitrary-length strings to 128 bits
  – The hash functions are built by
    • first calculating the MD5 signature of a URL (128 bits),
    • dividing the 128 bits into four 32-bit words, and finally
    • taking the modulus of each 32-bit word by the table size
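The three construction steps from the proxy-cache example map directly onto Python's standard library. The function name `md5_bit_positions`, the example URL, and the big-endian interpretation of the four words are assumptions made for this sketch; the slides do not specify a byte order.

```python
import hashlib
import struct


def md5_bit_positions(url, table_size):
    """Return four bit positions for a URL, one per 32-bit word of its MD5 digest."""
    # Step 1: MD5 signature of the URL (16 bytes = 128 bits).
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Step 2: split the 128 bits into four unsigned 32-bit words
    # (big-endian is an assumption of this sketch).
    words = struct.unpack(">4I", digest)
    # Step 3: reduce each word modulo the table size.
    return [word % table_size for word in words]


positions = md5_bit_positions("http://example.org/index.html", 1024)
```

Each of the four positions then plays the role of one hash function's output when setting or checking bits in the filter.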

Spatial Hashing
• Alphanumeric hashing: string → hash value
• Spatial coordinates as string?
  – (Long/Lat) deg. East, deg. North
  – Equal to
  – deg. East, deg. North ???
• If the two coordinate pairs are hashed as strings, their hash values will not match (they will be totally different, given a good hash function such as MD5)
• Spatial data differs from alphanumeric data in that its semantics have to be seen in the context of a reference system
• When matching hash values, a tolerance is needed to test for equality
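A small experiment makes the mismatch concrete. The two coordinate pairs below are made up for illustration (the values on the original slide are not recoverable), and snapping to a fixed 0.01-degree grid cell is just one assumed way to introduce a tolerance.

```python
import hashlib
import math


def string_hash(lon, lat):
    """Hash the coordinates as a plain string: no tolerance at all."""
    return hashlib.md5(f"{lon},{lat}".encode("utf-8")).hexdigest()


def cell_hash(lon, lat, cell_deg=0.01):
    """Hash the grid cell containing the point instead of the point itself."""
    col = math.floor(lon / cell_deg)
    row = math.floor(lat / cell_deg)
    return hashlib.md5(f"{col},{row}".encode("utf-8")).hexdigest()


# Two hypothetical positions a few tens of meters apart.
a = (23.6471, 37.9422)
b = (23.6473, 37.9424)

same_as_strings = string_hash(*a) == string_hash(*b)  # digests differ
same_as_cells = cell_hash(*a) == cell_hash(*b)        # same cell, same digest
```

Points that straddle a cell boundary would still hash differently under this scheme, so the cell size has to be chosen with the intended tolerance in mind.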

Spatial Subdivisions
• Regular subdivisions
• Occupation-based, e.g., adaptive k-d-tree

Spatial Subdivision
• Computing spatial subdivisions of space based on existing data
• [Figure: earthquake data, two panels]

Spatial Hashing
• Linearize the spatial subdivisions using space-filling curves
• Space-filling curves as hash functions
  – Z-ordering (Peano curves)
  – Hilbert curves
  – …
• Example:
  – Hash positions using the above space-filling curves
  – Determine the spatial subdivision the position falls into
  – Compute the respective linearization value for each of the space-filling curves (hash functions)
  – Take the modulus of each value by the size of the bit vector
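The example steps can be sketched for a regular grid and a single curve. Everything named here (`grid_cell`, `z_order`, `spatial_bit_position`, the bounding box, and the grid resolution) is a hypothetical choice for illustration; only Z-ordering is shown, and a Hilbert curve would supply a second, independent hash function in the same way.

```python
def grid_cell(x, y, bounds, cells_per_axis):
    """Return the (col, row) of the regular grid cell containing (x, y)."""
    min_x, min_y, max_x, max_y = bounds
    col = int((x - min_x) / (max_x - min_x) * cells_per_axis)
    row = int((y - min_y) / (max_y - min_y) * cells_per_axis)
    # Clamp points lying exactly on the upper boundary into the last cell.
    return min(col, cells_per_axis - 1), min(row, cells_per_axis - 1)


def z_order(col, row, bits=16):
    """Interleave the bits of col and row (Z-order / Morton code)."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)      # col bits at even positions
        code |= ((row >> i) & 1) << (2 * i + 1)  # row bits at odd positions
    return code


def spatial_bit_position(x, y, bounds, cells_per_axis, vector_size):
    """Cell lookup, linearization, then modulus by the bit-vector size."""
    col, row = grid_cell(x, y, bounds, cells_per_axis)
    return z_order(col, row) % vector_size
```

All positions inside the same cell map to the same bit, which is what provides the tolerance that plain string hashing lacks.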

Overall Scenario
• PMOs containing spatial data communicate signatures to the CAS
• The CAS “ORs” the signatures and keeps track of associations
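One way to picture the CAS side of this scenario is the sketch below. The class name `CasSketch`, its methods, and the use of Python integers as bit-vector signatures are assumptions made for illustration, not part of DBGlobe.

```python
class CasSketch:
    """Hypothetical CAS: ORs PMO signatures and remembers who sent which."""

    def __init__(self):
        self.aggregate = 0    # bitwise OR of all received signatures
        self.signatures = {}  # association: PMO id -> its own signature

    def register(self, pmo_id, signature):
        # Record the association and fold the signature into the aggregate.
        self.signatures[pmo_id] = signature
        self.aggregate |= signature

    def candidates(self, query_bits):
        # If the aggregate lacks any query bit, no PMO can match.
        if (self.aggregate & query_bits) != query_bits:
            return []
        # Otherwise return the PMOs whose own signature covers the query.
        return [
            pmo_id
            for pmo_id, sig in self.signatures.items()
            if (sig & query_bits) == query_bits
        ]
```

The returned PMOs are only candidates: as with any Bloom-filter signature, a match may be a false positive, so the content still has to be checked at the PMO itself.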

Questions
• Types of queries, e.g., range queries vs. “point” queries
• Spatial hash functions by using grids and space-filling curves
• Distinct type of data that deserves special treatment?
• Can stand as a single query parameter? Needs more context?

END

Lookup Problem
• Given a set S = {x_1, x_2, x_3, …, x_n} on a universe U, we want to answer lookup (set membership) queries.
• Example: a set of URLs from the universe of all possible URL strings.
• A Bloom filter provides an answer in
  – “Constant” time (time to hash).
  – Small amount of space.
  – But with some probability of being wrong.

Optimal Choice of Parameters
• Given m bits for the filter and n elements, choose the number k of hash functions
• Find the optimum at k = (ln 2) m/n by calculus
• [Figure annotation: m/n = 8, optimal k = 8 ln 2 ≈ 5.55]
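The calculus step is not spelled out on the slide. A sketch of the standard argument follows, using the usual approximation for the false positive rate f, which assumes the k hash functions behave independently and uniformly; the symbols f and p are introduced here and do not appear in the original.

```latex
% Probability that a given bit is still 0 after inserting n elements
% with k hash functions each:
p = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}

% False positive rate: all k probed bits happen to be 1
f = (1 - p)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}

% Since k = -\frac{m}{n}\ln p, the logarithm of f becomes
\ln f = k \ln(1 - p) = -\frac{m}{n}\,\ln p \,\ln(1 - p)

% which is symmetric in p and 1 - p and minimized at p = 1/2, giving
k = \frac{m}{n}\ln 2, \qquad f = \left(\tfrac{1}{2}\right)^{k} \approx (0.6185)^{m/n}

% For m/n = 8:
k = 8 \ln 2 \approx 5.55, \qquad f \approx (0.6185)^{8} \approx 0.02
```

Since k must be an integer, an implementation would round to 5 or 6 hash functions, at the cost of a slightly higher false positive rate than the continuous optimum.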
