ICDE-2006 Subramanian Arumugam Christopher Jermaine Department of Computer Science University of Florida 22nd International Conference on Data Engineering.

Presentation transcript:

Closest-Point-of-Approach Join for Moving Object Histories
Subramanian Arumugam, Christopher Jermaine
Department of Computer Science, University of Florida
22nd International Conference on Data Engineering (ICDE 2006)

CPA-Join Is Useful for Analysis of Spatiotemporal Data
Relations: commercial airliners R, single-engine planes S.
Example query: "Find all commercial airliners that approached within 1000 vertical feet and 0.5 miles of a single engine plane in the BOS/JFK/EWR/LGA corridor C in the first three months of last year"
    SELECT DISTINCT (r, s)
    FROM R AS r, S AS s, TIME t
    WHERE dist(r, s, t) < 0.5
      AND (r(t).altd - s(t).altd) ≥ -1000
      AND (r(t).altd - s(t).altd) ≤ 1000
      AND s(t) ∈ C AND r(t) ∈ C
      AND t ≥ 'JAN ' AND t ≤ 'MAR '

Challenges
- 3-dimensional space + time
- Large number of objects
- Massive amount of data

CPA Illustration for Straight-Line Trajectories
CPA: the position at which two dynamically moving objects attain their closest possible distance.
[Figure: trajectories of Object p and Object q]

Moving Object Trajectories
[Figure: sampled positions of Object P and Object Q plotted over x, y and time, each joined by a polyline approximation, with the CPA distance (dist_cpa) marked]

Simple CPA-Join Procedure
Find all object pairs (p ∈ P, q ∈ Q) from relations P and Q such that CPA_distance(p, q) ≤ d.
    CPA (Object P, Object Q, distance d)
    1. List result = {};
    2. for each pair of segments (p ∈ P, q ∈ Q)
    3.     if CPA_distance(p, q) ≤ d
    4.         result += (p, q);
    5. return result;
Only segments whose time intervals overlap need to be compared: plane sweep.
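
The procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Segment` class, its field names, and the assumption that an object moves linearly between two samples (with `t1 > t0`) are all introduced here. The CPA distance of two segments is taken over the time interval they share; pairs with no shared interval are skipped.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass(frozen=True)
class Segment:
    """One trajectory piece (hypothetical layout): linear motion from p0 at t0 to p1 at t1."""
    obj: str
    t0: float
    t1: float
    p0: tuple
    p1: tuple

    def velocity(self):
        dt = self.t1 - self.t0  # assumed to be > 0
        return tuple((b - a) / dt for a, b in zip(self.p0, self.p1))

    def position(self, t):
        return tuple(a + v * (t - self.t0) for a, v in zip(self.p0, self.velocity()))

def cpa_distance(p, q):
    """Smallest distance between p and q during their common time interval, or None."""
    lo, hi = max(p.t0, q.t0), min(p.t1, q.t1)
    if lo > hi:
        return None  # the segments never coexist in time
    # Relative position at time lo and relative velocity.
    w = tuple(a - b for a, b in zip(p.position(lo), q.position(lo)))
    dv = tuple(a - b for a, b in zip(p.velocity(), q.velocity()))
    dv2 = sum(c * c for c in dv)
    # Minimise |w + dv * s| over s, then clamp s to the common interval.
    s = 0.0 if dv2 == 0 else -sum(a * b for a, b in zip(w, dv)) / dv2
    s = min(max(s, 0.0), hi - lo)
    return sqrt(sum((a + b * s) ** 2 for a, b in zip(w, dv)))

def simple_cpa_join(P, Q, d):
    """All-pairs join: every segment pair whose CPA distance is at most d."""
    result = []
    for p in P:
        for q in Q:
            dist = cpa_distance(p, q)
            if dist is not None and dist <= d:
                result.append((p, q))
    return result
```

For example, a segment moving along the x axis from (0, 0, 0) to (10, 0, 0) during [0, 10] and a stationary object at (5, 3, 0) over the same interval have a CPA distance of 3, so the pair qualifies for any d of at least 3.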

CPA-Join using Simple Plane Sweep
- First, sort the segments in P and Q along the time dimension (external sort).
- While there is still some unprocessed data:
  - Read in enough segments from P and Q to fill the main-memory buffer.
  - Next, sweep a vertical line along the time dimension.
  - Maintain a sweepline data structure that keeps track of all active segments that intersect the sweep line.
  - As the sweep line progresses, update the sweepline data structure with insertions (new segments that became active) and deletions (segments whose time period has expired).
  - During updates to the sweepline structure, an all-pairs comparison returns valid results.
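
A minimal in-memory sketch of the sweep along time is shown below. It covers only the buffer-resident part: the external sort and buffer refills are left out. The tuple layout `(t_start, t_end, payload)` and the `match` callback, which stands in for the CPA distance test, are assumptions made for this illustration.

```python
def plane_sweep_join(P, Q, match):
    """Join segments of P and Q whose time intervals overlap and that satisfy match.

    P, Q  : iterables of (t_start, t_end, payload) tuples (hypothetical layout).
    match : callable (p, q) -> bool, e.g. a CPA-distance test.
    """
    # One insertion event per segment, ordered by start time (P before Q on ties).
    events = sorted(
        [(seg[0], 0, seg) for seg in P] + [(seg[0], 1, seg) for seg in Q],
        key=lambda e: (e[0], e[1]),
    )
    active = ([], [])  # active[0]: live segments of P, active[1]: live segments of Q
    result = []
    for t, side, seg in events:
        other = 1 - side
        # Deletion: drop segments of the other relation that expired before t.
        active[other][:] = [s for s in active[other] if s[1] >= t]
        # Insertion: compare the new segment with every active segment
        # of the other relation (the all-pairs comparison on update).
        for s in active[other]:
            pair = (seg, s) if side == 0 else (s, seg)
            if match(*pair):
                result.append(pair)
        active[side].append(seg)
    return result
```

Expired segments are purged lazily, just before the list they sit in is scanned, which keeps the sketch short; a priority queue keyed on end time would be the more usual choice for large buffers.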

CPA-Join using Plane Sweep
- The sweep line has to pause at every new sample point encountered.
- Processing a multi-gigabyte data set can take a long time.
[Figure: memory buffer and disk]

Group segments using a bounding box approximation
In the best case, just 1 comparison is needed.
[Figure: memory buffer and disk]

Algorithm: Layered Plane Sweep
While there is still some unprocessed data on disk:
- Read in data from relations P and Q to fill the buffer.
- Construct an MBR for the trajectory of every object in the buffer.
- Sort the MBRs along one of the spatial dimensions and do a plane sweep along it to identify qualifying MBR pairs.
- Expand the qualifying MBRs to obtain the individual segments.
- Sort the segments along the time dimension and do a plane sweep along time to obtain the actual results.
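
The first (MBR) layer of the algorithm above might look like the following sketch. The sample layout `(t, x, y, z)`, the dictionary inputs, and the choice of x as the sweep dimension are assumptions for illustration. Two boxes qualify when the gap between them is at most d in every dimension, which is a necessary condition for the enclosed trajectories to come within distance d of each other; the surviving object pairs would then be expanded into segments and handed to the sweep along time.

```python
def mbr(samples):
    """Minimum bounding rectangle of a trajectory given as (t, x, y, z) samples."""
    coords = [s[1:] for s in samples]
    lo = tuple(min(c[i] for c in coords) for i in range(3))
    hi = tuple(max(c[i] for c in coords) for i in range(3))
    return lo, hi

def boxes_within(a, b, d):
    """True if the gap between boxes a and b is at most d in every dimension."""
    return all(a[0][i] - d <= b[1][i] and b[0][i] - d <= a[1][i] for i in range(3))

def mbr_filter(P, Q, d):
    """Sweep along x and return candidate (p_id, q_id) pairs.

    P, Q : dicts mapping an object id to its list of samples (hypothetical layout).
    """
    boxes = [(mbr(s), 0, oid) for oid, s in P.items()]
    boxes += [(mbr(s), 1, oid) for oid, s in Q.items()]
    boxes.sort(key=lambda b: b[0][0][0])  # order by x_min
    active = ([], [])
    candidates = []
    for box, side, oid in boxes:
        other = 1 - side
        # Drop boxes whose x extent, grown by d, no longer reaches the sweep line.
        active[other][:] = [(b, o) for b, o in active[other] if b[1][0] + d >= box[0][0]]
        for b, o in active[other]:
            if boxes_within(box, b, d):
                candidates.append((oid, o) if side == 0 else (o, oid))
        active[side].append((box, oid))
    return candidates
```

Note that this filter ignores time entirely, so it can only prune; every surviving pair still needs the segment-level test.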

Layered Plane-Sweep Example
But one size doesn’t fit all!

A Note on Indexing
- Indexes can be used to do CPA-Join
- But (almost) all indexes use MBR approximation
- And MBRs impose predefined granularities
[Figure: trajectories p and q in x, y, z space]

Layered Plane Sweep: what is the problem?
- Layered Plane Sweep always processes the entire fraction of data held in the memory buffers.
- When objects interact heavily, such an approach may lead to no pruning at all.
- In the best case, just one comparison is needed.
- Though less of the buffer is processed initially, overall efficiency can be better.
- Efficiency of the layered technique is not tied to the amount of data processed, but to choosing a granularity that minimizes the number of distance computations.

Cost to Process Data in Memory Buffer
Cost can be approximated as a function of the number of distance computations (which dominate execution time):
    cost = (n_seg + n_MBR)
where n_seg is the number of segment-level comparisons and n_MBR is the number of bounding-box comparisons.
In general, the cost for a fraction α (0 ≤ α ≤ 1) of the buffer is
    cost_α = (n_seg + n_MBR) * (1/α)
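
A small worked example may help with the cost formula. One reading of the 1/α factor is that it scales the work done on a fraction up to a whole buffer, so that different fractions can be compared on equal terms. The comparison counts below are made-up numbers chosen only to illustrate the arithmetic, and the code requires α > 0 to avoid dividing by zero.

```python
def buffer_cost(n_seg, n_mbr, alpha=1.0):
    """cost_alpha = (n_seg + n_MBR) * (1 / alpha), for a buffer fraction 0 < alpha <= 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return (n_seg + n_mbr) / alpha

# Hypothetical counts: heavy interaction makes the full buffer expensive,
# while a quarter of the buffer prunes far better.
full = buffer_cost(n_seg=900_000, n_mbr=10_000, alpha=1.0)    # 910000.0
quarter = buffer_cost(n_seg=40_000, n_mbr=2_000, alpha=0.25)  # 168000.0
```

With these numbers the quarter-buffer granularity would be preferred, even though four such passes are needed to cover the same data.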

What we have
- Layered Plane Sweep processes a large fraction (α is large): good when there is light interaction, bad when there is heavy interaction.
- Simple Plane Sweep processes a tiny fraction (α is small): good when there is heavy interaction, bad when there is light interaction.
What we want
- An adaptive algorithm that processes a fraction that maximizes performance (α varies), tunes to the characteristics of the underlying data, and provides superior performance under all scenarios.

Algorithm: Adaptive Plane Sweep
While there is still some unprocessed data on disk:
- Read in data from relations P and Q to fill the buffer.
- Choose a fraction α of the data that maximizes performance.
- Process the chosen fraction of data using Layered Plane Sweep.

Goal: choose a fraction α of the data that maximizes performance
"Evaluate increasing buffer fractions from 0 to 1 and choose the fraction with the minimum cost"
- How many fractions should we consider?
- How to estimate the cost for a given fraction α?

How to estimate cost_α for a given fraction α
- Exact cost is known only after the fact! To know the cost associated with a given α, we need to actually execute the join (layered plane sweep) at that granularity.
- Estimate the cost using a simple online sampling algorithm [HH97].

How to estimate cost_α for a given fraction α (contd.)
Cost estimation through sampling
Given: relations P and Q, and α
- Consider only the segments within α.
- Construct MBRs for the objects in P.
- Until the estimate of cost_α is accurate to within +/- 10%:
  - Pick an object q1 at random from Q and construct an MBR for its trajectory.
  - Join q1 with all objects in P.
  - Compute n_MBR,q1 and n_seg,q1.
  - Estimate cost_α.
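
The sampling loop above could be sketched as follows. The slide does not say how "accurate to within +/- 10%" is decided, so the stopping rule here (a normal-approximation confidence interval on the mean number of comparisons per sampled object) is an assumption, as are the function and parameter names. `comparisons_for(q)` is a hypothetical callback that joins one object q with all of P and returns n_MBR,q + n_seg,q.

```python
import random
from math import sqrt

def estimate_cost(q_objects, comparisons_for, alpha,
                  rel_err=0.10, z=1.96, min_samples=5, rng=None):
    """Estimate cost_alpha by joining randomly chosen objects of Q with all of P."""
    rng = rng or random.Random()
    order = list(q_objects)
    rng.shuffle(order)  # sample without replacement
    n, mean, m2 = 0, 0.0, 0.0
    for q in order:
        x = comparisons_for(q)
        # Welford's running mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_samples and mean > 0:
            half_width = z * sqrt(m2 / (n - 1) / n)
            if half_width <= rel_err * mean:
                break  # estimate is within the requested relative error
    # Scale the per-object mean to all of Q, then apply the 1/alpha factor.
    return mean * len(order) / alpha
```

If the sample never stabilises, the loop simply visits every object of Q, at which point the estimate equals the exact count for that fraction.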

How many fractions to consider?
"Evaluate increasing buffer fractions from 0 to 1 and choose the fraction with the minimum cost"
- Computing the cost for all α is not practical: it will offset any benefit that we gain from the adaptive technique.
- We need a strategy to limit the number of fractions that we process.

How many fractions to consider?
- The α vs. cost graph is not linear; it exhibits convexity.
- The convex region represents the candidate region with the minimum cost.
- We can get away with evaluating the cost for a small number k of values of α.
[Figure: cost (millions) plotted against the fraction considered]

How to choose the k fractions?
k = 10; t_start = 32; t_end = 53
    r = t_end - t_start
    α_1 = r^(1/k) / r
    α_i = (r · α_1)^i / r

    Fraction       Time range   Cost
    α_1  = 0.11    [ ]          90
    α_2  = 0.14    [ ]          71
    α_3  = 0.18    [ ]          52
    α_4  = 0.23    [ ]          37
    α_5  = 0.30    [ ]          31
    α_6  = 0.38    [ ]          35
    α_7  = 0.48    [ ]          41
    α_8  = 0.61    [ ]          52
    α_9  = 0.78    [ ]          59
    α_10 = 1.0     [ ]          71
Annotation on the table: acceptable candidates.
The fraction chosen can be fine-tuned through recursive calls.
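
The fraction schedule can be computed directly from the two formulas on this slide. Because r · α_1 = r^(1/k), the schedule is geometric: each fraction is r^(1/k) times the previous one and the k-th fraction is 1. The sketch below assumes r > 1; the function name is made up for illustration, while the fraction and cost lists are copied from the slide's table to show the final selection step.

```python
def candidate_fractions(t_start, t_end, k):
    """alpha_i = (r * alpha_1)**i / r with alpha_1 = r**(1/k) / r, for i = 1..k (r > 1 assumed)."""
    r = t_end - t_start
    alpha_1 = r ** (1.0 / k) / r
    return [(r * alpha_1) ** i / r for i in range(1, k + 1)]

# Fractions and estimated costs as listed in the slide's table.
slide_fractions = [0.11, 0.14, 0.18, 0.23, 0.30, 0.38, 0.48, 0.61, 0.78, 1.0]
slide_costs = [90, 71, 52, 37, 31, 35, 41, 52, 59, 71]

# Choose the fraction with the minimum estimated cost.
best_fraction = slide_fractions[slide_costs.index(min(slide_costs))]
```

For the table's costs the minimum is 31, reached at the fraction 0.30. Spacing the candidates geometrically puts more of them at small fractions, where a fixed step would skip over most of the range of interest.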

Putting it all together
Input: relations R, S; distance d; parameter k
- Fill buffer: read from relations R and S.
- Optimizer: evaluate k fractions, choose the best.
- Layered Plane Sweep: process the join on the best fraction.
- More data? If so, fill the buffer again.
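
The control flow of this slide can be sketched as a simple driver loop. This is a simplified model under several assumptions: the two relations are represented as one time-sorted list, the buffer holds a fixed number of segments, a fraction α selects the earliest part of the buffer, and `estimate_cost` and `layered_sweep` are hypothetical callbacks standing in for the optimizer's cost estimate and the join itself.

```python
def adaptive_plane_sweep(segments, buffer_size, fractions, estimate_cost, layered_sweep):
    """Repeatedly fill the buffer, pick the cheapest fraction, and join that fraction.

    segments      : time-sorted list of input segments (simplified single stream).
    fractions     : candidate alphas in (0, 1].
    estimate_cost : callable (chunk, alpha) -> estimated cost of joining chunk.
    layered_sweep : callable (chunk) -> list of join results for chunk.
    """
    results, pos = [], 0
    while pos < len(segments):                    # "More data?"
        buffer = segments[pos:pos + buffer_size]  # fill buffer

        def prefix(alpha):
            # Earliest alpha-fraction of the buffer; always at least one segment.
            return buffer[:max(1, int(alpha * len(buffer)))]

        # Optimizer: evaluate the candidate fractions and keep the cheapest.
        best = min(fractions, key=lambda a: estimate_cost(prefix(a), a))
        chunk = prefix(best)
        results.extend(layered_sweep(chunk))      # process join on best fraction
        pos += len(chunk)                         # the rest is read again next round
    return results
```

Taking at least one segment per round guarantees progress even when a very small fraction is chosen.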

Benchmarking
- Code: implemented and tested the various alternatives in C/C++ (R-Trees, Simple Sweep, Layered Sweep, Adaptive Sweep with various parameter settings)
- Workload: 2 relations, 100,000 objects (50 GB); physics-based simulation data set and synthetic data set
- Hardware: Linux, 2.4 GHz Pentium Xeon, 1 GB main memory, 2 IDE drives (15,000 rpm)
- Setup: 64 KB page size, buffer size 10,000 pages

Collision Data Set
100,000 objects; collision occurs during time range [ ]
[Figure: snapshot at time tick 1500]

Results: Execution Time for Different Strategies
[Figure: execution time (seconds) vs. % of join completed for R-tree, simple sweep, layered sweep, and adaptive sweep (K=20, K=10, K=5)]

Buffer Choices Made by the Optimizer
[Figure: fraction of buffer chosen vs. virtual time line in the data set]

Discussion
- R-trees couldn’t do enough pruning to make a difference
- Simple plane-sweep works well when there is heavy interaction among objects
- Layered plane-sweep works well when there is light interaction
- Adaptive version transitions smoothly between these extremes
- Recursive call to fine-tune candidate region doesn’t seem to help much

Conclusion
- CPA-Join for spatiotemporal relations
- Proposed a novel adaptive join algorithm for moving object histories, based on an extension of the plane sweep
- Many practical applications

Questions?
Thank You!
Subramanian