STAR-Tree Spatio-Temporal Self Adjusting R-Tree John Tran Duke University Department of Computer Science Adviser: Pankaj K. Agarwal
Problem Large Moving Data Sets Many static data structures exist, but few account for motion, which real data sets exhibit
Examples of Use Geographic Information Systems Air-Traffic Control Protein Interactions Traffic Patterns
Defining the data Can represent data as points in $\mathbb{R}^d$ For our problem: Set of data points in $\mathbb{R}^2$: $S = \{p_1, p_2, \ldots, p_n\}$ Can parameterize points as $p_i = (x_i(t), y_i(t))$ Piecewise differentiable velocities Bounding boxes can be represented by 2 points
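A minimal sketch of this representation, assuming each point moves with constant velocity (a single linear piece of a piecewise trajectory). The names `MovingPoint`, `at`, and `bounding_box` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovingPoint:
    """One linear piece of a trajectory: p(t) = (x0 + vx*t, y0 + vy*t).

    Constant velocity is an assumption made for this sketch.
    """
    x0: float
    y0: float
    vx: float
    vy: float

    def at(self, t):
        """Position of the point at time t."""
        return (self.x0 + self.vx * t, self.y0 + self.vy * t)


def bounding_box(points, t):
    """Bounding box at time t, represented by 2 points:
    ((xmin, ymin), (xmax, ymax))."""
    coords = [p.at(t) for p in points]
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return (min(xs), min(ys)), (max(xs), max(ys))
```

For example, `bounding_box([MovingPoint(0, 0, 1, 0), MovingPoint(2, 3, -1, 1)], 2)` gives the lower-left and upper-right corners of the box enclosing both points at time 2.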
Queries Query 1 – Report all points of S that lie inside rectangle R at time t
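Query 1 can be answered without any index by a linear scan, which is a useful baseline for checking a tree-based answer. The sketch below assumes each point is a tuple `(x0, y0, vx, vy)` with constant velocity and that the rectangle is given by its two corner points; both conventions are assumptions of this example.

```python
def inside(rect, x, y):
    """True if (x, y) lies in the closed rectangle ((xmin, ymin), (xmax, ymax))."""
    (xmin, ymin), (xmax, ymax) = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def timeslice_query(points, rect, t):
    """Brute-force Query 1: indices of the points inside rect at time t.

    points is a list of (x0, y0, vx, vy) tuples (linear motion assumed).
    """
    hits = []
    for i, (x0, y0, vx, vy) in enumerate(points):
        if inside(rect, x0 + vx * t, y0 + vy * t):
            hits.append(i)
    return hits
```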
Queries Query 2 – Report all points of S that lie inside rectangle R at any time between $t_1$ and $t_2$
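For linearly moving points, Query 2 reduces to interval arithmetic: along each axis the point is inside the rectangle during one time interval, and the point is reported if the x-interval, the y-interval, and the query window share a common time. The sketch below is a brute-force version under the assumption of constant velocity over the whole window; the function names are hypothetical.

```python
import math


def axis_interval(c0, v, lo, hi):
    """Time interval during which lo <= c0 + v*t <= hi, or None if never."""
    if v == 0:
        # A stationary coordinate is either always or never in range.
        return (-math.inf, math.inf) if lo <= c0 <= hi else None
    ta, tb = (lo - c0) / v, (hi - c0) / v
    return (min(ta, tb), max(ta, tb))


def window_query(points, rect, t1, t2):
    """Brute-force Query 2: indices of points inside rect at some t in [t1, t2].

    points is a list of (x0, y0, vx, vy) tuples (linear motion assumed).
    """
    (xmin, ymin), (xmax, ymax) = rect
    hits = []
    for i, (x0, y0, vx, vy) in enumerate(points):
        ix = axis_interval(x0, vx, xmin, xmax)
        iy = axis_interval(y0, vy, ymin, ymax)
        if ix is None or iy is None:
            continue
        # Intersect the two axis intervals with the query window.
        start = max(t1, ix[0], iy[0])
        end = min(t2, ix[1], iy[1])
        if start <= end:
            hits.append(i)
    return hits
```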
Queries Query 3 – Report the nearest neighbor in S of a given query point
R-Tree Bounding Box Hierarchy All child nodes are bounded by their parent's bounding box Points are stored in leaf nodes
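The hierarchy can be illustrated with a small static sketch. It packs sorted points into leaves bottom-up, which is a simplification chosen for brevity and not the standard R-tree insertion algorithm; `Node`, `build`, and `search` are hypothetical names, and a non-empty point set is assumed.

```python
class Node:
    """A tree node whose box bounds everything stored beneath it."""

    def __init__(self, children=None, points=None):
        self.children = children or []   # internal node: child nodes
        self.points = points or []       # leaf node: the actual points
        boxes = [c.box for c in self.children] + [(p, p) for p in self.points]
        self.box = (
            (min(b[0][0] for b in boxes), min(b[0][1] for b in boxes)),
            (max(b[1][0] for b in boxes), max(b[1][1] for b in boxes)),
        )


def intersects(a, b):
    """True if two boxes ((xmin, ymin), (xmax, ymax)) overlap."""
    return (a[0][0] <= b[1][0] and b[0][0] <= a[1][0] and
            a[0][1] <= b[1][1] and b[0][1] <= a[1][1])


def build(points, capacity=2):
    """Pack sorted points into leaves, then group nodes level by level."""
    pts = sorted(points)
    level = [Node(points=pts[i:i + capacity])
             for i in range(0, len(pts), capacity)]
    while len(level) > 1:
        level = [Node(children=level[i:i + capacity])
                 for i in range(0, len(level), capacity)]
    return level[0]


def search(node, rect):
    """Report points inside rect, skipping subtrees whose box misses rect."""
    if not intersects(node.box, rect):
        return []
    (xmin, ymin), (xmax, ymax) = rect
    found = [p for p in node.points
             if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    for child in node.children:
        found.extend(search(child, rect))
    return found
```

The pruning step in `search` is what the bounding box hierarchy buys: a subtree is visited only if its box intersects the query rectangle.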
STAR-Tree Same concept as R-Tree Incorporate movement into tree structure
Conflicts As bounding boxes change, overlap occurs Need to adjust for these overlap conflicts
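One simple way to see such a conflict arise is to recompute two sibling bounding boxes at a series of times and report the first time they overlap. This is only an illustrative sketch: it samples discrete times instead of computing the exact event time, it assumes linearly moving points given as `(x0, y0, vx, vy)` tuples, and it says nothing about how the STAR-Tree itself resolves the conflict.

```python
def box_at(group, t):
    """Bounding box of a group of (x0, y0, vx, vy) points at time t."""
    xs = [x0 + vx * t for x0, _, vx, _ in group]
    ys = [y0 + vy * t for _, y0, _, vy in group]
    return (min(xs), min(ys)), (max(xs), max(ys))


def overlap(a, b):
    """True if two boxes ((xmin, ymin), (xmax, ymax)) overlap."""
    return (a[0][0] <= b[1][0] and b[0][0] <= a[1][0] and
            a[0][1] <= b[1][1] and b[0][1] <= a[1][1])


def first_conflict(group_a, group_b, times):
    """First sampled time at which the two groups' boxes overlap, else None."""
    for t in times:
        if overlap(box_at(group_a, t), box_at(group_b, t)):
            return t
    return None
```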
QT Implementation
OpenGL Implementation
Road Simplification Road data from US Bureau of Census (TIGER) Paths are determined using Dijkstra’s Shortest Path Algorithm Shapes of these paths are typically simple but include many vertices Simplify path using Douglas-Peucker heuristic (5 vertices max)
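The simplification step can be sketched as a vertex-budget variant of Douglas-Peucker: keep the two endpoints, then repeatedly add the vertex farthest from the current simplified chain until the budget is used up. How the 5-vertex cap was actually enforced is not stated on the slide, so the budgeted loop below is an assumption, and the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(path, max_vertices=5):
    """Keep at most max_vertices vertices, always including both endpoints."""
    if len(path) <= 2:
        return list(path)
    keep = [0, len(path) - 1]            # indices of retained vertices
    while len(keep) < max_vertices:
        best_dist, best_idx = 0.0, None
        for lo, hi in zip(keep, keep[1:]):
            for i in range(lo + 1, hi):
                d = point_segment_distance(path[i], path[lo], path[hi])
                if d > best_dist:
                    best_dist, best_idx = d, i
        if best_idx is None:             # every remaining vertex lies on the chain
            break
        keep.append(best_idx)
        keep.sort()
    return [path[i] for i in keep]
```

A straight path collapses to its two endpoints, while a path with a few pronounced bends keeps the bends and drops the near-collinear vertices in between.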
Road Simplification Simplify road network TIGER data is not perfect Polygonal chain with vertex lists Sometimes does not match roads that should be matched
Analysis of RDU Roads [Chart: number of vertices with n streets, plotted against n streets]
Analysis of RDU Roads [Chart: number of streets with n vertices, plotted against n vertices]
Road Simplification
Protein Shape Matching
Problem Match two proteins based on similarity or dissimilarity using intramolecular distance comparison
Data Start from PDB files Parse to get vertex list
Calculating Distance Matrix Given a vertex list
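A minimal sketch of the distance matrix computation, assuming the vertex list is a sequence of 3D coordinate tuples already parsed from a PDB file (the parsing itself is not shown). The function name `distance_matrix` is hypothetical.

```python
import math


def distance_matrix(vertices):
    """Symmetric matrix of pairwise Euclidean (intramolecular) distances."""
    n = len(vertices)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vertices[i], vertices[j])
            dist[i][j] = dist[j][i] = d   # fill both halves at once
    return dist
```

Only the upper triangle is computed, since the distance from vertex i to vertex j equals the distance from j to i.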
Defining cost
    -GCTGATACTAGCT
     | ||||  |||||
    GGGTGAT-GTAGCT
Let $g(k) = \alpha + \beta(k-1)$ be the cost of an indel gap of length $k$, where $\alpha$ is the cost of starting a new indel gap and $\beta$ is the cost of continuing a gap
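Cost Function
$E(i,j) = \min\{D(i,j-1) + \alpha,\ E(i,j-1) + \beta\}$
$F(i,j) = \min\{D(i-1,j) + \alpha,\ F(i-1,j) + \beta\}$
$D(i,j) = \min\{D(i-1,j-1) + \delta(i,j),\ E(i,j),\ F(i,j)\}$
where $\alpha$ and $\beta$ are the costs of starting and continuing a gap, and $\delta(i,j)$ is the normalized sum of the differences between the distances from $A_i$ to all the matched vertices and the distances from $B_j$ to the corresponding matched vertices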
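The recurrence can be sketched as a standard affine-gap dynamic program. Two simplifications are assumptions of this example and not taken from the slides: the match cost `delta` is a caller-supplied function of the two elements alone (the slide's cost also depends on the vertices matched so far), and the boundary rows and columns are initialized to the cost of a single leading gap. The names `affine_gap_cost`, `alpha`, `beta`, and `delta` are chosen for illustration.

```python
import math


def affine_gap_cost(a, b, delta, alpha, beta):
    """Minimum alignment cost of sequences a and b with affine gap costs.

    alpha: cost of starting a gap; beta: cost of continuing a gap;
    delta(x, y): cost of matching element x of a with element y of b.
    """
    n, m = len(a), len(b)
    inf = math.inf
    D = [[inf] * (m + 1) for _ in range(n + 1)]   # best cost overall
    E = [[inf] * (m + 1) for _ in range(n + 1)]   # best cost ending in a gap in a
    F = [[inf] * (m + 1) for _ in range(n + 1)]   # best cost ending in a gap in b
    D[0][0] = 0.0
    for j in range(1, m + 1):                     # assumed boundary: one leading gap
        D[0][j] = alpha + beta * (j - 1)
    for i in range(1, n + 1):
        D[i][0] = alpha + beta * (i - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = min(D[i][j - 1] + alpha, E[i][j - 1] + beta)
            F[i][j] = min(D[i - 1][j] + alpha, F[i - 1][j] + beta)
            D[i][j] = min(D[i - 1][j - 1] + delta(a[i - 1], b[j - 1]),
                          E[i][j], F[i][j])
    return D[n][m]


def mismatch(x, y):
    """Toy match cost for strings: 0 if equal, 1 otherwise."""
    return 0 if x == y else 1
```

For instance, `affine_gap_cost("GCTGATACTAGCT", "GGGTGATGTAGCT", mismatch, 2, 1)` scores the two strings under a toy mismatch cost; for protein matching, `delta` would instead compare rows of the two distance matrices.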
Comparing identical Proteins
Test Cases