Published by Andrew Dawn. Modified over 9 years ago.
Finding the Sites with Best Accessibilities to Amenities
Qianlu Lin, Chuan Xiao, Muhammad Aamir Cheema and Wei Wang
University of New South Wales, Australia
Application
Find an apartment that is closest to a restaurant, a bus stop and a zoo.
"Closeness" is measured by a monotonic scoring function.
(Figure: map legend Apartment, Restaurant, Bus Stop, Zoo)
Problem Definition
Given a set of query points $S = \{s_1, s_2, \dots, s_m\}$ and $n$ sets of data points $T_1, T_2, \dots, T_n$, find the $k$ query points in $S$ whose aggregated distances to $T_1, T_2, \dots, T_n$ are smallest:
$$\mathrm{Distance}(s_j, \{T_1, T_2, \dots, T_n\}) = f\big(d(s_j, NN(s_j, T_1)),\ d(s_j, NN(s_j, T_2)),\ \dots,\ d(s_j, NN(s_j, T_n))\big)$$
where $NN(s_j, T_i)$ is the nearest neighbour of $s_j$ in $T_i$, and $d(s_j, NN(s_j, T_i))$ is the distance from $s_j$ to its nearest neighbour in $T_i$.
For simplicity, we use:
- $d(x, y)$: Euclidean distance
- $f(x_1, x_2, \dots, x_n) = x_1 + x_2 + \dots + x_n$
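To make the definition concrete, here is a brute-force sketch that scores every candidate site with $f$ = sum and Euclidean $d$, then keeps the $k$ smallest scores. It is only a reference for what the query returns, not one of the proposed algorithms; the names `nn_distance`, `aggregated_distance` and `top_k_sites` and the example coordinates are hypothetical.

```python
import heapq
import math

Point = tuple[float, float]

def nn_distance(s: Point, points: list[Point]) -> float:
    """d(s, NN(s, T)): Euclidean distance from s to its nearest neighbour in T."""
    return min(math.dist(s, t) for t in points)

def aggregated_distance(s: Point, types: list[list[Point]]) -> float:
    """Distance(s, {T_1, ..., T_n}) with f = sum of the per-type NN distances."""
    return sum(nn_distance(s, t) for t in types)

def top_k_sites(sites: list[Point], types: list[list[Point]], k: int) -> list[tuple[float, Point]]:
    """Score every site, then keep the k pairs (score, site) with the smallest score."""
    return heapq.nsmallest(k, ((aggregated_distance(s, types), s) for s in sites))

# Hypothetical example: three candidate apartments and three amenity types.
sites = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
restaurants = [(1.0, 0.0), (9.0, 0.0)]
bus_stops = [(0.0, 2.0), (10.0, 3.0)]
zoos = [(5.0, 8.0)]
best = top_k_sites(sites, [restaurants, bus_stops, zoos], k=2)
```

Here (0, 0) scores 1 + 2 + √89 and ranks first, followed by (10, 0) at 1 + 3 + √89.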
Related Literature
- KNN (K Nearest Neighbour): given a query point q and a set of data points I, find the k data points in I that are nearest to q.
- RNN (Reverse Nearest Neighbour): given a query point q and a set of data points I, find the data points in I of which q is the nearest neighbour.
- ANN (All Nearest Neighbour): given a set of query points Q and a set of data points I, find the nearest neighbour in I of each query point in Q (Y. Chen, ICDE 2007, "Efficient evaluation of all-nearest-neighbor queries").
To solve our problem, we can run an ANN query for each type of data point and then find the top-k query points.
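The three query types can be told apart with brute-force sketches. These are illustrative only: real systems answer them with spatial indexes, and the RNN sketch assumes the monochromatic variant (the candidates for p's nearest neighbour are q and the other points of I), which the slide does not specify. Function names and data are hypothetical.

```python
import math

Point = tuple[float, float]

def knn(q: Point, data: list[Point], k: int) -> list[Point]:
    """KNN: the k data points closest to q."""
    return sorted(data, key=lambda p: math.dist(q, p))[:k]

def rnn(q: Point, data: list[Point]) -> list[Point]:
    """RNN (monochromatic): data points p whose nearest neighbour, among q and
    the other data points, is q. Ties are resolved in favour of q."""
    result = []
    for i, p in enumerate(data):
        others = (math.dist(p, o) for j, o in enumerate(data) if j != i)
        if math.dist(p, q) <= min(others, default=math.inf):
            result.append(p)
    return result

def ann(queries: list[Point], data: list[Point]) -> list[Point]:
    """ANN: the nearest neighbour in data of every query point, in query order."""
    return [min(data, key=lambda p: math.dist(q, p)) for q in queries]

data = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(knn((3.5, 0.0), data, 2))              # [(5.0, 0.0), (1.0, 0.0)]
print(rnn((3.5, 0.0), data))                 # [(5.0, 0.0)]
print(ann([(3.5, 0.0), (0.2, 0.0)], data))   # [(5.0, 0.0), (0.0, 0.0)]
```

Note that (1.0, 0.0) is among the 2 nearest neighbours of q, yet it is not a reverse nearest neighbour of q, because (0.0, 0.0) is closer to it than q is.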
Our Contribution
- We introduced the problem of finding the sites with the best accessibility to amenities.
- We proposed two algorithms to find the top-k accessible sites among a set of candidate locations.
- We performed experiments on several real datasets.
Baseline
(Figure: map legend Apartment, Restaurant, Bus Stop, Zoo)
An ANN query is used to retrieve the nearest neighbour of each query point for each type.
Baseline: Disadvantages
- I/O time: the query data is accessed n times, where n is the number of types of indexed objects.
- Memory usage: the NN must be found for every query point, and a list of the nearest neighbours of each type must be kept for every query point.
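A minimal sketch of the baseline's structure, assuming a brute-force stand-in for the ANN step (the slides do not say which ANN algorithm the baseline uses). It makes both disadvantages visible: the query set is scanned once per type, and every query point's per-type NN distances are held in memory until all passes finish. The names `all_nearest_neighbours` and `baseline_top_k` are hypothetical.

```python
import heapq
import math

Point = tuple[float, float]

def all_nearest_neighbours(queries: list[Point], data: list[Point]) -> list[float]:
    """One ANN pass: the NN distance in `data` for every query point (brute force)."""
    return [min(math.dist(q, p) for p in data) for q in queries]

def baseline_top_k(queries: list[Point], types: list[list[Point]], k: int) -> list[tuple[float, Point]]:
    # One ANN pass per type: the query set is scanned n = len(types) times.
    per_type = [all_nearest_neighbours(queries, data) for data in types]
    # All n NN distances of every query point stay in memory until the passes end;
    # zip(*per_type) regroups them per query point, and f = sum aggregates them.
    scores = [sum(ds) for ds in zip(*per_type)]
    return heapq.nsmallest(k, zip(scores, queries))

types = [[(1.0, 0.0), (9.0, 0.0)], [(0.0, 2.0), (10.0, 3.0)], [(5.0, 8.0)]]
result = baseline_top_k([(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)], types, k=1)
```

With these hypothetical points, `result` holds the single site (0.0, 0.0) with score 3 + √89.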
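Separate Tree (Index Construction)
The query points (apartments) are indexed in a Query Tree, and each type of data point (restaurants, bus stops, zoos) is indexed in its own R-tree.
(Figure: map legend Apartment, Restaurant, Bus Stop, Zoo; Query Tree with entries Q1 to Q4; index trees with nodes Z1, R1 to R4 and B1 to B4)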
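The slides do not say how the separate trees are built. As one possible way, this sketch bulk-loads the leaf level of each per-type index with Sort-Tile-Recursive (STR) packing, a standard R-tree bulk-loading technique swapped in here as an assumption. Only the leaf level is shown; upper levels would be produced by packing the leaf MBRs the same way, and the Query Tree could be built likewise. `str_leaves` and `build_separate_indexes` are hypothetical names.

```python
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mbr(points: list[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def str_leaves(points: list[Point], capacity: int) -> list[tuple[Rect, list[Point]]]:
    """Pack points into leaves of at most `capacity` points with STR (leaf level only)."""
    if not points:
        return []
    n_leaves = math.ceil(len(points) / capacity)
    n_strips = math.ceil(math.sqrt(n_leaves))
    strip_size = n_strips * capacity
    by_x = sorted(points)  # sort by x (then y) to cut vertical strips
    leaves = []
    for i in range(0, len(by_x), strip_size):
        strip = sorted(by_x[i:i + strip_size], key=lambda p: p[1])  # sort strip by y
        for j in range(0, len(strip), capacity):
            chunk = strip[j:j + capacity]
            leaves.append((mbr(chunk), chunk))
    return leaves

def build_separate_indexes(types: dict[str, list[Point]], capacity: int) -> dict[str, list[tuple[Rect, list[Point]]]]:
    """Separate Tree idea: one independent index per type of data point."""
    return {name: str_leaves(pts, capacity) for name, pts in types.items()}

idx = build_separate_indexes(
    {"restaurant": [(6.0, 6.0), (0.0, 0.0), (5.0, 5.0), (1.0, 1.0)], "zoo": [(3.0, 3.0)]},
    capacity=2,
)
```

Because every type has its own index, a node never mixes types; the trade-off is that a query must traverse n separate trees.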
Separate Tree (Query Processing)
(Figure: map legend Apartment, Restaurant, Bus Stop, Zoo; query entries Q1 to Q4; index nodes Z1, R1, B1)
Q1 with [Z1, R1, B1]: MAXD = {30, 305, 309}, MIND = {30, 0, 0}, LBD = 30, UBD = 644
current_k_best = 644
- MAXD: maximum distance from Q1 to the nodes in the list, one value per type
- MIND: minimum distance from Q1 to the nodes in the list, one value per type
- UBD: upper bound of the summed distance (sum of MAXD: 30 + 305 + 309 = 644)
- LBD: lower bound of the summed distance (sum of MIND: 30 + 0 + 0 = 30)
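A sketch of how MIND, MAXD, LBD and UBD could be computed for rectangles (a point is a degenerate rectangle). The per-type rule is an assumed reading of the slide: when a type has several nodes in the list, its lower bound is the smallest MIND and its upper bound is the smallest MAXD, since the nearest neighbour lies within MAXD of every node that contains a point of that type. The function names are hypothetical.

```python
import math

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mind(a: Rect, b: Rect) -> float:
    """Smallest Euclidean distance between any point of a and any point of b."""
    dx = max(0.0, b[0] - a[2], a[0] - b[2])
    dy = max(0.0, b[1] - a[3], a[1] - b[3])
    return math.hypot(dx, dy)

def maxd(a: Rect, b: Rect) -> float:
    """Largest Euclidean distance between any point of a and any point of b."""
    dx = max(abs(a[2] - b[0]), abs(b[2] - a[0]))
    dy = max(abs(a[3] - b[1]), abs(b[3] - a[1]))
    return math.hypot(dx, dy)

def per_type_bounds(q: Rect, lists: list[list[Rect]]) -> tuple[list[float], list[float]]:
    """MIND and MAXD per type; each type's node list must be non-empty."""
    lo = [min(mind(q, n) for n in nodes) for nodes in lists]
    hi = [min(maxd(q, n) for n in nodes) for nodes in lists]
    return lo, hi

def summed_bounds(lo: list[float], hi: list[float]) -> tuple[float, float]:
    """(LBD, UBD) when f = sum."""
    return sum(lo), sum(hi)

print(summed_bounds([30, 0, 0], [30, 305, 309]))  # (30, 644), as for Q1 on the slide
```

Because f is monotonic, summing per-type lower (upper) bounds gives a lower (upper) bound on the aggregated distance of every query point inside the query entry.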
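Separate Tree (cont’d)
(Figure: map legend Apartment, Restaurant, Bus Stop, Zoo; Query Tree entries Q1 to Q4; index nodes Z1, R1 to R4, B1 to B4)
Q3 with [Z1, R4, B2]: MAXD = {30, 100, 60}, MIND = {30, 0, 0}, LBD = 30, UBD = 190
Q4 with [Z1, R4, B3, B4]: MAXD = {300, 150, 60}, MIND = {300, 60, 30}, LBD = 360, UBD = 510
current_k_best = 190
Q4's LBD (360) exceeds current_k_best (190), so Q4 cannot be among the top-k sites and is pruned.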
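The pruning step can be sketched as follows: `current_k_best` is taken to be the k-th smallest UBD seen so far, and any entry whose LBD exceeds it is discarded. This assumes each recorded UBD belongs to a different, disjoint query entry (a parent's UBD would have to be withdrawn once the parent is expanded into children). `KBestBound` and `prune` are hypothetical names; the input values are the ones from the slides.

```python
import heapq
import math

class KBestBound:
    """Tracks current_k_best: the k-th smallest UBD offered so far (inf until k are seen)."""

    def __init__(self, k: int):
        self.k = k
        self._heap: list[float] = []  # the k smallest UBDs, negated to form a max-heap

    def offer(self, ubd: float) -> None:
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -ubd)
        elif ubd < -self._heap[0]:
            heapq.heapreplace(self._heap, -ubd)  # evict the current k-th best

    @property
    def value(self) -> float:
        return -self._heap[0] if len(self._heap) == self.k else math.inf

def prune(entries: list[tuple[str, float, float]], k: int) -> tuple[float, list[str], list[str]]:
    """entries: (name, LBD, UBD). Returns current_k_best, kept names, pruned names."""
    best = KBestBound(k)
    for _, _, ubd in entries:
        best.offer(ubd)
    kept = [name for name, lbd, _ in entries if lbd <= best.value]
    pruned = [name for name, lbd, _ in entries if lbd > best.value]
    return best.value, kept, pruned

print(prune([("Q1", 30, 644), ("Q3", 30, 190), ("Q4", 360, 510)], k=1))
# (190, ['Q1', 'Q3'], ['Q4'])
```

Q1 survives because its LBD (30) is still below 190, so it needs further refinement rather than being discarded.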
More Improvement?
Data points of different types can be put into one bounding box to reduce I/O cost.
One Tree (Index Construction)
All data points, regardless of type, are indexed in a single Index Tree; the query points are indexed in a Query Tree.
(Figure: map legend Apartment, Restaurant, Bus Stop, Zoo; Query Tree with entries Q1 to Q4; Index Tree with nodes I1 to I18)
Each node has a bitmap that indicates which types of data points are contained in the node.
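A minimal sketch of the per-node bitmap, assuming bit i stands for the i-th entry of a hypothetical `TYPES` list: a leaf's bitmap is the OR of its points' type bits, and an internal node's bitmap is the OR of its children's bitmaps. The slides do not give the node layout, so `Node`, `make_leaf` and `make_internal` are illustrative names.

```python
from dataclasses import dataclass, field

# Hypothetical bit order: bit i of a bitmap stands for TYPES[i].
TYPES = ["restaurant", "bus_stop", "zoo"]

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class Node:
    rect: Rect
    bitmap: int
    children: list["Node"] = field(default_factory=list)
    points: list[tuple[float, float, str]] = field(default_factory=list)

def type_bit(t: str) -> int:
    return 1 << TYPES.index(t)

def make_leaf(points: list[tuple[float, float, str]]) -> Node:
    """Leaf: MBR of its (x, y, type) points; bitmap = OR of the points' type bits."""
    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]
    bitmap = 0
    for _, _, t in points:
        bitmap |= type_bit(t)
    return Node((min(xs), min(ys), max(xs), max(ys)), bitmap, points=points)

def make_internal(children: list[Node]) -> Node:
    """Internal node: MBR of the children's MBRs; bitmap = OR of the children's bitmaps."""
    rect = (
        min(c.rect[0] for c in children),
        min(c.rect[1] for c in children),
        max(c.rect[2] for c in children),
        max(c.rect[3] for c in children),
    )
    bitmap = 0
    for c in children:
        bitmap |= c.bitmap
    return Node(rect, bitmap, children=children)

def contains_type(node: Node, t: str) -> bool:
    return (node.bitmap & type_bit(t)) != 0

a = make_leaf([(0.0, 0.0, "restaurant"), (1.0, 1.0, "bus_stop")])
b = make_leaf([(5.0, 5.0, "zoo")])
root = make_internal([a, b])
```

A query can then skip a subtree for a given type whenever that type's bit is not set in the subtree's bitmap.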
One Tree (Query Processing)
(Figure: map legend Apartment, Restaurant, Bus Stop, Zoo; query entry Q1; index node I1)
Q1 with [I1]: MAXD = {309, 309, 309}, MIND = {0, 0, 0}, LBD = 0, UBD = 309 × 3 = 927
current_k_best = 927
One Tree (cont’d)
(Figure: map legend Apartment, Restaurant, Bus Stop, Zoo; Query Tree entries Q1 to Q4; Index Tree nodes I1 to I6)
Q3 with [I4, I5]: MIND = {0, 0, 30}, MAXD = {50, 50, 30}, LBD = 30, UBD = 130
Q4 with [I6, I5]: MIND = {30, 30, 140}, MAXD = {50, 50, 140}, LBD = 100, UBD = 240
current_k_best = 130
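With mixed-type nodes, a node can contribute to the bound of every type whose bit is set in its bitmap. This sketch assumes MIND and MAXD between the query entry and each index node are already computed, and that for each type the list contains every node that may hold that type's nearest neighbour. The bitmaps used for I4 and I5 below are hypothetical guesses consistent with the slide's numbers, and `one_tree_bounds` is an illustrative name.

```python
def one_tree_bounds(entries: list[tuple[float, float, int]], n_types: int) -> tuple[float, float]:
    """entries: (MIND, MAXD, bitmap) of each index node in one query entry's list.
    Returns (LBD, UBD) with f = sum; bit i of a bitmap marks type i as present."""
    lbd = ubd = 0.0
    for i in range(n_types):
        # Only nodes whose bitmap has bit i can contain a point of type i.
        covering = [(lo, hi) for lo, hi, bitmap in entries if (bitmap >> i) & 1]
        if not covering:
            raise ValueError(f"no entry contains type {i}")
        lbd += min(lo for lo, _ in covering)  # nearest possible point of type i
        ubd += min(hi for _, hi in covering)  # some point of type i is this close
    return lbd, ubd

# Q1 with I1, assuming I1 contains all three types (bitmap 0b111).
print(one_tree_bounds([(0.0, 309.0, 0b111)], 3))  # (0.0, 927.0)
# Q3 with I4 (hypothetically types 0 and 1) and I5 (hypothetically type 2).
print(one_tree_bounds([(0.0, 50.0, 0b011), (30.0, 30.0, 0b100)], 3))  # (30.0, 130.0)
```

This is why a single node such as I1 yields the same MAXD (309) for all three types: one bounding box stands in for every type it contains.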
Experiments
- Datasets: San Francisco Road Network (SF) and Road Network of North America (NA); 2-dimensional spatial data
- Index: ~174k points in total
- Query: ~17k points
- Algorithms: Baseline, Separate Tree, One Tree
- Measurements: CPU time; number of leaf nodes accessed (a proxy for I/O time)
Results (CPU Time vs. k)
(Figure: CPU time vs. k)
Results (CPU Time vs. |T|)
(Figure: CPU time vs. |T|)
Results (Number of Leaf Nodes Accessed vs. k)
(Figure: number of leaf nodes accessed vs. k)
Results (Number of Leaf Nodes Accessed vs. |T|)
(Figure: number of leaf nodes accessed vs. |T|)
Conclusion
We proposed two algorithms:
- Separate Tree: indexes each type of data point in a separate R-tree.
- One Tree: indexes all the data points in a single R-tree.
Both algorithms outperform the baseline algorithm, with a speed-up of up to 5.7 times.
Both algorithms also need to access the Query Tree only once, which reduces the I/O cost of accessing it.
Thank you! Questions?