Da Yan, Raymond Chi-Wing Wong, and Wilfred Ng, The Hong Kong University of Science and Technology.

Outline: Introduction, FILM Algorithm, Experiments, Conclusion

Introduction: Given a set S of servers and a set C of clients, where should a new server be set up to attract the greatest number of clients? [Figure: S = convenience stores s1, s2; C = customers c1, ..., c5. Where should a new store s3 be set up?]

Introduction: Assume that a client always visits its nearest server. Then s3 wins customer c1 from s1. [Figure: servers s1, s2 (convenience stores) and clients c1, ..., c5 (customers), with c1's distance to its nearest neighbor (NN) s1 marked; s3 is closer to c1 than s1 is.]

Introduction: Under the same assumption, s3 wins customer c3 from s2. [Figure: the same servers and clients, with c3's distance to its NN s2 marked.]

Introduction: Nearest Location Circle (NLC). NLC(c_i) is the circle with center c_i and radius ||c_i, NN(c_i)||. The new server s3 wins customer c_i from NN(c_i) if s3 is located inside NLC(c_i), so the more NLCs overlap at a location, the better that location is. [Figure: the NLCs of c1, ..., c5 around servers s1 and s2, with a candidate location s3.]
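A minimal brute-force sketch of the NLC test above, assuming 2-D points as (x, y) tuples and Euclidean distance. The helper names `nlc_radius` and `wins` are hypothetical, and treating a location exactly on the NLC boundary as a tie (not a win) is our assumption.

```python
import math

def nlc_radius(client, servers):
    """Radius of NLC(client): distance from the client to its nearest server."""
    return min(math.dist(client, s) for s in servers)

def wins(new_server, client, servers):
    """True if new_server lies strictly inside NLC(client), i.e. it is closer
    to the client than the client's current nearest server (assumed tie rule)."""
    return math.dist(client, new_server) < nlc_radius(client, servers)

servers = [(0.0, 0.0), (10.0, 0.0)]
clients = [(2.0, 1.0), (8.0, 1.0), (5.0, 4.0)]
s3 = (5.0, 2.0)
won = [c for c in clients if wins(s3, c, servers)]
```

Here s3 wins only the client at (5, 4): the other two clients are √5 ≈ 2.24 from their nearest server but about 3.16 from s3.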

Introduction: Region for optimal locations. [Figure: the NLCs of c1, ..., c5 around servers s1 and s2 divide the plane into regions labeled ① to ⑤ by the number of NLCs covering them; the region labeled ⑤ contains the optimal locations.]

Introduction: Other applications include profile-based marketing, emergency scheduling, military medical supply, and more.

Introduction: Limitation 1. A client may not always visit its nearest server. A restaurant 55 m away that serves better food may be more attractive than the nearest restaurant 40 m away. However, people may be reluctant to go to a restaurant 500 m away.

Introduction: Relaxed Nearest Location Circle (RNLC). RNLC(c_i) is the circle with center c_i and radius (1+α)·||c_i, NN(c_i)||, where α > 0. [Figure: client c_i, its nearest server s_i = NN(c_i), NLC(c_i), and the larger concentric RNLC(c_i).]

Introduction: Influence value. Given a location p, its influence value inf(p) is the number of clients c_i ∈ C such that p ∈ RNLC(c_i). Relaxed Optimal Location Query: given a set S of servers and a set C of clients, return a location p with maximum inf(p). K-Influential Location Query: locate k new servers to maximize the total number of clients they attract "collectively".
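The definitions of inf(p) and the relaxed query can be checked directly with a brute-force sketch. It is not FILM: it assumes a finite, hypothetical list of candidate locations, whereas the query ranges over every location in the plane, and the function names are illustrative.

```python
import math

def influence(p, clients, servers, alpha):
    """inf(p): number of clients c with p inside RNLC(c), i.e.
    ||c, p|| <= (1 + alpha) * ||c, NN(c)||."""
    count = 0
    for c in clients:
        r = min(math.dist(c, s) for s in servers)  # radius of NLC(c)
        if math.dist(c, p) <= (1 + alpha) * r:
            count += 1
    return count

def best_candidate(candidates, clients, servers, alpha):
    """Brute-force baseline: the candidate location with maximum inf(p)."""
    return max(candidates, key=lambda p: influence(p, clients, servers, alpha))

servers = [(0.0, 0.0), (10.0, 0.0)]
clients = [(2.0, 1.0), (8.0, 1.0), (5.0, 4.0)]
candidates = [(5.0, 2.0), (2.0, 0.0), (5.0, 3.0)]
best = best_candidate(candidates, clients, servers, alpha=0.5)
```

With α = 0.5 the candidate (5, 2) lies inside the RNLCs of all three clients, although with α = 0 it would only lie inside the NLC of the client at (5, 4).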

Introduction: Limitation 2. The fastest existing algorithm is MaxOverlap (VLDB '09), which checks the intersection points between NLC boundaries. Its time complexity is super-quadratic in the number of clients, and it takes hours to answer an optimal location query on typical real-world datasets.

FILM Algorithm: Basic algorithm. (1) Bulk-load a balanced kd-tree on the server points in S. (2) For each client c ∈ C, find the server s = NN(c) to obtain NLC(c). (3) "Draw" the NLCs on a grid partitioning of the space. How?
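Steps (1) and (2) can be sketched with a small hand-rolled 2-d tree in pure Python: median splits on alternating axes and a branch-and-bound nearest-neighbor search. The tuple node layout and the function names `build_kdtree` and `nearest` are our assumptions; a production system would more likely use a library kd-tree.

```python
import math

def build_kdtree(points, depth=0):
    """Bulk-load a balanced 2-d tree by splitting at the median of
    alternating coordinates. Nodes are (point, axis, left, right) tuples."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))

def nearest(node, q, best=None):
    """Return (distance, point) of the stored point nearest to q."""
    if node is None:
        return best
    point, axis, left, right = node
    d = math.dist(point, q)
    if best is None or d < best[0]:
        best = (d, point)
    diff = q[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, q, best)
    # Points on the far side are at least |diff| away; skip them if that
    # cannot beat the best distance found so far.
    if abs(diff) < best[0]:
        best = nearest(far, q, best)
    return best

servers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
tree = build_kdtree(servers)
clients = [(2.0, 1.0), (8.0, 1.0), (5.0, 6.0)]
nlcs = [(c, nearest(tree, c)[0]) for c in clients]  # (center, radius) of each NLC
```

Each entry of `nlcs` is the center and radius of one NLC, which is exactly what step (3) needs.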

FILM Algorithm: Grid partitioning. Each grid cell is a small square with side length ε. A counter, initialized to 0, is attached to each grid cell to record the number of NLCs overlapping it. When "drawing" an NLC, we increment the counter of every grid cell it overlaps by 1.
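A minimal sketch of "drawing" an NLC on a grid, assuming cell (i, j) covers [iε, (i+1)ε) × [jε, (j+1)ε) and that a cell overlaps the circle when its closest point to the center is within the radius. A plain `dict` keyed by cell index stands in for the balanced search tree described later, a swap made here for brevity; `draw_circle` is a hypothetical name.

```python
import math
from collections import defaultdict

def draw_circle(counters, center, r, eps):
    """Increment the counter of every eps-by-eps cell overlapping the disk
    (center, r). Only cells that are actually overlapped get an entry."""
    cx, cy = center
    for i in range(math.floor((cx - r) / eps), math.floor((cx + r) / eps) + 1):
        for j in range(math.floor((cy - r) / eps), math.floor((cy + r) / eps) + 1):
            # Closest point of the (closed) cell to the circle's center.
            nx = min(max(cx, i * eps), (i + 1) * eps)
            ny = min(max(cy, j * eps), (j + 1) * eps)
            if math.dist((cx, cy), (nx, ny)) <= r:
                counters[(i, j)] += 1

counters = defaultdict(int)  # sparse: cell index -> counter
draw_circle(counters, (0.5, 0.5), 1.0, 1.0)
draw_circle(counters, (2.5, 0.5), 1.0, 1.0)
```

The two unit circles each overlap a 3 × 3 block of cells, and the column of cells with i = 1 is shared, so those cells end with counter 2.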

FILM Algorithm: Grid cells with counters added. [Figure]

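FILM Algorithm: Analysis. If a grid cell g overlaps NLC(c), whose radius r satisfies r ≥ δε (δ > 1), then every location in g lies within RNLC(c) for any α ≥ √2/δ. Let s be the nearest server of c and s' any location in g. Since g overlaps NLC(c), some point of g is within distance r of c, and any two points of g are at most √2ε apart (the diagonal of the cell). Using ε ≤ r/δ:
$$\|c, s'\| \le r + \sqrt{2}\,\varepsilon \le r + \sqrt{2}\,\frac{r}{\delta} = \left(1 + \frac{\sqrt{2}}{\delta}\right) r \le (1 + \alpha)\, r.$$
[Figure: client c, its nearest server s, NLC(c) with radius r ≥ δε, and a location s' in an overlapping cell.]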

FILM Algorithm: Relationship between α and δ. For a grid with side length ε, δε is the lower bound on the radius of any NLC "drawn" on it. On the one hand, the guarantee requires α ≥ √2/δ, i.e., δ ≥ √2/α. On the other hand, every NLC drawn on the grid satisfies ε ≤ r/δ, so a larger δ means a finer grid relative to the NLC: the approximation improves, but each NLC overlaps more cells. FILM therefore uses the smallest δ that meets the guarantee, δ = √2/α.
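A small numeric sanity check of the analysis, under the assumed values α = 0.5 and ε = 1: it sets δ = √2/α, draws an NLC of the smallest allowed radius r = δε, and measures the largest ratio ||c, s'|| / r over all points s' of cells overlapping it. The function names and parameter values are illustrative only.

```python
import math

def delta_for(alpha):
    # Smallest delta satisfying alpha >= sqrt(2) / delta.
    return math.sqrt(2) / alpha

def worst_ratio(center, r, eps):
    """Largest ||c, s'|| / r over all points s' of eps-cells that overlap the
    disk (center, r). The farthest point of a square from c is a corner."""
    cx, cy = center
    worst = 0.0
    for i in range(math.floor((cx - r) / eps), math.floor((cx + r) / eps) + 1):
        for j in range(math.floor((cy - r) / eps), math.floor((cy + r) / eps) + 1):
            # The closest point of the cell to the center decides overlap.
            nx = min(max(cx, i * eps), (i + 1) * eps)
            ny = min(max(cy, j * eps), (j + 1) * eps)
            if math.dist(center, (nx, ny)) > r:
                continue
            corners = [((i + a) * eps, (j + b) * eps) for a in (0, 1) for b in (0, 1)]
            worst = max(worst, max(math.dist(center, q) for q in corners) / r)
    return worst

alpha, eps = 0.5, 1.0
delta = delta_for(alpha)  # 2 * sqrt(2)
r = delta * eps           # smallest radius allowed on this grid
print(worst_ratio((0.3, 0.7), r, eps), "<=", 1 + alpha)
```

Some cells that overlap the NLC extend beyond it (ratio above 1), yet none extends past the RNLC radius (1 + α)r, as the bound predicts.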

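FILM Algorithm: A grid cell's counter value is a conservative estimate of the influence value of any location in it. Every NLC counted by the cell has an RNLC containing the whole cell, whereas a cell that overlaps RNLC(c) but not NLC(c) receives no increment from c. [Figure: NLC(c) inside RNLC(c), with cells that overlap RNLC(c) but have no counter added.] As δ → +∞ (α → 0⁺), Pr{underestimation} → 0.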

FILM Algorithm: Grid cell storage. Each grid cell is stored as a key-value pair whose key is the cell index and whose value is the cell counter. Cells are organized in a balanced search tree, and only cells that overlap at least one NLC are stored in the tree.

FILM Algorithm: Adaptive Grid Hierarchy. A single grid is insufficient for "drawing" all NLCs, since their radii can differ by orders of magnitude and a large NLC may overlap too many cells. The grid structure must therefore adapt to NLC size automatically.

FILM Algorithm: Adaptive Grid Hierarchy. Given a grid with side length ε, any NLC "drawn" on it should have a radius r with δε ≤ r < δ²ε. FILM uses a set of grids in which consecutive grids have side lengths differing by a factor of δ.

FILM Algorithm: Algorithm for influential location query. (1) Build the grid hierarchy from the NLCs: sort the NLCs in non-decreasing order of radius, then allocate them to the corresponding grids in a single pass through the sorted list. (2) Evaluate the influence value estimate of each grid cell and pick the maximum one.

FILM Algorithm: Algorithm for influential location query (data structures). [Figure: GList, a list with one entry per grid. Each entry stores len (the grid side length: ε, δε, δ²ε, ...), min (the smallest radius), [start, end] (the range of positions in the sorted NLC list C' assigned to this grid: [1, i1], [i1+1, i2], [i2+1, i3], ...), and tree (a binary search tree storing the grid cells, initially empty).] The smallest grid side length ε_min is chosen as r_min/δ, where r_min is the smallest NLC radius.
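A sketch of building GList from NLC radii in one pass over the sorted list, assuming δ > 1 and using plain dicts for the grid entries. It records only each grid's side length and its 1-based [start, end] range, leaves out the min field and the per-grid search trees, and drops grid levels that receive no NLC (an assumption; the slides do not say how empty levels are handled). The name `build_glist` is hypothetical.

```python
def build_glist(radii, delta):
    """Assign NLC radii to the grids of an adaptive hierarchy (delta > 1).

    Grid j has side length eps_min * delta**j with eps_min = r_min / delta,
    and receives the NLCs with delta * eps <= r < delta**2 * eps.
    Returns the sorted radii and, per non-empty grid, its side length and
    1-based inclusive [start, end] range in the sorted list.
    """
    order = sorted(radii)                    # non-decreasing radius
    eps = order[0] / delta                   # eps_min = r_min / delta
    glist = [{"len": eps, "range": [1, 0]}]  # [1, 0] marks an empty range
    for idx, r in enumerate(order, start=1):
        # Climb to coarser grids until r < delta**2 * eps.
        while r >= delta * delta * eps:
            eps *= delta
            glist.append({"len": eps, "range": [idx, idx - 1]})
        glist[-1]["range"][1] = idx          # extend the current grid
    return order, [g for g in glist if g["range"][0] <= g["range"][1]]

order, glist = build_glist([1.0, 1.5, 2.5, 3.0, 9.0, 1.2], delta=2.0)
```

With δ = 2, the radii 1.0 to 1.5 land on the grid with side 0.5, the radii 2.5 and 3.0 on side 1.0, and 9.0 on side 4.0; the level with side 2.0 receives nothing and is dropped.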

FILM Algorithm: Algorithm for influential location query. Since each grid handles only the NLCs of a subset of clients, the counter value of a grid cell g is a conservative influence value estimate only with respect to that subset. To obtain a conservative estimate for a cell with respect to the whole client set, we add to its own counter value the counter values of all cells covering it in the upper-level grids.

FILM Algorithm: Illustration. The NLCs of c1 and c2 are drawn on the lower-level grid; the NLCs of c3 and c4 are drawn on the higher-level grid. counter(A) = 2 and counter(g) = 2. [Figure: lower-level cell A lies inside higher-level cell g; NLCs of c1, ..., c4.]

FILM Algorithm: Illustration (continued). The NLCs of c3 and c4 overlap cell g, so all locations in cell g, and hence all locations in cell A, lie in the RNLCs of c3 and c4. The influence estimate for A is therefore counter(A) + counter(g) = 4.
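A sketch of the cross-level estimate, assuming each grid is a dict from cell index to counter and grid k has side length ε0·δ^k. When δ is not an integer a fine cell can straddle several coarse cells; this sketch then conservatively takes the smallest of their counters, which is our assumption rather than something the slides specify. The function name `estimate` and its parameters are hypothetical.

```python
import math

def estimate(cell, level, grids, eps0, delta):
    """Conservative influence estimate for cell (i, j) of grid `level`.

    grids[k] is a dict {(i, j): counter} for the grid with side eps0 * delta**k.
    The cell's own counter is added to, for every coarser grid, the counter of
    the coarse cell covering it (the minimum over the coarse cells it overlaps,
    in case cells do not nest exactly)."""
    i, j = cell
    side = eps0 * delta ** level
    x0, y0, x1, y1 = i * side, j * side, (i + 1) * side, (j + 1) * side
    total = grids[level].get(cell, 0)
    for k in range(level + 1, len(grids)):
        s = eps0 * delta ** k
        covering = [grids[k].get((a, b), 0)
                    for a in range(math.floor(x0 / s), math.ceil(x1 / s))
                    for b in range(math.floor(y0 / s), math.ceil(y1 / s))]
        total += min(covering)
    return total

# Mirrors the illustration: fine cell A = (1, 1) with counter 2 lies inside
# coarse cell g = (0, 0) with counter 2 (side lengths 1 and 2).
grids = [{(1, 1): 2}, {(0, 0): 2}]
```

For cell A at level 0 the estimate is 2 + 2 = 4, matching the slide; a fine cell whose covering coarse cell has no entry gets only its own counter.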

FILM Algorithm: K-Influential Location Query. This problem is equivalent to the maximum coverage problem. Although maximum coverage is NP-hard, the greedy algorithm, which at each stage chooses the subset containing the largest number of uncovered elements, achieves an approximation ratio of 1 − 1/e.
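A sketch of the classic greedy algorithm for maximum coverage, where each candidate location is represented by the set of clients whose RNLCs contain it. The set-of-client-ids input format and the function name are assumptions for illustration.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets greedily, each time the one covering the most
    not-yet-covered elements; this achieves a (1 - 1/e) approximation."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered), default=None)
        if best is None or not sets[best] - covered:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

# Each set lists the clients attracted by one candidate location.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 5}]
```

The first pick is the set {4, 5, 6, 7}; after that, {1, 2, 3} adds three new clients while the others add only one each, so two picks cover all seven clients.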

FILM Algorithm: K-Influential Location Query. Our algorithm proceeds as follows after the previous cell g_p is picked. (1) Find the NLCs overlapping g_p and cancel them out of the grid hierarchy (i.e., subtract their contributions from the counters of the relevant cells); only grids at g_p's level and higher are checked. (2) Pick the cell with the maximum influence value estimate in the grid hierarchy as the next result cell.
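The cancel-out step can be sketched on a single grid. The hierarchy and the restriction to g_p's level and higher are omitted, and the input format (client id mapped to the set of overlapped cell indices) and the function name are hypothetical.

```python
from collections import defaultdict

def k_influential_cells(nlc_cells, k):
    """nlc_cells maps each client id to the set of grid cells its NLC overlaps.
    Repeatedly pick the cell with the largest counter, then cancel out every
    NLC overlapping the picked cell by decrementing the counters of its cells."""
    counters = defaultdict(int)
    for cells in nlc_cells.values():
        for cell in cells:
            counters[cell] += 1
    remaining = dict(nlc_cells)  # NLCs not yet cancelled out
    picked = []
    for _ in range(k):
        if not counters or max(counters.values()) == 0:
            break  # no cell attracts any remaining client
        gp = max(counters, key=counters.get)
        picked.append(gp)
        for cid in [c for c, cells in remaining.items() if gp in cells]:
            for cell in remaining.pop(cid):
                counters[cell] -= 1
    return picked

A, B, C = (0, 0), (1, 0), (2, 0)
nlc_cells = {"c1": {A}, "c2": {A}, "c3": {A, B}, "c4": {A, B}, "c5": {B, C}, "c6": {C}}
```

Without the cancellation, the second pick would be B with counter 3; after cancelling the four NLCs that overlap A, its counter drops to 1 and C (counter 2) is chosen instead.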

Experiments: Datasets. The real dataset consists of populated places and cultural landmarks in North America, available from RTreePortal. The other datasets are taken from prior work (i.e., MaxOverlap).

Experiments: Result quality. Let g be the result cell and OPT the optimal location; the quality measure is Ratio_NLC = inf(g) / inf(OPT).

Experiments: Running time results on the NA dataset.

Conclusion: We designed FILM, an efficient influential location miner that returns a small grid cell in which every location has an influence guarantee. FILM returns near-optimal locations in considerably less time than existing approaches, which makes it practical for time-critical applications that need short response times when finding influential locations.

Thank you!