Mining the Most Influential k-Location Set from Massive Trajectories
  Y. Li, J. Bao, Y. Li, Y. Zheng, Y. Wu, Z. Gong
  Presented by Mi Tian, Dhaval Deepark Dholakia, Deepan Ashok Sanghavi
  9/21/2018
Motivating Scenarios
  Advertisement Placement
    Place multiple billboards within a region
    To be seen by more unique people
  Resource Allocation
    Place charging stations for EVs
    To serve more unique vehicles
  Chained Business Location Selection
    Collective location selection, e.g., Starbucks, KFC
    To serve more people
The Goal
  Find k targets that cover the most unique trajectories
  Two spots to serve gas to the most cars?
  Two spots to attract the most customers?
  Two spots to best observe bird migration?
Term Definition
  Trajectory (tr): A sequence of GPS locations of a moving object over time.
  Location (v): A meaningful location on the map. Depending on the problem, it can be a road intersection, a grid cell, a stay point, etc.
  Spatial Network (G): A graph G = (V, E) formed by locations and their connections.
  Coverage (Tr(v)): All trajectories passing through a specific location.
  (Figure: example trajectories A, B, C, D, E)
Problem Definition
  Location-based Max-Cover Query
  Input:
    Trajectories: T = {tr1, tr2, …, trn}
    Road network: G = (V, E)
    Integer k
    Spatial region R
  Output:
    k vertices in the spatial region R that cover the maximum number of unique trajectories
  Objective
    Minimize response time & improve system frequency
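In set notation, the query can be written as below. This is only a sketch: S (the selected set) and V_R (the vertices of G that lie inside R) are symbols introduced here for illustration, not notation taken from the slides.

```latex
S^{*} = \operatorname*{arg\,max}_{S \subseteq V_R,\; |S| = k}
        \left| \bigcup_{v \in S} Tr(v) \right|,
\qquad
V_R = \{\, v \in V : v \text{ lies in } R \,\}
```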
Naïve Solution
  (Figure: trajectories A, B, C, D, E)
  Question: find the k-set where k = 2
  Let's count coverage:
    Tr(n1) = {A, B}      2 tr
    Tr(n2) = {B, C, D}   3 tr
    Tr(n3) = {C, D, E}   3 tr
  How many different combos? C(3, 2) = 3
  Which combo is the best?
    |Tr(n2)| + |Tr(n3)| = 6 is the biggest number. Is it the right answer?
    It's about unique trajectories
Naïve Solution (cont.)
  (Figure: trajectories A, B, C, D, E)
  Question: find the k-set where k = 2
  Let's count unique trajectories:
    Tr(n1, n2) = {A, B, C, D}      4 tr
    Tr(n2, n3) = {B, C, D, E}      4 tr
    Tr(n1, n3) = {A, B, C, D, E}   5 tr
  n1 & n3 is the winner. Why?
    n2 & n3 have overlapping trajectories {C, D}
    Again, it's about unique trajectories
  But does this method scale? No: it has to examine C(n, k) combinations
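The enumeration above can be sketched in a few lines of Python. The coverage sets are the ones listed on the slide; the function name `best_k_set` is hypothetical and used only for illustration.

```python
from itertools import combinations

# Coverage sets from the slide: location -> trajectories passing it
coverage = {
    "n1": {"A", "B"},
    "n2": {"B", "C", "D"},
    "n3": {"C", "D", "E"},
}

def best_k_set(coverage, k):
    """Try every k-combination and keep the one covering the most unique trajectories."""
    best, best_covered = None, set()
    for combo in combinations(sorted(coverage), k):
        # Union, not sum: a trajectory seen at two locations counts once
        covered = set().union(*(coverage[v] for v in combo))
        if len(covered) > len(best_covered):
            best, best_covered = combo, covered
    return best, best_covered

best, covered = best_k_set(coverage, 2)
print(best, sorted(covered))
```

The loop visits all C(n, k) combinations, which is why this approach stops being practical as n and k grow.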
Challenges
  Lots of computation (NP-hard): C(n, k) candidate sets
    Massive trajectories
    Large n and k
  Dynamic requests
    User can pick any region
    User can discard certain answers and ask for a recalculation
    Software needs to be interactive (answer quickly)
  The goal of this paper is to make it fast, and
    Optimal for small or medium regions
    Near optimal for large regions
System Overview
  (Figure: system overview; user input: k, region selection)
Pre-Processing
  Map-Matching
    Map raw trajectory data to the road network
    Tr = {(lat1, lng1, t1), (lat2, lng2, t2), …, (latn, lngn, tn)} -> {v1, v2, …, vn}
    Trajectory-Vertex Index
  Inverted Index Building
    Vertex-Trajectory Index: V1 = {trj1, trj2, …, trjn}
  Vertex-Vertex Index Building
    Shared trajectories between two vertices
  Spatial Indexing
    Index vertex locations
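A minimal sketch of the two inverted indexes, assuming map-matching has already produced a vertex list per trajectory. The function name `build_indexes` and the toy input are hypothetical; the toy input is chosen so that the resulting coverage sets match the n1/n2/n3 example from the naïve-solution slides.

```python
from collections import defaultdict
from itertools import combinations

def build_indexes(traj_to_vertices):
    """Build the vertex-trajectory index and the vertex-vertex (shared count) index."""
    vertex_to_trajs = defaultdict(set)
    shared = defaultdict(int)  # (u, v) with u < v -> number of shared trajectories
    for tr_id, vertices in traj_to_vertices.items():
        unique = sorted(set(vertices))  # a trajectory counts once per vertex
        for v in unique:
            vertex_to_trajs[v].add(tr_id)
        for u, v in combinations(unique, 2):
            shared[(u, v)] += 1
    return dict(vertex_to_trajs), dict(shared)

# Toy map-matched trajectories (assumed for illustration)
traj_to_vertices = {
    "A": ["n1"],
    "B": ["n1", "n2"],
    "C": ["n2", "n3"],
    "D": ["n2", "n3"],
    "E": ["n3"],
}
vertex_to_trajs, shared = build_indexes(traj_to_vertices)
print(vertex_to_trajs)
print(shared)
```

Pairs that share no trajectory are simply absent from `shared`, which keeps the vertex-vertex index sparse.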
Optimal Solution
  Naïve Approach
    C(n, k) combinations
  Group-based Pruning Approach
    Group the vertices
    Estimate a coverage upper bound for every (G, G) combination
    Sort the (G, G) combinations by upper bound (high to low)
    For each (G, G):
      Find the best k-set of vertices with the most unique coverage
      Could stop early, once the best k-set found is better than all other (G, G)
  Compare (k = 2):
    Naïve: C(9, 2) = 36
    Group-based pruning: 3 * 3 (G) + 3 * 3 (V) = 15 (reduced!)
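The pruning idea can be sketched for k = 2 as follows. Several things here are assumptions, not details from the slides: the upper bound (the sum of the largest individual coverages a group pair can contribute), the arbitrary grouping into G1, G2, G3, and the function name `best_pair_with_group_pruning`. The coverage sets are the six vertices of the greedy example slide, with trajectory ids written as integers.

```python
from itertools import combinations, combinations_with_replacement

def best_pair_with_group_pruning(groups, coverage):
    """Exact best pair (k = 2), skipping group pairs whose upper bound cannot win."""
    def upper_bound(g, h):
        if g != h:
            return (max(len(coverage[v]) for v in groups[g])
                    + max(len(coverage[v]) for v in groups[h]))
        sizes = sorted((len(coverage[v]) for v in groups[g]), reverse=True)
        return sum(sizes[:2])

    group_pairs = sorted(combinations_with_replacement(sorted(groups), 2),
                         key=lambda p: upper_bound(*p), reverse=True)
    best, best_cov, examined = None, -1, 0
    for g, h in group_pairs:
        if best_cov >= upper_bound(g, h):
            break  # bounds are sorted, so no remaining group pair can do better
        if g == h:
            candidates = combinations(groups[g], 2)
        else:
            candidates = ((u, v) for u in groups[g] for v in groups[h])
        for u, v in candidates:
            examined += 1
            cov = len(coverage[u] | coverage[v])
            if cov > best_cov:
                best, best_cov = (u, v), cov
    return best, best_cov, examined

coverage = {
    "v1": {1, 2, 3, 4, 5, 6}, "v2": {1, 2, 3, 8, 10}, "v3": {4, 5, 6, 7, 9},
    "v4": {1, 7, 8, 9}, "v5": {1, 3, 5}, "v6": {2, 4, 6},
}
groups = {"G1": ["v1", "v2"], "G2": ["v3", "v4"], "G3": ["v5", "v6"]}  # assumed grouping
best, best_cov, examined = best_pair_with_group_pruning(groups, coverage)
print(best, best_cov, examined)
```

Because the bound never underestimates the coverage of a pair, stopping early does not change the answer; it only reduces how many vertex pairs are evaluated compared with the C(6, 2) = 15 of the naïve approach.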
Greedy Solution: Overview
  Optimal solution
    Does not scale to large k and spatial region R
    Efficient when we need multiple rounds of interaction from users
  Greedy solution
    A good approximation to the optimal solution: (1 - 1/e) approximation
    The idea is very simple
  Main algorithm framework
    First step: select a set of candidate vertices in spatial region R
    A k-round selection process
      Selection: pick the vertex that covers the most uncovered trajectories
      Updating: remove the covered trajectories
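The (1 - 1/e) figure is the standard guarantee for greedy selection on max-cover style objectives. Written out, with S_greedy and S* as illustrative symbols for the greedy and the optimal k-sets (they are not notation from the slides):

```latex
\left| \bigcup_{v \in S_{\text{greedy}}} Tr(v) \right|
\;\ge\; \left( 1 - \left( 1 - \tfrac{1}{k} \right)^{k} \right)
        \left| \bigcup_{v \in S^{*}} Tr(v) \right|
\;\ge\; \left( 1 - \tfrac{1}{e} \right)
        \left| \bigcup_{v \in S^{*}} Tr(v) \right|
\;\approx\; 0.632 \left| \bigcup_{v \in S^{*}} Tr(v) \right|
```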
Greedy Solution: Basic Algorithm
  E.g., k = 2
  Current selection: {v1} -> {v1, v4}
  Current covered trajectories: {tr1, tr2, tr3, tr4, tr5, tr6} -> {tr1, tr2, tr3, tr4, tr5, tr6, tr7, tr8, tr9}
  Vertex coverage state
    v1: tr1, tr2, tr3, tr4, tr5, tr6
    v2: tr1, tr2, tr3, tr8, tr10
    v3: tr4, tr5, tr6, tr7, tr9
    v4: tr1, tr7, tr8, tr9
    v5: tr1, tr3, tr5
    v6: tr2, tr4, tr6
  Trajectory-vertex index
    tr1: v1, v2, v4, v5
    tr2: v1, v2, v6
    tr3: v1, v2, v5
    tr4: v1, v3, v6
    tr5: v1, v3, v5
    tr6: v1, v3, v6
  Vertex coverage count (updated while the trajectories of v1 are scanned)
                         v1  v2  v3  v4  v5  v6
    Initial               6   5   5   4   3   3
    After tr1             6   4   5   3   2   3
    After tr1, tr2        6   3   5   3   2   2
    After tr1 to tr6      6   2   2   3   0   0
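The basic algorithm can be sketched as below, using the six coverage sets from this slide with trajectory ids written as integers. The function name `greedy_basic` is hypothetical, and unlike the slide's table the sketch also drops the selected vertex's own count to zero.

```python
from collections import defaultdict

def greedy_basic(vertex_to_trajs, k):
    """k rounds: pick the vertex with the most uncovered trajectories, then update counts."""
    # Trajectory-vertex index, used by the updating phase
    traj_to_vertices = defaultdict(list)
    for v, trs in vertex_to_trajs.items():
        for tr in trs:
            traj_to_vertices[tr].append(v)

    gain = {v: len(trs) for v, trs in vertex_to_trajs.items()}  # uncovered count per vertex
    selected, covered = [], set()
    for _ in range(k):
        best = max(gain, key=gain.get)
        if gain[best] == 0:
            break  # nothing left to cover
        selected.append(best)
        # Updating: scan each newly covered trajectory one by one
        for tr in vertex_to_trajs[best]:
            if tr in covered:
                continue
            covered.add(tr)
            for v in traj_to_vertices[tr]:
                gain[v] -= 1
    return selected, covered

coverage = {
    "v1": {1, 2, 3, 4, 5, 6}, "v2": {1, 2, 3, 8, 10}, "v3": {4, 5, 6, 7, 9},
    "v4": {1, 7, 8, 9}, "v5": {1, 3, 5}, "v6": {2, 4, 6},
}
selected, covered = greedy_basic(coverage, 2)
print(selected, sorted(covered))
```

On this data greedy picks v1 and then v4, covering 9 trajectories, whereas the pair v2 and v3 would cover all 10; the result is approximate, consistent with the (1 - 1/e) guarantee.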
Greedy Solution: Basic Algorithm
  Performance analysis
    The dominant cost is the updating phase
    It scans the trajectory-vertex index "one by one"
    It cannot scale to a large trajectory dataset
  (Figure: vertex coverage counts updated through the trajectory-vertex index)
    Before: v1 6, v2 5, v3 5, v4 4, v5 3, v6 3
    After:  v1 6, v2 2, v3 2, v4 3, v5 0, v6 0
  Is it possible to update the coverage of each node in batch?
Greedy Solution: Partition Index Batch Updating Algorithm
  Main intuition
    Minimize the trajectory scan operations
    Update the coverage values in batch
  Main techniques
    Smart Update Decision
    Index Partition
    Workload-based Optimization
Greedy Solution: Partition Index Batch Updating Algorithm
  Smart Update Decision
    Utilize the vertex-vertex index
    Two cases
      Case 1: Major coverage overlap: apply the basic updating method
      Case 2: Minor coverage overlap: subtract and add back
    In this way, we always scan fewer trajectories for updating.
  (Figure: vertex coverage table and trajectory-vertex index; scan tr4 and tr5 to update)
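One possible reading of the two cases is sketched below. The decision rule (compare the number of newly covered trajectories with the number already covered) and the names `build` and `smart_update` are assumptions made for illustration; the slides do not spell out the exact rule. The data is the six-vertex greedy example, with integer trajectory ids.

```python
from collections import defaultdict

def build(vertex_to_trajs):
    """Trajectory-vertex index plus a dense vertex-vertex index of shared counts."""
    traj_to_vertices = defaultdict(list)
    for v, trs in vertex_to_trajs.items():
        for tr in trs:
            traj_to_vertices[tr].append(v)
    shared = {u: {v: len(vertex_to_trajs[u] & vertex_to_trajs[v]) for v in vertex_to_trajs}
              for u in vertex_to_trajs}
    return traj_to_vertices, shared

def smart_update(chosen, gain, covered, vertex_to_trajs, traj_to_vertices, shared):
    """Update uncovered counts after selecting `chosen`; return how many trajectories were scanned."""
    already = vertex_to_trajs[chosen] & covered  # covered in earlier rounds
    fresh = vertex_to_trajs[chosen] - covered    # newly covered in this round
    if len(fresh) <= len(already):
        # Case 1 (major overlap): few new trajectories, so scan them one by one
        for tr in fresh:
            for v in traj_to_vertices[tr]:
                gain[v] -= 1
        scanned = len(fresh)
    else:
        # Case 2 (minor overlap): subtract all shared counts in batch,
        # then add back the trajectories that had already been removed earlier
        for v in gain:
            gain[v] -= shared[chosen][v]
        for tr in already:
            for v in traj_to_vertices[tr]:
                gain[v] += 1
        scanned = len(already)
    covered |= fresh
    return scanned

coverage = {
    "v1": {1, 2, 3, 4, 5, 6}, "v2": {1, 2, 3, 8, 10}, "v3": {4, 5, 6, 7, 9},
    "v4": {1, 7, 8, 9}, "v5": {1, 3, 5}, "v6": {2, 4, 6},
}
traj_to_vertices, shared = build(coverage)
gain = {v: len(trs) for v, trs in coverage.items()}
covered = set()
scans = [smart_update(v, gain, covered, coverage, traj_to_vertices, shared)
         for v in ("v1", "v4")]
print(scans, gain)
```

Either branch leaves every count equal to the vertex's number of still-uncovered trajectories; the choice only decides which, hopefully smaller, set of trajectories has to be scanned.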
Greedy Solution: Partition Index Batch Updating Algorithm
  Index Partition
    Why and how?
Greedy Solution: Partition Index Batch Updating Algorithm
  Workload-based Optimization
    Selective Indexing
      Storing the vertex-vertex index takes |V| * |V| space (the Tianjin road network has 900k vertices)
      With p partitions this becomes p * |V| * |V| (impossible to store in memory)
    Workload-based Partition
      Not possible to cluster the trajectories based on similarity (|N| * |N| similarity computations)
    Observation
      Many vertices are selected in a sequence
      Many of the vertices will never be selected
Experiments
  Dataset
    Road network (Tianjin)
      99,007 vertices and 133,726 road segments
      Covers a 123 × 187 km² spatial region with a total length of 32,487 km
    Trajectories (taxis)
      3,501 taxicabs from Tianjin over 61 days
      4,509,519 trajectories
      Average sampling rate is 24.05 seconds per point
Trajectory Distributions in Tianjin
Machine: Ubuntu 12.04, Intel Core i7-3930K (6 cores, 12 threads) at 3.2 GHz, 16 GB of main memory
Basic Updating Algorithm vs Partition Index Batch Updating (PIBU) Algorithm
  (Figure: scanned trajectories (bars) and processing time (lines) for each vertex selection iteration)
  Aim: mine a 10-location set
  PIBU is 5.02 times faster
  Scanned trajectories: Basic algorithm 905,623; PIBU 87,330
Basic Updating Algorithm vs Partition Index Batch Updating (PIBU) Algorithm
  (Figure: processing time (lines) and total scanned trajectories (bars) for the two approaches, with different k values)
  PIBU is 3.9 times faster
Processing time (lines) and number of scanned trajectories (bars) versus query region size: PIBU is 3.8 times faster
Processing time (lines) and number of scanned trajectories (bars) when varying the size of the trajectory dataset: PIBU is 3.2 times faster
Case Study: Advertisement Placement
  Task: put three billboards in New York City (NYC) for promotion
  Dataset: location-based social networking check-in dataset
  The city is divided into equal-sized grids
  Graph (a): multiple check-ins by the same users
  Graph (b): overlapped users in the selected areas
  Graph (c): result of the paper's solution
Case Study: Charging Station Placement
  Aim: place charging stations under domain constraints:
    Space for parking
    POI categories of the locations
    Two locations should be far enough apart
Conclusion
  Most influential k-location set mining problem
  Covers optimal and approximate solutions
    The optimal solution works for small regions and k values
    The approximate solution works better for large regions and k values
  Two case studies:
    Billboard placement in NYC based on location-based social network data
    EV charging station placement in Beijing
Further Extensions
  Weighted Locations
    What if the prices for selecting the locations are different?
  Weighted Trajectories
    What if each person (trajectory) has a different profile, and you want to advertise different items?
  Spatio-temporal Selection
    What about bars and night clubs? They only care about people who travel at night.
Thanks
  Q & A