Mining the Most Influential k-Location Set from Massive Trajectories

Presentation on theme: "Mining the Most Influential k-Location Set from Massive Trajectories". Presentation transcript:

1 Mining the Most Influential k-Location Set from Massive Trajectories
Y. Li, J. Bao, Y. Li, Y. Zheng, Y. Wu, Z. Gong
Presented by Mi Tian, Dhaval Deepark Dholakia, Deepan Ashok Sanghavi
9/21/2018

2 Motivating Scenarios
Advertisement Placement
  Place multiple billboards within a region
  To be seen by more unique people
Resource Allocation
  Place the charging stations for EVs
  To serve more unique vehicles
Chained Business Location Selection
  Collective location selection, e.g., Starbucks, KFC
  To serve more people

3 The Goal
Find k targets to cover the most unique trajectories.
Two spots to serve gas to the most cars?
Two spots to attract the most customers?
Two spots to best observe bird migration?

4 Term Definition
Trajectory (tr): a sequence of GPS locations of a moving object through time.
Location (v): a meaningful location on the map. Depending on the problem, it can be a road intersection, a grid, a stay point, etc.
Spatial Network (G): a graph G = (V, E) formed by locations and their connections.
Coverage (Tr(v)): all trajectories passing a specific location.

5 Problem Definition
Location-based Max-Cover Query
Input:
  Trajectories: T = {tr1, tr2, …, trn}
  Road network: G = (V, E)
  Integer k
  Spatial region R
Output: select k vertices in the spatial region R that cover the maximum number of unique trajectories.
Objective: minimize response time & improve system frequency

6 Naïve Solution
Question: find the k-set where k = 2?
Let's count coverage:
  Tr(n1) = {A, B} -> 2 tr
  Tr(n2) = {B, C, D} -> 3 tr
  Tr(n3) = {C, D, E} -> 3 tr
How many different combos? C(3, 2)
Which combo is the best? |Tr(n2)| + |Tr(n3)| = 6 is the biggest number. Is it the right answer?
It's about unique trajectories.

7 Naïve Solution (cont.)
Question: find the k-set where k = 2?
Let's count unique trajectories:
  Tr(n1, n2) = {A, B, C, D} -> 4 tr
  Tr(n2, n3) = {B, C, D, E} -> 4 tr
  Tr(n1, n3) = {A, B, C, D, E} -> 5 tr
n1 & n3 is the winner. Why? n2 & n3 had overlapping trajectories {C, D}.
Again, it's about unique trajectories.
But does this method scale? No -> C(n, k) combinations.
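
The enumeration on this slide can be reproduced with a short brute-force sketch. It uses the coverage sets n1, n2, n3 from the example; the function name best_k_set_bruteforce is illustrative and does not come from the paper.

```python
from itertools import combinations

# Coverage sets from the slide example: location -> trajectories passing it.
coverage = {
    "n1": {"A", "B"},
    "n2": {"B", "C", "D"},
    "n3": {"C", "D", "E"},
}


def best_k_set_bruteforce(coverage, k):
    """Try every k-subset of locations and keep the one whose union of
    trajectories is largest. The number of subsets grows as C(n, k)."""
    best_combo, best_covered = None, set()
    for combo in combinations(sorted(coverage), k):
        # Union, not sum: a trajectory seen at two locations counts once.
        covered = set().union(*(coverage[v] for v in combo))
        if len(covered) > len(best_covered):
            best_combo, best_covered = combo, covered
    return best_combo, best_covered


combo, covered = best_k_set_bruteforce(coverage, 2)
print(combo, len(covered))  # ('n1', 'n3') 5
```

Taking the union rather than adding the two counts is what makes n1 & n3 beat n2 & n3 here.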

8 Challenges
Lots of computation (NP-hard)
  Massive trajectories
  Large n and k -> C(n, k) combinations
Dynamic requests
  User can pick any region
  User can discard certain answers and ask for a recalculation
  Software needs to be interactive (answer quickly)
The goal of this paper is to make it fast, and
  Optimal for small or medium regions
  Near optimal for large regions

9 System Overview
(Figure: system overview. User input: k and region selection.)

10 Pre-Processing
Map-Matching
  Map raw trajectory data to road networks
  Tr = {(lat1, lng1, t1), (lat2, lng2, t2), …, (latn, lngn, tn)} -> {v1, v2, …, vn}
  Trajectory-Vertex Index
Inverted Index Building
  Vertex-Trajectory: v1 = {trj1, trj2, …, trjn}
Vertex-Vertex Index Building
  Shared trajectories between two vertices
Spatial Indexing
  Index vertex location
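
A minimal sketch of the two index-building steps, assuming map-matching has already produced a vertex list per trajectory. The sample data and the function names build_inverted_index and build_vertex_vertex_index are hypothetical; the slides do not specify the storage layout.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical map-matched input (trajectory-vertex index):
# trajectory id -> vertices it passes through.
trajectory_vertex = {
    "tr1": ["v1", "v2"],
    "tr2": ["v1", "v2", "v3"],
    "tr3": ["v2", "v3"],
}


def build_inverted_index(trajectory_vertex):
    """Vertex-trajectory (inverted) index: vertex -> set of trajectories."""
    vertex_trajectory = defaultdict(set)
    for tr, vertices in trajectory_vertex.items():
        for v in vertices:
            vertex_trajectory[v].add(tr)
    return dict(vertex_trajectory)


def build_vertex_vertex_index(trajectory_vertex):
    """Vertex-vertex index: (u, v) with u < v -> number of shared trajectories."""
    shared = defaultdict(int)
    for vertices in trajectory_vertex.values():
        # De-duplicate so a trajectory that revisits a vertex counts once.
        for u, v in combinations(sorted(set(vertices)), 2):
            shared[(u, v)] += 1
    return dict(shared)


inverted = build_inverted_index(trajectory_vertex)
shared = build_vertex_vertex_index(trajectory_vertex)
print(sorted(inverted["v2"]))  # ['tr1', 'tr2', 'tr3']
print(shared[("v1", "v3")])    # 1
```

Only pairs that share at least one trajectory get an entry in this sketch, so it stays sparse on the toy data.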

11 Optimal Solution
Naïve approach, C(n, k) -> Group-based pruning approach
  Group the vertices
  Estimate the coverage upper bound for every (G, G) combination
  Sort (G, G) by upper bound (high to low)
  For each (G, G): find the best k-set of vertices with the most unique coverage
  Could stop early, once the result is better than all other (G, G)
Compare (k = 2):
  Naïve: C(9, 2) = 36
  Group-based pruning: 3 * 3 (G) + 3 * 3 (V) = 15 (reduced!)
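
The pruning idea can be sketched for k = 2 as follows. Several things here are assumptions, since the slide does not give them: the groups are supplied by hand, the upper bound for a pair of groups is the sum of the largest single-vertex coverage in each group, and the coverage data is borrowed from the greedy example on slide 13. The function name is illustrative.

```python
from itertools import combinations, combinations_with_replacement, product

# Vertex -> trajectories (integers stand for tr1..tr10).
coverage = {
    "v1": {1, 2, 3, 4, 5, 6},
    "v2": {1, 2, 3, 8, 10},
    "v3": {4, 5, 6, 7, 9},
    "v4": {1, 7, 8, 9},
    "v5": {1, 3, 5},
    "v6": {2, 4, 6},
}
# Assumed grouping; the slides do not say how vertices are grouped.
groups = [["v1", "v2"], ["v3", "v4"], ["v5", "v6"]]


def best_pair_with_group_pruning(coverage, groups):
    """Exact best 2-set, skipping group pairs whose bound cannot win."""
    # Assumed bound: a pair (u in Gi, v in Gj) covers at most
    # max coverage in Gi + max coverage in Gj.
    group_max = [max(len(coverage[v]) for v in g) for g in groups]
    group_pairs = sorted(
        combinations_with_replacement(range(len(groups)), 2),
        key=lambda ij: group_max[ij[0]] + group_max[ij[1]],
        reverse=True,
    )
    best_pair, best_count, examined = None, -1, 0
    for i, j in group_pairs:
        if group_max[i] + group_max[j] <= best_count:
            break  # sorted high to low: no remaining group pair can do better
        if i == j:
            pairs = combinations(groups[i], 2)
        else:
            pairs = product(groups[i], groups[j])
        for u, v in pairs:
            examined += 1
            count = len(coverage[u] | coverage[v])
            if count > best_count:
                best_pair, best_count = (u, v), count
    return best_pair, best_count, examined


best_pair, best_count, examined = best_pair_with_group_pruning(coverage, groups)
print(best_pair, best_count, examined)  # ('v2', 'v3') 10 5
```

On this toy input the search stops after examining 5 of the C(6, 2) = 15 vertex pairs, because the next group pair's bound (10) cannot beat the best coverage already found.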

12 Greedy Solution: Overview
Optimal solution
  Does not scale to large k and spatial region R
  Efficient when we need multiple rounds of interaction from users
Greedy solution
  A good approximation to the optimal solution: (1 - 1/e) approximation
  The idea is very simple
Main algorithm framework
  First step: select a set of candidate vertices in spatial region R
  A k-round selection process
    Selection -> the vertex that covers the most uncovered trajectories
    Updating -> remove the covered trajectories

13 Greedy Solution: Basic Algorithm
E.g., k = 2
Current selection: {v1} -> {v1, v4}
Current covered trajectories: {tr1, tr2, tr3, tr4, tr5, tr6} -> {tr1, tr2, tr3, tr4, tr5, tr6, tr7, tr8, tr9}

Vertex coverage state
  v1: tr1, tr2, tr3, tr4, tr5, tr6
  v2: tr1, tr2, tr3, tr8, tr10
  v3: tr4, tr5, tr6, tr7, tr9
  v4: tr1, tr7, tr8, tr9
  v5: tr1, tr3, tr5
  v6: tr2, tr4, tr6

Trajectory-vertex index
  tr1: v1, v2, v4, v5
  tr2: v1, v2, v6
  tr3: v1, v2, v5
  tr4: v1, v3, v6
  tr5: v1, v3, v5
  tr6: v1, v3, v6

Vertex coverage count (updated while scanning the trajectories covered by v1)
               v1  v2  v3  v4  v5  v6
  initial       6   5   5   4   3   3
  after tr1     6   4   5   3   2   3
  after tr2     6   3   5   3   2   2
  after tr6     6   2   2   3   0   0
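
The basic greedy algorithm can be sketched as below, using the six vertices of this slide (integers stand for tr1..tr10). The function name greedy_k_locations is illustrative; unlike the slide's table, this sketch also decrements the count of the selected vertex itself, which does not affect later selections.

```python
# Vertex coverage state from the slide: vertex -> trajectories.
vertex_trajectories = {
    "v1": {1, 2, 3, 4, 5, 6},
    "v2": {1, 2, 3, 8, 10},
    "v3": {4, 5, 6, 7, 9},
    "v4": {1, 7, 8, 9},
    "v5": {1, 3, 5},
    "v6": {2, 4, 6},
}


def greedy_k_locations(vertex_trajectories, k):
    """k rounds: pick the vertex covering the most uncovered trajectories,
    then scan its newly covered trajectories one by one to update counts."""
    # Trajectory-vertex index, derived here from the vertex coverage state.
    trajectory_vertices = {}
    for v, trs in vertex_trajectories.items():
        for tr in trs:
            trajectory_vertices.setdefault(tr, []).append(v)

    # gain[v] = number of still-uncovered trajectories passing v.
    gain = {v: len(trs) for v, trs in vertex_trajectories.items()}
    covered, selected = set(), []
    for _ in range(min(k, len(gain))):
        # Selection phase (ties go to the first vertex in insertion order).
        best = max((v for v in gain if v not in selected), key=lambda v: gain[v])
        selected.append(best)
        # Updating phase: the dominant cost in the basic algorithm.
        for tr in vertex_trajectories[best] - covered:
            for v in trajectory_vertices[tr]:
                gain[v] -= 1
            covered.add(tr)
    return selected, covered


selected, covered = greedy_k_locations(vertex_trajectories, 2)
print(selected, len(covered))  # ['v1', 'v4'] 9
```

With k = 2 this reproduces the selection {v1, v4} and covers tr1 through tr9; tr10 stays uncovered.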

14 Greedy Solution: Basic Algorithm
Performance analysis
  The dominant cost is the updating phase
  Scan the trajectory-vertex index "one-by-one"
  Cannot scale to a large trajectory dataset
Vertex coverage count (v1..v6): (6, 5, 5, 4, 3, 3) -> (6, 2, 2, 3, 0, 0)
Is it possible to update the coverage of each node by batch?

15 Greedy Solution : Partition Index Batch Updating Algorithm
Main intuition
  Minimize the trajectory scan operations
  Update the coverage values by batch
Main techniques
  Smart Update Decision
  Index Partition
  Workload-based Optimization

16 Greedy Solution : Partition Index Batch Updating Algorithm
Smart Update Decision
  Utilize the vertex-vertex index
  Two cases
    Case 1: Major coverage overlap: apply the basic updating method
    Case 2: Minor coverage overlap: subtract and add back
  In this way, we always scan fewer trajectories for updating.
(Figure: vertex coverage table and trajectory-vertex index; scan tr4 and tr5 to update.)
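
One way to read the two cases is sketched below. This is an interpretation, not the paper's exact procedure: the rule for choosing a case (scan whichever of the already-covered or newly covered trajectory sets is smaller), the dense `shared` dictionary standing in for the vertex-vertex index, and the name smart_update are all assumptions. The data is the six-vertex example from slide 13.

```python
vertex_trajectories = {
    "v1": {1, 2, 3, 4, 5, 6},
    "v2": {1, 2, 3, 8, 10},
    "v3": {4, 5, 6, 7, 9},
    "v4": {1, 7, 8, 9},
    "v5": {1, 3, 5},
    "v6": {2, 4, 6},
}
trajectory_vertices = {}
for v, trs in vertex_trajectories.items():
    for tr in trs:
        trajectory_vertices.setdefault(tr, []).append(v)
# Stand-in vertex-vertex index: number of trajectories shared by (u, v).
shared = {
    (u, v): len(vertex_trajectories[u] & vertex_trajectories[v])
    for u in vertex_trajectories
    for v in vertex_trajectories
}


def smart_update(best, gain, covered):
    """Update gain[] after selecting `best`; return trajectories scanned."""
    trs = vertex_trajectories[best]
    already, fresh = trs & covered, trs - covered
    if len(fresh) <= len(already):
        # Case 1 (major overlap): few new trajectories, so scan them directly.
        for tr in fresh:
            for v in trajectory_vertices[tr]:
                gain[v] -= 1
        scanned = len(fresh)
    else:
        # Case 2 (minor overlap): subtract all shared trajectories in one
        # batch, then add back those that had been covered earlier.
        for v in gain:
            gain[v] -= shared[(v, best)]
        for tr in already:
            for v in trajectory_vertices[tr]:
                gain[v] += 1
        scanned = len(already)
    covered |= trs
    return scanned


gain = {v: len(trs) for v, trs in vertex_trajectories.items()}
covered = set()
scanned = [smart_update(v, gain, covered) for v in ("v1", "v4")]
print(scanned, gain["v2"])  # [0, 1] 1
```

Both branches leave gain[v] equal to the number of uncovered trajectories through v. For the selections v1 and v4, this sketch scans 1 trajectory in total, where one-by-one updating would scan 6 + 3 = 9.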

17 Greedy Solution : Partition Index Batch Updating Algorithm
Index Partition

18 Greedy Solution : Partition Index Batch Updating Algorithm
Index Partition: why and how?

19 Greedy Solution : Partition Index Batch Updating Algorithm
Workload-based Optimization
Selective Indexing
  Storing the vertex-vertex index takes |V| * |V| space (the Tianjin road network has 900k vertices)
  Making p partitions takes p * |V| * |V| space (impossible to store in memory)
Workload-based Partition
  Not possible to cluster the trajectories based on similarities (|N| * |N| similarity computations)
Observation
  Many vertices are selected in a sequence
  Many of the vertices will not be selected

20 Experiments
Dataset
Road Networks (Tianjin)
  99,007 vertices and 133,726 road segments
  Covers a 123 × 187 km² spatial region with a total length of 32,487 km
Trajectories (Taxis)
  3,501 taxicabs from Tianjin in 61 days
  Contains 4,509,519 trajectories
  Average sampling rate is 24:05 seconds per point

21 Trajectory Distributions in Tianjin

22 Ubuntu Machine
Intel Core i7-3930K 3.2 GHz, 6 cores (12 threads), and 16 GB of main memory

23 Basic Updating Algorithm vs Partition Index Batch Updating (or PIBU) Algorithm
Scanned trajectories (bars) and processing time (lines) for each vertex selection iteration.
Aim: mine a 10-location set
PIBU is 5.02 times faster
  Basic algorithm: 905,623 trajectories
  PIBU: 87,330 trajectories

24 Basic Updating Algorithm vs Partition Index Batch Updating (or PIBU) Algorithm
Processing time (lines) and total scanned trajectories (bars) for the two approaches, with different k values.
PIBU is 3.9 times faster

25 Processing time (lines) and the number of scanned trajectories (bars) versus the query region sizes
PIBU is 3.8 times faster
Processing time (lines) and the number of scanned trajectories (bars) when varying the size of the trajectory dataset
PIBU is 3.2 times faster

26 Case Study: Advertisement Placement
Task: put three billboards in New York City (NYC) for promotion
Dataset: location-based social networking check-ins dataset
Divided the city into equal-sized grids
  (a) graph: multiple check-ins by the same users
  (b) graph: overlapped users in selected areas
  (c) graph: result of the paper's solution

27 Case Study: Charging Station Placement
Aim: account for domain constraints
  Space for parking
  POI categories
  Locations: two locations should be far enough apart

28 Conclusion
Most influential k-location set mining problem
  Covers optimal and approximate solutions
  Optimal solution works for small regions and k values
  Approximate solution works better on large regions and k values
2 case studies:
  Billboard placement in NYC based on location-based social network data
  EV charging station placement in Beijing

29 Further Extensions
Weighted Location
  What if the prices for selecting the locations are different?
Weighted Trajectories
  What if each person (trajectory) has a different profile, and you want to advertise different items?
Spatio-temporal Selection
  What about bars and night clubs? They only care about people who travel at night.

30 Thanks
Q & A

