Mining the Most Influential k-Location Set from Massive Trajectories

Y. Li, J. Bao, Y. Li, Y. Zheng, Y. Wu, Z. Gong
Presented by Mi Tian, Dhaval Deepark Dholakia, Deepan Ashok Sanghavi
9/21/2018

Motivating Scenarios
Advertisement Placement
  Place multiple billboards within a region
  To be seen by more unique people
Resource Allocation
  Place the charging stations for EVs
  To serve more unique vehicles
Chained Business Location Selection
  Collective location selection, e.g., Starbucks, KFC
  To serve more people

The Goal
Find k targets to cover the most unique trajectories
  Two spots to serve gas to the most cars?
  Two spots to attract the most customers?
  Two spots to best observe bird migration?

Term Definition
Trajectory (tr): A sequence of GPS locations of a moving object through time.
Location (v): A meaningful location on the map. Depending on the problem, it can be a road intersection, a grid cell, a stay point, etc.
Spatial Network (G): A graph G = (V, E) formed by locations and their connections.
Coverage (Tr(v)): All trajectories passing through a specific location.

Problem Definition
Location-based Max-Cover Query
Input:
  Trajectories: T = {tr_1, tr_2, …, tr_n}
  Road network: G = (V, E)
  Integer k
  Spatial region R
Output:
  Select k vertices in the spatial region R that cover the maximum number of unique trajectories.
Objective:
  Minimize response time & improve system frequency
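
Read as an optimization problem, the query above can be written in one line. This is only a restatement of the slide, assuming V_R (a symbol not on the slide) denotes the vertices of V that fall inside region R, and Tr(v) is the coverage defined earlier:

```latex
S^{*} \;=\; \operatorname*{arg\,max}_{S \subseteq V_R,\; |S| = k}
\left| \bigcup_{v \in S} Tr(v) \right|
```

The union inside the absolute value is what makes the count "unique": a trajectory passing several selected vertices is counted once.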

Naïve Solution
Question: find the k-set where k = 2?
Let's count coverage:
  Tr(n1) = {A, B} -> 2 tr
  Tr(n2) = {B, C, D} -> 3 tr
  Tr(n3) = {C, D, E} -> 3 tr
How many different combos? C(3, 2)
Which combo is the best?
  Tr(n2) + Tr(n3) = 6 is the biggest number, is it the right answer?
  It's about unique trajectories

Naïve Solution (cont.)
Question: find the k-set where k = 2?
Let's count unique trajectories:
  Tr(n1, n2) = {A, B, C, D} -> 4 tr
  Tr(n2, n3) = {B, C, D, E} -> 4 tr
  Tr(n1, n3) = {A, B, C, D, E} -> 5 tr
n1 & n3 is the winner, why?
  n2 & n3 had overlapping trajectories {C, D}
  Again, it's about unique trajectories
But does this method scale?
  NO -> C(n, k) combinations
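
The enumeration on this slide can be sketched in a few lines of Python. This is a minimal illustration of the naïve approach using the slide's three locations; the function name `brute_force_max_cover` is hypothetical, and the loop visits all C(n, k) combinations, which is exactly why it does not scale.

```python
from itertools import combinations

# Coverage sets from the slide example: Tr(n1), Tr(n2), Tr(n3).
coverage = {
    "n1": {"A", "B"},
    "n2": {"B", "C", "D"},
    "n3": {"C", "D", "E"},
}


def brute_force_max_cover(coverage, k):
    """Try every k-subset of locations; keep the one covering the most unique trajectories."""
    best_set, best_covered = None, set()
    for combo in combinations(sorted(coverage), k):
        # Union, not sum: a trajectory seen at two locations counts once.
        covered = set().union(*(coverage[v] for v in combo))
        if len(covered) > len(best_covered):
            best_set, best_covered = combo, covered
    return best_set, best_covered


best_set, best_covered = brute_force_max_cover(coverage, 2)
print(best_set, len(best_covered))  # ('n1', 'n3') 5
```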

Challenges
Lots of computation (NP-hard)
  C(n, k) combinations
  Massive trajectories
  Large n and k
Dynamic requests
  Users can pick any region
  Users can discard certain answers and ask for a recalculation
  Software needs to be interactive (answer quickly)
The goal of this paper is to make it fast, and
  Optimal for small or medium regions
  Near-optimal for large regions

System Overview
[Figure: system overview; user inputs: k, region selection]

Pre-Processing
Map-Matching
  Map raw trajectory data to road networks
  Tr = {(lat1, lng1, t1), (lat2, lng2, t2), …, (latn, lngn, tn)} -> {v1, v2, …, vn}
  Trajectory-Vertex Index
Inverted Index Building
  Vertex-Trajectory Index: V1 = {trj1, trj2, …, trjn}
Vertex-Vertex Index Building
  Shared trajectories between two vertices
Spatial Indexing
  Index vertex locations
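
The three in-memory indexes listed on this slide can be sketched as plain dictionaries. This is an illustrative sketch only: it assumes map-matching has already produced a vertex list per trajectory, the function name `build_indexes` and the toy data are made up, and the spatial index over vertex locations is left out.

```python
from collections import defaultdict
from itertools import combinations


def build_indexes(matched):
    """matched: dict of trajectory id -> list of vertex ids (map-matching output)."""
    # Trajectory-vertex index: which vertices each trajectory passes.
    traj_to_vertices = {tr: set(vs) for tr, vs in matched.items()}

    # Inverted (vertex-trajectory) index: which trajectories pass each vertex.
    vertex_to_trajs = defaultdict(set)
    for tr, vs in traj_to_vertices.items():
        for v in vs:
            vertex_to_trajs[v].add(tr)

    # Vertex-vertex index: number of trajectories shared by each vertex pair.
    shared = defaultdict(int)
    for vs in traj_to_vertices.values():
        for a, b in combinations(sorted(vs), 2):
            shared[(a, b)] += 1

    return traj_to_vertices, dict(vertex_to_trajs), dict(shared)


# Toy map-matched trajectories (hypothetical data).
matched = {
    "tr1": ["v1", "v2"],
    "tr2": ["v2", "v3"],
    "tr3": ["v1", "v2", "v3"],
}
traj_to_vertices, vertex_to_trajs, shared = build_indexes(matched)
print(vertex_to_trajs["v2"])   # all three trajectories pass v2
print(shared[("v1", "v2")])    # 2 shared trajectories: tr1 and tr3
```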

Optimal Solution
Naïve Approach -> Group-based Pruning Approach
  Group the vertices
  Estimate a coverage upper bound for every (G, G) combination
  Sort the (G, G) combinations by upper bound (high to low)
  For each (G, G):
    Find the best k-set of vertices with the most unique coverage
    Could stop early, once the best set found is better than all other (G, G)
Compare (k = 2):
  Naïve: C(9, 2) = 36
  Group-based Pruning: 3 * 3 (G) + 3 * 3 (V) = 15 (reduced!)
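
A rough sketch of the pruning idea follows. The slide does not say how the upper bound is computed, so this sketch assumes one simple valid choice: the number of unique trajectories covered by all vertices of the chosen groups together. The function name `pruned_max_cover`, the grouping, and the toy data are all hypothetical, and the groups are assumed to partition the vertices.

```python
from itertools import combinations_with_replacement, product


def pruned_max_cover(groups, coverage, k):
    """groups: group id -> list of vertices; coverage: vertex -> set of trajectories."""
    group_cov = {g: set().union(*(coverage[v] for v in vs)) for g, vs in groups.items()}

    # Assumed upper bound: no k vertices drawn from these groups can cover
    # more than the union of everything the groups cover.
    combos = []
    for gs in combinations_with_replacement(sorted(groups), k):
        bound = len(set().union(*(group_cov[g] for g in gs)))
        combos.append((bound, gs))
    combos.sort(key=lambda c: -c[0])  # high to low

    best_set, best_count, examined = None, -1, 0
    for bound, gs in combos:
        if bound <= best_count:
            break  # stop early: no remaining combination can beat the current best
        examined += 1
        for vs in product(*(groups[g] for g in gs)):
            if len(set(vs)) < k:
                continue  # the same vertex was picked twice
            count = len(set().union(*(coverage[v] for v in vs)))
            if count > best_count:
                best_set, best_count = tuple(sorted(vs)), count
    return best_set, best_count, examined


groups = {"g1": ["a", "b"], "g2": ["c", "d"], "g3": ["e", "f"]}
coverage = {
    "a": {1, 2, 3}, "b": {2, 3},
    "c": {4, 5}, "d": {5},
    "e": {6}, "f": {1},
}
best_set, best_count, examined = pruned_max_cover(groups, coverage, 2)
print(best_set, best_count, examined)  # ('a', 'c') 5 1
```

In this toy run only one of the six group combinations is expanded into vertex pairs, because its best pair already reaches the highest remaining bound.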

Greedy Solution: Overview
Optimal solution
  Does not scale to large k and spatial region R
  Efficient when we need multiple rounds of interaction from users
Greedy solution
  A good approximation to the optimal solution: (1 - 1/e) approximation
  Idea is very simple
Main algorithm framework
  First step: select a set of candidate vertices in spatial region R
  A k-round selection process
    Selection -> the vertex that covers the most uncovered trajectories
    Updating -> remove the covered trajectories

Greedy Solution: Basic Algorithm
E.g., k = 2
Current selection: {v1} -> {v1, v4}
Current covered trajectories: {tr1, tr2, tr3, tr4, tr5, tr6} -> {tr1, tr2, tr3, tr4, tr5, tr6, tr7, tr8, tr9}
Vertex coverage state:
  v1: tr1, tr2, tr3, tr4, tr5, tr6
  v2: tr1, tr2, tr3, tr8, tr10
  v3: tr4, tr5, tr6, tr7, tr9
  v4: tr1, tr7, tr8, tr9
  v5: tr1, tr3, tr5
  v6: tr2, tr4, tr6
Trajectory-vertex index:
  tr1: v1, v2, v4, v5
  tr2: v1, v2, v6
  tr3: v1, v2, v5
  tr4: v1, v3, v6
  tr5: v1, v3, v5
  tr6: v1, v3, v6
Vertex coverage count:
  Initial:            v1 -> 6, v2 -> 5, v3 -> 5, v4 -> 4, v5 -> 3, v6 -> 3
  After scanning tr1: v1 -> 6, v2 -> 4, v3 -> 5, v4 -> 3, v5 -> 2, v6 -> 3
  After scanning tr2: v1 -> 6, v2 -> 3, v3 -> 5, v4 -> 3, v5 -> 2, v6 -> 2
  After scanning tr6: v1 -> 6, v2 -> 2, v3 -> 2, v4 -> 3, v5 -> 0, v6 -> 0
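
The two-round example on this slide can be reproduced with a short sketch of the basic greedy loop. The function name `greedy_max_cover` is hypothetical and ties are broken by vertex name, which the slide does not specify. Unlike the slide's table, this sketch also decrements the count of the selected vertex itself, so v1 ends at 0 rather than staying at 6.

```python
# Vertex -> trajectories passing it, copied from the slide example.
vertex_to_trajs = {
    "v1": {"tr1", "tr2", "tr3", "tr4", "tr5", "tr6"},
    "v2": {"tr1", "tr2", "tr3", "tr8", "tr10"},
    "v3": {"tr4", "tr5", "tr6", "tr7", "tr9"},
    "v4": {"tr1", "tr7", "tr8", "tr9"},
    "v5": {"tr1", "tr3", "tr5"},
    "v6": {"tr2", "tr4", "tr6"},
}


def greedy_max_cover(vertex_to_trajs, k):
    """k rounds of: select the vertex with most uncovered trajectories, then update counts."""
    # Trajectory-vertex index, used by the updating phase.
    traj_to_vertices = {}
    for v, trs in vertex_to_trajs.items():
        for tr in trs:
            traj_to_vertices.setdefault(tr, set()).add(v)

    count = {v: len(trs) for v, trs in vertex_to_trajs.items()}
    covered, selected = set(), []
    for _ in range(min(k, len(count))):
        # Selection: vertex covering the most uncovered trajectories.
        best = max((v for v in sorted(count) if v not in selected), key=lambda v: count[v])
        selected.append(best)
        # Updating: scan each newly covered trajectory one by one.
        for tr in vertex_to_trajs[best] - covered:
            for v in traj_to_vertices[tr]:
                count[v] -= 1
        covered |= vertex_to_trajs[best]
    return selected, covered, count


selected, covered, count = greedy_max_cover(vertex_to_trajs, 2)
print(selected, len(covered))  # ['v1', 'v4'] 9
```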

Greedy Solution: Basic Algorithm
Performance analysis
  The dominant cost is the updating phase
  It scans the trajectory-vertex index "one-by-one"
  It cannot scale to large trajectory datasets
Vertex coverage count, before and after scanning the trajectory-vertex index:
  Before: v1 -> 6, v2 -> 5, v3 -> 5, v4 -> 4, v5 -> 3, v6 -> 3
  After:  v1 -> 6, v2 -> 2, v3 -> 2, v4 -> 3, v5 -> 0, v6 -> 0
Is it possible to update the coverage of each node by batch?

Greedy Solution: Partition Index Batch Updating Algorithm
Main intuition
  Minimize the trajectory scan operations
  Update the coverage values by batch
Main techniques
  Smart Update Decision
  Index Partition
  Workload-based Optimization

Smart Update Decision
Utilize the vertex-vertex index
Two cases
  Case 1: Major Coverage Overlap – Apply the basic updating method
  Case 2: Minor Coverage Overlap – Subtract and add back
In this way, we always scan fewer trajectories for updating.
[Figure: vertex coverage table and trajectory-vertex index; scan tr4 and tr5 to update]
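
One way to read the two cases is sketched below. It is an interpretation, not the paper's exact procedure: "overlap" is taken to mean the overlap between the newly selected vertex's trajectories and those already covered, "subtract" uses precomputed shared counts from the vertex-vertex index, and "add back" rescans only the already-covered trajectories that were subtracted twice. The function name `select_with_smart_update` and the `scanned` counter are hypothetical.

```python
vertex_to_trajs = {
    "v1": {"tr1", "tr2", "tr3", "tr4", "tr5", "tr6"},
    "v2": {"tr1", "tr2", "tr3", "tr8", "tr10"},
    "v3": {"tr4", "tr5", "tr6", "tr7", "tr9"},
    "v4": {"tr1", "tr7", "tr8", "tr9"},
    "v5": {"tr1", "tr3", "tr5"},
    "v6": {"tr2", "tr4", "tr6"},
}


def select_with_smart_update(vertex_to_trajs, k):
    vertices = sorted(vertex_to_trajs)
    traj_to_vertices = {}
    for v in vertices:
        for tr in vertex_to_trajs[v]:
            traj_to_vertices.setdefault(tr, set()).add(v)
    # Vertex-vertex index: shared[s][v] = |Tr(s) & Tr(v)|, built once up front.
    shared = {s: {v: len(vertex_to_trajs[s] & vertex_to_trajs[v]) for v in vertices}
              for s in vertices}

    count = {v: len(vertex_to_trajs[v]) for v in vertices}
    covered, selected, scanned = set(), [], 0
    for _ in range(min(k, len(vertices))):
        best = max((v for v in vertices if v not in selected), key=count.get)
        selected.append(best)
        new = vertex_to_trajs[best] - covered   # newly covered trajectories
        old = vertex_to_trajs[best] & covered   # covered in earlier rounds
        if len(new) <= len(old):
            # Case 1 (major overlap): few new trajectories, scan them directly.
            for tr in new:
                scanned += 1
                for v in traj_to_vertices[tr]:
                    count[v] -= 1
        else:
            # Case 2 (minor overlap): subtract shared counts in one batch,
            # then add back the trajectories that were already removed earlier.
            for v in vertices:
                count[v] -= shared[best][v]
            for tr in old:
                scanned += 1
                for v in traj_to_vertices[tr]:
                    count[v] += 1
        covered |= new
    return selected, count, scanned


selected, count, scanned = select_with_smart_update(vertex_to_trajs, 2)
print(selected, scanned)  # ['v1', 'v4'] 1
```

On the slide's data this scans a single trajectory (tr1, in round two), where the basic update would scan nine, while leaving every vertex with the same uncovered count.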

Index Partition

Index Partition
Why and How?

Workload-based Optimization
Selective Indexing
  Storing the vertex-vertex index takes |V| * |V| space (the Tianjin road network has 900k vertices)
  Making p partitions -> p * |V| * |V| (impossible to store in memory)
Workload-based Partition
  Not possible to cluster the trajectories based on similarity (|N| * |N| similarity computations)
Observation
  Many vertices are selected in a sequence
  Many of the vertices will not be selected

Experiments: Dataset
Road Networks (Tianjin)
  99,007 vertices and 133,726 road segments
  Covers a 123 × 187 km² spatial region with a total length of 32,487 km
Trajectories (Taxis)
  3,501 taxicabs from Tianjin in 61 days
  Contains 4,509,519 trajectories
  Average sampling rate is 24.05 seconds per point

Trajectory Distributions in Tianjin

Ubuntu machine: Intel Core i7-3930K, 6 cores (12 threads), 3.2 GHz, and 16 GB of main memory

Basic Updating Algorithm vs Partition Index Batch Updating (or PIBU) Algorithm
[Figure: scanned trajectories (bars) and processing time (lines) for each vertex selection iteration]
Aim: mine a 10-location set
PIBU is 5.02 times faster
  Basic algorithm: 905,623 trajectories
  PIBU: 87,330 trajectories

Basic Updating Algorithm vs Partition Index Batch Updating (or PIBU) Algorithm
[Figure: processing time (lines) and total scanned trajectories (bars) for the two approaches, with different k values]
PIBU is 3.9 times faster

Processing time (lines) and the number of scanned trajectories (bars) versus the query region sizes
  PIBU is 3.8 times faster
Processing time (lines) and the number of scanned trajectories (bars) when varying the size of the trajectory dataset
  PIBU is 3.2 times faster

Case Study: Advertisement Placement
Task: Put three billboards in New York City (NYC) for promotion
Dataset: Location-based social networking check-in dataset
Divided the city into equal-sized grids
  (a) graph: multiple check-ins by the same users
  (b) graph: overlapped users in selected areas
  (c) graph: result of the paper's solution

Case Study: Charging Station Placement
Aim: domain constraints
  Space for parking
  POI categories, locations
  Two locations should be far enough apart

Conclusion
Most influential k-location set mining problem
  Covers optimal and approximate solutions
  Optimal solution works for small regions and k values
  Approximate solution works better on large regions and k values
2 case studies:
  Billboard placement in NYC based on location-based social network data
  EV charging station placement in Beijing

Further Extensions
Weighted Location
  What if the prices for selecting the locations are different?
Weighted Trajectories
  What if each person (trajectory) has a different profile, and you want to advertise different items?
Spatio-temporal Selection
  What about bars and night clubs? They only care about people traveling at night.

Thanks
Q & A
