Mining the Most Influential k-Location Set from Massive Trajectories

Presentation transcript:

Mining the Most Influential k-Location Set from Massive Trajectories
Y. Li, J. Bao, Y. Li, Y. Zheng, Y. Wu, Z. Gong
Presented by Mi Tian, Dhaval Deepark Dholakia, Deepan Ashok Sanghavi
9/21/2018

Motivating Scenarios
Advertisement placement: place multiple billboards within a region so that they are seen by more unique people.
Resource allocation: place charging stations for EVs to serve more unique vehicles.
Chained business location selection: collective location selection (e.g., Starbucks, KFC) to serve more people.

The Goal
Find k targets that cover the most unique trajectories.
Two spots to serve gas to the most cars?
Two spots to attract the most customers?
Two spots to best observe bird migration?

Term Definition
Trajectory (tr): a sequence of GPS locations of a moving object through time.
Location (v): a meaningful location on the map. Depending on the problem, it can be a road intersection, a grid cell, a stay point, etc.
Spatial Network (G): a graph G = (V, E) formed by locations and their connections.
Coverage (Tr(v)): all trajectories passing a specific location.
(Figure: example trajectories A, B, C, D, E.)

Problem Definition
Location-based Max-Cover Query
Input:
Trajectories: T = {tr1, tr2, …, trn}
Road network: G = (V, E)
Integer k
Spatial region R
Output: k vertices in the spatial region R that cover the maximum number of unique trajectories.
Objective: minimize response time and improve system efficiency.

Naïve Solution
Question: find the k-set where k = 2.
Count the coverage of each location:
Tr(n1) = {A, B} -> 2 trajectories
Tr(n2) = {B, C, D} -> 3 trajectories
Tr(n3) = {C, D, E} -> 3 trajectories
How many different combinations are there? C(3, 2) = 3.
Which combination is the best? Tr(n2) + Tr(n3) = 6 is the biggest sum, but is it the right answer? Not necessarily: what matters is the number of unique trajectories.

Naïve Solution (cont.)
Question: find the k-set where k = 2.
Count the unique trajectories of each combination:
Tr(n1, n2) = {A, B, C, D} -> 4 trajectories
Tr(n2, n3) = {B, C, D, E} -> 4 trajectories
Tr(n1, n3) = {A, B, C, D, E} -> 5 trajectories
n1 and n3 win. Why? n2 and n3 have overlapping trajectories {C, D}. Again, it is about unique trajectories.
But does this method scale? No: there are C(n, k) combinations to check.
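The enumeration on these two slides can be sketched in a few lines of Python. This is only an illustration of the naïve approach on the slide's three locations; the function name `naive_best_k_set` and the dictionary layout are assumptions made for this example, not part of the paper.

```python
from itertools import combinations

def naive_best_k_set(coverage, k):
    """Try every k-subset of locations; keep the one with the most unique trajectories."""
    best_set, best_covered = None, set()
    for candidate in combinations(sorted(coverage), k):
        # Union of the coverage sets, so shared trajectories count once.
        covered = set().union(*(coverage[v] for v in candidate))
        if best_set is None or len(covered) > len(best_covered):
            best_set, best_covered = candidate, covered
    return best_set, best_covered

# Coverage sets Tr(n1), Tr(n2), Tr(n3) from the slide.
coverage = {
    "n1": {"A", "B"},
    "n2": {"B", "C", "D"},
    "n3": {"C", "D", "E"},
}
best_set, best_covered = naive_best_k_set(coverage, 2)
print(best_set, len(best_covered))  # ('n1', 'n3') 5
```

The loop visits all C(n, k) subsets, which is exactly why this approach stops being practical as n and k grow.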

Challenges
Lots of computation (NP-hard): C(n, k) candidate sets, massive trajectories, large n and k.
Dynamic requests: the user can pick any region, and can discard certain answers and ask for a recalculation. The software needs to be interactive (answer quickly).
The goal of this paper is to make it fast: optimal for small or medium regions, near optimal for large regions.

System Overview
(Figure: system overview; the user input is k and the region selection.)

Pre-Processing
Map-matching: map raw trajectory data to the road network, Tr = {(lat1, lng1, t1), (lat2, lng2, t2), …, (latn, lngn, tn)} -> {v1, v2, …, vn}. This yields the trajectory-vertex index.
Inverted index building: the vertex-trajectory index, e.g., V1 = {trj1, trj2, …, trjn}.
Vertex-vertex index building: the shared trajectories between two vertices.
Spatial indexing: index the vertex locations.
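As a rough illustration of the index-building steps, the sketch below starts from already map-matched trajectories (the trajectory-vertex index) and derives the vertex-trajectory inverted index and the vertex-vertex shared counts. Map-matching and the spatial index are left out, and the function name `build_indexes` and the in-memory dictionaries are assumptions for this example rather than the paper's data structures.

```python
from collections import defaultdict
from itertools import combinations

def build_indexes(trajectory_vertices):
    """Derive the inverted index and the shared-trajectory counts."""
    # Vertex -> set of trajectories passing through it (inverted index).
    vertex_trajectories = defaultdict(set)
    for tr, vertices in trajectory_vertices.items():
        for v in vertices:
            vertex_trajectories[v].add(tr)

    # Unordered vertex pair -> number of trajectories shared by both vertices.
    shared = defaultdict(int)
    for vertices in trajectory_vertices.values():
        for a, b in combinations(sorted(set(vertices)), 2):
            shared[(a, b)] += 1
    return dict(vertex_trajectories), dict(shared)

# Hypothetical map-matched trajectories (trajectory -> visited vertices).
trajectory_vertices = {
    "tr1": ["v1", "v2", "v4", "v5"],
    "tr2": ["v1", "v2", "v6"],
    "tr3": ["v1", "v2", "v5"],
}
vertex_trajectories, shared = build_indexes(trajectory_vertices)
print(sorted(vertex_trajectories["v5"]))  # ['tr1', 'tr3']
print(shared[("v1", "v2")])               # 3
```

Storing one entry per unordered pair keeps this toy version small; a full |V| x |V| table is what makes the vertex-vertex index expensive at road-network scale.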

Optimal Solution
Naïve approach -> group-based pruning approach:
Group the vertices.
Estimate a coverage upper bound for every (G, G) combination.
Sort the (G, G) combinations by upper bound (high to low).
For each (G, G), find the best k-set of vertices with the most unique coverage.
The search can stop early once the best k-set found is better than the upper bounds of all other (G, G) combinations.
Compare (k = 2):
Naïve: C(9, 2) = 36
Group-based pruning: 3 * 3 (G) + 3 * 3 (V), reported as 15 (reduced!)
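The pruning idea can be sketched as a small branch-and-bound search. The slide does not say how vertices are grouped or how the upper bound is computed, so the sketch makes two assumptions: the grouping is given as input, and the bound for a combination of groups is the size of the union of all trajectories covered by those groups. All names (`best_k_set_with_pruning`, `groups`, `coverage`) are hypothetical.

```python
from itertools import combinations_with_replacement, product

def best_k_set_with_pruning(groups, coverage, k):
    """Exact search over k-sets, skipping group combinations whose bound is too low."""
    # Trajectories covered by any vertex of a group.
    group_cov = {g: set().union(*(coverage[v] for v in vs)) for g, vs in groups.items()}

    # Upper bound per group combination: no k-set drawn from these groups
    # can cover more than the union of the groups' coverage.
    combos = []
    for combo in combinations_with_replacement(sorted(groups), k):
        bound = len(set().union(*(group_cov[g] for g in combo)))
        combos.append((bound, combo))
    combos.sort(key=lambda bc: -bc[0])  # high to low

    best_set, best_cov = None, -1
    for bound, combo in combos:
        if best_cov >= bound:
            break  # every remaining combination has an equal or lower bound
        for picks in product(*(groups[g] for g in combo)):
            if len(set(picks)) < k:
                continue  # same vertex picked twice
            cov = len(set().union(*(coverage[v] for v in picks)))
            if cov > best_cov:
                best_set, best_cov = frozenset(picks), cov
    return best_set, best_cov

# Hypothetical coverage sets and an arbitrary grouping into three groups.
coverage = {
    "v1": {1, 2, 3, 4, 5, 6},
    "v2": {1, 2, 3, 8, 10},
    "v3": {4, 5, 6, 7, 9},
    "v4": {1, 7, 8, 9},
    "v5": {1, 3, 5},
    "v6": {2, 4, 6},
}
groups = {"g1": ["v1", "v2"], "g2": ["v3", "v4"], "g3": ["v5", "v6"]}
best_set, best_cov = best_k_set_with_pruning(groups, coverage, 2)
print(sorted(best_set), best_cov)
```

Because the combinations are visited in decreasing order of their bound, the first time the best coverage found reaches the next bound, the result is already optimal and the remaining combinations are never expanded.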

Greedy Solution: Overview
Optimal solution: does not scale to large k and large spatial regions R; efficient when we need multiple rounds of interaction from users.
Greedy solution: a good approximation to the optimal solution, with a (1 - 1/e) approximation guarantee. The idea is very simple.
Main algorithm framework:
First step: select a set of candidate vertices in the spatial region R.
Then run a k-round selection process. Selection: pick the vertex that covers the most uncovered trajectories. Updating: remove the covered trajectories.

Greedy Solution: Basic Algorithm
Example with k = 2.
Current selection: {v1} -> {v1, v4}
Current covered trajectories: {tr1, tr2, tr3, tr4, tr5, tr6} -> {tr1, tr2, tr3, tr4, tr5, tr6, tr7, tr8, tr9}
Vertex coverage state:
v1: tr1, tr2, tr3, tr4, tr5, tr6
v2: tr1, tr2, tr3, tr8, tr10
v3: tr4, tr5, tr6, tr7, tr9
v4: tr1, tr7, tr8, tr9
v5: tr1, tr3, tr5
v6: tr2, tr4, tr6
Trajectory-vertex index:
tr1: v1, v2, v4, v5
tr2: v1, v2, v6
tr3: v1, v2, v5
tr4: v1, v3, v6
tr5: v1, v3, v5
tr6: v1, v3, v6
Vertex coverage count as the trajectories of v1 are scanned:
Initial: v1 -> 6, v2 -> 5, v3 -> 5, v4 -> 4, v5 -> 3, v6 -> 3
After tr1: v1 -> 6, v2 -> 4, v3 -> 5, v4 -> 3, v5 -> 2, v6 -> 3
After tr2: v1 -> 6, v2 -> 3, v3 -> 5, v4 -> 3, v5 -> 2, v6 -> 2
After tr6: v1 -> 6, v2 -> 2, v3 -> 2, v4 -> 3, v5 -> 0, v6 -> 0
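A minimal Python sketch of the basic greedy selection, run on the vertex coverage lists from this slide. It mirrors the update described here: after each selection, the newly covered trajectories are scanned one by one through the trajectory-vertex index, and the counts of the other vertices are decremented. The names `greedy_basic` and `vertex_trs` are illustrative and not taken from the paper.

```python
def greedy_basic(vertex_trs, k):
    """Pick k vertices greedily; update counts by scanning covered trajectories."""
    # Trajectory-vertex index: trajectory -> vertices it passes.
    tr_vertices = {}
    for v, trs in vertex_trs.items():
        for tr in trs:
            tr_vertices.setdefault(tr, []).append(v)

    # Number of not-yet-covered trajectories per vertex.
    counts = {v: len(trs) for v, trs in vertex_trs.items()}
    selected, covered = [], set()
    for _ in range(k):
        candidates = [v for v in counts if v not in selected]
        if not candidates:
            break
        best = max(candidates, key=counts.get)
        if counts[best] == 0:
            break  # nothing left to gain
        selected.append(best)
        for tr in vertex_trs[best]:
            if tr in covered:
                continue
            covered.add(tr)
            # One scan of the trajectory-vertex index per newly covered trajectory.
            for v in tr_vertices[tr]:
                if v != best:
                    counts[v] -= 1
    return selected, covered

# Vertex coverage state from the slide.
vertex_trs = {
    "v1": {"tr1", "tr2", "tr3", "tr4", "tr5", "tr6"},
    "v2": {"tr1", "tr2", "tr3", "tr8", "tr10"},
    "v3": {"tr4", "tr5", "tr6", "tr7", "tr9"},
    "v4": {"tr1", "tr7", "tr8", "tr9"},
    "v5": {"tr1", "tr3", "tr5"},
    "v6": {"tr2", "tr4", "tr6"},
}
selected, covered = greedy_basic(vertex_trs, 2)
print(selected, len(covered))  # ['v1', 'v4'] 9
```

On this data the sketch selects v1 and then v4, covering 9 trajectories. The pair v2 and v3 would cover all 10, so the greedy answer is an approximation here, which is consistent with the (1 - 1/e) guarantee.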

Greedy Solution: Basic Algorithm
Performance analysis: the dominant cost is the updating phase, which scans the trajectory-vertex index one by one. This cannot scale to a large trajectory dataset.
(Figure: vertex coverage counts before and after the update, with the trajectory-vertex index.)
Is it possible to update the coverage of each node in batches?

Greedy Solution: Partition Index Batch Updating Algorithm
Main intuition: minimize the trajectory scan operations and update the coverage values in batches.
Main techniques: smart update decision, index partition, workload-based optimization.

Greedy Solution: Partition Index Batch Updating Algorithm
Smart Update Decision
Utilize the vertex-vertex index. There are two cases:
Case 1, major coverage overlap: apply the basic updating method.
Case 2, minor coverage overlap: subtract and add back.
In this way, we always scan fewer trajectories for updating.
(Figure: vertex coverage table and trajectory-vertex index; only tr4 and tr5 are scanned to update.)
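One possible reading of the two cases is sketched below in Python. The slide does not give the exact decision rule, so the rule used here is an assumption: compare how many of the selected vertex's trajectories are new with how many are already covered, and scan the smaller side. In the "subtract and add back" case, the shared counts from the vertex-vertex index are subtracted in one batch, and only the already covered trajectories are scanned to add their share back. All function and variable names are hypothetical.

```python
from collections import defaultdict

def build_indexes(vertex_trs):
    """Trajectory-vertex index and vertex-vertex shared counts."""
    tr_vertices = defaultdict(set)
    for v, trs in vertex_trs.items():
        for tr in trs:
            tr_vertices[tr].add(v)
    shared = defaultdict(dict)
    for v, trs in vertex_trs.items():
        for u, other in vertex_trs.items():
            if u != v and trs & other:
                shared[v][u] = len(trs & other)
    return tr_vertices, shared

def smart_update(best, vertex_trs, tr_vertices, shared, counts, covered):
    """Update uncovered counts after selecting best; return trajectories scanned."""
    old = vertex_trs[best] & covered   # already covered before this round
    new = vertex_trs[best] - covered   # newly covered by best
    if len(new) <= len(old):
        # Case 1 (major overlap): basic update, scan the new trajectories.
        for tr in new:
            for u in tr_vertices[tr]:
                counts[u] -= 1
        scanned = len(new)
    else:
        # Case 2 (minor overlap): subtract all shared counts in one batch...
        for u, n in shared[best].items():
            counts[u] -= n
        # ...then add back the share that was already covered earlier.
        for tr in old:
            for u in tr_vertices[tr]:
                if u != best:
                    counts[u] += 1
        counts[best] = 0
        scanned = len(old)
    covered |= new
    return scanned

def greedy_smart(vertex_trs, k):
    tr_vertices, shared = build_indexes(vertex_trs)
    counts = {v: len(trs) for v, trs in vertex_trs.items()}
    selected, covered, scanned = [], set(), 0
    for _ in range(k):
        if not counts:
            break
        best = max(counts, key=counts.get)
        if counts[best] == 0:
            break
        selected.append(best)
        scanned += smart_update(best, vertex_trs, tr_vertices, shared, counts, covered)
    return selected, covered, counts, scanned

# Hypothetical coverage sets for six vertices.
vertex_trs = {
    "v1": {"tr1", "tr2", "tr3", "tr4", "tr5", "tr6"},
    "v2": {"tr1", "tr2", "tr3", "tr8", "tr10"},
    "v3": {"tr4", "tr5", "tr6", "tr7", "tr9"},
    "v4": {"tr1", "tr7", "tr8", "tr9"},
    "v5": {"tr1", "tr3", "tr5"},
    "v6": {"tr2", "tr4", "tr6"},
}
selected, covered, counts, scanned = greedy_smart(vertex_trs, 2)
print(selected, len(covered), scanned)
```

Both branches leave every count equal to the number of still-uncovered trajectories of that vertex; they differ only in which trajectories have to be scanned to get there.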

Greedy Solution: Partition Index Batch Updating Algorithm
Index Partition

Greedy Solution: Partition Index Batch Updating Algorithm
Index Partition: why and how?

Greedy Solution: Partition Index Batch Updating Algorithm
Workload-based Optimization
Selective indexing: storing the vertex-vertex index takes |V| * |V| space (the Tianjin road network has 900k vertices). With p partitions this grows to p * |V| * |V|, which is impossible to store in memory.
Workload-based partition: clustering the trajectories by similarity is not feasible (|N| * |N| similarity computations).
Observation: many vertices are selected in a sequence, and many of the vertices will never be selected.

Experiments
Dataset
Road network (Tianjin): 99,007 vertices and 133,726 road segments, covering a 123 × 187 km² spatial region with a total length of 32,487 km.
Trajectories (taxis): 3,501 taxicabs from Tianjin over 61 days, containing 4,509,519 trajectories. The average sampling rate is 24:05 seconds per point.

Trajectory Distributions in Tianjin

Ubuntu 12.04 machine: Intel Core i7-3930K, 6 cores (12 threads), 3.2 GHz, with 16 GB of main memory.

Basic Updating Algorithm vs Partition Index Batch Updating (PIBU) Algorithm
(Figure: scanned trajectories (bars) and processing time (lines) for each vertex selection iteration.)
Aim: mine a 10-location set.
PIBU is 5.02 times faster.
Basic algorithm: 905,623 trajectories scanned.
PIBU: 87,330 trajectories scanned.

Basic Updating Algorithm vs Partition Index Batch Updating (PIBU) Algorithm
(Figure: processing time (lines) and total scanned trajectories (bars) for the two approaches, with different k values.)
PIBU is 3.9 times faster.

(Figure: processing time (lines) and number of scanned trajectories (bars) versus the query region size.) PIBU is 3.8 times faster.
(Figure: processing time (lines) and number of scanned trajectories (bars) when varying the size of the trajectory dataset.) PIBU is 3.2 times faster.

Case Study: Advertisement Placement
Task: put three billboards in New York City (NYC) for promotion.
Dataset: a location-based social networking check-in dataset. The city is divided into equal-sized grids.
Graph (a): multiple check-ins by the same users.
Graph (b): overlapping users in the selected areas.
Graph (c): result of the paper's solution.

Case Study: Charging Station Placement
Aim: place charging stations subject to domain constraints:
Space for parking.
POI categories of the locations.
Two locations should be far enough apart.

Conclusion
The most influential k-location set mining problem.
Covers optimal and approximate solutions: the optimal solution works for small regions and k values; the approximate solution works better on large regions and k values.
Two case studies: billboard placement in NYC based on location-based social network data, and EV charging station placement in Beijing.

Further Extensions
Weighted locations: what if the prices for selecting the locations are different?
Weighted trajectories: what if each person (trajectory) has a different profile, and you want to advertise different items?
Spatio-temporal selection: what about bars and night clubs? They only care about people who travel at night.

Thanks
Q & A