DS595/CS525 Team#2 - Mi Tian, Deepan Sanghavi, Dhaval Dholakia


A Traffic Flow Approach to Early Detection of Gathering Events X. Zhou, A. V. Khezerlou, A. Liu, Z. Shafiq, F. Zhang

Motivation
Why is detecting gathering events important?

Challenge
Why is detecting gathering events difficult?
There are many candidate gathering footprints
Result quality must be balanced against computation time

Outline
Problem Formulation
Computational Solution
Case Study and Experimental Evaluations

Problem Formulation

Traffic Flow
Spatial field “s”
Directed edge e = (si, sj), where si and sj are adjacent grids
Observed traffic flow (Ce)
Baseline traffic flow (Be)
What is an abnormal flow? (EBP model)

LLR
The EBP test maximizes the likelihood ratio between H0 and H1
LLR
Significant Flow
Lemma 1
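
The formulas on this slide did not survive in the transcript. As a rough illustration only, the sketch below uses the standard expectation-based Poisson log-likelihood ratio, which may differ from the exact form shown on the slide. The function name `ebp_llr` and the sample counts are hypothetical.

```python
import math

def ebp_llr(observed: float, baseline: float) -> float:
    """Log-likelihood ratio of H1 (flow elevated by a factor q > 1)
    against H0 (flow follows the baseline), assuming Poisson counts."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if observed <= baseline:
        return 0.0  # no excess over the baseline, so no evidence for H1
    # Maximizing the ratio over q gives q = observed / baseline.
    return observed * math.log(observed / baseline) + baseline - observed

# Hypothetical edge counts: one flow well above its baseline, one below it.
print(ebp_llr(30, 10))
print(ebp_llr(8, 10))
```

Under this assumption, a flow would count as significant when its score exceeds the cutoff implied by the chosen p-value threshold.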

Definitions
Shortest path constraint
Most likely destination?

G-Score

G-Graph
k-dominant G-Graph set

Problem Definition

Computational Solution

Brute-Force Algorithm
Significant flows are identified
G-graphs are constructed at each grid
Significant flows are fetched for each root
The most likely path is found
The G-score is calculated
G-graphs are sorted and scanned
The top k G-graphs are reported
Disadvantages:
A G-graph is supposed to be created only when a significant flow exists
Costly exhaustive search
No ability to prune candidate G-graphs

SmartEdge Algorithm
To address the computational bottlenecks mentioned above, we present a new algorithm, SmartEdge, with three design decisions for better computational efficiency:
Candidate Root Filter
Building G-graphs with dynamic programming
G-graph pruning: G-score upper bound and G-graph pruning strategy

Candidate Root Filter
Filters out locations with no significant flow
Candidate Root Index (CRI) data structure (hash table)
The numbers of significant flows are stored in bins, as required to calculate upper bound values
The total number of significant flows in each bin and near the root is calculated
When a significant flow (Esig) is identified, the roots of that flow are found and their counters in the CRI are updated
Roots with no values are filtered out
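
The Candidate Root Index described above can be sketched as a hash table of counters. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the "roots of a flow" are the grids within a Manhattan distance threshold that the flow moves toward, and that bins are keyed by that distance. The names `build_candidate_root_index` and `max_dist` are hypothetical.

```python
from collections import defaultdict

def build_candidate_root_index(significant_edges, max_dist, rows, cols):
    """Return {root: {distance_bin: count}} for roots with at least one
    nearby significant flow. Grids absent from the result are filtered out."""
    cri = defaultdict(lambda: defaultdict(int))
    for src, dst in significant_edges:
        for r in range(max(0, dst[0] - max_dist), min(rows, dst[0] + max_dist + 1)):
            for c in range(max(0, dst[1] - max_dist), min(cols, dst[1] + max_dist + 1)):
                d = abs(r - dst[0]) + abs(c - dst[1])
                if d > max_dist:
                    continue
                # Assumption: the flow must bring traffic closer to the root.
                if abs(r - src[0]) + abs(c - src[1]) <= d:
                    continue
                cri[(r, c)][d] += 1  # update the counter for this root and bin
    return {root: dict(bins) for root, bins in cri.items()}

# One hypothetical significant flow from grid (0, 0) to grid (0, 1).
index = build_candidate_root_index([((0, 0), (0, 1))], max_dist=1, rows=3, cols=3)
print(sorted(index))
```

Only the grids that appear as keys would be passed on to G-graph construction.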

Generate G-Graph
For a given root and a list of nearby significant flows (fetched from the Candidate Root Index), the algorithm picks each significant flow and traverses all the grids in the rectangular area bounded by the root and this significant flow in a breadth-first manner.
The most likely path to the root from every grid is calculated until the significant flow is reached.
After the most likely path is found, all the flows along this path are added to the G-Graph.
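
The traversal above can be sketched as a small dynamic program over the rectangle. The sketch assumes that "most likely" means the highest total LLR along a shortest (Manhattan) path to the root, which the transcript does not state explicitly. The names `most_likely_paths`, `path_to_root`, and the `llr` dictionary of edge scores are hypothetical.

```python
def most_likely_paths(root, start, llr):
    """For every grid in the rectangle bounded by root and start, store
    (best score, next grid) for the best shortest path toward the root."""
    r0, r1 = sorted((root[0], start[0]))
    c0, c1 = sorted((root[1], start[1]))

    def dist(cell):
        return abs(cell[0] - root[0]) + abs(cell[1] - root[1])

    # Sorting by distance visits grids breadth-first, outward from the root.
    cells = sorted(((r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)), key=dist)
    best = {root: (0.0, None)}
    for cell in cells[1:]:
        options = []
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if nxt in best and dist(nxt) == dist(cell) - 1:
                options.append((best[nxt][0] + llr.get((cell, nxt), 0.0), nxt))
        best[cell] = max(options)  # reuse sub-results instead of listing all paths
    return best

def path_to_root(best, start):
    """Follow the stored next-grid pointers and return the edges on the path."""
    path, cell = [], start
    while best[cell][1] is not None:
        path.append((cell, best[cell][1]))
        cell = best[cell][1]
    return path

# Hypothetical edge scores around a root at (0, 0).
llr = {((1, 1), (0, 1)): 5.0, ((0, 1), (0, 0)): 1.0,
       ((1, 1), (1, 0)): 2.0, ((1, 0), (0, 0)): 3.0}
best = most_likely_paths((0, 0), (1, 1), llr)
print(path_to_root(best, (1, 1)))
```

Each grid is scored once from its already-scored neighbors, which is what avoids enumerating every possible path.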

G-Score Upper Bound
Ne(r): upper bound on the number of insignificant flows
LLR(eins): maximum LLR score of insignificant flows
Calculating LLR(eins)
Calculating Ne(r)

G-Graph Pruning
A higher G-score means a higher number of significant flows
Candidates are sorted in descending order by number of significant flows
The G-score upper bound is calculated and compared with the lowest value in the priority queue
If the upper bound is not higher than that value, the G-graph is pruned; otherwise the actual G-score is calculated and compared
A particular node is not added until we can verify that it is not dominated by any other node
This is done by recursively calling all the roots
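
The pruning rule above follows the usual top-k pattern with a min-heap as the priority queue. The sketch below is a simplified illustration: it leaves out the dominance check, and the callables `upper_bound` and `g_score` are hypothetical stand-ins for the computations named on the slides.

```python
import heapq

def top_k_with_pruning(candidates, k, upper_bound, g_score):
    """candidates: roots already sorted by descending number of significant
    flows. Returns the top-k (score, root) pairs and the number of exact
    G-score evaluations that were actually needed."""
    heap = []  # min-heap, so heap[0] is the lowest score currently kept
    evaluated = 0
    for root in candidates:
        if len(heap) == k and upper_bound(root) <= heap[0][0]:
            continue  # cannot enter the top k, so skip the exact G-score
        score = g_score(root)
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, root))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, root))
    return sorted(heap, reverse=True), evaluated

# Hypothetical scores and bounds for four candidate roots.
scores = {"a": 10, "b": 8, "c": 3, "d": 9}
bounds = {"a": 12, "b": 9, "c": 4, "d": 11}
result, evaluated = top_k_with_pruning(["a", "b", "c", "d"], 2, bounds.get, scores.get)
print(result, evaluated)
```

The saving comes from candidates such as "c" here, whose bound already falls below the lowest score in the queue.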

Evaluations

Case Study
Dataset:
10,000 taxis in Shenzhen
31 days in August 2013
128x64 grids
Data Processing:
Traffic flow: taxi GPS data passing between two neighboring grids
Baseline flow: average monthly flow crossing the same boundary
Separate baselines for weekends and weekdays
Run Algorithm:
Find the top 5 gathering events for every 10-minute interval
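
The data processing step can be sketched as counting grid-to-grid transitions per interval and then averaging them over days of the same type. This is an illustration under assumptions: GPS points are taken to be already mapped to grids, neighbors are the four edge-adjacent grids, and a crossing is attributed to the interval in which the taxi arrives. The names `count_flows` and `average_baseline` are hypothetical.

```python
from collections import Counter

def count_flows(trajectories, interval_seconds=600):
    """trajectories: one list per taxi of (timestamp_seconds, (row, col)),
    sorted by time. Returns counts keyed by (interval, from_grid, to_grid)."""
    flows = Counter()
    for points in trajectories:
        for (_, a), (t1, b) in zip(points, points[1:]):
            if abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1:  # neighboring grids only
                flows[(t1 // interval_seconds, a, b)] += 1
    return flows

def average_baseline(daily_flows):
    """Average the counts over several days of the same type, for example
    all weekdays or all weekend days, kept as separate baselines."""
    total = Counter()
    for day in daily_flows:
        total.update(day)
    return {key: count / len(daily_flows) for key, count in total.items()}

# Two hypothetical taxi traces within the first twenty minutes of a day.
traces = [[(0, (0, 0)), (100, (0, 1)), (700, (1, 1))],
          [(50, (0, 0)), (90, (0, 1))]]
print(count_flows(traces))
```

Calling `average_baseline` once on the weekday counters and once on the weekend counters would give the two separate baselines.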

Result
Pop music concert @ 8PM, Friday, August 16th
Two waves
Direction information
G-scores shown in the figure panels: G = 329, G = 459, G = 493, G = 632, G = 634, G = 506, G = 717

Performance
Experiment:
Same setup as the case study
Run for the entire month of August 2013
Average CPU time is reported
Conditions:
Brute Force
CRF: candidate root filtering
CRF+DP: adds G-graph building with dynamic programming
CRF+DP+GPR: adds G-graph pruning

Number of Grids
From 16x32 to 64x128
SmartEdge: sublinear increase, because it filters out impossible roots and G-graphs
Brute-Force: linear increase
Best condition: 50% time reduction

Distance Threshold
From 500 m (1) to 4.5 km (9)
Brute-Force and CRF: exponential growth, because they list all possible paths when generating a G-graph
DP G-graph building reduces this to super-linear
Best condition: 49% time reduction
When the distance threshold increases, the algorithm needs to consider a bigger range for every G-graph. As shown in the picture, from 500 meters to 4.5 kilometers, the Brute Force method and SmartEdge without optimization grow exponentially. The dynamic programming and G-graph pruning were able to reduce this significantly to sub-linear. The best condition can still reduce time by about 50% compared to Brute Force.

Result Size k
From k = 1 to 10
Increasing k only impacts CRF+DP+GPR, because it uses the top-k list for pruning
GPR has no advantage for large k
Best condition: ~50% time reduction

P-Value Threshold
From 0.01% to 0.1%
A larger threshold gives a larger total number of significant flows
All conditions increase linearly, but DP and GPR grow more slowly
Best condition: 52% time reduction

Conclusion
A computationally effective solution to the problem of “early detection of gathering events”
Offers gathering directions in addition to destinations
G-Graph, G-Score
SmartEdge algorithm with two optimizations
50% time reduction over the Brute-Force approach

Questions