High-Performance Analytics on Large-Scale GPS Taxi Trip Records in NYC

Presentation transcript:

High-Performance Analytics on Large-Scale GPS Taxi Trip Records in NYC Jianting Zhang Department of Computer Science The City College of New York

Outline: Background and Motivation; Parallel taxi data management on GPUs; Efficient shortest path computation and applications

Background and Motivation: Taxicabs: 13,000 medallion taxicabs; license priced at $600,000 in 2007; car services and taxi services are separate. Taxi trip records: ~170 million trips (300 million passengers) in 2009, about 1/5 of subway ridership and 1/3 of bus ridership in NYC.

Background and Motivation: Overall distributions of trip distance, time, speed and fare (2009)

Background and Motivation: Other types of Origin-Destination (OD) data: social network activities; cellular phone call logs

Background and Motivation: How to manage OD data? Geographical Information Systems (GIS), Spatial Databases (SDB), Moving Object Databases (MOD). How good are they? Pretty good for small amounts of data, but rather poor for large-scale data.

Background and Motivation: Example 1: Loading 170 million taxi pickup locations into PostgreSQL with UPDATE t SET PUGeo = ST_SetSRID(ST_Point("PULong","PuLat"),4326); took 105.8 hours! Example 2: Finding the nearest tax blocks for 170 million taxi pickup locations using the open source libspatialindex + GDAL took 30.5 hours! Hardware: dual quad-core Intel Xeon 2.26 GHz processors with 48 GB memory. I do not have time to wait... Can we do better?

Background and Motivation: Multicore CPUs; Cloud computing + MapReduce + Hadoop; GPGPU computing: from Fermi to Kepler

Nvidia GTX Titan GPU being sold at Amazon: http://www.amazon.com/EVGA-SuperClocked-Dual-Link-Graphics-06G-P4-2791-KR/dp/B00BL8BX7O Suggested retail price: $999. 4.5 teraflops of single precision and 1.3 teraflops of double precision. 288.4 GB/s memory bandwidth (10+X higher than multi-core CPUs).

ASCI Red (1997): first 1 teraflops (sustained) system, with 9,298 Intel Pentium II Xeon processors (in 72 cabinets). $$$? Space? Power? GTX Titan (Feb. 2013): 7.1 billion transistors (551 mm²); 2,688 processors; max bandwidth 288.4 GB/s; PCI-E peripheral device; 250 W (17.98 GFLOPS/W, single precision); suggested retail price $999. What can we do today using a device that is more powerful than ASCI Red was 16 years ago?

Background and Motivation: The goal is to design a data management system that efficiently manages large-scale OD/trajectory data on massively data-parallel GPUs, with the help of new data models, data structures and algorithms, to cut runtimes from hours to seconds on a single commodity GPU device, and to support interactive queries and visual exploration.

Outline: Background and Motivation; Parallel taxi/trajectory data management on GPUs; Efficient shortest path computation and applications

O-D data management on GPUs: [Figure: system overview. Raw data organized by year, month and day; physical data layout; compression, aggregation and indexing; spatial joins and shortest path computation] (Zhang, Gong, Kamga and Gruenwald, 2012)

O-D data management on GPUs: [Figure: taxi trip record fields, drawn as 11 numbered groups] Fields: Trip_Pickup_Location, Trip_Dropoff_Location, Start_Zip_Code, End_Zip_Code, time_between_service, distance_between_service, start_x, start_y, end_x, end_y (local projection), Start_Lon, Start_Lat, End_Lon, End_Lat, Trip_Pickup_DateTime, Trip_Dropoff_DateTime, Passenger_Count, Fare_Amt, Tolls_Amt, Tip_Amt, Medallion#, Shift#, Trip#, Trip_Time, Trip_Distance, Payment_Type, Surcharge, Total_Amt, Rate_Code, vendor_name, date_loaded, store_and_forward (Zhang, Gong, Kamga and Gruenwald, 2012)

O-D data management on GPUs: [Figure: temporal and spatial aggregation hierarchies built over NYC taxi trip records and auxiliary data (weather, events…)] Temporal levels: Year, Month, Week of the Year, Day of the Year, Day of the Week, Day, Hour, Peak/off-peak, 15/30-minutes, pickup/drop-off timestamps. Spatial levels: City, Borough, Community District, Police Precinct, Census Tract, Census Block, Tax Lot, Tax Block, Street Segment, top-level grid, level-k grid, level-0 grid, pickup/drop-off locations. (Zhang, You and Gruenwald, 2012)

O-D data management on GPUs (Zhang, Gong, Kamga and Gruenwald, 2012)

O-D data management on GPUs P2P-T P2N-D P2P-D (Zhang, You and Gruenwald, 2012a)

O-D data management on GPUs: Single-level grid-file based spatial filtering over vertices (polygon/polyline) and points; perfectly coalesced memory accesses; utilizing GPU floating point power; nested-loop based refinement (Zhang, You and Gruenwald, 2012a)
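The filter-then-refine idea on this slide can be illustrated with a small CPU-side sketch. It is not the authors' GPU implementation: the function names, the uniform `cell` size and the use of a ray-casting point-in-polygon test for refinement are all assumptions made for illustration.

```python
from collections import defaultdict

def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def build_grid(polygons, cell):
    """Filtering: map each grid cell to polygons whose bounding box overlaps it."""
    grid = defaultdict(list)
    for pid, poly in polygons.items():
        xs = [p[0] for p in poly]
        ys = [p[1] for p in poly]
        for cx in range(int(min(xs) // cell), int(max(xs) // cell) + 1):
            for cy in range(int(min(ys) // cell), int(max(ys) // cell) + 1):
                grid[(cx, cy)].append(pid)
    return grid

def locate_points(points, polygons, cell):
    """Return, for each point, the id of a polygon containing it (or None)."""
    grid = build_grid(polygons, cell)
    result = []
    for x, y in points:
        # Filtering step: only polygons registered in the point's cell are candidates.
        candidates = grid.get((int(x // cell), int(y // cell)), [])
        match = None
        # Refinement step: nested loop over the candidates with an exact test.
        for pid in candidates:
            if point_in_polygon(x, y, polygons[pid]):
                match = pid
                break
        result.append(match)
    return result

polygons = {"A": [(0, 0), (4, 0), (4, 4), (0, 4)],
            "B": [(6, 0), (10, 0), (10, 4), (6, 4)]}
points = [(1, 1), (7, 2), (5, 2), (20, 20)]
print(locate_points(points, polygons, cell=2))
```

On a GPU, each point (or each cell) would typically be handled by its own thread; the sequential loops here only show the logic of the two phases.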

O-D data management on GPUs: Data: taxi trip records: 300 million in two years (2008-2010), ~170 million in 2009 (~150 million in Manhattan); NYC DCP LION street network data: 147,011 street segments; NYC Census 2000 blocks: 38,794; NYC MapPluto tax blocks: 735,488 in four boroughs (excluding SI) and 43,252 in Manhattan. Hardware: Dell T5400 with dual quad-core CPUs and 16 GB memory; Nvidia Quadro 6000 with 448 cores and 6 GB memory.

O-D data management on GPUs: [Figure: aggregated pickup maps] Top: grid size = 256*256, resolution = 128 feet. Right: grid size = 8192*8192, resolution = 4 feet. Spatial aggregation: 9,424/326 ≈ 30X (8192*8192). Temporal aggregation: 1709/198 = 8.6X (minute); 1598/165 = 9.7X (hour). (Zhang, You and Gruenwald, 2012)
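As a rough illustration of what grid-based spatial aggregation and hour-level temporal aggregation compute, here is a minimal sequential sketch. The function names, the `origin` parameter and the sample coordinates are hypothetical; the 128-foot resolution is the one quoted for the 256*256 grid.

```python
from collections import Counter
from datetime import datetime

def aggregate_spatial(points, origin, resolution):
    """Count points per grid cell; cell index = floor((coord - origin) / resolution)."""
    ox, oy = origin
    return Counter(
        (int((x - ox) // resolution), int((y - oy) // resolution))
        for x, y in points
    )

def aggregate_hourly(timestamps):
    """Count pickups per hour of day (0-23)."""
    return Counter(t.hour for t in timestamps)

# Hypothetical pickup locations in a local projection (feet).
pickups = [(10.0, 10.0), (100.0, 50.0), (130.0, 10.0), (300.0, 260.0)]
print(aggregate_spatial(pickups, origin=(0.0, 0.0), resolution=128))

times = [datetime(2009, 1, 1, 8, 5), datetime(2009, 1, 1, 8, 59),
         datetime(2009, 1, 2, 17, 0)]
print(aggregate_hourly(times))
```

A data-parallel version would usually compute the cell keys in parallel, sort by key and then reduce by key; the counting logic stays the same.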

O-D data management on GPUs: Datasets: 38,794 census blocks (470,941 points); 735,488 tax blocks (4,698,986 points); 147,011 street segments.
P2N-D: CPU time -; GPU time 10.9 s.
P2P-T: CPU time 15.2 h; GPU time 11.2 s; speedup 4,900X.
P2P-D: CPU time 30.5 h; GPU time 33.1 s; speedup 3,200X.
Algorithmic improvement: 3.7X; using main memory: 37.4X; parallel acceleration: 24.3X.
(Zhang, You and Gruenwald, 2012a), (Zhang and You 2012a), (Zhang and You 2012b)

Trajectory data management on GPUs (Zhang, You and Gruenwald, 2012b): T-Drive (Microsoft Research Asia): 10,357 taxis in a week in Beijing (15 million points), complete GPS traces. GeoLife (Microsoft Research Asia): 178 users, 17,621 trajectories, 23,667,828 points; “Walk” mode in Beijing: 3,245 trajectories with 1,440,823 points; 2,341 trajectories after removing “too large” trajectories. Task: Hausdorff distance between two point sets. Results: GPU implementation using the Thrust library (47.25 ms) vs. CPU implementation using the STL (4,110 ms): 87X speedup. Results: 25-40X overall speedups.
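For readers unfamiliar with the measure, the Hausdorff distance between two point sets can be written as a brute-force sketch. This is the textbook definition with Euclidean distance, not the Thrust or STL code that was timed; the function names are illustrative.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

track_a = [(0.0, 0.0), (1.0, 0.0)]
track_b = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
print(hausdorff(track_a, track_b))  # 3.0
```

The inner minimum and outer maximum are both reductions over independent distance computations, which is why the problem maps well onto data-parallel primitives.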

Outline: Background and Motivation; Parallel taxi data management on GPUs; Efficient shortest path computation and applications (mapping betweenness centralities, outlier detection)

SP computation and applications, overview: Shortest path computation: Dijkstra and A*; new-generation algorithms, Contraction Hierarchy (CH) based (Sommer 2012). [Figure: example road network with regular edges, shortcuts and contracted nodes] Original path: 34,36,57,61,22,23,53,50,11,60 (9 hops). Using shortcuts: 34,36,57,23,53,11,60 (6 hops).
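The relationship between the 6-hop shortcut path and the 9-hop original path can be sketched as shortcut unpacking. The two node sequences come from the slide; the `shortcuts` table below (which node each shortcut bypasses, and in what order nodes were contracted) is an assumption chosen to be consistent with them.

```python
def expand_edge(u, v, shortcuts):
    """Recursively expand edge (u, v) into the original node sequence."""
    if (u, v) not in shortcuts:
        return [u, v]  # regular edge
    mid = shortcuts[(u, v)]  # contracted node bypassed by this shortcut
    return expand_edge(u, mid, shortcuts)[:-1] + expand_edge(mid, v, shortcuts)

def unpack(path, shortcuts):
    """Expand every shortcut edge along a path."""
    full = [path[0]]
    for u, v in zip(path, path[1:]):
        full.extend(expand_edge(u, v, shortcuts)[1:])
    return full

# Hypothetical shortcut table: (from, to) -> bypassed contracted node.
shortcuts = {(57, 23): 61, (61, 23): 22, (53, 11): 50}
ch_path = [34, 36, 57, 23, 53, 11, 60]

original = unpack(ch_path, shortcuts)
print(original)                             # [34, 36, 57, 61, 22, 23, 53, 50, 11, 60]
print(len(ch_path) - 1, len(original) - 1)  # 6 9
```

A CH query searches the smaller shortcut graph and unpacks only the final path, which is where the speedup over plain Dijkstra comes from.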

SP computation and applications: Network centrality (Brandes, 2008): node based and edge based. It can be easily derived after shortest paths are computed. Mapping node/edge betweenness centrality can reveal the connection strengths among different parts of cities.
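One simple way to derive trip-weighted centrality counts from already computed shortest paths is sketched below. It is a simplification, not Brandes' algorithm: it assumes one shortest path per O-D pair, counts endpoints as well as interior nodes, and applies no normalization. The function name and sample paths are hypothetical.

```python
from collections import Counter

def centrality_counts(weighted_paths):
    """Accumulate how many trips pass through each node and each directed edge.

    weighted_paths: iterable of (path, trips), where path is a list of node ids
    and trips is the number of taxi trips sharing that O-D pair.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for path, trips in weighted_paths:
        for node in path:
            node_counts[node] += trips
        for u, v in zip(path, path[1:]):
            edge_counts[(u, v)] += trips
    return node_counts, edge_counts

paths = [([1, 2, 3], 10),  # 10 trips from node 1 to node 3
         ([2, 3, 4], 5)]   # 5 trips from node 2 to node 4
nodes, edges = centrality_counts(paths)
print(nodes[3], edges[(2, 3)])  # 15 15
```

Mapping `edge_counts` onto street segments gives the kind of centrality map discussed on the following slides.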

SP computation and applications: Mapping betweenness centralities. 166 million trips, 25 million unique pairs after matching point locations to road segments/intersections. Shortest path computation completes in less than 2 hours (5,952 seconds) on a single CPU core (2.0 GHz): 4,200 pairs computed per second, 3 orders of magnitude faster than ArcGIS NA. The computed shortest paths are mapped and overlaid with the NAVSTREET road network and the NYC Community Districts map.

SP computation and applications Mapping Betweenness Centralities (All hours)

SP computation and applications: Mapping betweenness centralities (bi-hourly). [Figure: twelve map panels, 00H to 22H in two-hour steps, with legend]

SP computation and applications: The data is not as clean as we had thought… outlier detection. [Figure: the same 11 numbered groups of taxi trip record fields as on the earlier O-D data management slide, with one group annotated "Messed up on purpose due to privacy concerns"]

SP computation and applications: The data is not as clean as we had thought… outlier detection. In addition: some of the data fields are empty; pickup and drop-off locations can be in the Hudson River; the recorded trip distance/duration can be unreasonable; ... Outlier detection is needed for data cleaning.

SP computation and applications: Outlier detection. Existing approaches to outlier detection in urban computing: thresholding, e.g. 200 m < dist < 30 km; locating values in unusual ranges of distributions; spatial analysis (within a region or a land use type); matching trajectories with road segments and treating unmatched ones as outliers. Some techniques require complete GPS traces, while we only have O-D locations. Large-scale shortest path computing has not been used for outlier detection.

SP computation and applications: [Flowchart: outlier detection] Start from raw taxi trip data. Match pickup/drop-off point locations to street segments within distance D0. If matching is not successful, the trip is a Type I outlier (spatial analysis). If successful, assign pickup/drop-off nodes by picking the closer ones, aggregate unique (sid,tid) pairs and compute the shortest path. If CD > D1 AND CD > W*RD, the trip is a Type II outlier (network analysis); otherwise update centrality measurements. CD: computed shortest distance; RD: recorded trip distance.
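The two decision points in this flow can be sketched as a small classifier. The function and parameter names are hypothetical, and the defaults use the D0 = 200 feet, D1 = 3 miles and W = 2 settings reported on the results slide. In the actual pipeline the shortest distance CD is computed only for trips that pass the matching step; here it is passed in directly to keep the sketch short.

```python
def classify_trip(pickup_gap_ft, dropoff_gap_ft, cd_miles, rd_miles,
                  d0_ft=200.0, d1_miles=3.0, w=2.0):
    """Classify one trip as 'type I', 'type II' or 'ok'.

    pickup_gap_ft / dropoff_gap_ft: distance from each point to its nearest
    street segment; cd_miles: computed shortest distance (CD);
    rd_miles: recorded trip distance (RD).
    """
    # Type I (spatial analysis): an endpoint cannot be matched within D0.
    if pickup_gap_ft > d0_ft or dropoff_gap_ft > d0_ft:
        return "type I"
    # Type II (network analysis): CD > D1 AND CD > W * RD.
    if cd_miles > d1_miles and cd_miles > w * rd_miles:
        return "type II"
    return "ok"

print(classify_trip(50, 80, cd_miles=5.0, rd_miles=2.0))   # type II
print(classify_trip(500, 10, cd_miles=1.0, rd_miles=1.0))  # type I
print(classify_trip(10, 10, cd_miles=5.0, rd_miles=4.0))   # ok
```

Requiring both conditions keeps short trips with noisy recorded distances from being flagged: a trip is suspicious only when the shortest possible route is both long in absolute terms and much longer than what the meter recorded.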

SP computation and applications: Discussion on outlier detection. The approach is approximate in nature: taxi drivers do not always follow the shortest path, especially for short trips and in heavily congested areas. But we only care about aggregated results, and the errors have a chance to cancel each other out. Increasing D0 will reduce the number of type I outliers, but the locations might be mismatched with segments. Reducing D1 and/or W will increase the number of type II outliers but may generate false positives.

SP computation and applications: Results of outlier detection. Examples of detected type II outliers, with D0 = 200 feet, D1 = 3 miles, W = 2. ~2.5 million (1.5%) type I outliers; ~18,000 type II outliers.

Ongoing Research @ GeoTECI: CudaGIS, a general-purpose GIS on GPUs: more GIS modules in addition to indexing/query processing; data-parallel designs make it easy to port to Intel MICs; will test its scalability on the ORNL Titan and CUNY HPCC supercomputers. Trajectory data management on GPUs: online moving point location updating; segmentation/simplification/compression; map matching with road networks; aggregation and warehousing; indexing and query processing (similarity join…); data mining (moving cluster, convoy, swarm...)

Q&A http://www-cs.ccny.cuny.edu/~jzhang/ jzhang@cs.ccny.cuny.edu