U2SOD-DB: A Database System to Manage Large-Scale Ubiquitous Urban Sensing Origin-Destination Data. Jianting Zhang, Hongmian Gong, Camille Kamga.

Presentation transcript:

U2SOD-DB: A Database System to Manage Large-Scale Ubiquitous Urban Sensing Origin-Destination Data
Jianting Zhang (1,3,4), Hongmian Gong (2,3,4), Camille Kamga (2,4), Le Gruenwald (5)
1 CUNY City College (CCNY), 2 CUNY Hunter College, 3 CUNY Graduate Center, 4 University Transportation Research Center Region II, 5 University of Oklahoma

Outline
–Introduction & Background
–System Architecture and Implementation
–Time-Segmented Column-Oriented Data Layout
–Efficient Spatial-Temporal Aggregations
–Spatial Join with Infrastructural Data
–Case Studies and Performance Evaluations
–Conclusion and Future Work

Introduction
Ubiquitous Urban Sensing Origin-Destination Data (U2SOD)
–Taxi trips
–Cellular phone calls
–Social network activities

Introduction
What do they have in common?
–They are produced and collected by end users with commodity sensing devices, and their data volumes are large in urban areas
–They are a special type of spatial-temporal data
–The intermediate locations between origins and destinations are unavailable, inaccessible or unimportant
–They can be more effective in helping to understand the real dynamics of urban areas with respect to spatial/temporal resolution and representativeness

Introduction
How to manage U2SOD data?
–Geographical Information Systems (GIS)
–Spatial Databases (SDB)
–Moving Object Databases (MOD)
How good are they?
–Pretty good for small amounts of data
–But rather poor for large-scale data

Introduction
Example 1:
–Loading 170 million taxi pickup locations into PostgreSQL
–UPDATE t SET PUGeo = ST_SetSRID(ST_Point("PULong","PuLat"),4326);
–105.8 hours!
Example 2:
–Finding the nearest tax blocks for 170 million taxi pickup locations using the open source libspatialindex + GDAL
–30.5 hours!
I do not have time to wait... Can we do better?

Introduction
–Cloud computing + MapReduce + Hadoop
–Multicore CPUs
–GPGPU Computing: From Fermi to Kepler

The combination of architectural and organizational enhancements led to 16 years of sustained growth in performance at an annual rate of 50% from 1986 to 2002. However, due to the combined power, memory and instruction-level parallelism problems, the growth rate dropped to about 20% per year from 2002 to 2006. On the other hand, the growth in performance for GPUs remains 50% per year.
[Figure labels: Quadro 6000; $4000; $500; $2500]
Nvidia GTX 690: 3072 cores (915 MHz), 4 GB GDDR5 memory, 384 GB/s bandwidth; under $1,000

Introduction
–So, the goal is to design a data management system that efficiently manages large-scale U2SOD data on massively data parallel GPUs
–And to cut the runtimes from hours to seconds on a single commodity GPU device
–With the help of new data models, data structures and algorithms

System Design and Implementation
[Architecture diagram; labels: Spatial Joins and Shortest Path Computation; Day; Month; Year; Raw data; Compression, aggregation and indexing; Physical Data Layout; U2SOD-DB]

System Design and Implementation
Medallion#, Shift#, Trip#, Trip_Pickup_DateTime, Trip_Dropoff_DateTime, Trip_Pickup_Location, Trip_Dropoff_Location, Start_Lon, Start_Lat, End_Lon, End_Lat, Payment_Type, Surcharge, Total_Amt, Rate_Code, Passenger_Count, Fare_Amt, Tolls_Amt, Tip_Amt, Trip_Time, Trip_Distance, vendor_name, date_loaded, store_and_forward, time_between_service, distance_between_service, Start_Zip_Code, End_Zip_Code, start_x, start_y, end_x, end_y (local projection)
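The outline names a time-segmented, column-oriented data layout for these fields. A minimal Python sketch of that idea follows, assuming monthly segments and only four of the fields; the class names MonthSegment and TripStore, the choice of month as the segment unit, and the sample coordinates are hypothetical and are not taken from the slides.

```python
import calendar
from array import array
from collections import defaultdict
from datetime import datetime


class MonthSegment:
    """One time segment: each field is kept in its own contiguous array."""

    def __init__(self):
        self.pickup_ts = array("q")  # Trip_Pickup_DateTime as epoch seconds
        self.start_x = array("d")    # start_x (local projection)
        self.start_y = array("d")    # start_y (local projection)
        self.fare_amt = array("d")   # Fare_Amt

    def append(self, ts, x, y, fare):
        self.pickup_ts.append(ts)
        self.start_x.append(x)
        self.start_y.append(y)
        self.fare_amt.append(fare)

    def __len__(self):
        return len(self.pickup_ts)


class TripStore:
    """Column store partitioned by (year, month) of the pickup time."""

    def __init__(self):
        self.segments = defaultdict(MonthSegment)

    def insert(self, pickup, x, y, fare):
        ts = calendar.timegm(pickup.timetuple())
        self.segments[(pickup.year, pickup.month)].append(ts, x, y, fare)

    def total_fare(self, year, month):
        # Touches one column of one segment; other columns and months are not read.
        seg = self.segments.get((year, month))
        if seg is None:
            return 0.0
        return sum(seg.fare_amt)


store = TripStore()
store.insert(datetime(2009, 1, 15, 8, 47), 988000.0, 211000.0, 10.5)
store.insert(datetime(2009, 1, 20, 17, 5), 990500.0, 215300.0, 7.25)
store.insert(datetime(2009, 2, 1, 0, 10), 985200.0, 208900.0, 22.0)
print(store.total_fare(2009, 1))
```

The point of the layout is that a query over one field in one time range scans a single contiguous array, which is also the access pattern that suits data parallel hardware.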

System Design and Implementation
[Aggregation hierarchy diagram for NYC taxi trip records]
–Pickup/drop-off timestamps: 15/30-minutes, Hour, Day, Month, Year; Day of the Week, Week of the Year, Day of the Year; Peak/off-peak
–Pickup/drop-off locations: Tax Lot, Tax Block, Street Segment, Census Block, Census Tract, Police Precinct, Community District, Borough, City
–Grids: Level 0 grid, Level k grid, Top level grid
–Auxiliary data (weather, events…)
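The temporal side of this hierarchy can be illustrated by deriving every aggregation key from a single pickup timestamp. This is a sketch only: the function name temporal_keys and the dictionary keys are hypothetical, and the use of ISO week numbering and ISO weekday numbering (Monday = 1) is an assumption, since the slides do not say which convention is used. Peak/off-peak is omitted because its definition is not given.

```python
from datetime import datetime


def temporal_keys(ts):
    """Derive aggregation keys for one pickup or drop-off timestamp."""
    minute_of_day = ts.hour * 60 + ts.minute
    _iso_year, iso_week, iso_weekday = ts.isocalendar()
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "day_of_year": ts.timetuple().tm_yday,
        "week_of_year": iso_week,         # ISO 8601 week number (assumption)
        "day_of_week": iso_weekday,       # Monday = 1 ... Sunday = 7 (assumption)
        "bin_15min": minute_of_day // 15, # 0..95 within the day
        "bin_30min": minute_of_day // 30, # 0..47 within the day
    }


keys = temporal_keys(datetime(2009, 1, 15, 8, 47))
print(keys)
```

Grouping trip records by any one of these keys gives the corresponding level of temporal aggregation.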

System Design and Implementation

–P2P-T
–P2N-D
–P2P-D
The three types of spatial joins are now supported by U2SOD-DB completely on GPUs with significant speedups.
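The slides do not expand the three acronyms. The sketch below assumes they stand for point-in-polygon test (P2P-T), point-to-network distance (P2N-D) and point-to-polygon distance (P2P-D), and shows serial Python versions of the two geometric primitives such joins would rest on. It is a brute-force CPU illustration with hypothetical function names, not the GPU method used by U2SOD-DB.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to the segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project the point onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_segment(point, segments):
    """Brute-force nearest street segment; segments maps id -> (ax, ay, bx, by)."""
    px, py = point
    best_id, best_dist = None, math.inf
    for seg_id, (ax, ay, bx, by) in segments.items():
        d = point_segment_distance(px, py, ax, ay, bx, by)
        if d < best_dist:
            best_id, best_dist = seg_id, d
    return best_id, best_dist


def point_in_polygon(px, py, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, not closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


streets = {"a": (0.0, 0.0, 10.0, 0.0), "b": (0.0, 5.0, 10.0, 5.0)}
block = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(nearest_segment((3.0, 1.0), streets))
print(point_in_polygon(2.0, 2.0, block))
```

Each point is independent of every other point in these tests, which is why this kind of join maps naturally onto data parallel hardware.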

Case Studies and Performance Evaluations
Data
–Taxi trip records: 300 million in two years ( ), ~170 million in 2009 (~150 million in Manhattan)
–NYC DCP LION street network data: 147,011 street segments
–NYC Census 2000 blocks: 38,794
–NYC MapPLUTO tax blocks: 735,488 in four boroughs (excluding SI) and 43,252 in Manhattan
Hardware
–Dell T5400 with dual quad-core CPUs and 16 GB memory
–Nvidia Quadro 6000 with 448 cores and 6 GB memory

Case Studies and Performance Evaluations
[Figure panels] Top: grid size = 256*256, resolution = 128 feet. Right: grid size = 8192*8192, resolution = 4 feet.
Spatial Aggregation
–9,424/326 ≈ 29X (8192*8192)
Temporal Aggregation
–1709/198 = 8.6X (minute)
–1598/165 = 9.7X (hour)
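Spatial aggregation on a grid amounts to mapping each pickup location to a cell and counting. The serial Python sketch below assumes coordinates in feet measured from the grid origin and a row-major cell numbering; the function names and the 2x2 roll-up to a coarser level are illustrative assumptions, not the GPU kernels behind the reported timings.

```python
from collections import Counter


def cell_id(x, y, resolution, grid_size):
    """Row-major cell index for a point, or None if it falls outside the grid."""
    col = int(x // resolution)
    row = int(y // resolution)
    if not (0 <= col < grid_size and 0 <= row < grid_size):
        return None
    return row * grid_size + col


def aggregate(points, resolution, grid_size):
    """Count points per grid cell."""
    counts = Counter()
    for x, y in points:
        cid = cell_id(x, y, resolution, grid_size)
        if cid is not None:
            counts[cid] += 1
    return counts


def coarsen(counts, grid_size):
    """Roll counts up one level by merging each 2x2 block of cells."""
    parent_size = grid_size // 2
    coarse = Counter()
    for cid, n in counts.items():
        row, col = divmod(cid, grid_size)
        coarse[(row // 2) * parent_size + (col // 2)] += n
    return coarse


points = [(0, 0), (3, 3), (5, 1), (130, 130), (-1, 0)]
fine = aggregate(points, resolution=4, grid_size=8192)
coarse = coarsen(fine, grid_size=8192)
print(fine)
print(coarse)
```

Both grids on the slide cover the same extent, since 256 * 128 feet and 8192 * 4 feet are each 32,768 feet, so repeated 2x2 roll-ups of the fine grid eventually reach the coarse one.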

Case Studies and Performance Evaluations
T-Drive dataset: 17,762,489 GPS point locations; milliseconds for aggregation (4,110 ms on CPU) using STL → 87X speedup

Case Studies and Performance Evaluations
Join types: P2P-T; P2N-D; P2P-D
Data: 147,011 street segments; 38,794 census blocks ( points); 735,488 tax blocks (4,698,986 points)
CPU time: hours; 30.5 hours
GPU time: 10.9 seconds; 11.2 seconds; 33.1 seconds
Speedup: -; 4,900X; 3,200X

Conclusion and Future Work
–We reported our design and implementation of U2SOD-DB, a column-oriented, GPU-accelerated, in-memory data management system targeted at large-scale ubiquitous urban sensing origin-destination data
–Experiments have demonstrated significant speedups over serial CPU implementations in main memory ( X) and over traditional disk-resident systems ( X) for processing 170 million taxi trip records and their spatial joins with various types of urban infrastructure data

Conclusion and Future Work
–Extend U2SOD-DB to handle other types of OD data as well as trajectory data
–Further improve performance by designing and implementing more efficient data structures and algorithms on GPUs
–Apply U2SOD-DB to in-depth analysis of trip purposes and urban dynamics in NYC by collaborating with transportation researchers and urban geographers