U2SOD-DB: A Database System to Manage Large-Scale Ubiquitous Urban Sensing Origin-Destination Data


1 U2SOD-DB: A Database System to Manage Large-Scale Ubiquitous Urban Sensing Origin-Destination Data
Jianting Zhang (1,3,4), Hongmian Gong (2,3,4), Camille Kamga (2,4), Le Gruenwald (5)
(1) CUNY City College (CCNY); (2) CUNY Hunter College; (3) CUNY Graduate Center; (4) University Transportation Research Center, Region II; (5) University of Oklahoma

2 Outline
- Introduction & Background
- System Architecture and Implementation
- Time-Segmented Column-Oriented Data Layout
- Efficient Spatial-Temporal Aggregations
- Spatial Join with Infrastructural Data
- Case Studies and Performance Evaluations
- Conclusion and Future Work

3 Introduction
Ubiquitous Urban Sensing Origin-Destination Data (U2SOD):
- Taxi trips
- Cellular phone calls
- Social network activities

4 Introduction
What do these data have in common? They
- are produced and collected by end users with commodity sensing devices, and are available in large volumes in urban areas;
- are a special type of spatial-temporal data;
- record only origins and destinations, because the intermediate locations are unavailable, inaccessible or unimportant;
- can be more effective in helping us understand the real dynamics of urban areas, with respect to both spatial/temporal resolution and representativeness.

5 Introduction
How can U2SOD data be managed?
- Geographical Information Systems (GIS)
- Spatial Databases (SDB)
- Moving Object Databases (MOD)
How good are they?
- Pretty good for small amounts of data
- But rather poor for large-scale data

6 Introduction
Example 1:
- Loading 170 million taxi pickup locations into PostgreSQL as PostGIS point geometries:
  UPDATE t SET PUGeo = ST_SetSRID(ST_Point("PULong","PuLat"),4326);
- 105.8 hours!
Example 2:
- Finding the nearest tax blocks for 170 million taxi pickup locations using the open-source libspatialindex + GDAL
- 30.5 hours!
I do not have time to wait... Can we do better?
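Converting the two reported runtimes into per-record throughput (170 million records in both examples, 3,600 s per hour) shows why these runtimes are impractical:

$$
\frac{1.7\times10^{8}\ \text{records}}{105.8\ \text{h}\times 3600\ \text{s/h}} = \frac{1.7\times10^{8}}{380{,}880\ \text{s}} \approx 446\ \text{records/s}
$$

$$
\frac{1.7\times10^{8}\ \text{points}}{30.5\ \text{h}\times 3600\ \text{s/h}} = \frac{1.7\times10^{8}}{109{,}800\ \text{s}} \approx 1{,}548\ \text{points/s}
$$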

7 Introduction
- Cloud computing + MapReduce + Hadoop
- Multicore CPUs
- GPGPU computing: from Fermi to Kepler

8 The combination of architectural and organizational enhancements led to 16 years of sustained performance growth at an annual rate of 50% from 1986 to 2002. However, due to the combined power, memory and instruction-level parallelism problems, the growth rate dropped to about 20% per year from 2002 to 2006. GPU performance, on the other hand, has continued to grow at about 50% per year.
[Figure: example GPU cards, including the Nvidia Quadro 6000, with prices of $4,000, $500 and $2,500.]
Nvidia GTX 690: 3,072 cores (915 MHz), 4 GB GDDR5 memory, 384 GB/s bandwidth; under $1,000.
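Compounding the growth rates quoted above makes the gap concrete: 16 years at 50% per year, and the four years from 2002 to 2006 at 20% versus 50% per year:

$$
1.5^{16} \approx 656.8, \qquad 1.2^{4} \approx 2.07, \qquad 1.5^{4} \approx 5.06
$$

So the 1986 to 2002 trend amounts to roughly a 657-fold improvement, while the slower rate yields about 2.1X over 2002 to 2006 instead of about 5.1X.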

9 Introduction
The goal, therefore, is to design a data management system that efficiently manages large-scale U2SOD data on massively data-parallel GPUs, and to cut runtimes from hours to seconds on a single commodity GPU device with the help of new data models, data structures and algorithms.

10 System Design and Implementation
[Figure: U2SOD-DB architecture. Components: raw data; compression, aggregation and indexing; physical data layout (Day, Month, Year); spatial joins and shortest path computation.]

11 System Design and Implementation
NYC taxi trip record fields: Medallion#, Shift#, Trip#, Trip_Pickup_DateTime, Trip_Dropoff_DateTime, Trip_Pickup_Location, Trip_Dropoff_Location, Start_Lon, Start_Lat, End_Lon, End_Lat, Payment_Type, Surcharge, Total_Amt, Rate_Code, Passenger_Count, Fare_Amt, Tolls_Amt, Tip_Amt, Trip_Time, Trip_Distance, vendor_name, date_loaded, store_and_forward, time_between_service, distance_between_service, Start_Zip_Code, End_Zip_Code, start_x, start_y, end_x, end_y (the last four in a local projection).
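The slides list the raw fields but do not show how they are laid out. A minimal sketch of the idea behind a time-segmented, column-oriented layout: row records are pivoted into one set of column arrays per pickup day. The field subset and the function name `to_day_segments` are hypothetical, not U2SOD-DB's actual schema or API.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical subset of the trip-record fields; not U2SOD-DB's actual schema.
COLUMNS = ("pickup_datetime", "start_x", "start_y", "end_x", "end_y", "fare_amt")


def to_day_segments(rows):
    """Pivot row-oriented trip records into one column-array segment per pickup day.

    rows: iterable of dicts keyed by COLUMNS, with pickup_datetime as a datetime.
    Returns {date: {column_name: [values...]}}; inside a segment, index i of
    every column refers to the same trip.
    """
    segments = defaultdict(lambda: {name: [] for name in COLUMNS})
    for row in rows:
        segment = segments[row["pickup_datetime"].date()]
        for name in COLUMNS:
            segment[name].append(row[name])
    return dict(segments)


if __name__ == "__main__":
    trips = [
        {"pickup_datetime": datetime(2009, 1, 1, 8, 5), "start_x": 1.0, "start_y": 2.0,
         "end_x": 3.0, "end_y": 4.0, "fare_amt": 10.0},
        {"pickup_datetime": datetime(2009, 1, 2, 0, 30), "start_x": 5.0, "start_y": 6.0,
         "end_x": 7.0, "end_y": 8.0, "fare_amt": 12.5},
    ]
    for day, seg in sorted(to_day_segments(trips).items()):
        print(day, seg["start_x"])
```

A query that needs only pickup coordinates for one day then touches two contiguous arrays in a single segment instead of scanning whole records, and such arrays can be copied to GPU memory as they are.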

12 System Design and Implementation
[Figure: aggregation hierarchies over NYC taxi trip records.
Temporal (from pickup/drop-off timestamps): year, month, day, hour, day of the year, week of the year, day of the week, 15/30 minutes, peak/off-peak.
Spatial (from pickup/drop-off locations): city, borough, community district, police precinct, census tract, census block, street segment, tax lot, tax block; level 0 grid, level k grid, top-level grid.
Auxiliary data (weather, events…).]
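The figure only names the aggregation levels. A minimal sketch of deriving such keys from one record, assuming (the slides do not say) that the grid levels form a power-of-two pyramid in which each level merges 2 × 2 cells of the level below; `temporal_keys` and `grid_cell` are hypothetical names.

```python
from datetime import datetime


def temporal_keys(ts: datetime, bin_minutes: int = 15) -> dict:
    """Derive calendar aggregation keys for one pickup/drop-off timestamp."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "day_of_year": ts.timetuple().tm_yday,
        "week_of_year": ts.isocalendar()[1],  # ISO 8601 week number
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        # Index of the 15- or 30-minute bin within the day
        "minute_bin": (ts.hour * 60 + ts.minute) // bin_minutes,
    }


def grid_cell(x: float, y: float, origin_x: float, origin_y: float,
              cell_size: float, level: int = 0) -> tuple:
    """Cell of a projected point in a hypothetical power-of-two grid pyramid.

    Level 0 uses square cells of side cell_size; each higher level merges
    2 x 2 cells of the level below, so a level-k index is the level-0 index
    shifted right by k bits.
    """
    col = int((x - origin_x) // cell_size)
    row = int((y - origin_y) // cell_size)
    return col >> level, row >> level
```

With this design, coarser grid counts can be obtained from finer ones by bit shifts alone, without revisiting the raw coordinates.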

13 System Design and Implementation

14 [Figure: the P2P-T, P2N-D and P2P-D spatial joins]
These three types of spatial joins are now supported by U2SOD-DB entirely on GPUs, with significant speedups.
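The slides do not define the three join types. Assuming that P2P-T is a point-in-polygon test and that P2N-D and P2P-D assign each point to its nearest street segment or polygon by distance (as in the "nearest tax blocks" example), the geometric primitives involved can be sketched serially as below. The function names are hypothetical, and the brute-force `nearest_segment` stands in for the spatial filtering a real system would use.

```python
import math


def point_in_polygon(px, py, ring):
    """Even-odd (ray casting) test; ring is a list of (x, y) vertices, implicitly closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed
        if (y1 > py) != (y2 > py):
            # x-coordinate where this edge crosses y = py (y1 != y2 here)
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def point_segment_distance(px, py, x1, y1, x2, y2):
    """Euclidean distance from point (px, py) to segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    len2 = dx * dx + dy * dy
    if len2 == 0.0:  # degenerate segment
        return math.hypot(px - x1, py - y1)
    # Projection parameter onto the segment, clamped to its endpoints
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / len2))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


def nearest_segment(px, py, segments):
    """Index of the nearest segment, each given as (x1, y1, x2, y2); brute force."""
    return min(range(len(segments)),
               key=lambda i: point_segment_distance(px, py, *segments[i]))
```

For a point outside a polygon, the point-to-polygon distance is the minimum `point_segment_distance` over the polygon's edges, so the same primitive serves both distance-based joins.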

15 Case Studies and Performance Evaluations
Data
- Taxi trip records: 300 million in two years (2008-2010); ~170 million in 2009 (~150 million in Manhattan)
- NYC DCP LION street network data: 147,011 street segments
- NYC Census 2000 blocks: 38,794
- NYC MapPLUTO tax blocks: 735,488 in four boroughs (excluding Staten Island) and 43,252 in Manhattan
Hardware
- Dell T5400 with dual quad-core CPUs and 16 GB memory
- Nvidia Quadro 6000 with 448 cores and 6 GB memory

16 Case Studies and Performance Evaluations
[Figure: spatial aggregation maps. Top: 256×256 grid at 128-foot resolution; right: 8192×8192 grid at 4-foot resolution.]
- Spatial aggregation (8192×8192 grid): 9,424/326 ≈ 30X
- Temporal aggregation: 1,709/198 ≈ 8.6X (by minute); 1,598/165 ≈ 9.7X (by hour)
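Both grids cover the same square extent, since 256 × 128 ft = 8192 × 4 ft = 32,768 ft on a side. The slide does not show the aggregation kernel; a serial sketch of counting points per grid cell follows (hypothetical function name). In a data-parallel version each point's cell index can be computed independently, with counts combined by atomic additions or by a sort plus reduce-by-key.

```python
def grid_counts(xs, ys, origin_x, origin_y, cell_size, n):
    """Count points per cell of an n x n grid stored row-major; points outside are skipped."""
    counts = [0] * (n * n)
    for x, y in zip(xs, ys):
        # Cell indices are independent per point, which is what makes this data-parallel
        col = int((x - origin_x) // cell_size)
        row = int((y - origin_y) // cell_size)
        if 0 <= col < n and 0 <= row < n:
            counts[row * n + col] += 1
    return counts
```

For the two grids on the slide, the calls would differ only in `cell_size` (128 vs 4) and `n` (256 vs 8192).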

17 Case Studies and Performance Evaluations
T-Drive dataset: 17,762,489 GPS point locations. Aggregation takes 47.25 milliseconds on the GPU versus 4,110 ms on the CPU using STL, an 87X speedup.
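The slide does not say how the aggregation is implemented. One common data-parallel formulation, assumed here, is a sort followed by a reduce-by-key: on GPUs Thrust provides `thrust::sort` and `thrust::reduce_by_key`, and on CPUs it can be approximated with `std::sort` plus a linear scan. A serial Python sketch of the pattern, with hypothetical function names:

```python
from itertools import groupby


def reduce_by_key(keys):
    """Sort keys, then collapse each run of equal keys into a (key, count) pair."""
    return [(key, sum(1 for _ in run)) for key, run in groupby(sorted(keys))]


def hourly_counts(epoch_seconds):
    """Aggregate point timestamps (seconds since the epoch) into per-hour counts."""
    return reduce_by_key(t // 3600 for t in epoch_seconds)
```

Sorting first turns aggregation into contiguous runs of equal keys, which parallelizes well and avoids contention on shared counters.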

18 Case Studies and Performance Evaluations
Join type | Infrastructure data | CPU time | GPU time | Speedup
P2N-D | 147,011 street segments | - | 10.9 seconds | -
P2P-T | 38,794 census blocks (470,941 points) | 15.2 hours | 11.2 seconds | 4,900X
P2P-D | 735,488 tax blocks (4,698,986 points) | 30.5 hours | 33.1 seconds | 3,200X

19 Conclusion and Future Work
We reported the design and implementation of U2SOD-DB, a column-oriented, GPU-accelerated, in-memory data management system targeted at large-scale ubiquitous urban sensing origin-destination data. Experiments demonstrated significant speedups over serial CPU implementations in main memory (10-100X) and over traditional disk-resident systems (3,000-5,000X) when processing 170 million taxi trip records and their spatial joins with various types of urban infrastructure data.

20 Conclusion and Future Work
- Extend U2SOD-DB to handle other types of OD data as well as trajectory data
- Further improve performance by designing and implementing more efficient data structures and algorithms on GPUs
- Apply U2SOD-DB to in-depth analysis of trip purposes and urban dynamics in NYC, in collaboration with transportation researchers and urban geographers

