Prototyping A Web-based High-Performance Visual Analytics Platform for Origin-Destination Data: A Case study of NYC Taxi Trip Records Jianting Zhang1,2.


1 Prototyping A Web-based High-Performance Visual Analytics Platform for Origin-Destination Data: A Case Study of NYC Taxi Trip Records
Jianting Zhang1,2, Simin You2,3, Yinglong Xia4
1 Department of Computer Science, CUNY City College (CCNY)
2 Department of Computer Science, CUNY Graduate Center
3 Pitney Bowes, Inc.
4 IBM T. J. Watson Research Center

2 Outline
Introduction, Background and Motivation
System Architecture and Implementations
  Geospatial backend
  Graph Database for Social Network Analysis
  Web Frontend
Experiments and Demonstrations
Summary and Future Work

3 Taxi Trip OD Data in NYC
Taxicabs
  13,000 medallion taxi cabs
  Car services and taxi services are separate
Taxi trip records
  ~170 million trips (300 million passengers) in 2009
  About 1/5 of the subway ridership and 1/3 of the bus ridership in NYC
  Data from 2013 onward are open

4 Other types of OD Data
Social network activities
Call Detail Record (CDR)

5 Web-based High-Performance Visual Analytics Platform for Origin-Destination Data
[Diagram: the platform in relation to GIS, Vis., Web, Web-GIS, and Big Data and HPC]
See Section 2 for a more detailed review

6 Commodity Parallel Hardware
[Diagram: CPU host (CMP) with cores, local caches, shared cache, DRAM, and disk/SSD; GPU (SIMD thread blocks, GDRAM) attached over PCI-E; MIC (4-thread in-order cores, T0 to T3, on a ring bus)]
16 Intel Sandy Bridge CPU cores + 128GB RAM + 8TB disk + GTX TITAN + Xeon Phi 3120A: ~$9994 (Jan. 2014)

7 Prototype System Architecture and Components
1 CCNY Geospatial Backend
  Large-scale geospatial data management
  Columnar data layout and storage
  Spatial query processing
  Spatial and spatiotemporal aggregation to derive graph structures and weights
2 IBM SystemG Backend
  Graph data management and analytics: Shortest Path, Centrality, PageRank…
3 Web frontend for geospatial and geosocial visual exploration using NYC Taxi Trip Data
  Network/Web communication
    JavaScript asynchronous function call
    JSON string encoding and parsing
    User data and Web-GIS API binding
  Frontend geometry library
    MBR indexing
    Point-in-polygon test
    Line-polygon intersection
  GUI
    Spatial selection: polygon drawing
    Temporal selection: dropdown list
    OD indication: arrow/polyline drawing
Connecting components: Web Proxy (PHP), Web-GIS API
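The "columnar data layout and storage" item can be illustrated with a struct-of-arrays sketch in C. The slides do not show the actual storage format, so the `TripColumns` type, its field names, and the helper functions below are all hypothetical; the point is only that a query touching one attribute scans one contiguous array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical columnar (struct-of-arrays) layout for trip records:
   each attribute lives in its own contiguous array of length n. */
typedef struct {
    size_t n;
    float *pickup_x, *pickup_y;
    float *dropoff_x, *dropoff_y;
    unsigned char *pickup_hour;   /* 0..23 */
} TripColumns;

/* Release all columns; safe to call on a partially allocated table. */
void trips_free(TripColumns *t) {
    free(t->pickup_x);
    free(t->pickup_y);
    free(t->dropoff_x);
    free(t->dropoff_y);
    free(t->pickup_hour);
    t->n = 0;
}

/* Allocate zero-initialized columns for n trips.
   Returns 0 on success, -1 on allocation failure. */
int trips_alloc(TripColumns *t, size_t n) {
    assert(n > 0);
    t->n = n;
    t->pickup_x = calloc(n, sizeof *t->pickup_x);
    t->pickup_y = calloc(n, sizeof *t->pickup_y);
    t->dropoff_x = calloc(n, sizeof *t->dropoff_x);
    t->dropoff_y = calloc(n, sizeof *t->dropoff_y);
    t->pickup_hour = calloc(n, sizeof *t->pickup_hour);
    if (!t->pickup_x || !t->pickup_y || !t->dropoff_x ||
        !t->dropoff_y || !t->pickup_hour) {
        trips_free(t);
        return -1;
    }
    return 0;
}

/* Count trips picked up in a given hour: only one column is scanned. */
size_t count_by_hour(const TripColumns *t, unsigned char hour) {
    size_t count = 0;
    for (size_t i = 0; i < t->n; ++i)
        if (t->pickup_hour[i] == hour) ++count;
    return count;
}
```

With a row-oriented layout the same temporal filter would pull every field of every record through the cache; here it reads one byte per trip.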

8 Geospatial backend 1
Dual role:
  Online: processing spatial queries issued through client-side visual exploration interfaces
  Offline: aggregating OD records to generate dynamic graphs for online social network analysis and visualization
Design choices:
  Traditional GIS and spatial databases (for point aggregations over polygons)
    Disk-resident, serial computing -> slow for large-scale data
    Standard programming interfaces/protocols (e.g., SQL, OGC specifications): easy to use
  Hardware-accelerated parallel systems
    High performance
    Robustness/usability concerns
Observations:
  Interactively drawn ROI polygons are typically simple (low complexity)
  A linear scan of points is cache friendly and embarrassingly parallelizable
Our solution: a lightweight parallel backend for high-performance point aggregations
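The offline role (aggregating OD records into graph weights) can be sketched as follows, assuming each trip has already been assigned an origin zone and a destination zone, for example by a point-in-polygon pass. The function name `aggregate_od`, the dense weight matrix, and the use of -1 for an unassigned endpoint are illustrative assumptions, not details from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulate OD counts into a dense num_zones x num_zones matrix
   (row-major, zeroed by the caller): weights[o * num_zones + d] is the
   number of trips from zone o to zone d.
   A zone id of -1 marks an endpoint that fell outside every zone. */
void aggregate_od(const int *origin_zone, const int *dest_zone,
                  size_t num_trips, int num_zones, unsigned *weights) {
    for (size_t i = 0; i < num_trips; ++i) {
        int o = origin_zone[i];
        int d = dest_zone[i];
        if (o < 0 || d < 0) continue;            /* skip unassigned trips */
        assert(o < num_zones && d < num_zones);
        weights[(size_t)o * (size_t)num_zones + (size_t)d] += 1;
    }
}
```

A dense matrix is reasonable for a few hundred zones; for finer zonings a sparse edge list would likely be a better fit.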

9 Geospatial backend 1
    #include <stdbool.h>

    /* point_x, point_y, num_points and the MBR bounds xmin, xmax, ymin, ymax
       are not declared on the slide; they are assumed to be defined elsewhere. */
    int pip_count(float vertices[][2], int num_vertices) {
        int count = 0;
        #pragma omp parallel for reduction(+:count)
        for (int i = 0; i < num_points; ++i) {
            double x = point_x[i];
            double y = point_y[i];
            /* MBR filter: skip points outside the polygon's bounding box */
            if (x < xmin || x > xmax || y < ymin || y > ymax)
                continue;
            /* Crossing-number test; edges are (j, j+1), so the ring is
               assumed to be closed (first vertex repeated last) */
            bool in_polygon = false;
            for (int j = 0; j < num_vertices-1; ++j) {
                double x0 = vertices[j][0];
                double x1 = vertices[j+1][0];
                double y0 = vertices[j][1];
                double y1 = vertices[j+1][1];
                if ((((y0 <= y) && (y < y1)) || ((y1 <= y) && (y < y0))) &&
                    (x < (x1 - x0) * (y - y0) / (y1 - y0) + x0))
                    in_polygon = !in_polygon;
            }
            if (in_polygon)
                ++count;
        }
        return count;
    }
Simple OpenMP directive for parallelization on multi-core CPUs
MBR filtering: many ROIs have small spatial extents
In-memory processing: scanning ~170 million points in ~1/4 s for a typical ROI polygon on a legacy machine (dual quad-core 2.0 GHz, released in 2007)
PIP test code due to W. Randolph Franklin of RPI (1990s)
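The slide's function reads the points and the MBR bounds from variables that are not shown. A self-contained variant, given here only as a sketch, passes the point columns explicitly and derives the MBR from the polygon itself. The names `point_in_ring` and `pip_count_mbr` are hypothetical, and the ring is assumed to be closed (first vertex repeated as the last).

```c
#include <assert.h>
#include <stdbool.h>

/* Crossing-number test for one point against a closed ring
   (first vertex repeated as the last one). */
static bool point_in_ring(float ring[][2], int nv, double x, double y) {
    bool inside = false;
    for (int j = 0; j < nv - 1; ++j) {
        double x0 = ring[j][0], y0 = ring[j][1];
        double x1 = ring[j + 1][0], y1 = ring[j + 1][1];
        if ((((y0 <= y) && (y < y1)) || ((y1 <= y) && (y < y0))) &&
            (x < (x1 - x0) * (y - y0) / (y1 - y0) + x0))
            inside = !inside;
    }
    return inside;
}

/* Count the points (px[i], py[i]) that fall inside the ring.
   The MBR is computed once and used to reject most points cheaply. */
int pip_count_mbr(const float *px, const float *py, int n,
                  float ring[][2], int nv) {
    assert(nv >= 4);   /* a closed triangle needs 4 vertices */
    double xmin = ring[0][0], xmax = xmin;
    double ymin = ring[0][1], ymax = ymin;
    for (int j = 1; j < nv; ++j) {
        if (ring[j][0] < xmin) xmin = ring[j][0];
        if (ring[j][0] > xmax) xmax = ring[j][0];
        if (ring[j][1] < ymin) ymin = ring[j][1];
        if (ring[j][1] > ymax) ymax = ring[j][1];
    }
    int count = 0;
    /* Ignored unless OpenMP is enabled at compile time (e.g. -fopenmp) */
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < n; ++i) {
        double x = px[i], y = py[i];
        if (x < xmin || x > xmax || y < ymin || y > ymax) continue;
        if (point_in_ring(ring, nv, x, y)) ++count;
    }
    return count;
}
```

For a unit square and the four points (0.5, 0.5), (2, 2), (0.25, 0.75), (-0.5, 0.5), two points pass the MBR filter and both of them are inside the ring.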

10 Graph Database 2
IBM SystemG Backend: http://systemg.research.ibm.com/
Primarily use SystemG as a graph database backend to manage dynamic graphs and provide social network analysis functionality
To respond to dynamic parameters (spatial, temporal and thematic) during a visual exploration process, the backend retrieves and transforms the corresponding graphs, performs the required graph analytics, and sends back the results
A whole-spectrum solution for large-scale graph processing, including graph storage, runtime, analytics and visualization
Use PageRank for demonstration purposes, where graph weights are defined as the number of OD records between an OD pair
Built-in support for web-based applications (socket mode and JSON support): easy to use and fast to prototype with
PageRank extension: consider not only graph structure (node degrees) but also edge weights
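The idea of a PageRank that accounts for edge weights can be sketched with a plain power iteration over a dense weight matrix. This is not SystemG's API or its implementation; the function name `weighted_pagerank`, the damping parameter, the fixed iteration count, and the uniform redistribution of rank from nodes without outgoing edges are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Weighted PageRank by power iteration.
   w is an n x n row-major matrix; w[i*n + j] is the weight of edge i -> j
   (here, the number of OD records from i to j). rank receives n values
   that sum to 1. Returns 0 on success, -1 on allocation failure. */
int weighted_pagerank(const double *w, int n, double damping, int iters,
                      double *rank) {
    assert(n > 0);
    size_t un = (size_t)n;
    double *out_sum = malloc(un * sizeof *out_sum);
    double *next = malloc(un * sizeof *next);
    if (!out_sum || !next) {
        free(out_sum);
        free(next);
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        out_sum[i] = 0.0;
        for (int j = 0; j < n; ++j) out_sum[i] += w[(size_t)i * un + (size_t)j];
        rank[i] = 1.0 / n;
    }
    for (int it = 0; it < iters; ++it) {
        /* Rank held by nodes with no outgoing weight is spread uniformly. */
        double dangling = 0.0;
        for (int i = 0; i < n; ++i)
            if (out_sum[i] == 0.0) dangling += rank[i];
        for (int j = 0; j < n; ++j)
            next[j] = (1.0 - damping) / n + damping * dangling / n;
        /* Each node passes its rank along edges in proportion to weight. */
        for (int i = 0; i < n; ++i) {
            if (out_sum[i] == 0.0) continue;
            for (int j = 0; j < n; ++j)
                next[j] += damping * rank[i] *
                           w[(size_t)i * un + (size_t)j] / out_sum[i];
        }
        for (int j = 0; j < n; ++j) rank[j] = next[j];
    }
    free(out_sum);
    free(next);
    return 0;
}
```

With unweighted PageRank, two destinations reached from the same origin receive equal shares; here a destination with three times as many trips receives three times the share.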

11 Web Frontend 3
Web frontend for geospatial and geosocial visual exploration using NYC Taxi Trip Data
All implemented in JavaScript (Google Maps API)
Network/Web communication
  JavaScript asynchronous function call
  JSON string encoding and parsing
  User data and Web-GIS API binding
Frontend geometry library
  MBR indexing
  Point-in-polygon test
  Line-polygon intersection
  Check the validity of interactively selected OD pairs
  Query graph weights of any OD pair using a map interface
GUI
  Spatial selection: polygon drawing
  Temporal selection: dropdown list
  OD indication: arrow/polyline drawing
Supports the information-seeking mantra: “Overview first, zoom and filter, then details on demand”
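The line-polygon intersection test in the frontend geometry library can be sketched with the standard orientation-based segment test. The actual frontend is written in JavaScript; C is used here only to stay consistent with the backend code on slide 9, and the function names are hypothetical. The ring is assumed to be closed (first vertex repeated last).

```c
#include <assert.h>
#include <stdbool.h>

/* Sign of the cross product (b - a) x (c - a): +1 left turn,
   -1 right turn, 0 collinear. */
static int orient(double ax, double ay, double bx, double by,
                  double cx, double cy) {
    double cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    return (cross > 0) - (cross < 0);
}

/* For p already known to be collinear with a-b: is p within the segment? */
static bool on_segment(double ax, double ay, double bx, double by,
                       double px, double py) {
    double xlo = ax < bx ? ax : bx, xhi = ax < bx ? bx : ax;
    double ylo = ay < by ? ay : by, yhi = ay < by ? by : ay;
    return px >= xlo && px <= xhi && py >= ylo && py <= yhi;
}

/* Do segments a-b and c-d share at least one point? */
bool segments_intersect(double ax, double ay, double bx, double by,
                        double cx, double cy, double dx, double dy) {
    int o1 = orient(ax, ay, bx, by, cx, cy);
    int o2 = orient(ax, ay, bx, by, dx, dy);
    int o3 = orient(cx, cy, dx, dy, ax, ay);
    int o4 = orient(cx, cy, dx, dy, bx, by);
    if (o1 != o2 && o3 != o4) return true;
    if (o1 == 0 && on_segment(ax, ay, bx, by, cx, cy)) return true;
    if (o2 == 0 && on_segment(ax, ay, bx, by, dx, dy)) return true;
    if (o3 == 0 && on_segment(cx, cy, dx, dy, ax, ay)) return true;
    if (o4 == 0 && on_segment(cx, cy, dx, dy, bx, by)) return true;
    return false;
}

/* Does segment a-b touch or cross the boundary of a closed ring? */
bool segment_crosses_ring(double ring[][2], int nv,
                          double ax, double ay, double bx, double by) {
    for (int j = 0; j < nv - 1; ++j)
        if (segments_intersect(ring[j][0], ring[j][1],
                               ring[j + 1][0], ring[j + 1][1],
                               ax, ay, bx, by))
            return true;
    return false;
}
```

This tests the boundary only: a segment lying entirely inside the polygon reports no crossing, so a containment check would need the point-in-polygon test as well.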

12 Demonstration #1 http://134.74.112.65/ibmjsa/web/
Interactive Spatial Query Processing Demonstration after Users Draw a Pair of OD Polygons

13 Demonstration #2 http://134.74.112.65/~you/geosocial/
Thematic selection (datasets)
Temporal selection (hours)
OD pair selection
Mapping PageRank results

14 Summary and Future Work
Summary
  Reported our work on developing a high-performance research platform to visually explore large-scale urban OD data in a web computing environment
  The platform integrates an in-memory parallel geospatial query processing backend and a graph database backend, and provides several novel web frontend modules for both functionality and efficiency
  Demonstrated preliminary implementations using NYC taxi trip data
Future work
  Extend the geospatial backend to efficiently support more types of spatial queries, in addition to the point-in-polygon test
  Work with the IBM SystemG development team to integrate spatial data processing functionality to support in-graph spatial queries
  Develop more intuitive visual gadgets for temporal selection in the web frontend

15 Acknowledgement
CISE/IIS Medium Collaborative Research Grants / : “Spatial Data and Trajectory Data Management on GPUs”
Joint Study Agreement (JSA #W ) between IBM T. J. Watson Research Center and CCNY
Q&A

