Outline
Introduction: Big Spatial Data, GPU Computing and Distributed Platforms
Spatial Query Processing on GPUs
ISP: System Architecture and Implementations
Experiments: Setup, Single-Node Performance, Scalability on Amazon EC2 Clusters
Summary and Future Work
Taxi trip data in NYC
Taxicabs: 13,000 medallion taxi cabs; a license is priced at > $1M; car services and taxi services are separate
Taxi trip records: ~170 million trips (300 million passengers) in 2009, about 1/5 of subway ridership and 1/3 of bus ridership in NYC
Taxi trip data in NYC Over all distributions of trip distance, time, speed and fare (2009)
Taxi trip data in NYC
How to manage taxi trip data?
Geographical Information Systems (GIS)
Spatial Databases (SDB)
Moving Object Databases (MOD)
How good are they? Pretty good for small amounts of data, but rather poor for large-scale data
Taxi trip data in NYC: can we do better?
Example 1: loading 170 million taxi pickup locations into PostgreSQL
UPDATE t SET PUGeo = ST_SetSRID(ST_Point("PULong","PuLat"),4326);
105.8 hours!
Example 2: finding the nearest tax blocks for 170 million taxi pickup locations using open-source libspatialindex + GDAL
30.5 hours!
(Intel Xeon 2.26 GHz processors with 48 GB memory)
I do not have time to wait... Can we do better?
Cloud computing + MapReduce + Hadoop
[Hardware architecture diagram: multi-core CPU host (cores, local/shared caches, DRAM, HDD/SSD) connected over the PCI-E ring bus to a GPU (SIMD, GDRAM) and a MIC (4-thread in-order cores)]
16 Intel Sandy Bridge CPU cores + 128 GB RAM + 8 TB disk + GTX TITAN + Xeon Phi 3120A ~ $9,994
Cloudera Impala: Attractive Features
SQL frontend: translates SQL queries into execution plans
C/C++ backend with SSE4 support (for string operations)
Efficient implementations of hash joins (partitioned and non-partitioned)
LLVM-based JIT
....
http://www.slideshare.net/hadooparchbook/impala-architecture-presentation
Extension is challenging!
Feb. 2013: 7.1 billion transistors (551 mm²), 2,688 processors
4.5 TFLOPS SP and 1.3 TFLOPS DP
Max bandwidth: 288.4 GB/s
PCI-E peripheral device, 250 W (17.98 GFLOPS/W SP)
Suggested retail price: $999
ASCI Red, 1997: first 1-TFLOPS (sustained) system, with 9,298 Intel Pentium II Xeon processors in 72 cabinets
What can we do today using a device that is more powerful than ASCI Red 16 years ago?
Outline
Introduction: Big Spatial Data, GPU Computing and Distributed Platforms
Spatial Query Processing on GPUs
ISP: System Architecture and Implementations
Experiments: Setup, Single-Node Performance, Scalability on Amazon EC2 Clusters
Summary and Future Work
Spatial query processing on GPUs
Single-level grid-file based spatial filtering: vertices (polygons/polylines) vs. points
Perfectly coalesced memory accesses
Utilizes GPU floating-point computing power
Nested-loop based refinement
J. Zhang, S. You and L. Gruenwald, "Parallel Online Spatial and Temporal Aggregations on Multi-core CPUs and Many-Core GPUs," Information Systems, vol. 44, pp. 134-154, 2014.
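The filter-and-refine scheme above can be sketched in sequential C++. This is an illustration only, not the actual ISP/GPU kernel code: the uniform cell size, the 64-bit cell key, and the names build_grid/pip_join are assumptions. Filtering buckets points into grid cells so that only points whose cells overlap a polygon's bounding box reach the exact (and more expensive) nested-loop refinement.

```cpp
// Filter-and-refine sketch of grid-file based spatial filtering plus
// nested-loop point-in-polygon refinement (illustrative CPU version).
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Point { double x, y; };
using Polygon = std::vector<Point>;  // simple closed ring

// Combine cell coordinates into a single 64-bit grid key.
static long long cell_key(long long cx, long long cy) {
    return (cx << 32) ^ (cy & 0xffffffffLL);
}

// Filtering step: bucket every point into a uniform grid cell.
std::unordered_map<long long, std::vector<int>>
build_grid(const std::vector<Point>& pts, double cell) {
    std::unordered_map<long long, std::vector<int>> grid;
    for (int i = 0; i < static_cast<int>(pts.size()); ++i) {
        long long cx = static_cast<long long>(std::floor(pts[i].x / cell));
        long long cy = static_cast<long long>(std::floor(pts[i].y / cell));
        grid[cell_key(cx, cy)].push_back(i);
    }
    return grid;
}

// Refinement step: exact ray-casting point-in-polygon test.
bool point_in_polygon(const Point& p, const Polygon& poly) {
    bool inside = false;
    for (std::size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        if ((poly[i].y > p.y) != (poly[j].y > p.y) &&
            p.x < (poly[j].x - poly[i].x) * (p.y - poly[i].y) /
                      (poly[j].y - poly[i].y) + poly[i].x)
            inside = !inside;
    }
    return inside;
}

// Join: only points in grid cells overlapping the polygon's bounding
// box are passed to the refinement loop.
std::vector<int> pip_join(const std::vector<Point>& pts,
                          const Polygon& poly, double cell) {
    auto grid = build_grid(pts, cell);
    double minx = poly[0].x, maxx = poly[0].x;
    double miny = poly[0].y, maxy = poly[0].y;
    for (const Point& v : poly) {
        minx = std::min(minx, v.x); maxx = std::max(maxx, v.x);
        miny = std::min(miny, v.y); maxy = std::max(maxy, v.y);
    }
    std::vector<int> hits;
    for (long long cx = static_cast<long long>(std::floor(minx / cell));
         cx <= static_cast<long long>(std::floor(maxx / cell)); ++cx) {
        for (long long cy = static_cast<long long>(std::floor(miny / cell));
             cy <= static_cast<long long>(std::floor(maxy / cell)); ++cy) {
            auto it = grid.find(cell_key(cx, cy));
            if (it == grid.end()) continue;
            for (int i : it->second)  // nested-loop refinement per cell
                if (point_in_polygon(pts[i], poly)) hits.push_back(i);
        }
    }
    return hits;
}
```

In the GPU version the slide describes, grouping points by cell is what enables coalesced memory accesses during the refinement pass.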
Spatial query processing on GPUs
Datasets: 38,794 census blocks (470,941 points); 735,488 tax blocks (4,698,986 points); 147,011 street segments

            P2N-D    P2P-T    P2P-D
CPU time    -        15.2 h   30.5 h
GPU time    10.9 s   11.2 s   33.1 s
Speedup     -        4,900X   3,200X

Algorithmic improvement: 3.7X
Using main-memory data structures: 37.4X
GPU acceleration: 24.3X
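A quick arithmetic check, using only the numbers on the slide: the end-to-end CPU-hours vs. GPU-seconds ratios reproduce the reported speedups, and (assuming the three improvement factors compose multiplicatively) their product lands close to the end-to-end P2P-D figure.

```cpp
// Sanity-check the reported speedups from the slide's numbers.
#include <cassert>

// End-to-end speedups implied by the table (CPU hours vs. GPU seconds).
constexpr double kP2PTSpeedup = 15.2 * 3600.0 / 11.2;  // ~4,886X (reported ~4,900X)
constexpr double kP2PDSpeedup = 30.5 * 3600.0 / 33.1;  // ~3,317X (reported ~3,200X)

// Product of the three factors: algorithmic x main-memory x GPU.
constexpr double kCombined = 3.7 * 37.4 * 24.3;        // ~3,363X
```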
Outline
Introduction: Big Spatial Data, GPU Computing and Distributed Platforms
Spatial Query Processing on GPUs
ISP-MC+ and ISP-GPU: System Architecture and Implementations
Experiments: Setup, Single-Node Performance, Scalability on Amazon EC2 Clusters
Summary and Future Work
Spatial extensions to the SQL frontend: pip_join(…), nearest_join(…), create_rtree(…)

class SpatialJoinNode : public BlockingJoinNode {
 public:
  SpatialJoinNode(ObjectPool* pool, const TPlanNode& tnode,
                  const DescriptorTbl& descs);
  virtual Status Prepare(RuntimeState* state);
  virtual Status GetNext(RuntimeState* state, RowBatch* row_batch, bool* eos);
  virtual void Close(RuntimeState* state);

 protected:
  virtual Status InitGetNext(TupleRow* first_left_row);
  virtual Status ConstructBuildSide(RuntimeState* state);

 private:
  boost::scoped_ptr<TPlanNode> thrift_plan_node_;
  RuntimeState* runtime_state_;
  …
};
ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
Outline
Introduction: Big Spatial Data, GPU Computing and Distributed Platforms
Spatial Query Processing on GPUs
ISP: System Architecture and Implementations
Experiments: Setup, Single-Node Performance, Scalability on Amazon EC2 Clusters
Summary and Future Work
Taxi trip data in NYC
Taxicabs: 13,000 medallion taxi cabs; a license is priced at > $1M; car services and taxi services are separate
Taxi trip records: ~170 million trips (300 million passengers) in 2009, about 1/5 of subway ridership and 1/3 of bus ridership in NYC
Global Biodiversity Data at GBIF (http://gbif.org)

SELECT aoi_id, sp_id, sum(ST_Area(inter_geom))
FROM (
  SELECT aoi_id, sp_id, ST_Intersection(sp_geom, qw_geom) AS inter_geom
  FROM SP_TB, QW_TB
  WHERE ST_Intersects(sp_geom, qw_geom)
)
GROUP BY aoi_id, sp_id
HAVING sum(ST_Area(inter_geom)) > T;
Single-node results (16-core CPU, 128 GB RAM, GTX Titan):

              ISP-GPU   ISP-MC+   GPU-Standalone   MC-Standalone
taxi-nycb (s)    96       130          50               89
GBIF-WWF (s)   1822      2816        1498             2664

taxi-nycb: ~170 million points, ~40 thousand polygons (9 vertices/polygon)
GBIF-WWF: ~375 million points, ~15 thousand polygons (279 vertices/polygon)

Cluster results: 2-10 Amazon EC2 nodes, each with 8 vCPU cores/15 GB RAM and 1,536 CUDA cores/4 GB (50 million species locations used due to memory constraint)
Outline
Introduction: Big Spatial Data, GPU Computing and Distributed Platforms
Spatial Query Processing on GPUs
ISP: System Architecture and Implementations
Experiments: Setup, Single-Node Performance, Scalability on Amazon EC2 Clusters
Summary and Future Work
Summary and Future Work
Designed and implemented an in-memory spatial data management system on multi-core CPU and many-core GPU clusters by extending Cloudera Impala for distributed spatial join query processing.
Experiments on the initial implementations have revealed both advantages and disadvantages of extending a tightly-coupled big data system to support spatial data types and their operations.
Alternative techniques are being developed to further improve efficiency, scalability, extensibility and portability.
Alternative Techniques
SpatialSpark: just open-sourced (http://simin.me/projects/spatialspark/)

val sc = new SparkContext(conf)
// read left-side data from HDFS and pre-process
val leftData = sc.textFile(leftFile, numPartitions)
  .map(x => x.split(SEPARATOR)).zipWithIndex()
val leftGeometryById = leftData
  .map(x => (x._2, Try(new WKTReader().read(x._1.apply(leftGeometryIndex)))))
  .filter(_._2.isSuccess).map(x => (x._1, x._2.get))
// similarly for the right-side data ...
// ready for the (broadcast-based) spatial join
val joinPredicate = SpatialOperator.Within  // NearestD can be applied similarly
var matchedPairs: RDD[(Long, Long)] =
  BroadcastSpatialJoin(sc, leftGeometryById, rightGeometryById, joinPredicate)
Alternative Techniques Lightweight Distributed Execution Engine for Large-Scale Spatial Join Query Processing http://www-cs.engr.ccny.cuny.edu/~jzhang/papers/lde_spatial_tr.pdf