Published by Kerry Hunt. Modified over 9 years ago.
1
Small-Scale Raster Map Projection using the Compute Unified Device Architecture (CUDA)
U.S. Department of the Interior, U.S. Geological Survey
Michael P. Finn, Jing Li, and David Mattli
ISPRS Technical Commission IV Symposium on Geospatial Databases and Location Based Services
Suzhou, China, 14 – 16 May 2014
2
HPC Research / Motivation
Prime test case: map projection/reprojection for large raster datasets (“Big” Data?)
pRasterBlaster: mapIMG in an HPC environment
Solve problems using multiple processors
Currently testing within the NSF CyberGIS Project, leveraging XSEDE (a more traditional supercomputing (SC) environment)
How does the same problem compare, in a computational sense, between a CPU-dominated SC environment and a more lightweight, general-purpose GPU-dominated environment?
3
CUDA
A parallel computing platform and programming model created by NVIDIA
Allows GPUs to be used for general-purpose processing (not exclusively graphics)
GPUs have a parallel throughput architecture that favors executing many concurrent threads, each relatively slowly (rather than executing a single thread very quickly)
Accessible to software developers through libraries, compiler directives, and extensions to programming languages, including C, C++, and Fortran
4
Accurate Raster Reprojection in Three (primary) Steps
Step 1: Calculate and Partition Output Space
Step 2: Read Input and Reproject
Step 3: Combine Temporary Files
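Step 1 can be illustrated with a minimal sketch of partitioning the output rows into contiguous ranges, one per worker. The function name `partition_rows` and the row-wise split are assumptions made for illustration; the slides do not say how the output space is actually divided.

```python
def partition_rows(total_rows, n_parts):
    """Split rows 0..total_rows-1 into n_parts contiguous (start, stop) ranges.

    Hypothetical helper: stop is exclusive, and any remainder rows are
    spread over the first ranges so sizes differ by at most one.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    base, extra = divmod(total_rows, n_parts)
    ranges = []
    start = 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


if __name__ == "__main__":
    # Ten output rows shared among three workers.
    print(partition_rows(10, 3))
```

Each range could then be reprojected independently (Step 2) and written to its own temporary file for Step 3 to combine.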
5
The Equations
Projection Transformation Process: Framing
– The frame of a raster dataset defines the extent of the dataset in the projection space. It also defines the alignment of projection space with the input (often image) coordinate system.
    X = ULprojX + ((sample – 1) * pixelSizeX)      (1)
    Y = ULprojY – ((line – 1) * pixelSizeY)        (2)
– Alternatively:
    sample = ((X – ULprojX) / pixelSizeX) + 1      (3)
    line = ((ULprojY – Y) / pixelSizeY) + 1        (4)
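Equations (1) to (4) translate directly into a pair of helper functions. The following is a minimal sketch; the function and parameter names are illustrative and not taken from mapIMG or pRasterBlaster. Sample and line are 1-based, as in the equations.

```python
def pixel_to_projected(sample, line, ul_x, ul_y, pixel_size_x, pixel_size_y):
    """Equations (1) and (2): 1-based (sample, line) to projected (X, Y)."""
    x = ul_x + (sample - 1) * pixel_size_x
    y = ul_y - (line - 1) * pixel_size_y
    return x, y


def projected_to_pixel(x, y, ul_x, ul_y, pixel_size_x, pixel_size_y):
    """Equations (3) and (4): projected (X, Y) to 1-based (sample, line)."""
    sample = (x - ul_x) / pixel_size_x + 1
    line = (ul_y - y) / pixel_size_y + 1
    return sample, line


if __name__ == "__main__":
    # Hypothetical frame: upper-left corner at (-180, 90), 0.5-unit pixels.
    frame = (-180.0, 90.0, 0.5, 0.5)
    x, y = pixel_to_projected(3, 5, *frame)
    print(x, y)
    print(projected_to_pixel(x, y, *frame))
```

The two functions are inverses of each other, so a round trip through both should return the original sample and line (up to floating-point error).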
6
CUDA implementation
Four-corner-point-based map projection using CUDA
7
Raster Chunk Handling
Output chunks cannot be merged because of limited computing resources
8
Results
Configuration of the testing machine
– Intel quad-core CPU (i5-3450 @ 3.10 GHz)
– GeForce GT 640, 384 GPU cores
– 8 GB RAM
– NVIDIA CUDA SDK 5.5
– Visual Studio 2010
9
Results
CUDA configuration
– Block size: 256*1
– Chunk dimension: 1024
Resample GLC (original: ~900 MB), Equirectangular to Albers

File  Res    Dimension    Volume     CPU       GPU                  Ratio
1     10000  4003*2003    7.64 MB    257       5.54                 46
2     5000   8006*4003    30.56 MB   896 (C)   21.635 / 17.553 (C)  51
3     2000   20015*10008  191.03 MB  5548 (C)  105.173 (C)          52
4     1000   40030*20015  764.08 MB  NA        407.65 (C)

NA = Out of memory (8 GB) on test machine
10
Results
CUDA configuration
– Block size: 256*1
– Chunk dimension: 1024
Resample NLCD (original: 15.6 GB), Albers to Equirectangular

File  Res   Dimension    Volume     CPU   GPU     Ratio
1     1200  4030*2611    10.03 MB   321   3.06    104
2 C   600   8060*5221    40.13 MB   1091  12.58   86
3 C   300   16119*10442  160.52 MB  4350  52.01   83
4 C   150   32238*20885  642.10 MB  NA    204.93
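The Ratio column on the two Results slides appears consistent with the CPU figure divided by the GPU figure, truncated to a whole number. The slides do not state this definition, so the check below is only a sketch under that assumption. For GLC file 2, which lists two GPU figures, the one marked (C) is used.

```python
# (label, cpu, gpu, reported_ratio), copied from the two Results slides.
# Units are not stated on the slides, so none are assumed here.
RUNS = [
    ("GLC 1", 257, 5.54, 46),
    ("GLC 2", 896, 17.553, 51),   # assumption: the (C) GPU figure is the one used
    ("GLC 3", 5548, 105.173, 52),
    ("NLCD 1", 321, 3.06, 104),
    ("NLCD 2", 1091, 12.58, 86),
    ("NLCD 3", 4350, 52.01, 83),
]


def speedup(cpu, gpu):
    """CPU figure divided by GPU figure (assumed definition of Ratio)."""
    return cpu / gpu


def check_ratios(runs):
    """Return (label, speedup rounded to 2 places, matches reported ratio)."""
    results = []
    for label, cpu, gpu, reported in runs:
        s = speedup(cpu, gpu)
        results.append((label, round(s, 2), int(s) == reported))
    return results


if __name__ == "__main__":
    for label, s, ok in check_ratios(RUNS):
        print(label, s, "matches" if ok else "differs")
```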
11
Issues (1 of 2)
The inverse/forward map projection for Mollweide is not accurate
– Need to find the reasons why (should be a minor fix)
– Therefore, restricted the current testing to Equirectangular and Albers
The results of map projection were inaccurate due to a misapplied resampling method (minor fix)
The way the input data chunk is retrieved, based on the bounding box of the output chunk, may not be quite accurate
– Problem identified: chunks near the edges of the dataset need to have some overlap retrieved (negative coordinates)
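One possible way to handle the edge-chunk problem is to pad the bounding box of the inverse-projected chunk corners by a small overlap and then clamp it to the input raster. The sketch below is an illustration under stated assumptions, not the authors' fix: `input_window` is a hypothetical helper, coordinates are 0-based pixel indices (unlike the 1-based framing equations), and the default overlap of one pixel is arbitrary.

```python
import math


def input_window(corner_cols, corner_rows, in_width, in_height, overlap=1):
    """Return (col_off, row_off, n_cols, n_rows) for the input chunk to read.

    corner_cols / corner_rows: input-space pixel coordinates (0-based, may be
    fractional or negative) of the output chunk's corners after inverse
    projection. Returns None if the padded box misses the input entirely.
    """
    # Bounding box of the corners, padded by the overlap.
    min_col = math.floor(min(corner_cols)) - overlap
    max_col = math.ceil(max(corner_cols)) + overlap
    min_row = math.floor(min(corner_rows)) - overlap
    max_row = math.ceil(max(corner_rows)) + overlap

    # Clamp to the valid input extent; negative coordinates collapse to 0.
    col0 = max(min_col, 0)
    col1 = min(max_col, in_width - 1)
    row0 = max(min_row, 0)
    row1 = min(max_row, in_height - 1)

    if col0 > col1 or row0 > row1:
        return None
    return (col0, row0, col1 - col0 + 1, row1 - row0 + 1)


if __name__ == "__main__":
    # Corners that spill past the left edge of a 100 x 50 input raster.
    print(input_window([-2.5, 10.2, -1.0, 9.0], [3.1, 3.9, 20.0, 21.7], 100, 50, overlap=2))
```

A four-corner bounding box can still underestimate the needed input area when chunk edges curve in input space, which is one reason some overlap is useful.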
12
Issues (2 of 2)
Needs better memory management
– CPU: out-of-memory error even with chunking
– Suspect the test machine is not releasing memory in a timely fashion
GPU: not always stable; kernels may fail during execution; grid/block setup
– Workload may not be balanced very well
– Kernels can fail when sending too much data
– Using remote desktop to manipulate the data may cause issues
13
Conclusion
CUDA provides a lightweight, less expensive alternative to CPU parallel environments like supercomputers
In initial tests, raster map projection behaves similarly to established pRasterBlaster testing in CPU-dominated HPC environments
– Greater than one order of magnitude faster
More work necessary; issues remain
14
References
Behzad, Babak, Yan Liu, Eric Shook, Michael P. Finn, David M. Mattli, and Shaowen Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at the Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH.
Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2014). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag.
Finn, Michael P., Yan Liu, David M. Mattli, Qingfeng (Gene) Guan, Kristina H. Yamamoto, Eric Shook, and Babak Behzad (2012). pRasterBlaster: High-Performance Small-Scale Raster Map Projection Transformation Using the Extreme Science and Engineering Discovery Environment. Abstract presented at the XXII International Society for Photogrammetry & Remote Sensing Congress, Melbourne, Australia.
Finn, Michael P., Daniel R. Steinwand, Jason R. Trent, Robert A. Buehler, David Mattli, and Kristina H. Yamamoto (2012). A Program for Handling Map Projections of Small Scale Geospatial Raster Data. Cartographic Perspectives, Number 71, pages 53 – 67.
Liu, Yan, Michael P. Finn, Babak Behzad, and Eric Shook (2013). High-Resolution National Elevation Dataset: Opportunities and Challenges for High-Performance Spatial Analytics. Abstract presented in the Special Session on “Big Data,” American Society for Photogrammetry and Remote Sensing Annual Conference, Baltimore, Maryland.
Liu, Yan, Anand Padmanabhan, and Shaowen Wang (2014). CyberGIS Gateway for enabling data-rich geospatial research and education. Concurrency Computat.: Pract. Exper., DOI: 10.1002/cpe.3256.
Rey, S.J. (2014). “Open regional science.” Presidential Address, Western Regional Science Association, San Diego. February.
http://cegis.usgs.gov/
http://www.du.edu/nsm/departments/geography/
http://nationalmap.gov/3DEP/
http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php
http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Main_Page
http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Software:pRasterBlaster
15
Other Collaborators (primarily on the CyberGIS project)
Shaowen Wang, Anand Padmanabhan, Yan Liu – University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory
David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel – USGS, Center of Excellence for Geospatial Information Science (CEGIS)
Kristina H. Yamamoto – USGS, National Geospatial Technical Operations Center
Babak Behzad – UIUC, Department of Computer Science
Eric Shook – Kent State University, Department of Geography
Qingfeng (Gene) Guan – China University of Geosciences
16
Disclaimer Any use of trade, product, or firm names in this paper is for descriptive purposes only and does not imply endorsement by the U.S. Government.
17
Small-Scale Raster Map Projection using the Compute Unified Device Architecture (CUDA)
U.S. Department of the Interior, U.S. Geological Survey
QUESTIONS?
ISPRS Technical Commission IV Symposium on Geospatial Databases and Location Based Services
Suzhou, China, 14 – 16 May 2014
19
The block size is not directly related to the chunking concept. Block size is the number of threads within each GPU block; a related concept is grid size, the number of blocks in a launch. CUDA can launch many threads at the same time (e.g., 512 threads). All threads in a block are sent to the GPU processors together, but they may not all start executing at the same time (depending on how many GPU cores are available).
In my implementation, I assign each cell of the output image/chunk to a thread. If the output image has a dimension of 256*256 and the block size is 16*16, then the grid size is (256/16)*(256/16) = 16*16. If the output image has a dimension of 250*250 and the block size is 16*16, then the grid size is (250/16)*(250/16) = 15.x*15.x, which rounds up to 16*16. This implies that the last few blocks have less data (e.g., 16*10). So the selection of the block size is determined by the number of GPU cores as well as the dimensions of the image.
When dealing with a large image that cannot be read into CPU main memory all at once, the image should be divided into chunks. Each chunk then becomes an input image, and the GPU processes it.
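The grid-size arithmetic above amounts to a ceiling division of the image dimensions by the block dimensions. The sketch below emulates that host-side calculation in Python rather than CUDA; the function names are illustrative and not taken from the implementation described in these notes.

```python
def grid_size(width, height, block_x=16, block_y=16):
    """Blocks needed to cover a width x height image (ceiling division)."""
    grid_x = (width + block_x - 1) // block_x
    grid_y = (height + block_y - 1) // block_y
    return grid_x, grid_y


def active_cells_in_block(bx, by, width, height, block_x=16, block_y=16):
    """Cells of the image actually covered by block (bx, by).

    Edge blocks are only partly filled, so their extra threads need a
    bounds check and should do no work.
    """
    cols = min(block_x, width - bx * block_x)
    rows = min(block_y, height - by * block_y)
    return cols, rows


if __name__ == "__main__":
    print(grid_size(256, 256))                      # evenly divisible case
    print(grid_size(250, 250))                      # rounds up
    print(active_cells_in_block(15, 0, 250, 250))   # block on the right edge
    print(grid_size(1024, 1024, 256, 1))            # 256*1 blocks on a 1024 chunk
```

In the 250*250 case, the blocks along the right edge cover only 10 of their 16 columns, which matches the 16*10 example in the notes.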