Parallelizing Raster-Based Functions in GIS with CUDA C

Sean Kirby, University of Maryland, Eastern Shore
William Kostan, University of Virginia
Dr. Arthur J. Lembo, Jr., Salisbury University

Introduction: A continuing challenge in GIScience and many other fields is the efficient and economical processing of massive data sets. We describe how to leverage massively parallel processing for raster GIS operations through general-purpose computing on the graphics processing unit (GPGPU). Using video-gaming cards to create parallel processing solutions within the GPU is relatively new and almost nonexistent in GIScience. By designing a software add-on that runs select raster analysis operations on a CUDA-enabled GPU, we were able to achieve a 7x speed improvement compared to traditional (ArcGIS) raster processes. In addition, our approach allows us to perform raster processing in under an hour for 95GB files with over 25 billion pixels. Finally, we identify key areas of future work to improve execution time.

Data and Methods: Fourteen ESRI FLT raster files (ranging from 24MB to 12GB) were processed and analyzed using 3x3 kernel functions for slope, aspect, terrain ruggedness, min, max, range, and mean. These functions were chosen because they are "embarrassingly" parallel and can be run independently of one another, making them ideal for GPU processing. Each file was processed 20 times to determine the average speed using both ArcGIS and CUDA-C. The CUDA-C process (Figure 1) included:

1. Partition the data by rows and read it into an array in memory.*
2. Copy the data array to the GPU.
3. Launch a GPU kernel for raster processing.
4. Process the data in parallel on the GPU through a combination of blocks and threads.
5. Upon completion of each kernel, copy the data from the GPU back to the CPU.
6. Write the output to the appropriate file (one file per raster function).
7. Launch additional kernels against the data already on the GPU.

* Additional speed improvements may be achieved by running this step as separate, parallel threads on the CPU, thereby allowing the CPU to prepare the next data block while the GPU is processing data.
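The following is a minimal CUDA-C sketch of this process: one of the 3x3 kernel functions (a focal mean) plus the host-side loop for steps 1-7. The function names, the NODATA handling, and the readBlock()/writeBlock() helpers are illustrative assumptions rather than the authors' actual code, and the halo rows needed at row-block boundaries are omitted for brevity.

// Sketch of one 3x3 focal function (a focal mean) and the host loop for
// steps 1-7 above. Names and I/O helpers are illustrative placeholders.

#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void focalMean3x3(const float *in, float *out,
                             int width, int height, float nodata)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= width || row >= height) return;

    // Edge cells get NODATA; interior cells average their 3x3 neighborhood.
    if (col == 0 || row == 0 || col == width - 1 || row == height - 1) {
        out[row * width + col] = nodata;
        return;
    }
    float sum = 0.0f;
    for (int dr = -1; dr <= 1; ++dr)
        for (int dc = -1; dc <= 1; ++dc)
            sum += in[(row + dr) * width + (col + dc)];
    out[row * width + col] = sum / 9.0f;
}

// Hypothetical file I/O helpers (would read/write one row block of the FLT file).
void readBlock(int blockIndex, float *dst);
void writeBlock(int blockIndex, const float *src);

void processRaster(int width, int blockRows, int numBlocks, float nodata)
{
    size_t bytes = (size_t)width * blockRows * sizeof(float);
    float *h_in  = (float *)malloc(bytes);        // row block held in RAM
    float *h_out = (float *)malloc(bytes);
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);

    dim3 threads(16, 16);
    dim3 blocks((width + 15) / 16, (blockRows + 15) / 16);

    for (int b = 0; b < numBlocks; ++b) {
        readBlock(b, h_in);                                          // step 1
        cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);       // step 2
        focalMean3x3<<<blocks, threads>>>(d_in, d_out,               // steps 3-4
                                          width, blockRows, nodata);
        cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);     // step 5
        writeBlock(b, h_out);                                        // step 6
        // Step 7: additional kernels (slope, aspect, range, ...) can be
        // launched here against d_in before the next block is read.
    }

    cudaFree(d_in); cudaFree(d_out);
    free(h_in); free(h_out);
}

Each GPU thread computes a single output cell, which is what makes these focal functions "embarrassingly" parallel.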
Results and Discussion: On average, the CUDA implementation provided a 7x improvement over the same function in ArcGIS. Execution time for the CUDA functions exhibited a very stable, predictable linear trend across the various raster sizes, allowing us to better predict response times for larger data sets. To test the linear fit, we forecast execution time out to a 100GB (25 billion pixel) raster and predicted an execution time of XXX; our actual processing of the file completed in under XXX seconds. Unfortunately, that file was too large to process in ArcGIS, so a direct comparison between the two products was not possible. In contrast, while most ArcGIS execution times appear linear with respect to size, slope and aspect deviated substantially from a linear fit when the input raster was between 1.4GB and 4.0GB. We suspect the deviation is related to the ESRI ADF format's separation of data into files of between 1.2GB and 2GB, or to a separate algorithm being used for datasets over some threshold size. The greatest bottleneck was reading data from disk and moving it into RAM (Figure 1, step 1). Once data was on the GPU, parallelization took over and made short work of the processing. Numerous strategies were used to improve the data acquisition step, with noticeable results. Testing on other hardware configurations confirmed that having more GPU memory, and thus the ability to load larger chunks of data, affected execution time more than having more processing streams.

Conclusions and Future Considerations: Parallel processing of raster functions was 3-22 times faster than ArcGIS, depending on file size. Processing multiple functions on a single dataset achieved even greater performance gains (Figure 4). A final test completed slope processing of a 95GB, 25 billion pixel raster file in under 3,600 seconds, indicating that the linear performance continues even with enormous file sizes. Future work should include using parallel CPU threads to read data into RAM while the GPU processes the previous block, which we believe will provide the most dramatic performance increase; a sketch of this overlap appears at the end of this transcript. Additionally, because the greatest efficiencies were demonstrated when performing multiple computations on data already resident on the GPU, future work should explore similar models in climatology, ecology, remote sensing, and natural resource assessment. Many GIS models that require multiple iterations could see enormous speed improvements if redesigned for the GPU.

Figure 1: CUDA execution model.
Figure 2: LiDAR imagery of Oxford County, MD, before (black and white) and after (hues of green and red) slope transformation.
Figure 3: Comparison of ArcGIS and CUDA-C performance for several raster functions, as measured by execution time, shows that CUDA processes the data 3-22 times faster than the corresponding ArcGIS process. In addition, the CUDA execution times were highly linear (R² = 0.96), while the ArcGIS process exhibited serious instability between 1.25GB and 4GB for slope and aspect.
Figure 4: CUDA vs. ArcGIS performance in executing six spatial analysis transformations per dataset, as measured by execution time.
Figure 5: The CUDA Toolbox: the CUDA program integrated into ArcMap 10.0 as a toolbox with a full graphical user interface (GUI).

Final Software Interface: One of the primary goals of the project was to create a simple add-on to ArcGIS that delivers a dramatic performance improvement. This was accomplished by incorporating the CUDA executable into a model in ArcToolbox, for use on any computer with ArcGIS and a CUDA-enabled GPU (Figure 5).
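As a closing illustration of the future-work item in the Conclusions, here is a hypothetical, double-buffered variant of the host loop sketched earlier, in which a separate CPU thread reads the next row block from disk while the GPU processes the current one. It reuses the assumed focalMean3x3 kernel and readBlock()/writeBlock() placeholders and uses C++ std::thread purely for illustration; it is a sketch of the idea, not the authors' implementation.

// Hypothetical double-buffered host loop: a CPU thread reads block b+1 from
// disk while the GPU processes block b. Reuses the placeholders declared in
// the earlier sketch.

#include <thread>
#include <utility>      // std::swap
#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void focalMean3x3(const float *in, float *out,
                             int width, int height, float nodata);
void readBlock(int blockIndex, float *dst);      // hypothetical I/O helpers
void writeBlock(int blockIndex, const float *src);

void processRasterOverlapped(int width, int blockRows, int numBlocks, float nodata)
{
    size_t bytes = (size_t)width * blockRows * sizeof(float);
    float *h_cur  = (float *)malloc(bytes);   // block currently being processed
    float *h_next = (float *)malloc(bytes);   // block being read by the CPU thread
    float *h_out  = (float *)malloc(bytes);
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);

    dim3 threads(16, 16);
    dim3 blocks((width + 15) / 16, (blockRows + 15) / 16);

    readBlock(0, h_cur);                          // prime the first block
    for (int b = 0; b < numBlocks; ++b) {
        // Start reading the next block on a separate CPU thread (Figure 1, step 1).
        std::thread reader;
        if (b + 1 < numBlocks)
            reader = std::thread(readBlock, b + 1, h_next);

        // Meanwhile the GPU processes the current block (steps 2-6).
        cudaMemcpy(d_in, h_cur, bytes, cudaMemcpyHostToDevice);
        focalMean3x3<<<blocks, threads>>>(d_in, d_out, width, blockRows, nodata);
        cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
        writeBlock(b, h_out);

        if (reader.joinable()) reader.join();     // wait for the disk read to finish
        std::swap(h_cur, h_next);                 // the prefetched block becomes current
    }

    cudaFree(d_in); cudaFree(d_out);
    free(h_cur); free(h_next); free(h_out);
}

With pinned host buffers and cudaMemcpyAsync on a dedicated stream, the host-to-device copy itself could also be overlapped with kernel execution, but even this simple variant targets the disk-to-RAM bottleneck identified in the Results.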