1
MASS CUDA Performance Analysis and Improvement
Ahmed Musazay
Faculty Advisor: Dr. Munehiro Fukuda
2
MASS: Multi-Agent Spatial Simulation
- Allows non-computing specialists to parallelize simulations
- Built around the concepts of Place and Agent objects
- Three versions: C++, Java, CUDA
- Provides a high-level abstraction for non-computing specialists
3
CUDA: C/C++ extension by NVIDIA
- A heterogeneous parallel programming interface: host (CPU) and device (GPU)
- Functions that execute on the GPU are called kernel functions
- Kernel launches take configuration parameters for the number of blocks and threads
- Fast, but difficult to use and hard to tune for performance
- MASS CUDA aims to exploit this performance while keeping the high-level abstraction of MASS
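A minimal sketch of what a kernel function and its launch configuration look like; the kernel name, data, and block size below are illustrative and not part of MASS CUDA.

```cpp
#include <cuda_runtime.h>

// Illustrative kernel: each thread scales one element of the array.
__global__ void scale(double *data, double factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) data[i] *= factor;                    // guard against out-of-range threads
}

int main() {
    const int n = 1 << 20;
    double *d_data;
    cudaMalloc(&d_data, n * sizeof(double));         // allocate on the device (GPU)

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d_data, 0.5, n);   // <<<blocks, threads>>> configuration
    cudaDeviceSynchronize();                         // wait for the kernel to finish

    cudaFree(d_data);
    return 0;
}
```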
4
MASS CUDA
- Current version written by Nathaniel Hart for his Master's thesis, porting the C++ version of MASS to CUDA
- Object-oriented: allows users to extend the Place and Agent objects
- Designed with the intention of supporting multiple GPU cards
5
Problem
- Performance issues
- Difficult to tune performance

Goals of the project:
- Understand the MASS library and how it works
- Write unit tests to find where performance issues occur
- Propose solutions that can be implemented to increase the performance of MASS CUDA
6
Heat2D
- Simulation of Fourier's heat equation, describing the spread of heat in a given region over a period of time
- Place objects: Metal
- Run at four different sizes: 250x250, 500x500, 1000x1000, 2000x2000
7
Test Case: Running Heat2D - Primitive Array
- Heat2D simulation using a flat array of doubles
- No objects created to hold state, in contrast to MASS
- Simulation functions written as CUDA kernel functions (sketched below)
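A minimal sketch of what one Heat2D time step over a flat, row-major array of doubles could look like under this test's approach; the kernel name, the constant r, and the boundary handling are assumptions, not the exact code used in the test.

```cpp
// One time step of the discretized Fourier heat equation over a flat array.
__global__ void heatStep(const double *in, double *out, int width, int height, double r) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;  // skip boundary cells

    int idx = y * width + x;  // 2D coordinates mapped to a 1D index
    // New temperature from the four neighbors (simple explicit Euler step).
    out[idx] = in[idx] + r * (in[idx - 1] + in[idx + 1]
                            + in[idx - width] + in[idx + width]
                            - 4.0 * in[idx]);
}
```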
8
Results
9
Proposed Solution
- Store all data in MASS as user-defined arrays of primitive types
- Index mapping gives each place a unique element (see the sketch below)
Pros:
- Fast accesses
- Can run larger simulations, since less heap memory overhead is required
Cons:
- Reduced user programmability
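A minimal sketch of the index-mapping idea, assuming a row-major layout; the helper and array names are hypothetical.

```cpp
// Hypothetical index mapping: each (x, y) place owns exactly one slot in a
// flat, row-major primitive array, so kernels can access a place's state
// without per-object heap allocation.
__host__ __device__ inline int placeIndex(int x, int y, int width) {
    return y * width + x;   // unique element per place
}

// Example use: scale every place's temperature in a user-defined array.
__global__ void scalePlaces(double *temperatures, int width, int height, double factor) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        temperatures[placeIndex(x, y, width)] *= factor;   // direct access to this place's state
}
```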
10
Test Case: Running Heat2D - Place objects
- Ran the simulation with the same objects used in MASS, but without library function calls
- Metal and MetalState derived from the library classes, containing the same memory and internal functions (rough sketch below)
- Simulation functions rewritten in CUDA as kernel functions
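A rough sketch of the object layout this test mirrors. The Place and PlaceState base classes below are simplified stand-ins written for illustration; they are not the actual MASS CUDA headers or method signatures.

```cpp
// Per-place state kept in device memory.
struct PlaceState {
    double temperature;
};

// Base class that user simulations extend (simplified stand-in).
class Place {
public:
    PlaceState *state = nullptr;
    __device__ virtual void callMethod(int functionId) {}
};

// Heat2D-specific state: double-buffered temperature for the next time step.
struct MetalState : public PlaceState {
    double nextTemperature;
};

// User-defined Place: dispatches to the Heat2D simulation functions.
class Metal : public Place {
public:
    __device__ void callMethod(int functionId) override {
        // e.g. functionId selects which Heat2D step (apply heat, exchange heat) to run
    }
};
```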
11
Results
12
Proposed Solution
- Remove unnecessary functionality that may be slowing the library down:
  - Excessive memory transfers between host and device (see the transfer pattern below)
  - Partitioning logic
Pros:
- Can add a single library feature at a time, making sure each meets the performance standard
- More computation is spent on the actual simulation rather than on management
Cons:
- Scalability of the library will be missing early in development
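A sketch of the transfer pattern this proposal argues for: upload the grid once, run every time step on the device, and download once at the end. It reuses the illustrative heatStep kernel sketched earlier; the function name, sizes, and step count are assumptions.

```cpp
#include <cuda_runtime.h>
#include <utility>

// Forward declaration of the illustrative kernel sketched earlier.
__global__ void heatStep(const double *in, double *out, int width, int height, double r);

// Upload once, iterate on the device, download once: avoids a host<->device
// copy on every time step.
void runHeat2D(double *h_grid, int width, int height, int numSteps, double r) {
    size_t bytes = size_t(width) * height * sizeof(double);
    double *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_grid, bytes, cudaMemcpyHostToDevice);   // single upload
    cudaMemcpy(d_out, h_grid, bytes, cudaMemcpyHostToDevice);  // keep boundary cells valid in both buffers

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    for (int step = 0; step < numSteps; ++step) {
        heatStep<<<grid, block>>>(d_in, d_out, width, height, r);
        std::swap(d_in, d_out);                                 // ping-pong buffers on the device
    }

    cudaMemcpy(h_grid, d_in, bytes, cudaMemcpyDeviceToHost);    // single download at the end
    cudaFree(d_in);
    cudaFree(d_out);
}
```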
13
Test Case: Running Heat2D – Coalesced Accesses
- Ran the simulation using primitive values, but taking advantage of coalesced memory accesses
- Kernel functions take array parameters in their native dimension (here a 2D array)
- Allocated with cudaMallocPitch() / cudaMalloc3D() (sketched below)
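A sketch of how a pitched 2D allocation and the matching kernel indexing could look; cudaMallocPitch() pads each row so that row starts are aligned, which helps neighboring threads issue coalesced reads. The kernel and helper names are illustrative.

```cpp
#include <cuda_runtime.h>

// Heat step over a pitched 2D array: rows are `pitch` bytes apart, so row
// addresses are computed in bytes rather than elements.
__global__ void heatStepPitched(const double *in, double *out, size_t pitch,
                                int width, int height, double r) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;

    const double *row  = (const double *)((const char *)in + y * pitch);
    const double *up   = (const double *)((const char *)in + (y - 1) * pitch);
    const double *down = (const double *)((const char *)in + (y + 1) * pitch);
    double *outRow     = (double *)((char *)out + y * pitch);

    outRow[x] = row[x] + r * (row[x - 1] + row[x + 1] + up[x] + down[x] - 4.0 * row[x]);
}

// Pitched allocation: width is passed in bytes; the returned pitch is the
// padded row length in bytes.
void allocatePitched(double **d_grid, size_t *pitch, int width, int height) {
    cudaMallocPitch((void **)d_grid, pitch, width * sizeof(double), height);
}
```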
14
Results
15
Proposed Solution
- Let MASS run the simulation in its native dimension (1D, 2D, or 3D)
Pros:
- Faster memory accesses, increasing performance
Cons:
- Extra overhead of determining which dimensionality to run a function in
- Only up to three dimensions can be run natively
16
Conclusion
- Remove unused features and implement one feature at a time
- Use coalesced memory accesses with native array dimensions
- Store state in primitive arrays
- Also consider: shared memory (sketched below)
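A sketch of the shared-memory idea, assuming a simple tiled stencil: each block stages a TILE x TILE region plus a one-cell halo in shared memory, so neighbor reads come from on-chip memory instead of global memory. The tile size and kernel name are assumptions.

```cpp
#define TILE 16

__global__ void heatStepShared(const double *in, double *out,
                               int width, int height, double r) {
    __shared__ double tile[TILE + 2][TILE + 2];   // block tile plus one-cell halo

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    int lx = threadIdx.x + 1;                     // local coordinates offset by the halo
    int ly = threadIdx.y + 1;
    bool inBounds = (x < width && y < height);
    int idx = y * width + x;

    if (inBounds) {
        tile[ly][lx] = in[idx];                                            // this cell
        if (threadIdx.x == 0 && x > 0)      tile[ly][0]        = in[idx - 1];       // left halo
        if (lx == TILE && x < width - 1)    tile[ly][TILE + 1] = in[idx + 1];       // right halo
        if (threadIdx.y == 0 && y > 0)      tile[0][lx]        = in[idx - width];   // top halo
        if (ly == TILE && y < height - 1)   tile[TILE + 1][lx] = in[idx + width];   // bottom halo
    }
    __syncthreads();   // every thread in the block reaches this barrier

    if (inBounds && x > 0 && y > 0 && x < width - 1 && y < height - 1) {
        out[idx] = tile[ly][lx] + r * (tile[ly][lx - 1] + tile[ly][lx + 1]
                                     + tile[ly - 1][lx] + tile[ly + 1][lx]
                                     - 4.0 * tile[ly][lx]);
    }
}
```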
17
Final Words
Relevant courses:
- CSS 430 Operating Systems
- CSS 422 Hardware and Computer Organization
Special thanks to:
- Dr. Fukuda
- Nathaniel Hart