Slide 1
Sparse Matrix-Dense Vector Multiply on G80: Probing the CUDA Parameter Space
Comp 790 GPGPU Project
Stephen Olivier
Slide 2
Currently…
Have a working "naïve" implementation in which each thread computes one dot product (similar to Sashi's implementation): 1.26 GFLOPs, 7.56 GB/s for n = 32k, nz/row = 20
In the midst of implementing a version that stores the input vector in texture memory, which is cached
Also developing an analytic model to express the parameterization of work and data partitioning to suit the G80
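The naïve scheme above maps one thread to one row of the CSR matrix. A minimal serial reference in C makes the mapping concrete; in the CUDA kernel, the body of the outer loop below is the work of a single thread (the function name and signature are illustrative, not the project's actual code):

```c
/* Serial reference for CSR sparse matrix-vector multiply, y = A*x.
   In the naive CUDA kernel, the outer loop disappears: thread i
   (indexed by its global thread ID) computes y[i] as the dot product
   of row i with x. */
void spmv_csr(int n, const int *row_ptr, const int *col_idx,
              const float *val, const float *x, float *y) {
    for (int i = 0; i < n; i++) {               /* one CUDA thread per row */
        float dot = 0.0f;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            dot += val[j] * x[col_idx[j]];      /* x access is irregular */
        y[i] = dot;
    }
}
```

The indirect access `x[col_idx[j]]` is the irregular, reuse-carrying load that motivates the texture-memory variant: the texture cache can capture reuse of `x` entries that uncached global loads cannot.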
Slide 3
Pertinent Constraints
Available parallelism
Potential reuse
Capacity constraints of the various memories
Multithreading constraints
Thread/block/grid layout
Data distribution and blocking for the memory hierarchy
Amount of sequential work done for latency hiding
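The capacity and multithreading constraints interact: registers, shared memory, and the resident-thread limit jointly bound how many blocks an SM can hold. A sketch of that calculation, assuming the published G80 per-SM limits (768 resident threads, 8 resident blocks, 8192 registers, 16 KB shared memory):

```c
/* Rough occupancy calculation for a hypothetical kernel on G80.
   Returns how many thread blocks fit on one SM given the kernel's
   per-thread register use and per-block shared memory use.
   Limits assumed: 768 threads, 8 blocks, 8192 regs, 16384 B smem per SM. */
int blocks_per_sm(int threads_per_block, int regs_per_thread,
                  int smem_per_block) {
    int by_threads = 768 / threads_per_block;
    int by_regs    = 8192 / (regs_per_thread * threads_per_block);
    int by_smem    = 16384 / (smem_per_block > 0 ? smem_per_block : 1);
    int b = by_threads;
    if (by_regs < b) b = by_regs;
    if (by_smem < b) b = by_smem;
    if (b > 8) b = 8;           /* hard cap of 8 resident blocks per SM */
    return b;
}
```

Resident threads per SM (`blocks_per_sm(...) * threads_per_block`) is what determines how much latency hiding the hardware can do, which is why the layout and capacity bullets above feed directly into the analytic model.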
Slide 4
Resulting Analytic Model
Model will approximate ideal parameters based on problem size, e.g. number of rows and (average) number of nonzeros per row
Plan to verify the model by testing against a wide range of parameter combinations on key sample problems
Can implement the model as an "autotuner" for G80 SpMV in the spirit of ATLAS or FFTW
Can integrate directly into code for G80 iterative methods, e.g. conjugate gradient
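As a minimal instance of the kind of analytic reasoning such a model formalizes: the measured numbers on slide 2 are consistent with a bandwidth-bound estimate. Each CSR nonzero contributes 2 flops and needs roughly 12 bytes of traffic (4 B matrix value + 4 B column index + 4 B load from x, ignoring caching and row-pointer/output traffic):

```c
/* Back-of-envelope bandwidth model for CSR SpMV: 2 flops and ~12 bytes
   of memory traffic per nonzero, so achievable GFLOPs is bounded by
   sustained bandwidth times 1/6 flop per byte. */
double predicted_gflops(double gbytes_per_sec) {
    double flops_per_byte = 2.0 / 12.0;
    return gbytes_per_sec * flops_per_byte;
}
```

At the measured 7.56 GB/s this predicts 1.26 GFLOPs, matching the naïve kernel's result exactly, which supports treating SpMV as memory-bound when parameterizing for the G80.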