Published by Scarlett Webb. Modified over 9 years ago.
1
HiPC 2010
An Integer Programming Framework for Optimizing Shared Memory Use on GPUs
Wenjing Ma, Gagan Agrawal
The Ohio State University
2
GPGPU
General-purpose programming on GPUs (accelerators)
- High performance/price ratio
- High-level language support: CUDA
Performance vs. productivity
- Hard to program
- Memory hierarchy to manage...
3
Architecture of a GPU
4
Automatic Code Generation
Device memory access is expensive; techniques to reduce its cost include:
- Using shared memory
- Texture and constant memory
- Coalescing device memory access
- ...
Goal: get high performance from the GPU and make programming simple!
5
Features of Shared Memory
- Small and fast, like a cache
- 16KB on each multiprocessor (no more than 48KB even on the latest GPUs)
- Read-write
- Software-controlled, declared explicitly, e.g. __shared__ float data[n][n];
- Allocating shared memory is similar to register allocation
6
Problem Formulation for Shared Memory Arrangement
- Consider the variables and basic blocks in a function
- A variable can be an array element, a whole array, or a section of an array
- Each variable can have several live ranges in the function
- Access feature of a live range: read, write, read-write, or temp
- Determine in which basic block a variable is allocated to shared memory:
  assign_point[i][k] (0 or 1) for variable i and basic block k
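To make the decision variables concrete, the sketch below reads one variable's row of assign_point (one 0/1 entry per basic block) for a single live range and reports where that range is moved into shared memory. It is a minimal sketch under stated assumptions: the live_range struct, NUM_BLOCKS and chosen_block are hypothetical names, and a live range is assumed to cover a contiguous run of basic blocks, which the slides do not specify.

```c
#include <assert.h>

#define NUM_BLOCKS 4 /* hypothetical number of basic blocks in the function */

/* Hypothetical record for one live range of a candidate variable
   (assumed to span a contiguous run of basic blocks). */
typedef struct {
    int first_block; /* first basic block covered by the live range */
    int last_block;  /* last basic block covered by the live range */
} live_range;

/* Reads the assign_point row of one variable and returns the basic block at
   which this live range is moved into shared memory.
   Returns -1 if no entry inside the range is 1 (data stays in device memory)
   and -2 if more than one entry is 1, which the model does not allow. */
int chosen_block(const live_range *lr, const int assign_row[NUM_BLOCKS]) {
    assert(0 <= lr->first_block && lr->first_block <= lr->last_block);
    assert(lr->last_block < NUM_BLOCKS);
    int chosen = -1;
    for (int k = lr->first_block; k <= lr->last_block; k++) {
        if (assign_row[k] != 0) {
            if (chosen != -1)
                return -2; /* a second assign point in the same live range */
            chosen = k;
        }
    }
    return chosen;
}
```

For a live range spanning blocks 1 to 3, the row {0, 0, 1, 0} yields block 2, while {0, 1, 1, 0} is rejected because it sets two assign points in one range.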
7
Integer Programming Problem
Integer linear programming:
- Objective function: maximize z = c^T x
- Constraints
- Solution: values of x
- A variant of linear programming in which all unknown variables must be integers (0-1 in our case)
- Solvable for problems of reasonable size
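For reference, the standard 0-1 integer linear program described on this slide can be written as follows. The constraint matrix A, the bound vector b and the dimension N are generic notation introduced here, not symbols from the slides (this A is unrelated to the array A in the later examples).

```latex
\begin{aligned}
\text{maximize}\quad   & z = c^{T} x \\
\text{subject to}\quad & A x \le b \\
                       & x \in \{0,1\}^{N}
\end{aligned}
```

Dropping the requirement x ∈ {0,1}^N gives an ordinary linear program (the LP relaxation); keeping it makes the problem NP-hard in general, which is why it is only practical for problems of reasonable size.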
8
Integer Programming for Shared Memory Arrangement
Objective function:
- Maximize shared memory usage
- Minimize data transfer between levels of the memory hierarchy
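The first objective term can be illustrated on its own. The sketch below is a deliberate simplification, not the paper's model: each candidate is a single 0-1 variable with a size in bytes, live ranges and the data-transfer term are ignored, and the resulting 0-1 program (maximize bytes placed in shared memory subject to a capacity limit) is solved by exhaustive enumeration, which is only practical for a handful of candidates. The function name best_shared_bytes is hypothetical.

```c
#include <assert.h>

/* Exhaustively solves the simplified 0-1 program
     maximize   sum(size[i] * x[i])
     subject to sum(size[i] * x[i]) <= limit,  x[i] in {0,1}.
   Bit i of *best_mask is x[i] in the best selection found.
   Enumerates all 2^n selections, so n must stay small. */
int best_shared_bytes(const int *size, int n, int limit, unsigned *best_mask) {
    assert(n >= 0 && n < 31);
    int best = 0;
    unsigned best_m = 0;
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        int total = 0;
        for (int i = 0; i < n; i++)
            if (mask & (1u << i))
                total += size[i];
        if (total <= limit && total > best) { /* feasible and better */
            best = total;
            best_m = mask;
        }
    }
    if (best_mask)
        *best_mask = best_m;
    return best;
}
```

For sizes {100, 3000, 6000, 8000} bytes and a 16KB (16384-byte) limit, the best selection is the first, third and fourth candidates, using 14,100 bytes.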
9
Integer Programming for Shared Memory Arrangement (cont'd)
Objective Function
10
An Example to Show size_alloc

    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            for (int k = 0; k < r; k++)
                C[k] += A[i][k] - B[j][k];
    ......
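A runnable sequential version of the visible loop nest is useful as a correctness reference for any shared memory variant. The rest of the slide's code is elided ("......"), so this sketch covers only the statement shown; the function name reference_loop and the flat row-major layout (A is n x r, B is m x r, C has r elements) are assumptions. The nest also hints at why the allocated size depends on where an array is placed: one iteration of the i loop touches only row i of A (r elements), while keeping A resident across the whole nest needs all n*r elements.

```c
#include <assert.h>

/* Sequential reference for the loop nest
     C[k] += A[i][k] - B[j][k]   for all i < n, j < m, k < r.
   A (n x r) and B (m x r) are stored row-major in flat arrays. */
void reference_loop(const float *A, const float *B, float *C,
                    int n, int m, int r) {
    assert(n >= 0 && m >= 0 && r >= 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            for (int k = 0; k < r; k++)
                C[k] += A[i * r + k] - B[j * r + k];
}
```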
11
Integer Programming for Shared Memory Arrangement (cont'd)
Constraints:
- The total allocation must not exceed the shared memory limit at any time
- At most one assign_point is 1 in each live range
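The first constraint is checked per basic block rather than once for the whole function, which is what lets non-overlapping live ranges reuse the same space, much as in register allocation. The sketch below tests a given placement against that constraint. It assumes, without confirmation from the slides, that a placed live range occupies shared memory from its chosen basic block through the last block of the range; placed_range and fits_at_all_times are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical placed live range: chosen_block is where it is moved into
   shared memory (-1 = left in device memory). */
typedef struct {
    int chosen_block;
    int last_block;
    int size_bytes;
} placed_range;

/* Assumption: a placed range occupies shared memory from chosen_block
   through last_block inclusive. Returns true if the bytes in use never
   exceed limit in any of the num_blocks basic blocks. */
bool fits_at_all_times(const placed_range *pr, int n, int num_blocks, int limit) {
    for (int k = 0; k < num_blocks; k++) {
        int in_use = 0;
        for (int i = 0; i < n; i++) {
            if (pr[i].chosen_block < 0)
                continue; /* not in shared memory */
            assert(pr[i].chosen_block <= pr[i].last_block);
            if (pr[i].chosen_block <= k && k <= pr[i].last_block)
                in_use += pr[i].size_bytes;
        }
        if (in_use > limit)
            return false;
    }
    return true;
}
```

Two 10,000-byte ranges fit under a 16KB limit when they occupy blocks 0-1 and 2-3, but not when the first one extends into block 2.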
12
Integer Programming for Shared Memory Arrangement (cont'd)
Obtaining parameters using the LLVM compiler framework:
- Pass 1: get access features (read, write, read-write, temp)
- Pass 2: get live ranges, loop information, indices, and all other parameters
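The slides do not define the four access features precisely. One plausible reading, stated here as an assumption, classifies a live range from the ordered reads ('R') and writes ('W') inside it: read-only ranges need a load but no write-back, write ranges need a write-back, and a range whose first access is a write and whose value is dead afterwards (temp) needs neither. The enum, the classify function and the live_out flag are hypothetical; this is an illustration, not the authors' LLVM pass.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ACC_NONE, ACC_READ, ACC_WRITE, ACC_READ_WRITE, ACC_TEMP } access_feature;

/* accesses: ordered string of 'R'/'W' for one live range.
   live_out: whether the value is still needed after the range ends.
   Assumed rule for TEMP: first access is a write and the value is dead
   afterwards, so it needs neither a load nor a write-back. */
access_feature classify(const char *accesses, bool live_out) {
    bool reads = false, writes = false;
    for (const char *p = accesses; *p; p++) {
        assert(*p == 'R' || *p == 'W');
        if (*p == 'R')
            reads = true;
        else
            writes = true;
    }
    if (!reads && !writes)
        return ACC_NONE;
    if (accesses[0] == 'W' && !live_out)
        return ACC_TEMP;
    if (reads && writes)
        return ACC_READ_WRITE;
    return reads ? ACC_READ : ACC_WRITE;
}
```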
13
Code Generation
- Follows the shared memory arrangement obtained from the integer programming model
- Built on the framework from our previous work
- Moves data to cover the gaps left by data evicted from shared memory
14
An Example
Sizes: A: n*r, B: m*r, C: r
Parameters: n = 2048, m = 3, r = 3, NUM_THREADS = 256
Input code, passed to the integer programming solver:

    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            for (int k = 0; k < r; k++)
                C[k] += A[i][k] - B[j][k];
    ......

Solver output:

    assign_point[0][1] = 1;
    assign_point[1][0] = 1;
    assign_point[2][0] = 1;
    /* all other elements of assign_point are 0 */
15
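An Example (cont'd)
Generated code:

    __shared__ float s_B[m][r];
    __shared__ float s_C[r*NUM_THREADS];
    __shared__ float s_A[r*NUM_THREADS];
    int tid = threadIdx.x;
    /* B and C: allocated before the loop nest; threads load B cooperatively */
    for (int i = tid; i < m*r; i += NUM_THREADS)
        s_B[i/r][i%r] = B[i/r][i%r];
    for (int k = 0; k < r; k++)
        s_C[tid*r+k] = 0;          /* per-thread partial sums of C */
    __syncthreads();               /* s_B must be complete before use */
    for (int i = 0; i < n; i += NUM_THREADS) {
        /* A: each thread reloads its own row in every iteration of this loop */
        for (int j = 0; j < r; j++)
            s_A[tid*r+j] = A[tid+i][j];
        for (int j = 0; j < m; j++)
            for (int k = 0; k < r; k++)
                s_C[tid*r+k] += s_A[tid*r+k] - s_B[j][k];
        ......
    }
    /* Synchronize and combine the partial sums of C */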
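A quick arithmetic check of the generated kernel against the 16KB per-multiprocessor limit, assuming a single thread block of NUM_THREADS threads and counting only the three __shared__ arrays declared above. The function name generated_footprint is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of shared memory declared by the generated kernel:
   s_B[m][r], s_C[r*NUM_THREADS] and s_A[r*NUM_THREADS], all float. */
size_t generated_footprint(size_t m, size_t r, size_t num_threads) {
    assert(num_threads > 0);
    return sizeof(float) * (m * r + 2 * r * num_threads);
}
```

With m = r = 3 and NUM_THREADS = 256 this is 9 + 2*768 = 1,545 floats, i.e. 6,180 bytes with 4-byte floats. Keeping all of A resident instead would need n*r = 6,144 floats (24,576 bytes), more than 16KB, which is consistent with A being loaded one row per thread inside the i loop rather than before the loop nest.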
16
Suggesting Loop Transformation
Original loop:

    for (int rc = 0; rc < nRowCl; rc++) {
        tempDis = 0;
        for (int c = 0; c < numCol; c++)
            tempDis = tempDis + data[r][c] * Acomp[rc][colCL[c]];
    }

Transformed loop (c and rc loops interchanged, tempDis expanded into an array):

    for (int rc = 0; rc < nRowCl; rc++)
        tempDis[rc] = 0;
    for (int c = 0; c < numCol; c++) {
        /* load into shared memory */
        for (int rc = 0; rc < nRowCl; rc++) {
            tempDis[rc] += data[r][c] * Acomp[rc][colCL[c]];
        }
    }
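The two loop orders can be checked against each other in plain C. In this sketch, data_row stands in for data[r], Acomp is a flat row-major nRowCl x nColCl array, and the per-rc result is assumed to be kept in tempDis[rc] (the original slide does not show where the scalar tempDis is used); all of these names except those on the slide are hypothetical. Each version adds the terms for a given rc in the same c order, so both produce identical results. After the interchange, data_row[c] is read once per c and reused for every rc.

```c
#include <assert.h>

/* Original order: rc outer, c inner, one scalar accumulator per rc. */
void dist_rc_outer(const float *data_row, const float *Acomp, const int *colCL,
                   int nRowCl, int numCol, int nColCl, float *tempDis) {
    assert(nRowCl >= 0 && numCol >= 0 && nColCl > 0);
    for (int rc = 0; rc < nRowCl; rc++) {
        float acc = 0.0f;
        for (int c = 0; c < numCol; c++)
            acc = acc + data_row[c] * Acomp[rc * nColCl + colCL[c]];
        tempDis[rc] = acc;
    }
}

/* Interchanged order: c outer, rc inner, tempDis holds one sum per rc. */
void dist_c_outer(const float *data_row, const float *Acomp, const int *colCL,
                  int nRowCl, int numCol, int nColCl, float *tempDis) {
    assert(nRowCl >= 0 && numCol >= 0 && nColCl > 0);
    for (int rc = 0; rc < nRowCl; rc++)
        tempDis[rc] = 0.0f;
    for (int c = 0; c < numCol; c++) {
        const float d = data_row[c]; /* read once, reused for every rc */
        for (int rc = 0; rc < nRowCl; rc++)
            tempDis[rc] += d * Acomp[rc * nColCl + colCL[c]];
    }
}
```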
17
Experiments
- Effectiveness of using shared memory: compared with the intuitive approach from our previous work
  Greedy sorting: sort all variables in increasing order of size, and allocate them in shared memory until the shared memory limit is reached
- Effectiveness of the loop transformation suggested by the integer programming model
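A sketch of the greedy baseline as described above, assuming each variable is a single block of bytes and that the pass stops at the first variable that no longer fits (the slide says "until the limit"; a variant that skips that variable and keeps going is equally plausible). The names greedy_alloc and cmp_int are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Ascending comparison for qsort, written to avoid overflow. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Greedy baseline: sorts sizes in place (ascending) and places variables
   until the next one would exceed limit. Returns the bytes placed and
   stores the number of variables placed in *count. */
int greedy_alloc(int *sizes, int n, int limit, int *count) {
    assert(n >= 0);
    qsort(sizes, (size_t)n, sizeof sizes[0], cmp_int);
    int used = 0, placed = 0;
    while (placed < n && used + sizes[placed] <= limit) {
        used += sizes[placed];
        placed++;
    }
    if (count)
        *count = placed;
    return used;
}
```

For sizes {8000, 100, 6000, 3000} and a 16384-byte limit, the greedy pass places 100, 3000 and 6000 bytes (9,100 in total) and stops at 8000, because 9100 + 8000 = 17100 exceeds the limit; an exhaustive 0-1 search over the same sizes would pick 100, 6000 and 8000 (14,100 bytes).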
18
Experiment Results
19
Experiment Results: K-means, EM
20
Experiment Results (cont'd): PCA, Co-clustering
21
Effect of Loop Transformation: PCA, Co-clustering
22
Conclusion and Future Work
- Proposed an integer programming model for shared memory arrangement on GPUs
- Considers scalar variables, arrays, and array sections
- Suggests loop transformations for optimization
- Achieves better results than the intuitive (greedy) method
- Future work: automate code generation and the selection of loop transformations
23
Thank you! Questions?