SEMINAR 3: Local Memory Optimizations
Outline of the seminar
- Student presentations
- Local memory optimization
- Results from last year
- Basis for grading the works
Local memory bank conflicts
- A local memory bank is 4 bytes wide and 256 bytes deep (AMD)
- 32 banks per CU
- Bank conflicts are checked within a half-wavefront
- Local memory performs best when each bank receives at most one access from a half-wavefront, or when the whole half-wavefront reads the same address (broadcast)
- A bank conflict means that work items within a half-wavefront request different values from the same bank in a single request
[Figure: assuming a row work group (16,1,1), work items 0-15 shown against local memory banks 0-31]
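The bank rule above can be checked on the host with a small model. The following is a minimal sketch in plain C, not a description of the actual hardware: it assumes each work item i reads one 4-byte word at word index i * stride_words, and that the bank is the word index modulo 32. The function name conflict_degree is hypothetical.

```c
#include <assert.h>

#define NUM_BANKS 32   /* banks per CU, as stated on the slide */

/*
 * Simplified model (an assumption, not a hardware specification):
 * work item i reads the 4-byte word at word index i * stride_words.
 * Returns the largest number of distinct words requested from any one
 * bank. 1 means conflict-free; work items that read the very same word
 * are counted once, which models the broadcast case.
 */
static int conflict_degree(int n_items, unsigned stride_words)
{
    int per_bank[NUM_BANKS] = {0};
    int worst = 0;

    assert(n_items > 0);

    for (int i = 0; i < n_items; ++i) {
        unsigned word = (unsigned)i * stride_words;
        int seen = 0;

        /* Was this exact word already requested by an earlier work item? */
        for (int j = 0; j < i; ++j) {
            if ((unsigned)j * stride_words == word) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            int bank = (int)(word % NUM_BANKS);
            if (++per_bank[bank] > worst)
                worst = per_bank[bank];
        }
    }
    return worst;
}
```

Under this model, conflict_degree(16, 1) is 1 (consecutive words, no conflict), conflict_degree(16, 32) is 16 (every work item hits bank 0), and conflict_degree(16, 33) is 1 again, which is why padding a row by one word is a common way to avoid conflicts.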
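Explicit copy to local memory
The buffers __local uchar *local_left and __local uchar *local_right are passed in as kernel arguments.

    const int2 gid = (int2)(get_global_id(0), get_global_id(1));
    const int2 lid = (int2)(get_local_id(0), get_local_id(1));

    // Fill the left and right local memory buffers cooperatively
    for (int Ly = 0; Ly < local_height; Ly += get_local_size(1)) {
        for (int Lx = 0; Lx < local_width; Lx += get_local_size(0)) {
            // Skip positions that would fall outside the local buffer
            if (lid.x + Lx < local_width && lid.y + Ly < local_height) {
                const int Lindex = (lid.y + Ly) * local_width + lid.x + Lx;
                const int Gindex = (gid.y + Ly) * width + gid.x + Lx;
                local_left[Lindex]  = srcL[Gindex];
                local_right[Lindex] = srcR[Gindex - MAX_DISP];
            }
        }
    }
    // Wait until the whole work group has finished writing
    barrier(CLK_LOCAL_MEM_FENCE);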
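To see that the strided copy loop covers the whole local buffer, the index arithmetic can be emulated on the host. The sketch below is plain C, not kernel code: it loops over all work items of one work group, applies the same Lx/Ly stepping, and counts how many times each local cell is written. The function name simulate_tile_fill and its parameters are hypothetical, and the sketch uses a bounds guard so that buffer sizes that are not a multiple of the work-group size are handled.

```c
#include <assert.h>
#include <string.h>

/*
 * Host-side emulation of the cooperative copy for ONE work group.
 * hits must hold local_w * local_h bytes; on return, hits[i] is the
 * number of work items that wrote local cell i.
 * Returns the number of cells that were written exactly once.
 */
static int simulate_tile_fill(int local_w, int local_h,
                              int ls_x, int ls_y, unsigned char *hits)
{
    assert(local_w > 0 && local_h > 0 && ls_x > 0 && ls_y > 0);
    memset(hits, 0, (size_t)local_w * (size_t)local_h);

    /* Outer two loops stand in for the work items (lid.x, lid.y). */
    for (int lid_y = 0; lid_y < ls_y; ++lid_y) {
        for (int lid_x = 0; lid_x < ls_x; ++lid_x) {
            /* Inner two loops are the per-work-item copy loops. */
            for (int Ly = 0; Ly < local_h; Ly += ls_y) {
                for (int Lx = 0; Lx < local_w; Lx += ls_x) {
                    int x = lid_x + Lx;
                    int y = lid_y + Ly;
                    if (x < local_w && y < local_h)
                        hits[y * local_w + x]++;
                }
            }
        }
    }

    int once = 0;
    for (int i = 0; i < local_w * local_h; ++i) {
        if (hits[i] == 1)
            ++once;
    }
    return once;
}
```

For a 16x16 work group and a 20x20 local buffer, every one of the 400 cells is written exactly once, so the function returns 400. Without the guard, positions with x >= local_w would wrap into the next row of the buffer.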
Results from 2016
Mali-T624 (Honor 6) results from 2016
OS: Android 5.1.1
- C (CPU, single thread): 300.36 s
- OpenCL (GPU): 25.176 s
- OpenCL with vectorization (GPU): 7.243 s
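The timings are easier to compare as speedups over the single-threaded C version. A minimal sketch of that arithmetic follows; the helper name speedup is hypothetical and only the three timings come from the slide.

```c
#include <assert.h>

/* Speedup of an optimized run over a baseline run, both in seconds. */
static double speedup(double baseline_s, double optimized_s)
{
    assert(baseline_s > 0.0 && optimized_s > 0.0);
    return baseline_s / optimized_s;
}
```

With the numbers above, speedup(300.36, 25.176) is about 11.9 for the plain OpenCL version and speedup(300.36, 7.243) is about 41.5 for the vectorized one; vectorization alone gives speedup(25.176, 7.243), roughly 3.5.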
Odroid instructions
- Username and password for the Odroids are both "odroid"
- There is a Mali_SDK shortcut on the desktop
  - Open it and then open the samples folder
  - Copy the template folder and rename it
- Copy your files to the folder and edit the Makefile
  - On the SOURCES line, replace template.cpp with your .cpp files
  - On the HEADERS line, include the required header files
  - On the EXECUTABLE line, rename the executable if you wish
- Open the MATE terminal
- Go to the folder you created
  - E.g. cd /home/odroid/Desktop/Mali_OpenCL_SDK_v1.1.0/samples/your_folder
- Build your project
  - Type "make"
- Run your project
  - ./your_executable
Brief CodeXL instructions
1. In Visual Studio, open the CodeXL tab
2. Switch to profile mode
3. Choose GPU: Performance Counters
4. Start CodeXL GPU profiling
About the grading
If the work is returned before the deadline (12.4. at midnight):
2. Everything works + final report and training diary returned
3. Minor optimizations (native functions, fast floating point math etc.)
4. Vector optimization
5. Local memory optimization
An extra +1 can be granted if CodeXL profiling has been performed and the final report gives possible further actions for optimizing the code based on the profiling feedback.
CodeXL is available on the workstations in TS135 and TS351.
If e.g. Nvidia and Intel have similar tools, they can be used as well.