1
Local Memory optimizations
SEMINAR 3 Local Memory optimizations
2
Outline of the seminar
Student presentations
Local memory optimization
Results from last year
Basis for grading the works
3
Local memory bank conflicts
Each local memory bank is 4 bytes wide and 256 bytes deep (AMD).
There are 32 banks per CU.
Bank conflicts are checked within a half-wavefront.
Local memory performs best when a half-wavefront makes one access to each bank, or when the whole half-wavefront reads the same address in one bank (broadcast).
A bank conflict means that work items within a half-wavefront request different values from the same bank in a single request.
Assuming a row work group (16,1,1):
[Figure: work items of the half-wavefront mapped to local memory banks]
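The bank rule above can be checked on paper for a given access pattern. The following is a minimal host-side sketch in plain C (not a kernel). It assumes that consecutive 4-byte words are interleaved across the 32 banks, so that word w lives in bank w mod 32, and that a half-wavefront is 16 work items. The function name conflict_degree is hypothetical and is not part of any OpenCL or AMD API.

```c
#include <assert.h>

#define NUM_BANKS 32       /* banks per CU, as stated on the slide */
#define HALF_WAVEFRONT 16  /* work items checked together for conflicts */

/* Returns the largest number of different 4-byte words that one bank must
   serve when work item i of a half-wavefront reads word i * stride_words.
   1 means conflict-free; N means an N-way bank conflict.
   Assumption: word w is stored in bank (w % NUM_BANKS). */
static int conflict_degree(int stride_words)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;

    assert(stride_words >= 0);

    /* Stride 0: every work item reads the same address (broadcast). */
    if (stride_words == 0)
        return 1;

    for (int i = 0; i < HALF_WAVEFRONT; i++) {
        int bank = (i * stride_words) % NUM_BANKS;
        hits[bank]++;
        if (hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

Under these assumptions, a stride of 1 word gives a degree of 1 (each work item hits its own bank), a stride of 32 words gives 16 (every work item hits bank 0), and padding the row to 33 words brings the degree back to 1.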
4
Explicit copy to local memory
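// local_left and local_right are passed in as kernel arguments:
//   __local uchar *local_left, __local uchar *local_right
const int2 gid = (int2)(get_global_id(0), get_global_id(1));
const int2 lid = (int2)(get_local_id(0), get_local_id(1));

// Fill the left and right local memory buffers. Each work item copies
// the elements that lie a multiple of the work-group size away from it.
for (int Ly = 0; Ly < local_height; Ly += get_local_size(1)) {
    for (int Lx = 0; Lx < local_width; Lx += get_local_size(0)) {
        const int Lindex = (lid.y + Ly) * local_width + lid.x + Lx;
        const int Gindex = (gid.y + Ly) * width + gid.x + Lx;
        local_left[Lindex] = srcL[Gindex];
        local_right[Lindex] = srcR[Gindex - MAX_DISP];
    }
}
// Wait until the whole work group has finished copying before the buffers are read.
barrier(CLK_LOCAL_MEM_FENCE);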
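The index arithmetic of the copy loop can be tried out on the CPU before running it on a device. The sketch below emulates one work group in plain C by looping over every local id and applying the same Lindex and Gindex formulas. The function emulate_tile_fill and all of its parameter names are hypothetical. The bounds guard inside the loop is an addition that is not in the slide code; it is assumed here so that tile sizes that are not a multiple of the work-group size stay in range.

```c
#include <assert.h>

/* Emulates the cooperative copy of one work group on the CPU.
   src         : global image, row-major, `width` elements per row
   group_x0/y0 : global id of the work item with local id (0,0)
   ls_x, ls_y  : work-group size
   local_w/h   : size of the local tile
   tile        : output, local_w * local_h elements
   writes      : scratch, local_w * local_h counters
   Returns 1 if every tile element was written exactly once, else 0. */
static int emulate_tile_fill(const unsigned char *src, int width,
                             int group_x0, int group_y0,
                             int ls_x, int ls_y,
                             int local_w, int local_h,
                             unsigned char *tile, int *writes)
{
    assert(ls_x > 0 && ls_y > 0 && local_w > 0 && local_h > 0);

    for (int i = 0; i < local_w * local_h; i++)
        writes[i] = 0;

    /* Outer two loops stand in for the work items of one work group. */
    for (int ly = 0; ly < ls_y; ly++) {
        for (int lx = 0; lx < ls_x; lx++) {
            int gx = group_x0 + lx;
            int gy = group_y0 + ly;
            /* Inner two loops are the copy loops from the kernel. */
            for (int Ly = 0; Ly < local_h; Ly += ls_y) {
                for (int Lx = 0; Lx < local_w; Lx += ls_x) {
                    /* Guard (assumption, not in the slide code). */
                    if (ly + Ly >= local_h || lx + Lx >= local_w)
                        continue;
                    int Lindex = (ly + Ly) * local_w + lx + Lx;
                    int Gindex = (gy + Ly) * width + gx + Lx;
                    tile[Lindex] = src[Gindex];
                    writes[Lindex]++;
                }
            }
        }
    }

    for (int i = 0; i < local_w * local_h; i++)
        if (writes[i] != 1)
            return 0;
    return 1;
}
```

A return value of 1 indicates that the strided loops cover the tile completely and without duplicate writes for the chosen sizes, which is the property the kernel relies on before its barrier.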
5
Results from 2016
6
Mali-T624 (Honor 6) results from 2016
OS: Android 5.1.1
C (CPU, single thread): [time missing] s
OpenCL (GPU): [time missing] s
OpenCL vectorization (GPU): [time missing] s
7
Odroid instructions
Username and password for the Odroids are both "odroid".
There is a Mali_SDK shortcut on the desktop. Open it and then open the samples folder.
Copy the template folder and rename it.
Copy your files to the folder and edit the Makefile:
  On the SOURCES line, replace template.cpp with your .cpp files.
  On the HEADERS line, include the required header files.
  On the EXECUTABLE line, rename the executable if you wish.
Open the MATE terminal.
Go to the folder you created, e.g. cd /home/odroid/Desktop/Mali_OpenCL_SDK_v1.1.0/samples/your_folder
Build your project: type "make".
Run your project: ./your_executable
8
Brief CodeXL instructions
1. In Visual Studio, open the CodeXL tab.
2. Switch to profile mode.
3. Choose "GPU: Performance Counters".
4. Start CodeXL GPU profiling.
9
About the grading
1. The work is returned before the deadline at midnight
2. Everything works + final report and training diary returned
3. Minor optimizations (native functions, fast floating point math etc.)
4. Vector optimization
5. Local memory optimization
An extra +1 can be granted if CodeXL profiling is performed and the final report proposes possible further optimizations based on the profiling feedback.
CodeXL is available on the workstations in TS135 and TS351.
If e.g. Nvidia and Intel have similar tools, they can be used as well.