OpenCL
Usman Roshan
Department of Computer Science, NJIT
OpenCL
Open, cross-vendor standard for parallel programming
Increasing usage in GPU computing
Pros: your GPU program will run not only on NVIDIA GPUs but also on GPUs from other vendors (such as AMD)
Cons: not as easy to program in as CUDA
SimpleOpenCL
Open-source API for writing OpenCL programs
The main challenge in writing OpenCL programs is the setup (selecting a platform and device, creating a context, command queue, and program)
SimpleOpenCL provides simple functions for setting up the GPU
Strategy to convert Chi2 from CUDA to OpenCL
Define blocks and threads with the arrays global_work_size[2] and local_work_size[2]
– global_work_size[0] = BLOCKS * THREADS;
– global_work_size[1] = 1;
– local_work_size[0] = THREADS;
– local_work_size[1] = 1;
Initialize the hardware
– hardware = sclGetAllHardware(&found);
– sclPrintHardwareStatus(*hardware);
Initialize the software (load and compile the kernel source)
– software = sclGetCLSoftware(OPENCL_KERNEL_FILE, "name_of_kernel_function", hardware[0]);
A setup sketch putting these steps together follows below.
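A minimal host-side setup sketch using SimpleOpenCL, assuming the calls above behave as documented in the SimpleOpenCL API. The file name "chi2.cl", the kernel name "chi2_kernel", and the BLOCKS/THREADS values are placeholders for illustration, not the actual Chi2 program.

    #include <stdio.h>
    #include "simpleCL.h"

    #define BLOCKS  64      /* placeholder grid size */
    #define THREADS 256     /* placeholder threads per block */

    int main() {
        size_t global_work_size[2], local_work_size[2];
        int found;
        sclHard *hardware;
        sclSoft software;

        /* Work sizes: OpenCL's global size is the total number of work-items,
           i.e. CUDA's blocks * threads; the local size is CUDA's threads per block. */
        global_work_size[0] = BLOCKS * THREADS;
        global_work_size[1] = 1;
        local_work_size[0]  = THREADS;
        local_work_size[1]  = 1;

        /* Initialize hardware: query the available OpenCL devices. */
        hardware = sclGetAllHardware(&found);
        sclPrintHardwareStatus(*hardware);

        /* Initialize software: load and compile the kernel source for device 0.
           "chi2.cl" and "chi2_kernel" are placeholder names. */
        software = sclGetCLSoftware("chi2.cl", "chi2_kernel", hardware[0]);

        /* ... allocate buffers, set kernel arguments, and launch (next slides) ... */
        return 0;
    }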
CUDA to OpenCL
Device arrays are declared with type cl_mem
Replace cudaMalloc with
– dev_results_clmem = sclMalloc( hardware[0], CL_MEM_READ_WRITE, size * sizeof(float) );
To write to GPU memory, replace cudaMemcpy (host to device) with
– sclWrite( hardware[0], size * sizeof(unsigned char), dev_dataT_clmem, (void*) dataT );
To read from GPU memory, replace cudaMemcpy (device to host) with
– sclRead( hardware[0], cols * sizeof(float), results_clmem, host_results );
A sketch of these memory operations together follows below.
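A sketch of the corresponding buffer management, assuming SimpleOpenCL's sclMalloc/sclWrite/sclRead behave as shown on this slide. The buffer names, sizes, and the dataT/host_results arrays are placeholders, not the actual Chi2 data layout.

    #include "simpleCL.h"

    void transfer_sketch(sclHard *hardware, size_t size, size_t cols,
                         unsigned char *dataT, float *host_results) {
        /* Device arrays are cl_mem handles rather than raw device pointers. */
        cl_mem dev_dataT_clmem, dev_results_clmem;

        /* cudaMalloc -> sclMalloc: allocate buffers on device 0. */
        dev_dataT_clmem   = sclMalloc(hardware[0], CL_MEM_READ_WRITE,
                                      size * sizeof(unsigned char));
        dev_results_clmem = sclMalloc(hardware[0], CL_MEM_READ_WRITE,
                                      cols * sizeof(float));

        /* cudaMemcpy (host to device) -> sclWrite: copy the input to the GPU. */
        sclWrite(hardware[0], size * sizeof(unsigned char),
                 dev_dataT_clmem, (void*) dataT);

        /* ... set kernel arguments and launch here (next slide) ... */

        /* cudaMemcpy (device to host) -> sclRead: copy the results back. */
        sclRead(hardware[0], cols * sizeof(float),
                dev_results_clmem, host_results);

        /* cudaFree -> clReleaseMemObject (standard OpenCL call). */
        clReleaseMemObject(dev_dataT_clmem);
        clReleaseMemObject(dev_results_clmem);
    }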
CUDA to OpenCL
Replace the kernel call by first setting the kernel arguments
– sclSetKernelArg( software, 0, sizeof(uint), &var );
– sclSetKernelArg( software, 1, sizeof(cl_mem), (void*) &dev_var_clmem );
– sclSetKernelArg( software, 2, sizeof(cl_mem), (void*) &dev_const_var_clmem );
Then launch the kernel with
– sclLaunchKernel( hardware[0], software, global_work_size, local_work_size );
A sketch combining these calls follows below.
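A sketch of the argument setup and launch, continuing the placeholder names from the earlier sketches. The arguments var, dev_var_clmem, and dev_const_var_clmem stand in for whatever the real Chi2 kernel takes; cl_uint is used as the portable host-side type for the scalar argument.

    #include "simpleCL.h"

    void launch_sketch(sclHard *hardware, sclSoft software,
                       size_t *global_work_size, size_t *local_work_size,
                       cl_uint var, cl_mem dev_var_clmem, cl_mem dev_const_var_clmem) {
        /* Arguments are set by index, matching the order of the parameters in the
           __kernel function. Scalars pass their size and address; buffers pass
           sizeof(cl_mem) and the address of the cl_mem handle. */
        sclSetKernelArg(software, 0, sizeof(cl_uint), &var);
        sclSetKernelArg(software, 1, sizeof(cl_mem), (void*) &dev_var_clmem);
        sclSetKernelArg(software, 2, sizeof(cl_mem), (void*) &dev_const_var_clmem);

        /* Launch on device 0: this replaces CUDA's kernel<<<BLOCKS, THREADS>>>(...) call. */
        sclLaunchKernel(hardware[0], software, global_work_size, local_work_size);
    }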
Modifications to GPU kernel code
Use __kernel to define the kernel function
Use __global and __local to qualify pointers to global and local (CUDA shared) memory
Use __constant for constant memory definitions
Get the thread id with get_global_id(0); a kernel sketch follows below.
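A minimal kernel sketch (stored in the .cl file passed to sclGetCLSoftware). The kernel name, parameters, and computation are placeholders for illustration, not the actual Chi2 kernel.

    /* chi2.cl (placeholder file and kernel name) */
    __kernel void chi2_kernel(uint n,
                              __global const uchar *data,
                              __global float *results,
                              __constant float *scale) {
        /* get_global_id(0) replaces CUDA's blockIdx.x * blockDim.x + threadIdx.x */
        uint tid = get_global_id(0);
        if (tid < n) {
            /* placeholder computation using global and constant memory */
            results[tid] = (float) data[tid] * scale[0];
        }
    }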