OpenCL. #include int main() { cl_platform_id platform[10]; cl_uint num_platforms; clGetPlatformIDs(10, platform, &num_platforms); cl_int clGetPlatformIDs.

Slides:

Advertisements

Similar presentations

Instructor Notes This lecture describes the different ways to work with multiple devices in OpenCL (i.e., within a single context and using multiple contexts),

Advertisements

GPGPU Labor 8.. CLWrapper OpenCL Framework primitívek – ClWrapper(cl_device_type _device_type); – cl_device_id device_id() – cl_context context() – cl_command_queue.

National Tsing Hua University ® copyright OIA National Tsing Hua University Image.

 Open standard for parallel programming across heterogenous devices  Devices can consist of CPUs, GPUs, embedded processors etc – uses all the processing.

OpenCL Ryan Renna. Overview  Introduction  History  Anatomy of OpenCL  Execution Model  Memory Model  Implementation  Applications  The Future.

China MCP 1 OpenCL. Agenda OpenCL Overview Usage Memory Model Synchronization Operational Flow Availability.

OPENCL OVERVIEW. Copyright Khronos 2009 OpenCL Commercial Objectives Grow the market for parallel computing For vendors of systems, silicon, middleware,

Barcelona, 2013 Vicenç Beltran, Programming with OmpSs Seminaris d’Empresa 2013.

Martin Kruliš Martin Kruliš Dfx Voodoo 1 ◦ First graphical (3D) accelerator for desktop PCs NVIDIA GeForce 256 ◦ First Transform&Lightning.

GPU Processing for Distributed Live Video Database Jun Ye Data Systems Group.

National Tsing Hua University ® copyright OIA National Tsing Hua University OpenCL Tutorial.

Демидов А.В г. Операционные системы Лекция 4 Работа с файлами.

OpenCL Usman Roshan Department of Computer Science NJIT.

Illinois UPCRC Summer School 2010 The OpenCL Programming Model Part 1: Basic Concepts Wen-mei Hwu and John Stone with special contributions from Deepthi.

The Open Standard for Parallel Programming of Heterogeneous systems James Xu.

Instructor Notes This is a brief lecture which goes into some more details on OpenCL memory objects Describes various flags that can be used to change.

Open CL Hucai Huang. Introduction Today's computing environments are becoming more multifaceted, exploiting the capabilities of a range of multi-core.

Computer Graphics Ken-Yi Lee National Taiwan University.

Instructor Notes GPU debugging is still immature, but being improved daily. You should definitely check to see the latest options available before giving.

OpenCL Introduction AN EXAMPLE FOR OPENCL LU OCT

Open64 Developers Forum 2010 OpenCL Compiler Support Based on Open64 for MPUs+GPUs Yu-Te Lin, Chung-Ju Wu Chia-Han Lu, Shao-Chung Wang Jenq-Kuen Lee Department.

Краткое введение в OpenCL. Примеры использования в научных вычислениях А.С. Айриян 7 мая 2015 Лаборатория информационных технологий.

Dr. Lawlor, U. Alaska: EPGPU 1 EPGPU: Expressive Programming for GPGPU Dr. Orion Sky Lawlor U. Alaska Fairbanks

NIH Resource for Macromolecular Modeling and Bioinformatics Beckman Institute, UIUC OpenCL: Molecular Modeling on Heterogeneous.

Instructor Notes This is a brief lecture which goes into some more details on OpenCL memory objects Describes various flags that can be used to change.

OpenCL Sathish Vadhiyar Sources: OpenCL overview from AMD OpenCL learning kit from AMD.

ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson,

1 ITCS 4/5010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Feb 28, 2013, OpenCL.ppt OpenCL These notes will introduce OpenCL.

OpenCL Sathish Vadhiyar Sources: OpenCL quick overview from AMD OpenCL learning kit from AMD.

Instructor Notes Discusses synchronization, timing and profiling in OpenCL Coarse grain synchronization covered which discusses synchronizing on a command.

High Efficiency Computing with OpenCL and FPGAs Fernando Martinez June 2014.

Ilija Vukotic GP using GP GPU my experience with OpenCL Future computing in particle physics 15. Jun

OpenCL Programming James Perry EPCC The University of Edinburgh.

Portability with OpenCL 1. High Performance Landscape High Performance computing is trending towards large number of cores and accelerators It would be.

OpenCL Joseph Kider University of Pennsylvania CIS Fall 2011.

Микропроцессорные системы Программирование INTEL 8086 Системная программа Debug.

An Introduction to OpenCL

Copyright ©: Nahrstedt, Angrave, Abdelzaher

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign ECE408 / CS483 Applied Parallel Programming.

Стеки и очереди 1. Абстрактный стек public interface Stack { static class Underflow extends Exception { public Underflow() { super("Stack underflow");

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.

Heterogeneous Computing with OpenCL Dr. Sergey Axyonov.

OpenCL The Open Standard for Heterogenous Parallel Programming.

Instructor Notes This is a straight-forward lecture. It introduces the OpenCL specification while building a simple vector addition program The Mona Lisa.

Heterogeneous Programming OpenCL and High Level Programming Tools Stéphane BIHAN, CAPS Stream Computing Workshop, Dec , KTH, Stockholm.

Introduction to CUDA Programming Introduction to OpenCL Andreas Moshovos Spring 2011 Based on:

Heterogeneous Computing using openCL lecture 2 F21DP Distributed and Parallel Technology Sven-Bodo Scholz.

© Wen-mei W. Hwu and John Stone, Urbana July 22, Illinois UPCRC Summer School 2010 The OpenCL Programming Model Part 2: Case Studies Wen-mei Hwu.

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL Ashwin M. Aji, Antonio J. Pena, Pavan Balaji and Wu-Chun Feng Virginia Tech and.

Matthew Royle Supervisor: Prof Shaun Bangay.  How do we implement OpenCL for CPUs  Differences in parallel architectures  Is our CPU implementation.

Heterogeneous Computing using openCL lecture 3 F21DP Distributed and Parallel Technology Sven-Bodo Scholz.

Parallel Computing on Graphics Cards Keith Kelley, CS6260.

OpenCL. Sources Patrick Cozzi Spring 2011 NVIDIA CUDA Programming Guide CUDA by Example Programming Massively Parallel Processors.

Lecture 15 Introduction to OpenCL

GPGPU Architectures Martin Kruliš Martin Kruliš

An Introduction to OpenCL

Objective To Understand the OpenCL programming model

Command line arguments

An Introduction to GPU Computing

Patrick Cozzi University of Pennsylvania CIS Spring 2011

Краткое введение в программирование на языке OpenCL.

Lecture 11 – Related Programming Models: OpenCL

Antonio R. Miele Marco D. Santambrogio Politecnico di Milano

GPU Programming using OpenCL

© 2012 Elsevier, Inc. All rights reserved.

OpenCL introduction II.

OpenCL introduction.

OpenCL introduction III.

OpenCL introduction II.

© David Kirk/NVIDIA and Wen-mei W. Hwu,

Presentation transcript:

OpenCL

#include int main() { cl_platform_id platform[10]; cl_uint num_platforms; clGetPlatformIDs(10, platform, &num_platforms); cl_int clGetPlatformIDs (cl_uint numEntries, cl_platform_id *platforms, cl_uint *numPlatforms); numEntries размер массива, на который указывает platforms (0, если platforms == NULL); platforms массив для возврата информации о драйверах (NULL - не возвращать); numPlatforms возвращаемое количество драйверов OpenCL (NULL - не возвращать); OpenCL: получение списка драйверов, поддерживающих OpenCL

cl_device_id device; cl_uint num_gpgpu; clGetDeviceIDs(platform[0], CL_DEVICE_TYPE_GPU, 1, &device, &num_gpgpu); if (num_gpgpu == 0){ clGetDeviceIDs( platform[0], CL_DEVICE_TYPE_CPU, 1, &device, NULL); } OpenCL: обнаружение устройств cl_int clGetDeviceIDs (cl_platform_id platformID, cl_device_type nDeviceType, cl_uint nNumEntries, cl_device_id *pDevices, cl_uint *pnNumDevices); nDeviceType : CL_DEVICE_TYPE_CPU центральный процессор CL_DEVICE_TYPE_GPU видеокарта CL_DEVICE_TYPE_ACCELERATOR специализированный ускоритель CL_DEVICE_TYPE_DEFAULT устройство по молчанию в системе CL_DEVICE_TYPE_ALL все доступные устройства OpenCL

cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL); cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL); OpenCL: создание контекста и очереди на устройстве cl_context clCreateContext(const cl_context_properties * properties, cl_uint num_devices, const cl_device_id *devices, void (CL_CALLBACK *pfnNotify), void *userData, cl_int * errcode_ret); CL_CALLBACK *pfnNotify(const char *errinfo, const void *private_info, size_t cb, void *user_data) – функция обратного вызова cl_command_queue clCreateCommandQueue (cl_context context, cl_device_id device, cl_command_queue_properties properties, cl_int *errcode_ret) properties - список свойств очереди команд: можно ли выполнять команды не последовательно и разрешено ли профилирование команд

char *source = /* читаем из файла код kernel-функции */; cl_program program = clCreateProgramWithSource(context, 1, (const char **) &source, NULL, NULL); cl_int errcode = clBuildProgram(program, 1, &device, NULL, NULL, NULL); OpenCL: компиляция программы cl_program clCreateProgramWithSource (cl_context context, cl_uint count, const char **strings, const size_t *lengths, cl_int *errcode_ret) cl_int clBuildProgram (cl_program program, cl_uint num_devices, const cl_device_id *device_list, const char *options, void CL_CALLBACK *pfn_notify), void *user_data) CL_CALLBACK *pfn_notify(cl_program program, void *user_data) – функция обратного вызова

OpenCL: определение точки входа cl_kernel kernel = clCreateKernel(program, "kernel_function_name", NULL); __kernel void sum(__global int *x, int n, __global int *res) { uint i,j, id=get_global_id(0), sz = get_global_size(0); int s = 0; for (int i = id; i<n; i += sz) //for (int i = id*n/sz; i<(id+1)*n/sz; i++ ) s += x[i]; atomic_add(&res[0], s); //res[j] = s; } cl_kernel clCreateKernel (cl_program program, const char *kernel_name, cl_int *errcode_ret)

cl_mem x_gpu = clCreateBuffer( context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, n * sizeof(x[0]), x, NULL ); OpenCL: выделение памяти на ускорителе cl_mem clCreateBuffer (cl_context context, cl_mem_flags flags, size_t size, void *host_ptr, cl_int *errcode_ret) cl_mem_flags: CL_MEM_READ_WRITE чтение / запись CL_MEM_WRITE_ONLYтолько запись CL_MEM_READ_ONLYтолько запись CL_MEM_USE_HOST_PTRиспользование память в CPU напрямую CL_MEM_ALLOC_HOST_PTRвыделение памяти в CPU CL_MEM_COPY_HOST_PTRкопирование данных из CPU в GPU

clSetKernelArg (kernel, 0, sizeof(x_gpu), (void*) &x_gpu); clSetKernelArg(kernel, 1, sizeof(n), (void*) &n); OpenCL: инициализация аргументов kernel-функции cl_int clSetKernelArg (cl_kernel kernel, cl_uint arg_index, size_t arg_size, const void *arg_value) arg_index – порядковый номер параметра

OpenCL: добавление задачи в очередь size_t gridSize = 10; clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gridSize, NULL, 0, NULL, NULL); clFinish(queue); // clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gridSize, NULL, 0, NULL, &event); // clWaitForEvents(1, &event); cl_int clEnqueueNDRangeKernel (cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t *global_work_offset, const size_t *global_work_size, const size_t *local_work_size, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event) work_dim – размерность пространства Work-item (рабочих)

clEnqueueReadBuffer( queue, res_gpu, CL_TRUE, 0, res_size, res_cpu, 0, NULL, NULL); OpenCL: копирование данных из памяти ускорителя cl_int clEnqueueReadBuffer (cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t cb, void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)

Проверка ошибок #define checkError(func) \ if (errcode != CL_SUCCESS){\ printf("Error in " #func "\nError code = %d\n", errcode);\ exit(1);\ } #define checkErrorEx(command) \ command; \ checkError(command);

OpenCL: замеры времени clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gridSize, NULL, 0, NULL, &event); clWaitForEvents(1, &event); checkErrorEx (errcode = clGetEventProfilingInfo (event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL)); checkErrorEx (errcode = clGetEventProfilingInfo (event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL)); tgpu = (double)(time_end - time_start)/1e9; cl_int clGetEventProfilingInfo (cl_event event, cl_profiling_info param_name, size_t param_value_size, void *param_value, size_t param_value_size_ret)

Сборка программы gcc -std=c99 -L/usr/local/cuda/lib64 -I/usr/local/cuda/include main.c -lOpenCL -O2 -o proga