GPGPU Lab 8. CLWrapper OpenCL Framework primitives – ClWrapper(cl_device_type _device_type); – cl_device_id device_id() – cl_context context() – cl_command_queue.


GPGPU Lab 8.

CLWrapper OpenCL Framework primitives
– ClWrapper(cl_device_type _device_type);
– cl_device_id device_id()
– cl_context context()
– cl_command_queue cqueue()
– char* getPlatformInfo(cl_platform_info paramName)
– void* getDeviceInfo(cl_device_info paramName)
– cl_program createProgram(const char* fileName)
– cl_kernel createKernel(cl_program program, const char* kernelName)
– void printOpenCLInfo()

Kernel execution time

void printTimeStats(cl_event event)
{
    cl_int err = CL_SUCCESS;
    if (event == NULL) {
        std::cerr << "No event object returned!" << std::endl;
        return; // without an event there is nothing to profile
    }
    clWaitForEvents(1, &event);
    cl_ulong execStart, execEnd;
    err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
                                  sizeof(cl_ulong), &execStart, NULL);
    err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
                                  sizeof(cl_ulong), &execEnd, NULL);
    std::cout << "[start] " << execStart << " [end] " << execEnd
              << " [time] " << (execEnd - execStart) / 1e+06 << " ms." << std::endl;
}

Parallel primitives
– Map
– Reduce
– Scan
– Histogram
– Compact

Map

// TODO
//
// ID := get_global_id(0)
// data[ID] := square(data[ID])
__kernel void map(__global float* data)
{
}
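The map pseudocode above replaces every element with its square, one independent work-item per element. As a sketch for checking results (not part of the lab framework; the function name is ours), the same semantics on the CPU:

```cpp
#include <vector>

// CPU reference for the map kernel: square every element in place.
// Each loop iteration corresponds to one independent work-item.
std::vector<float> map_square_reference(std::vector<float> data) {
    for (float& x : data)
        x = x * x;
    return data;
}
```

Comparing a device buffer read back after the kernel against this reference is a quick correctness check.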

Reduce

// TODO
//
// ID := get_global_id(0)
//
// FOR s = get_global_size(0) / 2; s > 0; s >>= 1 DO:
//   IF (ID < s)
//     data[ID] = max(data[ID], data[ID + s])
//   BARRIER
//
__kernel void reduce_global(__global float* data)
{
}
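The reduction halves the active stride each pass until the maximum ends up in data[0]. A sequential sketch of the same schedule (the function name is ours, and like the kernel it assumes a power-of-two length; note that the kernel's BARRIER only synchronizes within one work-group):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// CPU reference for the global max-reduction: replays the kernel's
// halving-stride schedule, so intermediate states match step by step.
std::vector<float> reduce_max_reference(std::vector<float> data) {
    for (std::size_t s = data.size() / 2; s > 0; s >>= 1)  // stride halves each pass
        for (std::size_t id = 0; id < s; ++id)             // work-items with ID < s stay active
            data[id] = std::max(data[id], data[id + s]);
    return data;  // data[0] now holds the maximum
}
```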

Scan (exclusive)

// TODO
//
// ID := get_global_id(0)
// IF ID > 0 THEN data[ID] = data[ID - 1]
// ELSE data[ID] = 0
// BARRIER
//
// FOR s = 1; s < get_global_size(0); s *= 2 DO:
//   tmp := data[ID]
//   IF (ID + s < get_global_size(0)) THEN
//     data[ID + s] += tmp
//   BARRIER
//
// IF (ID = 0) THEN data[ID] = 0
__kernel void exscan_global(__global int* data)
{
}
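An exclusive scan produces, at each index, the sum of all elements strictly before it. A sequential sketch of the target result (the function name is ours), useful for validating the parallel version:

```cpp
#include <cstddef>
#include <vector>

// CPU reference for the exclusive prefix sum: out[i] is the sum of all
// elements strictly before index i, and out[0] is 0.
std::vector<int> exscan_reference(const std::vector<int>& data) {
    std::vector<int> out(data.size());
    int sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        out[i] = sum;      // everything before i
        sum += data[i];
    }
    return out;
}
```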

Histogram

// TODO
//
// histogram[data[ID]] := histogram[data[ID]] + 1
//
// SYNCHRONIZATION!
__kernel void histogram_global(__global int* data, __global int* histogram)
{
}
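The SYNCHRONIZATION! note flags that many work-items may increment the same bin at once, so on the device the increment must be atomic (e.g. atomic_inc) or updates are lost. A serial sketch of the intended result (the function name is ours; values are assumed to be valid bin indices):

```cpp
#include <vector>

// CPU reference for the histogram kernels: counts occurrences per bin.
// Input values are assumed to lie in [0, histogramSize).
std::vector<int> histogram_reference(const std::vector<int>& data, int histogramSize) {
    std::vector<int> hist(histogramSize, 0);
    for (int v : data)
        ++hist[v];  // the kernel needs an atomic increment here; a serial loop does not
    return hist;
}
```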

Histogram (local)

Allocating local memory from the host side:

clSetKernelArg(histogramLocalKernel, 0, sizeof(cl_mem), &gData);
clSetKernelArg(histogramLocalKernel, 1, sizeof(cl_mem), &gHist);
clSetKernelArg(histogramLocalKernel, 2, sizeof(int) * histogramSize, NULL);
clSetKernelArg(histogramLocalKernel, 3, sizeof(int), &histogramSize);

Histogram (local)

// TODO
//
// ID := get_global_id(0)
// LID := get_local_id(0)
//
// IF LID < histogramSize DO:
//   lhistogram[LID] := 0
// BARRIER
//
// Add data to local histogram
//
// BARRIER
//
// IF LID < histogramSize DO:
//   histogram[LID] = lhistogram[LID]
__kernel void histogram_local(__global int* data, __global int* histogram, __local int* lhistogram, const int histogramSize)
{
}

Compact

// TODO
//
// ID := get_global_id(0)
// IF data[ID] < 50 THEN
//   pred[ID] = 1
// ELSE
//   pred[ID] = 0
__kernel void compact_predicate(__global int* data, __global int* pred)
{
}

Compact

// TODO
//
// Exclusive scan of pred into prefSum
__kernel void compact_exscan(__global int* pred, __global int* prefSum)
{
}

Compact

// TODO
//
// ID := get_global_id(0)
// VALUE := data[ID]
// BARRIER
// IF pred[ID] == 1 THEN
//   data[prefSum[ID]] = VALUE
__kernel void compact_compact(__global int* data, __global int* pred, __global int* prefSum)
{
}
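The three compact kernels form a pipeline: mark elements with the predicate, exclusive-scan the marks to get output positions, then scatter the kept elements. A sequential sketch of the whole pipeline (the function name is ours; it uses the data[ID] < 50 predicate from compact_predicate):

```cpp
#include <cstddef>
#include <vector>

// CPU reference for the three-kernel compaction pipeline: keep the
// elements with value < 50, packed densely in their original order.
std::vector<int> compact_reference(const std::vector<int>& data) {
    std::size_t n = data.size();
    std::vector<int> pred(n), prefSum(n);
    for (std::size_t i = 0; i < n; ++i)        // compact_predicate
        pred[i] = data[i] < 50 ? 1 : 0;
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {      // compact_exscan
        prefSum[i] = sum;
        sum += pred[i];
    }
    std::vector<int> out(sum);                 // compact_compact (scatter)
    for (std::size_t i = 0; i < n; ++i)
        if (pred[i] == 1)
            out[prefSum[i]] = data[i];         // prefSum[i] is the packed index
    return out;
}
```

The exclusive scan gives each surviving element a unique, gap-free output index, which is why the scatter step never collides.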