OpenCL
Usman Roshan
Department of Computer Science, NJIT


OpenCL
– A universal, vendor-neutral standard for parallel programming
– Increasingly used in GPU computing
– Pros: your GPU program will run not only on NVIDIA GPUs but on other vendors' GPUs as well (such as AMD)
– Cons: not as easy to program in as CUDA

SimpleOpenCL
– An open-source API for writing OpenCL programs
– The main challenge in OpenCL programs is the setup (selecting a platform and device, creating a context and command queue)
– SimpleOpenCL provides simple functions for setting up the GPU
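To see why setup is the main challenge, compare the raw OpenCL boilerplate with its SimpleOpenCL equivalent. The fragment below is a sketch, not runnable as-is: error handling is omitted, it needs an OpenCL runtime with a GPU present, and it uses the deprecated but widely supported `clCreateCommandQueue` form.

```c
/* Raw OpenCL: several calls just to obtain a device, context, and queue. */
#include <CL/cl.h>

cl_platform_id platform;
cl_device_id device;
clGetPlatformIDs(1, &platform, NULL);
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL);

/* SimpleOpenCL: one call wraps all of the above. */
int found;
sclHard *hardware = sclGetAllHardware(&found);
```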

Strategy to convert Chi2 in CUDA to OpenCL
Define blocks and threads with the arrays global_work_size[2] and local_work_size[2]:
– global_work_size[0] = BLOCKS * THREADS;
– global_work_size[1] = 1;
– local_work_size[0] = THREADS;
– local_work_size[1] = 1;
Initialize hardware:
– hardware = sclGetAllHardware(&found);
– sclPrintHardwareStatus(*hardware);
Initialize software (compile the kernel source):
– software = sclGetCLSoftware(OPENCL_KERNEL_FILE, "name_of_kernel_function", hardware[0]);

CUDA to OpenCL
Device arrays are declared with type cl_mem
Replace cudaMalloc with sclMalloc:
– dev_results_clmem = sclMalloc( hardware[0], CL_MEM_READ_WRITE, size * sizeof(float) );
To write to GPU memory, replace cudaMemcpy (host to device) with sclWrite:
– sclWrite( hardware[0], size * sizeof(unsigned char), dev_dataT_clmem, (void*) dataT );
To read from GPU memory, replace cudaMemcpy (device to host) with sclRead:
– sclRead( hardware[0], cols * sizeof(float), results_clmem, host_results );

CUDA to OpenCL
Replace the CUDA kernel call by first setting the kernel arguments one at a time:
– sclSetKernelArg( software, 0, sizeof(uint), &var );
– sclSetKernelArg( software, 1, sizeof(cl_mem), (void*) &dev_var_clmem );
– sclSetKernelArg( software, 2, sizeof(cl_mem), (void*) &dev_const_var_clmem );
Then launch the kernel with:
– sclLaunchKernel( hardware[0], software, global_work_size, local_work_size );
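Putting the steps together, the host side of a ported program can be sketched as follows. This is an illustrative sketch, not the actual Chi2 port: it assumes the SimpleOpenCL library and an OpenCL-capable GPU, and the file name "kernel.cl", kernel name "chi2_kernel", buffer names, and sizes are all hypothetical.

```c
/* Sketch of the full SimpleOpenCL host flow; requires libSimpleOpenCL
   and an OpenCL device. Names and sizes below are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include "simpleCL.h"

int main(void) {
    int found;
    size_t global_work_size[2] = { 64 * 256, 1 };  /* BLOCKS * THREADS */
    size_t local_work_size[2]  = { 256, 1 };       /* THREADS */
    size_t n = 64 * 256;
    uint cols = 1024;

    unsigned char *dataT = malloc(n * sizeof(unsigned char));
    float *host_results  = malloc(n * sizeof(float));

    /* 1. Initialize hardware and compile the kernel source */
    sclHard *hardware = sclGetAllHardware(&found);
    sclSoft software  = sclGetCLSoftware("kernel.cl", "chi2_kernel", hardware[0]);

    /* 2. Allocate device buffers (replaces cudaMalloc) */
    cl_mem dev_dataT_clmem   = sclMalloc(hardware[0], CL_MEM_READ_ONLY,  n * sizeof(unsigned char));
    cl_mem dev_results_clmem = sclMalloc(hardware[0], CL_MEM_READ_WRITE, n * sizeof(float));

    /* 3. Copy input to the GPU (replaces cudaMemcpy, host to device) */
    sclWrite(hardware[0], n * sizeof(unsigned char), dev_dataT_clmem, (void*) dataT);

    /* 4. Set arguments and launch (replaces kernel<<<BLOCKS,THREADS>>>(...)) */
    sclSetKernelArg(software, 0, sizeof(uint), &cols);
    sclSetKernelArg(software, 1, sizeof(cl_mem), (void*) &dev_dataT_clmem);
    sclSetKernelArg(software, 2, sizeof(cl_mem), (void*) &dev_results_clmem);
    sclLaunchKernel(hardware[0], software, global_work_size, local_work_size);

    /* 5. Copy results back (replaces cudaMemcpy, device to host) */
    sclRead(hardware[0], n * sizeof(float), dev_results_clmem, host_results);

    free(dataT);
    free(host_results);
    return 0;
}
```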

Modifications to GPU kernel code
– Use __kernel to define the kernel function (replaces __global__)
– Use the __global and __local address-space qualifiers for global and local (shared) memory
– Use __constant for constant memory definitions
– Get the thread (work-item) id with get_global_id(0) (replaces the blockIdx/blockDim/threadIdx computation)
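These qualifiers appear together in even a minimal OpenCL C kernel. The vector-add kernel below is an illustrative example, not the course's Chi2 kernel; it would live in the .cl source file passed to sclGetCLSoftware and is compiled at runtime, so it is not host C code.

```c
/* Minimal OpenCL C kernel illustrating the qualifiers above. */
__constant float scale = 1.0f;              /* constant memory */

__kernel void vec_add(const uint n,         /* __kernel replaces __global__ */
                      __global const float *a,  /* global-memory pointers */
                      __global const float *b,
                      __global float *out)
{
    size_t i = get_global_id(0);            /* replaces blockIdx.x*blockDim.x+threadIdx.x */
    if (i < n)
        out[i] = scale * (a[i] + b[i]);
}
```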