AMD-SPL Runtime Programming Guide Jiawei. Outline.

The Core of SPL
Encapsulation
Resource management
Workflow control
Optimization
Based on CAL

Goal
Overcome limitations of Brook+
Provide a friendly programming interface for CAL
Support the development of SPL

What is in SPL Runtime
SPL Runtime: Program Management, Buffer Management, Device Management

Outline

Prerequisites
Visual Studio 2005
AMD Stream SDK 1.4 beta
AMD-SPL 1.0 beta or higher
Windows …… Linux

Add Include Directories
Add include path in VS2005
–CAL: "$(CALROOT)\include\"
–SPL: "$(SPLROOT)\include\"
–Runtime: "$(SPLROOT)\include\core\cal"
Note: $(SPLROOT) is the root folder of SPL

Add Library Directories
Add library directories in VS2005
–CAL:
    "$(CALROOT)\lib\lh32\"    Vista 32bit
    "$(CALROOT)\lib\lh64\"    Vista 64bit
    "$(CALROOT)\lib\xp32\"    XP 32bit
    "$(CALROOT)\lib\xp64\"    XP 64bit
–SPL: "$(SPLROOT)\lib"
Note: $(SPLROOT) is the root folder of SPL

Add Library Dependencies
Add additional dependencies in VS2005
–CAL: aticalrt.lib aticalcl.lib
–SPL:
    amd-spl_d.lib    Debug version
    amd-spl.lib      Release version

Header and Namespaces
Include proper header files
–#include "cal.h"            CAL header
–#include "amdspl.h"         SPL header
–#include "RuntimeDefs.h"    Runtime header
Using namespaces
–using namespace amdspl;
–using namespace amdspl::core::cal;

DEFINE THE IL KERNEL

Code in IL
AMD Stream Kernel Analyzer
–Generate IL from Brook+ kernel
–Easier to program
–Difficult to maintain and optimize
Write IL manually
–Difficult to program and understand
–Easier to optimize
–Provide more GPU features

IL Kernel Sample
    il_ps_2_0
    dcl_output_generic o0
    dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    dcl_input_position_interp(linear_noperspective) v0.xy__
    dcl_cb cb0[1]
    sample_resource(0)_sampler(0) r1, v0.xy00
    add o0, r1, cb0[0]
    endmain
    end
The Brook+ kernel equivalent:
    kernel void k(out float o<>, float i<>, float c)
    {
        o = i + c;
    }
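
The kernel above adds one constant to every input element. A CPU version of the same arithmetic can serve as a reference when checking GPU results. The sketch below is plain C++; the function name referenceAddConstant is hypothetical and is not part of SPL, CAL, or Brook+.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the sample kernel: every output element is the
// matching input element plus one constant, out[k] = in[k] + c.
// Hypothetical helper for illustration; not an SPL or Brook+ function.
std::vector<float> referenceAddConstant(const std::vector<float>& in, float c)
{
    std::vector<float> out(in.size());
    for (std::size_t k = 0; k < in.size(); ++k) {
        out[k] = in[k] + c;
    }
    return out;
}
```

Calling referenceAddConstant(input, 3.0f) on the same input that is sent to the GPU gives the values the output buffer is expected to hold.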

IL Source String
    const char * __sample_program_src__ =
        "il_ps_2_0\n"
        "dcl_output_generic o0\n"
        "dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
        "dcl_input_position_interp(linear_noperspective) v0.xy__\n"
        "dcl_cb cb0[1]\n"
        "sample_resource(0)_sampler(0) r1, v0.xy00\n"
        "add o0, r1, cb0[0]\n"
        "endmain\n"
        "end\n";
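
The source string relies on the compiler joining adjacent string literals, so each IL line needs its own "\n". With a C++11 compiler (newer than the VS2005 toolchain this guide targets, so this is an assumption about your setup), a raw string literal is an alternative spelling. The sketch below uses a shortened excerpt of the kernel only to compare the two spellings; the variable and function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Adjacent string literals: the compiler joins them into one string,
// so every IL line must end with an explicit "\n".
const char* ilConcatenated =
    "il_ps_2_0\n"
    "dcl_output_generic o0\n"
    "endmain\n"
    "end\n";

// Raw string literal (C++11): the line breaks in the source file
// become the newline characters of the string.
const char* ilRaw = R"(il_ps_2_0
dcl_output_generic o0
endmain
end
)";

// True when both spellings produce the same text.
inline bool spellingsMatch()
{
    return std::string(ilConcatenated) == std::string(ilRaw);
}
```

The raw form is easier to paste from a tool such as the Kernel Analyzer, while the concatenated form also works on older compilers.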

Kernel Information
Define the kernel using template class ProgramInfo
–Kernel Parameters
–ID of the Kernel
–Source of the Kernel
    template <int outputsT, int inputsT = 0, int constantsT = 0, bool globalsT = false>
    class ProgramInfo
    {
        ProgramInfo(const char* ID, const char* source) {...}
        ...
    };
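
ProgramInfo takes the buffer counts as compile-time template arguments and the ID and source as runtime strings. The following stand-in shows how such a class could expose both kinds of information. KernelInfoSketch and its accessors are hypothetical and only illustrate the pattern; they say nothing about how SPL's ProgramInfo is implemented.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for illustration only; it is not SPL's ProgramInfo.
// Buffer counts are fixed at compile time through template arguments,
// while the ID and the IL source are ordinary runtime strings.
template <int outputsT, int inputsT = 0, int constantsT = 0, bool globalsT = false>
class KernelInfoSketch {
public:
    KernelInfoSketch(const char* id, const char* source)
        : id_(id), source_(source) {}

    static int outputCount()   { return outputsT; }
    static int inputCount()    { return inputsT; }
    static int constantCount() { return constantsT; }
    static bool usesGlobal()   { return globalsT; }

    const std::string& id() const     { return id_; }
    const std::string& source() const { return source_; }

private:
    std::string id_;
    std::string source_;
};
```

For a kernel with one output, one input and one constant buffer, the type would be written KernelInfoSketch<1, 1, 1>, leaving globalsT at its default of false.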

Define the IL Kernel in SPL
Define a global object for the kernel
    // 1 output, 1 input, 1 constant buffer, no global buffer
    typedef ProgramInfo<1, 1, 1> SampleProgram;
    SampleProgram sampleProgInfo = SampleProgram("Sample Program", __sample_program_src__);

INITIALIZE SPL RUNTIME

Initialize SPL Runtime
Get runtime instance
Get device manager
Get buffer manager
Get program manager
    Runtime *runtime = Runtime::getInstance();
    assert(runtime);
    DeviceManager *devMgr = runtime->getDeviceManager();
    assert(devMgr);
    BufferManager *bufMgr = runtime->getBufferManager();
    assert(bufMgr);
    ProgramManager* progMgr = runtime->getProgramManager();
    assert(progMgr);

Assign Device to SPL
Assign device to device manager
    bool r;
    r = devMgr->assignDevice(0);
    assert(r);
The device manager handles device initialization and destruction.
SPL cannot access a device that has not been assigned to it.
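
The slides check every result with assert(), which the preprocessor removes when NDEBUG is defined, as it usually is in release builds. A check that stays active in every build can be sketched as below. requireOk and fakeAssignDevice are hypothetical names; the second one only imitates a call such as devMgr->assignDevice(0) so that the example runs without SPL.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Throws in every build configuration, unlike assert(), which is
// compiled out when NDEBUG is defined.
inline void requireOk(bool ok, const std::string& what)
{
    if (!ok) {
        throw std::runtime_error("SPL call failed: " + what);
    }
}

// Hypothetical stand-in for a call such as devMgr->assignDevice(id).
// It pretends that only device 0 exists.
inline bool fakeAssignDevice(int deviceId)
{
    return deviceId == 0;
}
```

A call then reads requireOk(devMgr->assignDevice(0), "assignDevice(0)"); and a failure surfaces as an exception instead of passing unnoticed in a release build.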

DO GPGPU COMPUTING

Initialize CPU Buffer
    void fillBuffer(float buf[], int size)
    {
        for (int i = 0; i < size; i++)
        {
            buf[i] = (float)i;
        }
    }
    float *cpuInBuf = new float[1024 * 512];
    float *cpuOutBuf = new float[1024 * 512];
    float constant = 3;
    fillBuffer(cpuInBuf, 1024 * 512);
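
The buffers above are raw new[] allocations that must be released with delete [] later. A std::vector gives the same contiguous float storage and frees it automatically. The sketch below fills a vector with 0, 1, 2, ... like fillBuffer(); makeRampBuffer is a hypothetical helper, and passing buf.data() where the slides pass cpuInBuf is an assumption based on readData() taking a plain float pointer in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns a buffer holding 0, 1, 2, ... as floats, the same values
// that fillBuffer() writes. Hypothetical helper, not part of SPL.
std::vector<float> makeRampBuffer(std::size_t count)
{
    std::vector<float> buf(count);
    std::iota(buf.begin(), buf.end(), 0.0f);   // buf[k] = k
    return buf;
}
```

All indices up to 1024 * 512 - 1 are exactly representable as float, so the ramp holds exact integers for the buffer size used in this guide.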

Get Device
Get the default device
    Device* device = devMgr->getDefaultDevice();
OR
Get device by ID
    Device* device = devMgr->getDeviceByID(0);

Load Program
Load the program using program manager
–Pass in a ProgramInfo instance
    Program *prog = progMgr->loadProgram(sampleProgInfo);
    assert(prog);

Create Buffers
Create local buffer for input
Create remote buffer for output
Get constant buffer from constant buffer pool
    Buffer* inBuf = bufMgr->createLocalBuffer(device, CAL_FORMAT_FLOAT_1, 1024, 512);
    assert(inBuf);
    Buffer* outBuf = bufMgr->createRemoteBuffer(CAL_FORMAT_FLOAT_1, 1024, 512);
    assert(outBuf);
    ConstBuffer* constBuf = bufMgr->getConstBuffer(1);
    assert(constBuf);

CPU to GPU Data Transfer
Read in CPU buffer
    bool r;
    r = inBuf->readData(cpuInBuf, 1024 * 512);
    assert(r);
Set Constant
    r = constBuf->setConstant(&constant);
    assert(r);

Bind Buffers
Bind buffers to the program
–Input, Output, Constant, Global
    r = prog->bindOutput(outBuf, 0);
    assert(r);
    r = prog->bindInput(inBuf, 0);
    assert(r);
    r = prog->bindConstant(constBuf, 0);
    assert(r);

Execute Program
Define the execution domain
Run program
Check the execution event
    CALdomain domain = {0, 0, 1024, 512};
    Event *e = prog->run(domain);
    assert(e);
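
The domain {0, 0, 1024, 512} matches the 1024 x 512 buffers created earlier. The sketch below spells out the arithmetic, assuming the four initializer values are x, y, width and height, and assuming the CPU buffer is laid out row by row. DomainSketch and the helper functions are hypothetical and do not come from cal.h or SPL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the slide's initializer {0, 0, 1024, 512},
// read here as x, y, width, height (an assumption).
struct DomainSketch {
    unsigned x;
    unsigned y;
    unsigned width;
    unsigned height;
};

// Number of elements the domain covers.
inline std::size_t elementCount(const DomainSketch& d)
{
    return static_cast<std::size_t>(d.width) * d.height;
}

// Row-major index of element (col, row) in a buffer that is
// bufferWidth elements wide (layout is an assumption).
inline std::size_t linearIndex(unsigned col, unsigned row, unsigned bufferWidth)
{
    return static_cast<std::size_t>(row) * bufferWidth + col;
}

// True when (col, row) lies inside the domain rectangle.
inline bool contains(const DomainSketch& d, unsigned col, unsigned row)
{
    return col >= d.x && col < d.x + d.width &&
           row >= d.y && row < d.y + d.height;
}
```

Under these assumptions the domain covers 1024 * 512 = 524288 elements, the same count passed to readData() and writeData().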

GPU to CPU Data Transfer
Write to the CPU buffer
    r = outBuf->writeData(cpuOutBuf, 1024 * 512);
    assert(r);
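
Once the output is back in CPU memory it can be compared with the expected value, input plus constant, for every element. The sketch below counts the elements that differ; countMismatches is a hypothetical helper, and the tolerance parameter is an addition for kernels whose results are not exact.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts elements where out[k] differs from in[k] + c by more than tol.
// Elements present in only one of the two buffers count as mismatches.
// Hypothetical helper for illustration; not part of SPL.
std::size_t countMismatches(const std::vector<float>& in,
                            const std::vector<float>& out,
                            float c, float tol = 0.0f)
{
    const std::size_t common = std::min(in.size(), out.size());
    std::size_t mismatches = std::max(in.size(), out.size()) - common;
    for (std::size_t k = 0; k < common; ++k) {
        // Written as !(diff <= tol) so that a NaN result also counts.
        if (!(std::fabs(out[k] - (in[k] + c)) <= tol)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A return value of 0 means every element matched; for the sample kernel with integer-valued inputs and constant 3, a tolerance of 0 is sufficient.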

RELEASE RESOURCE AND CLEAN UP

Unload Program
Destroy program object
–Unbind all the buffers
    Call Program::unbindAllBuffers();
–Unload module from context
    progMgr->unloadProgram(prog);

Destroy/Release Buffers
Destroy buffers
–InputBuffer, OutputBuffer
Release ConstBuffer to the pool
    bufMgr->destroyBuffer(inBuf);
    bufMgr->destroyBuffer(outBuf);
    bufMgr->releaseConstBuffer(constBuf);
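
Each buffer has to be destroyed or released by hand, which is easy to miss on an early return. One common C++ way to tie cleanup to scope is a small guard object whose destructor runs the cleanup call. The sketch below is a general pattern, not an SPL feature: ScopeGuard and runCleanupDemo are hypothetical, and the lambdas only record names where real code would call functions such as bufMgr->destroyBuffer(inBuf). It requires C++11.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Runs a cleanup action when the guard goes out of scope.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}
    ~ScopeGuard() { if (cleanup_) cleanup_(); }

    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> cleanup_;
};

// Demo with stand-in actions that only record what they would do.
std::vector<std::string> runCleanupDemo()
{
    std::vector<std::string> log;
    {
        ScopeGuard inGuard([&log] { log.push_back("destroy inBuf"); });
        ScopeGuard outGuard([&log] { log.push_back("destroy outBuf"); });
        ScopeGuard constGuard([&log] { log.push_back("release constBuf"); });
        log.push_back("work");
    }   // guards run here, in reverse order of declaration
    return log;
}
```

Because destructors run in reverse order of declaration, resources created last are released first.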

Shutdown Runtime
Not necessary!
–Runtime will be destroyed when the application exits.
    Runtime::destroy();

The Whole Program
    #include "cal.h"
    #include "amdspl.h"
    #include "RuntimeDefs.h"
    using namespace amdspl;
    using namespace amdspl::core::cal;
    void fillBuffer(float buf[], int size)
    {
        for (int i = 0; i < size; i++)
        {
            buf[i] = (float)i;
        }
    }

The Whole Program
    const char *__sample_program_src__ =
        "il_ps_2_0\n"
        "dcl_output_generic o0\n"
        "dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
        "dcl_input_position_interp(linear_noperspective) v0.xy__\n"
        "dcl_cb cb0[1]\n"
        "sample_resource(0)_sampler(0) r1, v0.xy00\n"
        "add o0, r1, cb0[0]\n"
        "endmain\n"
        "end\n";
    // 1 output, 1 input, 1 constant buffer, no global buffer
    typedef ProgramInfo<1, 1, 1> SampleProgram;
    SampleProgram sampleProgInfo = SampleProgram("Sample Program", __sample_program_src__);

The Whole Program
    int main(void)
    {
        float *cpuInBuf = new float[1024 * 512];
        float *cpuOutBuf = new float[1024 * 512];
        float constant = 3;
        fillBuffer(cpuInBuf, 1024 * 512);
        Runtime *runtime = Runtime::getInstance();
        DeviceManager *devMgr = runtime->getDeviceManager();
        BufferManager *bufMgr = runtime->getBufferManager();
        ProgramManager* progMgr = runtime->getProgramManager();
        devMgr->assignDevice(0);
        Device* device = devMgr->getDefaultDevice();

The Whole Program
        Program *prog = progMgr->loadProgram(sampleProgInfo);
        Buffer* inBuf = bufMgr->createLocalBuffer(device, CAL_FORMAT_FLOAT_1, 1024, 512);
        Buffer* outBuf = bufMgr->createRemoteBuffer(CAL_FORMAT_FLOAT_1, 1024, 512);
        ConstBuffer* constBuf = bufMgr->getConstBuffer(1);
        inBuf->readData(cpuInBuf, 1024 * 512);
        constBuf->setConstant(&constant);
        prog->bindOutput(outBuf, 0);
        prog->bindInput(inBuf, 0);
        prog->bindConstant(constBuf, 0);
        CALdomain domain = {0, 0, 1024, 512};
        Event *e = prog->run(domain);
        outBuf->writeData(cpuOutBuf, 1024 * 512);
        ......

The Whole Program
        .....
        progMgr->unloadProgram(prog);
        bufMgr->destroyBuffer(inBuf);
        bufMgr->destroyBuffer(outBuf);
        bufMgr->releaseConstBuffer(constBuf);
        Runtime::destroy();
        delete [] cpuInBuf;
        delete [] cpuOutBuf;
        return 0;
    }

THANK YOU!