Programming with CUDA WS 08/09 Lecture 6 Thu, 11 Nov, 2008.

Previously
CUDA API
– Extensions to C: function & variable type qualifiers, built-in variables
– Compilation with NVCC
Synchronization & optimization
CUDA Runtime Component

Today
CUDA API
– CUDA Runtime Component

CUDA Runtime Component
– Common Component
– Device Component
– Host Component


Common Runtime Component
Used by both host and device
– Built-in vector types: char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4, short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4, int1, uint1, int2, uint2, int3, uint3, int4, uint4, long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4, float1, float2, float3, float4, double2
– Constructed with make_ functions:
  float a, b, c, d;
  float4 f4 = make_float4(a, b, c, d); // f4.x=a, f4.y=b, f4.z=c, f4.w=d

Common Runtime Component
Built-in vector types
– dim3: based on uint3; components left unspecified default to 1
Math functions
– Full listing in Appendix B of the programming guide
– Single-precision and (on compute capability >= 1.3) double-precision floating-point functions
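As a minimal sketch of the types above (the kernel name and sizes are illustrative, not from the lecture), a float4 array can be processed by a kernel launched with dim3 configuration values:

```cuda
// Sketch: built-in vector types and dim3 in a kernel launch.
__global__ void scaleKernel(float4 *data, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float4 v = data[i];
    // Components of a vector type are accessed as .x, .y, .z, .w
    data[i] = make_float4(v.x * s, v.y * s, v.z * s, v.w * s);
}

int main() {
    dim3 block(256);   // 256 x 1 x 1: unspecified components default to 1
    dim3 grid(64);     // 64 x 1 x 1
    float4 *d_data;
    cudaMalloc((void**)&d_data, 64 * 256 * sizeof(float4));
    scaleKernel<<<grid, block>>>(d_data, 2.0f);
    cudaFree(d_data);
    return 0;
}
```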

Common Runtime Component
Timing: clock_t clock()
– In device code, returns the value of a per-multiprocessor counter incremented every clock cycle
  clock_t start = clock();
  // execute A
  __syncthreads();
  clock_t end = clock();
  clock_t cycles = end - start;
– Measures clock cycles taken during execution of A, not necessarily the time actually spent executing, which may be less because threads are time-sliced
– To convert cycles to time, divide by the device clock rate, not the host's CLOCKS_PER_SEC
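Put into a complete kernel, the timing pattern on the slide might look like this (kernel and variable names are illustrative; the work being timed is elided):

```cuda
// Sketch of device-side timing: each block records the cycles its
// multiprocessor counted while the block executed the timed section.
__global__ void timedKernel(clock_t *timings) {
    clock_t start = clock();
    // ... work to be timed ("A") would go here ...
    __syncthreads();                        // wait for all threads in the block
    clock_t end = clock();
    if (threadIdx.x == 0)
        timings[blockIdx.x] = end - start;  // cycles, not seconds
}
```

The host would then copy the timings array back and convert cycle counts to time using the device's clock frequency.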

Common Runtime Component
Textures
– The GPU has texturing hardware for graphics output
– Part of it is exposed through CUDA
– Texture memory can be faster than global memory
– Reading texture memory is called a texture fetch
– One parameter to a texture fetch is a texture reference

Common Runtime Component
Texture reference
– Defines the area of texture memory to be fetched
– Bound by the host to some memory region, termed the texture
– Textures may overlap
– May be 1D, 2D or 3D, and thus use 1, 2 or 3 texture coordinates
– Elements of a fetched texture are called texels, "texture elements"

Common Runtime Component
Texture reference
– Declaration: texture<Type, Dim, ReadMode> texRef;
– Type: texel type; int, float, or a vector type
– Dim: dimensionality of the texture reference
  – Optional, defaults to 1
  – May be 1, 2 or 3

Common Runtime Component
Texture reference
– Declaration: texture<Type, Dim, ReadMode> texRef;
– ReadMode: convert fetched texels?
  – cudaReadModeElementType: no conversion
  – cudaReadModeNormalizedFloat: integer types converted to
    – [0.0, 1.0] for unsigned integers
    – [-1.0, 1.0] for signed integers

Common Runtime Component
Texture reference
– Declaration: texture<Type, Dim, ReadMode> texRef;
– ReadMode is optional, defaults to cudaReadModeElementType

Common Runtime Component
Texture coordinates
– Can be unnormalized: [0, N); out-of-range coordinates are clamped
– Can be normalized: [0, 1)
  – Independent of actual texture size
  – Clamped by default
  – Also offers a "wrap" mode, good for periodic textures

Common Runtime Component
Texture reference
– Declaration: texture<Type, Dim, ReadMode> texRef;
– Linear texture filtering
  – Returns a value interpolated between neighbouring texels
  – Type should be floating point
  – e.g. for a texture of size 4, the normalized texel indices are 0, 0.25, 0.5, 0.75; with filtering, fetches cover the full range [0, 1)
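Putting the pieces together, a sketch of the texture-reference workflow described above (declare, bind, fetch) could look as follows; the names and sizes are illustrative, and for simplicity the texture is bound to linear device memory and fetched with tex1Dfetch, which takes an integer coordinate and performs no filtering:

```cuda
// Sketch: 1D float texture reference bound to linear memory.
// Texture references must be declared at file scope.
texture<float, 1, cudaReadModeElementType> texRef;  // float texels, 1D, no conversion

__global__ void copyFromTexture(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texRef, i);  // texture fetch with integer coordinate
}

int main() {
    const int n = 1024;
    float *d_in, *d_out;
    cudaMalloc((void**)&d_in, n * sizeof(float));
    cudaMalloc((void**)&d_out, n * sizeof(float));
    cudaBindTexture(0, texRef, d_in, n * sizeof(float));  // d_in becomes "the texture"
    copyFromTexture<<<n / 256, 256>>>(d_out, n);
    cudaUnbindTexture(texRef);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Normalized coordinates and linear filtering require binding to a CUDA array and fetching with tex1D instead.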

All for today
Next time
– Device Runtime Component
– Host Runtime Component
– Performance & Optimization

On to exercises!