1 ITCS 4/5010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 28, 2013, 3-DBlocks.ppt Addressing 2-D grids with 3-D blocks Class Discussion Notes.



2 General Approach

Given two-dimensional addressing (row, column), use the general 2-D to 1-D flattening equation:

index = col + row * N_col

where N_col is the total number of columns in a row, to get a unique ID. The equation can be applied multiple times in 2-D/3-D grids and blocks to get a unique thread ID.
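The flattening equation can be sketched on the host as a small C++ function. This is only an illustration: the name `flatten` is hypothetical and not part of CUDA, and the array sizes in the checks are made-up examples.

```cpp
// Host-side sketch of the 2-D to 1-D flattening equation.
// `flatten` is a hypothetical helper name, not a CUDA function.
constexpr unsigned flatten(unsigned col, unsigned row, unsigned n_col) {
    return col + row * n_col;  // index = col + row * N_col
}

// In an array with 4 columns, (row 2, col 1) maps to 1 + 2 * 4 = 9.
static_assert(flatten(1, 2, 4) == 9, "row 2, col 1 of a 4-column array");
// The last element of a 3 x 4 array (row 2, col 3) gets the last ID, 11.
static_assert(flatten(3, 2, 4) == 11, "last element of a 3 x 4 array");
```

Because the checks are `static_assert`s, the arithmetic is verified when the file compiles.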

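3 We have already considered 2-D grids and 2-D blocks

[Figure: a 2-D grid of blocks, each block located by blockIdx.x and blockIdx.y; each block holds a 2-D array of threads located by threadIdx.x and threadIdx.y along the x and y axes.]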

4 Global thread ID - one approach

Applicable when mapping a 2-D data array onto the grid. Determine the global column and row of the chosen thread:

col = blockIdx.x * blockDim.x + threadIdx.x
row = blockIdx.y * blockDim.y + threadIdx.y

Then use:

ThreadID = col + row * N

where N is the number of columns of threads in the grid:

N = blockDim.x * gridDim.x

Substituting, we get:

ThreadID = (blockIdx.x * blockDim.x + threadIdx.x) + (blockIdx.y * blockDim.y + threadIdx.y) * (blockDim.x * gridDim.x)
         = blockIdx.x * blockDim.x + threadIdx.x + blockIdx.y * blockDim.y * blockDim.x * gridDim.x + threadIdx.y * blockDim.x * gridDim.x
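As a rough check of this derivation, the same arithmetic can be run on the host in plain C++. The `Dim3` struct and the function name `global_id_row_col` are hypothetical stand-ins (in a real kernel the values come from the CUDA built-ins `blockIdx`, `threadIdx`, `blockDim`, and `gridDim`), and the 2 x 2 grid of 3 x 2 blocks is an assumed example.

```cpp
// Hypothetical stand-in for CUDA's dim3/uint3, so this compiles without nvcc.
struct Dim3 { unsigned x, y, z; };

// Global thread ID from the thread's global column and row.
constexpr unsigned global_id_row_col(Dim3 blockIdx, Dim3 threadIdx,
                                     Dim3 blockDim, Dim3 gridDim) {
    unsigned col = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned row = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned n   = blockDim.x * gridDim.x;  // columns of threads in the grid
    return col + row * n;
}

// Assumed example: a 2 x 2 grid of 3 x 2 blocks, 24 threads in total.
// Thread (2,1) of block (1,1): col = 5, row = 3, N = 6, so ID = 5 + 3 * 6 = 23.
static_assert(global_id_row_col(Dim3{1, 1, 0}, Dim3{2, 1, 0},
                                Dim3{3, 2, 1}, Dim3{2, 2, 1}) == 23,
              "last thread of the grid gets the last ID");
// Thread (0,0) of block (1,0): col = 3, row = 0, so ID = 3.
static_assert(global_id_row_col(Dim3{1, 0, 0}, Dim3{0, 0, 0},
                                Dim3{3, 2, 1}, Dim3{2, 2, 1}) == 3,
              "first thread of the second block in x");
```

With this scheme, IDs run along whole rows of the grid, so threads in the same block do not get consecutive IDs once the block has more than one row.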

5 Global thread ID - another approach

Using the general 2-D to 1-D flattening equation:

index = column + row * N_column

Block ID within the grid:

blockID = blockIdx.x + blockIdx.y * gridDim.x

Thread ID within the block:

BlockthreadID = threadIdx.x + threadIdx.y * blockDim.x

Then substitute BlockthreadID (as the column) and blockID (as the row) into the flattening equation again to get threadID, with N_column being the number of threads in a block, blockDim.x * blockDim.y.
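A host-side sketch of this second approach follows. It assumes the final substitution uses the number of threads per block, blockDim.x * blockDim.y, as N_column; `Dim3` and `global_id_block_major` are hypothetical names, and the grid and block sizes are an assumed example.

```cpp
// Hypothetical stand-in for CUDA's dim3/uint3, so this compiles without nvcc.
struct Dim3 { unsigned x, y, z; };

// Global thread ID built from the block ID and the thread ID within the block.
constexpr unsigned global_id_block_major(Dim3 blockIdx, Dim3 threadIdx,
                                         Dim3 blockDim, Dim3 gridDim) {
    unsigned blockID       = blockIdx.x + blockIdx.y * gridDim.x;
    unsigned blockThreadID = threadIdx.x + threadIdx.y * blockDim.x;
    unsigned perBlock      = blockDim.x * blockDim.y;  // assumed N_column
    return blockThreadID + blockID * perBlock;
}

// Assumed example: a 2 x 2 grid of 3 x 2 blocks, 24 threads in total.
// Thread (2,1) of block (1,1): blockID = 3, blockThreadID = 5, ID = 5 + 3 * 6 = 23.
static_assert(global_id_block_major(Dim3{1, 1, 0}, Dim3{2, 1, 0},
                                    Dim3{3, 2, 1}, Dim3{2, 2, 1}) == 23,
              "last thread of the grid gets the last ID");
// Thread (0,0) of block (1,0): blockID = 1, blockThreadID = 0, ID = 6.
static_assert(global_id_block_major(Dim3{1, 0, 0}, Dim3{0, 0, 0},
                                    Dim3{3, 2, 1}, Dim3{2, 2, 1}) == 6,
              "first thread of block 1");
```

Both approaches give each thread a unique ID, but the numbering differs: here all threads of one block are numbered consecutively. Thread (0,0) of block (1,0) gets ID 6 with this scheme and ID 3 with the row-and-column scheme.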

6 2-D Grids and 3-D blocks

[Figure: a 2-D grid of blocks located by blockIdx.x and blockIdx.y; each block is a 3-D array of threads located by threadIdx.x, threadIdx.y, and threadIdx.z.]

7 3D Global thread ID - one approach

We already have the global thread ID that does not consider the z direction. Call it threadID_xy.

Using the general 2-D to 1-D flattening equation:

index = col + row * N_col

with threadIdx.z as the column, threadID_xy as the row, and blockDim.z as N_col:

threadID = threadIdx.z + threadID_xy * blockDim.z
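The 3-D extension can be sketched on the host as below. It assumes threadID_xy is computed with the row-and-column approach from slide 4; `Dim3`, `thread_id_xy`, and `global_id_3d` are hypothetical names, and the 2 x 2 grid of 3 x 2 x 2 blocks is an assumed example.

```cpp
// Hypothetical stand-in for CUDA's dim3/uint3, so this compiles without nvcc.
struct Dim3 { unsigned x, y, z; };

// Global thread ID ignoring z (assumed here: the row-and-column approach).
constexpr unsigned thread_id_xy(Dim3 blockIdx, Dim3 threadIdx,
                                Dim3 blockDim, Dim3 gridDim) {
    unsigned col = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned row = blockIdx.y * blockDim.y + threadIdx.y;
    return col + row * (blockDim.x * gridDim.x);
}

// Flatten once more: z is the "column", threadID_xy the "row", blockDim.z is N_col.
constexpr unsigned global_id_3d(Dim3 blockIdx, Dim3 threadIdx,
                                Dim3 blockDim, Dim3 gridDim) {
    return threadIdx.z
         + thread_id_xy(blockIdx, threadIdx, blockDim, gridDim) * blockDim.z;
}

// Assumed example: a 2 x 2 grid of 3 x 2 x 2 blocks, 48 threads in total.
// Thread (2,1,1) of block (1,1): threadID_xy = 23, so ID = 1 + 23 * 2 = 47.
static_assert(global_id_3d(Dim3{1, 1, 0}, Dim3{2, 1, 1},
                           Dim3{3, 2, 2}, Dim3{2, 2, 1}) == 47,
              "last thread of the grid gets the last ID");
```

In this numbering, threads that differ only in z are adjacent, since z plays the role of the fastest-varying index.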

8 3D Global thread ID - another approach

2-D address (x, y) and block sizes D_x and D_y:

Unique global thread ID = x + y * D_x

3-D address (x, y, z) and block sizes D_x, D_y, and D_z:

Unique global thread ID = x + y * D_x + z * D_x * D_y
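These two formulas can be checked with a short host-side C++ sketch. The function names `id_2d` and `id_3d` are hypothetical, and the sizes are an assumed example. One caveat, offered as a reading rather than something the slide states: if (x, y, z) is a thread's index within its block and D_x, D_y, D_z are the block dimensions, the result is unique within that block; to be unique across the grid, x, y, z would need to be global coordinates and D_x, D_y the total extents in threads.

```cpp
// Hypothetical helper names; d_x and d_y are the extents in x and y.
constexpr unsigned id_2d(unsigned x, unsigned y, unsigned d_x) {
    return x + y * d_x;
}

constexpr unsigned id_3d(unsigned x, unsigned y, unsigned z,
                         unsigned d_x, unsigned d_y) {
    return x + y * d_x + z * d_x * d_y;
}

// Assumed sizes: D_x = 3, D_y = 2, D_z = 2 (12 addresses in 3-D).
// 2-D: (2,1) maps to 2 + 1 * 3 = 5, the last of 6 IDs.
static_assert(id_2d(2, 1, 3) == 5, "last 2-D address");
// 3-D: (2,1,1) maps to 2 + 1 * 3 + 1 * 3 * 2 = 11, the last of 12 IDs.
static_assert(id_3d(2, 1, 1, 3, 2) == 11, "last 3-D address");
```

Note that D_z does not appear in the formula; it only bounds z. Here x varies fastest, whereas in the previous approach z varies fastest.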

Questions