GPU Programming with CUDA – CUDA 5 and 6
Paul Richmond, GPUComputing@Sheffield
http://gpucomputing.sites.sheffield.ac.uk/
Overview
- Dynamic Parallelism (CUDA 5+)
- GPU Object Linking (CUDA 5+)
- Unified Memory (CUDA 6+)
- Other Developer Tools
Dynamic Parallelism
- Before CUDA 5, kernels could only be launched from the host, which limited the ability to implement recursive algorithms.
- Dynamic Parallelism allows kernels to be launched from the device.
- Benefits: improved load balancing and deep recursion.
(Slide diagram: the CPU launches Kernel A on the GPU, which in turn launches Kernels B, C and D.)
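As a minimal sketch of a device-side launch (kernel names are illustrative; dynamic parallelism needs compute capability 3.5+ and compilation with -rdc=true):

```cuda
#include <cstdio>

// Child kernel, launched from the device rather than the host
__global__ void child(int depth) {
    printf("child at depth %d\n", depth);
}

// Parent kernel: device code can itself launch further kernels,
// enabling recursion entirely on the GPU
__global__ void parent(int depth) {
    if (depth < 3) {
        child<<<1, 1>>>(depth);       // device-side launch
        parent<<<1, 1>>>(depth + 1);  // deep recursion without the CPU
    }
}

int main() {
    parent<<<1, 1>>>(0);       // the host only launches the root kernel
    cudaDeviceSynchronize();   // wait for the whole launch tree to finish
    return 0;
}
```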
An Example

//Host Code
...
A<<<...>>>(data);
B<<<...>>>(data);
C<<<...>>>(data);

//Kernel Code
__global__ void vectorAdd(float *data) {
    do_stuff(data);
    X<<<...>>>(data);
    do_more_stuff(data);
}
GPU Object Linking
- CUDA 4 required a single source file per kernel: compiled device code could not be linked.
- CUDA 5.0+ allows separately compiled device object files to be linked together, so kernels and host code can be built independently.
(Slide diagram: a.cu, b.cu and c.cu compile to a.o, b.o and c.o, which link with Main.cpp into Program.exe.)
GPU Object Linking (continued)
- Device objects can also be built into static libraries, shared by different sources.
- Benefits: much better code reuse, reduced compilation time, and the ability to ship closed-source device libraries.
(Slide diagram: foo.cu, bar.cu, ... build into ab.culib, which links with Main.cpp plus a.o and b.o into Program.exe, and with Main2.cpp into Program2.exe.)
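The separate-compilation workflow above can be sketched with nvcc (file names and the architecture flag are illustrative):

```shell
# Compile each .cu file to an object with relocatable device code (-dc)
nvcc -arch=sm_35 -dc a.cu -o a.o
nvcc -arch=sm_35 -dc b.cu -o b.o

# Link device and host objects into the final executable
nvcc -arch=sm_35 a.o b.o main.cpp -o program

# Alternatively, archive the objects into a static library for reuse
nvcc -arch=sm_35 -lib a.o b.o -o libab.a
```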
Unified Memory
- Traditional developer view: the GPU and CPU have separate memories, memory must be explicitly copied between them, and complex data structures require deep copies.
- Unified Memory (CUDA 6+) changes that view: a single pointer makes the data accessible anywhere, making code much simpler to port.
(Slide diagram: separate System Memory and GPU Memory are replaced by a single Unified Memory visible to both CPU and GPU.)
Unified Memory Example

CPU-only version:

void sortfile(FILE *fp, int N) {
    char *data;
    data = (char *)malloc(N);
    fread(data, 1, N, fp);
    qsort(data, N, 1, compare);
    use_data(data);
    free(data);
}

Unified Memory version:

void sortfile(FILE *fp, int N) {
    char *data;
    cudaMallocManaged(&data, N);
    fread(data, 1, N, fp);
    qsort<<<...>>>(data, N, 1, compare);
    cudaDeviceSynchronize();
    use_data(data);
    cudaFree(data);
}
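A self-contained sketch of the same idea, assuming a CUDA 6+ device (the kernel and sizes are illustrative): one managed allocation is written by the CPU, updated by the GPU, and read back by the CPU through the same pointer, with no explicit copies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simple kernel: increment every element of a managed array
__global__ void inc(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 256;
    int *data;
    cudaMallocManaged(&data, n * sizeof(int)); // single pointer, visible to CPU and GPU
    for (int i = 0; i < n; ++i) data[i] = i;   // CPU writes directly
    inc<<<(n + 127) / 128, 128>>>(data, n);    // GPU reads and writes the same pointer
    cudaDeviceSynchronize();                   // synchronise before the CPU touches data again
    printf("data[0] = %d\n", data[0]);
    cudaFree(data);                            // managed memory is freed with cudaFree
    return 0;
}
```

Note that managed memory allocated with cudaMallocManaged must be released with cudaFree, not free.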
Other Developer Tools
- XT and drop-in libraries: cuFFT and cuBLAS optimised for multiple GPUs (on the same node).
- GPUDirect: direct transfer between GPUs, cutting out the host; to support direct transfer via InfiniBand (over a network).
- Remote development using Nsight Eclipse Edition.
- Enhanced Visual Profiler.
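Direct GPU-to-GPU transfer within a node can be sketched with the peer-to-peer runtime API (a simplified illustration assuming two peer-capable devices; error checking omitted):

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;
    float *src, *dst;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaDeviceEnablePeerAccess(1, 0);  // let device 0 access device 1

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);
    cudaDeviceEnablePeerAccess(0, 0);  // let device 1 access device 0

    // Copy directly between the two GPUs, bypassing host memory
    cudaMemcpyPeer(dst, 1, src, 0, bytes);

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```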