CUDA Libraries and Language Extensions for GKLEE.


What's the point?
● We want GKLEE to be friendly and practical enough to use as a real-world development tool
● CUDA extensions and APIs cover a lot of ground. How do we approach handling all of them in GKLEE?

What to Handle?
● There are many items to cover
● Break them down into levels:
– Language intrinsics / vital functions
– Items that are easy to implement (make stubs)
– Items that require a large degree of emulation/virtualization in GKLEE: these are vital to the semantics of kernels/programs and should either be handled properly or cause termination with a message
– Items that are unrelated and can be ignored
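The "stub" and "terminate with a message" levels can be sketched in plain C++ as follows. This is only an illustration, not GKLEE's actual code: the `cudaError_t` stand-in, the helper `gklee_unsupported`, and the choice of which API call falls into which level are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for the CUDA types; a real build would take
// these from the CUDA headers instead.
typedef int cudaError_t;
const cudaError_t cudaSuccess = 0;

// "Easy" level: profiler control does not change what a kernel computes,
// so a stub can simply report success.
cudaError_t cudaProfilerStart(void) { return cudaSuccess; }
cudaError_t cudaProfilerStop(void) { return cudaSuccess; }

// Hypothetical helper for calls that matter to program semantics but are
// not modeled: stop with a message rather than continue silently.
[[noreturn]] void gklee_unsupported(const char *api_name) {
    std::fprintf(stderr, "GKLEE: unsupported CUDA API call: %s\n", api_name);
    std::abort();
}

// Example of the "terminate" level (classification chosen only for
// illustration).
cudaError_t cudaDeviceEnablePeerAccess(int peerDevice, unsigned int flags) {
    (void)peerDevice;
    (void)flags;
    gklee_unsupported("cudaDeviceEnablePeerAccess");
}
```

Calling `cudaProfilerStart()` returns `cudaSuccess` and has no other effect, while calling `cudaDeviceEnablePeerAccess(...)` prints the message and aborts.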

Top Priority Items
● Language Intrinsics
● Vital Functions

CUDA C/C++ Extensions
● Function Type Qualifiers
● Variable Type Qualifiers
● Types (Vector Types)
● Built-in Variables
● Execution Configurations (also see the configuration functions in the Runtime API)

Function Type Qualifiers
● __device__: executed on the device, callable only from the device
● __global__: a kernel declaration, callable only from the host
● __host__: executed on the host, callable only from the host (__device__ and __host__ can be used together)
● __noinline__: hints to the compiler not to inline the function
● __forceinline__: forces the compiler to inline the function
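A small sketch of how these qualifiers appear in source. It assumes the common idiom of defining the qualifiers away when the file is not compiled by nvcc, so the same text also builds as ordinary C++. The function names are hypothetical.

```cpp
#include <cassert>

// Assumption: outside nvcc (__CUDACC__ undefined) the qualifiers expand
// to nothing, so a plain C++ compiler accepts the file.
#ifndef __CUDACC__
#define __host__
#define __device__
#define __global__
#endif

// Usable from both host and device code.
__host__ __device__ inline int square(int x) { return x * x; }

// Device-only helper: under nvcc it is callable only from device code.
__device__ inline int scale(int x, int factor) { return x * factor; }

// Kernel declaration: launched from the host and must return void.
__global__ void square_all(int *data, int n);
```

With the fallback macros in effect, `square(3)` and `scale(4, 5)` are ordinary host calls. Under nvcc, calling `scale` from host code would be rejected.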

Variable Type Qualifiers
● __device__: a variable that resides on the device
● __constant__: fully accessible from all threads in the grid and from the host
● __shared__: THIS IS CURRENTLY BROKEN (in some cases)
● __restrict__: promises the compiler that a pointer is not aliased, so it can reorder accesses and do common sub-expression elimination. Is this supported by LLVM?

Built-in Variables
● gridDim
● blockDim
● blockIdx
● threadIdx
● warpSize
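Kernels typically combine these variables to derive a per-thread global index. The sketch below reproduces that arithmetic on the host. `Dim3` and both function names are hypothetical stand-ins, and the built-ins are passed in as parameters rather than read from the hardware.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for the dim3/uint3 built-in types.
struct Dim3 { unsigned x, y, z; };

// Global thread index along x: blockIdx.x * blockDim.x + threadIdx.x.
unsigned global_index_x(Dim3 blockIdx, Dim3 blockDim, Dim3 threadIdx) {
    return blockIdx.x * blockDim.x + threadIdx.x;
}

// Total number of threads along x in the grid: gridDim.x * blockDim.x.
unsigned total_threads_x(Dim3 gridDim, Dim3 blockDim) {
    return gridDim.x * blockDim.x;
}
```

For example, thread 5 of block 2 with 128 threads per block has global index 2 * 128 + 5 = 261.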

Vital Functions
● Memory Fence Functions
● Synchronization
● Warp Vote Functions
● On-Device Asserts

Variations on __syncthreads
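CUDA provides `__syncthreads_count`, `__syncthreads_and`, and `__syncthreads_or`, which act as a barrier and also reduce a per-thread predicate across the block. The following host-side sketch models only the reduction result, assuming every thread of the block reaches the barrier. The type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One entry per thread in the block: the predicate that thread passed in.
using BlockPredicates = std::vector<int>;

// Models __syncthreads_count: number of threads with a nonzero predicate.
int syncthreads_count(const BlockPredicates &p) {
    return static_cast<int>(
        std::count_if(p.begin(), p.end(), [](int v) { return v != 0; }));
}

// Models __syncthreads_and: nonzero iff every predicate is nonzero.
int syncthreads_and(const BlockPredicates &p) {
    return syncthreads_count(p) == static_cast<int>(p.size()) ? 1 : 0;
}

// Models __syncthreads_or: nonzero iff at least one predicate is nonzero.
int syncthreads_or(const BlockPredicates &p) {
    return syncthreads_count(p) > 0 ? 1 : 0;
}
```

Every thread in the block receives the same result, which is why a tool such as GKLEE needs to see all threads' predicates before it can resolve the call.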

Warp Vote Functions
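The warp vote intrinsics of this CUDA generation are `__all`, `__any`, and `__ballot`. A minimal host-side model of their results is sketched below, assuming a warp of 32 threads that are all active. The names `warp_all`, `warp_any`, `warp_ballot`, and `WarpPredicates` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

const int kWarpSize = 32;  // assumed warp size
using WarpPredicates = std::array<int, kWarpSize>;

// Models __all: nonzero iff every thread's predicate is nonzero.
int warp_all(const WarpPredicates &p) {
    for (int v : p) {
        if (v == 0) return 0;
    }
    return 1;
}

// Models __any: nonzero iff at least one thread's predicate is nonzero.
int warp_any(const WarpPredicates &p) {
    for (int v : p) {
        if (v != 0) return 1;
    }
    return 0;
}

// Models __ballot: bit i of the result is set iff thread i's predicate
// is nonzero.
std::uint32_t warp_ballot(const WarpPredicates &p) {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (p[lane] != 0) mask |= (std::uint32_t{1} << lane);
    }
    return mask;
}
```

A model that also handles divergence would need to restrict each loop to the currently active lanes. That is left out here.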

On-Device Assertion

Execution Configuration

Lower Priority

Heap Memory Alloc

Profiler Counter Function

Launch Bounds

#pragma unroll

Runtime API – Streams Functions
● cudaError_t cudaStreamCreate(cudaStream_t *pStream): Creates an asynchronous stream.
● cudaError_t cudaStreamDestroy(cudaStream_t stream): Destroys and cleans up an asynchronous stream.
● cudaError_t cudaStreamQuery(cudaStream_t stream): Queries an asynchronous stream for completion status.
● cudaError_t cudaStreamSynchronize(cudaStream_t stream): Waits for stream tasks to complete.
● cudaError_t cudaStreamWaitEvent(cudaStream_t stream, cudaEvent_t event, unsigned int flags): Makes a compute stream wait on an event.

RT API – Events Functions
● cudaError_t cudaEventCreate(cudaEvent_t *event): Creates an event object.
● cudaError_t cudaEventCreateWithFlags(cudaEvent_t *event, unsigned int flags): Creates an event object with the specified flags.
● cudaError_t cudaEventDestroy(cudaEvent_t event): Destroys an event object.
● cudaError_t cudaEventElapsedTime(float *ms, cudaEvent_t start, cudaEvent_t end): Computes the elapsed time between events.
● cudaError_t cudaEventQuery(cudaEvent_t event): Queries an event's status.
● cudaError_t cudaEventRecord(cudaEvent_t event, cudaStream_t stream=0): Records an event.
● cudaError_t cudaEventSynchronize(cudaEvent_t event): Waits for an event to complete.

RT API – Execution Control
● cudaError_t cudaConfigureCall(dim3 gridDim, dim3 blockDim, size_t sharedMem=0, cudaStream_t stream=0): Configures a device launch.
● cudaError_t cudaFuncGetAttributes(struct cudaFuncAttributes *attr, const char *func): Finds out attributes for a given function.
● cudaError_t cudaFuncSetCacheConfig(const char *func, enum cudaFuncCache cacheConfig): Sets the preferred cache configuration for a device function.
● cudaError_t cudaLaunch(const char *entry): Launches a device function.
● cudaError_t cudaSetDoubleForDevice(double *d): Converts a double argument to be executed on a device.
● cudaError_t cudaSetDoubleForHost(double *d): Converts a double argument after execution on a device.
● cudaError_t cudaSetupArgument(const void *arg, size_t size, size_t offset): Configures a device launch.
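In the CUDA releases this list comes from, a `kernel<<<grid, block>>>(args)` launch is compiled into a `cudaConfigureCall`, one `cudaSetupArgument` per argument, and a final `cudaLaunch`. The sketch below shows how stubs for these three calls could record a pending launch for a tool to interpret. It is an assumption-laden illustration, not GKLEE's implementation: the type stand-ins, `PendingLaunch`, and `g_launch` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the CUDA types.
typedef int cudaError_t;
const cudaError_t cudaSuccess = 0;
struct dim3 { unsigned x, y, z; };
typedef int cudaStream_t;

// Hypothetical record of the launch currently being assembled.
struct PendingLaunch {
    dim3 grid;
    dim3 block;
    std::size_t sharedMem;
    std::vector<unsigned char> args;  // raw argument bytes, by offset
    const char *entry;
    bool launched;
};
static PendingLaunch g_launch;

// Starts a new launch: remember the execution configuration.
cudaError_t cudaConfigureCall(dim3 gridDim, dim3 blockDim,
                              std::size_t sharedMem = 0,
                              cudaStream_t stream = 0) {
    (void)stream;  // streams are ignored in this sketch
    g_launch.grid = gridDim;
    g_launch.block = blockDim;
    g_launch.sharedMem = sharedMem;
    g_launch.args.clear();
    g_launch.entry = nullptr;
    g_launch.launched = false;
    return cudaSuccess;
}

// Copies one argument into the argument buffer at the given offset.
cudaError_t cudaSetupArgument(const void *arg, std::size_t size,
                              std::size_t offset) {
    if (g_launch.args.size() < offset + size) {
        g_launch.args.resize(offset + size);
    }
    if (size > 0) {
        std::memcpy(g_launch.args.data() + offset, arg, size);
    }
    return cudaSuccess;
}

// Marks the launch as issued; a real tool would start the kernel here.
cudaError_t cudaLaunch(const char *entry) {
    g_launch.entry = entry;
    g_launch.launched = true;
    return cudaSuccess;
}
```

After `cudaConfigureCall(grid, block)`, `cudaSetupArgument(&n, sizeof n, 0)`, and `cudaLaunch("kernel")`, the record holds the grid and block dimensions, the bytes of `n`, and the entry name.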

RT API – Memory Management (first 6 of ~50)
● cudaError_t cudaArrayGetInfo(struct cudaChannelFormatDesc *desc, struct cudaExtent *extent, unsigned int *flags, struct cudaArray *array): Gets info about the specified cudaArray.
● cudaError_t cudaFree(void *devPtr): Frees memory on the device.
● cudaError_t cudaFreeArray(struct cudaArray *array): Frees an array on the device.
● cudaError_t cudaFreeHost(void *ptr): Frees page-locked memory.
● cudaError_t cudaGetSymbolAddress(void **devPtr, const char *symbol): Finds the address associated with a CUDA symbol.
● cudaError_t cudaGetSymbolSize(size_t *size, const char *symbol): Finds the size of the object associated with a CUDA symbol.

RT API – Unified Address Space
● Allows host and device memory to be handled with a unified address
● cudaError_t cudaPointerGetAttributes(struct cudaPointerAttributes *attributes, const void *ptr): Returns attributes about a specified pointer.

RT API – Direct Peer Memory Access
● cudaError_t cudaDeviceCanAccessPeer(int *canAccessPeer, int device, int peerDevice): Queries if a device may directly access a peer device's memory.
● cudaError_t cudaDeviceDisablePeerAccess(int peerDevice): Disables direct access to memory allocations on a peer device and unregisters any registered allocations from that device.
● cudaError_t cudaDeviceEnablePeerAccess(int peerDevice, unsigned int flags): Enables direct access to memory allocations on a peer device.

RT API – Graphics Interfaces
● OpenGL
● Direct3D
● VDPAU (Video Decode and Presentation API for Unix)
● General graphics interop
● Texture
● Surface

RT API – Version Info
● cudaError_t cudaDriverGetVersion(int *driverVersion): Returns the CUDA driver version.
● cudaError_t cudaRuntimeGetVersion(int *runtimeVersion): Returns the CUDA Runtime version.

RT API – C++ Bindings
● (Sample functions: templates are used to bind the class)
● template<class T, int dim> cudaError_t cudaBindSurfaceToArray(const struct surface<T, dim> &surf, const struct cudaArray *array): [C++ API] Binds an array to a surface.
● template<class T, int dim> cudaError_t cudaBindSurfaceToArray(const struct surface<T, dim> &surf, const struct cudaArray *array, const struct cudaChannelFormatDesc &desc): [C++ API] Binds an array to a surface.

RT API – Profiler Control
● cudaError_t cudaProfilerInitialize(const char *configFile, const char *outputFile, cudaOutputMode_t outputMode): Initializes the profiling.
● cudaError_t cudaProfilerStart(void): Starts the profiling.
● cudaError_t cudaProfilerStop(void): Stops the profiling.

Specific API (RT & Driver)
● Data Structures
● Enumerations
● #defines

RT API – Driver API Interactions
● Init/Tear Down
● Contexts
● Streams
● Events
● Arrays
● Graphics

Driver API (lower-level access)
● Initialization
● Version management
● Device management
● Context management
● Module management
● Memory management
● Unified addressing
● Streams
● Events
● Exec Control
● ...
