Published by Marcus Walters. Modified over 8 years ago.
1
CUDA Libraries and Language Extensions for GKLEE
2
What's the point? ● We want to make GKLEE as friendly and practical to use as possible, so that it becomes a real-world development tool ● CUDA extensions and APIs cover a lot of ground – how do we approach handling all of them in GKLEE?
3
What to Handle? ● There are many items to cover ● Break them down into levels: – Language intrinsics / vital functions – Easy-to-implement items (make stubs) – Items that require a large degree of emulation/virtualization in GKLEE (these are vital to the semantics of kernels/programs and should either be handled properly or cause termination with a message) – Items that are unrelated and can be ignored
4
Top Priority Items ● Language Intrinsics ● Vital Functions
5
CUDA C/C++ Extensions ● Function Type Qualifiers ● Variable Type Qualifiers ● Types (Vector Types) ● Built-in Variables ● Execution Configurations (also see the configuration functions in Runtime API)
6
Function Type Qualifiers ● __device__: executed on the device, callable only from the device ● __global__: declares a kernel; executed on the device, callable from the host only ● __host__: executed on the host, callable only from the host (__device__ and __host__ can be used together) ● __noinline__: hints the compiler not to inline the function ● __forceinline__: forces the compiler to inline the function
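The qualifiers on this slide can be seen side by side in a small sketch. The kernel and helper names (`square`, `add_one`, `transform`, `run_transform_demo`) are made up for illustration and are not part of GKLEE or the deck; error handling is kept minimal.

```cuda
#include <cuda_runtime.h>

// Compiled for both host and device, so either side may call it.
__host__ __device__ inline int square(int x) { return x * x; }

// Device-only helper; __noinline__ asks the compiler not to inline it.
__device__ __noinline__ int add_one(int x) { return x + 1; }

// Kernel entry point: runs on the device, launched from the host.
__global__ void transform(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = add_one(square(data[i]));
}

// Hypothetical host helper: runs the kernel on {0,1,2,3} and returns the
// sum of the results, or -1 if a CUDA call fails.
int run_transform_demo() {
    const int n = 4;
    int host[n] = {0, 1, 2, 3};
    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof host) != cudaSuccess) return -1;
    cudaMemcpy(dev, host, sizeof host, cudaMemcpyHostToDevice);
    transform<<<1, n>>>(dev, n);
    cudaError_t err = cudaMemcpy(host, dev, sizeof host, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;
    return host[0] + host[1] + host[2] + host[3];
}
```

Each element becomes x*x + 1, so the four results are 1, 2, 5 and 10, and the helper returns 18 on a working device.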
7
Variable Type Qualifiers ● __device__: declares a variable that resides on the device ● __constant__: resides in constant memory; accessible from all threads in the grid and from the host (through the runtime library) ● __shared__: **THIS IS CURRENTLY BROKEN (in some cases) ● __restrict__: promises the compiler that a pointer is not aliased, so it can reorder accesses and perform common sub-expression elimination -- supported by LLVM?
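A minimal sketch of how these qualifiers typically appear together in one program. All names (`scale`, `last_n`, `scale_reverse`, `run_scale_reverse_demo`) are illustrative assumptions, and the kernel assumes a single block of at most 64 threads.

```cuda
#include <cuda_runtime.h>

__constant__ float scale;   // constant memory; the host sets it with cudaMemcpyToSymbol
__device__ int last_n;      // global-memory variable that lives on the device

// __restrict__ promises the compiler that `in` and `out` never alias.
__global__ void scale_reverse(const float *__restrict__ in,
                              float *__restrict__ out, int n) {
    __shared__ float tile[64];          // one copy per thread block
    int t = threadIdx.x;
    if (t < n) tile[t] = in[t] * scale;
    __syncthreads();                    // make every tile write visible block-wide
    if (t < n) out[t] = tile[n - 1 - t];
    if (t == 0) last_n = n;
}

// Hypothetical host helper: scales {1,2,3,4} by 2, reverses the result,
// and returns out[0] (or -1 on failure).
float run_scale_reverse_demo() {
    const int n = 4;
    float host_in[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float host_out[n] = {0.0f};
    float factor = 2.0f;
    float *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, sizeof host_in) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&out, sizeof host_out) != cudaSuccess) { cudaFree(in); return -1.0f; }
    cudaMemcpyToSymbol(scale, &factor, sizeof factor);
    cudaMemcpy(in, host_in, sizeof host_in, cudaMemcpyHostToDevice);
    scale_reverse<<<1, n>>>(in, out, n);
    cudaError_t err = cudaMemcpy(host_out, out, sizeof host_out, cudaMemcpyDeviceToHost);
    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? host_out[0] : -1.0f;
}
```

The reversed, scaled array is {8, 6, 4, 2}, so the helper returns 8. The `__syncthreads()` between the write and the read of `tile` is exactly the kind of shared-memory dependency a checker has to model correctly.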
8
Built-in Variables ● gridDim ● blockDim ● blockIdx ● threadIdx ● warpSize
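The built-in variables are most often combined to compute a per-thread global index. The grid-stride loop below is a common idiom, shown here as a sketch; `fill_index` and `run_fill_index_demo` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Each thread starts at its global index and advances by the total number
// of threads in the grid, so any grid size covers all n elements.
__global__ void fill_index(int *out, int n) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] = i;
}

// Hypothetical host helper: fills 10 elements using 2 blocks of 4 threads
// and returns the sum of the array (or -1 on failure).
int run_fill_index_demo() {
    const int n = 10;
    int host[n] = {0};
    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof host) != cudaSuccess) return -1;
    fill_index<<<2, 4>>>(dev, n);
    cudaError_t err = cudaMemcpy(host, dev, sizeof host, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += host[i];
    return sum;
}
```

With 8 threads and 10 elements, threads 0 and 1 each handle two elements; the array ends up as 0..9 and the helper returns 45. `warpSize` is not used here; it is a device-side constant (32 on current NVIDIA hardware) that matters for the warp-level functions.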
9
Vital Functions ● Memory Fence Functions ● Synchronization ● Warp Vote Functions ● On-Device Asserts
11
Variations on __syncthreads
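The slide body did not survive extraction. As a hedged illustration, CUDA provides three barrier variants that also reduce a predicate across the block: `__syncthreads_count`, `__syncthreads_and`, and `__syncthreads_or`. The kernel and helper names below are made up.

```cuda
#include <cuda_runtime.h>

__global__ void predicate_barriers(const int *in, int *out) {
    int pred = in[threadIdx.x] > 0;
    int count = __syncthreads_count(pred);  // barrier + number of threads with pred != 0
    int all   = __syncthreads_and(pred);    // barrier + nonzero iff pred holds for all threads
    int any   = __syncthreads_or(pred);     // barrier + nonzero iff pred holds for some thread
    if (threadIdx.x == 0) {
        out[0] = count;
        out[1] = (all != 0);
        out[2] = (any != 0);
    }
}

// Hypothetical host helper: writes {count, all, any} into result and
// returns 0 on success, -1 on failure.
int run_predicate_barriers_demo(int result[3]) {
    const int n = 8;
    int host_in[n] = {1, -1, 2, -2, 3, -3, 4, -4};
    int *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, sizeof host_in) != cudaSuccess) return -1;
    if (cudaMalloc(&out, 3 * sizeof(int)) != cudaSuccess) { cudaFree(in); return -1; }
    cudaMemcpy(in, host_in, sizeof host_in, cudaMemcpyHostToDevice);
    predicate_barriers<<<1, n>>>(in, out);
    cudaError_t err = cudaMemcpy(result, out, 3 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : -1;
}
```

Four of the eight inputs are positive, so the expected result is count 4, all 0, any 1. Like plain `__syncthreads()`, these calls must be reached by every thread in the block.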
12
Warp Vote Functions
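The slide body did not survive extraction. When this deck was written the vote functions were `__all`, `__any`, and `__ballot`; the sketch below uses their `_sync` successors from CUDA 9 and later so that it builds on current toolkits. It assumes a warp size of 32, and the names `warp_vote` and `run_warp_vote_demo` are illustrative.

```cuda
#include <cuda_runtime.h>

__global__ void warp_vote(unsigned *out) {
    const unsigned full_mask = 0xffffffffu;        // all 32 lanes participate
    int pred = (threadIdx.x % 2) == 0;             // true on even lanes
    unsigned ballot = __ballot_sync(full_mask, pred);  // bit k set iff lane k's pred != 0
    int all = __all_sync(full_mask, pred);
    int any = __any_sync(full_mask, pred);
    if (threadIdx.x == 0) {
        out[0] = ballot;
        out[1] = (all != 0);
        out[2] = (any != 0);
    }
}

// Hypothetical host helper: launches one full warp, writes
// {ballot, all, any} into result, returns 0 on success.
int run_warp_vote_demo(unsigned result[3]) {
    unsigned *out = nullptr;
    if (cudaMalloc(&out, 3 * sizeof(unsigned)) != cudaSuccess) return -1;
    warp_vote<<<1, 32>>>(out);
    cudaError_t err = cudaMemcpy(result, out, 3 * sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(out);
    return err == cudaSuccess ? 0 : -1;
}
```

Even lanes set bits 0, 2, 4, ..., so the ballot is 0x55555555, with all equal to 0 and any equal to 1.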
13
On-Device Assertion
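The slide body did not survive extraction. As a sketch, device code can call the standard `assert` macro; a failing assertion aborts the kernel and the host sees `cudaErrorAssert` on a later synchronizing call. The example only exercises the passing case, because a failed device assert leaves the CUDA context unusable. `checked_copy` and `run_assert_demo` are hypothetical names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

__global__ void checked_copy(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    assert(i < n);   // fires if the launch uses more threads than elements
    out[i] = in[i];
}

// Hypothetical host helper: launches exactly n threads so the assertion
// holds; returns 0 if the kernel completed without error.
int run_assert_demo() {
    const int n = 4;
    int host[n] = {7, 8, 9, 10};
    int *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, sizeof host) != cudaSuccess) return -1;
    if (cudaMalloc(&out, sizeof host) != cudaSuccess) { cudaFree(in); return -1; }
    cudaMemcpy(in, host, sizeof host, cudaMemcpyHostToDevice);
    checked_copy<<<1, n>>>(in, out, n);
    cudaError_t err = cudaDeviceSynchronize();   // would report cudaErrorAssert on failure
    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : -1;
}
```

For a symbolic checker, the assertion condition is a natural property to verify over all thread indices rather than only the ones a concrete launch happens to use.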
14
**Execution Configuration**
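The slide body did not survive extraction. For reference, an execution configuration has the form `kernel<<<grid, block, sharedBytes, stream>>>(args)`, where the last two parameters are optional. The sketch below uses all four; `block_sum` and `run_block_sum_demo` are illustrative names.

```cuda
#include <cuda_runtime.h>

__global__ void block_sum(const int *in, int *out) {
    extern __shared__ int buf[];   // size comes from the third launch parameter
    unsigned t = threadIdx.x;
    buf[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();
    if (t == 0) {
        int s = 0;
        for (unsigned k = 0; k < blockDim.x; ++k) s += buf[k];
        out[blockIdx.x] = s;
    }
}

// Hypothetical host helper: sums {1..8} in 2 blocks of 4 threads, writes the
// per-block sums into sums, returns 0 on success.
int run_block_sum_demo(int sums[2]) {
    int host[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, sizeof host) != cudaSuccess) return -1;
    if (cudaMalloc(&out, 2 * sizeof(int)) != cudaSuccess) { cudaFree(in); return -1; }
    cudaMemcpy(in, host, sizeof host, cudaMemcpyHostToDevice);

    dim3 grid(2), block(4);
    size_t shared_bytes = block.x * sizeof(int);
    cudaStream_t stream = 0;       // the default stream
    block_sum<<<grid, block, shared_bytes, stream>>>(in, out);

    cudaError_t err = cudaMemcpy(sums, out, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : -1;
}
```

The two blocks produce 1+2+3+4 = 10 and 5+6+7+8 = 26.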
15
Lower Priority
16
Heap Memory Alloc
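The slide body did not survive extraction. As a sketch, device code on compute capability 2.0 and later can call `malloc` and `free` against a device-side heap, whose size the host can adjust with `cudaDeviceSetLimit`. The names `heap_demo` and `run_heap_demo` are made up for illustration.

```cuda
#include <cuda_runtime.h>

__global__ void heap_demo(int *out) {
    // Each thread allocates its own scratch buffer from the device heap.
    int *p = static_cast<int *>(malloc(4 * sizeof(int)));
    if (p == nullptr) { out[threadIdx.x] = -1; return; }
    for (int k = 0; k < 4; ++k) p[k] = threadIdx.x + k;
    out[threadIdx.x] = p[0] + p[1] + p[2] + p[3];
    free(p);
}

// Hypothetical host helper: returns the value computed by thread 3,
// or -1 on failure.
int run_heap_demo() {
    int host[4] = {0};
    int *out = nullptr;
    // Optional: size the device heap before the first kernel that uses it.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 1 << 20);
    if (cudaMalloc(&out, sizeof host) != cudaSuccess) return -1;
    heap_demo<<<1, 4>>>(out);
    cudaError_t err = cudaMemcpy(host, out, sizeof host, cudaMemcpyDeviceToHost);
    cudaFree(out);
    return err == cudaSuccess ? host[3] : -1;
}
```

Thread t stores t, t+1, t+2, t+3 and sums them to 4t + 6, so thread 3 yields 18.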
17
Profiler Counter Function
18
Launch Bounds
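The slide body did not survive extraction. As a sketch, `__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)` is placed between the return type and the kernel name; it guides register allocation and does not change what the kernel computes. The values 128 and 2 and the names below are arbitrary choices for illustration.

```cuda
#include <cuda_runtime.h>

// Promise the compiler at most 128 threads per block, and ask it to keep
// register use low enough for at least 2 resident blocks per multiprocessor.
__global__ void __launch_bounds__(128, 2)
saxpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical host helper: computes 3*1 + 2 for 128 elements and returns
// y[0] (or -1 on failure).
float run_saxpy_demo() {
    const int n = 128;
    float hx[n], hy[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }
    float *x = nullptr, *y = nullptr;
    if (cudaMalloc(&x, sizeof hx) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&y, sizeof hy) != cudaSuccess) { cudaFree(x); return -1.0f; }
    cudaMemcpy(x, hx, sizeof hx, cudaMemcpyHostToDevice);
    cudaMemcpy(y, hy, sizeof hy, cudaMemcpyHostToDevice);
    saxpy<<<1, n>>>(3.0f, x, y, n);   // 128 threads: within the declared bound
    cudaError_t err = cudaMemcpy(hy, y, sizeof hy, cudaMemcpyDeviceToHost);
    cudaFree(x);
    cudaFree(y);
    return err == cudaSuccess ? hy[0] : -1.0f;
}
```

Because the qualifier leaves the computation unchanged, a tool that only models kernel semantics can likely accept it and ignore the hint, which fits the "lower priority" grouping.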
19
#pragma unroll
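The slide body did not survive extraction. As a sketch, `#pragma unroll` placed directly before a loop asks the compiler to unroll it; the result of the loop is unchanged. `dot4` and `run_dot4_demo` are hypothetical names.

```cuda
#include <cuda_runtime.h>

__global__ void dot4(const float *a, const float *b, float *out) {
    float acc = 0.0f;
    #pragma unroll           // trip count is a compile-time constant, so full unrolling is possible
    for (int k = 0; k < 4; ++k)
        acc += a[k] * b[k];
    out[0] = acc;
}

// Hypothetical host helper: dot product of {1,2,3,4} and {1,1,1,1},
// or -1 on failure.
float run_dot4_demo() {
    float ha[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float hb[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    float result = 0.0f;
    float *a = nullptr, *b = nullptr, *out = nullptr;
    if (cudaMalloc(&a, sizeof ha) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&b, sizeof hb) != cudaSuccess) { cudaFree(a); return -1.0f; }
    if (cudaMalloc(&out, sizeof(float)) != cudaSuccess) { cudaFree(a); cudaFree(b); return -1.0f; }
    cudaMemcpy(a, ha, sizeof ha, cudaMemcpyHostToDevice);
    cudaMemcpy(b, hb, sizeof hb, cudaMemcpyHostToDevice);
    dot4<<<1, 1>>>(a, b, out);
    cudaError_t err = cudaMemcpy(&result, out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    cudaFree(out);
    return err == cudaSuccess ? result : -1.0f;
}
```

The helper returns 10 with or without the pragma, which is why the directive is a performance hint rather than a semantic feature.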
20
Runtime API – Streams Functions
cudaError_t cudaStreamCreate (cudaStream_t *pStream) Creates an asynchronous stream.
cudaError_t cudaStreamDestroy (cudaStream_t stream) Destroys and cleans up an asynchronous stream.
cudaError_t cudaStreamQuery (cudaStream_t stream) Queries an asynchronous stream for completion status.
cudaError_t cudaStreamSynchronize (cudaStream_t stream) Waits for stream tasks to complete.
cudaError_t cudaStreamWaitEvent (cudaStream_t stream, cudaEvent_t event, unsigned int flags) Makes a compute stream wait on an event.
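A minimal sketch of the stream lifecycle using the functions listed above: create, enqueue work, synchronize, query, destroy. The kernel `increment` and the helper `run_stream_demo` are illustrative names.

```cuda
#include <cuda_runtime.h>

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical host helper: increments {0,1,2,3} on a non-default stream and
// returns the sum of the results, or -1 on failure.
int run_stream_demo() {
    const int n = 4;
    int host[n] = {0, 1, 2, 3};
    int *dev = nullptr;
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return -1;
    if (cudaMalloc(&dev, sizeof host) != cudaSuccess) { cudaStreamDestroy(stream); return -1; }

    // All three operations are ordered within the same stream.
    cudaMemcpyAsync(dev, host, sizeof host, cudaMemcpyHostToDevice, stream);
    increment<<<1, n, 0, stream>>>(dev, n);
    cudaMemcpyAsync(host, dev, sizeof host, cudaMemcpyDeviceToHost, stream);

    cudaError_t sync_status = cudaStreamSynchronize(stream);  // block until the stream drains
    cudaError_t query_status = cudaStreamQuery(stream);       // cudaSuccess once the stream is idle

    cudaFree(dev);
    cudaStreamDestroy(stream);
    if (sync_status != cudaSuccess || query_status != cudaSuccess) return -1;
    return host[0] + host[1] + host[2] + host[3];
}
```

The results are {1, 2, 3, 4}, so the helper returns 10. The host buffer here is ordinary pageable memory, so the copies may not actually overlap with other work; page-locked memory would be needed for that.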
21
RT API – Events Functions
cudaError_t cudaEventCreate (cudaEvent_t *event) Creates an event object.
cudaError_t cudaEventCreateWithFlags (cudaEvent_t *event, unsigned int flags) Creates an event object with the specified flags.
cudaError_t cudaEventDestroy (cudaEvent_t event) Destroys an event object.
cudaError_t cudaEventElapsedTime (float *ms, cudaEvent_t start, cudaEvent_t end) Computes the elapsed time between events.
cudaError_t cudaEventQuery (cudaEvent_t event) Queries an event’s status.
cudaError_t cudaEventRecord (cudaEvent_t event, cudaStream_t stream=0) Records an event.
cudaError_t cudaEventSynchronize (cudaEvent_t event) Waits for an event to complete.
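Events are most commonly used to time device work. The sketch below brackets a kernel with two events and reads the elapsed time; `scale_buffer` and `time_kernel_ms` are hypothetical names.

```cuda
#include <cuda_runtime.h>

__global__ void scale_buffer(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

// Hypothetical host helper: returns the kernel's elapsed time in
// milliseconds, or -1 on failure.
float time_kernel_ms() {
    const int n = 1 << 16;
    float *dev = nullptr;
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) { cudaEventDestroy(start); return -1.0f; }
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) {
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return -1.0f;
    }
    cudaMemset(dev, 0, n * sizeof(float));

    cudaEventRecord(start);                  // enqueue on the default stream
    scale_buffer<<<(n + 255) / 256, 256>>>(dev, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);              // wait until `stop` has been reached

    float ms = -1.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);

    cudaFree(dev);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return err == cudaSuccess ? ms : -1.0f;
}
```

The measured time depends on the hardware, so only its sign is meaningful as a success check.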
22
RT API – Execution Control
cudaError_t cudaConfigureCall (dim3 gridDim, dim3 blockDim, size_t sharedMem=0, cudaStream_t stream=0) Configures a device launch.
cudaError_t cudaFuncGetAttributes (struct cudaFuncAttributes *attr, const char *func) Finds out attributes for a given function.
cudaError_t cudaFuncSetCacheConfig (const char *func, enum cudaFuncCache cacheConfig) Sets the preferred cache configuration for a device function.
cudaError_t cudaLaunch (const char *entry) Launches a device function.
cudaError_t cudaSetDoubleForDevice (double *d) Converts a double argument to be executed on a device.
cudaError_t cudaSetDoubleForHost (double *d) Converts a double argument after execution on a device.
cudaError_t cudaSetupArgument (const void *arg, size_t size, size_t offset) Configures a device launch.
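A small sketch of the two query/configuration calls from this slide. It relies on the C++ overloads in `cuda_runtime.h` that accept a kernel function pointer directly, whereas the signatures above take `const char *` as in the toolkit of the deck's era. To my knowledge `cudaConfigureCall`, `cudaSetupArgument`, and `cudaLaunch` are the older mechanism behind the `<<<...>>>` syntax and are not available in recent toolkits, so they are not used here. The names `touch` and `query_max_threads` are made up.

```cuda
#include <cuda_runtime.h>

__global__ void touch(int *p) {
    if (p != nullptr) p[0] = 1;
}

// Hypothetical host helper: returns the maximum block size the kernel
// supports on the current device, or -1 on failure.
int query_max_threads() {
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, touch) != cudaSuccess) return -1;
    // A preference only; the runtime may ignore it on some architectures.
    cudaFuncSetCacheConfig(touch, cudaFuncCachePreferShared);
    return attr.maxThreadsPerBlock;
}
```

Other fields of `cudaFuncAttributes`, such as `numRegs` and `sharedSizeBytes`, report the kernel's per-thread register count and static shared memory use.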
23
RT API – Memory Management (first 6 of ~50)
cudaError_t cudaArrayGetInfo (struct cudaChannelFormatDesc *desc, struct cudaExtent *extent, unsigned int *flags, struct cudaArray *array) Gets info about the specified cudaArray.
cudaError_t cudaFree (void *devPtr) Frees memory on the device.
cudaError_t cudaFreeArray (struct cudaArray *array) Frees an array on the device.
cudaError_t cudaFreeHost (void *ptr) Frees page-locked memory.
cudaError_t cudaGetSymbolAddress (void **devPtr, const char *symbol) Finds the address associated with a CUDA symbol.
cudaError_t cudaGetSymbolSize (size_t *size, const char *symbol) Finds the size of the object associated with a CUDA symbol.
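A sketch of the two symbol queries from this list. It passes the `__device__` variable itself, using the C++ overloads in current toolkits, rather than the `const char *` symbol name shown in the signatures above. `table` and `run_symbol_demo` are illustrative names.

```cuda
#include <cuda_runtime.h>

__device__ float table[16];   // statically allocated device array

// Hypothetical host helper: looks up the device address and size of `table`,
// zeroes it through that address, and returns 0 on success.
int run_symbol_demo() {
    void *addr = nullptr;
    size_t bytes = 0;
    if (cudaGetSymbolAddress(&addr, table) != cudaSuccess) return -1;
    if (cudaGetSymbolSize(&bytes, table) != cudaSuccess) return -1;
    if (addr == nullptr || bytes != 16 * sizeof(float)) return -1;
    // The returned address is an ordinary device pointer.
    return cudaMemset(addr, 0, bytes) == cudaSuccess ? 0 : -1;
}
```

Note that `table` was never allocated with `cudaMalloc`, so it must not be passed to `cudaFree`.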
24
RT API – Unified Address Space ● Allows host and device memory to be addressed through a single unified address space
cudaError_t cudaPointerGetAttributes (struct cudaPointerAttributes *attributes, const void *ptr) Returns attributes about a specified pointer.
25
RT API – Direct Peer Memory Access
cudaError_t cudaDeviceCanAccessPeer (int *canAccessPeer, int device, int peerDevice) Queries if a device may directly access a peer device’s memory.
cudaError_t cudaDeviceDisablePeerAccess (int peerDevice) Disables direct access to memory allocations on a peer device and unregisters any registered allocations from that device.
cudaError_t cudaDeviceEnablePeerAccess (int peerDevice, unsigned int flags) Enables direct access to memory allocations on a peer device.
26
RT API – Graphics Interfaces ● OpenGL ● Direct3D ● VDPAU – Video Decode and Presentation API for Unix ● General graphics interop ● Texture ● Surface
27
RT API – Version Info
cudaError_t cudaDriverGetVersion (int *driverVersion) Returns the CUDA driver version.
cudaError_t cudaRuntimeGetVersion (int *runtimeVersion) Returns the CUDA Runtime version.
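Both version queries are simple out-parameter calls. In this sketch the helper name `runtime_major_version` is hypothetical; the decoding relies on the convention that versions are reported as 1000 * major + 10 * minor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical host helper: prints both versions and returns the runtime's
// major version, or -1 on failure.
int runtime_major_version() {
    int driver = 0, runtime = 0;
    if (cudaDriverGetVersion(&driver) != cudaSuccess) return -1;
    if (cudaRuntimeGetVersion(&runtime) != cudaSuccess) return -1;
    // Versions are encoded as 1000 * major + 10 * minor.
    std::printf("driver %d.%d, runtime %d.%d\n",
                driver / 1000, (driver % 1000) / 10,
                runtime / 1000, (runtime % 1000) / 10);
    return runtime / 1000;
}
```

Neither call launches device work, so functions like these fall naturally into the "easy to implement (make stubs)" level described earlier.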
28
RT API – C++ Bindings ● (Sample functions – use templates to bind class)
template<class T, int dim> cudaError_t cudaBindSurfaceToArray (const struct surface<T, dim> &surf, const struct cudaArray *array) [C++ API] Binds an array to a surface.
template<class T, int dim> cudaError_t cudaBindSurfaceToArray (const struct surface<T, dim> &surf, const struct cudaArray *array, const struct cudaChannelFormatDesc &desc) [C++ API] Binds an array to a surface.
29
RT API – Profiler Control
cudaError_t cudaProfilerInitialize (const char *configFile, const char *outputFile, cudaOutputMode_t outputMode) Initializes the profiling.
cudaError_t cudaProfilerStart (void) Starts the profiling.
cudaError_t cudaProfilerStop (void) Stops the profiling.
30
Specific API (RT & Driver) ● Data Structures ● Enumerations ● #defines
31
RT API – Driver API Interactions ● Init/Tear Down ● Contexts ● Streams ● Events ● Arrays ● Graphics
32
Driver API (lower level access) ● Initialization ● Version management ● Device management ● Context management ● Module management ● Memory management ● Unified addressing ● Streams ● Events ● Exec Control ●...