GPU Processing for Distributed Live Video Database Jun Ye Data Systems Group.


1 GPU Processing for Distributed Live Video Database Jun Ye jye@cs.ucf.edu Data Systems Group

2 Outline: Introduction to GPUs; GPU languages (OpenCL or CUDA); OpenCL programming; Case study: Live Video Database Management System (LVDBMS).

3 Introduction: Current GPUs are more than graphics cards for rendering video game images. They are used for general-purpose parallel computing of all kinds (e.g. mining Bitcoin, training deep neural networks). GPGPU: general-purpose GPU. Example cards: nVidia Tesla K20, nVidia GeForce GTX 580.

4 GPU language: Two main options: CUDA and OpenCL. CUDA (2007): Compute Unified Device Architecture, created and owned by nVidia. OpenCL (2009): Open Computing Language, designed by Apple and Khronos; a public standard.

5 CUDA or OpenCL? CUDA: proprietary; works only on nVidia cards; normally delivers higher performance without any tuning. OpenCL: open standard; supported by a lot of hardware (ATI, Intel, Apple, nVidia, Qualcomm, Xilinx, and more); heterogeneous (PC, mobile devices, FPGA, DSP, ...); performance is generally not as good as CUDA's; tuning the performance requires knowledge of the hardware.

6 Tip: One thing is certain: ATI has better support for OpenCL than nVidia, so OpenCL + ATI seems a better option than OpenCL + nVidia.

7 Brief intro to OpenCL programming: Best fit for parallel computing problems (1D, 2D, 3D data) made up of a large number of simple computations, e.g. array addition, matrix multiplication, image processing (such as Gaussian blur). Can improve speed by orders of magnitude (hardware specific). Costs: overhead from resource initialization and from moving data between GPU and CPU memory.

8 OpenCL programming GPU memory model http://de.wikipedia.org/wiki/Datei:OpenCL_Memory_model.svg

9 OpenCL programming: GPU memory model and NDRange configuration: global work size, local work size, thread. http://gpgpu-computing4.blogspot.com/2009/09/matrix-multiplication-2-opencl.html
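The relationship between the global work size, the local work size and a single thread can be sketched in plain C. This is only an illustration for one NDRange dimension with no global offset; the type `WorkItemPos` and the helpers `split_global_id` and `join_global_id` are hypothetical names, not part of the OpenCL API. Inside a real kernel the same values come from `get_global_id`, `get_group_id` and `get_local_id`.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one work-item along a single NDRange dimension.
   Hypothetical helper type, not an OpenCL type. */
typedef struct {
    size_t group_id;  /* which work-group the item belongs to */
    size_t local_id;  /* index of the item inside its work-group */
} WorkItemPos;

/* Split a global ID into (group ID, local ID), assuming no global offset. */
WorkItemPos split_global_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    WorkItemPos pos;
    pos.group_id = global_id / local_size;
    pos.local_id = global_id % local_size;
    return pos;
}

/* Inverse mapping: global ID = group ID * local size + local ID. */
size_t join_global_id(WorkItemPos pos, size_t local_size)
{
    assert(local_size > 0 && pos.local_id < local_size);
    return pos.group_id * local_size + pos.local_id;
}
```

For instance, with a local work size of 16, the work-item with global ID 37 sits in work-group 2 at local index 5.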

10 OpenCL programming: coding. Host code: runs on the CPU (can be C/C++, Python, MATLAB, JavaScript); initializes resources, configures the environment (global and local work-item sizes), and swaps buffers. Kernel code: runs on the device (GPU); written in the kernel language (.cl files); executes the parallel computation.

11 OpenCL programming: an example (C). Matrix multiplication: A and B are both 1024 by 1024 square matrices; compute C = A x B.

12 OpenCL programming: host code. #include Initialize device: clGetPlatformIDs, clGetDeviceIDs, clCreateContext, clCreateCommandQueue. Create program: LoadOpenCLKernel(“*.cl”), clCreateProgramWithSource, clBuildProgram, clCreateKernel.

13 OpenCL programming: host code (OpenCL binding code). Create buffers and set kernel arguments: clCreateBuffer, clSetKernelArg. Set localworksize (must consider the hardware specs). Set globalworksize (the dimension of your problem). Enqueue the kernel: clEnqueueNDRangeKernel. Read the result back from the device: clEnqueueReadBuffer.
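Choosing the two work sizes usually involves one small calculation. In OpenCL 1.x the global work size passed to clEnqueueNDRangeKernel must be a multiple of the local work size in each dimension, so a problem dimension that does not divide evenly is commonly rounded up. A minimal sketch in plain C follows; `round_up_global_size` is a hypothetical helper name, not an OpenCL call.

```c
#include <assert.h>
#include <stddef.h>

/* Round a problem dimension up to the next multiple of the local work size.
   Hypothetical helper: the result would be used as one entry of the
   global work size array given to clEnqueueNDRangeKernel. */
size_t round_up_global_size(size_t problem_size, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = problem_size % local_size;
    if (remainder == 0) {
        return problem_size;              /* already a multiple */
    }
    return problem_size + (local_size - remainder);
}
```

For the 1024 by 1024 example and a local size of 16 nothing changes, since 1024 is already a multiple of 16. When rounding does add extra work-items, the kernel needs a bounds check so that those extra items do not read or write outside the matrices.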

14 OpenCL programming
    /* kernel.cl
     * Matrix multiplication: C = A * B.
     */
    // OpenCL kernel
    __kernel void matrixMul(__global float* C,
                            __global float* A,
                            __global float* B,
                            int wA, int wB)
    {
        int tx = get_global_id(0);  // column of C handled by this thread
        int ty = get_global_id(1);  // row of C handled by this thread

        // value stores the element that is computed by the thread
        float value = 0;
        for (int k = 0; k < wA; ++k) {
            float elementA = A[ty * wA + k];
            float elementB = B[k * wB + tx];
            value += elementA * elementB;
        }

        // Write the result to device memory; each thread writes one element.
        // C has wB columns, so its row stride is wB (equal to wA only for
        // square matrices such as the 1024 by 1024 example).
        C[ty * wB + tx] = value;
    }
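To see what the NDRange launch does with this kernel, it can help to emulate it serially on the CPU. The sketch below is plain C, not OpenCL: `matrix_mul_item` and `matrix_mul_cpu` are hypothetical names, and the two nested loops stand in for a 2D NDRange with one work-item per element of C. It writes C with a row stride of `wB`, the width of C, which matters as soon as the matrices are not square. A function like this can also serve as a naive CPU baseline for checking GPU results.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by one work-item: computes the element of C at row ty, column tx. */
static void matrix_mul_item(float *C, const float *A, const float *B,
                            int wA, int wB, int tx, int ty)
{
    float value = 0.0f;
    for (int k = 0; k < wA; ++k) {
        value += A[ty * wA + k] * B[k * wB + tx];
    }
    C[ty * wB + tx] = value;  /* C is hA x wB, so its row stride is wB */
}

/* Serial stand-in for a 2D NDRange of wB x hA work-items.
   A is hA x wA, B is wA x wB, C is hA x wB, all row-major. */
void matrix_mul_cpu(float *C, const float *A, const float *B,
                    int hA, int wA, int wB)
{
    assert(C != NULL && A != NULL && B != NULL);
    assert(hA > 0 && wA > 0 && wB > 0);
    for (int ty = 0; ty < hA; ++ty) {        /* plays the role of get_global_id(1) */
        for (int tx = 0; tx < wB; ++tx) {    /* plays the role of get_global_id(0) */
            matrix_mul_item(C, A, B, wA, wB, tx, ty);
        }
    }
}
```

On the GPU the two loops disappear: every (tx, ty) pair runs as its own thread, which is where the speedup over this serial version comes from.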

15 Demo: I will show the execution of the program and compare it against a naive CPU solution. Source code available at http://www.es.ele.tue.nl/~mwijtvliet/5KK73/?page=mmopencl

16 Case Study
1. Realistic ray tracing rendering: http://webcl.nokiaresearch.com/
2. Real-time 3D spatial-query in live video database: http://www.eecs.ucf.edu/~jye/demo.html
Jun Ye and Kien A. Hua, "Octree-based 3D Logic and Computation of Spatial Relationships in Live Video Query Processing," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 11 (2), December 2014.
Jun Ye and Kien A. Hua, "Exploiting Depth Camera for 3D Spatial Relationship Interpretation," in Proceedings of ACM Multimedia Systems 2013, Oslo, Norway.

17 Real-time 3D spatial-query in live video database. Background: a live video database management system. Technique: distributed live video computing. Components: distributed 3D cameras (Microsoft Kinect), camera servers, query processing servers.

18 Real-time 3D spatial-query in live video database: 3D spatial operators; GPU-accelerated computing algorithm.

19 Real-time 3D spatial-query in live video database: Spatial-temporal event query, e.g. a person walks out of a room and enters the room next door.
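The example event can be sketched as a small state machine over a tracked 3D position. This is a simplified illustration in plain C, not the LVDBMS query language or its octree-based operators: rooms are modeled as axis-aligned boxes (an assumption made here for brevity), and `Point3`, `Box3`, `box_contains` and `find_leave_then_enter` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y, z; } Point3;
typedef struct { Point3 min, max; } Box3;  /* hypothetical room volume */

/* Spatial test: is point p inside box b (boundaries included)? */
static bool box_contains(const Box3 *b, Point3 p)
{
    return p.x >= b->min.x && p.x <= b->max.x &&
           p.y >= b->min.y && p.y <= b->max.y &&
           p.z >= b->min.z && p.z <= b->max.z;
}

/* Temporal test over a track of n per-frame positions.
   Returns the index of the first frame in which the person is inside
   room_b after having been inside room_a and left it, or -1 if the
   event never occurs. */
long find_leave_then_enter(const Point3 *track, size_t n,
                           const Box3 *room_a, const Box3 *room_b)
{
    assert(track != NULL && room_a != NULL && room_b != NULL);
    enum { WAITING, IN_A, LEFT_A } state = WAITING;

    for (size_t i = 0; i < n; ++i) {
        bool in_a = box_contains(room_a, track[i]);
        bool in_b = box_contains(room_b, track[i]);

        if (state == WAITING && in_a) {
            state = IN_A;                 /* person seen in room A */
        } else if (state == IN_A && !in_a) {
            state = LEFT_A;               /* person walked out of room A */
        }
        if (state == LEFT_A) {
            if (in_b) {
                return (long)i;           /* entered room B: event detected */
            }
            if (in_a) {
                state = IN_A;             /* walked back into room A */
            }
        }
    }
    return -1;
}
```

In a distributed setting the positions would come from the 3D cameras, and the per-frame containment tests are the kind of independent, simple computation that suits GPU acceleration.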

20 Real-time 3D spatial-query in live video database

21 Thank you. Questions?

