Network coding on the GPU Péter Vingelmann Supervisor: Frank H.P. Fitzek

What is network coding? Traditional routing in packet networks: data is simply forwarded by the intermediate nodes. Network coding breaks with this principle: nodes may recombine several input packets into one or several output packets. Linear network coding: nodes form linear combinations of incoming packets.

What are the benefits of network coding? Throughput: illustrated by the butterfly network. Robustness: each encoded packet is “equally important”. Complexity: less complex protocols (e.g. for content distribution). Security: it is more difficult to “overhear” anything that makes sense.

What is the problem? The computational overhead introduced by network coding operations is not negligible, and there is no dedicated network coding hardware yet. A possible solution: use the Graphics Processing Unit (GPU) to perform the necessary calculations.

Overview of network coding Definition: coding at a node in a packet network. All operations are performed over a Galois field GF(2^s); packets are divided into s-bit symbols. The process of network coding can be divided into two separate parts: encoding and decoding. [Figure: a K-bit packet is divided into K/s symbols of s bits each]

Encoding Encoded packets are linear combinations of the original packets, where addition and multiplication are performed over GF(2^s). We can use random coding coefficients.
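The encoding step above can be sketched in a few lines of C++. This assumes s = 8, i.e. GF(2^8), and the field polynomial 0x11D, neither of which is stated in the slides; the bitwise multiply helper stands in for the lookup tables that a real implementation would use.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Bitwise GF(2^8) multiply (Russian-peasant style) under the assumed
// polynomial 0x11D; a table-driven version would be used in practice.
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;                 // addition over GF(2^8) is XOR
        b >>= 1;
        a = static_cast<uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
    }
    return r;
}

// Produce one encoded packet: a random linear combination of the N original
// packets. The random coefficients used are returned through `coeffs`.
std::vector<uint8_t> encode(const std::vector<std::vector<uint8_t>>& packets,
                            std::vector<uint8_t>& coeffs, std::mt19937& rng) {
    const size_t L = packets[0].size();    // packet length in bytes
    std::uniform_int_distribution<int> dist(0, 255);
    coeffs.clear();
    std::vector<uint8_t> out(L, 0);
    for (const auto& p : packets) {
        uint8_t c = static_cast<uint8_t>(dist(rng));
        coeffs.push_back(c);
        for (size_t i = 0; i < L; ++i)
            out[i] ^= gf_mul(c, p[i]);     // accumulate c * p into the output
    }
    return out;
}
```

The encoded packet travels together with its coefficient vector, which is what the decoder later uses to set up its linear system.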

Decoding Assume a node has received M encoded packets (together with their coefficient vectors). This gives a linear system with M equations and N unknowns. We need M ≥ N to have a chance of solving this system using standard Gaussian elimination. At least N linearly independent encoded packets must be received in order to recover all the original data packets.

CPU implementation A simple C++ console application with some customizable parameters: L (packet length) and N (generation size). Object-oriented implementation: Encoder and Decoder classes. Addition and subtraction over the Galois field are simply XOR operations on the CPU. Galois multiplication and division tables are pre-calculated and stored in arrays, so both operations can be performed by array look-ups. Gauss-Jordan elimination is used for decoding: an “on-the-fly” version of the standard Gaussian elimination. This serves as the reference implementation.
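The pre-calculated table idea can be illustrated with log/antilog tables, one common way to build GF(2^8) multiplication (the slides do not say which table layout was used, and the polynomial 0x11D is an assumption):

```cpp
#include <array>
#include <cstdint>

// GF(2^8) arithmetic via pre-computed exp/log tables, as the slides describe:
// after the one-time build, multiply is just two look-ups and an addition.
struct GF256 {
    std::array<uint8_t, 512> exp{};   // exp table doubled to skip a % 255
    std::array<uint8_t, 256> log{};
    GF256() {
        int x = 1;
        for (int i = 0; i < 255; ++i) {
            exp[i] = static_cast<uint8_t>(x);
            log[x] = static_cast<uint8_t>(i);
            x <<= 1;
            if (x & 0x100) x ^= 0x11D;       // reduce by the field polynomial
        }
        for (int i = 255; i < 512; ++i) exp[i] = exp[i - 255];
    }
    uint8_t add(uint8_t a, uint8_t b) const { return a ^ b; }  // XOR on the CPU
    uint8_t mul(uint8_t a, uint8_t b) const {
        if (a == 0 || b == 0) return 0;
        return exp[log[a] + log[b]];         // a*b = g^(log a + log b)
    }
};
```

Division works the same way with a subtraction of logarithms, which is why the slides can treat both operations as plain array look-ups.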

Graphics card Originally designed for real-time rendering of 3D graphics. The past: the fixed-function pipeline. GPUs have evolved into programmable parallel processors with enormous computing power. The present: the programmable pipeline. Now they can even perform general-purpose computations, with some restrictions. The future: general-purpose computing on the graphics processing unit (GPGPU).

OpenGL & Cg implementation OpenGL is a standard cross-platform API for computer graphics. It cannot be used on its own; a shading language is also necessary to implement custom algorithms. A shader is a short program which is used to program certain stages of the rendering pipeline. I chose NVIDIA’s Cg toolkit as the shading language. The developer is forced to think in the traditional concepts of 3D graphics (e.g. vertices, pixels, triangles, lines and points).

Encoder shader in Cg A regular bitmap image serves as input data. Coefficients and data packets are stored in textures (2D arrays of bytes in graphics memory that can be accessed efficiently). The XOR operation and Galois multiplication are also implemented by texture look-ups: a 256×256 black & white texture is necessary for each. The encoded packets are rendered (computed) line-by-line onto the screen and then saved into a texture.

Decoder shaders in Cg The decoding algorithm is more complex; it must be decomposed into 3 different shaders. These shaders correspond to the 3 consecutive phases of the Gauss-Jordan elimination:
1. Forward substitution: reduce the new packet by the existing rows
2. Pivot search: find the pivot element in the reduced packet
3. Backward substitution: substitute the reduced and normalized packet into the existing rows

NVIDIA’s CUDA toolkit Compute Unified Device Architecture (CUDA) enables parallel computing applications in the C language. Modern GPUs have many processor cores and can launch thousands of threads with zero scheduling overhead. Terminology:
host = CPU
device = GPU
kernel = a function executed on the GPU
A kernel is executed in the Single Program Multiple Data (SPMD) model, meaning that a user-specified number of threads execute the same program.

CUDA implementation A CUDA-capable device is required: an NVIDIA GeForce 8 series card at minimum. This is a more native approach, so we have fewer restrictions. A large number of threads must be launched to achieve the GPU’s peak performance. All data structures are stored in CUDA arrays, which are bound to texture references if necessary. Computations are visualized using an OpenGL GUI.

Encoder kernel in CUDA Encoding is a matrix multiplication in the GF domain, so it can be considered a highly parallel computation problem. We can achieve a very fine granularity by launching a thread for every single byte to be computed. Galois multiplication is implemented by array look-ups, but we have a native XOR operator. The encoder kernel is quite simple.

Decoder kernels in CUDA Gauss-Jordan elimination means that the decoding of each coded packet can only start after the decoding of the previous coded packets has finished, so we have a sequential algorithm. Parallelization is only possible within the decoding of the current coded packet. We need 2 separate kernels for forward and backward substitution. The search for the first non-zero element must be performed on the CPU side, because synchronization is not possible across all GPU threads: the CPU must assist the GPU.

Graphical User Interface

Performance evaluation It is difficult to compare the actual performance of these implementations. Many factors have to be taken into consideration:
- shader/kernel execution times
- memory transfers between host and device memory
- shader/kernel initialization & parameter setup
- CPU-GPU synchronization
Measurement results are not uniform, because we cannot have exclusive control over the GPU: other applications may have a negative impact.

CPU implementation

OpenGL & Cg implementation

CUDA implementation

Achievements It has been shown that the GPU is capable of performing network coding calculations. What’s more, it can outperform the CPU by a significant margin in some cases. We have a submitted and accepted paper at European Wireless ’09, titled “Implementation of Random Linear Network Coding on OpenGL-enabled Graphics Cards”.

Demonstration CPU implementation, OpenGL & Cg implementation, CUDA implementation

Questions??? Thank you for your kind attention!