
GPU-accelerated Evaluation Platform for High Fidelity Network Modeling. 11 December 2007. Alex Donkers, Joost Schutte

Contents: Summary of the paper, Evaluation, Questions

Using commercial graphics cards to speed up the execution of network simulation models. In network simulators, high-fidelity performance evaluation calls for more detailed models; more detailed models carry a higher computation cost, which calls for a speed-up technique. GPU = graphics processing unit. The gap in computational power between GPUs and CPUs keeps widening in the GPU's favor.

Computational power of GPU and CPU (courtesy of Ian Buck, Stanford Univ.)

The GPU is superior because of its stream processing model and its spatial parallelism. Necessities for GPU usage: identifying the data parallelism in network simulations and providing a software abstraction. Goal: design an evaluation platform architecture that efficiently utilises the computational processors of the GPUs and the CPU, the memory, the I/O and the other resources available in commodity desktops.

A commodity desktop can be equipped with multiple GPUs; with NVIDIA SLI technology, several GPUs can be combined in a single system.

Suitability for different types of computation: a CPU delivers high performance on a single thread of execution, while a GPU has many more arithmetic units and achieves extremely high data-parallel and instruction-parallel execution. Evaluating high-fidelity network models involves both task-parallel computation, which maps to multiple CPUs, and data-parallel computation, which maps to GPUs. Features necessary for GPU acceleration: the workload must be highly data-parallel and arithmetic-intensive, as illustrated by the sketch below.
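As a rough illustration of what "highly data-parallel and arithmetic-intensive" means in practice, the sketch below applies the same element-wise arithmetic to a large vector on the CPU and on the GPU. It uses MATLAB's gpuArray from the Parallel Computing Toolbox purely for illustration; the paper itself programs the GPU through OpenGL and GLSL, and the function and vector size here are made-up examples.

% Toy illustration only (assumes the Parallel Computing Toolbox is available);
% the paper drives the GPU through OpenGL/GLSL rather than gpuArray.
x_cpu = rand(1, 1e7);                    % large vector of independent elements
x_gpu = gpuArray(x_cpu);                 % copy the data into GPU on-board memory
f = @(x) sin(x).^2 + cos(x).^2;          % arithmetic-heavy, no cross-element dependencies
y_cpu = f(x_cpu);                        % evaluated on the CPU
y_gpu = gather(f(x_gpu));                % evaluated element-wise on the GPU, result copied back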

The power of GPUs is shown by implementing two cases from a networking environment on both the CPU and the GPU and comparing the speed and accuracy of the simulation results. The two cases: a fluid-flow-based TCP model, which predicts the traffic dynamics at active queue management routers, and an adaptive antenna model, which calculates the beamformer weights in the direction minimizing the mean squared error.

Fluid-flow-based TCP model: TCP flows and active queue management routers are modelled with stochastic differential equations. The stochastic differential equations are transformed into ordinary differential equations (ODEs) for CPU use. The CPU-based implementation uses an ODE solver, ode45, provided in MATLAB. The GPU implementation maps all data structures held by the CPU to on-board memory on the GPU. A sketch of the CPU-side approach follows below.
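To make the CPU-side setup concrete, here is a minimal sketch of a single-bottleneck fluid TCP/AQM model in the style of the well-known Misra/Gong/Towsley formulation, integrated with MATLAB's ode45 as the slide describes. This is not the authors' code: the marking profile, the parameter values and the omission of propagation delays in the loss term are illustrative assumptions.

function fluid_tcp_demo
% Minimal sketch of a fluid-flow TCP/AQM model solved with ode45 (CPU side).
% All parameters are illustrative; delays in the loss term are ignored.
    [t, y] = ode45(@fluid_rhs, [0 30], [1; 0]);    % state y = [W; q], 30 s horizon
    plot(t, y(:,2)); xlabel('time [s]'); ylabel('queue length [packets]');
end

function dydt = fluid_rhs(~, y)
    N = 256; Cap = 1500; R0 = 0.2;                 % flows, link capacity [pkt/s], prop. delay [s]
    pmax = 0.1; qref = 100;                        % toy RED-style marking profile
    W = y(1); q = y(2);
    R = R0 + q / Cap;                              % RTT grows with queuing delay
    p = min(pmax, pmax * q / qref);                % drop/mark probability
    dW = 1/R - (W^2 / (2*R)) * p;                  % additive increase, multiplicative decrease
    dq = N * W / R - Cap;                          % aggregate arrival rate minus service rate
    if q <= 0, dq = max(dq, 0); end                % the queue cannot drain below zero
    dydt = [dW; dq];
end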

Fluid-flow-based TCP model: the time-varying state of the routers requires the ODE solver to be re-run periodically, so the execution speed of the model is strongly affected by the execution speed of the ODE solver. Implementing the ODE solver on the GPU can significantly increase the size of network that can be evaluated.

Adaptive antenna model: recursively updates the beamformer weights in the direction minimizing the mean squared error (MSE). The recursive least squares (RLS) algorithm is used. The data layout and the operations on arrays of complex numbers are implemented on the GPU. A sketch of the RLS update follows below.
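Below is a minimal sketch of the recursive least squares update being ported, written in plain MATLAB for clarity rather than as the paper's GPU implementation; the array size, forgetting factor and the toy desired signal are assumptions.

function rls_beamformer_demo
% Minimal RLS beamformer sketch with complex weights (not the authors' GPU code).
    M = 8;                                          % antenna elements (illustrative)
    lambda = 0.99;                                  % forgetting factor
    w = zeros(M, 1);                                % complex beamformer weights
    P = 1e3 * eye(M);                               % inverse correlation matrix estimate
    for n = 1:1000
        x = (randn(M,1) + 1i*randn(M,1)) / sqrt(2); % array snapshot (placeholder data)
        d = x(1);                                   % desired response (toy choice)
        k = (P * x) / (lambda + x' * P * x);        % RLS gain vector
        e = d - w' * x;                             % a priori error
        w = w + k * conj(e);                        % steer weights toward minimum MSE
        P = (P - k * (x' * P)) / lambda;            % update inverse correlation matrix
    end
end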

Evaluation: strong points, weak points, simulation models, conclusion & future work

Strong points: highly data-parallel, arithmetic-intensive.

Weak points: processes that consist largely of sequential operations; processes that require bit-wise operations (solution: use a DSP platform); real-time simulation.

Evaluation of the simulation models. Hardware platform: Dell Dimension desktop with an Intel (dual-core) 3 GHz Pentium 4 CPU, 1 GB DDR2 memory, and an NVIDIA GeForce 7900 GTX with 512 MB texture memory. The vertex and fragment programs are written with OpenGL and GLSL.

Simulation models: for the fluid-flow-based TCP model, the GPU- and CPU-based simulations are compared on their prediction of the traffic dynamics and on their execution time. The GPU outperforms the CPU only at 256 flows and 256 queues or more, because the GPU-based ODE solver needs a larger number of iterations, which penalizes smaller configurations.

Normalized ODE Solver Evaluation Time

Simulation models, adaptive antenna model: the GPU-based simulation runs faster than the CPU-based one when the antenna array size exceeds 256. The execution time of the GPU-based implementation decreases linearly with respect to the number of sub-carriers thanks to parallel processing.

Simulation Execution Times

Conclusions & future work: GPUs can achieve a speedup of 10x without loss of accuracy, and high-fidelity network simulations can be accelerated by using CPU and GPU units in parallel. Future work: integrate the GPU-implemented modules into an existing simulation-based network evaluation platform.

Questions?