Brad Baker, Wayne Haney, Dr. Charles Choi

Slides:

Advertisements

Similar presentations

DEVELOPMENT OF ONLINE EVENT SELECTION IN CBM DEVELOPMENT OF ONLINE EVENT SELECTION IN CBM I. Kisel (for CBM Collaboration) I. Kisel (for CBM Collaboration)

Advertisements

Philips Research ICS 252 class, February 3, The Trimedia CPU64 VLIW Media Processor Kees Vissers Philips Research Visiting Industrial Fellow

Multi-core and tera- scale computing A short overview of benefits and challenges CSC 2007 Andrzej Nowak, CERN

Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.

+ Accelerating Fully Homomorphic Encryption on GPUs Wei Wang, Yin Hu, Lianmu Chen, Xinming Huang, Berk Sunar ECE Dept., Worcester Polytechnic Institute.

GPGPU Introduction Alan Gray EPCC The University of Edinburgh.

Appendix A — 1 FIGURE A.2.2 Contemporary PCs with Intel and AMD CPUs. See Chapter 6 for an explanation of the components and interconnects in this figure.

HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery.

A many-core GPU architecture.. Price, performance, and evolution.

Acceleration of the Smith– Waterman algorithm using single and multiple graphics processors Author : Ali Khajeh-Saeed, Stephen Poole, J. Blair Perot. Publisher:

GPU Computing with CUDA as a focus Christie Donovan.

Multi Agent Simulation and its optimization over parallel architecture using CUDA™ Abdur Rahman and Bilal Khan NEDUET(Department Of Computer and Information.

Programming with CUDA, WS09 Waqar Saleem, Jens Müller Programming with CUDA and Parallel Algorithms Waqar Saleem Jens Müller.

CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware

Parallelization and CUDA libraries Lei Zhou, Yafeng Yin, Hong Man.

Heterogeneous Computing Dr. Jason D. Bakos. Heterogeneous Computing 2 “Traditional” Parallel/Multi-Processing Large-scale parallel platforms: –Individual.

Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU Presented by: Ahmad Lashgar ECE Department, University of Tehran.

Lecture 2 : Introduction to Multicore Computing Bong-Soo Sohn Associate Professor School of Computer Science and Engineering Chung-Ang University.

CSU0021 Computer Graphics © Chun-Fa Chang CSU0021 Computer Graphics September 10, 2014.

Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications Published in: Cluster.

Predictive Runtime Code Scheduling for Heterogeneous Architectures 1.

Computationally Efficient Histopathological Image Analysis: Use of GPUs for Classification of Stromal Development Olcay Sertel 1,2, Antonio Ruiz 3, Umit.

BY: ALI AJORIAN ISFAHAN UNIVERSITY OF TECHNOLOGY 2012 GPU Architecture 1.

Implementation of Parallel Processing Techniques on Graphical Processing Units Brad Baker, Wayne Haney, Dr. Charles Choi.

By Arun Bhandari Course: HPC Date: 01/28/12. GPU (Graphics Processing Unit) High performance many core processors Only used to accelerate certain parts.

Use/User:LabServerField Engineer Electrical Engineer Software Engineer Mechanical Engineer Requirements: Small form factor.

Helmholtz International Center for CBM – Online Reconstruction and Event Selection Open Charm Event Selection – Driving Force for FEE and DAQ Open charm:

GPU in HPC Scott A. Friedman ATS Research Computing Technologies.

Multi-core.  What is parallel programming ?  Classification of parallel architectures  Dimension of instruction  Dimension of data  Memory models.

Programming Concepts in GPU Computing Dušan Gajić, University of Niš Programming Concepts in GPU Computing Dušan B. Gajić CIITLab, Dept. of Computer Science.

HPEC-1 MIT Lincoln Laboratory Session 1: GPU: Graphics Processing Units Miriam Leeser / Northeastern University HPEC Conference 15 September 2010.

Radar Pulse Compression Using the NVIDIA CUDA SDK

VTU – IISc Workshop Compiler, Architecture and HPC Research in Heterogeneous Multi-Core Era R. Govindarajan CSA & SERC, IISc

GPU Architectural Considerations for Cellular Automata Programming A comparison of performance between a x86 CPU and nVidia Graphics Card Stephen Orchowski,

GPU Architecture and Programming

- Rohan Dhamnaskar. Overview  What is a Supercomputer  Some Concepts  Couple of examples.

GPU-Accelerated Computing and Case-Based Reasoning Yanzhi Ren, Jiadi Yu, Yingying Chen Department of Electrical and Computer Engineering, Stevens Institute.

Hardware Benchmark Results for An Ultra-High Performance Architecture for Embedded Defense Signal and Image Processing Applications September 29, 2004.

Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA

Understanding Parallel Computers Parallel Processing EE 613.

Computer Architecture Lecture 24 Parallel Processing Ralph Grishman November 2015 NYU.

Parallel Computers Today Oak Ridge / Cray Jaguar > 1.75 PFLOPS Two Nvidia 8800 GPUs > 1 TFLOPS Intel 80- core chip > 1 TFLOPS  TFLOPS = floating.

Fermi National Accelerator Laboratory & Thomas Jefferson National Accelerator Facility SciDAC LQCD Software The Department of Energy (DOE) Office of Science.

Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.

Computer Organization CS345 David Monismith Based upon notes by Dr. Bill Siever and from the Patterson and Hennessy Text.

Parallel Programming Models

Computer Engg, IIT(BHU)

NFV Compute Acceleration APIs and Evaluation

Community Grids Laboratory

Low-Cost High-Performance Computing Via Consumer GPUs

Our Graphics Environment

Thilina Gunarathne, Bimalee Salpitkorala, Arun Chauhan, Geoffrey Fox

Enabling machine learning in embedded systems

Multicore, Multithreaded, Multi-GPU Kernel VSIPL Standardization, Implementation, & Programming Impacts Anthony Skjellum, Ph.D.

Multi-Layer Perceptron On A GPU

Parallel Computers Today

“The Brain”… I will rule the world!

The Yin and Yang of Processing Data Warehousing Queries on GPUs

MASS CUDA Performance Analysis and Improvement

Compiler Back End Panel

Compiler Back End Panel

Coe818 Advanced Computer Architecture

Efficient software checkpointing framework for speculative techniques

The Free Lunch Ended 7 Years Ago

1.1 The Characteristics of Contemporary Processors, Input, Output and Storage Devices Types of Processors.

Multicore and GPU Programming

6- General Purpose GPU Programming

CSE 502: Computer Architecture

Multicore and GPU Programming

Presentation transcript:

Brad Baker, Wayne Haney, Dr. Charles Choi Implementation of Parallel Processing Techniques on Graphical Processing Units Poster C.4 Who we are Where we are located Our specialties? Brad Baker, Wayne Haney, Dr. Charles Choi

Industry Direction High performance COTS computing is moving to multi-core and heterogeneous silicon Multi-core CPU with smaller individual cores GPU co-processor Multi-core CPU with 1-3 GPU co-processors Heterogeneous multi-core (IBM Cell) Smaller, heterogeneous cores on same silicon

Math Benchmark and Signal Processing String Results Fundamental Math Benchmarks Software Platform (GPU) 1k SGEMM Gflops 1k 1d Complex FFT Peakstream (AMD r520) 80.13 8.7 CUDA (Nvidia g80) 95 43.4 RapidMind (Nvidia g80) 24 7.5 RapidMind (AMD r520) 26 4.9 Intel Core 2 Quad QX6700 12 14.2 AMD Opteron 265HE 8.8 4.8 Nvidia Cuda Platform Utilizes most powerful GPU on market Most extensive pre-built math library Application Results Architecture Execution Time VSIPL++ PNB Algorithm on Intel Core 2 Quad QX6700 CPU 735.78 msec CUDA Batch Style PNB Algorithm on Nvidia g80 GPU 367.23 msec Describe GPU software platforms used Why we picked the rudimentary benchmarks – SGEMM (mmult), FFT (common in our processing) Size was limited to 1k – was the easiest common denominator we could do at the time. Signal processing application – might not have been the best application; ported over FFT to the GPU as an initial test – 2x as fast just batch processing the FFTs – mention the batch size – and fft size. Approx. 50% Performance Gain Using CUDA’s FFT

Brad Baker, Wayne Haney, Dr. Charles Choi Implementation of Parallel Processing Techniques on Graphical Processing Units Poster C.4 Who we are Where we are located Our specialties? Brad Baker, Wayne Haney, Dr. Charles Choi