Use of GPUs in ALICE (and elsewhere)
Thorsten Kollegger
ECFA HL-LHC TDOC-PG | CERN
GPUs for General Purpose Computing

In the last 5+ years, increased usage of GPUs (or, more generally, accelerator cards) in High Performance Computing systems.

(Figure: accelerator share in the Top 500 list: NVIDIA, AMD, Intel)
GPUs for General Purpose Computing

Driven by (theoretical) peak performance:
- GPU: O(1) TFLOP/s (NVIDIA Tesla K20: 3.2 TFLOP/s)
- CPU: O(0.1) TFLOP/s (Intel Xeon E…: 243 GFLOP/s)

Can this theoretical peak performance be used efficiently for the typical HEP workload?
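A rough sanity check on such peak numbers (illustrative reasoning, not from the slide): peak single-precision FLOP/s is approximately number of cores × clock × 2 (one fused multiply-add per cycle per core), so a GPU with a few thousand simple cores at well under 1 GHz reaches a few TFLOP/s, while a CPU with of order ten complex cores, even with wide SIMD units, stays at a few hundred GFLOP/s.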
GPGPU Processing Model

Pre-conditions for effective GPU speed-up of applications:
- Computationally intensive: time needed for computing much larger than time needed for data transfer to the GPU
- Massively parallel: hundreds of independent computing tasks
- Few complex CPU cores vs. many simple GPU cores

Programming languages/frameworks: CUDA, OpenCL, OpenACC, OpenMP, OpenHMPP, TBB, MPI
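A minimal CUDA sketch of this processing model (illustrative only, not from the talk; kernel name and sizes are invented): one thread per data element does a trivially independent piece of work, and the explicit host-device copies are the transfer cost that the computation has to amortise.

  // One thread per element: the "many independent tasks" model.
  #include <cuda_runtime.h>
  #include <cstdio>

  __global__ void scale(float *x, float a, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
      if (i < n) x[i] *= a;                            // independent per-element work
  }

  int main() {
      const int n = 1 << 20;
      float *h = new float[n];
      for (int i = 0; i < n; ++i) h[i] = 1.0f;

      float *d;
      cudaMalloc(&d, n * sizeof(float));
      cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);   // transfer cost

      scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);                   // massively parallel step

      cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
      printf("h[0] = %f\n", h[0]);
      cudaFree(d);
      delete[] h;
      return 0;
  }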
What to expect?

Typical success stories of GPGPU usage report >100x speedup.
However: the expected speedup depends strongly on the workload. Comparing optimized multi-core CPU versions with optimized GPU versions, speedups of ~5 are measured for most workloads.
GPGPUs in HEP

Lots of R&D activities ongoing in the experiments, mostly focused on Trigger or High-Level-Trigger systems, where HW decisions are easier than in heterogeneous GRID systems.

R&D projects I know of (for sure incomplete):
- ALICE, ATLAS, CMS (LHC, CERN)
- SPS, CERN
- CBM (FAIR, GSI, Germany)
- RHIC (BNL, USA)
- GEANT 4
- …

ALICE HLT has been using GPUs in production since 2010/2011.
ALICE HLT

- Input data rate: ~1 kHz, 20 GByte/s
- Event size ranging from <1 MByte (p+p) to 80 MByte (central Pb+Pb)
- Full online reconstruction including tracking of TPC+ITS
- (Intermediate) results replace raw data to limit storage space
- Compute nodes (CN/CNGPU): full event reconstruction nodes with NVIDIA GTX 480/580 (GTX580 newly installed in 2011)
ALICE HLT TPC Tracker

- TPC tracking algorithm based on a Cellular Automaton approach
- Optimized for multi-core CPUs to fulfill latency requirements
- 2009: ported to CUDA for use on NVIDIA GTX285 consumer cards, changed to use single precision
- 2010: ported to GTX480; GTX580 added, fully commissioned
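To make the Cellular Automaton idea concrete, a heavily simplified sketch of its neighbour-finding step (illustrative only; data layout, cuts and names are invented, this is NOT the actual ALICE HLT tracker code): one thread per TPC cluster looks for the closest cluster on the adjacent pad rows and stores links, which are later chained into track seeds. Note the single-precision arithmetic.

  // Simplified CA neighbour finder: one thread per cluster on the current row.
  // Host-side allocation, data upload and kernel launch are omitted for brevity.
  struct Cluster { float y, z; };

  __global__ void findNeighbours(const Cluster *rowPrev, int nPrev,
                                 const Cluster *rowNext, int nNext,
                                 const Cluster *row,     int n,
                                 int *linkUp, int *linkDown, float maxDist2) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i >= n) return;
      Cluster c = row[i];
      int bestUp = -1, bestDown = -1;
      float bestUpD = maxDist2, bestDownD = maxDist2;   // only accept links closer than the cut
      for (int j = 0; j < nNext; ++j) {                 // closest cluster on the next pad row
          float dy = rowNext[j].y - c.y, dz = rowNext[j].z - c.z;
          float d2 = dy * dy + dz * dz;
          if (d2 < bestUpD) { bestUpD = d2; bestUp = j; }
      }
      for (int j = 0; j < nPrev; ++j) {                 // closest cluster on the previous pad row
          float dy = rowPrev[j].y - c.y, dz = rowPrev[j].z - c.z;
          float d2 = dy * dy + dz * dz;
          if (d2 < bestDownD) { bestDownD = d2; bestDown = j; }
      }
      linkUp[i] = bestUp;                               // -1 means no neighbour within the cut
      linkDown[i] = bestDown;
  }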
ALICE HLT TPC Tracker Speedup

(Figure: x-fold speedup of the GPU tracker compared to the optimized CPU version)

Note: frees CPUs on the CNs for other operations (tagging/trigger)
ALICE HLT GPU Experience

Experience quite promising, will continue/expand in Run 2:
- Allowed to reduce system size by a factor 3
- Stable operation even with consumer hardware

Comes with some cost:
- Initial porting to CUDA, change to single precision: 1.5 PhD students / 1 year
- Every new GPU generation requires re-tuning (even for the same chip)
- Need to support two versions (CPU for simulation, GPU); see the sketch below
- Fully loading the GPU requires quite some effort: currently at 67%
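One common way to keep the CPU and GPU versions from drifting apart is to compile the same single-precision routines for both targets; a sketch of that pattern (an assumption for illustration, not necessarily how the ALICE HLT code is organised):

  // Same source compiled for CPU and GPU: core routines are marked
  // __host__ __device__ when built with nvcc, plain functions otherwise.
  #include <math.h>

  #ifdef __CUDACC__
  #define HD __host__ __device__
  #else
  #define HD
  #endif

  HD inline float transverseMomentum(float qPt) {
      // identical single-precision arithmetic on both targets,
      // so only one implementation has to be validated
      return (qPt != 0.0f) ? 1.0f / fabsf(qPt) : 0.0f;
  }

  #ifdef __CUDACC__
  __global__ void ptKernel(const float *qPt, float *pt, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) pt[i] = transverseMomentum(qPt[i]);    // GPU path
  }
  #endif

  void ptCpu(const float *qPt, float *pt, int n) {
      for (int i = 0; i < n; ++i) pt[i] = transverseMomentum(qPt[i]);   // CPU path (e.g. simulation)
  }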
GPUs in the NA62 TDAQ system

(Diagram: baseline chain RO board → L0TP → L1 PC+GPU → L1TP → L2 PC+GPU at 1 MHz / 1 MHz / 100 kHz; GPU option at L0: RO board → L0 GPU → L0TP at 10 MHz / 1 MHz, max 1 ms latency)

The use of the GPU at the software levels (L1/L2) is “straightforward”: put the video card in the PC. No particular changes to the hardware are needed. The main advantage is to exploit the power of GPUs to reduce the number of PCs in the L1 farms.

The use of a GPU at L0 is more challenging:
- Fixed and small latency (dimension of the L0 buffers)
- Deterministic behavior (synchronous trigger)
- Very fast algorithms (high rate)

Slide from Gianluca Lamanna (CERN)
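For context on the L0 constraints, a generic CUDA sketch (not from the NA62 slide; names and the trivial "algorithm" are invented) of the usual ingredients for small, reproducible transfer latency: page-locked host buffers and asynchronous copies on a dedicated stream.

  #include <cuda_runtime.h>

  // Placeholder "trigger algorithm": accept the event if any input byte is non-zero.
  __global__ void l0Decision(const unsigned char *in, int nBytes, int *accept) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < nBytes && in[i] != 0) atomicOr(accept, 1);
  }

  // One event: async copy in, kernel, async copy out, all on one stream.
  // pinnedIn / pinnedAccept must be page-locked (cudaHostAlloc) so the copies
  // are real DMA transfers with small, predictable latency.
  void processEvent(const unsigned char *pinnedIn, int nBytes, int *pinnedAccept,
                    unsigned char *devIn, int *devAccept, cudaStream_t stream) {
      cudaMemcpyAsync(devIn, pinnedIn, nBytes, cudaMemcpyHostToDevice, stream);
      cudaMemsetAsync(devAccept, 0, sizeof(int), stream);
      l0Decision<<<(nBytes + 255) / 256, 256, 0, stream>>>(devIn, nBytes, devAccept);
      cudaMemcpyAsync(pinnedAccept, devAccept, sizeof(int),
                      cudaMemcpyDeviceToHost, stream);
      cudaStreamSynchronize(stream);   // total latency = copy in + kernel + copy out
  }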
Some recent trends

- Direct transfer of data, e.g. from the network to the GPU, without involving the CPU (AMD: DirectGMA, NVIDIA: GPU Direct 2)
- APUs: integrate GPU with CPUs on a chip
  - NVIDIA Tegra: ARM+GPU
  - AMD Fusion: x86+GPU

(Diagram: peer-to-peer transfers (DirectGMA): an SDI input/output card (FPGA) and a graphics card (GPU) exchange data over the PCIe bus, bypassing the CPU and CPU memory)
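On the NVIDIA side, "GPU Direct 2" refers to peer-to-peer transfers between devices; a minimal sketch (assumes a machine with two GPUs; direct network/FPGA-to-GPU transfers as in DirectGMA need vendor-specific driver support and are not shown):

  #include <cuda_runtime.h>
  #include <cstdio>

  int main() {
      const int n = 1 << 20;
      int canAccess = 0;
      cudaDeviceCanAccessPeer(&canAccess, 1, 0);       // can device 1 reach device 0 directly?
      printf("peer access 1 -> 0: %d\n", canAccess);

      float *d0, *d1;
      cudaSetDevice(0); cudaMalloc(&d0, n * sizeof(float));
      cudaSetDevice(1); cudaMalloc(&d1, n * sizeof(float));
      if (canAccess) cudaDeviceEnablePeerAccess(0, 0); // device 1 enables direct access to device 0

      // Copy device 0 -> device 1 over PCIe; CUDA stages through host memory
      // automatically if peer access is not available.
      cudaMemcpyPeer(d1, 1, d0, 0, n * sizeof(float));

      cudaFree(d1);
      cudaSetDevice(0); cudaFree(d0);
      return 0;
  }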
Where we are…

- GPGPUs can provide a significant benefit today, mainly for tightly-controlled systems, e.g. Trigger & HLT: reduced infrastructure cost
- Development cost: the main issue is programming complexity & maintenance
  - Will there be a common programming language/library? Avoid vendor lock-in…
  - Do we need the ultimate performance?
- The highly-parallel programming model will also be relevant for effective use of future many-core CPUs
- GPUs are evolving more and more into independent compute units