+ The INFN COSA project Michele Michelotto

Slides:

Advertisements

Similar presentations

The Development of Mellanox - NVIDIA GPUDirect over InfiniBand A New Model for GPU to GPU Communications Gilad Shainer.

Advertisements

Daniel Schall, Volker Höfner, Prof. Dr. Theo Härder TU Kaiserslautern.

T1.1- Analysis of acceleration opportunities and virtualization requirements in industrial applications Bologna, April 2012 UNIBO.

Monte-Carlo method and Parallel computing  An introduction to GPU programming Mr. Fang-An Kuo, Dr. Matthew R. Smith NCHC Applied Scientific Computing.

GPU Virtualization Support in Cloud System Ching-Chi Lin Institute of Information Science, Academia Sinica Department of Computer Science and Information.

Hepmark project Evaluation of HEP worker nodes Michele Michelotto at pd.infn.it.

HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery.

Supermicro © 2009Confidential HPC Case Study & References.

A many-core GPU architecture.. Price, performance, and evolution.

GPU Computing with CUDA as a focus Christie Donovan.

Computing on Low Power SoC Architecture

Heterogeneous Computing Dr. Jason D. Bakos. Heterogeneous Computing 2 “Traditional” Parallel/Multi-Processing Large-scale parallel platforms: –Individual.

1 AppliedMicro X-Gene ® ARM Processors Optimized Scale-Out Solutions for Supercomputing.

Computing Platform Benchmark By Boonyarit Changaival King Mongkut’s University of Technology Thonburi (KMUTT)

HS06 on the last generation of CPU for HEP server farm Michele Michelotto 1.

© 2005, it - instituto de telecomunicações. Todos os direitos reservados. System Level Resource Discovery and Management for Multi Core Environment Javad.

Prepared by Careene McCallum-Rodney Hardware specification of a computer system.

HPCC Mid-Morning Break Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery Introduction to the new GPU (GFX) cluster.

Microprocessors SUBTITLE Team 3: David Meadows David Foster Sichao Ni Khareem Gordon.

Motivation “Every three minutes a woman is diagnosed with Breast cancer” (American Cancer Society, “Detailed Guide: Breast Cancer,” 2006) Explore the use.

Chapter 2 Computer Clusters Lecture 2.3 GPU Clusters for Massive Paralelism.

High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.

Taking the Complexity out of Cluster Computing Vendor Update HPC User Forum Arend Dittmer Director Product Management HPC April,

Use of GPUs in ALICE (and elsewhere) Thorsten Kollegger TDOC-PG | CERN |

Looking Ahead: A New PSU Research Cloud Architecture Chuck Gilbert - Systems Architect and Systems Team Lead Research CI Coordinating Committee Meeting.

Tracking with CACTuS on Jetson Running a Bayesian multi object tracker on a low power, embedded system School of Information Technology & Mathematical.

ITEP computing center and plans for supercomputing Plans for Tier 1 for FAIR (GSI) in ITEP  8000 cores in 3 years, in this year  Distributed.

The LHCb Italian Tier-2 Domenico Galli, Bologna INFN CSN1 Roma,

Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.

Introduction to Research 2011 Introduction to Research 2011 Ashok Srinivasan Florida State University Images from ORNL, IBM, NVIDIA.

Personal Chris Ward CS147 Fall  Recent offerings from NVIDA show that small companies or even individuals can now afford and own Super Computers.

1)Leverage raw computational power of GPU  Magnitude performance gains possible.

Why it might be interesting to look at ARM Ben Couturier, Vijay Kartik Niko Neufeld, PH-LBC SFT Technical Group Meeting 08/10/2012.

High Performance Computing

Dynamic Scheduling Monte-Carlo Framework for Multi-Accelerator Heterogeneous Clusters Authors: Anson H.T. Tse, David B. Thomas, K.H. Tsoi, Wayne Luk Source:

Computing Issues for the ATLAS SWT2. What is SWT2? SWT2 is the U.S. ATLAS Southwestern Tier 2 Consortium UTA is lead institution, along with University.

HS06 performance per watt and transition to SL6 Michele Michelotto – INFN Padova 1.

HEPMARK2 Consiglio di Sezione 9 Luglio 2012 Michele Michelotto - Padova.

Some GPU activities at the CMS experiment Felice Pantaleo EP-CMG-CO EP-CMG-CO 1.

Locate Potential Support Vectors for Faster

Parallel Computers Today Oak Ridge / Cray Jaguar > 1.75 PFLOPS Two Nvidia 8800 GPUs > 1 TFLOPS Intel 80- core chip > 1 TFLOPS  TFLOPS = floating.

+ The INFN COSA project Daniele Cesini – INFN-CNAF (On behalf of the COSA collaboration)

+ The INFN COSA project Tommaso Boccali (alcune slides riadattate da Daniele Cesini)

Daniele Cesini - INFN CNAF. INFN-CNAF 20 maggio 2014 CNAF 2 CNAF hosts the Italian Tier1 computing centre for the LHC experiments ATLAS, CMS, ALICE and.

Fermi National Accelerator Laboratory & Thomas Jefferson National Accelerator Facility SciDAC LQCD Software The Department of Energy (DOE) Office of Science.

Hardware Architecture

Does HPC really fit GPUs ? Davide Rossetti INFN – Roma Napoli, January 2010 APE group January Incontro di.

Scientific Computing at Fermilab Lothar Bauerdick, Deputy Head Scientific Computing Division 1 of 7 10k slot tape robots.

Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.

APE group Many-core platforms and HEP experiments computing XVII SuperB Workshop and Kick-off Meeting Elba, May 29-June 1,

 A few notes on testing  Preliminary tests  Computed Tomography  Astrophysics  Molecular Dynamics  HS COSA f2f - 18/05/ Lucia Morganti – INFN-CNAF.

Evaluation of HEP worker nodes Michele Michelotto at pd.infn.it

M. Bellato INFN Padova and U. Marconi INFN Bologna

FPGAs for next gen DAQ and Computing systems at CERN

Modern supercomputers, Georgian supercomputer project and usage areas

Low-Cost High-Performance Computing Via Consumer GPUs

Brad Baker, Wayne Haney, Dr. Charles Choi

GPU Computing Jan Just Keijser Nikhef Jamboree, Utrecht

Scientific requirements and dimensioning for the MICADO-SCAO RTC

LinkSCEEM-2: A computational resource for the development of Computational Sciences in the Eastern Mediterranean Mostafa Zoubi SESAME Outreach SESAME,

Low Power processors in HEP

Low-Cost High-Performance Computing Via Consumer GPUs

Stato progetto COSA

Infrastructure for testing accelerators and new

Stepping in the world of

Introduction to Heterogeneous Parallel Computing

Introduction to Single Board Computer

Multicore and GPU Programming

Multicore and GPU Programming

Presentation transcript:

+ The INFN COSA project Michele Michelotto

+ INFN COSA project COSA: Computing On SOC Architecture Duration: 3 years from January 2015 Project Leader: Daniele Cesini (CNAF Departments: 7 INFN CNAF, PI, PD, ROMA1, FE, PR, LNL BUDGET :51.5 kEuro Year1, 42kEuro Year2 Funded by INFN CSN5 Michele Michelotto 2 Incontro sulle attività di calcolo a Padova

+ Michele Michelotto 3 Objectives Acquire know-how Porting and benchmarking of low power/low cost System on Chip Operations of Linux system on SoCs Benchmarking hybrid architectures Unification of INFN HW testing activities Continuation of the COKA project Computing on Knights Architecture Porting on traditional accelerator (GPU/MIC) Continuation of the HEPMARK projects (PADOVA) X86 benchmarking Study of custom low latency interconnection built with ARM+FPGA devices Prepare H2020 proposals on LowPower computing calls Incontro sulle attività di calcolo a Padova

+ Low-Power System on Chip (SoCs) Michele Michelotto 4 Incontro sulle attività di calcolo a Padova

+ Ok, but then....an iPhone cluster? NO, we are not thinking to build an iPhone cluster We want to use these processors in a standard computing center configuration Rack mounted Linux powered Running scientific application mostly in a batch environment..... Use development board... Michele Michelotto 5 Incontro sulle attività di calcolo a Padova

+ Powered by ARM® big.LITTLE™ technology, with a Heterogeneous Multi-Processing (HMP) solution 4 core ARM A cores ARM A7 Exynos 5422 by Samsung ~ 20 GFLOPS peak (32bit) single precision Mali- T628 MP6 GPU ~ 110 GFLOPS peak single precision 2 GB RAM 2xUSB3.0, 2xUSB2.0, 1x1000Gbs eth Ubuntu 14.4 HDMI 1.4 port 64 GB flash storage ODROID-XU3 Michele Michelotto 6 Power consumption max ~ 15 W Costs 150 euro! Incontro sulle attività di calcolo a Padova

+ HS06 for AMD A1100 – ARM A57 8 cores Incontro sulle attività di calcolo a PadovaMichele Michelotto 7 HS06/kW

+ Low Power from intel Incontro sulle attività di calcolo a PadovaMichele Michelotto 8 Tested in COSA during last 6 months E5-2683v3 not low power – used as reference 29 th Aril 2016: Intel announced that will exit the smartphone and tablet mobile SoC business by ending its struggling Atom chip product line Apollo Lake not cancelled Goldmont microarch Integrated graphics 14nm 4 cores

+ Michele Michelotto 9 Applications Experimental Physics Montecarlo and analysis of LHC experiments HEP experiments High Level Trigger and Data Acquisition applications Applications needing portable systems Computer tomography Theoretical Physics Parallel applications usually run in HPC environments Relativistic astrophysic Lattice Quantum ChromoDynamics simulations Lattice Boltzmann fluid dynamics Monte Carlo simulations of Spin-Glass systems Neural Networks DPSNN-STDP code Synthetic Tests HEPSPEC06 HPL Incontro sulle attività di calcolo a Padova

+ Computer tomography 10 Filtered Backprojection Algorithm In collaboration with the X-ray Imaging group of the Dept of Physics – Bologna University ( Real-Time Reconstruction for 3-D CT Applied to Large Objects of Cultural Heritage, R. Brancaccio, M. Bettuzzi, F. Casali, M. P. Morigi, G. Levi, A. Gallo, G. Marchetti, and D. Schneberk, IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 58, NO. 4, AUGUST 2011 Michele MichelottoIncontro sulle attività di calcolo a Padova

+ FBP Algorithm - Productivity Michele Michelotto 11 Number of reconstructed slices for time unit Not surprisingly, the Xeon guarantees a higher speed than the SoC architecture The multi-threaded version of the algorithm is faster than the GPU version for small sizes of the slice when the application performances are broken by data transfer to and from device Incontro sulle attività di calcolo a Padova Xeon is a dual E NVIDIA K20 TK1 is the NVIDIA Jetson TK1

+ FBP Algorithm - Energy efficiency Michele Michelotto 12 Reconstructed slices per energy unit by different runs In one hour, considering the 1024x1024 slice and combining the CUDA and OMP runs on both architectures:  5 TK1: 2340 slices consuming 41W  2xE K slices consuming 350W Master thesis of Elena Corni: Implementazione dell’algoritmo Filtered Back-Projection (FBP) per architetture Low-Power di tipo Systems-On-Chip Incontro sulle attività di calcolo a Padova Xeon is a dual E NVIDIA K20 TK1 is the NVIDIA Jetson TK1

+ LHCb Montecarlo software test Michele Michelotto 13 Incontro sulle attività di calcolo a Padova Gauss: event simulation + detector description based on Geant Porting was not difficult Just recompilation All the platform can provide enough RAM per core for the LHCb sw 168 8, ,2 2,1

+ LHCb Analysis software test Michele Michelotto 14 Incontro sulle attività di calcolo a Padova All available cores loaded Brunel: offline reconstruction algorithms 0,027 0,04 0,002 0,31

+ LHCb Event Building on the XEOND Sw designed to simulate the event building on InfiniBand based network It relies on the verbs library to perform RDMA operations In the second part of the test a pure computation process is started on four cores in order to simulate a software trigger The performances of the Event Builder are comparable, but the D-1540 requires a third of the power consumption of the E v3 Incontro sulle attività di calcolo a PadovaMichele Michelotto 15 Matteo Manzali – INFN-CNAF & UniFe

+ Conclusion COSA is testing two types of SoCs Low-Power SoCs from the mobile/embedded world still have many limitations for a production environment Low Power SoCs from the server world very expensive and in some cases not really low power 10Gbs/Infiniband networking easier to obtain SoCs are becoming attractive for real life scientific applications In particular if you manage to extract power from the integrated GPU CPU porting was easier than expected Low-power/low cost dominated by ARM until last year, now INTEL is becoming competitive in this segment No porting required for the CPU Advanced HW integration needed to maintain a reasonable size of the system Michele Michelotto 16 Incontro sulle attività di calcolo a Padova