Low Power processors in HEP

Slides:



Advertisements
Similar presentations
Hepmark project Evaluation of HEP worker nodes Michele Michelotto at pd.infn.it.
Advertisements

HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery.
Linux Clustering A way to supercomputing. What is Cluster? A group of individual computers bundled together using hardware and software in order to make.
Presented by: Yash Gurung, ICFAI UNIVERSITY.Sikkim BUILDING of 3 R'sCLUSTER PARALLEL COMPUTER.
Shimin Chen Big Data Reading Group.  Energy efficiency of: ◦ Single-machine instance of DBMS ◦ Standard server-grade hardware components ◦ A wide spectrum.
Acceleration of the Smith– Waterman algorithm using single and multiple graphics processors Author : Ali Khajeh-Saeed, Stephen Poole, J. Blair Perot. Publisher:
Computing on Low Power SoC Architecture
Cosc 2150 Current CPUs Intel and AMD processors. Notes The information is current as of Dec 5, 2014, unless otherwise noted. The information for this.
HS06 on the last generation of CPU for HEP server farm Michele Michelotto 1.
Moving out of SI2K How INFN is moving out of SI2K as a benchmark for Worker Nodes performance evaluation Michele Michelotto at pd.infn.it.
Gordon: Using Flash Memory to Build Fast, Power-efficient Clusters for Data-intensive Applications A. Caulfield, L. Grupp, S. Swanson, UCSD, ASPLOS’09.
HPCC Mid-Morning Break Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery Introduction to the new GPU (GFX) cluster.
ARM Cortex-A9 performance in HPC applications Kurt Keville, Clark Della Silva, Merritt Boyd ARM gaining market share in embedded systems and SoCs Current.
Microprocessors SUBTITLE Team 3: David Meadows David Foster Sichao Ni Khareem Gordon.
Comp-TIA Standards.  AMD- (Advanced Micro Devices) An American multinational semiconductor company that develops computer processors and related technologies.
COLLABORATIVE EXECUTION ENVIRONMENT FOR HETEROGENEOUS PARALLEL SYSTEMS Aleksandar Ili´c, Leonel Sousa 2010 IEEE International Symposium on Parallel & Distributed.
Predictive Runtime Code Scheduling for Heterogeneous Architectures 1.
Different CPUs CLICK THE SPINNING COMPUTER TO MOVE ON.
Extracted directly from:
CERN - IT Department CH-1211 Genève 23 Switzerland t Tier0 database extensions and multi-core/64 bit studies Maria Girone, CERN IT-PSS LCG.
3. April 2006Bernd Panzer-Steindel, CERN/IT1 HEPIX 2006 CPU technology session some ‘random walk’
Hardware Trends. Contents Memory Hard Disks Processors Network Accessories Future.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
Network Setup Assignment Chris Moore, Kwan Tonpoobaln, Jon Light.
PDSF at NERSC Site Report HEPiX April 2010 Jay Srinivasan (w/contributions from I. Sakrejda, C. Whitney, and B. Draney) (Presented by Sandy.
Data Logging Solution for Digital Signal Processors Brian Newberry Nekton Research, Inc. James M. Conrad University of North.
HS06 on new CPU, KVM virtual machines and commercial cloud Michele Michelotto 1.
Understanding Performance, Power and Energy Behavior in Asymmetric Processors Nagesh B Lakshminarayana Hyesoon Kim School of Computer Science Georgia Institute.
1)Leverage raw computational power of GPU  Magnitude performance gains possible.
Comp. main. 1 project Robert Jetter IS I think I forgot something…..
Why it might be interesting to look at ARM Ben Couturier, Vijay Kartik Niko Neufeld, PH-LBC SFT Technical Group Meeting 08/10/2012.
PERFORMANCE STUDY OF BIG DATA ON SMALL NODES. Ομάδα: Παναγιώτης Μιχαηλίδης Αντρέας Σόλου Instructor: Demetris Zeinalipour.
HS06 on last generation of HEP worker nodes Berkeley, Hepix Fall ‘09 INFN - Padova michele.michelotto at pd.infn.it.
The 2001 Tier-1 prototype for LHCb-Italy Vincenzo Vagnoni Genève, November 2000.
HS06 performance per watt and transition to SL6 Michele Michelotto – INFN Padova 1.
HEPMARK2 Consiglio di Sezione 9 Luglio 2012 Michele Michelotto - Padova.
From Westmere to Magny-cours: Hep-Spec06 Cornell U. - Hepix Fall‘10 INFN - Padova michele.michelotto at pd.infn.it.
Introduction to Microprocessors
New CPU, new arch, KVM and commercial cloud Michele Michelotto 1.
The last generation of CPU processor for server farm. New challenges Michele Michelotto 1.
Farhin Al Masud What is Raspberry PI? o Low cost, credit card sized computer o SOC (System on a chip) o Founded by Raspberry PI foundation.
PASTA 2010 CPU, Disk in 2010 and beyond m. michelotto.
Hardware Architecture
Moving out of SI2K How INFN is moving out of SI2K as a benchmark for Worker Nodes performance evaluation Michele Michelotto at pd.infn.it.
Sobolev(+Node 6, 7) Showcase +K20m GPU Accelerator.
SI2K and beyond Michele Michelotto – INFN Padova CCR – Frascati 2007, May 30th.
 A few notes on testing  Preliminary tests  Computed Tomography  Astrophysics  Molecular Dynamics  HS COSA f2f - 18/05/ Lucia Morganti – INFN-CNAF.
+ The INFN COSA project Michele Michelotto
Benchmarking of CPU models for HEP application
Intel and AMD processors
Brief introduction about “Grid at LNS”
Effects of Limiting Numerical Precision on Neural Networks
Evaluation of HEP worker nodes Michele Michelotto at pd.infn.it
NFV Compute Acceleration APIs and Evaluation
CCR Autunno 2008 Gruppo Server
Gruppo Server CCR michele.michelotto at pd.infn.it
How to benchmark an HEP worker node
Heterogeneous Computation Team HybriLIT
Gruppo Server CCR michele.michelotto at pd.infn.it
How INFN is moving out of SI2K has a benchmark for Worker Nodes
Map-Scan Node Accelerator for Big-Data
Unit 2 Computer Systems HND in Computing and Systems Development
Multicore / Multiprocessor Architectures
Modern network servers
Odroid XU4.
Multicore and GPU Programming
EE 155 / Comp 122 Parallel Computing
Multicore and GPU Programming
Course Code 114 Introduction to Computer Science
Presentation transcript:

Low Power processors in HEP Michele Michelotto – INFN Padova

Intel Xeon v5 compare (V1, V2, V3) 32 bit

Intel Xeon v5 compare (V1, V2, V3) clock normalized (HS06/GHz)

How many HS06 I get from a new job when N-1 core are loaded?

HS06 vs HS06 compiled at 64 bit

CPU requested LHC on all Tier1

LHC data forecast years Luminosity Integr. Lum. Pileup run2 2015-2018 1.5 1034 cm-2s-1 150 fb-1 40 Long shutdown 2 2019-2020 run3 2021-2023 2.5 1034 cm-2s-1 300 fb-1 60 long shutdown 2 2024-2025 run4 2026-2029 5.0 1034 cm-2s-1 1000 fb-1 140-200

Historical Trend (data from M.Alef) XEON E5 26x0 AMD 6168 XEON 54xx XEON 51xx AMD 2xx

Courtesy of Peter Elmer, Princeton Univ.

Shipment ARM: Smarphone, Tablet, Cars, Home Media X86: Personal computer, notebook Billions !!!

HEP!

ARM introduction

The ARM Cortex A Series MODEL Architecture Process out of order Core   MODEL Architecture Process out of order Core Cache L1 Cache L2 Dmips/MHz nm min-max A5 Arm v7 32 bit n.d. no 1-4 4-64 1.57 A7 40-28 1-8 8-64 up to 1 MB 1.9 A8 45-65 1 32+32 256-512 KB 2 A9 28-65 yes 2.5 A12 up to 8 MB 3 A15 28-32 2-8 3.5 - 4.0 A17 28 4 A53 Arm v8 64 bit 8-64 + 8-64 128 KB - 2MB A57 28-20 48+32 512 KB - 2MB

Odroid XU Exynos5 Octa Cortex A15 1.6 Quadcore and A7 Quadcore CPUs 2GByte LPDDR3 RAM $169 Fedora 18, armV7, gcc4.8, ODROID kernel

The big LITTLE architecture

Jetson Tegra K1 Nvidia Tegra K1 SOC 2GB x 16 Kepler GPU with 192 CUDA cores (unused in my tests) 4+1 quad core A15 cpu 2200 MHz 2GB x 16 Linux Ubuntu/Linaro 4.8.2-19

Nvidia Tegra K1 Building Blocks

Intel Atom Server Not a SOC but a low power architecture X86-64

Server based on Intel server Atom server in 2U rack mount chassis Each card has 8core Avoton computer node and up to 32GB 92 host is this prototype

ARM server for HTC

HS06 on Exynos5, TegraK1 and Atom C2750

HS06 on Exynos5, TegraK1 and Atom C2750 – Per core loaded

HS06 on Exynos5, TegraK1 and Atom C2750 – Per core loaded and per clock

Xeon D Broadwell SoC

Xeon D vs Avoton

Xeon D vs Avoton per core

Performance per core

Measurements of power consumption Measurements of voltage, amperage and power consumption The power logger Measurements setup

Power logger sw compile First run Second run Third run Idle

Power consumption (Watt) on Intel Xeon E5 2660

Efficiency HS06/Watt

Power consumed on Intel ATOM C2750 vs XEON E5

Power Efficiency HS06/Watt Intel ATOM C2750 vs XEON E5

HS06/Watt on ARM vs Intel I measured also the Tegra and Odroid dev kit Efficiency less than Intel Atom but with cheap PSU, 20W consumed while 5W rumored Waiting for next generation of ARM at 64bit

AMD A1100 – A57 ARM

HS06 – XeonD, Avoton, Amd A1100

Per core loaded

HS06/Watt on A57 and Xeon D Xeon D perform as expected Amd A1100 – A57 has good performances but disappointing power consumption. Not yet production grade configuration. Power supply issues?

Historical Trend with SoC 44 x XEON E5 26x0 AMD 6168 ARM XEON 54xx XEON 51xx AMD 2xx

HS06/Watt x86-64 vs LowPower Atom C2750

Power consumption AMD A1100

Conclusion SoC processor can give us at least a 2x increase in HS06/Watt in HS06 and probably in HEP code The number of core/processor is very small. So special high density (proprietary) chassis are needed ARM64 look interesting but more evaluation is necessary Intel is leading in term of HS06/Watt