HS06 performance per watt and transition to SL6
Michele Michelotto – INFN Padova


The SL6 transition
- Rumors of sizeable differences in HS06 across Scientific Linux versions on the same hardware
- I made detailed measurements on an AMD Opteron 6272:
  - SL5 with gcc 4.1
  - SL6 with gcc 4.4
  - SL6 with gcc 4.7, the latest compiler available at that time
- End of August: we started collecting SL6 results from WLCG sites
  - Alessandra Forti asked sites to send results to me and Manfred
  - Manfred created a new page on the HEPiX site
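The comparison boils down to the relative HS06 change between two OS/compiler setups on the same hardware. A minimal sketch, with illustrative numbers rather than the values measured in the talk:

```python
def hs06_rel_change(hs06_ref: float, hs06_new: float) -> float:
    """Relative HS06 change (percent) of a new OS/compiler setup
    versus a reference run on the same hardware."""
    return 100.0 * (hs06_new - hs06_ref) / hs06_ref

# Illustrative values only: a reference SL5/gcc4.1 score of 200
# against a hypothetical SL6/gcc4.4 score of 212.
print(hs06_rel_change(200.0, 212.0))  # -> 6.0 (percent)
```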

SL6 performance vs. SL5

SL6 + gcc 4.7 vs. SL6 vs. SL5

SL6 + gcc 4.7 and SL6 + gcc 4.4: difference with respect to SL5 + gcc 4.1

Let’s do it on Intel Xeon

Differences of SL6 + gcc 4.7 and SL6 + gcc 4.4 with respect to SL5 + gcc 4.1

HEP-SPEC06 results site, maintained mainly by Manfred

SL6 vs. SL5 on pairs of similar worker nodes

Ivy Bridge vs. Sandy Bridge

Ivy Bridge vs. Sandy Bridge, 64 bit

Performance per clock

Mail from Manfred on Friday 25th
- DELL C6620 (2U, 4 nodes)
- Each node:
  - 2 x Intel Xeon E5-2670 v2, 10 cores (20 logical), 2.5 GHz
  - 64 GB (8 x 8 GB PC)
  - 6 x 900 GB SAS
  - 342 HS06 (20 copies), HS06 (40 copies)

Adding Manfred’s new beast

A new architecture: ARM

A new architecture
- Exynos 4412 Prime CPU
- 1.7 GHz Cortex-A9 quad core
- 2 GB LP-DDR2 memory (512 MB/core)
- $89 each
- Fedora 18, armv7, gcc 4.8, ODROID kernel

Courtesy of Peter Elmer, Princeton Univ.

HS06 measured on ARM

Measurements of power consumption
- Measurements of voltage, amperage and power consumption
- The power logger
- Measurement setup
  - Single core
  - Multicore
- 32-bit measurements
- 64-bit measurements
- Collecting results from Manfred
- Measurements on ARM
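The logger output reduces to the min/average/max figures reported later in the talk. A minimal sketch, assuming the Watt samples have already been read from the logger into a plain list:

```python
import statistics

def summarize_power(samples_w):
    """Min, average and max power (Watt) over a list of logger
    samples, matching the min/avg/max curves shown for the run."""
    return min(samples_w), statistics.mean(samples_w), max(samples_w)

# Hypothetical samples taken during a benchmark run:
samples = [310.0, 355.0, 362.0, 360.0, 348.0]
lo, avg, hi = summarize_power(samples)
print(lo, avg, hi)  # -> 310.0 347.0 362.0
```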

Fluke 1735 Three-Phase Power Logger

Measurement setup for single phase

On display

Power logger software (trace phases: idle, compile, first run, second run, third run)

Power trace: black = average, green = min, red = max

Power consumption (Watt) on Intel Xeon E

Min / Average / Max

Efficiency: HS06/Watt
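The efficiency metric is simply the node's HS06 score divided by its average power draw under full load. Using the C6620 numbers quoted from Manfred's mail (342 HS06 at 20 copies, roughly 362 W per node):

```python
def hs06_per_watt(hs06: float, avg_power_w: float) -> float:
    """Efficiency metric used in the talk: total HS06 of a node
    divided by its average power draw under full load."""
    return hs06 / avg_power_w

# C6620 node from Manfred's mail: 342 HS06 (20 copies), ~362 W/node.
print(round(hs06_per_watt(342.0, 362.0), 2))  # -> 0.94
```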

Historical trend from Manfred (chart covering Xeon E5-26x0, Xeon 54xx, Xeon 51xx, AMD 2xx, AMD 6168)

HS06/Watt with ARM processor

Mail from Manfred on Friday 25th
- DELL C6620 (2U, 4 nodes)
- Each node:
  - 2 x Intel Xeon E5-2670 v2, 10 cores (20 logical), 2.5 GHz
  - 64 GB (8 x 8 GB PC)
  - 6 x 900 GB SAS
  - 342 HS06 (20 copies), HS06 (40 copies)
- 1450 Watt on four nodes
- 362 Watt/node
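The per-node figure follows directly from the chassis measurement; a quick arithmetic check:

```python
# Sanity check of the numbers in Manfred's mail: 1450 W measured on
# the 4-node DELL C6620 chassis gives roughly 362 W per node.
chassis_power_w = 1450.0
nodes = 4
per_node_w = chassis_power_w / nodes
print(per_node_w)  # -> 362.5, quoted as 362 Watt/node
```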

HS06/Watt, adding the Xeon E5-2670 v2 (Manfred)

Future work
- The new Xeon E5 v2 shows very good performance
- Detailed HS06/Watt measurements on the Xeon E5 v2
- New Intel server processors: Avoton
- New ARM processors: 64-bit parts will be available