Passive benchmarking of ATLAS Tier-0 CPUs


Passive benchmarking of ATLAS Tier-0 CPUs
Andrea Sciabà
HEPiX benchmarking WG meeting, 10 March 2017

Introduction

Goal
- Use real jobs to measure the relative speeds of different CPU models

Method
- Measure the average CPU time/event on the desired sample of jobs
- Assume that it is inversely proportional to the CPU speed
- Use the fact that "tasks" contain very similar jobs, so all jobs in the same task can be considered different runs of the same "benchmark"

Detailed description of the analysis
- Refer to the recent pre-GDB talk
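The per-task comparison described above can be sketched as follows. This is not the author's actual notebook code; the record layout (task id, CPU model, CPU time/event) and the reference model are illustrative assumptions.

```python
from collections import defaultdict

def speed_factors(jobs, reference="Xeon E5-2650"):
    """Estimate relative CPU speed factors k from job records.

    jobs: iterable of (task_id, cpu_model, cpu_time_per_event) tuples.
    Within a task all jobs run the same workload, so the ratio of mean
    CPU time/event between two models estimates their relative speed
    (speed is assumed inversely proportional to CPU time/event).
    """
    # Mean CPU time/event per (task, model)
    sums = defaultdict(lambda: [0.0, 0])
    for task, model, t in jobs:
        acc = sums[(task, model)]
        acc[0] += t
        acc[1] += 1
    means = {key: s / n for key, (s, n) in sums.items()}

    # Per-task speed factor relative to the reference model,
    # averaged over all tasks where both models appear
    ratios = defaultdict(list)
    for (task, model), m in means.items():
        ref = means.get((task, reference))
        if ref is not None and model != reference:
            ratios[model].append(ref / m)  # faster CPU -> k > 1
    k = {model: sum(r) / len(r) for model, r in ratios.items()}
    k[reference] = 1.0
    return k
```

Averaging the per-task ratios rather than pooling all jobs keeps tasks with very different event complexity from biasing the comparison.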

Application

The analysis was applied to jobs run at the ATLAS Tier-0:
- October 2015 - March 2016: SMT on
- March 2016 - today: SMT off
- The change was made for efficiency reasons

Selection cuts:
- Only successful jobs are used
- Number of events per job > 1000 (to avoid a bias from the job initialization/finalization time)
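The selection cuts above amount to a simple filter. The field names (`status`, `n_events`) are hypothetical, not the actual job-record schema:

```python
def select_jobs(jobs, min_events=1000):
    """Apply the selection cuts: keep only successful jobs with more
    than min_events processed events, to avoid a bias from the job
    initialization/finalization time on short jobs."""
    return [j for j in jobs
            if j["status"] == "success" and j["n_events"] > min_events]
```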

CPU models

Only six CPU models were used by the ATLAS Tier-0:

Model name                 Architecture   Physical cores   SMT on (1st period)   SMT off (2nd period)
Opteron 6276               Interlagos     8                yes                   no
Xeon E5-2630 v3 2.40 GHz   Haswell        8
Xeon L5520 2.27 GHz        Nehalem        4
Xeon E5-2650 2.00 GHz      Sandy Bridge   8
Xeon E5-2630L              Sandy Bridge   6
Xeon E5-2650 v2 2.60 GHz   Ivy Bridge     8

Question

What I would like to understand from this data is: do Tier-0 reconstruction jobs scale well with HS06?

HS06 data

- I use the CPU performance data collected by the CERN procurement team: https://hwcollect.cern.ch:9000/hwinfo/_design/hwinfo/_view/cpuperformance
- HEPSPEC06 measured on the bare metal
  - Effect of virtualization not taken into account
- CPU found when SMT on / CPU found when SMT off
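The URL above is a CouchDB view, which returns JSON. A minimal sketch of aggregating such a result per CPU model is below; the row schema (`"key"` = model name, `"value"` = HS06 measurement) is an assumption, and the actual view on hwcollect.cern.ch may differ:

```python
from collections import defaultdict
from statistics import mean

def hs06_by_model(view):
    """Average HS06 measurements per CPU model from a CouchDB
    view result of the form {"rows": [{"key": model, "value": hs06}, ...]}.
    """
    by_model = defaultdict(list)
    for row in view["rows"]:
        by_model[row["key"]].append(row["value"])
    return {model: mean(vals) for model, vals in by_model.items()}
```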

HS06 per core (or HW thread)

Model name                 HW threads   Physical cores   SMT on (1st period)   SMT off (2nd period)
Opteron 6276               16           8                7.82 ± 0.13
Xeon E5-2630 v3 2.40 GHz   8/16         8                10.72 ± 0.09          21.42 ± 0.06
Xeon L5520 2.27 GHz        8            4                6.065 ± 0.004
Xeon E5-2650 2.00 GHz      16           8                8.00 ± 0.74
Xeon E5-2630L              6/12         6                8.49 ± 0.08           17.0 ± 0.0
Xeon E5-2650 v2 2.60 GHz   16           8                10.85 ± 0.36

Always 2 CPUs per node.

Speed factor k vs. HS06 correlation

Scaling is generally good, with two exceptions:
- The Opteron is way off: HS06 indicates better performance relative to the Xeons than the reconstruction jobs show
- Haswell tends to perform better relative to Sandy Bridge than expected, if the "expectation" is given by HS06

Sandy Bridge vs. Haswell

Let's compare the values of the proportionality coefficient α between k and HS06/core for Sandy Bridge and Haswell:

k = α × HS06/core

Values of α:

          Sandy Bridge   Haswell   Ratio
SMT ON    0.109          0.118     1.08
SMT OFF   0.0504         0.0559    1.11

In both cases, Haswell shows an improvement in throughput that is ~10% larger than what HS06 would predict.
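Fitting the proportionality coefficient α in k = α × HS06/core is a least-squares fit through the origin, which has the closed form α = Σ(x·k) / Σ(x²). A minimal sketch (the exact fitting procedure used in the analysis is not stated in the slides):

```python
def fit_alpha(hs06_per_core, k):
    """Least-squares fit of k = alpha * (HS06/core) through the origin.

    hs06_per_core: HS06 per core (or HW thread) for each CPU model.
    k: measured speed factor for the same models, in the same order.
    """
    num = sum(x * y for x, y in zip(hs06_per_core, k))
    den = sum(x * x for x in hs06_per_core)
    return num / den
```

With α fitted separately per architecture, the Haswell/Sandy Bridge ratio of the two α values directly gives the ~10% excess quoted above.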

Conclusions

Looking at ATLAS Tier-0 CPU consumption, the scalability of HS06 with performance is reasonable, but:
- It is not good for AMD processors
- It hints at a mismatch from Sandy Bridge to Haswell that goes in the direction of what was seen for DB12, but with a rather smaller effect

References

Analysis notebooks are available at https://github.com/sciaba/ATLAS-CPU-time-analysis (not very well commented, sorry).

Acknowledgements

Many thanks to Jaroslav Guenther, Mario Lassnig and all the ATLAS Tier-0 team, and to Marco Guerri for making the procurement benchmarking data available.