How to benchmark an HEP worker node

Presentation transcript:

How to benchmark an HEP worker node
Evaluation of HEP worker nodes
Michele Michelotto at pd.infn.it

Computing model
[Diagram: CERN Tier 0; Tier-1 centres (CERN, Germany, UK, France, Italy, Japan, USA BNL, USA FNAL); Tier-2 sites (labs and universities) forming a grid for a regional group and a grid for a physics study group; Tier-3 physics departments; desktops]
Workshop CCR 08 LNGS, michele michelotto - INFN Padova

Storage is easier
Tape storage: very easy: event size x number of events → petabytes
Disk storage: easy again: event size → terabytes
Gibibyte vs gigabyte:
  1 TB = 1 terabyte = 1000^4 bytes
  1 TiB = 1 tebibyte = 1024^4 bytes
Raw terabytes or RAID-protected (RAID5, RAID6)?
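The gap between decimal and binary units is easy to underestimate when sizing disk. A minimal Python sketch of the arithmetic on this slide; the function name `tb_to_tib` is illustrative and not from the presentation.

```python
# Decimal vs binary storage units, as defined on the slide.
TB = 1000 ** 4   # bytes in one terabyte (decimal)
TIB = 1024 ** 4  # bytes in one tebibyte (binary)


def tb_to_tib(terabytes: float) -> float:
    """Convert a capacity quoted in TB (decimal) into TiB (binary)."""
    return terabytes * TB / TIB


if __name__ == "__main__":
    print(f"1 TB  = {TB} bytes")
    print(f"1 TiB = {TIB} bytes")
    print(f"1 TB  = {tb_to_tib(1):.4f} TiB")
```

A capacity quoted as 1 TB is therefore about 9% smaller than 1 TiB, before any RAID overhead is taken into account.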

But computing power?
Tricky: events/sec? Which events? Sim or Reco?
We need some kind of benchmark:
  MIPS (“meaningless instructions per second”)
  VUPS (VAX units of performance)
  CERN units
  MHz
  SPEC
  SI2K ...

T1 + T2 cpu budget - LHC

SI2K frozen
SI2K is the benchmark used up to now to measure computing power for all the HEP experiments:
  computing power requested by the experiments in their TDRs
  computing power provided by a Tier-[0,1,2]
SI2K is the nickname for the SPEC CPU Int 2000 benchmark
  It came after SPEC89, SPEC Int 92 and SPEC Int 95
  Declared obsolete by SPEC in 2006
  Replaced by SPEC with CPU Int 2006

Transition problem
Impossible to find published SPEC Int 2000 results for the new processors (e.g. the not-so-new 4-core Clovertown)
Impossible to find published SPEC Int 2006 results for old processors (before 2006), e.g. old P4 Xeon, P4, AMD 2xx
You can’t convert from SI2000 to SI2006, but the ratio for the x86 architecture is in the 137 - 172 range

The SI2K inflation
The main problem with SI2000 in our community: it is no longer proportional to the performance of HEP codes (as it used to be)
You can buy processors with a huge SI2K number but with a smaller increase in real performance

Nominal SI vs real SI
CERN (and FZK) started to use a new currency: SI2K measured with “gcc”, the GNU C compiler, using two flavours of optimization
  FZK: high tuning: gcc -O3 -funroll-loops -march=$ARCH
  CERN: low tuning: gcc -O2 -fPIC -pthread

FZK measurement
In 2001 SPEC with gcc was 80% of the average published data
In 2006 the gap was much wider

Nominal SI vs real SI
FZK uses SI2K with FZK tuning (gcc-high) for tenders, and adds 25% to “normalize” to the year 2001
CERN and FZK proposal to WLCG: use SI2K with CERN tuning (gcc-low) and add 50% to normalize
Run n copies in parallel, where n is the number of cores in the worker node, to take into account the drop in performance of a multicore machine when fully loaded

Too many SI2K
Take as an example a worker node with two Intel Woodcrest dual-core 5160 at 3.00 GHz
  SI2K nominal: 2929 - 3089 (min - max)
  SI2K sum on 4 cores: 11716 - 12536
  SI2K gcc-low: 5523
  SI2K gcc-high: 7034
  SI2K gcc-low + 50%: 8284

Wrong way
Old way: take the SI2000 measurement you prefer from SPEC (or an average) and multiply by the number of cores in your farm
Other variations:
  take SI2000 with gcc on one core and multiply by the number of cores
  take SI2000 rate

WLCG SI2K how-to
Run SI2000 with gcc3, 32-bit, with the CERN flags: gcc -O2 -fPIC -pthread -m32
Run N copies of this SI2000 in parallel, where N is the number of cores
Sum all the results
Add 50%
This is the SI2K of one machine
Sum over all the machines
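The recipe above reduces to a sum and a fixed uplift. A minimal Python sketch, assuming the per-copy SI2000 scores have already been measured; the function names and the per-copy split in the example are invented for illustration. Only the total of 5523 (the gcc-low figure quoted earlier for the dual Woodcrest node) comes from the slides.

```python
from typing import Sequence

WLCG_UPLIFT = 0.50  # the "+50%" normalization step from the slide


def wlcg_si2k(per_copy_scores: Sequence[float]) -> float:
    """SI2K of one machine: sum of the N parallel copies, plus 50%."""
    return sum(per_copy_scores) * (1.0 + WLCG_UPLIFT)


def farm_si2k(machines: Sequence[Sequence[float]]) -> float:
    """SI2K of a farm: sum of the per-machine ratings."""
    return sum(wlcg_si2k(scores) for scores in machines)


if __name__ == "__main__":
    # Hypothetical split over 4 cores; only the sum (5523) is from the slides.
    woodcrest_copies = [1381, 1381, 1381, 1380]
    print(wlcg_si2k(woodcrest_copies))  # 8284.5
```

With that total, the result agrees with the 8284 quoted for the same node, up to rounding.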

Exercise
Compute the WLCG official rating of a farm with 224 Dell Blade M1000e 2x5420
  Number of cores/server: 8
  SI2K gcc-low: 10218
  10218 x 224 = 2288832
  Total SI2K: 2289 kSI2k
  + 50%: total WLCG SI2K: 3433 kSI2k
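The exercise can be checked in a few lines of Python. All the numbers below are taken from the slide; the variable names are illustrative.

```python
# Worked version of the exercise, using the figures from the slide.
N_SERVERS = 224
SI2K_GCC_LOW_PER_SERVER = 10218  # already summed over the 8 cores of a server
WLCG_UPLIFT = 0.50               # the "+50%" normalization

raw_total = N_SERVERS * SI2K_GCC_LOW_PER_SERVER
wlcg_total = raw_total * (1 + WLCG_UPLIFT)

print(f"Total SI2K:      {raw_total / 1000:.0f} kSI2k")
print(f"Total WLCG SI2K: {wlcg_total / 1000:.0f} kSI2k")
```

The exact product is 2288832, which rounds to 2289 kSI2k and 3433 kSI2k after the uplift.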

What a mess
SI2K is easy to measure but is not maintained any more
How can you ask a vendor to measure with SI2K if he can’t buy it?
Is SI2006 the right substitute? Or SI2006 rate? Or SPEC FP 2006?

CMS sw: SIM and Pythia
CMS Monte Carlo simulation (32-bit) and Pythia (64-bit) show the same performance once normalized
Both SPECint 2006 published and SPECint 2006 with gcc show the same behaviour
SI2K published does not match HEP software
SI2K CERN is better, but not as good as SI2006

BaBar Tier-A results
If you normalize by core and clock, all new processors have the same performance, double that of the older generation CPUs
SI2006 matches this pattern (published and gcc, constant ratio)
SI2000-gcc is better than SI2K nominal
SI2000 clearly doesn’t work
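Normalizing by core and clock just means dividing the measured throughput by the number of cores and by the clock frequency. A minimal Python sketch; the function name and the two machines in the example are hypothetical, with figures chosen only to illustrate a factor of two, and are not measurements from the presentation.

```python
def per_core_per_ghz(events_per_sec: float, cores: int, clock_ghz: float) -> float:
    """Throughput normalized by number of cores and clock frequency."""
    if cores <= 0 or clock_ghz <= 0:
        raise ValueError("cores and clock_ghz must be positive")
    return events_per_sec / (cores * clock_ghz)


if __name__ == "__main__":
    # Hypothetical machines: (events/sec, cores, clock in GHz).
    machines = {
        "older generation": (1.40, 2, 2.80),
        "newer generation": (9.32, 8, 2.33),
    }
    for name, (rate, cores, ghz) in machines.items():
        print(f"{name}: {per_core_per_ghz(rate, cores, ghz):.3f} events/s per core per GHz")
```

A benchmark tracks the application well if its own score, normalized the same way, shows the same ratio between generations.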

Atlas
Here 100% is the Xeon 5160
Few results for SI2006+gcc, but no difference from CMS and BaBar
Few results also from SI2006 published, because of several old architectures
SI2K+gcc not bad
SI2K published heavily overestimates the new Xeon
Atlas simulation, once normalized, performs the same on the new Intel “Core” or AMD “Opteron” (like CMS and BaBar)

Many gaps
Easy to find published SPEC results, but only for new machines
Difficult to measure:
  not easy to get machines on loan from a server reseller or manufacturer
  not easy to borrow machines from colleagues
  always for short periods of time
  a SPEC run can last 15-20 hours
Need a set of dedicated worker nodes to make SPEC and HEP application measurements

HEPiX group
A group with people from the major labs (CERN, FZK, DESY, RAL, INFN, JLAB, TRIUMF), following a request from IHEPCCC
And people appointed by the experiments (CMS, ATLAS, ALICE, LHCb)
Several machines (lxbench cluster) at CERN
Harpertown and Barcelona at INFN PD
Harpertown at DESY

Measure
SI2000 with gcc3, 32-bit, CERN tuning, parallel
SI2006 with gcc3, 32-bit, CERN tuning, parallel
SFP2006 with gcc4, 32-bit, CERN tuning, parallel (because SPEC FP doesn’t compile with gcc3)
For each experiment: GEN, SIM, DIGI, RECO on the same set of events

SPEC rate vs parallel

All the machines
Lxbench01   2x Nocona 2.8 GHz/1 MB, 2x 1GB
Lxbench02   2x Irwindale 2.8 GHz/2 MB, 4x 1GB DDR333
Lxb6106     2x Irwindale 2.8 GHz/2 MB, 2x 1GB DDR333
Lxb7006     2x Irwindale 2.8 GHz/2 MB, 2x 1GB DDR-II 400
Lxbench03   2x Opteron 275 2.2 GHz/2 MB, 4x 1GB DDR-II 400
Lxbench04   2x Woodcrest 2.66 GHz/4 MB, 8x 1GB DDR-II 533
Lxb7609     2x Woodcrest 3.00 GHz/4 MB, 4x 2GB DDR-II 667
CERN Benchmarking Cluster

All the machines, cont.
Lxbench05    2x Woodcrest 3.00 GHz/4 MB, 8x 1GB DDR-II 533
Lxbench06    2x Opteron 2218 rev. F 2.6 GHz/2 MB, 8x 1GB DDR-II 667
Lxbench07    2x Clovertown 2.33 GHz/2x 4MB, 8x 2GB DDR-II 667
Lxbench08    2x Harpertown E5410 2.33 GHz/2x 4M, 8x 2GB DDR-II 667
Lxcmssrv07   2x Harpertown E5410 2.33 GHz/2x 4M, 16GB
Lxcmssrv08   2x Opteron Barcelona 2352 2.10 GHz/4x 512kB + 2x 2MB, 16GB
Desy         2x Harpertown E5440 2.83 GHz/2x 4M, 16GB

SPECint2000 Results

SPEC2000 vs. SPEC2006

ATLAS v12

LHCb GEN+SIM
4 hours per run
Min bias p-p events, GEN+SIM

LHCb - Reconstruction
20 minutes for each run
Min bias digitized events as input

Alice pp

Alice Pb Pb

CMS RECO bench01

CMS RECO bench04

Alice results (preliminary)
Experiment results versus benchmark
Test            SPECint2000  SPECint2006  SPECfp2006
pp MinBias
  GEN+SIM       0.974        0.981        0.980
  DIGI          0.949        0.959        0.979
  RECO          0.956        0.966        0.989
  TOTAL(SUM)    0.965 0.983
PbPb per2       0.976 0.982
8.6 - 11.2fm    0.754 0.752 0.682 0.942 0.943
IHEPCCC/HEPiX Benchmarking WG

CMS results (preliminary) (1)
Experiment results versus benchmark
Test            SPECint2000  SPECint2006  SPECfp2006
HiggsZZ4LM190
  GEN+SIM       0.983        0.988        0.986
  DIGI          0.971        0.977        0.974
  RECO          0.979 0.985
  TOTAL(SUM)    0.982
MinBias         0.972 0.978 0.973 0.970 0.976 0.981 0.987 0.984

CMS results (preliminary) (2)
QCD_80_120
  GEN+SIM       0.980 0.986 0.984
  DIGI          0.973 0.976
  RECO          0.975 0.981 0.977
  TOTAL(SUM)    0.983
SingleElectronE1000   0.989 0.988 0.970 0.974 0.962 0.968 0.960 0.987

CMS results (preliminary) (3)
QCD_80_120
  GEN+SIM       0.980 0.986 0.984
  DIGI          0.973 0.976
  RECO          0.975 0.981 0.977
  TOTAL(SUM)    0.983
SingleElectronE1000   0.989 0.988 0.970 0.974 0.962 0.968 0.960 0.987
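The transcript does not say how the figures in these result tables were obtained. If they are read as correlation coefficients between an experiment's throughput and a benchmark score across the test machines (an assumption, not something the slides state), a Pearson coefficient could be computed as in the Python sketch below. The function name and the sample data are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: one benchmark score and one event rate per machine.
    benchmark_scores = [21.0, 35.5, 48.2, 63.9]
    events_per_hour = [410.0, 700.0, 930.0, 1260.0]
    print(f"r = {pearson(benchmark_scores, events_per_hour):.3f}")
```

A value close to 1 would mean the benchmark ranks and scales the machines in nearly the same way as the application does.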

Conclusion
Waiting for an official decision on the next benchmark:
  SI2006 (gcc, parallel): PROBABLY
  SI2006 rate (gcc)
  SI2006 (published): easier to do, but risk of future divergence
  C++ subset of SI2006: best fit, but risk of future divergence
  SI2000 (gcc): obsolete
TODAY: you are supposed to use the old SI2K (gcc-low) + 50%

Nehalem

IMC + cache
Integrated memory controller
2 threads per core
Latency (cycles)         L1   L2   L3
Nehalem 2.66 GHz         4    11   39
Penryn Q9460 2.66 GHz    3    15   N/A
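Because both processors in the table run at the same clock, the cycle counts can be turned into time directly: latency in nanoseconds is cycles divided by the clock in GHz. A small Python sketch using the figures from the slide; the names `LATENCY_CYCLES`, `cycles_to_ns` and `latency_ns_table` are illustrative.

```python
from typing import Dict, Optional

CLOCK_GHZ = 2.66  # both CPUs in the table run at 2.66 GHz

# Cache latencies in cycles, as listed on the slide (None = no L3 cache).
LATENCY_CYCLES: Dict[str, Dict[str, Optional[int]]] = {
    "Nehalem": {"L1": 4, "L2": 11, "L3": 39},
    "Penryn": {"L1": 3, "L2": 15, "L3": None},
}


def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in clock cycles into nanoseconds."""
    return cycles / clock_ghz


def latency_ns_table() -> Dict[str, Dict[str, Optional[float]]]:
    """Same table as LATENCY_CYCLES, expressed in nanoseconds."""
    return {
        cpu: {
            level: None if cycles is None else cycles_to_ns(cycles, CLOCK_GHZ)
            for level, cycles in levels.items()
        }
        for cpu, levels in LATENCY_CYCLES.items()
    }


if __name__ == "__main__":
    for cpu, levels in latency_ns_table().items():
        cells = ", ".join(
            f"{level}: {'N/A' if ns is None else f'{ns:.2f} ns'}"
            for level, ns in levels.items()
        )
        print(f"{cpu}: {cells}")
```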

2009: Nehalem vs Shanghai
45 nm
700 M transistors
Same architecture
AMD uses 2 MB of L2 + 6 MB of L3, for 8 MB total
Intel uses 1 MB of L2 + 8 MB of L3, for 9 MB total
Time to market? Clock? Price?