How to benchmark an HEP worker node
Evaluation of HEP worker nodes
Michele Michelotto at pd.infn.it
michele michelotto - INFN Padova
Computing model
[Figure: the tiered computing model. CERN Tier 0; Tier-1 centres (CERN, Germany, UK, France, Italy, Japan, USA BNL, USA FNAL); Tier-2 centres at labs and universities (Lab a, b, c, m; Uni a, b, n, x, y); Tier-3 physics departments and desktops; grids for a regional group and for a physics study group]
Workshop CCR 08 LNGS michele michelotto - INFN Padova

Storage is easier
Tape storage: very easy: event size × number of events → petabytes
Disk storage: easy again: event size → terabytes
Gibibyte vs gigabyte: 1 TB = 1 terabyte = 1000^4 bytes; 1 TiB = 1 tebibyte = 1024^4 bytes
Raw terabytes or RAID-protected (RAID5, RAID6)?
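
The storage arithmetic above (volume = event size × number of events, decimal vs binary prefixes, RAID overhead) can be sketched as follows. This is an illustration only: the event size, event count and disk count are hypothetical inputs, and the RAID model assumes a single RAID5 or RAID6 group of equal-size disks.

```python
# Storage sizing sketch: decimal vs binary prefixes and RAID overhead.
# The event size, event count and disk count below are hypothetical.

TB = 1000 ** 4   # terabyte (SI, decimal prefix)
TiB = 1024 ** 4  # tebibyte (IEC, binary prefix)


def dataset_bytes(event_size_bytes: int, n_events: int) -> int:
    """Raw data volume: event size x number of events."""
    return event_size_bytes * n_events


def usable_fraction(raid_level: int, n_disks: int) -> float:
    """Fraction of the raw capacity of one RAID group left for data.

    RAID5 spends one disk's worth of capacity on parity, RAID6 two.
    """
    parity_disks = {5: 1, 6: 2}[raid_level]
    if n_disks <= parity_disks:
        raise ValueError("RAID group too small for this level")
    return (n_disks - parity_disks) / n_disks


if __name__ == "__main__":
    volume = dataset_bytes(event_size_bytes=1_000_000, n_events=10**9)
    print(f"{volume / TB:.1f} TB = {volume / TiB:.1f} TiB")
    print(f"1 TB = {TB / TiB:.3f} TiB")
    print(f"RAID6 on 12 disks keeps {usable_fraction(6, 12):.0%} of the raw space")
```

A "1 TB" disk therefore holds only about 0.91 TiB, before any RAID parity is subtracted.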

But what about computing power?
Tricky: events/sec? Which events? Simulation or reconstruction?
We need some kind of benchmark:
MIPS (meaningless instructions per second)
VUPS (VAX units per second)
CERN units
MHz
SPEC SI2K…

T1 + T2 CPU budget: LHC

SI2K frozen
SI2K is the benchmark used until now to measure computing power in all the HEP experiments:
- the computing power requested by the experiments in their TDRs
- the computing power provided by a Tier-0, Tier-1 or Tier-2
SI2K is the nickname of the SPEC CPU Int 2000 benchmark (SPECint2000)
It came after SPEC89, SPECint92 and SPECint95
Declared obsolete by SPEC in 2006 and replaced by SPEC CPU Int 2006

Transition problem
Impossible to find published SPECint2000 results for new processors (e.g. the not-so-new 4-core Clovertown)
Impossible to find published SPECint2006 results for old processors (before 2006), e.g. old P4 Xeon, P4, AMD 2xx
You can't convert from SI2000 to SI2006, but for x86 architectures the SI2000/SI2006 ratio is in the 137 to 172 range
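
Because there is no conversion, the most the quoted ratio allows is a bracket. The sketch below assumes the 137 to 172 range is the ratio SI2000 / SI2006 measured on x86 machines; the function names are hypothetical and the result is a rough interval, not a rating.

```python
# Bracket an unknown score from a known one using the quoted ratio range.
# Assumption: 137..172 is the observed ratio SI2000 / SI2006 on x86.

RATIO_MIN = 137.0
RATIO_MAX = 172.0


def si2006_bracket(si2000: float) -> tuple[float, float]:
    """Return (low, high) SPECint2006 estimates for a SPECint2000 score."""
    if si2000 <= 0:
        raise ValueError("score must be positive")
    # A larger ratio means a smaller SI2006, so RATIO_MAX gives the low end.
    return si2000 / RATIO_MAX, si2000 / RATIO_MIN


def si2000_bracket(si2006: float) -> tuple[float, float]:
    """Inverse direction: (low, high) SPECint2000 for a SPECint2006 score."""
    if si2006 <= 0:
        raise ValueError("score must be positive")
    return si2006 * RATIO_MIN, si2006 * RATIO_MAX
```

The width of the interval (a factor of about 1.26 between its ends) is itself a reminder of why the slide says the two benchmarks cannot be converted.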

The SI2K inflation
The main problem with SI2000 in our community: it is no longer proportional to the performance of HEP codes (as it used to be)
You can buy processors with a huge SI2K number but only a smaller increase in real performance

Nominal SI vs real SI
CERN (and FZK) started to use a new currency: SI2K measured with gcc, the GNU C compiler, using two flavours of optimization:
FZK, high tuning: gcc -O3 -funroll-loops -march=$ARCH
CERN, low tuning: gcc -O2 -fPIC -pthread

FZK measurement
In 2001, SPEC built with gcc gave 80% of the average published result
In 2006 the gap was much wider

Nominal SI vs real SI
For tenders, FZK uses SI2K with FZK tuning (gcc-high) and adds 25% to "normalize" to the year 2001
CERN and FZK proposal to WLCG: use SI2K with CERN tuning (gcc-low) and add 50% to normalize
Run n copies in parallel, where n is the number of cores in the worker node, to take into account the drop in performance of a multicore machine when fully loaded
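
One plausible reading of the FZK "+25%", assuming it is meant to undo the 80% gcc/published ratio measured in 2001, is that dividing by 0.8 is the same as multiplying by 1.25. The CERN/FZK proposal then applies a fixed factor 1.5 to the sum of the n parallel gcc-low copies. This is an interpretation of the two slides, not a derivation given in the talk.

```latex
\[
\left.\frac{\mathrm{SI2K}_{\mathrm{gcc}}}{\mathrm{SI2K}_{\mathrm{published}}}\right|_{2001} \approx 0.8
\quad\Longrightarrow\quad
\mathrm{SI2K}_{\mathrm{normalized}} \approx \frac{\mathrm{SI2K}_{\mathrm{gcc}}}{0.8} = 1.25\,\mathrm{SI2K}_{\mathrm{gcc}}
\]
\[
\mathrm{SI2K}_{\mathrm{WLCG}} = 1.5 \sum_{i=1}^{n} \mathrm{SI2K}_{\mathrm{gcc\text{-}low},\,i}
\]
```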

Too many SI2K
Take as an example a worker node with two Intel Woodcrest dual-core Xeon 5160 at 3.00 GHz (4 cores):
SI2K nominal: 2929 – 3089 (min – max)
SI2K summed over the 4 cores:
SI2K gcc-low: 5523
SI2K gcc-high: 7034
SI2K gcc-low + 50%: 8284
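
The arithmetic behind the three summed figures for this node can be checked quickly. Only the numbers quoted on the slide are used; the SPEC runs themselves are not reproduced, and `wlcg_rating` is a hypothetical helper name.

```python
# Arithmetic of the "Too many SI2K" example (2x Xeon 5160, 4 cores).
# All inputs are the figures quoted on the slide.

GCC_LOW_SUM = 5523   # sum of 4 parallel copies, CERN flags
GCC_HIGH_SUM = 7034  # sum of 4 parallel copies, FZK flags
WLCG_FACTOR = 1.5    # the "+50%" rule


def wlcg_rating(gcc_low_sum: float) -> float:
    """gcc-low sum over all cores, plus 50%."""
    return gcc_low_sum * WLCG_FACTOR


if __name__ == "__main__":
    print("gcc-low + 50%:", wlcg_rating(GCC_LOW_SUM))
    print("gcc-high / gcc-low:", round(GCC_HIGH_SUM / GCC_LOW_SUM, 2))
    print("gcc-low per core:", GCC_LOW_SUM / 4)
```

The exact product is 8284.5, which the slide truncates to 8284. The same machine can thus be quoted anywhere from 5523 to about 8284 depending on the convention, a factor of 1.5.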

Wrong way
Old way: take the SI2000 measurement you prefer from SPEC (or an average) and multiply by the number of cores in your farm
Other variations:
- take SI2000 with gcc on one core and multiply by the number of cores
- take SI2000 rate

WLCG SI2K how-to
Run SI2000 built with gcc3, 32-bit, with the CERN flags: gcc -O2 -fPIC -pthread -m32
Run N copies of this SI2000 in parallel, where N is the number of cores
Sum all the results
Add 50%: this is the SI2K of one machine
Sum over all the machines
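
The bookkeeping part of this recipe can be sketched as below. It assumes each node has already run its N parallel copies with the CERN flags and that the per-copy SI2K scores are available; launching SPEC is out of scope, and `node_si2k` and `farm_si2k` are hypothetical names.

```python
# Sketch of the WLCG SI2K bookkeeping: per-node sum of N parallel copies,
# plus 50%, then summed over the farm. Per-copy scores are assumed given.
from typing import Sequence

WLCG_FACTOR = 1.5  # "add 50%"


def node_si2k(copy_scores: Sequence[float], n_cores: int) -> float:
    """WLCG SI2K of one machine from its N parallel copies (one per core)."""
    if len(copy_scores) != n_cores:
        raise ValueError(
            f"expected {n_cores} copies (one per core), got {len(copy_scores)}"
        )
    return sum(copy_scores) * WLCG_FACTOR


def farm_si2k(nodes: Sequence[tuple[Sequence[float], int]]) -> float:
    """Sum the per-machine ratings over all machines in the farm."""
    return sum(node_si2k(scores, cores) for scores, cores in nodes)
```

Checking that the number of copies equals the number of cores guards against the "one core × number of cores" shortcut listed under "Wrong way".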

Exercise
Compute the official WLCG rating of a farm of 224 Dell Blade M1000e with 2x 5420
Number of cores per server: 8
SI2K gcc-low per server: 10218
Total SI2K: 10218 × 224 = 2289 kSI2K
+ 50%: total WLCG SI2K = 3433 kSI2K
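
The exercise can be reproduced directly from the slide's inputs. The kSI2K rounding to the nearest unit is an assumption that happens to match both quoted totals; `to_ksi2k` is a hypothetical helper.

```python
# Exercise: 224 identical nodes, gcc-low sum of 10218 SI2K per node
# (8 parallel copies each), then +50%. Inputs are from the slide.

N_NODES = 224
GCC_LOW_PER_NODE = 10218
WLCG_FACTOR = 1.5


def to_ksi2k(si2k: float) -> int:
    """Express an SI2K total in kSI2K, rounded to the nearest unit."""
    return round(si2k / 1000)


total_gcc_low = GCC_LOW_PER_NODE * N_NODES  # 2,288,832 SI2K
total_wlcg = total_gcc_low * WLCG_FACTOR    # 3,433,248 SI2K
print(to_ksi2k(total_gcc_low), "kSI2K before +50%")
print(to_ksi2k(total_wlcg), "kSI2K WLCG rating")
```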

What a mess
SI2K is easy to measure but is not maintained any more
How do you ask a vendor to measure SI2K if they can't buy it?
Is SI2006 the right substitute? Or SI2006 rate? Or SPECfp2006?

CMS software: SIM and Pythia
CMS Monte Carlo simulation (32-bit) and Pythia (64-bit) show the same performance once normalized
Both published SPECint2006 and SPECint2006 built with gcc show the same behaviour
Published SI2K does not match HEP software
SI2K measured the CERN way is better, but not as good as SI2006

BaBar Tier-A results
If you normalize by core and clock, all new processors have the same performance, double that of the older CPU generation
SI2006 matches this pattern (the ratio between published and gcc results is constant)
SI2000 with gcc is better than nominal SI2K
SI2000 clearly doesn't work
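
"Normalize by core and clock" amounts to dividing a node's measured throughput by (cores × GHz). A minimal sketch follows; the throughput unit and the test numbers are hypothetical and are not BaBar results.

```python
# Normalise a measured node throughput by core count and clock frequency.
# The throughput unit (events per hour, all cores loaded) is an assumption.


def per_core_per_ghz(events_per_hour: float, n_cores: int, clock_ghz: float) -> float:
    """Throughput per core per GHz, used to compare CPU generations."""
    if n_cores <= 0 or clock_ghz <= 0:
        raise ValueError("cores and clock must be positive")
    return events_per_hour / (n_cores * clock_ghz)
```

With this metric, two nodes with different core counts and clocks that scale "the same" give equal values, which is the pattern the slide describes for the new processors.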

ATLAS
Here 100% is the Xeon 5160
Few results for SI2006+gcc, but no difference from CMS and BaBar
Few results also for published SI2006, because several architectures are old
SI2K+gcc not bad
Published SI2K heavily overestimates the new Xeons
ATLAS simulation, once normalized, performs the same on the new Intel "Core" and AMD "Opteron" (like CMS and BaBar)

Many gaps
Easy to find published SPEC results, but only for new machines
Difficult to measure:
- not easy to get a machine on loan from a server reseller or producer
- not easy to borrow a machine from colleagues, and always for short periods of time
- a SPEC run can last hours
We need a set of dedicated worker nodes for SPEC and HEP application measurements

HEPiX group
A group with people from the major labs (CERN, FZK, DESY, RAL, INFN, JLAB, TRIUMF), set up after an IHEPCCC request
and people appointed by the experiments (CMS, ATLAS, ALICE, LHCb)
Several machines (lxbench cluster) at CERN
Harpertown and Barcelona at INFN PD
Harpertown at DESY

Measure
SI2000 with gcc3, 32-bit, CERN tuning, parallel
SI2006 with gcc3, 32-bit, CERN tuning, parallel
SFP2006 with gcc4, 32-bit, CERN tuning, parallel (because SPEC FP doesn't compile with gcc3)
For each experiment: GEN, SIM, DIGI, RECO on the same set of events

SPEC rate vs parallel

All the machines
Lxbench01: 2x Nocona 2.8 GHz/1 MB, 2x 1GB
Lxbench02: 2x Irwindale 2.8 GHz/2 MB, 4x 1GB DDR333
Lxb6106: 2x Irwindale 2.8 GHz/2 MB, 2x 1GB DDR333
Lxb7006: 2x Irwindale 2.8 GHz/2 MB, 2x 1GB DDR-II 400
Lxbench03: 2x Opteron GHz/2 MB, 4x 1GB DDR-II 400
Lxbench04: 2x Woodcrest 2.66 GHz/4 MB, 8x 1GB DDR-II 533
Lxb7609: 2x Woodcrest 3.00 GHz/4 MB, 4x 2GB DDR-II 667
CERN Benchmarking Cluster - 3

All the machines, cont.
Lxbench05: 2x Woodcrest 3.00 GHz/4 MB, 8x 1GB DDR-II 533
Lxbench06: 2x Opteron 2218 rev. F 2.6 GHz/2 MB, 8x 1GB DDR-II 667
Lxbench07: 2x Clovertown 2.33 GHz/2x 4MB, 8x 2GB DDR-II 667
Lxbench08: 2x Harpertown E GHz/2x 4M, 8x 2GB DDR-II 667
Lxcmssrv07: 2x Harpertown E GHz/2x4M, 16GB
Lxcmssrv08: 2x Opteron Barcelona GHz / 4x512kB + 2x2MB, 16GB
Desy: 2x Harpertown E GHz/2x4M, 16GB
CERN Benchmarking Cluster - 4

SPECint2000 results
CERN Benchmarking Cluster - 8

SPEC2000 vs. SPEC2006
CERN Benchmarking Cluster - 9

ATLAS v12

LHCb GEN+SIM
4 hours per run
Minimum-bias p-p events, GEN+SIM

LHCb reconstruction
20 minutes per run
Minimum-bias digitized events as input

ALICE pp

ALICE Pb-Pb

CMS RECO, bench01

CMS RECO, bench04

ALICE results (preliminary)
Exp. Results versus …
Benchmark: SPECint2000, SPECint2006, SPECfp2006
pp MinBias
GEN+SIM: 0.974, 0.981, 0.980
DIGI: 0.949, 0.959, 0.979
RECO: 0.956, 0.966, 0.989
TOTAL(SUM): 0.965 0.983
PbPb
per2 0.976 0.982 fm 0.754 0.752 0.682 0.942 0.943
IHEPCCC/HEPiX Benchmarking WG - 35

CMS results (preliminary) (1)
Exp. Result versus…
Benchmark: SPECint2000, SPECint2006, SPECfp2006
HiggsZZ4LM190
GEN+SIM: 0.983, 0.988, 0.986
DIGI: 0.971, 0.977, 0.974
RECO: 0.979 0.985
TOTAL(SUM): 0.982
MinBias
0.972 0.978 0.973 0.970 0.976 0.981 0.987 0.984
IHEPCCC/HEPiX Benchmarking WG - 36

CMS results (preliminary) (2)
QCD_80_120
GEN+SIM: 0.980, 0.986, 0.984
DIGI: 0.973 0.976
RECO: 0.975, 0.981, 0.977
TOTAL(SUM): 0.983
SingleElectronE1000
0.989 0.988 0.970 0.974 0.962 0.968 0.960 0.987
IHEPCCC/HEPiX Benchmarking WG - 37
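
The slides label these tables only as experiment results versus benchmark. Assuming the figures are Pearson correlation coefficients between an experiment's measured performance and a benchmark score across the benchmarking machines, they could be computed as sketched below. Both the interpretation and the `pearson` helper are assumptions, not the working group's documented method.

```python
# Pearson correlation between two per-machine series, e.g. experiment
# throughput vs benchmark score (interpretation assumed, not confirmed).
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)
```

A value close to 1 means the benchmark ranks and scales the machines almost exactly as the experiment code does; it does not by itself show that the absolute ratios are right.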

Conclusion
Waiting for an official decision on the next benchmark:
- SI2006 (gcc, parallel): PROBABLY
- SI2006 rate (gcc)
- SI2006 (published): easier to do, but risk of future divergence
- C++ subset of SI2006: best fit, but risk of future divergence
- SI2000 (gcc): obsolete
TODAY: you are supposed to use the old SI2K (gcc-low) + 50%

Nehalem

IMC + cache
Integrated memory controller
2 threads per core
Latency (cycles)     L1   L2   L3
Nehalem 2.66 GHz     4    11   39
Penryn Q GHz         3    15   N/A
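
To compare the latencies with memory timings, cycles can be converted to nanoseconds as cycles / f, with f in GHz. Only the Nehalem row is converted below, because the Penryn clock is not legible in the extracted slide; `cycles_to_ns` is a hypothetical helper.

```python
# Convert the Nehalem cache latencies from the slide from cycles to ns.


def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """One cycle lasts 1/f; with f in GHz the result is in nanoseconds."""
    return cycles / clock_ghz


NEHALEM_GHZ = 2.66
NEHALEM_LATENCY = {"L1": 4, "L2": 11, "L3": 39}  # cycles, from the slide

for level, cycles in NEHALEM_LATENCY.items():
    print(f"{level}: {cycles} cycles = {cycles_to_ns(cycles, NEHALEM_GHZ):.1f} ns")
```

At 2.66 GHz this gives roughly 1.5 ns, 4.1 ns and 14.7 ns for L1, L2 and L3.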

2009: Nehalem vs Shanghai
45 nm, 700 M transistors
Same architecture
AMD uses 2 MB of L2 + 6 MB of L3, for 8 MB total
Intel uses 1 MB of L2 + 8 MB of L3, for 9 MB total
Time to market? Clock? Price?