A comparison of HEP code with SPEC benchmark on multicore worker nodes HEPiX Benchmarking Group Michele Michelotto at pd.infn.it.

Presentation transcript:

A comparison of HEP code with SPEC benchmark on multicore worker nodes HEPiX Benchmarking Group Michele Michelotto at pd.infn.it

CHEP09, michele michelotto - INFN Padova, 2. What is HEPiX? HEPiX: –An international group of Unix users and administrators from cooperating HEP institutions and HEP data centers. Initial focus: –enhance Unix in a standard way, as was done inside HEPVM in the '80s. Now: –more focus on sharing of experiences, documentation, code and best practices, in all areas of computing (Linux, Windows, Mail, Spam, AAA, Security, Infrastructures)

HEPiX Meetings A yearly HEPiX meeting in Spring (Europe) and a yearly HEPiX meeting in Fall (North America). The most recent meeting was at ASGC, Taipei (the Taiwan Tier-1). Next meeting at Umeå University (Sweden), May 25-29, 2009. Each meeting: ~100 users, ~50 talks and many open discussions. To join: –Send an e-mail message to: –Leave the subject line blank –Type "SUBSCRIBE HEPiX-hepnt FIRSTNAME LASTNAME" (without the quotation marks) in the body of your message.

HEPiX Benchmarking WG Since about 2004 several HEPiX users had been presenting performance and benchmarking measurements, showing anomalies between application-code performance and SI2K. In 2006 a Working Group, chaired by Helge Meinhard (CERN), was set up inside HEPiX to address those issues. We requested help from the major HEP experiments.

The Group People from HEPiX: –Helge Meinhard (chair, CERN IT) –Peter Wegner (DESY) –Martin Bly (RAL) –Manfred Alef (FZK Karlsruhe) –Michele Michelotto (INFN Padova) –Ian Gable (Victoria, CA) –Andreas Hirstius (CERN, OpenLab) –Alex Iribarren (CERN IT) People sent by the experiments: –CMS: Gabriele Benelli –ATLAS: Franco Brasolin, Alessandro De Salvo –LHCb: Hubert Degaudenzi –ALICE: Peter Hristov

What is SPEC? SPEC (Standard Performance Evaluation Corporation): –“a non-profit corporation that establishes, maintains and endorses a set of computer-related benchmarks”. SPEC CPU: –“Designed to provide performance measurements that can be used to compare compute-intensive workloads on different computer systems”. History: –Before SPEC: CERN Unit, MIPS, VUPS (LEP era) –After SPEC: SPEC89, CPU92, CPU95, CPU2000, CPU2006

Why INT? Since SPEC CPU92 the HEP world decided to use INT as reference instead of FP (floating point). HEP programs of course make use of FP instructions, but supposedly with minimal impact on benchmarks; I've never seen a clear proof of it.

The mythical SI2K SPEC CPU INT 2000, shortened as SI2K: the “unit of measure” –for all the LHC Computing TDRs –for the WLCG MoU –for the resources pledged by the Tier-[0,1,2] centres –therefore used in tenders for computer procurements

The measured SI2K Results taken from www.spec.org for different processors showed good linearity with HEP applications up to ~Y2005. HEP applications use Linux + gcc, while SPEC.org publishes measurements on Linux/Win with the Intel or PathScale compilers. If you run SPEC on Linux+gcc you obtain a smaller value (less optimization). Is it proportional to SPEC.org or to HEP applications?

SI2K measurement Take your typical WN, a dual-processor box with Linux + gcc. Compile the benchmark in your typical environment with typical optimisation: –for GridKa: “gcc -O3 -march=$ARCH” –for CERN (LHC): “gcc -O2 -fPIC -pthread” If you have N cores → run N instances of SPEC INT in parallel. In 2001 the GridKa / SPEC.org ratio was 80%, so they needed to apply a scaling factor of +25%.
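The "N cores → N parallel instances" recipe above can be sketched as follows; this is a minimal illustration, and the `runspec` invocation and config file name are assumptions, not the group's actual scripts.

```python
import subprocess

def build_commands(n_cores, config="linux-gcc-O2.cfg"):
    # One independent SPEC INT run per core; "runspec" and the config
    # name are illustrative assumptions, not the WG's real setup.
    return [["runspec", "--config", config, "--noreportable", "int"]
            for _ in range(n_cores)]

def run_parallel(n_cores):
    # Launch every instance at once and wait for all of them, emulating
    # a fully loaded batch worker node rather than SPEC's "rate" mode.
    procs = [subprocess.Popen(cmd) for cmd in build_commands(n_cores)]
    return [p.wait() for p in procs]
```

The point of launching independent processes (instead of SPEC rate mode) is discussed on a later slide: batch farms do not synchronize jobs, so independent runs model the real load better.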

The SI2K inflation Blue is the value measured with gcc and the GridKa tuning; yellow is the +25% scaling to normalize to 2001; red is the value published by spec.org.

SPEC CERN and SPEC LCG At HEPiX meetings since 2005, people presented measurements showing the correlation of HEP applications with SPEC as measured locally, and of course the lack of linearity with spec.org. Interim solution: –make the measurement with the CERN tuning (gcc -O2 -fPIC -pthread) –add +50% to normalize to 2001 –this was the SI2K-LCG to be used for the pledges

Too many SI2K? Too many definitions of SI2K around. E.g. take a common processor like an Intel Woodcrest dual-core 5160 at 3.06 GHz: –SI2K spec.org: 2929 – 3089 (min – max) –SI2K sum on 4 cores: –SI2K gcc-cern: 5523 –SI2K gcc-gridka: 7034 –SI2K cern + 50%: 8284
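The arithmetic relating these flavours is simple; a short sketch using only the numbers quoted on the slide (the variable names are mine):

```python
# SI2K flavours for the Intel Woodcrest 5160 quoted on the slide.
si2k_gcc_cern = 5523           # CERN gcc flags (gcc -O2 -fPIC -pthread)
si2k_gcc_gridka = 7034         # GridKa gcc flags (gcc -O3 -march=$ARCH)

# The "+50%" interim normalization applied to the CERN-tuned value.
si2k_cern_plus50 = int(si2k_gcc_cern * 1.5)

print(si2k_cern_plus50)        # reproduces the 8284 on the slide
```

This makes the inconsistency concrete: the same physical CPU carries SI2K labels from roughly 3000 (spec.org, optimizing compiler) to over 8000 (CERN tuning plus 50%), depending on which convention a site adopted.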

Transition to CPU 2006 The use of the SI2K-LCG was a good INTERIM solution. In 2006 SPEC published CPU 2006 and stopped maintenance on CPU 2000. Impossible to find SI2000 values from SPEC for the new processors; impossible to find SI2006 for old processors. Time to move to a benchmark of the CPU 2006 family?

CPU 2006 What's new: –Larger memory footprint: from ~200 MB per core to about 1 GB per core in a 32-bit environment –Runs longer (1 day vs 1 hour) –CPU 2000 fitted too much in L2 caches –INT: 12 CPU-intensive applications written in C and C++ –FP: 17 CPU-intensive applications written in C, C++ and Fortran

The HEPiX WG At the HEPiX Fall 2006 meeting at JLab a group, chaired by H. Meinhard (CERN-IT), started a detailed study of CPU2006. We needed to compare CPU 2000 and CPU 2006 with HEP applications. We found good collaboration with the LHC experiments thanks to the push of the WLCG Grid Deployment Board.

SPEC rate vs parallel SPEC Rate synchronizes all the cores at the end of each test. We preferred to emulate the batch-like environment of our farms using multiple parallel runs. The effect is noticeable if the WN has four or more cores.

Lxbench cluster We needed and obtained a set of dedicated worker nodes at CERN: –to measure SI2000, 32 and 64 bit –to measure CPU 2006, INT and FP, 32 and 64 bit –to measure the LHC applications' performance on EXACTLY the same machines –all dual-processor, both Intel and AMD; single-core, dual-core, quad-core –plus other “control” machines from INFN, DESY, GridKa, RAL

HEP Applications ATLAS provided results for: –Event Generation, Simulation, Digitization, Reconstruction, Total (full-chain production) ALICE: –Gen+Sim, Digitization, Reconstruction and Total LHCb: –Gen+Sim CMS: –Gen+Sim, Digitization, Reconstruction and Total –for several physics processes (minimum bias, QCD jets, ttbar, Higgs in 4 leptons, single-particle-gun events) to see if some physics channel would produce something different

Results Very good correlation (>90%) for all experiments. Both SI2006 and SFP2006 (multiple parallel) could be good substitutes for SI2000. Interesting talk by Andreas Hirstius from CERN-IT Openlab at HEPiX Spring 08 on “perfmon”.
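The ">90% correlation" quoted above is an ordinary Pearson coefficient between benchmark scores and application throughput across machines. A minimal sketch, with purely illustrative numbers (NOT the working group's actual measurements):

```python
from math import sqrt

def pearson(xs, ys):
    # Plain Pearson correlation coefficient, no external libraries.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-machine data: benchmark score vs. events/hour
# for one application, on the same set of worker nodes.
scores = [8.2, 11.5, 14.9, 20.3, 26.1]
event_rates = [100, 138, 185, 241, 320]
```

With data this close to proportional, `pearson(scores, event_rates)` comes out well above 0.9, which is the kind of agreement the WG required before proposing a benchmark as an SI2K replacement.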

perfmon Measures a large number of hardware performance counter events: ~100 events on 4-5 counters on Intel/AMD, with very little overhead. What do we measure: –cycles per instruction, load/store instructions, x87 or SIMD instructions, % of mispredicted branches, L2 cache misses, data bus utilization, resource stalls…
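The metrics on this slide are simple ratios of raw counter values. A hedged sketch of the derivation (the counter names and the sample numbers are illustrative, not actual perfmon output):

```python
def derived_metrics(c):
    # Ratios of raw event counts, matching the metrics listed above.
    return {
        "cpi": c["cycles"] / c["instructions"],
        "branch_mispredict_pct": 100.0 * c["branch_misses"] / c["branches"],
        "l2_misses_per_kinst": 1000.0 * c["l2_misses"] / c["instructions"],
    }

# Illustrative counter values for one sampling interval.
sample = {"cycles": 2_000_000, "instructions": 1_600_000,
          "branches": 300_000, "branch_misses": 6_000,
          "l2_misses": 8_000}
```

For this sample, `derived_metrics` gives a CPI of 1.25, a 2% branch-misprediction rate and 5 L2 misses per thousand instructions; comparing such profiles between SPEC subsets and real jobs is exactly what the next slide describes.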

Perfmon on lxbatch Perfmon was run on 5 nodes of lxbatch for one month to measure the average behaviour of real HEP applications, and compared with SPEC CPU 2000 and 2006: Int, Fp and CPP. CPP is the subset of all C++ tests in CPU 2006. CPP showed a good match with the average lxbatch load, e.g. for FP+SIMD, loads and stores, and mispredicted branches.

SPEC CPP Integer tests: 471.omnetpp, 473.astar, 483.xalancbmk. Floating-point tests: 444.namd, 447.dealII, 450.soplex, 453.povray.

FP usage in cpp_all Negligible usage of FP instructions in INT 2000 and 2006; similar in “average lxbatch” and “cpp_all”.

Relative performances

The choice SPECint2006 (12 applications): –well established, published values available –HEP applications are mostly integer calculations –correlations with experiment applications shown to be fine SPECfp2006 (17 applications): –well established, published values available –correlations with experiment applications shown to be fine SPECall_cpp2006 (7 applications): –exactly as easy to run as SPECint2006 or SPECfp2006 –no published values (not necessarily a drawback) –takes about 6 h (SPECint2006 or SPECfp2006 take about 24 h) –best modeling of the FP contribution to HEP applications –significant memory footprint Proposal to WLCG: adopt SPECall_cpp2006, run in parallel, and call it HEP-SPEC06.

Hep-Spec06 [Table: SPEC2000, SPEC2006 int 32, SPEC2006 fp 32 and SPEC2006 CPP 32 scores for the eight lxbench machines; the numeric values were not preserved in the transcript.]

Conversion factor Choose an approximate conversion factor (~5%), giving more weight to modern processors. We chose a ratio of “4” to stress that we care more about ease of use than about extreme precision. To validate it we measured the whole GridKa farm and found the same number.
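The slide fixes the ratio at 4 without spelling out its direction; assuming it is applied as HEP-SPEC06 ≈ 4 × kSI2K (the direction commonly used for converting existing WLCG pledges), a trivial helper would be:

```python
def ksi2k_to_hs06(ksi2k):
    # Factor of 4 from the slide; the direction (HS06 = 4 * kSI2K)
    # is my assumption about how the conversion was applied.
    return 4.0 * ksi2k

def hs06_to_ksi2k(hs06):
    # Inverse conversion, same assumption.
    return hs06 / 4.0
```

For example, a pledge of 2.5 kSI2K would convert to 10 HEP-SPEC06. The deliberately round factor reflects the slide's point: easy, reproducible conversion mattered more than squeezing out the last few percent of precision.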

Atlas Generation

Atlas Digi and Reco

Atlas Sim and Total

CMS slowest 5 candles

CMS fastest 2 candles

Alice pp

Alice PbPb

LHCb pp