
1 Outline
ATLAS Benchmark pre-GDB, CERN, 7 Feb
Alessandro De Salvo, on behalf of the ATLAS Distributed Computing group
Outline:
- Benchmarking in ATLAS
- Performance scaling

2 Benchmarking in ATLAS
Two possible options:
- Running directly on the resources, e.g. cloud resources, already addressed in several other talks
- Running on the sites via the pilot: automated, continuous running along with the standard jobs, possibly limiting the number of times a single machine is benchmarked
Different benchmarking strategies are currently being evaluated by ATLAS, in particular the possible workflows for the benchmark information:
- From the pilot to some ES (Elasticsearch) store
- From the ES to somewhere else that can be made available to our WFMS, and possibly to the pilot itself
In this talk we focus on benchmarking with pilots only (a sketch of the pilot-to-ES leg follows below).
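A minimal Python sketch of the pilot-to-ES leg of this workflow, purely for illustration: the endpoint URL, index name and record fields are assumptions, not the actual ATLAS schema.

    import json
    import socket
    import time
    import urllib.request

    # Hypothetical benchmark record a pilot could ship to an Elasticsearch-like
    # store; every field name here is illustrative, not the real ATLAS schema.
    record = {
        "timestamp": time.time(),
        "hostname": socket.getfqdn(),
        "panda_queue": "EXAMPLE_QUEUE",  # assumption: the pilot knows its queue
        "benchmark": "DB12",
        "score": 11.3,                   # made-up value
    }

    # POST the record to a hypothetical ES index (no real endpoint implied).
    req = urllib.request.Request(
        "https://es.example.cern.ch/benchmarks/_doc",
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)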

3 Benchmark in the pilot
Different scenarios for running benchmarks with pilots:
- Running within every pilot and storing the result as a job attribute, to maximize the correlation between job efficiency and machine status. Status: work in progress; not possible or optimal on all resources. Clouds already run benchmarks asynchronously (see the work done on IaaS resources in Canada), and HPC will need a separate solution too.
- Running selectively from the pilot, based on the recent results from specific nodes, and storing the result as a WN attribute, to maximize the efficiency and optimization of our resources. Status: under discussion; the Grid is the only place where it would work, but it doesn't yet (see next slide). Not currently running, but can be addressed. A sketch of the selection logic follows below.
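A minimal sketch of the "run selectively" scenario, assuming the pilot can fetch a map of the latest per-node benchmark timestamps (the map layout and the weekly interval are both assumptions):

    import time

    REBENCH_INTERVAL = 7 * 24 * 3600  # assumption: re-benchmark a node weekly

    def should_benchmark(hostname, last_results):
        """Decide whether this pilot should benchmark this node.

        last_results maps hostname -> unix time of the most recent benchmark,
        e.g. as fetched from the ES store; the structure is hypothetical.
        """
        last = last_results.get(hostname)
        return last is None or time.time() - last > REBENCH_INTERVAL

    # Example: a node last benchmarked 10 days ago gets benchmarked again.
    history = {"wn042.example.ch": time.time() - 10 * 86400}
    print(should_benchmark("wn042.example.ch", history))  # True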

4 Pilot and CERN-IT ES
The aim was to use the current CERN-IT monitoring infrastructure, writing from the pilot on the WNs; it is the same infrastructure already used for general monitoring.
- The pilot can only use a proxy certificate to authenticate, and AMQ doesn't accept proxy authentication
- We tried to split the proxy into a key/cert pair, but the real problem is the delegation chain: the proxy issuer is the user, and the server doesn't recognise it
Looking at alternative paths:
- Run the benchmark in each pilot, store the result in PanDA, and then transfer the results (sketched below)
- Run the benchmark asynchronously for every resource with some other method and store it somewhere else
The solution should depend on the use cases.
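A sketch of the first alternative path, piggy-backing the benchmark score on the job metadata the pilot already reports to PanDA and thereby avoiding a direct AMQ connection; the report layout and the "benchmark" key are assumptions, not the real pilot job-report schema.

    import json

    def attach_benchmark_to_job_report(report_path, score):
        """Add the benchmark result to an existing pilot job-report JSON file.

        Hypothetical illustration only: the real pilot job-report schema and
        the downstream transfer out of PanDA are not shown here.
        """
        with open(report_path) as f:
            report = json.load(f)
        report["benchmark"] = {"name": "DB12", "score": score}  # assumed key
        with open(report_path, "w") as f:
            json.dump(report, f)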

5 Performance scaling with HC
Procedure: use HammerCloud results to evaluate the ATLAS software performance at sites.
- Not a real benchmark, but it gives us a real-life indication of the performance of the nodes at different sites
- Embedded in the standard procedure operated by ATLAS (Functional Tests)
- Can be used to see the relative performance of specific classes of nodes, normalizing to a reference CPU type and machine
The WCT, number of events, node name and all the other relevant parameters are extracted from the ATLAS Analytics platform (Kibana), filled with the standard PanDA job information.

6 HammerCloud and benchmarking
A "standard candle" job running on different sites: same SW release, same input file, same number of events.
- Running only 1 event would be faster, but inaccurate: as reported in previous talks, the first event of each job takes longer than the others, due to some initial tasks running in Athena during the first initialization (see the sketch below)
- Running more events ensures higher accuracy in the measurement of the event throughput of real jobs
Single core (SCORE) standard candle: 25 events, mc12_8TeV, Athena, AtlasG4_trf.py; e.g. 134 PanDA queues, template:
Multi core (MCORE) standard candle: 8 events, mc15_13TeV, Athena, Sim_tf.py; e.g. 150 PanDA queues, template:
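The first-event effect can be factored out when deriving a per-event time, as in this small sketch (the input numbers are made up; they are assumed to come from the job logs):

    def wct_per_event(total_wct, n_events, first_event_wct):
        """Per-event wall-clock time, excluding the first event, whose cost is
        inflated by the Athena initialization (the reason a 1-event candle
        would be inaccurate)."""
        if n_events < 2:
            raise ValueError("need at least 2 events to exclude the first one")
        return (total_wct - first_event_wct) / (n_events - 1)

    # Example: a 25-event SCORE candle, 5000 s total, 600 s for the first event
    print(wct_per_event(5000.0, 25, 600.0))  # ~183 s/event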

7 Performance scaling with HC
Available information from Kibana:
- Site name (PanDA resource)
- Node name
- CPU type
- WCT per event * cores
- Average WCT * cores per hostname
A sketch of this aggregation follows below.
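A sketch of how these fields could be aggregated per hostname; the record layout mirrors the list above but is a hypothetical stand-in for the real PanDA job documents in Kibana.

    from collections import defaultdict

    def average_wct_per_event(jobs):
        """Average (WCT per event * cores) per hostname."""
        acc = defaultdict(list)
        for job in jobs:
            acc[job["hostname"]].append(job["wct"] / job["nevents"] * job["cores"])
        return {host: sum(v) / len(v) for host, v in acc.items()}

    jobs = [
        {"hostname": "wn01", "wct": 5000.0, "nevents": 25, "cores": 1},
        {"hostname": "wn01", "wct": 4800.0, "nevents": 25, "cores": 1},
    ]
    print(average_wct_per_event(jobs))  # {'wn01': 196.0}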

8 Performance scaling comparison
CPU types: using the CPU types and nodes already benchmarked at FZK by Manfred. Three sets of measurements are available for each node class/CPU type: HS06, DB12, and ATLAS HC WCT.
Comparison of the WCT-per-event ratio on different CPU types:
- Normalizing to the least performant node type of the same brand (Intel); see the sketch below
- Not comparing with AMD, as we would have only a single processor type available (Opteron 6138 only)
Both SCORE and MCORE show the same behaviour and correlations with HS06 and DB12. This is just a Born-level comparison, although encouraging! We might expect different behaviours depending on the process type used.
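A sketch of the normalization step: WCT per event is converted into a speed ratio relative to the least performant node type, so it can be lined up against HS06 and DB12 ratios normalized to the same baseline. The CPU names and numbers below are invented, not the FZK measurements.

    def relative_performance(wct_per_event_by_cpu):
        """Normalize to the least performant CPU type (largest WCT per event),
        so that higher values mean faster nodes."""
        baseline = max(wct_per_event_by_cpu.values())
        return {cpu: baseline / wct for cpu, wct in wct_per_event_by_cpu.items()}

    measured = {                      # hypothetical s/event figures
        "Intel E5-2630 v3": 150.0,
        "Intel E5-2665": 190.0,
        "Intel X5650": 240.0,         # least performant -> ratio 1.0
    }
    print(relative_performance(measured))
    # Compare these ratios with HS06 and DB12 ratios normalized to X5650.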

9 SCORE and MCORE jobs vs HS06 [plot slide]

10 SCORE and MCORE jobs vs DB12 [plot slide]

11 Conclusions
- Different benchmarking strategies are being addressed and evaluated by ATLAS
- ATLAS is aiming to use the standard CERN-IT monitoring infrastructure to collect the benchmark data; using different options is possible but not desirable
- First attempt to evaluate the performance scaling of the ATLAS software for both single- and multi-core jobs via HammerCloud tasks, comparing with HS06 and DB12 results
- The very same approach could be adopted by CMS
Many thanks to all the people involved: Franco Brasolin, Alessandro Di Girolamo, Domenico Giordano, Alessandra Forti, Jaroslava Schovancova, and all the (many) others I forgot here, for their contributions to this talk.

