ATLAS Benchmark: Benchmarking in ATLAS and Performance Scaling

Presentation transcript:

ATLAS Benchmark
pre-GDB, CERN, 7 Feb 2017
Alessandro De Salvo, on behalf of the ATLAS Distributed Computing group

Outline
- Benchmarking in ATLAS
- Performance scaling

Benchmarking in ATLAS
Two possible options:
- Running directly on the resources, e.g. cloud resources; already addressed in several other talks
- Running on the sites via the pilot: automated, continuous running alongside the standard jobs, possibly limiting the number of times a single machine is benchmarked

Different benchmarking strategies are currently being evaluated by ATLAS, in particular the possible workflows for the information:
- From the pilot to some Elasticsearch (ES) instance
- From the ES to somewhere else that can be made available to our workflow management system (WFMS), and possibly to the pilot itself

In this talk we focus on the benchmark with pilots only (a sketch of the first workflow follows).
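As a rough illustration of the first workflow (pilot to ES), here is a minimal Python sketch of a pilot-side reporting function. The endpoint URL, index name, and document fields are hypothetical stand-ins, and authentication, which is the actual sticking point (see the next slides), is omitted entirely.

    import socket
    import time

    import requests  # assumes outbound HTTPS and the requests module on the WN

    # Hypothetical endpoint: the real CERN-IT ES entry point and document
    # schema are not specified in this talk.
    ES_URL = "https://es-atlas.example.cern.ch:9203/atlas-benchmarks/_doc"

    def report_benchmark(score, benchmark="DB12"):
        """Ship one benchmark measurement from the pilot to an ES instance."""
        doc = {
            "timestamp": int(time.time()),
            "hostname": socket.getfqdn(),
            "benchmark": benchmark,
            "score": score,
        }
        resp = requests.post(ES_URL, json=doc, timeout=10)
        resp.raise_for_status()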

Benchmark in the pilot
Different scenarios for running benchmarks with pilots:
- Running within every pilot and storing the result as a job attribute, to maximize the correlation between job efficiency and machine status
- Running selectively from the pilot, based on the recent results from specific nodes, and storing the result as a WN attribute, to maximize the efficiency and the optimization of our resources (work in progress; see the sketch after this slide)

Not possible/optimal on all resources:
- Cloud: benchmarks already run asynchronously; see the work done on IaaS resources in Canada
- HPC: will need a separate solution too; under discussion
- Grid: the only place where it would work, but it currently doesn't (see next slide); not running yet, though this can be addressed
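A minimal sketch of the selective scenario, assuming the benchmark store can be queried over HTTP: look up the most recent result for this WN and only re-run the benchmark if it is stale. The lookup endpoint, field names, and the one-week re-benchmark policy are all assumptions for illustration.

    import socket
    import time

    import requests

    # Hypothetical search endpoint of the benchmark store (ES-style API)
    LOOKUP_URL = "https://es-atlas.example.cern.ch:9203/atlas-benchmarks/_search"
    MAX_AGE = 7 * 24 * 3600  # assumed policy: benchmark a node at most once a week

    def needs_benchmark(hostname=None):
        """Return True if no sufficiently recent benchmark exists for this WN."""
        hostname = hostname or socket.getfqdn()
        query = {
            "size": 1,
            "query": {"term": {"hostname": hostname}},
            "sort": [{"timestamp": {"order": "desc"}}],
        }
        result = requests.post(LOOKUP_URL, json=query, timeout=10).json()
        hits = result.get("hits", {}).get("hits", [])
        if not hits:
            return True  # node never benchmarked
        return time.time() - hits[0]["_source"]["timestamp"] > MAX_AGE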

Pilot and CERN-IT ES
The aim was to use the current CERN-IT monitoring infrastructure, writing from the pilot on the WNs; this is the same infrastructure already used for general monitoring.
- The pilot can only use a proxy certificate to authenticate, and AMQ doesn't accept proxy authentication
- We tried to split the proxy into a key/cert pair, but the real problem is the delegation chain: the proxy issuer is the user, and the server doesn't recognise it (see the sketch below)

Looking at alternative paths:
- Run the benchmark in each pilot, store the result in PanDA, and then transfer the results
- Run the benchmark asynchronously for every resource with some other method and store it somewhere else

The solution should depend on the use cases.
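For concreteness, a sketch of the attempted key/cert split, assuming the stomp.py client; the broker host/port and topic are hypothetical. This is exactly the path that fails: a Grid proxy file carries the proxy certificate, key and chain in one PEM, and even when it is fed to the TLS layer as a key/cert pair, the broker rejects it because the proxy issuer (the user) is not in its trust chain.

    import json
    import os

    import stomp  # assumes the stomp.py client is available on the WN

    PROXY = os.environ.get("X509_USER_PROXY", "/tmp/x509up_u12345")
    BROKER = [("amq.example.cern.ch", 61613)]  # hypothetical AMQ endpoint

    conn = stomp.Connection(BROKER)
    # Point both key_file and cert_file at the proxy PEM; the handshake still
    # fails server-side because the delegation chain is not recognised.
    conn.set_ssl(for_hosts=BROKER, key_file=PROXY, cert_file=PROXY)
    conn.connect(wait=True)
    conn.send(destination="/topic/atlas.benchmark", body=json.dumps({"score": 42.0}))
    conn.disconnect()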

Performance scaling with HC
Procedure:
- Use HammerCloud results to evaluate the ATLAS software performance at sites
- Not a real benchmark, but it gives us a real-life indication of the performance of the nodes at different sites
- Embedded in the standard procedure operated by ATLAS (Functional Tests)
- Can be used to see the relative performance of specific classes of nodes, normalizing to a reference CPU type and machine
- The WCT, number of events, node name and all the other relevant parameters are extracted from the ATLAS Analytics platform (Kibana), filled with the standard PanDA job information

HammerCloud and benchmarking
A "standard candle" job running at different sites: same SW release, same input file, same number of events.
- Running only 1 event would be faster, but inaccurate: as reported in previous talks, the first event of each job takes longer than the others, due to some initial tasks running in Athena during the first initialization
- Running more events ensures higher accuracy in measuring the event throughput of real jobs (see the sketch after this slide)

Single-core (SCORE) standard candle: 25 events, mc12_8TeV, Athena 17.2.2, AtlasG4_trf.py
- E.g. http://bigpanda.cern.ch/job?pandaid=3220691634
- 134 PanDA queues, template: http://hammercloud.cern.ch/hc/app/atlas/template/880/

Multi-core (MCORE) standard candle: 8 events, mc15_13TeV, Athena 19.2.3, Sim_tf.py
- E.g. http://bigpanda.cern.ch/job?pandaid=3221073145
- 150 PanDA queues, template: http://hammercloud.cern.ch/hc/app/atlas/template/843/
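The effect of the long first event is easy to quantify: with N events the initialization overhead is diluted by 1/N, or it can be discarded outright if the first-event time is known. A small sketch with invented numbers:

    def wct_per_event(total_wct, n_events, first_event_wct=None):
        """Estimate the steady-state wall-clock time per event of a candle job.

        If the first-event time is known it is discarded, so the Athena
        initialization does not bias the estimate; otherwise the plain
        average (including the overhead) is returned.
        """
        if first_event_wct is not None and n_events > 1:
            return (total_wct - first_event_wct) / (n_events - 1)
        return total_wct / n_events

    # Invented numbers: a 25-event SCORE candle, 7000 s total, 600 s first event
    print(wct_per_event(7000.0, 25))                         # 280.0 s/event, biased
    print(wct_per_event(7000.0, 25, first_event_wct=600.0))  # ~266.7 s/event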

Performance scaling with HC
Available information from Kibana:
- Site name (PanDA resource)
- Node name
- CPU type
- WCT per event * cores
- Average WCT * cores
- Hostname
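A rough sketch of pulling these quantities from the ATLAS Analytics ES cluster behind Kibana, assuming a 7.x-style elasticsearch Python client; the cluster URL, index pattern, and exact field names are hypothetical stand-ins for the real schema.

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["https://es-atlas-analytics.example.cern.ch:9203"])

    query = {
        "size": 1000,
        "query": {"term": {"jobstatus": "finished"}},
        "_source": ["computingsite", "modificationhost", "cpu_type",
                    "wall_time", "nevents", "actualcorecount"],
    }
    res = es.search(index="jobs-archive-*", body=query)
    for hit in res["hits"]["hits"]:
        job = hit["_source"]
        # WCT per event, scaled by the number of cores the job used
        wct_evt_cores = job["wall_time"] * job["actualcorecount"] / job["nevents"]
        print(job["modificationhost"], job["cpu_type"], wct_evt_cores)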

Performance scaling comparison: CPU types
- Using the CPU types and nodes already benchmarked at FZK by Manfred
- Three sets of measurements are available for each node class/CPU type: HS06, DB12, and ATLAS HC WCT

Comparison of the WCT-per-event ratio on different CPU types:
- Normalizing to the least performant node type of the same brand (Intel); a sketch of the normalization follows
- Not comparing with AMD, as we would have only a single processor type available (Opteron 6138 only)

Both SCORE and MCORE show the same behaviour and correlations with HS06 and DB12. This is just a Born-level comparison, although an encouraging one; we might expect different behaviours depending on the process type used.
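A minimal sketch of the normalization just described, with invented numbers: each Intel node class's WCT per event is divided into that of the least performant class, so the reference scores 1.0 and faster classes score above it, in the same sense as HS06. The node class names and timings are hypothetical.

    import pandas as pd

    # Hypothetical node classes and invented WCT-per-event values (seconds)
    df = pd.DataFrame({
        "cpu_type": ["E5-2630v3", "E5-2665", "X5650"],
        "wct_per_event": [210.0, 250.0, 310.0],
    })

    reference = df["wct_per_event"].max()  # the least performant Intel class
    df["relative_performance"] = reference / df["wct_per_event"]
    print(df)  # reference class -> 1.0, faster classes -> above 1.0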

SCORE and MCORE jobs vs HS06 (plot slide)

SCORE and MCORE jobs vs DB12 (plot slide)

Conclusions
- Different benchmarking strategies are being addressed and evaluated by ATLAS
- ATLAS is aiming to use the standard CERN-IT monitoring infrastructure to collect the benchmark data; using different options is possible but not desirable
- First attempt to evaluate the performance scaling of the ATLAS software for both single- and multi-core jobs via HammerCloud tasks, comparing with the HS06 and DB12 results
- The very same approach could be adopted in CMS

Many thanks to all the people involved: Franco Brasolin, Alessandro Di Girolamo, Domenico Giordano, Alessandra Forti, Jaroslava Schovancova, and all the (many) others I forgot here, for their contributions to this talk.