The “Understanding Performance!” team in CERN IT


The “Understanding Performance!” team in CERN IT
Andrea Valassi (IT-DI-LCG), on behalf of the ¡UP! team
WLCG Workshop – 9th October 2016

Motivation
HL-LHC will create a computational problem
- More data, higher complexity (luminosity, pileup, trigger, detector upgrades...)
- Gap between computing needs and resources at flat budget
- ALICE and LHCb will already face a challenge for Run 3
Long-term activity needed on common performance studies and tools
- Link the many parallel (sometimes isolated) activities in the community
- Aggregate knowledge about existing/new tools for software optimization
- Shift focus from aggregating to analyzing WLCG monitoring data
- Aim for a quantitative assessment of the performance impact of changes
- Proof-of-concept implementations of new software and infrastructure
- High-level models of workloads and infrastructures
Promote and contribute to common activities between the experiments
- Play a role in WLCG architecture/planning activities
- Move cost and efficiency into the spotlight

People
Team in IT
- Part of the WLCG team (IT-DI-LCG)
- All involved also in other activities, with a varying % on ¡UP!
- Links with the HEP Software Foundation and with the experiments
  - Co-organized the performance session of the HSF Workshop in May at Orsay
- Links with the Software Technology (formerly Concurrency) Forum and EP-SFT
- Links with Openlab, Techlab and other groups in CERN IT
Members
- Nathalie Rauschmayr
- Markus Schulz
- Andrea Sciabà
- David Smith
- Andrea Valassi

Current activities
Getting an overview of existing activities
- Within CERN IT and within the community
Helping to link activities
- CERN IT Analytics WG, ATLAS workflow performance WG, HSF workshop…
Developing tools and procedures
- By working on concrete experiment applications, workflows and logs
- Ranging from macro-level workflows to micro-level software
A very brief overview of a few specific examples follows on the next slides

Software-related activities
Analysis of memory usage/dynamics and development of related tools
- FOM-tools (Find Obsolete Memory), x32-ABI re-evaluation
- Evaluation of tracing tools (SystemTap), studies on CPU hardware counters
- Aggregate knowledge about other existing tools (Coverity, VTune...)
Studies on Feedback-Directed Optimization
- Job profiles helping compilers to auto-optimize code (AutoFDO)
Investigating differences between Intel microarchitectures
- IPC (instructions per cycle) on Haswell vs Ivy Bridge for some simulation jobs (see the measurement sketch below)
- Analyzing effects at the level of instruction caches and branch prediction units
Analysis of a few specific software components (e.g. the Sherpa MC generator)
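As a concrete illustration of the IPC comparison (and explicitly not one of the team's actual tools), the sketch below measures instructions per cycle for an arbitrary command using Linux hardware counters. It assumes a Linux host with the `perf` tool installed and permission to read the counters; the CSV column layout of `perf stat -x` can vary slightly between perf versions.

```python
# Minimal sketch, assuming Linux + the `perf` tool: measure IPC
# (instructions per cycle) for a command via hardware counters.
import subprocess
import sys

def measure_ipc(cmd):
    """Run `cmd` under `perf stat` and return (instructions, cycles, IPC)."""
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", "instructions,cycles", "--"] + cmd,
        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True,
    )
    counters = {}
    for line in result.stderr.splitlines():
        fields = line.split(",")
        # CSV fields: value, unit, event-name, ...; skip "<not counted>" etc.
        if len(fields) >= 3 and fields[0].isdigit():
            for name in ("instructions", "cycles"):
                if fields[2].startswith(name):
                    counters[name] = int(fields[0])
    insn, cyc = counters["instructions"], counters["cycles"]
    return insn, cyc, insn / cyc

if __name__ == "__main__":
    insn, cyc, ipc = measure_ipc(sys.argv[1:])
    print(f"instructions={insn} cycles={cyc} IPC={ipc:.2f}")
```

Running the same simulation binary on a Haswell and an Ivy Bridge host and comparing the two IPC values is the kind of measurement meant above.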

Workflow and infrastructure activities
Analysis of ATLAS jobs on the HLT farms
- I/O patterns and constraints, workload behavior over time, OS and VM tuning
Analysis of ATLAS PanDA logs (being expanded also to CMS logs)
- Differences between sites and between job types
- Focus on understanding workflow efficiency
- Fit CPU “speed” using data analytics and compare to benchmarks (see the sketch below)
Set-up of a small cluster of 16 physical machines for performance studies
- “Standard” batch nodes, complementary to the “exotic” Openlab/Techlab nodes
Evaluation of computing on storage servers (Andrei Kiryanov)
- Most WLCG storage servers have low CPU utilization
Prediction of available network capacity (Hendrik Borras, Marian Babik)
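To make the CPU “speed” fit concrete, here is a minimal sketch of deriving an effective per-site speed (events per CPU second) from job records by a least-squares fit through the origin. The record layout (site, cpu_seconds, n_events) is a hypothetical stand-in for what PanDA-style accounting logs provide; the actual analysis is considerably richer.

```python
# Minimal sketch with hypothetical job-record fields: fit an effective
# per-site CPU "speed" from accounting records.
from collections import defaultdict

def fit_site_speeds(jobs):
    """Least-squares fit of events per CPU second for each site.

    Model cpu_seconds ~ n_events / speed, i.e. fit the slope of
    cpu_seconds vs n_events through the origin and invert it.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # site -> [sum(e*t), sum(e*e)]
    for site, cpu_seconds, n_events in jobs:
        s = sums[site]
        s[0] += n_events * cpu_seconds
        s[1] += n_events * n_events
    # slope = sum(e*t)/sum(e*e); speed = 1/slope (events per CPU second)
    return {site: s[1] / s[0] for site, s in sums.items() if s[0] > 0}

# Example with made-up numbers: SITE_A runs at ~25 ev/s, SITE_B at ~40 ev/s.
jobs = [
    ("SITE_A", 400.0, 10000), ("SITE_A", 820.0, 20000),
    ("SITE_B", 250.0, 10000), ("SITE_B", 490.0, 20000),
]
for site, speed in fit_site_speeds(jobs).items():
    print(f"{site}: {speed:.1f} events/CPU-second")
```

The fitted per-site speeds can then be compared against benchmark figures (e.g. published HS06 scores) to flag sites whose delivered speed deviates from expectation.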

Next steps
Continue with current activities
- Expand to other experiments (talk to us!) and abstract commonalities
Identify or build a set of reference “candles” on well-defined environments
- With the goal of allowing comparisons with measurements at grid sites (an illustrative sketch follows below)
Start work on a cost model for a small number of workloads
- With the goal of predicting the quantitative effect of infrastructure changes
With HSF, organize a one-day performance workshop
- Identify joint projects and build a common roadmap
- Contribute to the proposed Community White Paper
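For illustration only: a reference “candle” could be as simple as a fixed, deterministic workload whose throughput is measured identically on a well-defined reference environment and at a grid site, the ratio giving a first-order scaling factor. The toy below is a hypothetical sketch, not an agreed benchmark; real candles would be built from actual experiment workloads.

```python
# Purely hypothetical sketch of a "candle": a fixed, deterministic
# workload timed identically on a reference environment and at a site.
import time

def candle(n_iterations=5_000_000):
    """Run a fixed floating-point workload; return iterations per second."""
    start = time.perf_counter()
    x = 1.0
    for i in range(1, n_iterations + 1):
        x = (x * 1.0000001 + 1.0 / i) % 1000.0  # bounded, deterministic work
    elapsed = time.perf_counter() - start
    print(f"checksum: {x:.6f}")  # identical on every run, a sanity check
    return n_iterations / elapsed

if __name__ == "__main__":
    throughput = candle()
    reference = 2.0e6  # hypothetical throughput on the reference environment
    print(f"throughput: {throughput:.3g} it/s "
          f"({throughput / reference:.2f}x the reference environment)")
```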