CPU accounting of public cloud resources

Fernando H. Barreiro Megino, CERN IT-ES-VOS
Pre-GDB - CPU accounting of public cloud resources, 9/11/2018

Disclaimer
This is a summary of a discussion held on 12.12.12 between:
- Fernando Barreiro (CERN-IT)
- Jerome Belleman (CERN-IT)
- Alessandro Di Girolamo (CERN-IT)
- Sergey Panitkin (BNL)
- Lalit Kumar Patel
- Ulrich Schwickerath (CERN-IT)
- Ryan Taylor (University of Victoria)
This is the first time the summary of the discussion is presented in public; the ideas and implementation have not been approved by any party.
Alternative ideas and improvements are welcome.

Motivation
UVic runs a production queue for ATLAS on various public clouds, but there is no recipe yet to account for these resources.
Complications: there is no access to the bare metal or to details about the underlying hardware and its usage, e.g.:
- Are all machines in the cloud the same?
- HS06 benchmarking
- CPU overcommitting
Proposal: use PanDA (or the experiment's WMS) and HammerCloud as a complementary accounting source for public clouds.

Original scenario: Batch

Solved scenario: Private cloud

New scenario: Public cloud

Possible solution (1)
- Ceilometer is feasible for private clouds: direct access to the hypervisors is needed to install the client. Private OpenStack clouds (e.g. BNL and Nectar) might find this a convenient solution.
- Alternative idea: use the PanDA server as the accounting source
  - Fed by the pilots
  - The PanDA database holds the duration of each job and the type of CPU
- Resources consumed by the workload will always be lower than the delivered resources, since they do not include e.g.:
  - Boot-up/shut-down time of a VM
  - A VM running partially empty (e.g. an 8-core VM using only 1 job slot)
- The CPU information provided by the pilot (retrieved from /proc/cpuinfo) will not be exact: the CPU can be reported as an emulated CPU (e.g. "QEMU Virtual CPU version (cpu64-rhel6) 4096 KB") rather than the correct chipset
- We do not have HEPSPEC06 benchmarks of the VMs
- We have to consider the effects of CPU over-commitment, among others
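As a rough illustration of the PanDA-based approach, the sketch below aggregates consumed core-seconds per site from a list of job records and parses the CPU model string a pilot would read from /proc/cpuinfo. The record fields (`site`, `duration_s`, `cores`) are hypothetical placeholders, not the actual PanDA schema.

```python
from collections import defaultdict

def cpu_model(cpuinfo_text):
    """Extract the 'model name' field a pilot would read from /proc/cpuinfo.
    On a VM this may be an emulated CPU string, not the real chipset."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("model name"):
            return line.split(":", 1)[1].strip()
    return "unknown"

def consumed_walltime(jobs):
    """Sum core-seconds consumed by finished jobs, grouped by site.
    This is a lower bound on delivered resources: VM boot/shutdown
    time and idle job slots are not included."""
    totals = defaultdict(float)
    for job in jobs:  # hypothetical fields: site, duration_s, cores
        totals[job["site"]] += job["duration_s"] * job["cores"]
    return dict(totals)

# Toy job records for illustration
jobs = [
    {"site": "UVIC_CLOUD", "duration_s": 3600, "cores": 1},
    {"site": "UVIC_CLOUD", "duration_s": 7200, "cores": 8},
    {"site": "CERN-PROD", "duration_s": 1800, "cores": 1},
]
print(consumed_walltime(jobs))  # core-seconds per site
```

The gap between these consumed totals and what the provider bills is exactly the consumed-versus-delivered difference the slide describes.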

Possible solution (2)
- HammerCloud (HC) can be used to estimate some of the above uncertainties
  - HC submits the same jobs to all sites
  - HC stores metrics, e.g. the events/second (Ev/s) that the CPUs of a site were able to process
- Data mining on the HC database would reveal the Ev/s rate of the different sites
  - Compare the Ev/s metric with the HEPSPEC06 benchmarks that are available for grid sites
  - Estimate a conversion factor
  - Use the conversion factor to approximate the HS06 value for sites where it is unknown
- Aside from accounting, useful data could be obtained for informational purposes by comparing the Ev/s rate and the published HS06 value of sites
  - Identify sites that may be publishing inaccurate or mis-measured HS06 values
  - The study would show how well the HS06 benchmark represents the typical LHC experiment workload, by examining the correlation with the real amount of work done
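The conversion step can be sketched as a simple linear calibration: fit an HS06-per-(Ev/s) factor on grid sites that publish both numbers, then apply it to a cloud site where only the HC rate is known. The site names and values below are made up for illustration; they are not real HammerCloud or HS06 measurements.

```python
def hs06_per_evps(known_sites):
    """Estimate the conversion factor HS06 / (Ev/s) from sites that
    publish both an HS06 benchmark and a HammerCloud Ev/s rate.
    A simple ratio average; a least-squares fit would also work."""
    ratios = [s["hs06"] / s["evps"] for s in known_sites]
    return sum(ratios) / len(ratios)

def estimate_hs06(evps, factor):
    """Approximate the HS06 value of a cloud site from its Ev/s rate."""
    return evps * factor

# Hypothetical grid sites where both numbers are available
known = [
    {"site": "GRID_A", "hs06": 10.0, "evps": 2.0},
    {"site": "GRID_B", "hs06": 12.0, "evps": 2.4},
]
factor = hs06_per_evps(known)      # 5.0 HS06 per (Ev/s) in this toy data
print(estimate_hs06(1.8, factor))  # estimated HS06 for a cloud site
```

Sites whose published HS06 deviates strongly from `factor * evps` would be the candidates for the inaccurate or mis-measured benchmarks mentioned above.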

Conclusions
- There is no existing solution to account for public cloud resources
- Proposal: use PanDA (the WMS) and HammerCloud as a complementary accounting source
- This would be very interesting for all types of resources:
  - To estimate the difference between consumed and delivered resources
  - To compare the Ev/s rates and the published HS06 benchmarks of sites
  - To see how well the HS06 benchmarks represent the typical LHC experiment workload