Arne Wiebalck -- VM Performance: I/O

LHCb Computing Workshop, CERN, May 22, 2015

In this talk: I/O only
- Most issues we saw were I/O related; the typical symptom is high IOwait.
- You can optimize this: understand the service offering, then tune your VM.
- CPU performance is being looked at as well (host-side, too), but it is too early to report.

Hypervisor limits
- Each hypervisor has a two-disk RAID-1, i.e. effectively one disk.
- The disk is a shared resource: its IOPS are split pro rata across the VMs.
- A typical VM (4 cores, 8 GB RAM) is co-hosted with 7 other VMs, leaving roughly 25 IOPS per VM.
- User expectation, however: ephemeral disk ≈ local disk.

Ask not only what the VM can do for you …

Tip 1: Minimize disk usage
- Use tmpfs
- Reduce logging
- Configure AFS memory caches (supported in Puppet)
- Use volumes …
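As an illustration of the first point, scratch data that need not survive a reboot can be kept in RAM rather than on the shared disk; a minimal sketch (the mount point and size are placeholder assumptions, not from the talk):

```shell
# Hypothetical example: back scratch space with tmpfs instead of the ephemeral disk.
# As root on the VM (mount point /scratch and size are placeholders):
#   mount -t tmpfs -o size=512m tmpfs /scratch
# or persistently via an /etc/fstab line:
#   tmpfs  /scratch  tmpfs  size=512m  0  0
# Verify that a path really is tmpfs-backed (/dev/shm is tmpfs on most Linux systems):
stat -f -c %T /dev/shm
```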

Tip 2: Check I/O scheduling — the “lxplus problem”
- VMs use the ‘deadline’ elevator, set by the ‘virtual-guest’ tuned profile (Red Hat’s default for VMs).
- Not always ideal for interactive machines: ‘deadline’ prefers reads and can delay writes (by default up to 5 seconds). It was designed to let reads through under heavy load (e.g. on a web server).
- On lxplus, sssd makes database updates during login, so logins stalled behind delayed writes.
- The I/O scheduler on the VMs was changed to CFQ (Completely Fair Queuing); benchmark: a login loop.
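Checking and switching the scheduler can be done at runtime through sysfs; a hedged sketch (the device name ‘vda’ is a placeholder):

```shell
# Hypothetical sketch: inspect and switch a virtual disk's I/O scheduler at runtime.
# As root on the VM ('vda' is a placeholder device name):
#   cat /sys/block/vda/queue/scheduler        # all schedulers; the active one is bracketed
#   echo cfq > /sys/block/vda/queue/scheduler # switch to CFQ on the fly
# The sysfs file looks like "noop [deadline] cfq"; extract the active entry:
parse_active_scheduler() {
  # print the name inside [...] from a scheduler line
  echo "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
parse_active_scheduler "noop [deadline] cfq"
```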

‘deadline’ vs. ‘CFQ’
[Graph: login-loop benchmark over time; the I/O benchmark starts, the scheduler is switched from ‘deadline’ to ‘CFQ’, and the I/O benchmark continues.]

Tip 3: Use volumes
- Volumes are networked virtual disks: they show up as block devices and are attached to one VM at a time.
- Arbitrary size (within your quota).
- Provided by Ceph (and NetApp).
- QoS for IOPS and bandwidth allows us to offer different volume types.
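With the standard OpenStack CLI, creating and attaching a volume could look like this (volume name, size, type and server name are placeholder assumptions; check your quota first):

```shell
# Hypothetical sketch using the standard OpenStack CLI; all names/sizes are placeholders.
#   openstack volume create --size 500 --type io1 myvol   # create a 500 GB io1 volume
#   openstack server add volume myserver myvol            # attach it to a VM
# Inside the VM it appears as a block device (e.g. /dev/vdb) to format and mount:
#   mkfs -t xfs /dev/vdb && mount /dev/vdb /data
# Helper that builds the create command line for given size (GB), type and name:
vol_create_cmd() {
  echo "openstack volume create --size $1 --type $2 $3"
}
vol_create_cmd 500 io1 myvol
```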

Volume types

  Name      Bandwidth  IOPS  Comment
  standard  80 MB/s    100   std disk performance
  io1       120 MB/s   500   quota on request
  cp1                        critical power
  cp2                        critical power; Windows only (in preparation)

Ceph volumes at work
[Graph 1: ATLAS TDAQ monitoring application; y-axis: CPU % spent in IOwait; blue: CVI VM (h/w RAID-10 with cache), yellow: OpenStack VM; IOPS QoS changed from 100 to 500 IOPS.]
[Graph 2: EGI Message Broker monitoring; y-axis: scaled CPU load (5 minutes of load / #cores); IOPS QoS changed from 100 to 500 IOPS.]

Tip 4: Use SSD block-level caching
- SSDs as the hypervisors’ disks would solve all IOPS and latency issues, but they are still (too expensive and) too small.
- Compromise: SSD block-level caching. Candidates:
  - flashcache (from Facebook; previously used at CERN for AFS)
  - dm-cache (in-kernel since 3.9, recommended by Red Hat, in CentOS 7)
  - bcache (in-kernel since 3.10)

bcache
- Cache mode can be changed at runtime (think SSD replacements).
- Strong error handling: flush and bypass on error.
- Easy setup, transparent for the VMs.
- Needs a special kernel on the hypervisor.
[Diagram: hypervisor with a 100 IOPS disk cached by a 20k IOPS SSD.]
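On the hypervisor side, a bcache setup with bcache-tools could be sketched as follows (device names are placeholders; this is an illustration, not the exact CERN procedure):

```shell
# Hypothetical hypervisor-side sketch using bcache-tools; sdb/sdc are placeholders.
#   make-bcache -B /dev/sdb -C /dev/sdc   # backing HDD + caching SSD -> /dev/bcache0
#   echo writeback > /sys/block/bcache0/bcache/cache_mode   # switch mode at runtime
# The cache_mode file lists all modes with the active one bracketed, e.g.
# "writethrough [writeback] writearound none"; extract the active mode:
active_cache_mode() {
  echo "$1" | grep -o '\[[a-z]*\]' | tr -d '[]'
}
active_cache_mode "writethrough [writeback] writearound none"
```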

bcache in action (2)
- On a 4-VM hypervisor: ~25 IOPS/VM → ~1000 IOPS/VM after switching the cache mode from ‘none’ to ‘writeback’.
- Benchmarking a caching system is non-trivial:
  - SSD performance can vary over time
  - SSD performance can vary between runs
  - the data distribution matters (cf. Zipf)
[Graph: IOPS over time; cache mode switched from ‘none’ to ‘writeback’; benchmark ends as #threads decreases.]

bcache and latency
- SSD block-level caching is sufficient for the IOPS and latency demands: use a VM on a bcache hypervisor.
- Caveat: SSD failures are fatal!
- Clients: lxplus, ATLAS build service, CMS Frontier, root, … (16 tenants)
[Graph: latency; blue: CVI VM (h/w RAID-10 with cache), yellow: OpenStack VM, red: OpenStack VM on a bcache hypervisor.]

“Tip” 5: KVM caching
- By default, I/O from the VM goes directly to the disk: required for live migration, but not optimal for performance.
- I/O can instead be cached on the hypervisor: operationally difficult (no live migration), but done for batch nodes.
[Diagram: VM (4 cores, 8 GB RAM) on a hypervisor; guest I/O passes through the hypervisor’s cache to the disk.]
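In libvirt-managed KVM guests, the caching behaviour is the `cache` attribute of the disk’s `<driver>` element; a hedged sketch for inspecting it (the guest name ‘myguest’ is a placeholder):

```shell
# Hypothetical sketch: check a libvirt guest's disk cache mode ('myguest' is a placeholder).
#   virsh dumpxml myguest | grep -o "cache='[^']*'"
# A disk configured for hypervisor-side caching would carry an attribute like this:
xml="<driver name='qemu' type='qcow2' cache='writeback'/>"
echo "$xml" | grep -o "cache='[^']*'"
# Note: with hypervisor-side caching enabled, live migration is not possible.
```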

KVM caching in action
[Graph 1: ATLAS SAM VM, cache mode switched from ‘none’ to ‘writeback’.]
[Graph 2: batch nodes with and without KVM caching.]

Take-home messages
The Cloud service offers various options to improve the I/O performance of your VMs; you need to analyze your use case and pick the right one:
- Reduce I/O
- Check the I/O scheduler
- Use volumes
- Use SSD-cached hypervisors
- (Use KVM caching)
Get in touch with the Cloud team in case you need assistance!