© 2014 VMware Inc. All rights reserved. Performance Management Iwan ‘e1’ Rahabok Staff SE (Strategic Accounts) & CTO Ambassador

Linkedin.com/in/e1ang | Twitter: e1_ang | VCAP-DCD, TOGAF Certified, vExpert

Warm-up exercise
You got an email from the app team, saying the main Intranet application was slow. The email was sent 1 hour ago. It stated that the application was slow for 1 hour, and was OK after that. So it was slow between 1-2 hours ago, but is OK now.
– You did a check. Everything has indeed been OK in the past 1 hour.
– The application spans 10 VMs in 2 different clusters, 4 datastores and 1 RDM.
– You are not familiar with the application. You do not know what apps run on each VM, as you have no access to the Guest OS.
– Your environment: 1 VC, 4 clusters, 30 hosts, 500 VMs, 40 datastores, 1 midrange array, 10 GE, iSCSI storage.
Test your vSphere knowledge! How do you solve/approach this with just vSphere?
What do you do?
– A: Smile, as this will be a nice challenge for your TAM/BCS/MCS/RE.
– B: No sweat, you’re VCDX + CCIE + ITIL Master. You were born for this.
– C: SMS your wife, “Honey, I’m staying overnight at the datacenter”.
– D: Take blood pressure medicine so it won’t shoot up.
– E: Buy the app team a very nice dinner, and tell them to keep quiet.

Performance: How do you know it’s optimised?
What do you measure?
– Utilisation?
  – Utilisation of 100% means it’s performing…?
  – Utilisation of 5% means it’s performing…?
  – Utilisation of 50% means it’s performing…? Really?
– Something else?
  – What is that something else?
To understand this “something else”, we need to go back to fundamentals.

What do we care about at each layer?
VM
– 1. We care if it is being served well by the platform. Other VMs are irrelevant from the VM owner’s point of view. Make sure it is not contending for resources.
– 2. We check if it is sized properly. If too small, increase its configuration. If too big, right-size it for better performance.
SDDC
– 1. We care if it is serving everyone well. Make sure there is no contention for resources among all the VMs in the platform.
– 2. We check overall utilisation. Too low, and we are not investing wisely in hardware. Too high, and we need to buy more hardware.

Take Away: Contention and Utilisation
Unlike a physical DC, in virtual infrastructure…
– we use Contention, not Utilisation, for Performance Management
– we use Utilisation (short range) for Performance Management
– we use Utilisation (long range) for Capacity Management
Contention is how you measure that the platform is performing well.
Sounds good! But how do you measure “Contention”?

Performance: The counters
What counters prove that it is optimised?
– You need a technical fact to assure yourself
  – Either that, or take a sleeping pill at night
– You need a technical fact to show to your customers
  – Your SLA must be based on something concrete, not subject to interpretation or “feeling of the day”
– If you can’t prove it, how does anyone know it is optimised? ;-)

Optimized Infrastructure Performance*
– CPU
– RAM
– Storage
– Network
* While keeping Cost in mind

How a VM gets its resources
[Figure: a scale starting at 0, marked with Provisioned, Limit, Reservation, Entitlement, Demand, Usage and Contention]

VM CPU: The 4 States

VM CPU: What do you monitor?
Contention
– Ready (ms)?
– Co-Stop (ms)?
– Latency (%)?
– Max Limited (ms)?
– Overlap (ms)?
– Swap Wait (ms)?
Utilisation
– Used (ms)?
– Usage (%)?
– Demand (MHz)?
Quiz Time! What’s the difference between Average, Summation and Latest? How does the timeline impact the value?
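
One way to see how the timeline affects a Summation counter such as Ready (ms): the value is accumulated over the chart's sampling interval, so the same number of milliseconds means a very different percentage on the real-time chart than on the past-day chart. The sketch below is illustrative only and is not taken from the slides. It assumes the default vCenter chart intervals (20 s real-time, 5 min past day, and so on), and the function name `summation_to_percent` is hypothetical.

```python
# Default sampling interval (seconds) of each vCenter chart range (assumed defaults).
CHART_INTERVAL_S = {
    "real-time": 20,
    "past day": 300,
    "past week": 1800,
    "past month": 7200,
    "past year": 86400,
}


def summation_to_percent(summation_ms, chart_range):
    """Convert a Summation counter (ms) into a percentage of the sampling interval."""
    interval_ms = CHART_INTERVAL_S[chart_range] * 1000
    return summation_ms / interval_ms * 100


if __name__ == "__main__":
    # The same 2000 ms of Ready time, read from two different charts.
    for chart_range in ("real-time", "past day"):
        pct = summation_to_percent(2000, chart_range)
        print(f"2000 ms on the {chart_range} chart = {pct:.2f}%")
```

An Average or Latest counter does not need this conversion, which is one reason a percentage counter is easier to compare across time ranges than a millisecond one.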

VM CPU: What you should monitor
vCenter Operations
– Contention: Contention (%)
– Utilisation: Workload (%)
vCenter
– Contention: Latency (%), Max Limited (if applicable)
– Utilisation: Usage (%), Demand (MHz)
Discussion Time! What should the value be for an optimized environment?

One more thing…
– The hypervisor does not have visibility inside the Guest OS.
– There is 1 particular CPU counter that you should get. It tells you that there is not enough CPU to meet demand.
– vRealize Operations (via Hyperic) does not collect this counter.
– Which counter is that?

Enough about CPU. Let’s move to RAM!

Quiz Time!
Which of the following sentences are True:
– Ballooning is bad. If you see that a VM has ballooned memory, that VM has a memory performance problem.
– Ballooning happens before Compression, which happens before Swapping. If you see a VM with Compressed memory but no Ballooned memory, either vCenter is buggy or your eyes are just tired.
– If all the VMs on an ESXi host have a low Usage counter, then the ESXi host’s Usage must also be low.
– Turn on Large Pages, and there goes all your TPS.
– To check if a VM has memory contention, check its CPU Swap Wait counter.
– Why are all the questions difficult?!
Answer
– Ballooning indicates the ESXi host is under memory pressure. It does not mean the VM has a memory performance issue.
– Pages remain compressed or swapped if they are not accessed.
– The Usage counter is different in VM and ESXi! In VM, it is Active. In ESXi, it is Consumed. This is due to the 2-level memory concept.
– Yes, unless your ESXi host is under heavy memory constraint.

2 levels of Memory Hierarchy
New hierarchy in VMware’s memory overcommit technology:
– Transparent Page Sharing
– Ballooning
– Memory Compression
– Swap to Host Cache (SSD)
– Disk swapping
Decompression is sub-ms compared to swap (15-20 ms)!
[Figure: the two levels, OS and Hypervisor]

vSphere Memory Management
2 types of Memory Management
– Guest OS level
  – Balloon
– Hypervisor level
  – TPS
  – Compression, Swap to disk, Swap to cache (SSD)
Volunteer Time! Explain Balloon, TPS, Compression.

VM RAM: What do you monitor?
Contention
– Swapped?
– Balloon?
– Compressed?
– Latency?
– CPU Swap Wait?
Utilisation
– Active?
– Usage?
– Consumed?

VM RAM: What you should monitor
vCenter Operations
– Contention: RAM Contention (%)
– Utilisation: Workload (%), Consumed (KB)
vCenter
– Contention: Latency (%), CPU Swap Wait (ms)
– Utilisation: Usage (%), Consumed (KB)
Discussion Time! What should the value be for an optimized environment?

One more thing…
– The hypervisor does not have visibility inside the Guest OS.
– There is 1 particular RAM counter that you should get. It tells you that there is not enough RAM to meet demand.
– Which counter is that?
– You can monitor Guest OS paging activity by separating the page file into its own vmdk.
  – You can then use vC Ops to analyse the pattern.

Enough about RAM. Let’s move to Storage!

Quiz Time!
Which of the following sentences are True:
– The latency counter is (Write Latency + Read Latency) / 2.
– If you have RDM, vCenter does not track the latency.
– If the VM virtual disk counter shows 1000 IOPS, but the VM datastore counter shows 2x the IOPS, something is seriously wrong. Time to call your TAM!
– If all your VMs are experiencing high latency, the first thing you do is check the VMkernel queue.
Answer
– It is not. It takes into account the number of commands issued. It’s a weighted average.
– It only tracks the latency at the latest data point. Other data during the collection period is not included.
– Check for snapshots. Snapshot IOPS is transparent to the virtual disk.
– The first thing you do is check the physical device queue and your storage array. VMkernel queue latency rarely exceeds 1 ms.
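
The first answer (a weighted average rather than a simple mean) can be illustrated with a small sketch. It is a hypothetical calculation, not vCenter's actual implementation; the function name and the sample numbers are made up for illustration.

```python
def weighted_latency_ms(read_latency_ms, reads, write_latency_ms, writes):
    """Average latency weighted by the number of commands of each type."""
    commands = reads + writes
    if commands == 0:
        return 0.0  # no I/O issued, so there is no latency to report
    return (read_latency_ms * reads + write_latency_ms * writes) / commands


if __name__ == "__main__":
    # Hypothetical sample: 900 fast reads and 100 slow writes.
    simple_mean = (2.0 + 20.0) / 2
    weighted = weighted_latency_ms(2.0, 900, 20.0, 100)
    print(f"simple mean: {simple_mean:.1f} ms, weighted: {weighted:.1f} ms")
```

With a read-heavy workload the weighted figure sits much closer to the read latency, which is why (Write Latency + Read Latency) / 2 can be misleading.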

VM Storage: Where and what do you monitor?
[Figure: the three metric groups: Virtual Disk, Disk, Datastore]

VM Storage: Where to monitor
– For vmdk, use the Datastore metric group.
– For RDM, use the Disk metric group.
– The Disk metric group is naturally not relevant for NFS (files).
[Figure: a VM with vDisks scsi0:0, scsi0:1 and scsi0:2, backed by VMFS, NFS and RDM, mapped to the Datastore and Disk metric groups (Disk 1, Disk 2, Disk 3)]

VM Storage: What do you monitor?
vCenter Operations
– Contention: Latency (ms)
– Utilisation: Commands per second, Usage (KBps), Workload (%)
vCenter
– Contention: Latency (ms)
– Utilisation: Commands Issued, Usage (KBps)

VM Network
Contention
– Dropped packets
– Packet retransmits
Utilisation
– Network throughput
Limitations
– We cannot monitor latency (e.g. between source and destination)

Different Tiers, Different Optimization
Business Logic:
– Tier 1 is optimised for Performance and Availability
– Tier 3 is optimised for Cost
Do you allow a Tier 1 VM on Tier 3 Storage?
– Or do you map the Compute Tier to the Storage Tier?
What distinguishes Tier 1 from Tier 3?
– Availability
– Performance
– Monitoring
– Cost!

Tiering: Considerations
Compute
– No of spare hosts
– No of hosts
– Consolidation Ratio (VM:Host)
– vCPU:pCPU Oversubscribed
– vRAM:pRAM Oversubscribed
– Clustering (e.g. VCS)
Storage
– IOPS per VM
– Latency
Monitoring
– Application availability monitoring (e.g. AppHA)
– Application performance monitoring (e.g. vC Ops Enterprise)
Availability
– Automated DR (SRM)
– RPO
– RTO

3-Tiers Offering: Example
Values are listed as Tier 1 / Tier 2 / Tier 3.
– No of spare hosts: 2 / 1 / 1
– No of hosts: 6 / 8 / 10
– Consolidation Ratio (VM:Host): 10:1 / 20:1 / 40:1
– vCPU:pCPU Oversubscribed: n/a / 2.0x / 4.0x
– vRAM:pRAM Oversubscribed: n/a / 1.5x / 2.0x
– IOPS per VM:
– Latency: <10 ms / 15-20 ms / 20-25 ms
– Clustering (e.g. VCS): Yes No
– Application monitoring (e.g. AppHA): Yes No
– Apps: Yes No
– Automated DR (SRM): Yes
– RPO: 5 minutes / 1-2 hours / 2-8 hours
– RTO: 1 hour / <2 hours / <4 hours

Demystifying “Peak”
There are 2 types of “Peak”
– Peak across time
– Peak across objects
Impacts
– Peak across time can be too high if the burst is high.
  – A VM is low for 24 hours, bursts to 100% for 5 minutes, and you get 100% reported.
– Peak across time can be lower if the number of member objects is high.
  – The peak of a cluster in the past 1 day is 70%. That means at least 1 host was >70%.
– Peak across objects can be too high if the load is unbalanced.
  – This happens when cluster utilisation is not high enough to trigger DRS or Storage DRS.
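
The two kinds of peak can be sketched with a few made-up numbers. The host names, sample values and function names below are hypothetical and only illustrate the idea: the cluster figure is an average over its members, so its peak across time hides the busiest host.

```python
# Hypothetical CPU utilisation (%) for three hosts at four points in time.
SAMPLES = {
    "host-a": [20, 30, 95, 25],
    "host-b": [20, 30, 45, 25],
    "host-c": [20, 30, 70, 25],
}


def cluster_average_series(samples):
    """Cluster utilisation at each point in time: the mean of its member hosts."""
    return [sum(column) / len(column) for column in zip(*samples.values())]


def peak_across_time(series):
    """Highest value one object reached over the period."""
    return max(series)


def peak_across_objects(samples):
    """Highest value among all member objects at each point in time."""
    return [max(column) for column in zip(*samples.values())]


if __name__ == "__main__":
    cluster_peak = peak_across_time(cluster_average_series(SAMPLES))
    busiest_host = max(peak_across_objects(SAMPLES))
    print(f"cluster peak across time: {cluster_peak:.0f}%")
    print(f"busiest host in the same period: {busiest_host}%")
```

In this sample the cluster peak is lower than the busiest host because the load is unbalanced at the third sample, which is the situation the last bullet describes.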

Sample SLA and Internal Threshold
SLA only applies to VM. The VM owner does not care about the underlying platform.

SLA
                  Tier 1   Tier 2   Tier 3
CPU Contention    1%       3%       13%
RAM Contention    0%       5%       10%
Disk Latency      10 ms    20 ms    30 ms

Internal Threshold
                  Tier 1   Tier 2   Tier 3
CPU Contention    0.5%     2%       10%
RAM Contention    0%       2%       8%
Disk Latency      10 ms    15 ms    20 ms
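
A per-VM check against these numbers could look like the sketch below. It assumes that the first set of values on the slide is the SLA (the second set being the tighter internal threshold), and that a breach means the observed value is strictly above the threshold. The metric keys and the function name are hypothetical.

```python
# First set of values from the slide, assumed here to be the SLA.
SLA = {
    "Tier 1": {"cpu_contention_pct": 1.0, "ram_contention_pct": 0.0, "disk_latency_ms": 10.0},
    "Tier 2": {"cpu_contention_pct": 3.0, "ram_contention_pct": 5.0, "disk_latency_ms": 20.0},
    "Tier 3": {"cpu_contention_pct": 13.0, "ram_contention_pct": 10.0, "disk_latency_ms": 30.0},
}


def sla_breaches(tier, observed):
    """Return the metrics where a VM's observed value exceeds its tier's SLA."""
    limits = SLA[tier]
    return sorted(name for name, value in observed.items() if value > limits[name])


if __name__ == "__main__":
    # Hypothetical Tier 2 VM: CPU contention is above 3%, the rest is within SLA.
    vm = {"cpu_contention_pct": 4.0, "ram_contention_pct": 1.0, "disk_latency_ms": 20.0}
    print(sla_breaches("Tier 2", vm))
```

The check is per VM rather than per host or cluster, in line with the point that the SLA applies to the VM only.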

Where to monitor at the Platform level?
Compute
– Host?
– Cluster?
– Datacenter?
– vCenter?
Storage
– Host?
– Cluster?
– Datastore?
– Datastore Cluster?
– Datacenter?
– vCenter?
Network
– Standard Switch and port group? :-)
– Host?
– Distributed Switch?
– Distributed Port Group?

Where to monitor
Not here
– Compute: Host, Datacenter
– Storage: Host, Cluster
– Network: Host
Monitor these
– Compute: Cluster
– Storage: Datastore, Datastore Cluster
– Network: Distributed Switch, Distributed Port Group
DRS (and Storage DRS) will balance the cluster.

QoS in a shared environment
QoS is mandatory in a shared environment.
Areas to control
– Compute
– Network
– Storage
CPU and RAM
– Shares
– Reservation
– Limit?
– Resource Pool?
Storage I/O Control
Network I/O Control

QoS: Compute
When not to use Resource Pool?
When to use Resource Pool?
What’s the impact of Reservation?
– HA Slot Size. Unless you use %
– Boot time
– Oversubscribe ability. You cannot go beyond 100% reservation.

QoS: Storage
A single VM can hog storage throughput
– Just need to run IOmeter
– Unfairly penalizes VMs on hosts with high consolidation ratios
Existing resource management only works for VMs on the same host
SIOC calculates datastore latency to identify contention
– Latency is a normalized average across VMs
– IO size and IOPS included
[Figure: Without Storage IO Control. ESX Server device queue depths feed the Storage Array Queue; storage is congested and latency is unbounded. The actual disk resources utilized by each VM are not in the correct ratio.]

QoS: Storage
SIOC enforces fairness when datastore latency crosses the threshold
– Dynamic threshold setting
– Fairness enforced by limiting VMs’ access to queue slots
What’s the limitation?
– No inter-datastore awareness
– Does not work on RDM
– Non-VM workload not included
Work with your Storage team.
– Auto-tiering array is supported
[Figure: With Storage IO Control. VM A has 1500 shares, VM B has 500 shares, VM C has 500 shares. The storage queue is throttled and latency is controlled. The actual disk resources utilized by each VM are in the correct ratio, even across ESX hosts.]
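
The “correct ratio” in the figure follows from the share values: each VM is entitled to its shares divided by the total shares on the datastore. The sketch below only reproduces that arithmetic for the three VMs on the slide. It is not how SIOC is implemented, and the function name is hypothetical.

```python
def share_fractions(shares):
    """Fraction of the contended resource each VM is entitled to, by shares."""
    total = sum(shares.values())
    return {vm: value / total for vm, value in shares.items()}


if __name__ == "__main__":
    # Share values from the slide's figure.
    fractions = share_fractions({"VM A": 1500, "VM B": 500, "VM C": 500})
    for vm, fraction in fractions.items():
        print(f"{vm}: {fraction:.0%}")
```

With 1500, 500 and 500 shares the split is 60%, 20% and 20%, matching the percentages shown in the figure.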

Key Takeaways
– Optimization in SDDC has a lot more components than we normally think.
– Contention is 1st. Utilisation is 2nd.
– SLA is at VM level, not Infrastructure level.
– Peak can be too low or too high.
– Anything else?