VSP1999 esxtop for Advanced Users Name, Title, Company
2 Disclaimer This session may contain product features that are currently under development. This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined.
3 Before we dive in…
4 vSphere Performance Management Tools (1 of 2) vCenter Alarms Relies on static thresholds Alarm trigger may not always indicate an actual performance problem vCenter Operations Aggregates metrics into workload, capacity and health scores Relies on dynamic thresholds vCenter Charts Historical trends Post mortem analysis, comparing metrics
5 vSphere Performance Management Tools (2 of 2) esxtop/resxtop For live troubleshooting and root cause analysis esxplot, perfmon and other tools can be used for offline analysis
6 Performance Snapshot For complicated problems Technical support may ask you for a performance snapshot for offline analysis
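As an aside (my own illustration, not part of the deck): esxtop/resxtop batch mode, e.g. esxtop -b -a -d 5 -n 60 > perfsnap.csv, writes a perfmon-style CSV that can be analyzed offline; below is a minimal Python sketch for pulling out columns of interest, where the file name and the counter substring are placeholders.

import csv

SNAPSHOT = "perfsnap.csv"   # placeholder path to a batch-mode capture
WANTED = "% Used"           # hypothetical substring of the counter columns to extract

with open(SNAPSHOT, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    # Column 0 is the sample timestamp; the remaining columns are counter paths.
    cols = [i for i, name in enumerate(header) if WANTED in name]
    for row in reader:
        print(row[0], {header[i]: row[i] for i in cols})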
7 About This Talk This talk will focus on the esxtop counters using illustrative examples esxtop manual: http://www.vmware.com/pdf/vsphere4/r41/vsp_41_resource_mgmt.pdf Interpreting esxtop statistics: http://communities.vmware.com/docs/DOC-11812 Previous VMworld talks: VMworld 2008 - http://vmworld.com/docs/DOC-2356 VMworld 2009 - http://vmworld.com/docs/DOC-3838 VMworld 2010 - http://www.vmworld.com/docs/DOC-5101
8 esxtop Screens c: cpu (default) m: memory n: network d: disk adapter u: disk device (added in ESX 3.5) v: disk VM (added in ESX 3.5) i: interrupts (added in ESX 4.0) p: power management (added in ESX 4.1) [Diagram: screens mapped to VMkernel components - c, i, p: CPU scheduler; m: memory scheduler; n: virtual switch; d, u, v: vSCSI]
9 New counters in ESX 5.0
10 vCPU and VM Count World, VM and vCPU count
11 VMWAIT VMWAIT = %WAIT - %IDLE More about this later…
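An illustrative back-of-envelope (my own, with made-up counter values) for the VMWAIT relationship above:

# Hypothetical esxtop values for one VM world group, in percent of elapsed time
wait_pct = 85.0   # %WAIT: total time in the wait state, including idle
idle_pct = 60.0   # %IDLE: time the vCPUs were simply idle
# VMWAIT strips the idle portion out of %WAIT, leaving the time the VM was
# actually blocked (for example on I/O or swap) rather than voluntarily idle.
vmwait_pct = wait_pct - idle_pct
print(f"%VMWAIT ~= {vmwait_pct:.1f}%")   # 25.0%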
12 CPU Clock Frequency in Different P-states CPU clock frequency in different P-states P-states are visible to ESX only when power management setting in the BIOS is set to “OS Controlled” More about this later…
13 Failed Disk IOs Failed IOs are now accounted separately from successful IOs
14 VAAI: Block Deletion Operations New set of VAAI stats for tracking block deletion VAAI : vStorage API for Array Integration
15 Low-Latency Swap (Host Cache) Low-Latency (SSD) Swap
16 Understanding CPU counters
17 CPU State Times [Diagram: a world's elapsed time decomposes into RUN, RDY (of which MLMTD is the limit-enforced portion), CSTP and WAIT; the WAIT state covers IDLE, SWPWT and blocked/VMWAIT time such as waiting on guest I/O]
18 CPU Usage Accounting USED = RUN + SYS - OVRLP SYS: time spent by system services on behalf of this world; OVRLP: time spent by system services on behalf of other worlds while this world was scheduled USED could be < RUN if the CPU is not running at its rated clock frequency
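A minimal worked sketch (mine, not from the slides) that plugs hypothetical counter values into the accounting formula above:

# Hypothetical per-world esxtop values, in percent of elapsed time
run_pct = 80.0    # %RUN: time the world was scheduled on a PCPU
sys_pct = 6.0     # %SYS: system-service time charged to this world
ovrlp_pct = 2.0   # %OVRLP: system-service time for other worlds that overlapped this world's run
used_pct = run_pct + sys_pct - ovrlp_pct
print(f"%USED ~= {used_pct:.1f}%")   # 84.0%, assuming the CPU runs at its rated base frequency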
19 Impact of P-States
P-state / %RUN / %UTIL / %USED
P0 (2400 MHz): 100% / 100% / 100%
P1 (1700 MHz): 100% / 100% / 70%
P2 (1200 MHz): 100% / 100% / 50%
P3 (800 MHz): 100% / 100% / 33%
%USED: CPU usage with reference to rated base clock frequency %UTIL: CPU utilization with reference to current clock frequency %RUN: CPU occupancy time
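A rough sketch (my own illustration, not from the deck) of how the %USED column follows from scaling %RUN by the current clock frequency relative to the rated base clock:

# Assumed base clock of 2400 MHz and the P-state frequencies from the table above
base_mhz = 2400
for pstate, cur_mhz in [("P0", 2400), ("P1", 1700), ("P2", 1200), ("P3", 800)]:
    run_pct = 100.0                          # the world runs 100% of the time
    used_pct = run_pct * cur_mhz / base_mhz  # usage normalized to the rated base clock
    print(f"{pstate}: %RUN={run_pct:.0f}%  %USED~={used_pct:.0f}%")
# Prints roughly 100/71/50/33 percent, matching the 100/70/50/33 figures above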
20 Factors That Affect VM CPU Usage Accounting (Chargeback) %SYS time CPU frequency scaling: Turbo Boost (USED > RUN - SYS), power management (USED < RUN - SYS) Hyperthreading
21 Poor performance due to power management
22 CPU Usage: With CPU Clock Frequency Scaling VM is running all the time but uses only 75% of the clock frequency. Power savings enabled in BIOS.
23 Poor performance due to core sharing
24 Hyperthreading [Diagram: PCPUs per core with HT off vs. HT on] The ESX scheduler tries to avoid sharing the same core
25 CPU Usage: Without Core Sharing Two VMs running on different cores USED is > 100 due to Turbo Boost
26 CPU Usage: With Core Sharing Two VMs sharing the same core The %LAT_C counter shows the CPU time unavailable due to core sharing
27 Performance Impact of Swapping
28 Performance Impact of Swapping Some swapping activity Time spent in blocked state due to swapping
29 How to identify storage connectivity issues
30 NFS Connectivity Issue (1 of 2) I/O activity to NFS datastore System time charged for NFS activity
31 NFS Connectivity Issue (2 of 2) VM blocked, connectivity lost to NFS datastore No I/O activity on the NFS datastore VM is not using CPU
32 Poor performance during snapshot revert
33 Snapshot Revert Reads in MB from the VM checkpoint file Not accounted in VM disk I/O traffic But can be seen in the adapter view
34 Wide-NUMA behavior in ESX 5.0
35 Wide-NUMA Support in ESX 5.0 2 x 16G NUMA Nodes 24G vRAM exceeds one NUMA node 1 home NUMA node assigned 1 vCPU VM
36 Wide-NUMA Support in ESX 5.0 8 vCPUs, exceeds one NUMA node 2 x 16G NUMA Nodes 24G vRAM exceeds one NUMA node 2 Home NUMA nodes assigned
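A toy sketch (my own, under stated assumptions) of the scheduling decision shown on the two slides above; the 4-cores-per-node figure is an assumption, while the 2 x 16G nodes and the 24G vRAM come from the slides:

import math

cores_per_node = 4    # assumed core count per NUMA node
node_mem_gb = 16      # per the slides: 2 x 16G NUMA nodes
vram_gb = 24          # exceeds one node's memory in both cases
for vcpus in (1, 8):
    # The number of home nodes follows the vCPU count; memory that does not fit
    # on the home node spills to (or, for a wide VM, is interleaved across) the other node.
    home_nodes = math.ceil(vcpus / cores_per_node) if vcpus > cores_per_node else 1
    print(f"{vcpus} vCPU / {vram_gb}G vRAM -> {home_nodes} home NUMA node(s)")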
37 Network packet drops due to CPU resource issue
38 Network Packet Drops VM is CPU limited (max limited) Excessive ready time Packet drops at the vSwitch
39 Understanding esxtop disk counters
40 Disk I/O Latencies [Diagram: I/O path from the application and guest OS through the VMM, vSCSI, the ESX storage stack, the driver, HBA, fabric and array SP] GAVG: guest latency, the round-trip latency seen by iostat/perfmon inside the guest DAVG: device latency from the driver/HBA down to the array KAVG: time spent in the ESX storage stack, KAVG = GAVG - DAVG QAVG: time spent queued inside the kernel Time spent in the ESX storage stack is minimal, so for all practical purposes KAVG ~= QAVG In a well configured system QAVG should be zero
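A small illustrative sketch (mine, with made-up numbers) of how these latency counters relate:

# Hypothetical per-device latencies from esxtop, in milliseconds
gavg_ms = 12.0   # GAVG: round-trip latency observed by the guest
davg_ms = 11.5   # DAVG: latency at or below the driver (HBA, fabric, array SP)
kavg_ms = gavg_ms - davg_ms          # time spent in the ESX storage stack
print(f"KAVG ~= {kavg_ms:.1f} ms")   # 0.5 ms
# A KAVG (and hence QAVG) near zero is healthy; a consistently high KAVG points
# at queuing inside the kernel rather than at the array.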
41 Disk I/O Queuing [Diagram: queues along the I/O path from the guest down to the array] GQLEN - guest queue WQLEN - world queue AQLEN - adapter queue D(/L)QLEN - LUN queue SQLEN - array SP queue AQLEN, WQLEN and D(/L)QLEN are reported in esxtop D(/L)QLEN can change dynamically when SIOC is enabled
42 Max IOPS = Max Outstanding IOs / Latency For example, with 64 outstanding IOs and 4 ms average latency: Max IOPS = 64 / 0.004 s = 16,000
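The same back-of-envelope as a tiny sketch (my own, not the deck's):

# Throughput ceiling implied by outstanding IOs and per-IO latency (Little's law)
outstanding_ios = 64    # maximum IOs the path keeps in flight
latency_s = 0.004       # 4 ms average latency per IO
max_iops = outstanding_ios / latency_s
print(f"Max IOPS ~= {max_iops:,.0f}")   # ~16,000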
43 Identifying Queue bottlenecks
44 Disk I/O Queuing – Device Queue Device Queue length, modifiable via driver parameter IO commands in Flight IO commands waiting in Queue
45 Disk I/O Queuing – World Queue World ID World queue length is modifiable via Disk.SchedNumReqOutstanding
46 Device Queue Full KAVG is non-zero, indicating a queuing issue LUN queue depth is 32, with 32 IOs in flight and 32 queued
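A rough back-of-envelope (my own reasoning, with a hypothetical device latency) for why a full device queue shows up as non-zero KAVG:

# Hypothetical situation matching the slide: LUN queue depth 32,
# 32 IOs in flight at the device and another 32 queued in the kernel.
dqlen = 32
queued = 32
davg_ms = 5.0   # assumed device latency per in-flight IO
# By Little's law, a steady backlog of 32 queued IOs draining at dqlen/davg IOs per ms
# adds roughly (queued / dqlen) * davg of kernel wait time per IO.
kavg_ms = (queued / dqlen) * davg_ms
print(f"Estimated KAVG ~= {kavg_ms:.1f} ms on top of DAVG {davg_ms:.1f} ms")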
47 Disk I/O Queuing – Adapter Queue Different adapters have different queue sizes The adapter queue can come into play if the total number of outstanding IOs across its LUNs exceeds the adapter queue depth (AQLEN)
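An illustrative check (hypothetical device names and values, not from the deck) for whether the adapter queue is the limiting factor:

# Hypothetical per-LUN outstanding IOs (ACTV + QUED) behind one HBA
outstanding_per_lun = {"naa.6001": 28, "naa.6002": 31, "naa.6003": 30}
aqlen = 64    # adapter queue depth as reported on the esxtop adapter screen
total_outstanding = sum(outstanding_per_lun.values())
if total_outstanding > aqlen:
    print(f"Adapter queue likely saturated: {total_outstanding} outstanding > AQLEN {aqlen}")
else:
    print(f"Adapter queue not the bottleneck: {total_outstanding} outstanding <= AQLEN {aqlen}")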
48 A few takeaways…
49 Takeaways esxtop is great for troubleshooting a diverse set of problems You can do root-cause analysis by correlating statistics from different screens A good understanding of the counters is essential for accurate troubleshooting esxtop is not designed for performance management There are various other tools for vSphere performance management
50 Thank You!