Lecture – Performance Performance management on UNIX
02/06/20162 Performance Analysis Performance analysis involves identifying various system bottlenecks This involves a number of steps We must ask a number of questions Is there a performance Problem? Is the problem CPU or I/O related?
02/06/20163 Performance Analysis CPU Related? What is the current load on the CPU? What is the average load on the CPU? I/O Related Is it normal disk I/O? Would more/faster disks help? Is it paging I/O? Would more physical memory help?
02/06/20164 Related to a Particular User or Program? Identify the user / program Identify what they are doing to cause the problem Revise their operating procedures Consider removing them from the system
02/06/20165 Determining CPU Usage Determining the CPU usage is the first thing we should do There are a number of tools to do this vmstat gives several pieces of useful information including CPU usage vmstat [interval] [count] Interval is the number of seconds between reports and count is the number of reports to generate
02/06/20166 vmstat 2 10 vmstat 2 10 procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id
7 vmstat The first line gives the average values since the system was booted and should be ignored To determine the CPU usage, we are interested in the last three columns, us, sy, id us: % of CPU dedicated to User tasks sy: % of CPU dedicated to System tasks. Including I/O performing general O/S functions etc. id: % of CPU idle procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id
02/06/20168 Analysing vmstat output (CPU) Just because CPU time is high or idle time is low does not indicate a system problem It may simply indicate that a number of batch jobs are scheduled to run at the same time and might benefit from being rearranged In order to establish if there is a genuine problem it is necessary to monitor the system over an extended period If average CPU% remain high, there is a problem
02/06/20169 Analysing vmstat output (Process States) There are three states in which a process may be at any point in time Runtime, uninterrupted sleep, swapped out Process Statistics: r: Number of processes waiting for runtime b: Number of processes in uninterrupted sleep w: Number of processes swapped out, but otherwise able to run A high r suggests there is a bottle neck. procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id
02/06/ Analysing vmstat output (Memory) Memory Statistics swapd: Amount of virtual memory used (KB) free: Amount of idle memory (KB) buff: Ammount of memory used in buffers cache:amount of memory left in cache procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id
02/06/ Analysing vmstat output (Swap) Swap Statistics si: Amount of memory swapped in from disk (KB/s) so: Amount of memory swapped out to disk (KB/s) Swap statistics are arguably the most important statistic to monitor, and of these, the so field This field indicates the pages that have been swapped out, even if done before vmstat was started procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id
02/06/ Analysing vmstat output (I/O) I/O Statistics bi: Blocks received from a block device (blocks/sec) bo: Blocks sent to a block device (blocks/sec) If there are a large number of block transfers, the problem with your system may lie here (i.e. device access is high) A single reading, however is not indicative of the system as a whole, simply a snapshot All Linux blocks are 1KB except for CDRom blocks (2KB) procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id
02/06/ Analysing vmstat output (System) System Statistics in: The number of interrupts per second, including the system clock cs: The number of context switches per second procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id
02/06/ Analysing vmstat output (CPU usage) System Statistics us: % of CPU dedicated to user tasks sy: % of CPU dedicated to system tasks id: % of CPU idle procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id
02/06/ top top is another tool for identifying problems with a LINUX system Displays the top CPU processes Displays a listing of the most CPU intensive tasks on the system Can provide an interactive interface for manipulating the processes Default is to update every 5 seconds top operates by examining files in the /proc pseudo file system This pseudo file system is used as an interface to kernel data structures man proc
02/06/ rbradley]$ top 17:14:41 up 47 days, 2:27, 8 users, load average: 0.06, 0.03, processes: 59 sleeping, 2 running, 0 zombie, 0 stopped CPU states: 0.0% user 0.2% system 0.0% nice 0.0% iowait 99.8% idle Mem: k av, k used, k free, 0k shrd, 44976k buff 57692k actv, 11208k in_d, 1024k in_c Swap: k av, 9096k used, k free 34656k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 1 root S :15 0 init 2 root SW :00 0 keventd 3 root SW :01 0 kapmd 4 root SWN :00 0 ksoftirqd_CPU0 9 root SW :00 0 bdflush 226 root SW :00 0 kjournald 586 root S :08 0 syslogd 590 root S :03 0 klogd 666 root S :09 0 sshd 719 root S :00 0 gpm 728 root S :05 0 crond 785 xfs S :00 0 xfs 803 daemon S :00 0 atd 812 root S :00 0 mingetty 813 root S :00 0 mingetty top
17 Analysing top output Up: The time the system has been up and the three load averages Average number of processes ready to run in the last 1,5 and 15 minutes Same as the output of uptime Processes: The total number of processes running at the time of the last update Broken down into running, sleeping, stopped and zombied (A zombie process is a finished process where the parent has not read it exit state – which causes the process to be cleaned up) 17:14:41 up 47 days, 2:27, 8 users, load average: 0.06, 0.03, processes: 59 sleeping, 2 running, 0 zombie, 0 stopped CPU states: 0.0% user 0.2% system 0.0% nice 0.0% iowait 99.8% idle Mem: k av, k used, k free, 0k shrd, 44976k buff 57692k actv, 11208k in_d, 1024k in_c Swap: k av, 9096k used, k free 34656k cached
02/06/ Analysing top output CPU States: The percentage of CPU time in user mode, system mode, niced tasks (negative nice tasks) and idle Time spent in niced tasks will also be counted system and user time, so the total will be more than 100% Mem: Statistics on memory usage, including total available memory, free memory, used memory, shared memory, memory used for buffers 17:14:41 up 47 days, 2:27, 8 users, load average: 0.06, 0.03, processes: 59 sleeping, 2 running, 0 zombie, 0 stopped CPU states: 0.0% user 0.2% system 0.0% nice 0.0% iowait 99.8% idle Mem: k av, k used, k free, 0k shrd, 44976k buff 57692k actv, 11208k in_d, 1024k in_c Swap: k av, 9096k used, k free 34656k cached
02/06/ Analysing top output Swap: Statistics on swap space including total swap space and used swap space This and the Mem section together are the same as the output of free* PID: The process ID of each task USER: The username pf the task’s owner PRI: The priority of the task NI: The nice value of the task. Negative values are lower priority 17:14:41 up 47 days, 2:27, 8 users, load average: 0.06, 0.03, processes: 59 sleeping, 2 running, 0 zombie, 0 stopped CPU states: 0.0% user 0.2% system 0.0% nice 0.0% iowait 99.8% idle Mem: k av, k used, k free, 0k shrd, 44976k buff 57692k actv, 11208k in_d, 1024k in_c Swap: k av, 9096k used, k free 34656k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 1 root S :15 0 init
02/06/ Analysing top output SIZE: The size of the task’s code plus data stack space, in kilobytes RSS: The total amount of physical memory used by the task in kilobytes SHARE: The amount of shared memory used by the task STATE: The state of the task, S: sleeping, D: uninterrupted sleep, R: running, Z: zombies, T: stopped or traced %CPU: The task’s share of the CPU since the last screen update as a a percentage of total CPU time %MEM: The task’s percentage of physical memory Time: Total CPU time used by process since it started COMMAND: The task’s command name PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 1 root S :15 0 init
02/06/ Using top to control processes In addition to command-line options for controlling the appearance of top (not covered here) there are a number of commands that can be issued to top while running Space: immediately updates the display ^L: Erases and redraws the screen k: kill a process You will be prompted for the pid and a signal to send to the process (normally 15)
02/06/ Using top to control processes i: ignore zombie processes n: change the number of processes to view r: renice a process P: sort tasks by CPU usage M: sort tasks by Memory usage
02/06/ Renice The renice command is used to alter the priority of running processes The default nice value is 0 The range in Linux is -20 to +20 The lower the value the faster the process runs Can examine the nice value of a process using ps –l
02/06/ Renice The owner of and root can change the nice value of a process using renice Changes apply to all child processes renice priority [[-p] pid...] [[-g] pgrp...] [[-u] user...] ps -l F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD 0 S wait4 pts/1 00:00:00 bash 0 R pts/1 00:00:00 ps renice : old priority 0, new priority 5 ps -l F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD 0 S wait4 pts/1 00:00:00 bash 0 R pts/1 00:00:00 ps
02/06/ Renice Once a nice value has been increased, only the root user can reduce it again, not even to the default value renice : old priority 5, new priority 19 ps -l F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD 0 S wait4 pts/1 00:00:00 bash 0 R pts/1 00:00:00 ps renice renice: 24496: setpriority: Permission denied
02/06/ How Much Swap Space? A quick rule of thumb often used is twice as much as you have physical memory This approach is a bit simplistic and does not scale well 1. Estimate total memory requirements 2. Add some megabytes as a spare 3. Subtract the amount of physical memory available 4. If the value from 3 is > 3 times the available physical memory, you need more memory
02/06/ How Much Swap Space Sometimes the above formula will show that you don’t need swap space at all It is a good policy to create some anyway Linux uses the swap space so that as much physical memory as possible is kept free It swaps out pages that have not been used for a while When memory is needed, it is available
02/06/ How Much Swap Space? If swap space is removed (using the swapoff command) the system will attempt to move any swapped pages into other swap space or physical memory If there is not enough space elsewhere the system may become unavailable for a time, while it sorts itself out, but it will come back