CIT 470: Advanced Network and System Administration
Slide #1: Performance Monitoring

Slide #2: Topics
1. Performance monitoring.
2. Performance tuning.
3. CPU
4. Memory
5. Disk
6. Network

Slide #3: Performance Monitoring
Identify which aspect of performance is affected:
  Latency: delay until initial access.
  Throughput: rate of transfer/processing.
Identify which system component is responsible:
  CPU
  Memory
  Disk
  Network

Slide #4: Performance Tuning Process
1. Learn the customer’s problem.
   Identify specifically what’s wrong.
2. Find the problem’s cause and fix it.
   1. When does the problem occur?
   2. Has anything about the system changed?
   3. What critical resource is affecting performance?
3. Have the right tools.
   Historical monitoring data will show what’s normal and identify any trends.

Slide #5: Experimenter Effect
Monitoring the system affects performance.
  Monitoring tools use system resources.
  If you’ve monitored the system consistently, the monitoring overhead is already part of the baseline, so it won’t distort comparisons.

Slide #6: Performance Problem Solutions
1. Get more of the needed resource.
   Ex: Upgrade processor, use striped disk array.
2. Reduce system requirements.
   Ex: Kill processes, move services to other hosts.
3. Eliminate inefficiency and waste.
   Ex: Produce a static home page every 15 minutes instead of regenerating it on each access.
4. Ration resource usage.
   Ex: Set process priorities with renice.
   Ex: Limit process resource usage with limit (csh) or ulimit (sh/bash).

Slide #7: Monitoring Processes
uptime   Provides aggregate data about system load.
ps       Shows running processes with CPU and memory usage.
top      Updated list of running processes plus summaries.
vmstat   Summary data about processes and CPU usage.

Slide #8: Uptime
uptime provides the following data:
  How long the system has been running.
  Number of users logged in.
  Average number of runnable processes over the last 1, 5, and 15 minutes.
  Want a load average under 3.
uptime example:
  > uptime
  17:40 up 126 days, 8:03, 6 users, load average: 1.40, 1.03, 0.55

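A minimal sketch of checking the load average from a script, assuming uptime prints the "load average:" format shown in the example. The function names load_averages and load_is_ok are hypothetical, and the threshold is passed in by the caller (3 follows the rule of thumb on this slide).

```shell
# Print the 1-, 5-, and 15-minute load averages from one line of uptime output.
# $1 = the uptime output line.
load_averages() {
    # Keep only what follows "load average:" and drop the separating commas.
    printf '%s\n' "$1" | sed 's/.*load average[s]*: *//' | tr -d ','
}

# Succeed (exit status 0) if the 1-minute load average is below a threshold.
# $1 = the uptime output line, $2 = threshold.
load_is_ok() {
    load_averages "$1" | awk -v max="$2" '{ exit !($1 < max) }'
}
```

Possible usage: load_is_ok "$(uptime)" 3 || echo "load is high"
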
Slide #9: vmstat
vmstat reports:
  Number of runnable (r) and blocked (b) processes.
  Memory (virtual, free, buffered, cached).
  Blocks/second transferred in (bi) and out (bo).
  Interrupts/sec (in) and context switches/sec (cs).
  CPU usage by user, system, idle, and waiting.
> vmstat 5 4
procs         memory          swap     io     system     cpu
 r b   swpd free buff cache   si so   bi bo   in cs   us sy id wa

Slide #10: Identifying CPU Shortages
1. Short-term CPU spikes are normal.
2. A consistently high number of runnable processes (r) in vmstat indicates a shortage.
3. So does consistently high total CPU usage (sy+us).
4. High system time compared to user time, together with a high context-switch rate, indicates the system is thrashing between processes instead of doing user work.

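The checks on this slide can be roughed out in a script. This sketch assumes the Linux vmstat column names (r, us, sy) listed on the vmstat slide; cpu_summary is a hypothetical helper, and deciding what counts as "consistently high" is left to the reader.

```shell
# Read vmstat output on stdin and print the average number of runnable
# processes (r) and the average total CPU usage (us + sy) over all samples.
cpu_summary() {
    awk '
        # Column header row: remember which field holds each named column.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number; the "procs ..." banner row does not.
        ("us" in col) && $1 ~ /^[0-9]+$/ {
            n++
            run  += $(col["r"])
            busy += $(col["us"]) + $(col["sy"])
        }
        END { if (n) printf "avg_runnable=%.1f avg_cpu_busy=%.1f\n", run / n, busy / n }
    '
}
```

Possible usage: vmstat 5 4 | cpu_summary. Note that the first sample vmstat prints reports averages since boot, so longer runs give a more representative picture.
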
Slide #11: Changing Process Priorities
Nice values
  Positive values lower priority.
  Negative values raise priority (root only).
If you know a process will be a CPU hog, start it with
  nice -n 5 command_name      (csh built-in form: nice +5 command_name)
If you detect a CPU hog after it’s started, use
  renice 5 PID

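A small sketch wrapping the two commands on this slide. The names run_nice and demote are hypothetical; the renice call uses the "renice priority -p PID" form accepted by the common Linux and BSD implementations, which is an assumption about the target system.

```shell
# Start a command with its nice value raised by an increment (lower priority).
# $1 = increment, remaining arguments = command to run.
run_nice() {
    increment=$1
    shift
    nice -n "$increment" "$@"
}

# Lower the priority of an already-running process.
# $1 = PID, $2 = new nice value.
demote() {
    renice "$2" -p "$1" >/dev/null
}
```

Possible usage: run_nice 5 command_name to start a known CPU hog, or demote PID 5 once one has been spotted.
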
Slide #12: Managing Processes with kill
TERM (default)
  Requests process termination.
  Processes can catch or ignore the signal.
  (Ctrl-c sends the related INT signal, which can also be caught.)
KILL (9)
  Terminates process execution.
  Processes cannot catch or ignore it.
  Processes in uninterruptible I/O wait will not die until the I/O completes.
STOP
  Suspends process execution until SIGCONT.
  (Ctrl-z sends TSTP, the catchable terminal variant.)
  Useful for moving a CPU hog out of the way temporarily.

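One common way to combine TERM and KILL is to ask politely first and force termination only if the process is still around after a grace period. The sketch below assumes that policy; stop_process and its default of 5 seconds are hypothetical choices, not something the slide prescribes.

```shell
# Send TERM, wait up to a grace period, then fall back to KILL.
# $1 = PID, $2 = seconds to wait before KILL (default 5).
stop_process() {
    pid=$1
    grace=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # nothing to do: already gone
    while [ "$grace" -gt 0 ]; do
        # Signal 0 delivers nothing; it only tests whether the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null || true        # cannot be caught or ignored
    return 0
}
```

Possible usage: stop_process PID 10. To park a CPU hog instead of ending it, kill -STOP PID followed later by kill -CONT PID does the job.
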
Slide #13: Imposing Limits on Processes
CPU time                  ulimit -t secs
Maximum file size         ulimit -f KB
Maximum data segment      ulimit -d KB
Maximum stack size        ulimit -s KB
Maximum physical memory   ulimit -m KB
Maximum core size         ulimit -c KB
Maximum number of procs   ulimit -u n
Maximum virtual memory    ulimit -v KB

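Limits set with ulimit apply to the current shell and everything it starts afterwards, so a typical pattern is to set them in a subshell around a single command. A minimal sketch, with run_with_cpu_limit as a hypothetical name:

```shell
# Run one command with a CPU-time cap. The limit is set inside a subshell,
# so the calling shell keeps its own limits.
# $1 = CPU seconds, remaining arguments = command to run.
run_with_cpu_limit() {
    secs=$1
    shift
    ( ulimit -t "$secs" && exec "$@" )
}
```

Possible usage: run_with_cpu_limit 60 command_name. The same pattern works for the other flags in the table above.
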
Slide #14: Monitoring Memory
Use free to see how memory is used.
  System will use most free memory for caching.
  System will swap out inactive processes.
  Don’t worry until free < 5% of total memory.
Use vmstat to detect paging activity.
  Page-out (so) rate consistently greater than 0.
  High page-in (si) rate, as the system uses the paging facility to load programs into memory.

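To see whether the so rate is above zero consistently rather than once, it helps to count samples. This sketch assumes the Linux vmstat column names si and so; swap_activity is a hypothetical helper.

```shell
# Read vmstat output on stdin and count how many samples showed
# swap-out (so > 0) and swap-in (si > 0) activity.
swap_activity() {
    awk '
        # Column header row: remember which field holds each named column.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        ("so" in col) && $1 ~ /^[0-9]+$/ {
            n++
            if ($(col["so"]) > 0) swapout++
            if ($(col["si"]) > 0) swapin++
        }
        END { printf "samples=%d swap_out=%d swap_in=%d\n", n, swapout, swapin }
    '
}
```

Possible usage: vmstat 5 12 | swap_activity. A swap_out count close to the number of samples matches the warning sign described on this slide.
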
Slide #15: Managing Memory
1. Improving paging capacity.
   Add new swap files with swapon.
   Add new swap partitions.
2. Improving paging performance.
   Use swap partitions instead of swap files.
   Distribute swap resources across disks.
3. Migrate memory hogs to another host.
4. Add more memory.

Slide #16: Monitoring Disk I/O
Use iostat to get per-disk statistics.
  Transactions per second (tps).
  Blocks read/written per second.
Managing disk performance problems.
  Distribute heavily used data across disks/controllers.
  Get more or faster disks.
  Use RAID or LVM striping.

Slide #17: iostat
> iostat 2
Linux (zim) 03/26/2007
avg-cpu: %user %nice %system %iowait %steal %idle
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hde
hdh
hdc
avg-cpu: %user %nice %system %iowait %steal %idle
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hde
hdh
hdc

Slide #18: Managing Disk Capacity
Detecting disk resource usage.
  List all partition usage with df -h
  Identify high-usage directories with du
    Summary data: du -s
    Highest usage directories: du -k / | sort -rn
Use find to detect disk hogs.
  Use find -size to search for big files.
  Use -atime +X to identify files that haven’t been accessed in more than X days.

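The du and find commands on this slide can be wrapped as small helpers. The names largest_dirs and stale_big_files are hypothetical, and the k size suffix assumes GNU or BSD find.

```shell
# List the largest directories under a starting point, sizes in KB.
# $1 = directory, $2 = how many lines to show (default 10).
largest_dirs() {
    du -k "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# List regular files larger than a size that have not been read recently.
# $1 = directory, $2 = minimum size in KB, $3 = days since last access.
stale_big_files() {
    find "$1" -type f -size +"$2"k -atime +"$3" 2>/dev/null
}
```

Possible usage: largest_dirs /home 20, or stale_big_files /home 102400 90 to list files over 100 MB that have gone unread for more than 90 days.
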
Slide #19: Managing Disk Shortages
1. Add more disks.
2. Move files to remote fileservers.
3. Eliminate unnecessary files.
4. Compress large, infrequently used files.
5. Impose disk quotas on users.
   Soft limit: can be violated temporarily.
   Hard limit: cannot be violated.

Slide #20: Network Statistics
> netstat -s
Tcp:
    active connections openings
    passive connection openings
    9 failed connection attempts
    6195 connection resets received
    5 connections established
    segments received
    segments send out
    segments retransmited
    1389 bad segments received
    resets sent
Ip:
    total packets received
    6 with invalid headers
    28 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    incoming packets delivered
    requests sent out
Udp:
    packets received
    336 packets to unknown port received.
    6 packet receive errors
    packets sent

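One way to turn these counters into a single figure is the share of sent TCP segments that had to be retransmitted. This is a sketch of that idea rather than something the slide prescribes; retransmit_percent is a hypothetical helper, and it assumes the Linux netstat -s wording shown above (it accepts both the "send out"/"retransmited" spellings and the corrected ones).

```shell
# Read netstat -s output on stdin and print retransmitted segments
# as a percentage of segments sent.
retransmit_percent() {
    awk '
        /segments sen[dt] out/   { sent = $1 }
        /segments retransmit+ed/ { retrans = $1 }
        END { if (sent > 0) printf "%.2f\n", 100 * retrans / sent }
    '
}
```

Possible usage: netstat -s | retransmit_percent. The counters are cumulative since boot, so comparing two readings taken some time apart says more about current conditions than a single reading.
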