System Troubleshooting
    TCS Network, System, and Load Monitoring
    TCS for Developers
LBT TCS Cluster
Networking
    VLANs for private networks
    6 Gb non-blocking, full-duplex backbone
    Latency, throughput, data rate
    Broadcast
    Multicast
    TCP/UDP
    Bottleneck at the desktop workstations
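The difference between latency and throughput on the Networking slide can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the example figures (a 1 Gb/s link and 0.2 ms of latency) are assumptions, not measurements from the TCS network.

```python
def transfer_time_s(payload_bytes, throughput_bits_per_s, latency_s):
    """Estimate one-way transfer time: fixed latency plus serialization time."""
    if throughput_bits_per_s <= 0:
        raise ValueError("throughput must be positive")
    return latency_s + (payload_bytes * 8) / throughput_bits_per_s


if __name__ == "__main__":
    # Hypothetical link: 1 Gb/s throughput, 0.2 ms latency.
    for size in (60, 1028, 10_000_000):
        t = transfer_time_s(size, 1e9, 0.0002)
        print(f"{size:>10} bytes: {t * 1e3:.4f} ms")
```

With figures like these, small datagrams are dominated by latency, while bulk transfers are dominated by throughput.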
Diagnostics Theory
    Memory bound versus CPU bound
    Network throughput versus speed
    Multithreading errors
    Subsystem interaction
    printf and syslog
    Standard out and standard error
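The "printf and syslog" and "standard out and standard error" points can be illustrated with a small sketch that keeps program output and diagnostics on separate streams. The logger name `demo_subsystem` and both helper functions are hypothetical, chosen only for this example.

```python
import logging
import sys


def make_logger(name, stream=None):
    """Return a logger that writes diagnostics to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def report(value, log, out=None):
    """Write the result to standard out and the diagnostic to the logger."""
    out = out if out is not None else sys.stdout
    log.debug("reporting value %r", value)
    print(value, file=out)


if __name__ == "__main__":
    report(42, make_logger("demo_subsystem"))
```

Because the diagnostics go to standard error, redirecting standard out to a file captures only the results. Replacing the `StreamHandler` with `logging.handlers.SysLogHandler` would send the same messages to syslog instead.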
Monitoring and Diagnostic Tools
    /sbin/tcpdump
    /sbin/ifconfig
    cacti
    top
    syslog
    vmstat
    R
    gnuplot
tcpdump
    Interactive
        -lett -i {limit}
        Device can be eth0, or eth0.20 for VLANs
    Gather only
        -i -w
        Gathers all raw packets and writes them to a file for later processing
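The two modes of use can be sketched as argument lists suitable for `subprocess.run`. This is a minimal sketch under stated assumptions: the function names are hypothetical, `{limit}` is assumed to be a capture filter expression such as `udp`, and the output path in the example is made up. The flags themselves are standard tcpdump options: `-l` line-buffers output, `-e` prints the link-level header, `-tt` prints raw timestamps, `-i` selects the interface, and `-w` writes raw packets to a file.

```python
def interactive_cmd(device, limit_filter=None):
    """Build an interactive tcpdump command (equivalent to -lett -i device)."""
    cmd = ["/sbin/tcpdump", "-l", "-e", "-tt", "-i", device]
    if limit_filter:
        # Assumption: the slide's {limit} is a capture filter expression.
        cmd.append(limit_filter)
    return cmd


def gather_cmd(device, outfile):
    """Build a gather-only command that writes raw packets to outfile."""
    return ["/sbin/tcpdump", "-i", device, "-w", outfile]


if __name__ == "__main__":
    print(" ".join(interactive_cmd("eth0.20", "udp")))
    print(" ".join(gather_cmd("eth0", "/tmp/capture.pcap")))  # hypothetical path
```

Running the resulting commands normally requires root privileges, so the sketch only builds and prints them.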
Reflective Memory
    ~]# tcpdump -i eth0
    17:51: IP > : UDP, length
    :51: IP > : UDP, length 60
    17:51: IP > : UDP, length 60
    17:51: IP > : UDP, length 60
    17:51: IP > : UDP, length 60
    17:51: IP > : UDP, length 60
    17:51: IP > : UDP, length 60
    17:51: IP > : UDP, length 1028
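A capture like this is easier to read once the datagram sizes are tallied. The sketch below counts UDP payload lengths in tcpdump text output. The sample lines in it are invented for illustration (the addresses come from the 192.0.2.0/24 documentation range and the timestamps are made up), and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the trailing "UDP, length N" that tcpdump prints for UDP datagrams.
UDP_LENGTH = re.compile(r"UDP, length (\d+)\s*$")


def udp_length_histogram(lines):
    """Count how many UDP datagrams of each length appear in tcpdump output."""
    counts = Counter()
    for line in lines:
        match = UDP_LENGTH.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real capture.
    sample = [
        "17:51:02.000001 IP 192.0.2.10.5000 > 192.0.2.20.5001: UDP, length 60",
        "17:51:02.000150 IP 192.0.2.10.5000 > 192.0.2.20.5001: UDP, length 60",
        "17:51:02.000320 IP 192.0.2.10.5000 > 192.0.2.20.5001: UDP, length 1028",
    ]
    for length, count in sorted(udp_length_histogram(sample).items()):
        print(f"length {length}: {count} datagram(s)")
```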
ifconfig
    ~]# ifconfig -a
    eth0      Link encap:Ethernet  HWaddr 00:11:11:10:04:10
              inet6 addr: fe80::211:11ff:fe10:410/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets: errors:0 dropped:0 overruns:0 frame:0
              TX packets: errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes: (3.2 GiB)  TX bytes: (3.7 GiB)
              Base address:0xdf40 Memory:fbee0000-fbf00000

    eth0.10   Link encap:Ethernet  HWaddr 00:11:11:10:04:10
              inet addr:  Bcast:  Mask:
              inet6 addr: fe80::211:11ff:fe10:410/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets: errors:0 dropped:0 overruns:0 frame:0
              TX packets: errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes: (2.5 GiB)  TX bytes: (1.0 GiB)
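The counters worth checking in this output are `errors` and `dropped`. The following sketch scans ifconfig text in the older net-tools layout shown on this slide and reports interfaces with non-zero values. The function name is hypothetical and the packet counts in the sample are invented for illustration.

```python
import re

# Matches e.g. "RX packets:500 errors:2 dropped:7" in net-tools style output.
COUNTERS = re.compile(r"(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+)")


def interface_problems(ifconfig_text):
    """Return {interface: {direction: counters}} for non-zero errors or drops."""
    problems = {}
    # Interfaces are separated by blank lines in ifconfig output.
    for block in re.split(r"\n\s*\n", ifconfig_text.strip()):
        if not block.strip():
            continue
        name = block.split()[0]
        for direction, _packets, errors, dropped in COUNTERS.findall(block):
            if int(errors) or int(dropped):
                problems.setdefault(name, {})[direction] = {
                    "errors": int(errors),
                    "dropped": int(dropped),
                }
    return problems


# Hypothetical sample with made-up counter values.
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:11:11:10:04:10
          RX packets:1000 errors:0 dropped:0 overruns:0 frame:0
          TX packets:900 errors:0 dropped:0 overruns:0 carrier:0

eth0.10   Link encap:Ethernet  HWaddr 00:11:11:10:04:10
          RX packets:500 errors:2 dropped:7 overruns:0 frame:0
          TX packets:400 errors:0 dropped:0 overruns:0 carrier:0
"""

if __name__ == "__main__":
    print(interface_problems(SAMPLE))
```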
Cacti
    LDAP authentication
    Customizable views
    Full deployment: September 2006
top
    Time reported as spent in system is probably I/O, which includes networking
    Sort by memory usage with "M"
    top reports its own resource usage inaccurately
vmstat
    vmstat is a Linux utility for monitoring virtual memory usage. It can also be used to track down I/O problems, including networking.

    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
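To track down I/O problems from vmstat output, the `wa` column (CPU time spent waiting for I/O) and the `b` column (processes blocked in uninterruptible sleep) are the usual places to look. The sketch below parses vmstat text into rows and picks out samples with high I/O wait. The sample numbers, the function names, and the 20% threshold are all assumptions made for illustration.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of {column: int} dictionaries."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The column-name row is the one that starts with "r b".
    header_idx = next(
        i for i, ln in enumerate(lines) if ln.split()[:2] == ["r", "b"]
    )
    names = lines[header_idx].split()
    rows = []
    for ln in lines[header_idx + 1:]:
        fields = ln.split()
        if len(fields) == len(names) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(names, map(int, fields))))
    return rows


def io_bound_samples(rows, wa_threshold=20):
    """Return samples whose I/O wait percentage meets the (assumed) threshold."""
    return [r for r in rows if r["wa"] >= wa_threshold]


# Hypothetical vmstat output with made-up values.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 512000  20000 300000    0    0     5    10  250   400  5  2 92  1
 0  3      0 510000  20000 300000    0    0  4000  3500  900  1200  3 10 40 47
"""

if __name__ == "__main__":
    for row in io_bound_samples(parse_vmstat(SAMPLE)):
        print(f"wa={row['wa']}% blocked={row['b']} bi={row['bi']} bo={row['bo']}")
```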
Statistical Analysis
    R, gnuplot, and MATLAB
    All of these packages give you a different view of the data that you gather. Even if you are not comfortable with them, someone else might be.
    Graphs, charts, baselines, etc.
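A baseline does not require a full statistics package. As a rough sketch, the snippet below uses Python's standard `statistics` module rather than R, gnuplot, or MATLAB. It computes a mean and sample standard deviation from known-good measurements and flags later samples that fall far outside them. The function names, the three-sigma cutoff, and the numbers are assumptions chosen for illustration.

```python
import statistics


def baseline(samples):
    """Return (mean, sample standard deviation) of known-good measurements."""
    return statistics.mean(samples), statistics.stdev(samples)


def outliers(samples, mean, stdev, k=3.0):
    """Return samples more than k standard deviations away from the mean."""
    return [x for x in samples if abs(x - mean) > k * stdev]


if __name__ == "__main__":
    # Hypothetical measurements, e.g. packets per second on a quiet night.
    mean, stdev = baseline([10, 12, 11, 9, 13])
    print(f"baseline: mean={mean}, stdev={stdev:.3f}")
    print("outliers:", outliers([11, 30, 10], mean, stdev))
```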
Syslog
    /var/log/TCS/?
    ~]$ tail -f /var/log/TCS/user
    Jul 24 20:55:19 lbtmu105 LBT_ECS: Thermal failed to connect to IP port
    Jul 24 20:55:20 lbtmu105 LBT_ECS: Thermal not connected to ThermalBox, Send Cmd failed
    Jul 24 20:55:32 lbtmu105 LBT_ECS: Thermal failed to connect to IP port
    Jul 24 20:55:33 lbtmu105 LBT_ECS: Thermal not connected to ThermalBox, Send Cmd failed
    Jul 24 20:55:43 lbtmu103 last message repeated 58 times
    Jul 24 20:55:45 lbtmu105 LBT_ECS: Thermal failed to connect to IP port
    Jul 24 20:55:46 lbtmu105 LBT_ECS: Thermal not connected to ThermalBox, Send Cmd failed
    Jul 24 20:55:58 lbtmu105 LBT_ECS: Thermal failed to connect to IP port
    Jul 24 20:55:59 lbtmu105 LBT_ECS: Thermal not connected to ThermalBox, Send Cmd failed
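Repetitive logs like this one are easier to judge once identical messages are counted per host. The sketch below parses lines in the format shown on the slide and tallies them, attributing "last message repeated N times" to the most recent message seen from the same host. The function and constant names are hypothetical. When the repeated message falls outside the excerpt, as with lbtmu103 here, it is counted under a placeholder.

```python
import re
from collections import Counter

SYSLOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) (?P<rest>.*)$"
)
REPEATED = re.compile(r"^last message repeated (\d+) times$")
UNKNOWN = "(earlier message not in this excerpt)"


def count_messages(lines):
    """Count occurrences of each (host, message) pair, expanding repeats."""
    counts = Counter()
    last_by_host = {}
    for line in lines:
        match = SYSLOG_LINE.match(line.strip())
        if not match:
            continue
        host, rest = match.group("host"), match.group("rest")
        repeat = REPEATED.match(rest)
        if repeat:
            key = last_by_host.get(host, (host, UNKNOWN))
            counts[key] += int(repeat.group(1))
            continue
        last_by_host[host] = (host, rest)
        counts[(host, rest)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jul 24 20:55:19 lbtmu105 LBT_ECS: Thermal failed to connect to IP port",
        "Jul 24 20:55:20 lbtmu105 LBT_ECS: Thermal not connected to ThermalBox, Send Cmd failed",
        "Jul 24 20:55:43 lbtmu103 last message repeated 58 times",
        "Jul 24 20:55:45 lbtmu105 LBT_ECS: Thermal failed to connect to IP port",
    ]
    for (host, message), n in count_messages(sample).most_common():
        print(f"{n:4d}  {host}  {message}")
```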