ECHO: A System Monitoring and Management Tool
Yitao Duan and Dawey Huang
Challenge
How can we manage all these machines?
Goal
Aimed at networked system management
Better tools for
  – Discovering system states
  – Enhancing system availability
  – Monitoring network and system statistics
  – Error detection and correction
  – Fault tolerance for specific network applications (such as web servers)
Overview
Distributed agents gather information
A Centralized Control Unit (CCU) monitors and analyzes the data and takes control action if needed
A script language for automatic decision making
A web browser user interface
[Figure: components labeled "SNMP Tool" and "EchoMe Daemon"]
Centralized Control Unit
Information collection
  – Machine information
  – Network information
Information analysis
  – Individual machine analysis
  – Collaborative network analysis
Action
  – System modification
  – Network routing
Information Collection
Two approaches investigated
  – EchoMe Daemons running on hosts and reporting system information to the server
  – SNMP to discover router connectivity and states
The daemon is mostly for collecting local information, which is much more detailed
SNMP is for network connectivity
EchoMe Daemon
1. Automatically discover a node (node reporting stage)
  – The EchoMe Daemon starts up when the machine boots
  – It sends the OS type/machine info up to the CCU
  – It registers a session with the CCU
2. The CCU sends the node a monitor program based on the node’s OS/machine type and executes it on the node
3. The monitor program periodically sends information packets up to the CCU
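The three stages can be sketched as a small in-memory simulation. This is a minimal sketch, not the EchoMe implementation: the `CCU` class, `boot_node`, and the monitor program names are hypothetical, and stage 2 returns a program name instead of shipping and executing a binary.

```python
# Hypothetical in-memory simulation of the EchoMe three-stage protocol.
from dataclasses import dataclass, field


@dataclass
class CCU:
    # Hypothetical mapping: (OS type, machine type) -> monitor program name.
    monitors: dict
    sessions: dict = field(default_factory=dict)
    received: list = field(default_factory=list)
    next_session: int = 1

    def register(self, node, os_type, machine):
        """Stage 1: record the node's OS/machine info and open a session."""
        session_id = self.next_session
        self.next_session += 1
        self.sessions[session_id] = (node, os_type, machine)
        return session_id

    def monitor_for(self, session_id):
        """Stage 2: pick the monitor program matching the node's OS/machine."""
        _, os_type, machine = self.sessions[session_id]
        return self.monitors[(os_type, machine)]

    def receive(self, session_id, packet):
        """Stage 3: accept one periodic information packet."""
        if session_id not in self.sessions:
            raise KeyError(f"unknown session {session_id}")
        self.received.append((session_id, packet))


def boot_node(ccu, node, os_type, machine, samples):
    """Simulate one node: register at boot, get its monitor, then report."""
    session_id = ccu.register(node, os_type, machine)  # stage 1
    monitor = ccu.monitor_for(session_id)               # stage 2
    for sample in samples:                              # stage 3, one sample per period
        ccu.receive(session_id, {"monitor": monitor, **sample})
    return session_id


ccu = CCU(monitors={("Linux", "i686"): "monitor-linux-i686"})
sid = boot_node(ccu, "node1", "Linux", "i686", [{"load1": 0.2}, {"load1": 0.3}])
```

Keeping the OS/machine lookup on the CCU side mirrors the slide: the node only reports what it is, and the CCU decides which monitor to run.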
Router Connectivity Discovery by SNMP
Routers implement SNMP
The program can run on any host within Millennium
Given a router (obtainable from the local host’s gateway information), query its ipRouteTable
Traverse all its neighboring routers, performing the same query
Recursion stops at a specified distance
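The traversal can be sketched as a breadth-first walk with a hop cutoff, which records each router at its minimum distance from the start. It assumes a caller-supplied `query_next_hops(router)` (hypothetical) that would wrap an SNMP walk of the ipRouteNextHop column of ipRouteTable; here it is backed by a made-up topology so the code runs offline. A real walk would likely also need to filter out the router's own interface addresses, which may appear as next hops for directly connected routes.

```python
# Sketch of depth-limited router discovery. query_next_hops stands in for an
# SNMP walk of ipRouteNextHop; the topology below is made up for illustration.
from collections import deque


def discover_routers(start, query_next_hops, max_hops):
    """Return {router: hop distance} for routers within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        router = queue.popleft()
        if distance[router] == max_hops:
            continue  # cutoff reached: do not query this router's table
        for neighbor in query_next_hops(router):
            if neighbor not in distance:
                # BFS order guarantees this is the shortest distance found.
                distance[neighbor] = distance[router] + 1
                queue.append(neighbor)
    return distance


TOPOLOGY = {
    "r1": ["r2", "r3"],
    "r2": ["r1", "r4"],
    "r3": ["r1", "r4"],
    "r4": ["r2", "r3", "r5"],
    "r5": ["r4"],
}
found = discover_routers("r1", lambda r: TOPOLOGY.get(r, []), max_hops=2)
```

Breadth-first order is used instead of plain recursion so that a router reached first along a long path is not recorded at the wrong distance.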
System Information
Number and speed of the CPUs
Total physical and swap memory installed
System clock
Uptime
Kernel version
Percent CPU user, nice, system and idle
One, five and fifteen minute load averages
Number of running processes and total number of processes
Amount of free, shared, buffered, cached and swap memory
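One plausible way a Linux monitor program could compute some of these values is from /proc text. This sketch assumes the Linux /proc/loadavg and /proc/stat formats (the slides do not say how values are gathered) and takes CPU percentages over the first four columns only, matching the list above; newer kernels add further columns such as iowait.

```python
# Sketch: derive load averages, process counts and CPU percentages from
# Linux /proc text. The formats are assumptions about a Linux host.

def parse_loadavg(text):
    """Parse /proc/loadavg, e.g. '0.20 0.18 0.12 1/80 11206'."""
    load1, load5, load15, procs, _last_pid = text.split()
    running, total = procs.split("/")
    return {
        "load": (float(load1), float(load5), float(load15)),
        "running": int(running),
        "total": int(total),
    }


def cpu_ticks(stat_text):
    """Return (user, nice, system, idle) ticks from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return tuple(int(v) for v in fields[1:5])
    raise ValueError("no aggregate cpu line found")


def cpu_percent(before, after):
    """Percent user/nice/system/idle between two /proc/stat samples."""
    delta = [b - a for a, b in zip(cpu_ticks(before), cpu_ticks(after))]
    total = sum(delta)
    names = ("user", "nice", "system", "idle")
    if total == 0:
        return {n: 0.0 for n in names}  # no ticks elapsed between samples
    return {n: 100.0 * d / total for n, d in zip(names, delta)}
```

Percentages come from the difference between two samples because /proc/stat reports cumulative ticks since boot.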
Network Information
Network interfaces
  – /proc/net/dev or CTL_NET/AF_LINK
  – SNMP: interfaces.ifTable
ARP cache (direct neighbors)
  – /proc/net/arp or RTF_LLINFO
  – SNMP: ip.ipNetToMediaTable
Route table
  – /proc/net/route or NET_RT_DUMP
  – SNMP: ip.ipRouteTable
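As an example of the /proc route source, the sketch below pulls the default gateway out of /proc/net/route, which would give the starting router for the SNMP discovery. It assumes the file was written by a little-endian host, since Linux prints these addresses as hex in host byte order; the sample text is made up.

```python
# Sketch: find the default gateway in Linux /proc/net/route text.
# Assumes hex addresses in little-endian host byte order (e.g. x86).
import socket
import struct

RTF_GATEWAY = 0x0002  # route flag from <linux/route.h>


def hex_to_ip(hex_le):
    """Convert a little-endian hex address such as '0101A8C0' to dotted form."""
    return socket.inet_ntoa(struct.pack("<I", int(hex_le, 16)))


def default_gateway(route_text):
    """Return the gateway of the default route, or None if there is none."""
    for line in route_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        dest, gateway, mask = fields[1], fields[2], fields[7]
        flags = int(fields[3], 16)
        if dest == "00000000" and mask == "00000000" and flags & RTF_GATEWAY:
            return hex_to_ip(gateway)
    return None


ROUTE = (
    "Iface\tDestination\tGateway \tFlags\tRefCnt\tUse\tMetric\tMask\t\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t0101A8C0\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
    "eth0\t0001A8C0\t00000000\t0001\t0\t0\t0\t00FFFFFF\t0\t0\t0\n"
)
```

With the sample above, `default_gateway(ROUTE)` returns `"192.168.1.1"`.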
Information Analysis
The CCU is built on a relational database
  – Front end: parsing engine
  – Individual node analysis
  – Collaborative analysis
Parsing Engine
IPACKET is in standard XML format
IPACKET uses incremental updates: each new packet specifies only the differences from the previous packet
The parsing engine parses the IPACKET into objects and inserts them into the iface_?DATA_tables accordingly
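The incremental update scheme can be sketched with Python's `xml.etree.ElementTree`. The element and attribute names (`ipacket`, `set`, `remove`, `node`, `name`) are a hypothetical schema; the slides only state that IPACKET is XML and carries differences from the previous packet.

```python
# Sketch of applying an incremental IPACKET to a node's last known state.
# The XML schema used here is hypothetical.
import xml.etree.ElementTree as ET


def apply_ipacket(state, xml_text):
    """Return (node id, new state) after applying one packet's differences."""
    root = ET.fromstring(xml_text)
    if root.tag != "ipacket":
        raise ValueError(f"unexpected root element {root.tag!r}")
    new_state = dict(state)  # leave the previous snapshot unchanged
    for change in root:
        name = change.get("name")
        if change.tag == "set":
            new_state[name] = change.text.strip() if change.text else ""
        elif change.tag == "remove":
            new_state.pop(name, None)
        else:
            raise ValueError(f"unknown change element {change.tag!r}")
    return root.get("node"), new_state


first = '<ipacket node="n1"><set name="load1">0.20</set><set name="users">3</set></ipacket>'
second = '<ipacket node="n1"><set name="load1">0.30</set><remove name="users"/></ipacket>'
node, s1 = apply_ipacket({}, first)
_, s2 = apply_ipacket(s1, second)
```

The first packet carries the full state; later packets only carry changes, so `s2` ends up as `{"load1": "0.30"}`.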
IFACE Tables
The client node registers a unique nodeid in iface_node_table
It starts a session for reporting information to the CCU
Each time, the client node reports information by sending up an information packet (ipacket)
The CCU processes this packet, creates a unique statement id from iface_index_table, and parses the information into each iface_?DATA_table
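A minimal SQLite sketch of the table roles described above. Only the table names come from the slides; the column layout, `iface_sysDATA_table` (one hypothetical instance of the iface_?DATA_table pattern), and the helper functions are assumptions.

```python
# Sketch of the iface tables in SQLite; columns are hypothetical.
import sqlite3

SCHEMA = """
CREATE TABLE iface_node_table (
    nodeid   INTEGER PRIMARY KEY,
    hostname TEXT UNIQUE NOT NULL
);
CREATE TABLE iface_index_table (
    stmtid   INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique statement id per ipacket
    nodeid   INTEGER NOT NULL REFERENCES iface_node_table(nodeid),
    received TEXT NOT NULL
);
CREATE TABLE iface_sysDATA_table (
    stmtid INTEGER NOT NULL REFERENCES iface_index_table(stmtid),
    name   TEXT NOT NULL,
    value  TEXT
);
"""


def register_node(db, hostname):
    """Register a node once and return its unique nodeid."""
    db.execute("INSERT OR IGNORE INTO iface_node_table (hostname) VALUES (?)", (hostname,))
    row = db.execute(
        "SELECT nodeid FROM iface_node_table WHERE hostname = ?", (hostname,)
    ).fetchone()
    return row[0]


def store_packet(db, nodeid, received, values):
    """Allocate a statement id for one ipacket and store its parsed values."""
    cur = db.execute(
        "INSERT INTO iface_index_table (nodeid, received) VALUES (?, ?)",
        (nodeid, received),
    )
    stmtid = cur.lastrowid
    db.executemany(
        "INSERT INTO iface_sysDATA_table (stmtid, name, value) VALUES (?, ?, ?)",
        [(stmtid, k, v) for k, v in values.items()],
    )
    return stmtid


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
```

Tying every data row to a statement id lets later analysis group all values that arrived in the same packet.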
Individual Node Analysis
Clean up the iface_?data_tables by transferring and categorizing data into each node’s own data table
A background process runs on the CCU
Examples:
  – Network statistics over time table
  – Network route change reporting
  – Network usage of nodes (packets, TCP/UDP connection counts)
  – Node’s system state over time table
  – Node’s configuration change table
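Route change reporting, one of the examples above, reduces to diffing two route-table snapshots from the same node. The `{destination: next hop}` snapshot shape is an assumption made for illustration.

```python
# Sketch of route change reporting between two snapshots of one node's
# route table, each assumed to be {destination: next hop}.

def route_changes(old, new):
    """Return sorted lists of added, removed, and changed routes."""
    added = sorted(d for d in new if d not in old)
    removed = sorted(d for d in old if d not in new)
    changed = sorted(
        (d, old[d], new[d]) for d in old.keys() & new.keys() if old[d] != new[d]
    )
    return {"added": added, "removed": removed, "changed": changed}


old = {"0.0.0.0": "10.0.0.1", "10.0.1.0": "10.0.0.2", "10.0.2.0": "10.0.0.3"}
new = {"0.0.0.0": "10.0.0.9", "10.0.1.0": "10.0.0.2", "10.0.3.0": "10.0.0.4"}
report = route_changes(old, new)
```

Here the default route's next hop changed, one route appeared and one disappeared.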
Collaborative Analysis
Group specific information in the iface_?data_tables and ninfo_?data_tables to generate special tables for user viewing/analysis
Examples:
  – Network connectivity graph
  – Network graph between two nodes or along a route
  – Network snapshot table
  – All nodes’ current network statistics table
  – All nodes’ current state table
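The "all nodes’ current state table" view can be sketched as a SQL query that keeps each node's latest sample. The table `ninfo_sysdata`, its columns, and the sample rows are hypothetical stand-ins for the ninfo_?data_tables.

```python
# Sketch of a collaborative view: the latest sample for every node.
import sqlite3


def current_state(db):
    """Return [(node, ts, load1)] with the most recent row for each node."""
    return db.execute(
        """
        SELECT s.node, s.ts, s.load1
        FROM ninfo_sysdata AS s
        JOIN (SELECT node, MAX(ts) AS ts FROM ninfo_sysdata GROUP BY node) AS latest
          ON s.node = latest.node AND s.ts = latest.ts
        ORDER BY s.node
        """
    ).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ninfo_sysdata (node TEXT, ts INTEGER, load1 REAL)")
db.executemany(
    "INSERT INTO ninfo_sysdata VALUES (?, ?, ?)",
    [("a", 1, 0.5), ("a", 2, 0.7), ("b", 1, 1.5)],  # made-up samples
)
print(current_state(db))
```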
Interface to View Analysis
Web interface
  – Viewable in a web browser
Web session
  – Displays analysis
  – Takes action input from the user
Java Servlet + JSP
  – Security control
  – Data objects map to tables in collaborative analysis
Action
The daemon can receive and execute binary programs from the CCU
A command module issues commands in response to certain events
  – Add a pseudo interface to a host
  – Reroute a host
  – Initialize a new program
  – Etc.
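A command module of this kind can be sketched as a list of rules that map an event type and condition to a command. The event fields, thresholds, and command strings are hypothetical; they only echo the example actions listed above.

```python
# Sketch of a rule-based command module; all event fields and command
# strings are hypothetical.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event_type: str
    condition: Callable[[dict], bool]
    command: Callable[[dict], str]


RULES = [
    # Hypothetical: if a web server host goes down, bring its address up
    # as a pseudo interface on a standby host.
    Rule("host_down", lambda e: e.get("service") == "web",
         lambda e: f"add_pseudo_interface {e['standby']} {e['address']}"),
    # Hypothetical: reroute a host whose path reports heavy packet loss.
    Rule("link_loss", lambda e: e.get("loss", 0.0) > 0.2,
         lambda e: f"reroute {e['host']} via {e['backup_gateway']}"),
]


def commands_for(event, rules=RULES):
    """Return the commands triggered by one event, in rule order."""
    return [r.command(event) for r in rules
            if r.event_type == event["type"] and r.condition(event)]
```

Keeping rules as data makes it possible to load them from the script language mentioned in the overview rather than hard-coding them.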
Security
OpenSSL encryption
The EchoMe Daemon runs as nobody
System modification programs need to use suexec (root password required)
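Two of these measures can be sketched with Python's standard library, whose `ssl` module is built on OpenSSL. The CA file argument, the TLS 1.2 floor, and the helper names are assumptions; the slides do not specify how the daemon connects to the CCU or drops privileges, and the suexec step is not shown.

```python
# Sketch: an OpenSSL-backed TLS client context and a privilege drop to
# 'nobody'. Helper names and the TLS version floor are assumptions.
import os
import ssl


def make_client_context(cafile=None):
    """TLS context for a daemon talking to the CCU, verifying the CCU's cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def drop_to_nobody():
    """If started as root, permanently switch to the 'nobody' user."""
    if os.geteuid() != 0:
        return False  # already unprivileged, nothing to do
    import pwd  # Unix-only module
    pw = pwd.getpwnam("nobody")
    os.setgroups([])       # clear supplementary groups while still root
    os.setgid(pw.pw_gid)   # group first: setgid fails after giving up root
    os.setuid(pw.pw_uid)
    return True
```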
[Figure: System Stat Table]
Transcripts for SNMP Router Discovery
……
Iterating neighbors of IP address: (A9E533A1)
IP address: (A9E533E9)
IP address: (A9E533A5)
IP address: (A9E533A7)
IP address: (A9E533A8)
IP address: (A9E53221)
IP address: (A9E53281)
IP address: (A9E533A6)
IP address: (A9E533A9)
IP address: (A9E533EA)
In getIPRouteTable. nHops = 8
Setting target to ……
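The addresses in the transcript look like 32-bit IPv4 addresses printed in hex. Assuming network (big-endian) byte order, which the transcript does not state, they can be decoded with a short helper (`hex_to_dotted` is a hypothetical name):

```python
# Sketch: decode the hex addresses from the transcript, assuming each is an
# IPv4 address in network (big-endian) byte order.
import socket


def hex_to_dotted(hex_addr):
    """Convert e.g. 'A9E533A1' to dotted-quad notation."""
    return socket.inet_ntoa(bytes.fromhex(hex_addr))


for h in ["A9E533A1", "A9E533E9", "A9E53221"]:
    print(h, hex_to_dotted(h))
```

Under that assumption, A9E533A1 decodes to 169.229.51.161.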
[Figure: Partial Router Connectivity on Millennium Discovered by SNMP]
Conclusion
Information collection methods are feasible
Automatic discovery
Comprehensive and accurate information about the system
Needs user feedback
Future Work
More (or fewer) features based on user feedback
User interface
More work on information analysis and decision making
Fully deploy on Millennium