Published by Bertram Bates. Modified over 8 years ago.
1
ECHO: A System Monitoring and Management Tool
Yitao Duan and Dawey Huang
2
Challenge
How can we manage all these machines?
3
Goal
Aimed at networked system management
Better tools for
  – Discovering system states
  – Enhancing system availability
  – Monitoring network and system statistics
  – Error detection and correction
  – Fault tolerance for specific network applications (such as a web server)
4
Overview
Distributed agents gather information
Centralized Control Unit (CCU) monitors and analyzes data, and takes control action if needed
Script language for automatic decision making
Web browser user interface
5
[Architecture diagram; labels: SNMP Tool, EchoMe Daemon]
6
Centralized Control Unit
Information collection
  – Machine information
  – Network information
Information analysis
  – Individual machine analysis
  – Collaborative network analysis
Action
  – System modification
  – Network routing
7
Information Collection
Two approaches investigated
  – EchoMe Daemons running on hosts and reporting system information to the server
  – SNMP to discover router connectivity and states
Daemon: mostly for collecting local information, which is much more detailed
SNMP: for network connectivity
8
EchoMe Daemon
1. Automatically discover a node (node reporting stage)
  – EchoMe Daemon starts up when the machine boots
  – Sends OS type and machine info to the CCU
  – Registers a session with the CCU
2. The CCU sends the node a monitor program based on the node's OS and machine type, and executes it on the node.
3. The monitor program periodically sends information packets to the CCU.
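The node reporting stage and the monitor selection step could be sketched roughly as follows. This is only an illustration: the JSON message layout, the field names, and the monitor program names (`monitor_linux`, `monitor_bsd`, `monitor_generic`) are assumptions, since the slides do not describe the actual wire format.

```python
import json
import platform


def build_registration(hostname: str) -> bytes:
    """Build the first message a daemon might send to the CCU.

    The message layout is hypothetical; the slides only say that the
    OS type and machine info are reported.
    """
    info = {
        "type": "register",
        "hostname": hostname,
        "os": platform.system(),        # e.g. "Linux"
        "machine": platform.machine(),  # e.g. "x86_64"
    }
    return json.dumps(info).encode("utf-8")


def choose_monitor(os_type: str) -> str:
    """CCU side: pick a monitor program by OS type (names are invented)."""
    monitors = {"Linux": "monitor_linux", "FreeBSD": "monitor_bsd"}
    return monitors.get(os_type, "monitor_generic")


if __name__ == "__main__":
    message = json.loads(build_registration("node1"))
    print(message["os"], "->", choose_monitor(message["os"]))
```

Keeping the selection on the CCU side mirrors the slide: the daemon stays small and OS-agnostic, and the OS-specific logic lives in the monitor program that is shipped to it.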
9
Router Connectivity Discovery by SNMP
Routers implement SNMP
Program can run on any host within Millennium
Given a router (obtainable from the local host's gateway information), query its ipRouteTable
Traverse all its neighboring routers, performing the same query
Recursion stops at a specified distance
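The traversal described above can be sketched as a distance-limited graph search. The version below uses an iterative breadth-first search rather than recursion, and it replaces the real SNMP ipRouteTable query with a hypothetical callback (`get_next_hops`) backed by a made-up table, so it shows only the stopping rule, not the SNMP calls.

```python
from collections import deque


def discover_routers(start, get_next_hops, max_distance):
    """Visit routers reachable from `start` within `max_distance` hops.

    `get_next_hops(router)` stands in for querying that router's
    ipRouteTable and returning the next-hop addresses found there.
    """
    distance = {start: 0}
    edges = set()
    queue = deque([start])
    while queue:
        router = queue.popleft()
        if distance[router] >= max_distance:
            continue  # do not expand routers at the distance limit
        for neighbor in get_next_hops(router):
            edges.add((router, neighbor))
            if neighbor not in distance:
                distance[neighbor] = distance[router] + 1
                queue.append(neighbor)
    return distance, edges


# Made-up topology used in place of live SNMP queries.
FAKE_TABLES = {
    "10.0.0.1": ["10.0.0.2", "10.0.0.3"],
    "10.0.0.2": ["10.0.0.4"],
    "10.0.0.3": [],
    "10.0.0.4": ["10.0.0.5"],
}


def fake_next_hops(router):
    return FAKE_TABLES.get(router, [])


if __name__ == "__main__":
    found, links = discover_routers("10.0.0.1", fake_next_hops, 2)
    print(sorted(found.items()))
```

Tracking the distance per router also prevents revisiting a router that is reachable along more than one path, which matters in a meshed network.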
10
System Information
Number and speed of the CPUs
Total physical and swap memory installed
System clock
Uptime
Kernel version
Percent CPU user, nice, system and idle
One, five and fifteen minute load averages
Number of running processes and total number of processes
Amount of free, shared, buffered, cached and swap memory
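On Linux, the three load averages and the running/total process counts from this list are available on a single line in /proc/loadavg. The sketch below shows how a monitor program might parse that line; the function name and dictionary keys are illustrative, and the slides do not say which source the real monitor reads.

```python
def parse_loadavg(text):
    """Parse one line in /proc/loadavg format.

    The line holds the 1, 5 and 15 minute load averages, then
    running/total scheduling entities, then the last PID created.
    """
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "running": int(running),
        "total": int(total),
    }


if __name__ == "__main__":
    # Sample line; on a Linux host this would come from open("/proc/loadavg").
    print(parse_loadavg("0.20 0.18 0.12 1/80 11206"))
```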
11
Network Information
Network interfaces
  – /proc/net/dev or CTL_NET/AF_LINK
  – SNMP: interfaces.ifTable
ARP cache (direct neighbors)
  – /proc/net/arp or RTF_LLINFO
  – SNMP: ip.ipNetToMediaTable
Route table
  – /proc/net/route or NET_RT_DUMP
  – SNMP: ip.ipRouteTable
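As an illustration of the Linux side of the route table collection, the kernel exposes routes as text in /proc/net/route, with addresses written as hexadecimal in host byte order. The sketch below parses that format, assuming a little-endian host (such as x86); the function names and the sample table are invented for the example.

```python
import socket
import struct


def hex_to_ip(hex_addr):
    """Convert a little-endian hex address from /proc/net/route to dotted quad."""
    return socket.inet_ntoa(struct.pack("<L", int(hex_addr, 16)))


def parse_route_table(text):
    """Parse /proc/net/route content into a list of route dictionaries."""
    routes = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 8:
            continue  # ignore malformed lines
        routes.append({
            "iface": fields[0],
            "destination": hex_to_ip(fields[1]),
            "gateway": hex_to_ip(fields[2]),
            "mask": hex_to_ip(fields[7]),
        })
    return routes


# Made-up sample in the /proc/net/route column layout.
SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t0101A8C0\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
    "eth0\t0001A8C0\t00000000\t0001\t0\t0\t0\t00FFFFFF\t0\t0\t0\n"
)

if __name__ == "__main__":
    for route in parse_route_table(SAMPLE):
        print(route)
```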
12
Information Analysis
CCU: a relational database
Front end: parsing engine
Individual node analysis
Collaborative analysis
13
Parsing Engine
IPACKET is in standard XML format
IPACKET uses incremental updates: each new packet specifies only the differences from the previous packet
The parsing engine parses the IPACKET into objects and inserts them into the iface tables accordingly
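The incremental update idea can be sketched as applying each packet as a set of changes to the last known state of a node. The XML element and attribute names below (`ipacket`, `set`, `remove`, `key`) are invented for illustration; the slides do not show the actual IPACKET schema.

```python
import xml.etree.ElementTree as ET


def apply_ipacket(state, xml_text):
    """Apply one incremental packet to a node's state dictionary.

    <set key="..."> adds or overwrites a value; <remove key="..."/> deletes
    one. Keys that a packet does not mention keep their previous value.
    """
    root = ET.fromstring(xml_text)
    for item in root.findall("set"):
        state[item.get("key")] = item.text or ""
    for item in root.findall("remove"):
        state.pop(item.get("key"), None)
    return state


# Hypothetical packets: the second one only carries what changed.
FIRST = ('<ipacket node="n1" seq="1">'
         '<set key="uptime">100</set><set key="load1">0.20</set>'
         '</ipacket>')
SECOND = '<ipacket node="n1" seq="2"><set key="uptime">160</set></ipacket>'

if __name__ == "__main__":
    state = apply_ipacket({}, FIRST)
    state = apply_ipacket(state, SECOND)
    print(state)
```

Sending only differences keeps periodic packets small when most values (CPU count, kernel version, installed memory) rarely change.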
14
IFACE Tables
The client node registers a unique nodeid in iface_node_table
It starts a session for reporting information to the CCU
Each time, the client node reports information by sending an information packet (ipacket)
The CCU processes this packet, creates a unique statement id from iface_index_table, and parses the information into each iface_?DATA_table
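A minimal sketch of this flow, using SQLite as a stand-in relational database. Only the names iface_node_table and iface_index_table come from the slide; the column names, the `iface_sysdata_table` example of an iface_?DATA_table, and the helper functions are assumptions made for illustration.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical versions of the iface tables."""
    conn.executescript("""
        CREATE TABLE iface_node_table (
            nodeid   INTEGER PRIMARY KEY,
            hostname TEXT UNIQUE NOT NULL
        );
        CREATE TABLE iface_index_table (
            stmtid INTEGER PRIMARY KEY AUTOINCREMENT,
            nodeid INTEGER NOT NULL REFERENCES iface_node_table(nodeid)
        );
        CREATE TABLE iface_sysdata_table (
            stmtid INTEGER NOT NULL REFERENCES iface_index_table(stmtid),
            name   TEXT NOT NULL,
            value  TEXT NOT NULL
        );
    """)


def register_node(conn, hostname):
    """Register a node and return its unique nodeid."""
    cur = conn.execute(
        "INSERT INTO iface_node_table (hostname) VALUES (?)", (hostname,))
    return cur.lastrowid


def store_ipacket(conn, nodeid, data):
    """Allocate a statement id for one packet and store its fields."""
    cur = conn.execute(
        "INSERT INTO iface_index_table (nodeid) VALUES (?)", (nodeid,))
    stmtid = cur.lastrowid
    conn.executemany(
        "INSERT INTO iface_sysdata_table (stmtid, name, value) VALUES (?, ?, ?)",
        [(stmtid, name, str(value)) for name, value in data.items()])
    return stmtid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    node = register_node(conn, "node1")
    print(store_ipacket(conn, node, {"uptime": 100, "load1": 0.2}))
```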
16
Individual Node Analysis
Clean up the iface_?data_tables by transferring and categorizing data into each node's own data table
Runs as a background process on the CCU
Examples:
  – Network statistics over time table
  – Network route change reporting
  – Network usage of nodes (packets, TCP/UDP connection counts)
  – Node's system state over time table
  – Node's configuration change table
17
Collaborative Analysis
Group specific information from the iface_?data_tables and ninfo_?data_tables to generate special tables for user viewing and analysis
Examples:
  – Network connectivity graph
  – Network graph between two nodes or routes
  – Network snapshot table
  – All nodes' current network statistics table
  – All nodes' current state table
18
Interface to View Analysis
Web interface
  – Viewable in a web browser
Web session
  – Display analysis
  – Take action input from user
Java Servlet + JSP
  – Security control
  – Data objects map to tables in collaborative analysis
19
Action
Daemon capable of receiving and executing binary programs from the CCU
Command module issues commands in response to certain events
  – Add pseudo interface to a host
  – Reroute a host
  – Initialize new program
  – Etc.
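One way to picture a command module reacting to events is a small rule table that maps conditions on a node's reported state to command names. The sketch below is purely illustrative: the state keys, the threshold, and the command names are invented, and the slides do not describe how the real command module or script language expresses rules.

```python
def decide_actions(node_state, rules):
    """Return the commands whose condition holds for this node state."""
    return [command for condition, command in rules if condition(node_state)]


# Hypothetical rules: (condition on reported state, command to issue).
RULES = [
    (lambda s: s.get("load1", 0.0) > 8.0, "initialize_new_program"),
    (lambda s: not s.get("gateway_reachable", True), "reroute_host"),
]

if __name__ == "__main__":
    print(decide_actions({"load1": 9.5, "gateway_reachable": True}, RULES))
    print(decide_actions({"load1": 0.3, "gateway_reachable": False}, RULES))
```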
20
Security
OpenSSL encryption
EchoMe Daemon runs as nobody
System modification program needs to do suexec (root password required)
21
System Stat Table
22
Transcripts for SNMP Router Discovery
……
Iterating neighbors of 169.229.51.202....
IP address: 169.229.51.161(A9E533A1)
IP address: 169.229.51.233(A9E533E9)
IP address: 169.229.51.165(A9E533A5)
IP address: 169.229.51.167(A9E533A7)
IP address: 169.229.51.168(A9E533A8)
IP address: 169.229.50.33(A9E53221)
IP address: 169.229.50.129(A9E53281)
IP address: 169.229.51.166(A9E533A6)
IP address: 169.229.51.169(A9E533A9)
IP address: 169.229.51.234(A9E533EA)
In getIPRouteTable. nHops = 8
Setting target to 169.229.51.234
……
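The value in parentheses after each address in the transcript is the same IPv4 address written as eight hexadecimal digits, one byte per octet. A short sketch of the conversion in both directions (the function names are illustrative):

```python
import ipaddress


def ip_to_hex(addr):
    """Render a dotted-quad IPv4 address as 8 uppercase hex digits."""
    return format(int(ipaddress.IPv4Address(addr)), "08X")


def hex_to_ip(hex_text):
    """Convert 8 hex digits back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(int(hex_text, 16)))


if __name__ == "__main__":
    print(ip_to_hex("169.229.51.161"))  # A9E533A1, as in the transcript
    print(hex_to_ip("A9E53221"))        # 169.229.50.33
```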
23
Partial Router Connectivity on Millennium Discovered by SNMP
[Connectivity graph; node labels: 169.229.48.1, 169.229.51.226, 169.229.51.161, 169.229.51.165, 128.32.44.10, 128.32.44.1, 169.229.51.169, 169.229.51.233, 169.229.51.167, 169.229.51.133, 169.229.51.198]
24
Conclusion
Information collection methods are feasible
Automatic discovery
Comprehensive and accurate information about the system
Needs user feedback
25
Future Work
More (or fewer) features based on user feedback
User interface
More on information analysis and decision making
Full deployment on Millennium