Published by Jacob Goodman. Modified over 8 years ago.
Update on Farm Monitor and Control
Domenico Galli, Bologna
RTTC meeting, Genève, 14 April 2004
Update on Farm Control. Domenico Galli

Outline
Test of Sub-Farm Monitor & Control software on Linux SLC3.
PVSS Boot Manager.
Changes in monitor PVSS Panels.
IPMI-DIM power manager.
Test of Sub-Farm Monitor & Control Software on Linux SLC3
The sub-farm monitor & control software (SFM 0.2) has been tested on Linux SLC3.
The lm_sensors package had to be recompiled and installed in order to monitor temperatures and fans.
The SFM 0.2 package works without recompiling.
No incompatibilities detected.
Boot Manager
A PVSS panel has been developed to configure the boot of the sub-farm nodes controlled by a control PC.
The panel allows the user to add, remove and configure the nodes of a sub-farm by specifying hostname, MAC address and IP address.
At present the panel writes a text file containing the configuration of the nodes.
The goal is to write the DHCP table for the control PC directly.
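As an illustration of that goal, the sketch below turns a (hostname, MAC address, IP address) triple into a host entry in the ISC dhcpd.conf style. It is not the panel's code: the slides do not say which DHCP server or file format the control PC uses, so the dhcpd.conf layout is an assumption, and the names Node, dhcp_host_entry and dhcp_table as well as the MAC and IP values are hypothetical.

```python
import re
from dataclasses import dataclass

# Six colon-separated hex bytes, e.g. 00:11:22:33:44:55
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


@dataclass(frozen=True)
class Node:
    """One farm node as entered in the boot-manager panel (hypothetical type)."""
    hostname: str
    mac: str
    ip: str


def dhcp_host_entry(node: Node) -> str:
    """Format one node as a dhcpd.conf-style host declaration (assumed format)."""
    mac = node.mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"malformed MAC address: {node.mac!r}")
    return (
        f"host {node.hostname} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {node.ip};\n"
        f"}}\n"
    )


def dhcp_table(nodes) -> str:
    """Concatenate the host declarations of all nodes of a sub-farm."""
    return "\n".join(dhcp_host_entry(n) for n in nodes)


if __name__ == "__main__":
    # MAC and IP addresses are made up for the example.
    sub_farm = [
        Node("SFN-001-01", "00:11:22:33:44:55", "192.168.1.11"),
        Node("SFN-001-02", "00:11:22:33:44:56", "192.168.1.12"),
    ]
    print(dhcp_table(sub_farm))
```

Validating the MAC address before writing keeps a typing error in the panel from producing a table the DHCP server would reject.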
PVSS Monitor Panels
A button has been added to all the monitor panels to configure the thresholds for the warning and error states of the state machine.
A PVSS script compares the monitored value with the threshold and, if the threshold is exceeded, triggers a state machine transition.
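The threshold logic can be sketched as follows. This is plain Python, not the PVSS script: it assumes upper thresholds (a value is "exceeded" when it is greater than the threshold), a three-state machine (ok, warning, error), and the names State, classify and MonitoredValue are hypothetical. The temperature thresholds in the demo are made up.

```python
from enum import Enum


class State(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


def classify(value, warning, error):
    """Map a reading to a state, assuming upper thresholds with warning <= error."""
    if warning > error:
        raise ValueError("warning threshold must not exceed error threshold")
    if value > error:
        return State.ERROR
    if value > warning:
        return State.WARNING
    return State.OK


class MonitoredValue:
    """Tracks the state of one monitored quantity (hypothetical helper)."""

    def __init__(self, name, warning, error):
        self.name = name
        self.warning = warning
        self.error = error
        self.state = State.OK

    def update(self, value):
        """Store the new state; return (old, new) on a transition, else None."""
        new_state = classify(value, self.warning, self.error)
        if new_state is self.state:
            return None
        old_state, self.state = self.state, new_state
        return (old_state, new_state)


if __name__ == "__main__":
    cpu_temp = MonitoredValue("cpu_temperature", warning=60.0, error=75.0)
    for reading in (45.0, 62.5, 80.0, 50.0):
        transition = cpu_temp.update(reading)
        if transition:
            old, new = transition
            print(f"{cpu_temp.name}: {reading} -> {old.value} to {new.value}")
```

Reporting only transitions, rather than the state at every reading, mirrors the slide's description of a transition being triggered when a threshold is exceeded.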
PVSS Monitor Panels (II)
If the button is pressed, a new panel opens in which an “expert user” can set the alarm thresholds.
IPMI (Intelligent Platform Management Interface)
What can IPMI be useful for?
Switching the power supply of the farm nodes on and off without using expensive network-controlled power distributors.
Monitoring the power status of the farm nodes (on/off).
Monitoring temperatures, fan speeds, power supply voltages, etc. in an OS-independent way.
Accessing the on-board event log.
IPMI Interfaces
IPMI KCS (Keyboard Controller Style) interface (AKA open interface):
  Local interface (interface to the host OS), unauthenticated.
  Can be accessed through the OpenIPMI Linux software.
  Cannot be used to switch on a PC or to power-cycle a hung PC.
LAN interface:
  Network interface, session-based, authenticated.
  Designed to be always available (even when the system is powered down or when the OS is hung or inactive).
  Hardware implementation.
  OS independent.
IPMI LAN Interface
Server side (farm node):
  Hardware implementation.
  The NIC hardware redirects to the BMC the Ethernet frames containing datagrams destined to UDP port 623.
  Configured by means of the PC startup configuration utility.
  May use DHCP to set up network parameters.
  No additional software needed.
Client side (control PC):
  Client software, e.g. the IPMItool, freeIPMI or IPMIsh Linux software.
[Diagram labels: Control PC (IPMI client); LAN; UDP port 623; Farm node; Management Network Controller; BMC (Baseboard Management Controller); other Ethernet frames.]
IPMI Power Commands
on: power up the chassis.
off: power down the chassis (without a clean shutdown of the OS).
cycle: power down, wait 1 second, and power up again.
soft_off: initiate a soft shutdown of the OS via ACPI by emulating a fatal over-temperature condition.
hard_reset: pulse the system reset signal.
pulse_diag: pulse a version of a diagnostic interrupt that goes directly to the processor(s). This is typically used to cause the operating system to do a diagnostic dump (OS dependent).
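For orientation, the command names above can be mapped onto the "chassis power" subcommands of the present-day ipmitool command-line client. The sketch below only builds the argument list and does not contact any node. The mapping is an assumption based on the current ipmitool CLI, not something shown in the slides; the function name ipmitool_argv, the host name and the user name are hypothetical. The -E option makes ipmitool read the password from the IPMI_PASSWORD environment variable, which keeps it out of the process list.

```python
# Slide command name -> ipmitool "chassis power" action (assumed mapping).
IPMITOOL_ACTION = {
    "on": "on",
    "off": "off",
    "cycle": "cycle",
    "soft_off": "soft",
    "hard_reset": "reset",
    "pulse_diag": "diag",
}


def ipmitool_argv(host, user, command):
    """Build the ipmitool argument list for one power command over the LAN interface."""
    if command not in IPMITOOL_ACTION:
        raise ValueError(f"unknown power command: {command!r}")
    return [
        "ipmitool",
        "-I", "lan",   # IPMI LAN interface (UDP port 623)
        "-H", host,    # address of the node's BMC
        "-U", user,
        "-E",          # password taken from the IPMI_PASSWORD environment variable
        "chassis", "power", IPMITOOL_ACTION[command],
    ]


if __name__ == "__main__":
    # Host and user are placeholders; nothing is executed here.
    for cmd in IPMITOOL_ACTION:
        print(" ".join(ipmitool_argv("sfn-001-01", "admin", cmd)))
```

The resulting list could be passed to subprocess.run on a control PC that has ipmitool installed and a reachable BMC.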
DIM-IPMI Power Manager
A Power Manager (based on IPMI and DIM) to switch the power to the farm nodes on and off is under development.
Each control PC runs a DIM server interfaced to IPMI and publishes, for each node, a command and a service.
DIM services:
  /SFN-001-01/power_status
  /SFN-001-02/power_status
  /SFN-001-03/power_status
DIM commands:
  /SFN-001-01/power_switch on|off|soft_off|cycle
  /SFN-001-02/power_switch on|off|soft_off|cycle
  /SFN-001-03/power_switch on|off|soft_off|cycle
[Diagram labels: Control PC running the IPMI-DIM server; farm nodes SFN-001-01 to SFN-001-05, each with a BMC, reached via IPMI; PVSS-DIM client with PVSS GUI and DIM command-line client, connected via DIM.]
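The naming scheme of the published services and commands can be sketched in a few lines. This is plain Python and does not use the DIM library. Reading "SFN-001-01" as sub-farm 001, node 01 is an assumption, and the helper names (node_names, status_service, switch_command, parse_switch) are hypothetical.

```python
# Arguments accepted by the power_switch command, as listed on the slide.
ACTIONS = ("on", "off", "soft_off", "cycle")


def node_names(sub_farm, count):
    """Generate node names such as SFN-001-01 (assumed pattern: SFN-<sub-farm>-<node>)."""
    return [f"SFN-{sub_farm:03d}-{i:02d}" for i in range(1, count + 1)]


def status_service(node):
    """Name of the DIM service publishing the power status of a node."""
    return f"/{node}/power_status"


def switch_command(node):
    """Name of the DIM command used to switch the power of a node."""
    return f"/{node}/power_switch"


def parse_switch(dim_command, argument):
    """Validate a power_switch request and return (node, action)."""
    parts = dim_command.split("/")
    if len(parts) != 3 or parts[0] != "" or not parts[1] or parts[2] != "power_switch":
        raise ValueError(f"not a power_switch command: {dim_command!r}")
    if argument not in ACTIONS:
        raise ValueError(f"unsupported action: {argument!r}")
    return parts[1], argument


if __name__ == "__main__":
    for node in node_names(sub_farm=1, count=3):
        print(status_service(node), switch_command(node))
    print(parse_switch("/SFN-001-02/power_switch", "cycle"))
```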
Status of DIM-IPMI Power Manager
We started with IPMItool’s libintf_lan.so library.
Problems:
  An IPMI response takes at least 0.7 s.
  For a disconnected node, the timeout takes about 16 s.
  A complete sequential cycle over 200 nodes, to update the farm power status, therefore takes between 140 s and 3200 s.
Solution:
  Use one thread for each node to be contacted, in order to parallelize the IPMI connections.
But:
  The libintf_lan.so library is not thread-safe (global variables, timeouts using signals + longjmp, etc.).
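The timing figures and the one-thread-per-node idea can be illustrated with a short sketch. It reproduces the arithmetic (200 nodes at 0.7 s or 16 s each) and polls a set of nodes in parallel with Python's ThreadPoolExecutor. The poll itself is a stand-in that only sleeps; a real poll function would have to be thread-safe, which is exactly the difficulty the slide reports for libintf_lan.so. The names sequential_cycle_time, poll_all and fake_poll are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sequential_cycle_time(n_nodes, seconds_per_node):
    """Time to contact every node one after the other."""
    return n_nodes * seconds_per_node


def poll_all(nodes, poll):
    """Call poll(node) for every node, using one worker thread per node."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        statuses = list(pool.map(poll, nodes))
    return dict(zip(nodes, statuses))


def fake_poll(node):
    """Stand-in for an IPMI status request (sleeps instead of using the network)."""
    time.sleep(0.05)  # scaled-down response time for the demo
    return "on"


if __name__ == "__main__":
    print(f"best case:  {sequential_cycle_time(200, 0.7):.0f} s")
    print(f"worst case: {sequential_cycle_time(200, 16):.0f} s")

    nodes = [f"SFN-001-{i:02d}" for i in range(1, 21)]
    start = time.perf_counter()
    statuses = poll_all(nodes, fake_poll)
    elapsed = time.perf_counter() - start
    print(f"polled {len(statuses)} nodes in {elapsed:.2f} s "
          f"(sequential would need about {sequential_cycle_time(len(nodes), 0.05):.2f} s)")
```

With one thread per node the cycle time approaches the slowest single response, so one disconnected node costs about one timeout rather than adding to a running total.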
Status of DIM-IPMI Power Manager (II)
DONE:
  IPMItool’s libintf_lan.so has been heavily modified in order to make it “more” thread-safe (no more global variables, no more signals and longjmps for timeouts).
  A power manager DIM server and a command-line DIM client are ready and working (tested on a Dell PowerEdge SC 1425 without OS).
TODO:
  Conflicts between commands and the status monitor on the same node must be arbitrated by the DIM-IPMI server (while the NIC BMC is processing a command, it cannot receive other commands).
  Add a mutex to the library to protect non-thread-safe system/library calls (e.g. malloc, free, etc.).
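One possible way to arbitrate between commands and status monitoring is a lock per node, so that requests to the same BMC are serialized while different nodes are still handled in parallel. The sketch below shows that idea in Python; it is not the DIM-IPMI server's design (the arbitration is still on the TODO list in the slides), and the class name NodeArbiter and the stand-in request are hypothetical.

```python
import threading
import time


class NodeArbiter:
    """Serializes requests per node; different nodes do not block each other."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the dictionary of per-node locks

    def _lock_for(self, node):
        with self._guard:
            return self._locks.setdefault(node, threading.Lock())

    def run(self, node, request):
        """Run request() while holding the lock of the given node."""
        with self._lock_for(node):
            return request()


if __name__ == "__main__":
    arbiter = NodeArbiter()
    log = []

    def make_request(label):
        def request():
            # Stand-in for an IPMI exchange with the node's BMC.
            log.append(f"{label} start")
            time.sleep(0.02)
            log.append(f"{label} end")
        return request

    # A power command and a status poll target the same node at the same time.
    workers = [
        threading.Thread(target=arbiter.run, args=("SFN-001-01", make_request(label)))
        for label in ("power_switch", "power_status")
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(log)  # each request ends before the other one starts
```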
Power Manager Command-Line Client
pwSwitch [-m hostname] on|off|(cycle|soft_off)
[Screenshot of a client session. N.B.: node lhcbcn2 is disconnected! Annotations: command time out; service time out; command time out.]
Power Manager PVSS Client
Work in progress.
Basically one PVSS panel showing:
  A list of the controlled nodes with their power status (on, off).
  Buttons for power on / off / soft_off / cycle / power_reset / pulse_diag.