Intelligent Platform Management Interface (IPMI) Monitoring and Control Ian Collier RAL Tier1 Fabric Team July 2nd 2009 HEPSYSMAN With apologies/thanks.

1 Intelligent Platform Management Interface (IPMI) Monitoring and Control
Ian Collier, RAL Tier1 Fabric Team
July 2nd 2009, HEPSYSMAN
With apologies/thanks to Massimiliano Masi at CERN

2 IPMI at RAL Tier1
- At RAL Tier1 we are just beginning to roll out significant use of IPMI
- In our new building we were able to implement a separate management network for IPMI, APC PDUs, etc.

6 What and Why
- Started in 1998, IPMI is now at revision 2.0
- It is a standard accepted by Dell, IBM, Sun, Intel and many others, including SuperMicro of course
- Goal 1: IPMI is a specification for monitoring and controlling the machine via special hardware, the Baseboard Management Controller (BMC)
- Goal 2: Serial over LAN (SOL), a method of redirecting serial connections over an Ethernet cable
- Many cards now also provide KVM over LAN, eliminating the need for expensive network KVMs!

7 What and Why?
Major IPMI concepts:
- Sensors (fan speed, CPU temperature, voltage)
- Events (what should the BMC do when the CPU temperature reaches 100 degrees? For example, send an SNMP trap)
- SDR (Sensor Data Record repository, which holds the records describing each sensor)
- SEL (System Event Log, a log of critical events)
- Session (between the client and the BMC)
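
The relationship between sensors, thresholds and events can be illustrated with a small Python sketch. It is not how any particular BMC firmware is implemented. The function name `classify_reading` is hypothetical, and treating the thresholds as inclusive is an assumption. The returned codes follow the short status codes that `ipmitool sensor` prints (`ok`, `nc`, `cr`, `nr`).

```python
from typing import Optional


def classify_reading(
    reading: float,
    lnr: Optional[float] = None,  # lower non-recoverable
    lcr: Optional[float] = None,  # lower critical
    lnc: Optional[float] = None,  # lower non-critical
    unc: Optional[float] = None,  # upper non-critical
    ucr: Optional[float] = None,  # upper critical
    unr: Optional[float] = None,  # upper non-recoverable
) -> str:
    """Return 'nr', 'cr', 'nc' or 'ok' for a reading.

    A threshold of None means "not defined for this sensor".
    Inclusive comparison is an assumption made for this sketch.
    """
    def below(limit: Optional[float]) -> bool:
        return limit is not None and reading <= limit

    def above(limit: Optional[float]) -> bool:
        return limit is not None and reading >= limit

    # Check the most severe condition first.
    if below(lnr) or above(unr):
        return "nr"
    if below(lcr) or above(ucr):
        return "cr"
    if below(lnc) or above(unc):
        return "nc"
    return "ok"


if __name__ == "__main__":
    # A CPU temperature sensor with only upper thresholds defined.
    print(classify_reading(35.0, unc=76.0, ucr=78.0, unr=80.0))
    # A stopped fan with only lower thresholds defined.
    print(classify_reading(0.0, lnr=200.0, lcr=300.0, lnc=400.0))
```

An event handler would act on any result other than `ok`, for instance by writing to the SEL or sending an SNMP trap.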

9 What and Why?
SECURITY
- Can define users
- Can define privileges
- Can encrypt communication with the BMC
- The security depends on the version of the specification
- Version 2.0: RMCP/RMCP+, based on RAKP messages (an HMAC-like protocol)
- Serial over LAN is encrypted with RMCP+ only

10 Manufacturers provide GUIs

11 Open source tools
- OpenIPMI (ipmitool)
- lm-sensors
- FreeIPMI (no drivers)

12 ipmitool sensor local output
[root@lcg0954 ~]# ipmitool sensor
CPU Temp 1       | 35.000     | degrees C  | ok     | na        | na        | na        | 76.000    | 78.000    | 80.000
CPU Temp 2       | 34.000     | degrees C  | ok     | na        | na        | na        | 76.000    | 78.000    | 80.000
CPU Temp 3       | na         | degrees C  | na     | na        | na        | na        | 76.000    | 78.000    | 80.000
CPU Temp 4       | na         | degrees C  | na     | na        | na        | na        | 76.000    | 78.000    | 80.000
Sys Temp         | 31.000     | degrees C  | ok     | na        | na        | na        | 76.000    | 78.000    | 80.000
CPU1 Vcore       | 1.184      | Volts      | ok     | 0.680     | 0.688     | 0.696     | 1.624     | 1.632     | 1.640
CPU2 Vcore       | 1.192      | Volts      | ok     | 0.680     | 0.688     | 0.696     | 1.624     | 1.632     | 1.640
3.3V             | 3.264      | Volts      | ok     | 2.912     | 2.928     | 2.944     | 3.648     | 3.664     | 3.680
5V               | 4.920      | Volts      | ok     | 4.416     | 4.440     | 4.464     | 5.520     | 5.544     | 5.568
12V              | 11.712     | Volts      | ok     | 10.464    | 10.560    | 10.656    | 13.344    | 13.440    | 13.536
1.5V             | 1.488      | Volts      | ok     | 1.296     | 1.312     | 1.328     | 1.664     | 1.680     | 1.696
5VSB             | 4.896      | Volts      | ok     | 4.416     | 4.440     | 4.464     | 5.520     | 5.544     | 5.568
VBAT             | 3.280      | Volts      | ok     | 2.912     | 2.928     | 2.944     | 3.648     | 3.664     | 3.680
Fan1             | 10500.000  | RPM        | ok     | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan2             | 8700.000   | RPM        | ok     | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan3             | 10500.000  | RPM        | ok     | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan4             | 8700.000   | RPM        | ok     | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan5             | 10400.000  | RPM        | ok     | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan6             | 8800.000   | RPM        | ok     | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan7             | 0.000      | RPM        | nr     | 200.000   | 300.000   | 400.000   | na        | na        | na
Fan8             | 0.000      | RPM        | nr     | 200.000   | 300.000   | 400.000   | na        | na        | na
Power Supply     | 0x0        | discrete   | 0x0000 | na        | na        | na        | na        | na        | na
CPU0 Internal E  | 0x0        | discrete   | 0x0000 | na        | na        | na        | na        | na        | na
CPU1 Internal E  | 0x0        | discrete   | 0x0000 | na        | na        | na        | na        | na        | na
CPU Overheat     | 0x0        | discrete   | 0x0000 | na        | na        | na        | na        | na        | na
Thermal Trip0    | 0x0        | discrete   | 0x0000 | na        | na        | na        | na        | na        | na
Thermal Trip1    | 0x0        | discrete   | 0x0000 | na        | na        | na        | na        | na        | na
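
Each row of this listing has ten pipe-separated fields: sensor name, reading, unit, status, then the three lower and three upper thresholds. As a rough illustration of how a monitoring script might consume it, here is a minimal Python sketch. The names `parse_sensor_line` and `THRESHOLD_NAMES` are made up for this example, and mapping `na` and the hex values of discrete sensors to `None` is a simplifying assumption.

```python
from typing import Dict, Optional

# Column order of the six threshold fields in `ipmitool sensor` output.
THRESHOLD_NAMES = ("lnr", "lcr", "lnc", "unc", "ucr", "unr")


def _to_float(text: str) -> Optional[float]:
    """Convert a field to float; 'na' and hex values like '0x0' become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sensor_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one row of `ipmitool sensor` output into a dict.

    Returns None for lines that do not have the expected ten fields.
    """
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 10:
        return None
    name, reading, unit, status = fields[:4]
    thresholds = {
        key: _to_float(value)
        for key, value in zip(THRESHOLD_NAMES, fields[4:])
    }
    return {
        "name": name,
        "reading": _to_float(reading),
        "unit": unit,
        "status": status,
        "thresholds": thresholds,
    }


if __name__ == "__main__":
    sample = "Fan7 | 0.000 | RPM | nr | 200.000 | 300.000 | 400.000 | na | na | na"
    print(parse_sensor_line(sample))
```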

14 ipmitool sensor remote output
# ipmitool -I lanplus -H 172.16.177.64 -U ADMIN \
    sensor get 'CPU1 Temp'
Password:
Locating sensor record...
Sensor ID              : CPU1 Temp (0x0)
Entity ID              : 3.0
Sensor Type (Discrete) : OEM reserved #c0
Note that the lanplus option encrypts communication, including passwords.
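
For scripted use, the interactive password prompt gets in the way. One possible approach, sketched here in Python, is to build the same command line with ipmitool's `-E` flag, which reads the password from the `IPMI_PASSWORD` environment variable. The helper names `ipmitool_argv` and `run_ipmitool` are hypothetical, and the host and user are simply the ones shown on the slide.

```python
import os
import subprocess
from typing import List


def ipmitool_argv(host: str, user: str, *command: str,
                  interface: str = "lanplus") -> List[str]:
    """Build an argument list for a remote ipmitool call.

    -E makes ipmitool take the password from IPMI_PASSWORD
    instead of prompting for it.
    """
    return ["ipmitool", "-I", interface, "-H", host, "-U", user, "-E", *command]


def run_ipmitool(host: str, user: str, password: str, *command: str) -> str:
    """Run ipmitool against a remote BMC and return its standard output."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        ipmitool_argv(host, user, *command),
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; nothing is sent to a BMC here.
    print(ipmitool_argv("172.16.177.64", "ADMIN", "sensor", "get", "CPU1 Temp"))
```

Passing the arguments as a list avoids the shell quoting problem with sensor names that contain spaces, such as `CPU1 Temp`.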

17 ipmitool remote power control
# ipmitool -I lanplus -H 172.16.177.64 -U ADMIN \
    power off
Password:
Chassis Power Control: Down/Off
# ipmitool -I lanplus -H 172.16.177.64 -U ADMIN \
    power status
Password:
Chassis Power is off
# ipmitool -I lanplus -H 172.16.177.64 -U ADMIN \
    power on
Password:
Chassis Power Control: Up/On
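
A script that automates this sequence would usually check `power status` before and after issuing a power command. The following Python sketch shows one way to interpret that output. The function name `parse_power_status` is hypothetical, and the parsing relies only on the `Chassis Power is on/off` wording shown above.

```python
def parse_power_status(output: str) -> bool:
    """Return True if `ipmitool power status` reports on, False if off.

    Only the last non-empty line is inspected, so a leading
    'Password:' prompt captured in the output is ignored.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty power status output")
    last = lines[-1].lower()
    if last == "chassis power is on":
        return True
    if last == "chassis power is off":
        return False
    raise ValueError(f"unrecognised power status: {lines[-1]!r}")


if __name__ == "__main__":
    print(parse_power_status("Password:\nChassis Power is off\n"))
```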

18 ipmitool remote power control = fewer visits to the machine room!

19 ipmitool serial over lan
# ipmitool -I lanplus -H 172.16.176.143 -U ADMIN \
    sol activate
Password:
[SOL Session operational. Use ~? for help]
Scientific Linux SL release 4.6 (Beryllium)
Kernel 2.6.9-78.0.22.ELsmp on an i686
gdss328.gridpp.rl.ac.uk login:

20 ipmitool serial over lan = even fewer visits to the machine room!

22 Gathering IPMI metrics in Ganglia
- A Perl script runs ipmitool sensor and pulls out the non-null values
- Metric labels vary with manufacturer and specific BMC
- Test deployment at: http://ganglia.gridpp.rl.ac.uk/ganglia/?m=load_one&r=hour&s=descending&c=Workers_SL4&h=lcg0954.gridpp.rl.ac.uk
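
The Perl script itself is not shown in the talk. As a rough idea of what such a collector involves, here is a Python sketch (not the script used at RAL) that keeps only sensors with a numeric reading and builds one `gmetric` command per sensor. The function name `gmetric_commands`, the `ipmi_` prefix and the metric naming scheme are assumptions made for this example.

```python
from typing import List


def gmetric_commands(sensor_output: str, prefix: str = "ipmi_") -> List[List[str]]:
    """Turn `ipmitool sensor` output into gmetric argument lists.

    Rows whose reading is not numeric ('na', or hex values from
    discrete sensors) are skipped.
    """
    commands = []
    for line in sensor_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3:
            continue  # not a sensor row
        name, reading, unit = fields[:3]
        try:
            float(reading)
        except ValueError:
            continue  # null or non-numeric reading
        # Hypothetical naming scheme: lower case, spaces to underscores.
        metric = prefix + name.lower().replace(" ", "_")
        commands.append([
            "gmetric",
            "--name", metric,
            "--value", reading,
            "--type", "float",
            "--units", unit,
        ])
    return commands


if __name__ == "__main__":
    sample = (
        "CPU Temp 1 | 35.000 | degrees C | ok | na | na | na | 76.000 | 78.000 | 80.000\n"
        "CPU Temp 3 | na | degrees C | na | na | na | na | 76.000 | 78.000 | 80.000\n"
        "Power Supply | 0x0 | discrete | 0x0000| na | na | na | na | na | na\n"
    )
    for argv in gmetric_commands(sample):
        print(" ".join(argv))
```

Because metric labels vary between manufacturers and BMCs, a real collector would probably need a per-model mapping from sensor names to a common set of metric names.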

23 Future
- Our new hardware has BMCs that support KVM over LAN as well, with SuperMicro's web interface
- The data gathered by Ganglia can be mined for very granular information about conditions in the machine room, indicating airflow problems etc.
- Useful in diagnosing hardware problems after the event
- Configure SNMP traps for alarms

