Cisco UCS Hardware Monitoring

This technical material presents Sentry Software’s hardware monitoring offering for BMC ProactiveNet Performance Management, applied to Cisco UCS.

Sentry’s Hardware and Storage Monitoring solution
Service Assurance Monitoring solution
Runs “within” BPPM
Complements BMC BSM
BMC exclusive
Technology: PATROL, Performance Manager, ProactiveNet

Monitoring the Hardware of a Server
Critical devices: processors, memory modules
Network cards: link monitoring, traffic
Environment: temperature, cooling, power supplies, energy usage
Disks: controllers, physical disks, RAIDs

Features
Inventory, Monitoring, Diagnosis, Reporting, Capacity Report

Inventory
Discover all of the internal components of servers, disk arrays, fiber switches and tape libraries. Perform an inventory with detailed information about each device’s characteristics.

Monitoring
Disks: RAID controllers, hard disks, RAIDs, failure prediction, availability of the volumes.
Environment: temperature, internal voltages, power supplies, fans.
Critical components: processors, memory modules, ECC errors, failure prediction.
Network links: network adapters, link loss, negotiated speed, data traffic, bandwidth utilization.

Diagnosis
Provides details about each monitored component to facilitate its replacement should a failure occur (vendor, model, serial number, part number, FRU number, location in the chassis).
Full hardware health reports display detailed information regarding failures, their consequences and how to fix them.

Reporting
Ethernet traffic report: visualize the network traffic on each port, in MB/sec, or the total amount of data that transited, in and out, in GB per hour or per day.
SAN traffic report: visualize the SAN traffic from the fiber switch, for each FC port, in MB/sec, or the total amount of data that transited, in and out, in GB per hour or per day.

Capacity Report
Convenient report detailing the capacity of the monitored system: number of physical CPUs, amount of memory, overall size of disks and volumes, number of connected ports.

Power Consumption
Live monitoring (Watts) for all servers and storage devices.
Energy usage reports (kilowatt-hours) on a daily basis.

The key features of our Cisco UCS hardware monitoring solution:
Automatic discovery of components
Monitoring of critical hardware devices
Full hardware report to help administrators identify root causes and how to fix them
Report of SAN and network traffic to check for bottlenecks
Capacity reports for capacity planning
Power consumption and temperature monitoring to reduce the electricity bill

Monitoring Cisco UCS C-Series (rack-mount)
Instrumentation
IPMI: environment, processors, memory modules, LEDs, power consumption, disks
WMI/SNMP: network cards and traffic

Prerequisites
UCS C-Series running Windows: Microsoft’s IPMI provider for WMI; WMI or Windows SNMP MIB-2 Agent
UCS C-Series running Linux: ipmitool; Linux commands or Linux SNMP MIB-2 Agent
Out-of-band monitoring: through the Cisco Integrated Management Controller (IMC) with remote IPMI

Cisco rack-mount servers are high-performance standard PC servers, running Windows or Linux, instrumented with a few standard protocols (IPMI, WMI or SNMP) and some LSI-specific components.

On Windows, Sentry’s hardware monitoring solution relies on WMI and Microsoft’s IPMI provider for WMI to monitor the environment (temperature, fans, power supplies, disks, LEDs, etc.). The NICs are monitored through the Windows NDIS provider for WMI or through the Windows SNMP MIB-2 Agent.

On Linux, Sentry’s hardware monitoring solution relies on the OpenIPMI driver and the ipmitool utility to monitor the environment (temperature, fans, power supplies, disks, LEDs, etc.). The NICs are monitored through Linux commands or through the Linux SNMP MIB-2 Agent.

It is possible to monitor a Cisco UCS C-Series rack-mount server out-of-band through its “Integrated Management Controller” (IMC), using remote IPMI. The IMC needs to be properly configured on the network, with remote IPMI enabled. While less detailed than in-band monitoring, this solution still gives a complete picture of the hardware health of the C-Series server. Please note that UCS B-Series servers don’t come with an IMC.
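
To make the IPMI-based approach more concrete, here is a minimal sketch of how sensor readings could be pulled with ipmitool and filtered for problems. It is not Sentry’s implementation. The function names and the sample text are hypothetical; the sample only mimics the "name | reading | status" layout that `ipmitool sdr` prints, and the status codes assumed here are `ok` (healthy) and `ns` (no reading). The out-of-band call assumes IPMI-over-LAN is enabled on the IMC.

```python
import subprocess

def parse_sdr(text):
    """Parse 'name | reading | status' lines as printed by `ipmitool sdr`."""
    sensors = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, reading, status = parts
        sensors.append({"name": name, "reading": reading, "status": status})
    return sensors

def failing(sensors):
    """Return sensors whose status is neither 'ok' nor 'ns' (no reading)."""
    return [s for s in sensors if s["status"] not in ("ok", "ns")]

def read_sdr_out_of_band(host, user, password):
    """Query the IMC over remote IPMI (needs ipmitool installed locally)."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", host,
           "-U", user, "-P", password, "sdr"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Illustrative sample only; real sensor names depend on the server model.
SAMPLE = """\
CPU1 Temp        | 45 degrees C      | ok
FAN1             | 5400 RPM          | ok
PSU2 Status      | 0x00              | cr
DIMM_A2          | no reading        | ns
"""

if __name__ == "__main__":
    for sensor in failing(parse_sdr(SAMPLE)):
        print(sensor["name"], sensor["status"])
```

For in-band monitoring on Linux, the same parser would apply to the output of a local `ipmitool sdr` call through the OpenIPMI driver.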

Cisco UCS C-Series running Windows
Summary of the monitored items in a Cisco UCS C-Series rack-mount server: processor status, memory module status, temperature, voltages, fans, power supplies, full network monitoring, disk controller, disks, LEDs

Cisco UCS C-Series running Linux
Summary of the monitored items in a Cisco UCS C-Series rack-mount server: processor status, memory module status, temperature, voltages, fans, power supplies, full network monitoring, disk controller, disks, LEDs

Cisco UCS C-Series Out-of-Band
Summary of the monitored items in a Cisco UCS C-Series rack-mount server: processor status, memory module status, temperature, voltages, fans, power supplies, disks, LEDs

Cisco UCS C-Series running VMware ESXi

Monitoring Cisco UCS B-Series (Blades)
Instrumentation
Through the Fabric Interconnect Switch (native UCS XML API)
Blade enclosure: powering, cooling, temperature sensors; overall power consumption; status of each blade server
Fabric Interconnect Switch: powering, cooling, temperature; power consumption; Ethernet and fiber links (status, speed); traffic monitoring (in, out, MB/sec and GB/day)
Blade servers (in the chassis): very much like regular servers, without power supplies or network cards; instrumented like a UCS C-Series server (Windows, Linux or VMware ESXi)

The Cisco UCS B-Series infrastructure consists of:
A main chassis (UCS 5100) with the blade servers (B-Series)
A couple of fabric interconnect switches

The switches are responsible for linking the blade servers to the LAN and, optionally, to a SAN. A single switch handles both types of traffic on the same backplane (everything is actually 10Gb/s Ethernet, and SAN traffic is encapsulated into Ethernet frames).

The switch is also responsible for the management of the entire platform. UCS Manager, the Cisco built-in administration tool, is actually a Web application running on the switch itself. It gives visibility into the health of the main chassis (temperature, cooling, powering), the health of the interconnect switches (temperature, cooling, powering, connectivity) and an overall status of each blade server.

In order to cover the entire UCS B-Series platform, Sentry’s hardware monitoring solution connects to the switch (through Cisco’s native UCS XML API) to gather all metrics related to the main chassis and the switch. The product also needs to connect to each blade server individually in order to gather the internal metrics that are not available through UCS Manager: storage subsystem, network traffic, a few environmental parameters. Various instrumentation standards are leveraged on the B-Series blade servers to assess the health of their internal hardware components: IPMI, WMI, and SSH.

In essence, Sentry’s hardware monitoring solution integrates Cisco UCS Manager into BMC Performance Manager: every metric and status that is available in UCS Manager’s GUI is made available in the BMC framework, and thus can be leveraged for reporting, proactive alerting, event correlation, service impact management, etc. In addition, Sentry’s hardware monitoring solution provides an in-depth view of the internal hardware components of the blade servers (which is not available in UCS Manager): processor status, memory module status, disk controllers, RAIDs, physical disks, network traffic, etc.
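
As an illustration of what talking to the UCS XML API can look like, the sketch below builds two request documents and parses a reply. It is a hedged example, not the product’s code. It assumes the `aaaLogin` and `configResolveClass` methods and the `computeBlade` class as described in Cisco’s UCS Manager XML API documentation; the helper names, the sample reply and the `operState` values shown are illustrative assumptions. No network call is made here; in practice the documents are POSTed to UCS Manager over HTTP(S), and the exact endpoint should be checked against the documentation for your UCS Manager version.

```python
import xml.etree.ElementTree as ET

def login_request(user, password):
    """Build the XML document that opens a UCS Manager session."""
    element = ET.Element("aaaLogin", inName=user, inPassword=password)
    return ET.tostring(element, encoding="unicode")

def resolve_class_request(cookie, class_id):
    """Build a query for all managed objects of one class (e.g. computeBlade)."""
    element = ET.Element("configResolveClass", cookie=cookie,
                         classId=class_id, inHierarchical="false")
    return ET.tostring(element, encoding="unicode")

def parse_blades(response_xml):
    """Extract the distinguished name and operational state of each blade."""
    root = ET.fromstring(response_xml)
    return [{"dn": blade.get("dn"), "operState": blade.get("operState")}
            for blade in root.iter("computeBlade")]

def degraded_blades(blades):
    """Blades whose reported state is anything other than 'ok'."""
    return [b["dn"] for b in blades if b["operState"] != "ok"]

# Illustrative reply only; a real reply carries many more attributes.
SAMPLE_RESPONSE = """
<configResolveClass cookie="abc" response="yes" classId="computeBlade">
  <outConfigs>
    <computeBlade dn="sys/chassis-1/blade-1" operState="ok"/>
    <computeBlade dn="sys/chassis-1/blade-2" operState="inoperable"/>
  </outConfigs>
</configResolveClass>
"""

if __name__ == "__main__":
    print(login_request("admin", "secret"))
    print(degraded_blades(parse_blades(SAMPLE_RESPONSE)))
```

The same pattern (resolve a class, read the status attributes) would apply to the chassis, fan, power supply and port objects that UCS Manager exposes.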

Cisco UCS B-Series (Blade Servers)
Summary of the monitored items:
Cisco UCS switch: powering, cooling, temperatures, status of each port, link failure detection, link downgrade detection, traffic reporting, power consumption
Cisco UCS chassis: powering, cooling, temperatures (external and internal), status of each blade
Cisco UCS B-Series blade servers: processor status, memory module status, temperature, voltages, full network monitoring, disk controller, disks and RAIDs

How it works internally
Initialization: checks protocol availability with the specified credentials (SNMP, WMI, WBEM, IPMI, UCS XML API)
Platform detection: tests each of the connectors against the monitored system (B-Series, C-Series; Windows, Linux, VMware) and builds a “detected connectors” list
Discovery: discovers hardware pieces, detects “missing” components, sets alert thresholds on all parameters, activates or deactivates parameters depending on the available information
Collection: collects the value of each parameter and executes “Alert Actions” when a threshold is breached

Sentry’s Hardware Monitoring products and the Cisco UCS-specific code work in a very similar way:
Initialization: the product loads its configuration and checks the availability of the various standard monitoring protocols with the credentials provided by the user.
Platform detection: every connector is tested against the platform to determine which instrumentation technology is present “underneath”.
Discovery: the discovery relies on the detected connectors to discover the hardware pieces of the monitored system: processors, cards, disks, sensors, etc. It also enables or disables parameters depending on the information available on the system and automatically sets all the thresholds. Last, the discovery triggers an alert when a device is no longer discovered and marks it as “missing”.
Collection: the collection process runs every 2 minutes, retrieves the status of all devices and executes alert actions when a failure is detected.
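
The detection, discovery and collection phases described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the product’s internal code: connectors are modeled as plain dictionaries with hypothetical `detect` and `discover` callables, and `SYSTEM` is a made-up description of a monitored host.

```python
def detect_connectors(connectors, system):
    """Platform detection: keep only the connectors whose test passes."""
    return [c for c in connectors if c["detect"](system)]

def discover(detected, system, previously_known):
    """Discovery: map each device to its connector and flag vanished devices."""
    found = {}
    for connector in detected:
        for device in connector["discover"](system):
            found[device] = connector["name"]
    missing = sorted(set(previously_known) - set(found))
    return found, missing

def collect(found, read_status, on_alert):
    """Collection: read each device status and run the alert action on failure."""
    for device in sorted(found):
        status = read_status(device)
        if status != "ok":
            on_alert(device, status)

# Hypothetical system description, used only for this illustration.
SYSTEM = {"protocols": {"ipmi"}, "devices": {"ipmi": ["cpu0", "fan1"]}}
CONNECTORS = [
    {"name": "ipmi",
     "detect": lambda s: "ipmi" in s["protocols"],
     "discover": lambda s: s["devices"].get("ipmi", [])},
    {"name": "wmi",
     "detect": lambda s: "wmi" in s["protocols"],
     "discover": lambda s: s["devices"].get("wmi", [])},
]

if __name__ == "__main__":
    detected = detect_connectors(CONNECTORS, SYSTEM)
    found, missing = discover(detected, SYSTEM, ["cpu0", "fan1", "psu2"])
    print("missing:", missing)
    collect(found, lambda d: "failed" if d == "fan1" else "ok",
            lambda d, s: print("ALERT", d, s))
```

In a real deployment the `collect` step would be scheduled at the collection interval (every 2 minutes in the description above) rather than called once.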

Use Case: Monitoring the Hardware of a Server
Same module for all servers
Cisco UCS B-Series, C-Series, Chassis and Interconnect
Windows, Linux, VMware
Same classes and parameters: easy integration

Comprehensive: temperature sensors, fans, power supplies, controllers, processors, memory modules, network cards, disks, HBAs, etc.
Versatile: SNMP, WMI, WBEM, Telnet, SSH, command lines

Upon hardware failure, a standard alert is generated
The alert contains a full text description of the problem:
Short description, status reported by the device
Value of the various parameters and thresholds
Possible consequences and recommended action
Help to identify and replace the faulty device

Use Case: Instrumentation Failures
Our product relies on various protocols and instrumentation layers
If one protocol or instrumentation layer fails:
The associated “connector” goes into alarm
The objects monitored through this protocol or instrumentation layer go “offline”
No hardware alert is generated
This helps you sort out real hardware problems from monitoring problems
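
The rule above (a broken instrumentation layer raises a connector alarm, not hardware alerts) can be expressed as a small decision function. This is a hypothetical sketch: the function name, the state labels and the dictionary layout are assumptions chosen for illustration.

```python
def evaluate(connector_ok, device_statuses):
    """Separate monitoring failures from genuine hardware failures.

    connector_ok: whether the protocol/instrumentation layer answered.
    device_statuses: last known status per device, e.g. {"fan1": "ok"}.
    """
    if not connector_ok:
        # The layer itself failed: alarm on the connector, take its
        # devices offline, and raise no hardware alert at all.
        return {
            "connector": "ALARM",
            "devices": {device: "offline" for device in device_statuses},
            "hardware_alerts": [],
        }
    # The layer works: any non-ok device is a real hardware problem.
    alerts = [d for d, status in device_statuses.items() if status != "ok"]
    return {
        "connector": "OK",
        "devices": dict(device_statuses),
        "hardware_alerts": alerts,
    }

if __name__ == "__main__":
    statuses = {"cpu0": "ok", "fan1": "failed"}
    print(evaluate(True, statuses))
    print(evaluate(False, statuses))
```

Keeping the two cases in separate branches is what prevents a dead SNMP agent or WMI provider from showing up as a wave of false hardware alarms.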

Use Case: Hardware Inventory
Our product discovers and reports on the real hardware components
Real number of CPUs (no cores, no hyper-threading)
Real number of memory modules
Real number of physical disks (not only what is seen by the OS)
Real number of network interfaces (not only the ones configured)
Link speed for network interfaces
Other additional information

Benefits
True hardware inventory (not just what is seen by the OS)
Licensing based on the number of CPUs

Use Case: Monitoring the Traffic on the Interconnect
Monitoring of the Ethernet and FC traffic
Internal and external
For each port: status of the SFP, link speed and status, received and transmitted packets, error percentage, traffic (received and transmitted), bandwidth utilization

Reporting
MB/sec
Total amount of data in GB per day

Benefits
Identify big users (servers)
Analyze the impact of the nightly backups, the mirroring
Analyze the impact of the deployment of a new application
Diagnose multi-pathing issues
Identify disk arrays under hard pressure
Etc.
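
The reported figures (MB/sec, bandwidth utilization, GB per day) can all be derived from two successive readings of a port’s byte counter. The sketch below shows the arithmetic under stated assumptions: decimal units (1 MB = 10^6 bytes, 1 GB = 1000 MB), a 120-second interval between readings, and no counter wrap-around. The function names are hypothetical.

```python
def traffic_rate_mb_s(bytes_prev, bytes_now, seconds):
    """Average throughput in MB/sec between two counter readings."""
    return (bytes_now - bytes_prev) / seconds / 1_000_000

def utilization_pct(rate_mb_s, link_speed_gbps):
    """Share of the negotiated link speed in use, in percent."""
    link_mb_s = link_speed_gbps * 1000 / 8  # Gb/s to MB/s
    return 100 * rate_mb_s / link_mb_s

def gb_transferred(rates_mb_s, interval_s):
    """Total data moved over a series of equally spaced rate samples, in GB."""
    return sum(rate * interval_s for rate in rates_mb_s) / 1000

if __name__ == "__main__":
    # 15 GB moved during one 120-second collection interval.
    rate = traffic_rate_mb_s(0, 15_000_000_000, 120)
    print(rate, "MB/sec")
    print(utilization_pct(rate, 10), "% of a 10 Gb/s link")
    # 720 samples of 120 seconds cover one day.
    print(gb_transferred([rate] * 720, 120), "GB per day")
```

A production collector would also have to handle counter resets and 32-bit counter wrap-around, which this sketch deliberately leaves out.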

Use Case: Reporting the Power Consumption in the Data Center
Live graph: in Watts
Report: in kWh, per hour and per day
Lets you calculate the actual energy cost of any system

Benefits
“If you can’t measure it, you can’t manage it”
Identify power-hungry devices
Estimate the cost reduction provided by virtualization, upgrades, etc.
Charge-back application owners
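
Turning live power readings in Watts into an energy report in kWh is a simple integration over time. The following sketch assumes equally spaced samples (one every 120 seconds) and a hypothetical electricity price; both values and the function names are illustrative, not taken from the product.

```python
def energy_kwh(watt_samples, interval_s):
    """Energy consumed over equally spaced power samples, in kWh."""
    watt_seconds = sum(watt_samples) * interval_s
    return watt_seconds / 3_600_000  # 1 kWh = 3,600,000 watt-seconds

def energy_cost(kwh, price_per_kwh):
    """Cost of the consumed energy at a flat price per kWh."""
    return kwh * price_per_kwh

if __name__ == "__main__":
    # A server drawing a steady 350 W, sampled 720 times in one day.
    daily_kwh = energy_kwh([350] * 720, 120)
    print(round(daily_kwh, 2), "kWh per day")
    # 0.12 per kWh is an assumed price, for illustration only.
    print(round(energy_cost(daily_kwh, 0.12), 2), "per day")
```

Summing these per-device figures by application or business unit is one way to support the charge-back use mentioned above.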

Use Case: Warming the Data Center
Cooling costs = 50% of the electricity costs in the data center
P.U.E. = 2.0+
Measure the temperature: internal, CPUs, ambient
Compare to the alert thresholds
Lets you find the optimal temperature in the data center
Higher temperature means less cooling
1 degree warmer means a 5% reduction in cooling costs
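
The two figures on this slide combine into a quick estimate. The sketch below assumes the 5% reduction applies linearly per degree, which is a simplification that only makes sense for small temperature changes; the function name and the electricity bill used in the example are hypothetical.

```python
def cooling_savings(total_electricity_cost, degrees_warmer,
                    cooling_share=0.50, saving_per_degree=0.05):
    """Estimated cooling savings for a small rise of the set temperature.

    cooling_share and saving_per_degree default to the slide's figures
    (cooling = 50% of electricity costs, 5% saved per degree).
    """
    cooling_cost = total_electricity_cost * cooling_share
    return cooling_cost * saving_per_degree * degrees_warmer

if __name__ == "__main__":
    # Assumed yearly electricity bill of 100,000 and a 2-degree increase.
    print(round(cooling_savings(100_000, 2), 2))
```

Any increase has to stay within the alert thresholds of the monitored CPUs and ambient sensors, which is why the temperature is measured before the set point is changed.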

Not only Cisco… List of Supported Servers
Cisco: UCS B-Series, UCS C-Series
Dell: PowerEdge (Win, Linux), Blades
HP: ProLiant (Win, Linux), Integrity (Win, Linux, HP-UX), HP 9000 (HP-UX), NetServer (Win), SuperDome (HP-UX), BladeSystem, AlphaServer (Tru64), OpenVMS
IBM: pSeries (AIX), eServer p5 (AIX), Netfinity (Win, Linux), xSeries (Win, Linux), BladeCenter
Sun: SPARC (sun4u), SPARC T1/T2 (sun4v), X64 (Solaris, Linux), Sun Fire F12K, F15K, etc., Sun Fire M4000, M9000, etc.
Fujitsu-Siemens: PRIMERGY (Win, Linux), PRIMEPOWER (Solaris), Blade BX

And also SAN Devices
Disk arrays: EMC Symmetrix; EMC Clariion; HP EVA; HP StorageWorks XP, VA, EMA, MSA; IBM DS3000, 4000, 6000, 8000 series; Hitachi USP, AMS
Filers: NetApp
Fiber switches: Brocade Silkworm, McData, Cisco MDS
Tape libraries: Quantum/ADIC, IBM, HP, StorageTek
HBA: Emulex, QLogic

Sentry’s Hardware and Storage Monitoring Solution
Why BMC and Sentry

Goal
Improve uptime, optimize performance
Lower IT costs
Manage energy costs

Where it Hurts
Missed hardware failures
Long time to resolve problems: hardware-related problems, SAN-related problems
Problems involve sysadmins, network admins and SAN admins. Hard to arbitrate
Integrating the monitoring of a new platform is complex and time-consuming
Energy expenses keep climbing every year
Frustrating not to know the culprits and what to do about it

Root Cause
Lack of visibility on server and SAN hardware health
Lack of visibility on SAN performance
Hardware instrumentation is vendor-specific and sometimes even lacking
No per-device visibility on power consumption

Solved by
Sentry’s Hardware and Storage Monitoring Solution

How
Monitors the hardware: servers, disk arrays, SAN switches, tape libraries; disks, RAIDs, power supplies, NICs, HBAs, processors, etc.
Discovery, Inventory, Monitoring, Diagnosis, Reporting, Data traffic monitoring
Single solution for all Cisco B-Series, C-Series, all OSes
Monitors the power consumption on each server and SAN device, in Watts and kWh
Works on 100% of IT

Really?
IBM (outsourcing) chose BMC+Sentry over their own solutions (Tivoli and Director) to make sure they meet their customer’s SLA criteria
Dell chose Sentry to integrate their own “OpenManage” solution with BPM, BEM, etc.
