1 OFED Management Tools Ira Weiny Lawrence Livermore National Lab OFED Developer Workshop November 16, 2007.

Presentation transcript:

1 OFED Management Tools
Ira Weiny
Lawrence Livermore National Lab
OFED Developer Workshop
November 16, 2007

2 Clusters
Peloton: Zeus 288; Rhea 576; Atlas 1152; Minos 864
Visualization: Gauss 257; Prism 129; Mobius 17; Vertex 17; Stagg 10; Boole 6; Grant 6
Total InfiniBand-connected nodes at LLNL: 3322
 - Not including test resources
 - And more on the way!

3 LLNL OFED improvements
node-name-map support in diags/OpenSM
Performance Manager
OpenSM event plugin (libopensmskummeeplugin)
OpenSM console (working on secure connection)

4 node-name-map for better logging
BEFORE
SUBNET UP
...Found 3 Xmit Discards in 5 sec on node 0x2c e64 port 1
...Found 2 Xmit Discards in 5 sec on node 0x2c port 1
...Found 2 Xmit Discards in 5 sec on node 0x2c ec port 1
AFTER
SUBNET UP
...Found 3 Xmit Discards in 5 sec on wopri (0x2c e64) port 1
...Found 2 Xmit Discards in 5 sec on wopr4 (0x2c ) port 1
...Found 2 Xmit Discards in 5 sec on wopr3 (0x2c ec) port 1
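The renaming shown on this slide can be sketched in a few lines of Python. This is an illustration only, not the OpenSM or diags implementation. It assumes a map file with one `<GUID> "<name>"` pair per line and `#` comments; the GUIDs, the node names, and the helpers `parse_node_name_map` and `annotate` are hypothetical.

```python
import re

def parse_node_name_map(text):
    """Parse lines of the assumed form: 0x<guid> "<name>"; '#' starts a comment."""
    names = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        m = re.match(r'^(0x[0-9a-fA-F]+)\s+"([^"]*)"$', line)
        if m:
            names[int(m.group(1), 16)] = m.group(2)
    return names

def annotate(log_line, names):
    """Rewrite 'node 0x<guid>' as '<name> (0x<guid>)' when the GUID is in the map."""
    def repl(m):
        name = names.get(int(m.group(1), 16))
        return f"{name} ({m.group(1)})" if name else m.group(0)
    return re.sub(r"node (0x[0-9a-fA-F]+)", repl, log_line)

# Hypothetical map contents, for illustration only.
EXAMPLE_MAP = '''
# guid               name
0x0002c90000000001 "nodeA"
0x0002c90000000002 "nodeB"
'''

if __name__ == "__main__":
    names = parse_node_name_map(EXAMPLE_MAP)
    line = "Found 3 Xmit Discards in 5 sec on node 0x0002c90000000001 port 1"
    print(annotate(line, names))
```

Unknown GUIDs are left untouched, so a partial map still gives readable logs for the nodes it covers.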

5 OpenSM PerfMgr
OpenSM $ perfmgr
Performance Manager status:
   state                   : Enabled
   sweep state             : Sleeping
   sweep time              : 5s
   outstanding queries/max : 0/500
   loaded event plugin     : opensmskummeeplugin
OpenSM $ help perfmgr
perfmgr [enable|disable|clear_counters|dump_counters|sweep_time[seconds]]
   perfmgr -- print the performance manager state
   [enable|disable] -- change the perfmgr state
   [sweep_time] -- change the perfmgr sweep time
   [clear_counters] -- clear the counters stored
   [dump_counters [mach]] -- dump the counters (optionally in [mach]ine readable format)
OpenSM $
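For scripted monitoring, the status text from the console could be turned into a dictionary. A minimal sketch, assuming the output keeps the `key : value` layout shown on this slide; `parse_perfmgr_status` is a hypothetical helper and not part of OpenSM.

```python
def parse_perfmgr_status(text):
    """Collect 'key : value' lines into a dict; lines without ' : ' are skipped."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status

# Sample text copied from the console output on this slide.
SAMPLE = """Performance Manager status:
   state                   : Enabled
   sweep state             : Sleeping
   sweep time              : 5s
   outstanding queries/max : 0/500
   loaded event plugin     : opensmskummeeplugin
"""

if __name__ == "__main__":
    for key, value in parse_perfmgr_status(SAMPLE).items():
        print(f"{key} = {value}")
```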

6 Skummee
Skummee is an open-source, web-based cluster monitoring package.

7 libopensmskummeeplugin
mysql> select name,port,xmit_data,rcv_data from port_data_counters,nodes where port_data_counters.guid=nodes.guid;
| name | port | xmit_data | rcv_data |
| wopri | 1 | | |
| MT25218 InfiniHostEx Mellanox Technologies | 1 | | |
| wopr4 | 1 | | |
| MT25218 InfiniHostEx Mellanox Technologies | 1 | | |
| wopr3 | 1 | | |
| wopr5 | 1 | | |
| SW1 wopr ISR9024D (MLX4 FW) | 1 | | |
| SW1 wopr ISR9024D (MLX4 FW) | 2 | | |
| SW1 wopr ISR9024D (MLX4 FW) | 3 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 4 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 5 | | |
| SW1 wopr ISR9024D (MLX4 FW) | 6 | | |
| SW1 wopr ISR9024D (MLX4 FW) | 7 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 8 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 9 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 10 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 11 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 12 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 13 | | |
| SW1 wopr ISR9024D (MLX4 FW) | 14 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 15 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 16 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 17 | | |
| SW1 wopr ISR9024D (MLX4 FW) | 18 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 19 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 20 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 21 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 22 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 23 | 0 | 0 |
| SW1 wopr ISR9024D (MLX4 FW) | 24 | 0 | 0 |
rows in set (0.00 sec)
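The query on this slide joins per-port counters to node names through the shared guid column. The same join can be tried without a MySQL server using Python's built-in sqlite3 module as a stand-in. The table and column names come from the slide; the column types, the sample rows, and the helper names are assumptions made for this sketch and do not reflect the plugin's real schema.

```python
import sqlite3

def make_example_db():
    """Build an in-memory database with invented rows (types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (guid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE port_data_counters "
        "(guid INTEGER, port INTEGER, xmit_data INTEGER, rcv_data INTEGER)"
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [(1, "nodeA"), (2, "switch1")],
    )
    conn.executemany(
        "INSERT INTO port_data_counters VALUES (?, ?, ?, ?)",
        [(1, 1, 100, 200), (2, 1, 200, 100), (2, 2, 0, 0)],
    )
    return conn

def port_counters_by_name(conn):
    """Join counters to node names on guid, as the slide's query does."""
    cur = conn.execute(
        "SELECT nodes.name, c.port, c.xmit_data, c.rcv_data "
        "FROM port_data_counters AS c JOIN nodes ON c.guid = nodes.guid "
        "ORDER BY nodes.name, c.port"
    )
    return cur.fetchall()

if __name__ == "__main__":
    for row in port_counters_by_name(make_example_db()):
        print(row)
```

An explicit JOIN with an ORDER BY is used here instead of the comma join on the slide; both express the same match on guid, and the ordering only makes the output deterministic.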

8 Issues
Diags are better now, but still need work
Require sweeping the network
 - OK for diagnosing some problems, but can be time-consuming and increase load for normal monitoring
Subnet must be "up" for tools to work

9 Possible Solutions
Integrate more with OpenSM
 - OpenSM knows more about the subnet; leverage this information for "normal" monitoring
 - Use event plugin and console
Improve diags through the use of out-of-band information
 - At LLNL this involves the use of an Ethernet "management" network
 - Other solutions may be to use known subnet configuration to compare against
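The last idea, comparing against a known subnet configuration, amounts to a set difference between expected and discovered links. A minimal sketch under stated assumptions: a link is modeled as a pair of (node name, port) endpoints, and the names, data, and the helper `compare_topology` are hypothetical rather than taken from any OFED tool.

```python
def normalize(link):
    """Order the two endpoints so that A-B and B-A compare as the same link."""
    a, b = link
    return (a, b) if a <= b else (b, a)

def compare_topology(expected, discovered):
    """Return links that are missing from, or unexpected in, the discovered set."""
    exp = {normalize(link) for link in expected}
    got = {normalize(link) for link in discovered}
    return {"missing": sorted(exp - got), "unexpected": sorted(got - exp)}

# Invented example data: each endpoint is (node name, port number).
EXPECTED = [
    (("nodeA", 1), ("switch1", 1)),
    (("nodeB", 1), ("switch1", 2)),
]
DISCOVERED = [
    (("switch1", 1), ("nodeA", 1)),  # same link as expected, endpoints swapped
    (("nodeC", 1), ("switch1", 2)),  # miscabled: nodeC where nodeB should be
]

if __name__ == "__main__":
    report = compare_topology(EXPECTED, DISCOVERED)
    print("missing:", report["missing"])
    print("unexpected:", report["unexpected"])
```

Where the expected list comes from (a cabling plan, a saved earlier discovery, or the management network) is left open here, as it is on the slide.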

10 Where's the code?
It can still be hard to determine the actual source for the OFED kernel
ofed_makedist.sh is a BIG help!
However, how do we know if it is pulling the correct OFED version?

11 Thanks to
Hal Rosenstock (Xsigo)
Sasha Khapyorsky (Voltaire)
Tim Meier (LLNL)
Al Chu (LLNL)