Overview of cluster management tools Marco Mambelli – August 9 2011 OSG Summer Workshop TTU - Lubbock, TX THE UNIVERSITY OF CHICAGO.

Slides:



Advertisements
Similar presentations
automated single login access to Novell storage resources
Advertisements

Instant JChem - current status and what's coming soon. Tim Dudgeon Solutions for Cheminformatics.
Cut Costs and Increase Productivity in your IT Organization with Effective Computer and Network Monitoring. Copyright © T3 Software Builders, Inc 2004.
Windows Deployment Services WDS for Large Scale Enterprises and Small IT Shops Presented By: Ryan Drown Systems Administrator for Krannert.
Leveraging WinPE and Linux Preboot for Effective Provisioning Jonathan Richey | Director of Development | Altiris, Inc.
Cacti Workshop Tony Roman Agenda What is Cacti? The Origins of Cacti Large Installation Considerations Automation The Current.
Server-Side vs. Client-Side Scripting Languages
Firefox 2 Feature Proposal: Remote User Profiles TeamOne August 3, 2007 TeamOne August 3, 2007.
Configurations Management System Chris Boyd.  Time consuming task of provisioning a number of systems with STIG compliance  Managing a number of systems.
Automating Linux Installations at CERN G. Cancio, L. Cons, P. Defert, M. Olive, I. Reguero, C. Rossi IT/PDP, CERN presented by G. Cancio.
DCS PowerEdge C Systems Management Overview PowerEdge C Product Group.
Monitoring Scale-Out with the MySQL Enterprise Monitor Andy Bang Lead Software Engineer MySQL-Sun, Enterprise Tools Team Wednesday, April 16, :15.
Linux Operations and Administration
INFSO-RI Quality Assurance with ETICS – multi- node automated testing CGW 09 M.Zurek, A. A. Rodriguez, A. Aimar, A. di Meglio, L. Dini CERN Krakow,
Conditions and Terms of Use
An Introduction to IBM Systems Director
Inventory:OCSNG + GLPI Monitoring: Zenoss 3
Weekly Report By: Devin Trejo Week of May 30, > June 5, 2015.
EDG LCFGng: concepts Fabric Management Tutorial - n° 2 LCFG (Local ConFiGuration system)  LCFG is originally developed by the.
Bright Cluster Manager Advanced cluster management made easy Dr Matthijs van Leeuwen CEO Bright Computing Mark Corcoran Director of Sales Bright Computing.
October, Scientific Linux INFN/Trieste B.Gobbo – Compass R.Gomezel - T.Macorini - L.Strizzolo INFN - Trieste.
Rocks ‘n’ Rolls An Introduction to Programming Clusters using Rocks © 2008 UC Regents Anoop Rajendra.
1 Apache. 2 Module - Apache ♦ Overview This module focuses on configuring and customizing Apache web server. Apache is a commonly used Hypertext Transfer.
FNAL System Patching Design Jack Schmidt, Al Lilianstrom, Andy Romero, Troy Dawson, Connie Sieh (Fermi National Accelerator Laboratory) Introduction FNAL.
XA R7.8 Link Manager Belinda Daub Sr. Technical Consultant 1.
INFSO-RI Module 01 ETICS Overview Alberto Di Meglio.
National Center for Supercomputing Applications NCSA OPIE Presentation November 2000.
Contents 1.Introduction, architecture 2.Live demonstration 3.Extensibility.
INFSO-RI Module 01 ETICS Overview Etics Online Tutorial Marian ŻUREK Baltic Grid II Summer School Vilnius, 2-3 July 2009.
Configuration Management with Cobbler and Puppet Kashif Mohammad University of Oxford.
FailSafe SGI’s High Availability Solution Mayank Vasa MTS, Linux FailSafe Gatekeeper
Graphing and statistics with Cacti AfNOG 11, Kigali/Rwanda.
Support in setting up a non-grid Atlas Tier 3 Doug Benjamin Duke University.
Tool Integration with Data and Computation Grid GWE - “Grid Wizard Enterprise”
And Tier 3 monitoring Tier 3 Ivan Kadochnikov LIT JINR
Performance Monitoring of SLAC Blackbox Nodes Using Perl, Nagios, and Ganglia Roxanne Martinez Mentor: Yemi Adesanya United States Department of Energy.
New Delhi, India Smokeping/Cacti/Munin SANOG 10 Workshop August 29-Sep 2 – New Delhi, India Hervey Allen.
1 PUPPET AND DSC. INTRODUCTION AND USAGE IN CONTINUOUS DELIVERY PROCESS. VIKTAR VEDMICH PAVEL PESETSKIY AUGUST 1, 2015.
Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Usage of virtualization in gLite certification Andreas Unterkircher.
Experiment Management System CSE 423 Aaron Kloc Jordan Harstad Robert Sorensen Robert Trevino Nicolas Tjioe Status Report Presentation Industry Mentor:
Microsoft Management Seminar Series SMS 2003 Change Management.
PerfSONAR-PS Functionality February 11 th 2010, APAN 29 – perfSONAR Workshop Jeff Boote, Assistant Director R&D.
FEF Puppet Implementation Project Jason Allen 8/18/2010.
MySQL and GRID status Gabriele Carcassi 9 September 2002.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
Module 12: Configuring and Managing Storage Technologies
Maite Barroso - 10/05/01 - n° 1 WP4 PM9 Deliverable Presentation: Interim Installation System Configuration Management Prototype
Linux Operations and Administration
Tool Integration with Data and Computation Grid “Grid Wizard 2”
IBM Express Runtime Quick Start Workshop © 2007 IBM Corporation Deploying a Solution.
Site Authorization Service Local Resource Authorization Service (VOX Project) Vijay Sekhri Tanya Levshina Fermilab.
Tier 3 Support and the OSG US ATLAS Tier2/Tier3 Workshop at UChicago August 20, 2009 Marco Mambelli –
Quattor tutorial Introduction German Cancio, Rafael Garcia, Cal Loomis.
Cloud Installation & Configuration Management. Outline  Definitions  Tools, “Comparison”  References.
Platform & Engineering Services CERN IT Department CH-1211 Geneva 23 Switzerland t PES Agile Infrastructure Project Overview : Status and.
EGI-InSPIRE RI Pakiti Michal Prochazka, (Daniel Kouril)
Managing Large Linux Farms at CERN OpenLab: Fabric Management Workshop Tim Smith CERN/IT.
Linux Basics Part 2. VIM Editor vi improved Installed on most Linux machines Can be a bit confusing at first... o Cheat sheets FTW Other popular editors:
1 Policy Based Systems Management with Puppet Sean Dague
Distributed Monitoring with Nagios: Past, Present, Future Mike Guthrie
April 1st, 2009 Cobbler Provisioning Made Easy Jasper Capel.
Puppet and Cobbler for the configuration of multiple grid sites
Application or server monitoring
Quattor Usage at Nikhef
Spacewalk and Koji at Fermilab
Drupal VM and Docker4Drupal For Drupal Development Platform
Drupal VM and Docker4Drupal as Consistent Drupal Development Platform
Module 01 ETICS Overview ETICS Online Tutorials
Linux Cluster Tools Development
Pete Gronbech, Kashif Mohammad and Vipul Davda
Presentation transcript:

Overview of cluster management tools Marco Mambelli – August OSG Summer Workshop TTU - Lubbock, TX THE UNIVERSITY OF CHICAGO

Cluster Management Overview Management infrastructure Provisioning Configuration and software package management Monitoring 8/9/11 Overview of cluster management tools 2

Management Infrastructure Remote power-cycling and serial console access FEF has standardized on Avocent ACS serial console servers and Avocent PM line of PDU Other depts use IPMI or a mix of Avocent and APC products All depts have scripts to control and configure remote power- cycling and serial console access 8/9/11 Overview of cluster management tools 3

CMS Tier 1 Power Usage Plots CMS plots power utilization by querying PDUs using SNMP. This data can be particularly useful to datacenter managers 8/9/11 Overview of cluster management tools 4

Provisioning Tools  Preparing a system for use; OS installation and initial configuration Tools  PXE/Kickstart  Rocks  Cobbler  Perceus 8/9/11 Overview of cluster management tools 5

FEF’s PXE/Kickstart Setup FEF uses custom tool built on top of MySQL and Perl DCHP server modules No dhcpd restart required Web front-end for specifying kickstart / nodecombinations Very flexible Kickstart files are created dynamically based on selections from the web GUI Planning to eval Cobbler later this year. 8/9/11 Overview of cluster management tools 6

Rocks Clusters Open source Linux distribution based on CentOS Created in 2000 Created for easy deployment of large clusters Used by CMS Tier 1 at Fermilab for 2400 machines CMS can install ~500 systems in 1 hour Only one OS version per Rocks server may be a deal breaker for some 8/9/11 Overview of cluster management tools 7

Cobbler Cobbler is infrastructure to provision a node Cobbler is RedHat specific, uses kickstart Cobbler holds OS, profiles and system data A new system requires a profile, a MAC address and a name, nothing more. Provisions new system via PXE boot, via settings for individual system. CLI and Web GUI 8/9/11 8 Overview of cluster management tools

Cobler at MWT2 Cobbler is implemented through multiple services on a single server. It acts as a TFTP server for PXE booting, controls the repositories used for install, and provides DHCP/DNS support as well. 8/9/11 Overview of cluster management tools 9

Configuration and Package Management Tools that help manage system configuration (files, dirs, permissions, etc.) and software packages Popular open source solutions  Cfengine  Puppet  Bcfg2  Warewulf 8/9/11 Overview of cluster management tools 10

Cfengine First version released in 1993 Written in C Fairly easy to understand syntax Relatively easy to find sysadmins with experience FEF and UWM used for many years 8/9/11 Overview of cluster management tools 11

Puppet New generation of configuration management system Extensible, declarative language Understands dependencies (huge benefit) Better reporting than Cfengine Auto generation of documentation (think Javadoc) 8/9/11 Overview of cluster management tools 12

FEF Puppet Usage Management of all external mounts Kerberos files -- keytab files,.k5login, etc Package management (RPM sets grouped by cluster) NIC bonding configs Group quotas Grid host certs FEF_backup FEF avg is 325 actions per node every Puppet run 8/9/11 Overview of cluster management tools 13

Cfengine vs. Puppet (High Level) 8/9/11 Overview of cluster management tools 14

Puppet Add User Example 8/9/11 Overview of cluster management tools 15

Cfengine Add User Example 8/9/11 Overview of cluster management tools 16

Example Puppet Dependency Graph 8/9/11 Overview of cluster management tools 17

Puppet Dashboard Puppet dashboard is a web interface for quickly viewing puppet run status and the state of individual system configurations. 8/9/11 Overview of cluster management tools 18

Bcfg2 BCFG2 is an xml-based configuration management system Developed by Argonne National Lab Being used by CMS Tier 1 at Fermilab for 2 years to manage a limited number of configuration items. Not RPMS Developers are very responsive; provide support via mailing list and IRC Complex file manipulation can be tricky Does simple pre/post dependencies 8/9/11 Overview of cluster management tools 19

Bcfg2 Reporting System Node status and last run times are viewable from the Bcfg2 web interface 8/9/11 Overview of cluster management tools 20

Monitoring Tools to help monitor system state and performance In use at Fermilab:  Zabbix  Nagios  Ganglia  Many custom solutions using MRTG, RRDtool, etc 8/9/11 Overview of cluster management tools 21

Zabbix Being used by CMS Tier 1 to monitor approx 2.4K nodes; performs 100K checks. Does status and performance monitoring. Relatively new compared to Nagios. Most configuration is done via the web interface. Easy to add custom checks and alerts. 8/9/11 Overview of cluster management tools 22

Zabbix Dashboard Zabbix provides a polished web interface that displays finely grained status and performance information 8/9/11 Overview of cluster management tools 23

Nagios Used by FEF; 3.5K nodes, approx 30K checks on one server Around for many years. Create a new check by dropping shell script on node (check_mk plugin) Nagios support built-in to Puppet. Web interface can be slow and feels dated. 8/9/11 Overview of cluster management tools 24

Nagios Web Interface The Nagios web interface is functional but feels dated. Performance is an issue when monitoring many hosts 8/9/11 Overview of cluster management tools 25

OSG Survey The most used cluster management tool is Rocks, sometime customized Followed by Puppets, Cobbler and Cfengine. Users are happy with the performance of the tools they use, especially Rocks and Puppets. The difficulty of the first installation is average (sometime long or with some guesswork) to easy (works out of the box); same for the updates. 8/9/11 Overview of cluster management tools 26

OSG Survey (cont) The operation is automatic or requires simple documented tasks. Rocks seem the easiest to operate Puppet is the easiest to install/update Available documentation is good for Rocks, good to average (there could be more or sometime is confusing) for the others. There are some long time users of Rocks and Cfengine while Puppet gained popularity in recent times. 8/9/11 Overview of cluster management tools 27

Comparison of Puppet/Cfengine/Bcfg2  bin/ShowDocument?docid= bin/ShowDocument?docid=3967 Evaluation and feature comparison of the Nagios and Zabbix monitoring systems  bin/ShowDocument?docid= bin/ShowDocument?docid=3277 Survey about the setup of clusters in OSG  Available from OSG DOCDB 8/9/11 Overview of cluster management tools 28

Credits Thank you to Jason Allen, Head of Fermilab Experiments Facilities (FEF) Department in the Computing Division – this talk is based heavily on his Cluster Management talk at the 2011 OSG all hands meeting 8/9/11 Overview of cluster management tools 29

8/9/11 Overview of cluster management tools 30 ?!?!