Quick Overview of NPACI Rocks
Philip M. Papadopoulos
Associate Director, Distributed Computing
San Diego Supercomputer Center


Seed Questions
Do you buy in installation services, from the supplier or a third-party vendor?
– We integrate ourselves. It is easier to have the vendor integrate larger clusters.
Do you buy pre-configured systems or build your own configuration?
– Rocks is adaptable to many configurations.
Do you upgrade the full cluster at one time or in rolling mode?
– We suggest all at once (very quick with Rocks); it can be done as a batch job.
– Rolling upgrades can be supported, if desired.
Do you perform formal acceptance or burn-in tests?
– Unfortunately, no. We need more automated testing.

Installation/Management
You need a strategy for managing cluster nodes.
Pitfalls
– Installing each node “by hand”: difficult to keep software on nodes up to date.
– Disk-imaging techniques (e.g., VA Disk Imager): difficult to handle heterogeneous nodes; treats the OS as a single monolithic image.
– Specialized installation programs (e.g., IBM’s LUI, or RWCP’s multicast installer): let the Linux packaging vendors do their job.
Penultimate
– RedHat Kickstart: define the packages needed for the OS on nodes, and Kickstart gives a reasonable measure of control. You need to fully automate it to scale out (Rocks gets you there).
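
To make the Kickstart approach concrete, here is an illustrative fragment of the kind of file it consumes. This is a sketch, not a real Rocks-generated file: the install URL, partition sizes, and package names are assumptions chosen for the example.

```
# Illustrative Kickstart fragment (hypothetical values throughout)
install
url --url http://frontend/install/
lang en_US
keyboard us
clearpart --all
part / --size 4096
part swap --size 512

%packages
@ Base
openssh-server

%post
# per-node configuration commands run here after package installation
```

Because the node's software set is declared in one text file, regenerating that file and reinstalling is enough to bring any node to a known state.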

Scaling Out
Evolve to management of “two” systems.
The front end(s)
– Login host
– Users’ home areas, passwords, groups
– Cluster configuration information
The compute nodes
– Disposable OS image; let software manage node heterogeneity
– Parallel (re)installation
– Data partitions on cluster drives are untouched during re-installs
Cluster-wide configuration files are derived through reports from a MySQL database (DHCP, hosts, PBS nodes, …).

NPACI Rocks Toolkit – rocks.npaci.edu
Techniques and software for easy installation, management, monitoring, and update of clusters.
Installation
– A bootable CD + floppy contains all the packages and site configuration info needed to bring up an entire cluster.
Management and update philosophies
– Trivial to completely reinstall any (or all) nodes.
– Nodes are 100% automatically configured, using DHCP and NIS.
– RedHat’s Kickstart defines the set of software that defines a node.
– All software is delivered as RedHat Packages (RPMs): encapsulate the configuration for a package (e.g., Myrinet) and manage dependencies.
– Never try to figure out whether node software is consistent; if you ever ask yourself this question, reinstall the node.
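
The “encapsulate configuration in an RPM” idea can be sketched with a hypothetical spec-file fragment. The package name, service name, and commands below are invented for illustration; the point is that installing the RPM also performs the configuration in its %post script.

```
# Hypothetical spec-file sketch: a package that carries its own configuration
Name: myrinet-config
Version: 1.0
Release: 1
Summary: Site configuration for Myrinet (illustrative)

%description
Installing this RPM configures the node for Myrinet, so node state
is always reproducible from the package set alone.

%post
# configuration runs automatically at install time (service name assumed)
/sbin/chkconfig gm on

%files
```

With configuration inside packages, “is this node consistent?” reduces to “is the right package set installed?”, which a reinstall answers definitively.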

Rocks Current State – Ver. 2.1
Now tracking RedHat 7.1
– 2.4 kernel
– “Standard tools” – PBS, MAUI, MPICH, GM, SSH, SSL, …
– Could support other distributions, but we don’t have the staff for this.
Designed to take “bare hardware” to a cluster in a short period of time.
– Linux upgrades are often “forklift-style”; Rocks supports this as the default mode of administration.
Bootable CD
– The Kickstart file for the frontend is created from the Rocks webpage.
– Use the same CD to boot nodes; integration is automated.
“Legacy Unix config files” are derived from the MySQL database.
Re-installation (with a single HTTP server on 100 Mbit)
– One node: 10 minutes
– 32 nodes: 13 minutes
– Use multiple HTTP servers + IP-balancing switches for scale.

More Rocksisms
Leverage widely-used (standard) software wherever possible.
– Everything is in RedHat Packages (RPMs).
– RedHat’s Kickstart installation tool.
– SSH, Telnet (only during installation), existing open-source tools.
– Write only the software that we need to write.
Focus on simplicity.
– Commodity components, for example: x86 compute servers, Ethernet, Myrinet.
– Minimal, for example: no additional diagnostic or proprietary networks.
Rocks is a collection point of software for people building clusters.
– It is evolving to include cluster software and packaging from more than just SDSC and UCB.

Rocks-dist
Integrates RedHat packages from
– RedHat (mirror) – base distribution + updates
– The contrib directory
– Locally produced packages
– Local contrib (e.g., commercially bought code)
– Packages from rocks.npaci.edu
Produces a single updated distribution that resides on the front end.
– It is a RedHat distribution with patches and updates applied.
A Kickstart (RedHat) file is a text description of what’s on a node; Rocks automatically produces frontend and node files.
Different Kickstart files and different distributions can co-exist on a front end to add flexibility in configuring nodes.
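
The merge performed by rocks-dist can be sketched as a precedence rule: later package sources (updates, local packages) override earlier ones with the same package name. The helper below is a hypothetical illustration of that idea, not the real tool; the package lists are made up.

```python
# Sketch of the rocks-dist merge idea (hypothetical helper, invented data):
# several RPM source trees are combined into one distribution, with later
# sources overriding earlier ones package-by-package.

def merge_distribution(sources):
    """sources: list of (label, {package_name: rpm_filename}) pairs in
    increasing order of precedence. Returns package_name -> (label, rpm)."""
    dist = {}
    for label, packages in sources:
        for name, rpm in packages.items():
            dist[name] = (label, rpm)   # later sources win
    return dist

sources = [
    ("redhat-base",    {"kernel": "kernel-2.4.2-2.i386.rpm",
                        "openssh": "openssh-2.5.2p2-5.i386.rpm"}),
    ("redhat-updates", {"kernel": "kernel-2.4.3-12.i386.rpm"}),
    ("local",          {"mpich": "mpich-1.2.1-1.i386.rpm"}),
]
dist = merge_distribution(sources)
```

Here the updated kernel shadows the base kernel, while base packages without updates and purely local packages pass through unchanged, yielding one coherent distribution tree.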

insert-ethers
Used to populate the “nodes” MySQL table.
Parses a file (e.g., /var/log/messages) for DHCPDISCOVER messages.
– Extracts the MAC address and, if it is not in the table, adds the MAC address and a hostname to the table.
For every new entry:
– Rebuilds /etc/hosts and /etc/dhcpd.conf
– Reconfigures NIS
– Restarts DHCP and PBS
Hostname is <basename>-<cabinet>-<rank> (e.g., compute-0-0).
Configurable to change the hostname, e.g., when adding new cabinets.
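
The discovery step can be sketched in a few lines. This is a simplified, hypothetical reimplementation for illustration (the real insert-ethers writes to MySQL and regenerates config files); the log lines follow the standard dhcpd syslog format.

```python
import re

# Sketch of insert-ethers' discovery loop: scan syslog lines for
# DHCPDISCOVER, extract the MAC address, and assign a
# <basename>-<cabinet>-<rank> hostname to each MAC not already known.

DISCOVER = re.compile(r"DHCPDISCOVER from ([0-9a-f:]{17})")

def discover_nodes(syslog_lines, known, basename="compute", cabinet=0):
    """known: dict MAC -> hostname (standing in for the 'nodes' table).
    Mutates and returns it; repeated DHCPDISCOVERs are ignored."""
    for line in syslog_lines:
        m = DISCOVER.search(line)
        if not m:
            continue
        mac = m.group(1)
        if mac not in known:
            rank = len([h for h in known.values() if h.startswith(basename)])
            known[mac] = "%s-%d-%d" % (basename, cabinet, rank)
    return known

log = [
    "May 1 12:00:01 frontend dhcpd: DHCPDISCOVER from 00:50:56:ab:cd:ef via eth0",
    "May 1 12:00:02 frontend dhcpd: DHCPDISCOVER from 00:50:56:ab:cd:ef via eth0",
    "May 1 12:00:05 frontend dhcpd: DHCPDISCOVER from 00:50:56:12:34:56 via eth0",
]
nodes = discover_nodes(log, {})
```

Booting nodes cabinet by cabinet is what makes the automatically assigned names meaningful: rank follows discovery order within a cabinet.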

Configuration Derived from Database
insert-ethers populates the MySQL database through automated node discovery (Node 0, Node 1, …, Node N).
Report scripts then generate the configuration files from the database:
– makehosts → /etc/hosts
– makedhcp → /etc/dhcpd.conf
– pbs-config-sql → PBS node list
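
A makehosts-style report is essentially a query plus a formatter. The sketch below is hypothetical: the column names and sample rows are invented, standing in for the result of something like `SELECT ip, hostname FROM nodes`.

```python
# Minimal makehosts-style report (illustrative): turn rows from the
# cluster database into /etc/hosts text. Regenerating the file from the
# database keeps it consistent after every insert-ethers discovery.

def make_hosts(rows):
    """rows: list of dicts with 'ip' and 'hostname' keys.
    Returns the full /etc/hosts content as a string."""
    lines = ["127.0.0.1\tlocalhost"]
    for row in rows:
        lines.append("%s\t%s" % (row["ip"], row["hostname"]))
    return "\n".join(lines) + "\n"

rows = [
    {"ip": "10.1.1.1", "hostname": "compute-0-0"},
    {"ip": "10.1.1.2", "hostname": "compute-0-1"},
]
hosts = make_hosts(rows)
```

Because every generated file is a pure function of the database, no config file is ever hand-edited, and a rebuild after any table change cannot drift from the cluster's true state.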

Remote Re-installation – shoot-node and eKV
Rocks provides a simple method to remotely reinstall a node.
– The CD/floppy is used only for the first install.
By default, hard power cycling causes a node to reinstall itself.
– Addressable PDUs can do this on generic hardware.
With no serial (or KVM) console, we are able to watch a node as it installs (eKV), but…
– We can’t see BIOS messages at boot-up.
Syslog for all nodes is sent to a log host (and to local disk).
– We can look at what a node was complaining about before it went offline.

(Screenshot: remotely starting reinstallation on two nodes with shoot-node and eKV.)

Monitoring Your Cluster
PBS has a GUI called xpbsmon; it gives a nice graphical view of the up/down state of nodes.
SNMP status
– Use the extensive SNMP MIB defined by the Linux community to find out many things about a node: installed software, uptime, load.
– Slow.
Ganglia (UCB) – an IP-multicast-based monitoring system
– 20+ different health measures.
I think we’re still weak here – we are learning about other activities in this area (e.g., ngop, CERN activities, the Chiba City toolkit).
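
Load, one of the health measures mentioned above, is also exposed directly by the kernel in /proc/loadavg. The helper below is a small illustrative parser for that file's well-known format; it is not part of any of the tools named on this slide.

```python
# Hypothetical helper: parse the /proc/loadavg format, e.g.
# "0.12 0.08 0.01 1/234 5678", into the 1-, 5- and 15-minute
# load averages that a monitor would report per node.

def parse_loadavg(text):
    """Return (load1, load5, load15) from /proc/loadavg content."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])

one, five, fifteen = parse_loadavg("0.12 0.08 0.01 1/234 5678")
```

On a live node the same function would be fed `open("/proc/loadavg").read()`; SNMP agents and Ganglia gather this kind of value from every node so the front end never has to poll files by hand.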

CERN
cern.ch/hep-proj-grid-fabric
Installation tools: wwwinfo.cern.ch/pdp