Running clusters on a Shoestring Fermilab SC 2007.

Slides:



Advertisements
Similar presentations
PARMON A Comprehensive Cluster Monitoring System PARMON Team Centre for Development of Advanced Computing, Bangalore, India Contact: Rajkumar Buyya
Advertisements

COMPUTERS: TOOLS FOR AN INFORMATION AGE Chapter 3 Operating Systems.
Sabyasachi Ghosh Mark Redekopp Murali Annavaram Ming-Hsieh Department of EE USC KnightShift: Enhancing Energy Efficiency by.
Operating System.
P3- Represent how data flows around a computer system
Intel® Manager for Lustre* Lustre Installation & Configuration
Windows Deployment Services WDS for Large Scale Enterprises and Small IT Shops Presented By: Ryan Drown Systems Administrator for Krannert.
Discovering Computers Fundamentals, Third Edition CGS 1000 Introduction to Computers and Technology Fall 2006.
Computer Parts Assignment
Presented by: Yash Gurung, ICFAI UNIVERSITY.Sikkim BUILDING of 3 R'sCLUSTER PARALLEL COMPUTER.
Jaeyoung Choi School of Computing, Soongsil University 1-1, Sangdo-Dong, Dongjak-Ku Seoul , Korea {heaven, psiver,
Exploring the UNIX File System and File Security
MCT260-Operating Systems I Operating Systems I Introduction to Operating Systems.
Operating Systems: Software in the Background
Wednesday, June 07, 2006 “Unix is user friendly … it’s just picky about it’s friends”. - Anonymous.
MCITP: Microsoft Windows Vista Desktop Support - Enterprise Section 1: Prepare to Deploy.
Spring 2007Introduction to OS1 IT 3423: Operating System Concepts and Administration Instructor: Wayne (Weizheng) Zhou
Connecting with Computer Science, 2e
Installation. Overview  Download files to make media or another bootable configuration.  Prepare system for installation.  Boot the computer and run.
70-293: MCSE Guide to Planning a Microsoft Windows Server 2003 Network, Enhanced Chapter 14: Problem Recovery.
April WebEx Intel ® Active Management Technology (AMT) LANDesk Provisioning LANDesk Server Manager.
Cluster computing facility for CMS simulation work at NPD-BARC Raman Sehgal.
Operating systems.
Hands-On Microsoft Windows Server 2008
Chapter 7 Microsoft Windows XP. Windows XP Versions XP Home XP Home XP Professional XP Professional XP Professional 64-Bit XP Professional 64-Bit XP Media.
ITE 1 Chapter 5. Chapter 5 is a Large Chapter It has a great deal of useful information about operating systems. You will find this VERY helpful when.
Operating System. Architecture of Computer System Hardware Operating System (OS) Programming Language (e.g. PASCAL) Application Programs (e.g. WORD, EXCEL)
Oracle10g RAC Service Architecture Overview of Real Application Cluster Ready Services, Nodeapps, and User Defined Services.
Operating Systems  A collection of programs that  Coordinates computer usage among users  Manages computer resources  Handle Common Tasks.
Connecting with Computer Science, 2e Chapter 9 Operating Systems.
Chapter 8 Implementing Disaster Recovery and High Availability Hands-On Virtual Computing.
Appendix B Planning a Virtualization Strategy for Exchange Server 2010.
University of Illinois at Urbana-Champaign NCSA Supercluster Administration NT Cluster Group Computing and Communications Division NCSA Avneesh Pant
Computing and the Web Operating Systems. Overview n What is an Operating System n Booting the Computer n User Interfaces n Files and File Management n.
Farm Management D. Andreotti 1), A. Crescente 2), A. Dorigo 2), F. Galeazzi 2), M. Marzolla 3), M. Morandin 2), F.
Scott Drucker, Systems Engineer Migrating to Microsoft Vista with WinINSTALL.
IPMI 2.0 Overview SOL-Serial redirection over Lan Management of servers and systems in a remote environment over LAN connections Allow IT managers to manage.
6/26/01High Throughput Linux Clustering at Fermilab--S. Timm 1 High Throughput Linux Clustering at Fermilab Steven C. Timm--Fermilab.
Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.
Page 1 Printing & Terminal Services Lecture 8 Hassan Shuja 11/16/2004.
Connecting with Computer Science 2 Objectives Learn what an operating system is Become familiar with the different types of operating systems Identify.
Super Micro IPMI 1.5 Solution
WINDOWS SERVER 2003 Genetic Computer School Lesson 12 Fault Tolerance.
CSC190 Introduction to Computing Operating Systems and Utility Programs.
1 3 Computing System Fundamentals 3.3 Computer Systems.
BABCA Software Operating Systems (OS) aka Systems Software A set of instructions that coordinate all the activities among computer hardware resources.
The 2001 Tier-1 prototype for LHCb-Italy Vincenzo Vagnoni Genève, November 2000.
Cloud Computing – UNIT - II. VIRTUALIZATION Virtualization Hiding the reality The mantra of smart computing is to intelligently hide the reality Binary->
Update on Farm Monitor and Control Domenico Galli, Bologna RTTC meeting Genève, 14 april 2004.
Third International Workshop on Networked Appliance 2001 SONA: Applying Mobile Agent to Networked Appliance Control S.Aoki, S.Makino, T.Okoshi J.Nakazawa.
10/18/01Linux Reconstruction Farms at Fermilab 1 Steven C. Timm--Fermilab.
UNIX U.Y: 1435/1436 H Operating System Concept. What is an Operating System?  The operating system (OS) is the program which starts up when you turn.
Running clusters on a Shoestring US Lattice QCD Fermilab SC 2007.
Future Console Servers devproj project #31. Overview ● Requirements / motivation ● Current approach ● Possible future options – KVM over IP – IPMI – Serial.
Copyright © 2003 by Prentice Hall 1 Computers: Tools for an Information Age Chapter 3 Operating Systems: Software in the Background BSM025 Computers.
BY: SALMAN 1.
For Wolverhampton Linux User Group By Adam Sweet
Chapter Objectives In this chapter, you will learn:
BY: SALMAN.
Operating System Review
CS101 Booting A Computer.
UBUNTU INSTALLATION
Oracle Solaris Zones Study Purpose Only
TYPES OFF OPERATING SYSTEM
Deploy OpenStack with Ubuntu Autopilot
Operating System Review
NCSA Supercluster Administration
OPS235: Lab 2 Virtual Machines – Part I
Introduction to Operating Systems
Introduction to Operating Systems
Presentation transcript:

Running clusters on a Shoestring Fermilab SC 2007

Poor man’s cluster tool-kit PXE boot – cluster installation rgang – run commands on worker nodes IPMI – Intelligent Platform Management

installationPXE boot PXE booting is a way to boot a computer entirely from the network without any form of storage on the destination computer (RAM aside). PXE booting allows us to mass install an OS on the worker nodes. The worker node OS images are stored on the local hard disk on a BOOTP-TFTP server.

installationPXE boot bootp-tftp server root image boot image DOS image BOOTP-TFTP Request Response tomsrtbt and kernel Response It takes 14 minutes to install 80 Opteron 270 nodes, using an Intel Xeon BOOTP-TFTP server, over a 100Mbps network. clusterInstallScript partitions local disk, remote copies images and unzips them into disk partitions, configures network and reboots the node.

deploymentrgang rgang allows one to execute commands on or distribute files to many nodes. rgang forks separate rsh or ssh children, which execute in parallel. After successfully waiting on returns from each child or after timing out, rgang displays the sorted node responses. To scale to kilo clusters, rgang can utilize a tree- structure, via a nway switch. When so invoked, rgang uses rsh or ssh to spawn copies of itself on multiple nodes. These copies in turn spawn additional copies.

deploymentrgang rgang took 350 seconds to transfer a 1GB file from the head-node to 600 nodes using nway=2 option over 100Mbps network which is 13Gbps of transfer rate for the entire cluster head-node 1GB file nway = 2

managementIPMI IPMI is an open standard for monitoring, logging, recovery, inventory, and control of hardware that is implemented independent of the main CPU, BIOS, and OS. The Baseboard Management Controller (BMC) is the brain behind platform management and its primary purpose is to handle the autonomous sensor monitoring and event logging features. ipmitool provides a simple command-line in-band and out-of-band interface to the BMC.

management IPMI OS Running OS not loaded, OS booting, OS unresponsive In-band Server Management Out-of-band Server Management BMC Ethernet Local interface SSIF, KCS, SMBus Power cycle/on/off CPU temperature System temperature CPU/Chassis Fan speed Sensor Event Logs We power on, off or reset a thousand+ worker nodes in seconds.