Running clusters on a Shoestring US Lattice QCD Fermilab SC 2007.

Slides:



Advertisements
Similar presentations
COMPUTERS: TOOLS FOR AN INFORMATION AGE Chapter 3 Operating Systems.
Advertisements

Sabyasachi Ghosh Mark Redekopp Murali Annavaram Ming-Hsieh Department of EE USC KnightShift: Enhancing Energy Efficiency by.
Operating System.
1 Dynamic DNS. 2 Module - Dynamic DNS ♦ Overview The domain names and IP addresses of hosts and the devices may change for many reasons. This module focuses.
Intel® Manager for Lustre* Lustre Installation & Configuration
Windows Deployment Services WDS for Large Scale Enterprises and Small IT Shops Presented By: Ryan Drown Systems Administrator for Krannert.
Computer Parts Assignment
Presented by: Yash Gurung, ICFAI UNIVERSITY.Sikkim BUILDING of 3 R'sCLUSTER PARALLEL COMPUTER.
Leveraging WinPE and Linux Preboot for Effective Provisioning Jonathan Richey | Director of Development | Altiris, Inc.
Jaeyoung Choi School of Computing, Soongsil University 1-1, Sangdo-Dong, Dongjak-Ku Seoul , Korea {heaven, psiver,
Operating Systems: Software in the Background
MCITP: Microsoft Windows Vista Desktop Support - Enterprise Section 1: Prepare to Deploy.
Connecting with Computer Science, 2e
Installation. Overview  Download files to make media or another bootable configuration.  Prepare system for installation.  Boot the computer and run.
70-293: MCSE Guide to Planning a Microsoft Windows Server 2003 Network, Enhanced Chapter 14: Problem Recovery.
DCS PowerEdge C Systems Management Overview PowerEdge C Product Group.
April WebEx Intel ® Active Management Technology (AMT) LANDesk Provisioning LANDesk Server Manager.
Cluster computing facility for CMS simulation work at NPD-BARC Raman Sehgal.
Introduction to Computers Personal Computing 10. What is a computer? Electronic device Performs instructions in a program Performs four functions –Accepts.
Hands-On Microsoft Windows Server 2008
Configuration of Linux Terminal Server Group: LNS10A6 Thebe Laxmi, Sharma Prabhakar, Patrick Appiah.
Chapter 7 Microsoft Windows XP. Windows XP Versions XP Home XP Home XP Professional XP Professional XP Professional 64-Bit XP Professional 64-Bit XP Media.
Operating System. Architecture of Computer System Hardware Operating System (OS) Programming Language (e.g. PASCAL) Application Programs (e.g. WORD, EXCEL)
Oracle10g RAC Service Architecture Overview of Real Application Cluster Ready Services, Nodeapps, and User Defined Services.
Connecting with Computer Science, 2e Chapter 9 Operating Systems.
Chapter 8 Implementing Disaster Recovery and High Availability Hands-On Virtual Computing.
University of Illinois at Urbana-Champaign NCSA Supercluster Administration NT Cluster Group Computing and Communications Division NCSA Avneesh Pant
Computing and the Web Operating Systems. Overview n What is an Operating System n Booting the Computer n User Interfaces n Files and File Management n.
Farm Management D. Andreotti 1), A. Crescente 2), A. Dorigo 2), F. Galeazzi 2), M. Marzolla 3), M. Morandin 2), F.
Scott Drucker, Systems Engineer Migrating to Microsoft Vista with WinINSTALL.
IPMI 2.0 Overview SOL-Serial redirection over Lan Management of servers and systems in a remote environment over LAN connections Allow IT managers to manage.
6/26/01High Throughput Linux Clustering at Fermilab--S. Timm 1 High Throughput Linux Clustering at Fermilab Steven C. Timm--Fermilab.
Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.
1 MSRBot Web Crawler Dennis Fetterly Microsoft Research Silicon Valley Lab © Microsoft Corporation.
Super Micro IPMI 1.5 Solution
WINDOWS SERVER 2003 Genetic Computer School Lesson 12 Fault Tolerance.
CSC190 Introduction to Computing Operating Systems and Utility Programs.
Lecture on Central Process Unit (CPU)
BABCA Software Operating Systems (OS) aka Systems Software A set of instructions that coordinate all the activities among computer hardware resources.
The 2001 Tier-1 prototype for LHCb-Italy Vincenzo Vagnoni Genève, November 2000.
Operating Systems. Categories of Software System Software –Operating Systems (OS) –Language Translators –Utility Programs Application Software.
Cloud Computing – UNIT - II. VIRTUALIZATION Virtualization Hiding the reality The mantra of smart computing is to intelligently hide the reality Binary->
Update on Farm Monitor and Control Domenico Galli, Bologna RTTC meeting Genève, 14 april 2004.
Running clusters on a Shoestring Fermilab SC 2007.
10/18/01Linux Reconstruction Farms at Fermilab 1 Steven C. Timm--Fermilab.
This courseware is copyrighted © 2016 gtslearning. No part of this courseware or any training material supplied by gtslearning International Limited to.
UNIX U.Y: 1435/1436 H Operating System Concept. What is an Operating System?  The operating system (OS) is the program which starts up when you turn.
VIRTUAL NETWORK COMPUTING SUBMITTED BY:- Ankur Yadav Ashish Solanki Charu Swaroop Harsha Jain.
© 2002, Cisco Systems, Inc. All rights reserved..
Future Console Servers devproj project #31. Overview ● Requirements / motivation ● Current approach ● Possible future options – KVM over IP – IPMI – Serial.
Copyright © 2003 by Prentice Hall 1 Computers: Tools for an Information Age Chapter 3 Operating Systems: Software in the Background BSM025 Computers.
BY: SALMAN 1.
For Wolverhampton Linux User Group By Adam Sweet
Cheltenham Courseware
Understanding and Improving Server Performance
Chapter Objectives In this chapter, you will learn:
BY: SALMAN.
Operating System Review
Operating System.
Heterogeneous Computation Team HybriLIT
HP ArcSight ESM 6.8c HA Fail Over Illustrated
Diskless Remote Boot Linux
Oracle Solaris Zones Study Purpose Only
TYPES OFF OPERATING SYSTEM
Deploy OpenStack with Ubuntu Autopilot
Operating System Review
20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.
NCSA Supercluster Administration
Introduction to Operating Systems
Introduction to Operating Systems
Presentation transcript:

Running clusters on a Shoestring US Lattice QCD Fermilab SC 2007

Poor man’s cluster tool-kit rgang – run commands on worker nodes IPMI – Intelligent Platform Management PXE boot – cluster installation

rgang rgang allows one to execute commands on or distribute files to many nodes. rgang forks separate rsh or ssh children, which execute in parallel. After successfully waiting on returns from each child or after timing out, rgang displays the sorted node responses. To allow scaling to kilo clusters, rgang can utilize a tree-structure, via an nway switch. When so invoked, rgang uses rsh or ssh to spawn copies of itself on multiple nodes. These copies in turn spawn additional copies.

rgang rgang took 78 seconds to transfer a 50MB file from the head-node to 600 worker nodes using nway=2 option. head-node 50MB file nway = 2

IPMI IPMI is an open standard for monitoring, logging, recovery, inventory, and control of hardware that is implemented independent of the main CPU, BIOS, and OS. The Baseboard Management Controller (BMC) is the brain behind platform management and its primary purpose is to handle the autonomous sensor monitoring and event logging features. ipmitool provides a simple command-line in- band and out-of-band interface to the BMC.

IPMI OS Running OS not loaded, OS booting, OS unresponsive In-band Server Management Out-of-band Server Management BMC Ethernet Local interface SSIF, KCS, SMBus Power cycle/on/off CPU temperature System temperature CPU/Chassis Fan speed Sensor Event Logs We power on, off or reset a thousand worker nodes in parallel in seconds.

PXE boot PXE booting is one of the many ways you can boot a computer entirely from the network without any form of storage on the destination computer (RAM aside). PXE booting allows us to mass install an OS on the worker nodes. The worker node OS images are stored on the local hard disk on a PXE-TFTP-BOOTP server.

PXE boot head-node root image boot image DOS image Node IP Response PXE-BOOTP Request TFTP Request tomsrtbt and kernel Response clusterInstallScript partitions local disk, remote copies images and unzips them into disk partitions, configures network and reboots the node. It takes 14 minutes to install 80 AMD Opteron 2.4 GHz nodes, using an Intel Xeon 3.2 GHz PXE-BOOTP-TFTP server, over a fast Ethernet network.