NCSA Supercluster Administration

University of Illinois at Urbana-Champaign NCSA Supercluster Administration NT Cluster Group Computing and Communications Division NCSA Avneesh Pant
Presentation transcript:

NCSA Supercluster Administration
NT Cluster Group, Computing and Communications Division, NCSA
Avneesh Pant, apant@ncsa.uiuc.edu

System Goals
• Provide a production level of service
  • Integrate the system into current environment
  • Apply current supercomputer policies and procedures
    • Account management
    • Resource usage / allocation
• Provide conveniences to the users
  • Develop an environment where users can prepare and run their own codes effectively
    • This requires advanced automated administration
  • Provide feedback to users
    • Job status to users via email
  • Provide common applications
• Get an account, get your data, and run

NCSA NT 320 Pentium® CPU Cluster
• 256 CPUs - Parallel MPI
  • 64 HP Kayak XU systems: dual 550 MHz Pentium III Xeon, 1 GB RAM
  • Dual 300 MHz Pentium II, 512 MB memory
• 64 CPUs - Serial
  • 32 Compaq PWS 6000: dual 333 MHz Pentium II, 512 MB memory
[Figure: 64 dual 550 MHz Pentium III Xeon HP Kayaks back-to-back]

System Configuration
• Software
  • Microsoft NT 4.0 Server
  • LSF from Platform for queuing system
  • MPI
    • HPVM from Chien’s CSAG group
• Networking
  • Myrinet
    • MPI communication
  • Fast Ethernet
    • Used for network file systems
  • Fibre Channel
    • Storage networks
  • Giganet
    • Testing environment

Alliance NT Supercluster, July 1999
[Diagram; recoverable labels:]
• Internet
• Front-end systems (ntsc-tsN.ncsa.uiuc.edu): apps development, job submission
• LSF master: LSF batch job scheduler
• File servers: 128 GB home, 200 GB scratch, FTP to mass storage, daily backups
• Fast Ethernet
• 128 compute nodes, 256 CPUs: Myrinet interconnect and HPVM; 64 dual 550 MHz systems, 64 dual 300 MHz systems
• Serial nodes: Fast Ethernet, no MPI; 24 Dual 333

Accessing the Cluster
• Windows Terminal Server interactive nodes
  • Multiuser form of Windows NT
  • Surprisingly good performance
• Access methods
  • Windows RDP client from Microsoft
    • Windows clients only
  • Citrix ICA client
    • Available for most platforms
    • Clients can be downloaded from http://www.citrix.com
    • A Java applet client is available
  • X Windows
    • Rsh daemon to start sessions

Windows NT on a Web Page

System Setup
• System imaging
  • Initial setup from a network-enabled boot floppy
    • Clears system and clones system using Drive Image Professional
    • Uses image file on network file server
  • Manually set hostname/IP in configuration files
  • Reboot and let it retrieve NT image, change Security ID, and configure
• Small non-volatile DOS partition
  • Boots from this during subsequent imaging
  • Stores configuration information
  • Runs batch scripts from server every boot
• All systems can be updated by calling a single script
  • Scripts on the server contain the re-imaging commands
  • < 20 minutes to convert to a new configuration on all systems
• Simplifies administration
  • Systems are identical, so adverse behavior is usually hardware related
  • Add new systems or repair a broken system quickly

Updating software
• Radical changes through re-imaging
  • Prepare single system
  • Set configuration scripts to run at next boot
  • Boot to DOS and upload image to server
• Incremental upgrades
  • Scripted using batch
  • Registry files are merged using RCMD
    • RCMD is a Resource Kit remote command tool
  • Most common upgrade is LSF
  • OS and MPI do not change often

NT Cluster Monitoring
• Scalable
• Reconfigurable grid
• Works well over modem
• Highlights troubled systems
  • Deviation from expected load can be viewed
• Shows in real time:
  • System status
  • Current load
  • Load by user name
  • Load by job ID
  • All running/pending jobs
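
The "deviation from expected load" idea on this slide can be illustrated with a small sketch. This is not the monitoring tool's actual code; the `NodeSample` record, the `troubled_nodes` function, and the tolerance value are hypothetical names chosen for illustration. It assumes each MPI process keeps one CPU fully busy, so the expected load on a node is simply the number of job processes placed there.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """One monitoring sample for a node (hypothetical record layout)."""
    host: str
    load: float         # measured load on the node
    running_procs: int  # job processes the batch system placed on the node


def troubled_nodes(samples, tolerance=0.5):
    """Return (host, deviation) for nodes whose load differs from expectation.

    Assumption: every job process uses a full CPU, so the expected load
    equals the number of job processes on the node.
    """
    flagged = []
    for sample in samples:
        deviation = sample.load - float(sample.running_procs)
        if abs(deviation) > tolerance:
            flagged.append((sample.host, deviation))
    return flagged
```

A negative deviation suggests job processes that have died or stalled; a positive deviation suggests work running outside the batch system.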

Node Administration
• CRUN scripts
  • Run scripts sequentially for ranges of machines
  • Used for rebooting, updating files, …
  • Coupled with other tools like Tlist (like ps) and kill
    • Can be used to find processes on hosts
• RCMD
  • Provides interactive access to compute nodes
  • Useful in manual process management
  • Faster than using LSF’s lsrun
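
The pattern of running a script sequentially over a range of machines can be sketched as follows. This is an illustration only, not the CRUN scripts themselves (which the slides describe as batch scripts). The host naming scheme (`node001`, ...), the function names, and the injectable `runner` callable are all assumptions; in practice `runner` would wrap whatever remote command tool is in use.

```python
def host_range(prefix, first, last, width=3):
    """Build host names such as node001..node004 (hypothetical naming scheme)."""
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


def run_on_range(hosts, command, runner):
    """Run `command` on each host in order and collect the results.

    `runner(host, command)` is supplied by the caller and performs the
    actual remote execution. A failure on one host is recorded and the
    loop continues, so one broken node does not block the rest.
    """
    results = {}
    for host in hosts:
        try:
            results[host] = runner(host, command)
        except OSError as exc:
            results[host] = exc
    return results
```

Passing the runner in as a parameter keeps the loop independent of any particular remote command tool and makes it easy to dry-run against a stub.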

Process Administration
• Simply start and stop jobs
• Not so simple
  • Queuing system software may not be fault tolerant
  • Only some of the processes launch
  • Not all of the processes get terminated
• Shepherding
  • Makes decisions about jobs and processes
    • Can kill jobs if processes do not start or quit
    • Can kill processes if jobs finish
  • Coupled with process tracking software to find orphans
  • Uses semi-intelligent Shepherd Agent
  • Also provides interface for global administration
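
The two shepherding rules above (kill a job whose processes did not all start or have quit; kill processes whose job has finished) can be sketched as a single decision pass. This is a hypothetical illustration, not the Shepherd Agent's implementation: the `Proc` record and `shepherd_pass` function are invented names, and the sketch assumes the caller only passes jobs that are already past their startup window.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """A job process seen by process tracking (hypothetical record layout)."""
    host: str
    pid: int
    job_id: int


def shepherd_pass(expected_procs, observed):
    """Decide which jobs and processes to kill.

    expected_procs: {job_id: process count the batch system expects} for
                    jobs that are still active.
    observed:       list of Proc records currently found on the nodes.
    Returns (jobs_to_kill, procs_to_kill).
    """
    counts = Counter(p.job_id for p in observed)
    # Rule 1: a job is killed if some of its processes never started or quit.
    jobs_to_kill = sorted(
        job for job, want in expected_procs.items() if counts[job] < want
    )
    # Rule 2: a process is an orphan if its job is no longer active.
    procs_to_kill = [p for p in observed if p.job_id not in expected_procs]
    return jobs_to_kill, procs_to_kill
```

Keeping the pass free of side effects means the same decisions can be logged, reviewed, or applied through whatever kill mechanism the site uses.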

Account Administration
• Integrates into our current systems
  • Account creation/deletion occurs in our allocation division
  • Uses command line utilities to manage accounts
  • Password management can be handled through this system
• System usage accounting
  • Custom daemon created
  • Simple, dedicated CPU / memory accounting
    • Actual process CPU usage is not relevant due to our MPI
    • Processes always use 100% of the CPU
  • Number of processes and time info collected by LSF
  • Existing accounting infrastructure used
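
Because processes are treated as using 100% of a dedicated CPU, a charge can be derived from just the process count and the elapsed time. The sketch below shows that arithmetic under the assumption that usage is charged in CPU-hours; the function name and the unit are illustrative choices, not details taken from the slides.

```python
from datetime import datetime


def dedicated_cpu_hours(num_processes, start, end):
    """Charge for dedicated CPUs: process count times wall-clock hours.

    Assumption: each process holds one CPU for the whole run, so measured
    per-process CPU time is ignored.
    """
    elapsed_seconds = (end - start).total_seconds()
    if elapsed_seconds < 0:
        raise ValueError("job end time precedes start time")
    return num_processes * elapsed_seconds / 3600.0


# Example: a 16-process job that ran for 2.5 hours.
charge = dedicated_cpu_hours(
    16, datetime(1999, 7, 1, 9, 0), datetime(1999, 7, 1, 11, 30)
)
```

Here `charge` is 16 × 2.5 = 40 CPU-hours.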

Storage Administration
• Storage systems
  • Storage Central Disk advisor by W Quinn
    • For monitoring file system usage
• Quota software
  • No quota software currently in use
  • Our scratch system is Windows 2000 and has quota software available
  • Quotas will be enforced when we switch to W2K
  • Home directories are on Windows NT 4.0
• Security
  • Home space is readable by the user only
    • Upon request, administrators can gain access
  • Scratch space file access is maintained by the user

Scalability Issues
• Queuing system
  • LSF is currently working at a scale unexpected a few years ago
  • Where will difficulties arise?
    • Batch system falls behind more often when system size grows
    • Related to the speed and reliability of the network
  • Platform Computing LSF has adapted in the past
• Monitoring tools
  • Many command line tools are already impractical
  • Visualization methods need to be researched
  • GLMon may not be effective for more than 1000 nodes
  • Detailed monitoring affects system scalability

Future Directions
• Better integration with the mass storage system
• High performance shared file systems
• Improved reliability and process management
• Advanced user support
• Advancements in interconnects
  • Better scaling
  • Better performance