National Computational Science OSCAR – Jeremy Enos, Systems Engineer, NCSA Cluster Group – www.ncsa.uiuc.edu – June 30, 2002, Cambridge, MA

National Computational Science OSCAR – An Overview
– Open Source Cluster Application Resources
– Cluster on a CD – automates the cluster install process
– Contributors: IBM, Intel, NCSA, ORNL, MSC Software, Dell, IU
– Base for NCSA's "Cluster in a BOX"
– Wizard driven; nodes are built over the network
– Initial target: clusters of up to 64 nodes
– Works on commodity PC components
– RedHat based (for now)
– Components: open source and BSD-style licenses

National Computational Science Why OSCAR?
– NCSA wanted a "Cluster-in-a-Box" distribution
– NCSA's "X-in-a-Box" projects could sit on top (X = Grid, Display Wall, Access Grid)
– Easier, faster deployment
– Consistency among clusters
– Lowers the entry barrier to cluster computing – no more "Jeremy-in-a-Box"
– Other organizations had the same interest (Intel, ORNL, Dell, IBM, etc.)
– Open Cluster Group (OCG) formed; OSCAR is the first OCG working group
– NCSA jumps on board to contribute to OSCAR

National Computational Science OSCAR Usage – TOP500 Poll Results
What cluster system (distribution) do you use?
– Other 24%
– OSCAR 23%
– Score 15%
– Scyld 12%
– MSC.Linux 12%
– NPACI Rocks 8%
– SCE 6%
233 votes (Feb. 01, 2002)

National Computational Science OSCAR Basics
What does it do?
– OSCAR is a cluster software packaging utility
– Automatically configures software components
– Reduces the time to build a cluster
– Reduces the need for expertise
– Reduces the chance of incorrect software configuration
– Increases consistency from one cluster to the next
What will it do in the future?
– Maintain a cluster information database
– Work as an interface not just for installation, but also for maintenance
– Accelerate software package integration into clusters

National Computational Science Components
OSCAR currently includes:
– C3 – cluster management tools (ORNL)
– SIS – network OS installer (IBM)
– MPICH – Message Passing Interface (Argonne National Labs)
– LAM – Message Passing Interface (Indiana University)
– OpenSSH/OpenSSL – secure transactions
– PBS – job queuing system
– PVM – Parallel Virtual Machine (ORNL)
– Ganglia – cluster monitor
Current prerequisites:
– Networked PC hardware with disk drives
– Server machine with RedHat installed
– RedHat CDs (for rpms)
– 1 head node + N compute nodes

National Computational Science OSCAR Basics
How does it work?
Versions 1.0, 1.1:
– LUI = Linux Utility for cluster Install
– Network boots nodes via PXE or floppy
– Nodes install themselves from rpms over NFS from the server
– Post-installation configuration of nodes and server executes
Versions 1.2, 1.3, +:
– SIS = System Installation Suite (System Imager + LUI = SIS)
– Creates an image of the node filesystem locally on the server
– Network boots nodes via PXE or floppy
– Nodes synchronize themselves with the server via rsync
– Post-installation configuration of nodes and server executes
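
The rsync-based synchronization amounts to pulling the prebuilt image tree from the server onto the node's freshly partitioned disk. The line below is a minimal sketch of that idea, not the literal SystemImager invocation; the server name "oscarserver", the rsync module name "oscarimage", and the mount point are assumptions:

# minimal sketch: pull the image tree from the server's rsync daemon onto the node disk
rsync -a --delete oscarserver::oscarimage/ /mnt/newroot/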

National Computational Science Installation Overview
– Install RedHat
– Download OSCAR
– Print/read the documentation
– Copy RPMS to the server
– Run the wizard (install_cluster)
  – Build an image per client type (partition layout, HD type)
  – Define clients (network info, image binding)
  – Set up networking (collect MAC addresses, configure DHCP, build boot floppy)
  – Boot clients / build
  – Complete setup (post-install)
  – Run the test suite
– Use the cluster

National Computational Science OSCAR 1.x Step by Step
Log on to the server as root, then:
mkdir -p /tftpboot/rpm
# copy all RedHat rpms from the CDs to /tftpboot/rpm
# download the OSCAR tarball, then:
tar -zxvf oscar-1.x.tar.gz
cd oscar-1.x
./install_cluster

National Computational Science OSCAR 1.x Step by Step After untarring, run the install_cluster script…

National Computational Science OSCAR 1.x Step by Step After starting services, the script drops you into the GUI wizard

National Computational Science OSCAR 1.x Step by Step
Step 1: Prepare Server – select your default MPI (message passing interface):
– MPICH (Argonne National Labs)
– LAM (Indiana University)

National Computational Science OSCAR 1.x Step by Step Step 2: Build OSCAR Client Image – build an image with default or custom rpm lists and disk table layouts. Image build progress is displayed.

National Computational Science OSCAR 1.x Step by Step Step 2: Build OSCAR Client Image Image build complete.

National Computational Science OSCAR 1.x Step by Step Step 3: Define OSCAR clients Associate image(s) with network settings.

National Computational Science OSCAR 1.x Step by Step Step 4: Setup Networking Collect MAC addresses and configure DHCP
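
Behind the wizard, each collected MAC address becomes a DHCP reservation, so a node always receives the same IP address and boot instructions. Below is a minimal sketch of one such ISC dhcpd.conf entry; the host name, MAC address, IP address, and boot filename are all hypothetical, not what the wizard literally writes:

host oscarnode1 {
  hardware ethernet 00:50:56:01:02:03;  # MAC collected by the wizard (hypothetical)
  fixed-address 10.0.0.1;               # private cluster address (hypothetical)
  filename "pxelinux.0";                # network boot loader for PXE clients (assumption)
}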

National Computational Science OSCAR 1.x Step by Step Intermediate Step: Network boot client nodes If the nodes are PXE capable, select the NIC as the boot device. Don’t make this a static change, however. Otherwise, just use the autoinstall floppy disk. It is less convenient than PXE, but a reliable failsafe.

National Computational Science OSCAR 1.x Step by Step Intermediate Step: Boot Nodes Floppy or PXE (if available)

National Computational Science OSCAR 1.x Step by Step Step 5: Complete Cluster Setup Output displayed in terminal window.

National Computational Science OSCAR 1.x Step by Step Step 6: Run Test Suite

National Computational Science Questions and Discussion ? Next up… OSCAR 2.

National Computational Science OSCAR 2.0 November, 2002

National Computational Science Timeline
OSCAR invented
– First development meeting in Portland, OR, USA
– September, 2000
OSCAR 1.0 released
– February, 2001
– Real users and real feedback
– OSCAR 2 design discussion begins
OSCAR 1.1 released
– July, 2001
– RedHat 7.1 support
– Tidied install process / fixed pot-holes
OSCAR 1.2 released
– February, 2002
– SIS integrated
OSCAR 1.3 released
– July, 2002
– Add/remove node support
– Ganglia
OSCAR 2.0
– November, 2002

National Computational Science OSCAR 2 Major Changes – Summary
– No longer bound to an OS installer
– Components are package based, modular
– Core set of components mandatory
– API established and published for new packages
– Package creation open to the community
– Database maintained for node and package information
– Improved add/remove node process
– Improved wizard
– Scalability enhancements
– Security options
– Auto-update functionality for the OS
– Support for more distributions and architectures
– New features

National Computational Science OSCAR 2 – Install Options
Without an OS installer:
– Installs on existing workstations without re-installing the OS
– Long list of prerequisites
– Unsupported (at least initially)
With an OS installer:
– OSCAR has hooks to integrate nicely with the installer
– System Installation Suite
– RedHat installer
– ___?___ installer

National Computational Science OSCAR 2 – MODEL [diagram]: OSCAR 2 is "the glue" binding the cluster software stack together.

National Computational Science OSCAR 2 – MODEL [diagram]: core components (C3, SSH) sit on top of the OSCAR 2 glue.

National Computational Science OSCAR 2 – MODEL [diagram]: cluster tools – Maui, SIS, MPICH, LAM, PVM, PVFS, Grid in a box, VMI, Wall in a box, Giganet, Myrinet, Firewall/NAT, Monitoring, X – plug in as packages alongside the core components (C3, SSH), with OSCAR 2 as the glue.

National Computational Science OSCAR 2 – API
– Package creation open to the community
– Core set of mandatory packages
– Each package must have the following:
  – server.rpmlist
  – client.rpmlist
  – RPMS (dir)
  – scripts (dir)
– Server software is in package form – enables distribution of server services
– ODR – OSCAR Data Repository
  – Node information
  – Package information
  – SQL database or flat file
  – Readable by nodes via API calls
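
Putting the required pieces together, an OSCAR 2 package directory might look like the sketch below. Only the four required entries come from the API slide; the package name "mypackage" and the script name are hypothetical:

mypackage/
    server.rpmlist      # rpms to install on the server
    client.rpmlist      # rpms to install on each compute node
    RPMS/               # the rpm files themselves
    scripts/            # install-time hooks run by the wizard
        post_install    # hypothetical example script name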

National Computational Science OSCAR 2 – Wizard
Webmin based? Perl/Tk based? (The current wizard is Perl/Tk.)
Possible interface: 3 install buttons
– Simple – one-click install; tested and supported
– Standard – typical combos presented; tested and supported
– Expert – every option presented; any configuration combination

National Computational Science OSCAR – Scalability Enhancements
LUI
– Merging with System Imager (System Installation Suite)
– Scalability to improve to at least 128 nodes
PBS
– Home directory spooling (NFS instead of RSH)
– Open file descriptor limit
– Max server connections
– Job basename length
– Polling intervals
Maui
– Job attributes are limited to N nodes
SSH
– Non-privileged ports (for parallel SSH tasks)
– User-based keys (see the sketch below)
Single head node model trashed
– Distribution of server services
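
The user-based keys mentioned above follow the standard OpenSSH pattern sketched here. This is a generic illustration, not OSCAR's actual configuration script; the empty passphrase is an assumption made for unattended batch use:

# generate a per-user RSA keypair with no passphrase (assumption: batch-friendly)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# authorize the key for logins to the compute nodes (home directory shared via NFS)
cat ~/.ssh/ >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys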

National Computational Science OSCAR 2 – Security Options
Wizard based – security options selected in the wizard installer
Potential security schemes:
– All open
– Nodes isolated to a private subnet
– Cluster firewall / NAT
– Independent packet filtering per node
Security is a package, like any other software
Will probably use "pfilter"
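
As a concrete illustration of the firewall/NAT scheme, the head node can masquerade traffic for nodes on a private subnet. This is a generic iptables sketch, not the actual pfilter configuration; the interface names and the 10.0.0.0/24 subnet are assumptions:

# head-node NAT: eth0 is public, eth1 faces the private node subnet (assumptions)
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth0 -s 10.0.0.0/24 -j MASQUERADE
# let node-originated traffic out, and only replies back in
iptables -A FORWARD -i eth1 -s 10.0.0.0/24 -j ACCEPT
iptables -A FORWARD -i eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT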

National Computational Science OSCAR 2 – Distribution and Architecture Support
Distribution support goals:
– RedHat, Debian, SuSE, Mandrake, Turbo
– Only when we're satisfied with RedHat OSCAR
– Mandrake to include OSCAR within the distro?
Architectures:
– IA32, IA64, Alpha?

National Computational Science OSCAR 2 – New Features
High-speed interconnect support
– Myrinet
– Others to come
ATLAS, Intel MKL?
Maui scheduler
LAM/MPI
Monitoring
– CluMon (work in progress)
– Performance Co-Pilot (PCP)
– Ganglia

National Computational Science CluMon

National Computational Science Considerations beyond OSCAR 2
Diskless node support (lots of interest)
– New OCG (Open Cluster Group) working group
Compatibility with other cluster packaging tools!
– NPACI Rocks, SCE, Scyld, etc.
– Standardized API
– Cluster package "XYZ" could interface with Rocks, OSCAR, etc.
PVFS
– Still testing NFS3
Cluster of virtual machines (VMware, etc.)
– Variable host operating systems (Windows, etc.)
– Multiple machine images
– Imagine where it could take us!

National Computational Science OSCAR Development Path
version 1.0
– RedHat 6.2 based
– Nodes built by LUI (IBM)
– Proof of concept (prototype)
– Many steps, sensitive to bad input
– Flexibility was the intention; identify user needs
version 1.1
– RedHat 7.1 based
– Nodes built by LUI
– More automation for homogeneous clusters
– SSH: user keys instead of host keys
– Scalability enhancements (SSH, PBS)
– Latest software versions

National Computational Science OSCAR Development Path (cont.)
version 1.2
– Moved development to SourceForge
– LUI replaced by SIS
– RedHat 7.1 based
– Packages adjusted to the SIS-based model
– Latest software versions (C3 tools, PBS, MPICH, LAM)
– Started releasing monthly
version 1.21
– Bug fixes
– version 1.21rh72 (RedHat 7.2 version)
version 1.3
– Add/delete node support implemented
– Security configuration implemented, but off by default
– IA64 support
– Ganglia included
– RedHat 7.1, 7.2

National Computational Science OSCAR Development Path (cont.)
version 1.4
– Grouping support (nodes)
– Package selection?
– Core packages read/write configuration to the database (SSH, C3, SIS, wizard)
– Package API published – modular package support
– Security enabled by default
– Auto-update implemented?
version 1.5
– Formalize use of the cluster database API
– Package configuration support?

National Computational Science OSCAR Development Path (cont.)
version 1.6 (2.0 beta?)
– Single head node model expires
  – Head node holds the OSCAR database
  – Distribute server services
  – Packages can designate their own head node (e.g. PBS)
– Package writing opened to the community
  – The modularity advantage
  – "Open packages" and "certified packages"
  – Commercial packages can now be offered; licensing issues disappear
– Compatibility with other packagers (hopefully)

National Computational Science For Information
– Open Cluster Group page
– Project page
  – Download
  – Mailing lists
  – FAQ
Questions?

National Computational Science Workload Management – OSCAR – Jeremy Enos – OSCAR Annual Meeting, January 10-11, 2002

National Computational Science Topics
– Current batch system – OpenPBS
– How it works, job flow
– OpenPBS pros/cons
– Schedulers
– Enhancement options
– Future considerations
– Future plans for OSCAR

National Computational Science OpenPBS
PBS = Portable Batch System
Components:
– Server – single instance
– Scheduler – single instance
– Mom – runs on compute nodes
– Client commands – run anywhere: qsub, qstat, qdel, xpbsmon, pbsnodes (-a)
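
For flavor, a typical round trip with the client commands looks like the sketch below; the script name and job id are hypothetical:

qsub myjob.sh     # submit; prints a job id such as 123.headnode
qstat -a          # list all jobs and their states
pbsnodes -a       # show the state of every compute node
qdel 123          # remove job 123 from the queue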

National Computational Science OpenPBS – How it Works
– The user submits a job with "qsub"
– The execution host (mom) must launch all other processes, via mpirun, ssh/rsh/dsh, or pbsdsh
– Output is spooled on the execution host (or in the user's home dir), then moved back to the user node (rcp/scp)
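
For example, an MPI job launched from the mother superior might use a script like the one below. This is a generic sketch: the resource request, process count, and program path are hypothetical, while $PBS_NODEFILE is the host list PBS provides to the job:

#!/bin/sh
#PBS -l nodes=4:ppn=2,walltime=01:00:00
# launch 8 MPI processes across the hosts PBS allocated (path hypothetical)
mpirun -np 8 -machinefile $PBS_NODEFILE /my_path/my_mpi_program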

National Computational Science OpenPBS – Job Flow [diagram]: User node (runs qsub) → Server (queues job) → Scheduler (tells server what to run) → Execution host (mother superior) → Compute nodes; job output returns to the user node via rcp/scp.

National Computational Science OpenPBS – Monitor (xpbsmon)

National Computational Science OpenPBS – Schedulers
Stock scheduler:
– Pluggable
– Basic, FIFO
Maui:
– Plugs into PBS
– Sophisticated algorithms
– Reservations
– Open source
– Supported
– Redistributable

National Computational Science OpenPBS – in OSCAR 2
1. List the available machines
2. Select PBS for the queuing system
   a. Select one node for the server
   b. Select one node for the scheduler
      – Select the scheduler
3. Select nodes for compute nodes
4. Select a configuration scheme
   – staggered mom
   – process launcher (mpirun, dsh, pbsdsh, etc.)

National Computational Science OpenPBS – On the Scale
Pros:
– Open source
– Large user base
– Portable
– Best option available
– Modular scheduler
Cons:
– License issues
– 1 year+ development lag
– Scalability limitations (number of hosts, number of jobs, monitor (xpbsmon))
– Steep learning curve
– Node failure intolerance

National Computational Science OpenPBS – Enhancement Options
– qsub wrapper scripts / Java apps – easier for users; allows more control over bad user input
– 3rd-party tools, wrappers, monitors
– Scalability source patches
– "Staggered moms" model – large cluster scaling
– Maui Silver model – "cluster of clusters" diminishes scaling requirements; never attempted yet

National Computational Science Future Considerations for OSCAR
Replace OpenPBS
– With what? When?
– Large clusters are still using PBS
Negotiate better licensing with Veridian
– Would allow us to use a later revision of OpenPBS
Continue incorporating enhancements
– Test Maui Silver, staggered mom, etc.
– 3rd-party extras, monitoring package

National Computational Science Using PBS
Popular PBS commands:
– qsub: submits a job
– qstat: returns queue status
– qdel: deletes a job in the queue
– pbsnodes: lists or changes node status
– pbsdsh: used only in scripts – a parallel launcher
qsub: not necessarily intuitive
– accepts its own arguments
– accepts only scripts, NOT executables
– scripts can't have arguments either
– runs tasks ONLY on a single mom (the mother superior)
– 3 methods of using qsub

National Computational Science Using PBS, qsub
Method 1: Type every option per command – use qsub and all options to launch a script for each executable. Most flexible.

qsub -N jobname -e error.out -o output.out -q queuename \
  -l nodes=X:ppn=Y:resourceZ,walltime=NN:NN script.sh

script.sh:
#!/bin/sh
echo "Launch node is `hostname`"
pbsdsh /my_path/my_executable
# done

National Computational Science Using PBS, qsub
Method 2: Type only the varying options per command – use qsub and dynamic options to launch a script for each executable. Medium flexibility.

qsub -l nodes=X:ppn=Y:resourceZ,walltime=NN:NN script.sh

script.sh:
#!/bin/sh
#PBS -N jobname
#PBS -o output.out
#PBS -e error.out
#PBS -q queuename
echo "Launch node is `hostname`"
pbsdsh /my_path/my_executable
# done

National Computational Science Using PBS, qsub
Method 3: Type fixed arguments in a command, with no need to create a script each time – use a qsub wrapper and fixed arguments to generate a script for each executable.

submitjob nodes ppn walltime queue resource jobname "executable +arg1 +arg2"

"submitjob" is an arbitrary script that wraps qsub:
– strips the fixed arguments off the command line
– what's left is the intended PBS command, "executable arg1 arg2"
– passes that in the environment to qsub, which submits the helper script qlaunch
– qlaunch runs on the mother superior (first node) and launches the actual PBS command intended

National Computational Science Using PBS: qsub, Method 3 (simplified example)

submitjob:
#!/bin/sh
export nodes=$1
export ppn=$2
export walltime=$3
export queue=$4
export resource=$5
export jobname=$6
export outfile=$7
# total process count (computed for convenience; unused in this simplified example)
export procs=`expr $nodes \* $ppn`
# drop the seven fixed arguments so $* holds only the intended command
shift 7
export PBS_COMMAND=$*
qsub -l walltime=$walltime,nodes=$nodes:ppn=$ppn:$resource \
  -N $jobname \
  -A $LOGNAME \
  -q $queue \
  -o $outfile \
  -e $outfile.err \
  -V \
  /usr/local/bin/qlaunch

qlaunch:
#!/bin/sh
launchname=`/bin/hostname`
echo "Launch node is $launchname"
echo "PBS_COMMAND is $PBS_COMMAND"
echo
cmd_dir=`pwd`
cmd_file=$cmd_dir/.$PBS_JOBID.cmd
# Create the shell script to run the MPI program and use pbsdsh to execute it
cat > $cmd_file <<EOF
#!/bin/sh
cd $cmd_dir
$PBS_COMMAND
EOF
chmod u+x $cmd_file
pbsdsh $cmd_file
rm -f $cmd_file || echo "Failed to remove temporary command file $cmd_file."

National Computational Science An FAQueue
How do I create a queue?
qmgr -c "create queue QUEUENAME"
qmgr -c "set queue QUEUENAME PARAM = VALUE"
qmgr -c "list queue QUEUENAME"
man qmgr (for more information)
How do I associate nodes with a queue?
You don't. Think of a queue as a 3-dimensional box* that a job must fit in to be allowed to proceed. The three dimensions are "nodes X procs X walltime". (*Could technically be more than 3 dimensions.)
How do I target specific nodes, then?
Specify a resource on the qsub command. The resource names are defined in /usr/spool/PBS/server_priv/nodes. They are arbitrary strings.
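
For illustration, the nodes file pairs each host with a processor count and free-form property strings; the host names and the "bigmem" property below are hypothetical:

# /usr/spool/PBS/server_priv/nodes
node01 np=2 bigmem
node02 np=2 bigmem
node03 np=2

A job then targets the property on the qsub command line, e.g. qsub -l nodes=2:ppn=2:bigmem script.sh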

National Computational Science Tips to get started
Check out the C3 commands (see the sketch below)
– cexec, cpush are very useful
– ls /opt/c3*/bin (see all the C3 commands)
Check out the PBS commands
– ls /usr/local/pbs/bin
Check out the Maui scheduler commands
– ls /usr/local/maui/bin
Join the mailing lists and send feedback!
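
As a quick taste of C3 (the command forms are standard C3 usage; the file chosen is just an illustrative example):

cexec uptime                  # run "uptime" on every node in the cluster
cpush /etc/hosts /etc/hosts   # push a file from the head node out to all nodes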

National Computational Science Questions and Discussion ?