
Lecture 2, TTH 3:30PM-4:45PM, Dr. Jianjun Hu, CSCE569 Parallel Computing, University of South Carolina, Department of Computer Science and Engineering

Outline
Clusters and SMP systems at USC CEC
Ways of using high-performance systems
The PBS job queuing system
How to write a job file
How to submit, delete, and manage jobs on a Linux cluster
How to submit a large number of jobs

Systems: NICK (Linux OS)
Hardware: 76 compute nodes w/ dual 3.4 GHz Xeon (2MB L2 cache), 4GB RAM; 1 master node w/ dual 3.2 GHz Xeon (2MB L2 cache), 4GB RAM; Topspin InfiniBand interconnect; storage: 1 terabyte network storage
Software: Rocks 4.3 (CentOS base), OpenMPI, OpenPBS/Torque, Absoft compilers, Intel compilers, Bio Roll with the following bioinformatics packages: HMMER, NCBI BLAST, MpiBLAST, Biopython, ClustalW, MrBayes, T_Coffee, EMBOSS, Phylip, FASTA, Glimmer, and CPAN; Intel Math Kernel Library, TURBOMOLE, VASP, STAR-CD

Systems: Optimus
Hardware: 64 nodes, each with dual 2.0 GHz dual-core AMD Opterons (256 cores total) and 8GB RAM; 1 terabyte of storage in the head node; Gigabit Ethernet interconnect
Software: Rocks 5.1, OpenMPI, OpenPBS scheduler, GNU compilers

Systems: ZIA (SGI Altix 4700 shared-memory system)
Hardware: 128 Itanium CPUs at 1.6 GHz with 8MB cache, 256 GB RAM, 8TB storage, NUMAlink interconnect fabric
Software: SUSE 10 w/ SGI ProPack, Intel C/C++ and Fortran compilers, VASP, PBSPro scheduling software, Message Passing Toolkit, Intel Math Kernel Library, GNU Scientific Library, Boost library

Other Systems
Nataku: 8 nodes, each with dual 2.0 GHz dual-core AMD Opterons (32 cores total); 16 GB RAM in the head node, 8GB RAM in the compute nodes; Chemical Engineering machine for STAR-CD
Jaws2: 8 compute nodes w/ dual 2.6 GHz Xeons and 2GB RAM; remaining parts of the original Jaws cluster, currently being rebuilt; 1 terabyte attached storage
Dr. Flora's 12-CPU VASP cluster
Dr. Heyden's Mac cluster for VASP

Distributed Multiprocessor Cluster (diagram: front-end node, NFS storage, and compute-node disks HD1, HD2, HD3)

Question: How can we utilize large high-performance machines like these to speed up applications?

Ways of using Linux Clusters, Application Type 1: a regular (serial) program is applied to each data set (data1, data2, data3, …, dataK) independently as a separate job, each running on one CPU; the results are collected afterwards.

Ways of using Linux Clusters, Application Type 2: a parallel program runs as multiple processes (compute1 … compute4) on multiple CPUs; the processes communicate with one another while working on the data, and their partial results are combined in the main process to produce the final result.

PBS System for Clusters
PBS is a workload management system for Linux clusters. It supplies commands for:
◦ job submission
◦ job monitoring (tracing)
◦ job deletion
It consists of the following components:
◦ Job server (pbs_server): provides the basic batch services, including receiving/creating a batch job, modifying the job, protecting the job against system crashes, and running the job

PBS System for Clusters (continued)
◦ Job Executor (pbs_mom): receives a copy of the job from the job server, places the job into execution, creates a new session as the submitting user, and returns the job's output to the user
◦ Job Scheduler (pbs_sched): runs the site's policy controlling which job is run, and where and when it is run; PBS allows each site to create its own scheduler; currently NICK uses the Torque/Maui scheduler

OpenPBS Batch Processing
Maui communicates:
◦ with the MOMs: to monitor the state of each system's resources
◦ with the Server: to retrieve information about the availability of jobs to execute

Steps needed to run your first production code
Suppose your application experiments are runs of $myprog on different data sets ($myprog data).
Steps to use PBS:
1. Create a job script for each experiment, containing the PBS options that request the needed resources (e.g. number of processors, wall-clock time) and the user commands that prepare for execution of the executable (e.g. cd to the working directory).
2. Submit the job script file to the PBS queue: qsub prog1.sh
3. Monitor the job

First example: job1.sh job file
#!/bin/bash
#PBS -N MyAppName
#PBS -l nodes=1
#PBS -l walltime=00:01:00
#PBS -e /home/dgtest/dgtest0200/test.err
#PBS -o /home/dgtest/dgtest0200/test.out
#PBS -V
export PATH=$PATH:yourdir/bin
myprog data
Where is your output file located? Where is the screen output?

Job file for use on ZIA
#!/bin/sh
#PBS -N helloMPI
#PBS -o hello.out
#PBS -e hello.err
#PBS -l select=1:ncpus=4
#PBS -l place=free:shared
cd /home/ /test
mpirun -np 4 /home/ /test/hello

PBS Options
#PBS -N myJob ◦ Assigns a job name. The default is the name of the PBS job script.
#PBS -l nodes=4:ppn=2 ◦ The number of nodes and processors per node.
#PBS -l walltime=01:00:00 ◦ The maximum wall-clock time during which this job can run.
#PBS -o mypath/my.out ◦ The path and file name for standard output.
#PBS -e mypath/my.err ◦ The path and file name for standard error.
#PBS -j oe ◦ Join option that merges the standard error stream with the standard output stream.

PBS Options (continued)
#PBS -k oe ◦ Defines which output of the batch job to retain on the execution host.
#PBS -W stagein=file_list ◦ Copies the listed files onto the execution host before the job starts.
#PBS -W stageout=file_list ◦ Copies the listed files from the execution host after the job completes.
#PBS -r n ◦ Indicates that the job should not be rerun if it fails.
#PBS -V ◦ Exports all environment variables to the job.

Procedure
Use the command line:
◦ Use an editor to create an executable script: vi myExample.sh (use the first example code)
◦ Make myExample.sh executable: chmod +x myExample.sh
◦ Test your script: ./myExample.sh
Submit your script:
◦ qsub myExample.sh
◦ Remember the job identifier that qsub prints.
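Put together, a session on the login node might look like the following minimal sketch (myExample.sh is the script from the first example; the exact form of the job identifier printed by qsub depends on the cluster's PBS server):
# create the job script (contents as in the first example above)
vi myExample.sh
# make it executable and test it on the login node
chmod +x myExample.sh
./myExample.sh
# submit it to the batch queue; qsub prints the job identifier
qsub myExample.sh
# check the status of your jobs
qstat -a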

Monitor / Control a Job
Check whether your job is running: qstat
qstat -a ◦ check the status of jobs, queues, and the PBS server
qstat -f ◦ get all the information about a job, e.g. resources requested, resource limits, owner, source, destination, queue
qdel job.ID ◦ delete a job from the queue
qhold job.ID ◦ hold a job if it is in the queue
qrls job.ID ◦ release a job from hold

Exercise
Problem: Given HTML pages, count the frequency of all words in each page and report the result as:
keyword frequency
keyword1 frequency1
...
Use PBS to submit 100 jobs.
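As a starting point for the counting step itself, a single page could be processed with standard shell tools. This is only a sketch: the file name page.html is hypothetical, HTML tags are not stripped, and every run of letters is treated as a word.
# count word frequencies in one HTML page (file name is illustrative)
tr -cs '[:alpha:]' '\n' < page.html | tr 'A-Z' 'a-z' | sort | uniq -c | awk '{print $2, $1}' > page.counts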

How to submit 100 jobs
Typical approach:
1. Read the file list.
2. For each file, create a job file and submit it to the PBS queue.
This can be done by:
◦ writing a bash script that submits a job for each dataset (see the sketch below)
◦ writing a Perl script to submit the jobs
◦ writing a C program to submit the jobs
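A minimal bash sketch of the bash-script option, assuming the HTML pages live in a directory pages/ and that the per-page counting command is available as countwords.sh (both names are assumptions, not part of the assignment):
# generate and submit one PBS job per HTML page
for f in pages/*.html; do
    name=$(basename "$f" .html)
    cat > "job_${name}.sh" <<EOF
#!/bin/bash
#PBS -N count_${name}
#PBS -l nodes=1
#PBS -l walltime=00:05:00
#PBS -o ${name}.out
#PBS -e ${name}.err
cd \$PBS_O_WORKDIR
./countwords.sh "$f" > "${name}.counts"
EOF
    qsub "job_${name}.sh"
done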

Quick psub
psub is a Perl script that wraps a command-line program into a job file and submits it to the cluster queue:
>psub jobname.sh "prog.pl -i=1"
This creates a job file "jobname.sh" and submits it to the server for running. There is no need to edit a job file anymore.
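The real psub is a Perl script and is not reproduced here. As a rough illustration of the idea only, a bash wrapper with hard-coded resource requests could look like the following sketch (the name mypsub.sh, its PBS options, and its argument handling are all assumptions):
#!/bin/bash
# usage: ./mypsub.sh jobname.sh "prog.pl -i=1"
# hypothetical re-implementation of the psub idea, not the real psub
jobfile="$1"
command="$2"
cat > "$jobfile" <<EOF
#!/bin/bash
#PBS -N $(basename "$jobfile" .sh)
#PBS -l nodes=1
#PBS -l walltime=01:00:00
#PBS -V
cd \$PBS_O_WORKDIR
$command
EOF
qsub "$jobfile"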

Local Disk of a Computing Node
Normally, the computing nodes of a cluster can directly read and write files on the NFS storage space. If your program does intense reading and writing, however, going through an NFS directory will cause heavy network traffic.
Solution: direct your input and output to local directories on the computing node (e.g. /temp, /tmp, /state/partition1) and, after execution, copy the result files back to the NFS directory.
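A job script following that pattern might look like this sketch; it assumes /state/partition1 is writable on the compute node and that myprog and its input file data sit in the submission directory (both assumptions):
#!/bin/bash
#PBS -N localScratch
#PBS -l nodes=1
#PBS -l walltime=00:30:00
# private scratch directory on the compute node's local disk
scratch=/state/partition1/$USER.$PBS_JOBID
mkdir -p "$scratch"
# stage input from the NFS submission directory to local disk
cp "$PBS_O_WORKDIR/data" "$scratch/"
cd "$scratch"
# run with local input/output, then copy results back to NFS and clean up
"$PBS_O_WORKDIR/myprog" data > result.txt
cp result.txt "$PBS_O_WORKDIR/"
cd /
rm -rf "$scratch"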

Summary
Type 1 parallel computing applications
How PBS works on Linux cluster computers
How to submit jobs to Linux clusters

Homework
Programming problem: Given an HTML page, count the frequency of all words and report the result as:
keyword frequency
keyword1 frequency1
...
Use PBS to submit 100 jobs to count word frequencies for the HTML pages in the next lab session.

Homework (continued)
Learn how to compile C programs on Linux
Learn how to create a PBS job file
Learn how to submit jobs
Learn how to submit multiple jobs
Learn how to compile and run an MPI program on NICK