Research Computing Environment at the University of Alberta Diego Novillo Research Computing Support Group University of Alberta April 1999.

Computing Environment (29 April)
- SGI Origin 2000, 42 CPUs, 10 GB RAM
- Mix of interactive and batch jobs
  – 2 CPUs for interactive activity
  – 40 CPUs used by batch jobs
- Batch jobs managed by LSF (Platform Computing)

How is the system being used?

Monthly System Utilization (CPU days)
[Chart: monthly CPU days used, plotted against a "Theoretical max" line]
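The "Theoretical max" line on the utilization chart is presumably the machine's full CPU capacity for each month. Below is a minimal Python sketch of that bound. It assumes the bound is simply CPUs × days in the month. The slides do not say whether the chart counts all 42 CPUs or only the 40 batch CPUs, so the sketch computes both.

```python
import calendar

TOTAL_CPUS = 42   # SGI Origin 2000, 42 CPUs (from the slides)
BATCH_CPUS = 40   # 40 CPUs used by batch jobs (from the slides)


def theoretical_max_cpu_days(cpus, year, month):
    """Upper bound on CPU days in a month if every CPU were busy all the time."""
    days_in_month = calendar.monthrange(year, month)[1]
    return cpus * days_in_month


def utilization_pct(used_cpu_days, cpus, year, month):
    """Used CPU days as a percentage of the theoretical maximum."""
    return 100.0 * used_cpu_days / theoretical_max_cpu_days(cpus, year, month)


if __name__ == "__main__":
    # April 1999 has 30 days.
    print(theoretical_max_cpu_days(BATCH_CPUS, 1999, 4))  # 1200
    print(theoretical_max_cpu_days(TOTAL_CPUS, 1999, 4))  # 1260
    # 900 is an illustrative figure, not a measured value.
    print(utilization_pct(900, BATCH_CPUS, 1999, 4))      # 75.0
```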

Average wait time in queue (hours)
[Chart annotations: "Started using load thresholds", "Need to balance parallel jobs"]
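Wait time here is the time a job spends pending between submission and dispatch. Below is a minimal sketch of how a monthly average could be computed from per-job submit and start times. The (submit, start) record format is a hypothetical assumption, not LSF's accounting format, and the sample jobs are illustrative rather than real measurements.

```python
from datetime import datetime


def average_wait_hours(jobs):
    """Average queue wait in hours.

    jobs: list of (submit_time, start_time) datetime pairs (hypothetical format).
    """
    if not jobs:
        return 0.0
    total_seconds = sum((start - submit).total_seconds() for submit, start in jobs)
    return total_seconds / len(jobs) / 3600.0


if __name__ == "__main__":
    # Illustrative jobs only, not real measurements.
    jobs = [
        (datetime(1999, 4, 1, 8, 0), datetime(1999, 4, 1, 10, 0)),  # waited 2 h
        (datetime(1999, 4, 2, 9, 0), datetime(1999, 4, 2, 13, 0)),  # waited 4 h
    ]
    print(average_wait_hours(jobs))  # 3.0
```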

System usage by job type
[Chart]

Some thoughts on usage
- Scalar use is predominant (so far)
- We are starting to push the system
- Jobs are waiting too long in the queue
- Need to modify queue policies:
  – Lower runtime limits
  – Checkpoint/restart
  – Limit on the number of jobs submitted

Using LSF

Job queues
- Parallel queue: par
  – High priority
  – Slot-based: up to 32 processors
  – Jobs are never suspended
- Sequential queue: nic
  – Low priority
  – Threshold-based: up to 95% system utilization
  – Jobs can be preempted by parallel jobs
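The interplay between the two queues can be shown with a toy model. This is a minimal Python sketch under simplifying assumptions:

- LSF's real thresholds are based on measured load indices. Here, "utilization" is approximated by the fraction of the 40 batch CPUs in use.
- Preemption is modelled as suspending the most recently started nic jobs.
- The names Job, Machine and dispatch are hypothetical.

```python
from dataclasses import dataclass, field

BATCH_CPUS = 40             # CPUs available to batch jobs
PAR_MAX_SLOTS = 32          # 'par' is slot-based: up to 32 processors
NIC_UTIL_THRESHOLD = 0.95   # 'nic' is threshold-based: up to 95% utilization


@dataclass
class Job:
    name: str
    queue: str              # "par" or "nic"
    cpus: int = 1


@dataclass
class Machine:
    running: list = field(default_factory=list)
    suspended: list = field(default_factory=list)

    def busy(self, queue=None):
        """CPUs held by running jobs, optionally only those from one queue."""
        return sum(j.cpus for j in self.running
                   if queue is None or j.queue == queue)

    def dispatch(self, job):
        """Try to start `job`; return True if it was started."""
        if job.queue == "nic":
            # Refuse if starting the job would push utilization past 95%.
            if (self.busy() + job.cpus) / BATCH_CPUS > NIC_UTIL_THRESHOLD:
                return False
        elif job.queue == "par":
            # Parallel jobs together may never hold more than 32 processors.
            if self.busy("par") + job.cpus > PAR_MAX_SLOTS:
                return False
            # Give up early if the job would not fit even with every nic job suspended.
            if self.busy() - self.busy("nic") + job.cpus > BATCH_CPUS:
                return False
            # Preempt: suspend the newest nic jobs until the parallel job fits.
            # Parallel jobs themselves are never suspended.
            while self.busy() + job.cpus > BATCH_CPUS:
                idx = max(i for i, j in enumerate(self.running) if j.queue == "nic")
                self.suspended.append(self.running.pop(idx))
        else:
            raise ValueError(f"unknown queue {job.queue!r}")
        self.running.append(job)
        return True
```

With 38 one-CPU nic jobs running (95% of 40 CPUs), a 39th nic job is refused. An 8-CPU par job still starts, because 6 nic jobs are suspended to make room. Resuming suspended jobs once CPUs free up is left out of this sketch.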

Job queues II
Two special queues:
- npseq
  – For sequential jobs that must not be preempted
  – Very low priority
  – Only 4 slots available
- special
  – For jobs that need to run longer than the system limit
  – Only 1 slot available
  – Must be approved by the committee
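The limits on the two special queues can be written down as a small policy table. Below is a minimal Python sketch. The slot counts and the approval requirement come from the slide. The QueuePolicy structure and the can_start check are hypothetical illustrations, not LSF configuration syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuePolicy:
    name: str
    max_slots: int
    needs_approval: bool = False


# Slot counts come from the slides; the data layout itself is hypothetical.
NPSEQ = QueuePolicy("npseq", max_slots=4)
SPECIAL = QueuePolicy("special", max_slots=1, needs_approval=True)


def can_start(policy, slots_in_use, approved=False):
    """True if a new one-slot job may start in this queue."""
    if policy.needs_approval and not approved:
        return False  # 'special' jobs must be approved by the committee first
    return slots_in_use < policy.max_slots
```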

Fairshare system
- Jobs are scheduled according to priorities
- Priorities are dynamic and based on:
  – Number of shares
  – Past usage (currently 2 weeks of history)
  – Type of job (parallel jobs have higher priority)
- Resource availability is also important
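The ingredients listed above can be combined into a dynamic priority in many ways. Below is a minimal Python sketch of one possible combination: priority grows with shares, shrinks with usage inside the 2-week window, and is boosted for parallel jobs. This formula and the PARALLEL_BOOST weight are hypothetical. This is not LSF's actual fairshare formula, which has its own configurable factors.

```python
from datetime import datetime, timedelta

HISTORY_WINDOW = timedelta(weeks=2)  # "currently 2 weeks of history"
PARALLEL_BOOST = 2.0                 # hypothetical weight: parallel jobs rank higher


def recent_usage(usage_log, now):
    """Sum CPU hours from (timestamp, cpu_hours) records within the history window."""
    return sum(hours for stamp, hours in usage_log if now - stamp <= HISTORY_WINDOW)


def dynamic_priority(shares, usage_log, now, parallel=False):
    """Hypothetical formula: more shares raise priority, more recent usage lowers it."""
    priority = shares / (1.0 + recent_usage(usage_log, now))
    return priority * PARALLEL_BOOST if parallel else priority


if __name__ == "__main__":
    now = datetime(1999, 4, 29)
    log = [
        (now - timedelta(days=1), 9.0),     # counts: inside the 2-week window
        (now - timedelta(days=20), 100.0),  # ignored: older than 2 weeks
    ]
    print(dynamic_priority(10, log, now))                 # 1.0
    print(dynamic_priority(10, log, now, parallel=True))  # 2.0
```

Because old usage drops out of the window, a group that ran heavily three weeks ago is not penalized today.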

Getting started
- Complete LSF documentation is available online
- Man pages are also available
- Add one line to your login files:
    source /usr/local/lsf/cshrc.lsf     (C shell)
  or
    . /usr/local/lsf/profile.lsf        (Bourne shell)

Submitting jobs
% bsub [options] pgm args
  -q name   Which queue to use
  -n num    How many processors
  -o file   Output file
The queue defaults to ‘nic’. If no output file is given, the results are mailed to you.
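When submissions are scripted, it can help to build the bsub command line programmatically. Below is a minimal Python sketch that uses only the three options listed above. The helper names bsub_command and submit, and the program ./mysim, are hypothetical. submit only works on a host where LSF is set up.

```python
import shlex
import subprocess


def bsub_command(program, args=(), queue=None, ncpus=None, outfile=None):
    """Build an argv list for bsub using only the -q, -n and -o options."""
    cmd = ["bsub"]
    if queue is not None:
        cmd += ["-q", queue]        # which queue; defaults to 'nic' if omitted
    if ncpus is not None:
        cmd += ["-n", str(ncpus)]   # how many processors
    if outfile is not None:
        cmd += ["-o", outfile]      # without -o, results are mailed to you
    cmd.append(program)
    cmd.extend(args)
    return cmd


def submit(*args, **kwargs):
    """Run bsub (requires an LSF environment on this host)."""
    return subprocess.run(bsub_command(*args, **kwargs),
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    cmd = bsub_command("./mysim", ["input.dat"], queue="par", ncpus=8,
                       outfile="mysim.out")
    print(shlex.join(cmd))  # bsub -q par -n 8 -o mysim.out ./mysim input.dat
```

Passing an argument list to subprocess, rather than a single shell string, avoids quoting problems with unusual file names.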

Watching jobs
% bjobs [options]
  -l       All the details
  -p       Only pending jobs (and why they are pending)
  -a       All jobs (even finished ones)
  -u all   All the jobs in the system
  jobid    Just the job with this ID

Manipulating jobs
% bkill jobid     Kills the job (can also send a signal)
% bstop jobid     Suspends the job (even if it is not running)
% bresume jobid   Resumes the job
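The effect of these commands shows up in the STAT column of bjobs, which reports states such as PEND, RUN, PSUSP (suspended while pending) and USUSP (suspended by the user while running). Below is a minimal Python sketch of those transitions. It is a simplified model: real LSF has further states, and a resumed job may pass through a system-suspended state before it runs again. The TRANSITIONS table and apply_command are hypothetical names.

```python
# Simplified model: real LSF has more states (e.g. SSUSP, system-suspended).
TRANSITIONS = {
    ("bstop", "PEND"): "PSUSP",    # suspended even though it never started
    ("bstop", "RUN"): "USUSP",     # suspended by the user while running
    ("bresume", "PSUSP"): "PEND",  # back in the queue, waiting to be dispatched
    ("bresume", "USUSP"): "RUN",   # simplification: LSF may go via SSUSP first
    ("bkill", "PEND"): "EXIT",
    ("bkill", "PSUSP"): "EXIT",
    ("bkill", "RUN"): "EXIT",
    ("bkill", "USUSP"): "EXIT",
}


def apply_command(command, state):
    """Return the job state after running `command` on a job in `state`."""
    try:
        return TRANSITIONS[(command, state)]
    except KeyError:
        raise ValueError(f"{command} does not apply to a job in state {state}") from None
```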

Getting usage statistics
- We keep monthly statistics on our web page
- For current information:
  % bacct [opts]                 Total usage for your jobs; you can specify dates and jobs
  % priorities (or bhpart -r)    Lists the priorities of all the groups

Monitoring load on the system
% bqueues   Shows the queues and how loaded they are
% lsload    Quick glance at the load on the system
GUI tools (xlsbatch, xlsmon) are also available. Please use them sparingly, as they add to the interactive load on the system.

Contact Information
- Visit our home page
- Questions and comments