Yeti Operations: Introduction and Day 1 Settings
Rob Lane, HPC Support, Research Computing Services, CUIT


Topics
1. Yeti Operations Committee
2. Introduction to Yeti
3. Rules of Operation

1. Yeti Operations Committee
- Determines cluster policy
- In the process of being set up
- In the meantime, we need a policy for day 1 of operations

2. Introduction to Yeti

Final Node Count

Node Type              Number of Nodes
Standard (64 GB)       38
Intermediate (128 GB)  8
High Memory (256 GB)   35
Infiniband             16
GPU                    4
Total                  101

Meet Your New Neighbors
Groups: afsis, astro, ccls, eeeng, journ, ocp, psych, sscc, stats, xenon

Group Shares

Group   Share %
afsis   2.12
astro   6.36
ccls    19.43
eeeng   2.12
journ   2.12
ocp     10.60
psych   2.12
sscc    19.08
stats   33.92
xenon   2.12

Other Groups
- Renters
- Free Tier
- CUIT

Rules of Operation
1. Job Priority
2. Job Characteristics
3. Queues
4. Guaranteed Access

Job Priority
- Every job waiting to run is assigned a priority by the scheduling software
- The priority determines the order of the jobs waiting in the queue

Job Priority Components
- Group's share vs. its recent usage
- User's recent usage
- Other factors
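These components suggest a classic fair-share calculation. Here is a minimal sketch of the idea: a group running under its share gains priority, and heavy recent use by the submitting user costs priority. The formula and weights are illustrative assumptions, not Yeti's actual scheduler configuration.

```python
# Illustrative fair-share priority sketch. The weights and the exact
# formula are assumptions; Yeti's scheduler defines its own.

GROUP_WEIGHT = 100.0  # hypothetical weight for the group component
USER_WEIGHT = 10.0    # hypothetical weight for the user component

def job_priority(group_share_pct, group_recent_pct, user_recent_pct,
                 other=0.0):
    """Higher when the group is under its share and the user has run little."""
    group_term = GROUP_WEIGHT * (group_share_pct - group_recent_pct)
    user_term = -USER_WEIGHT * user_recent_pct
    return group_term + user_term + other

# A stats job (33.92% share, light recent use) outranks a ccls job
# (19.43% share, heavy recent use):
print(job_priority(33.92, 10.0, 5.0))  # ~2342.0
print(job_priority(19.43, 25.0, 5.0))  # ~-607.0
```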

Recent Usage
- What does "recent" mean?
- It's configurable
- Yeti's setting: 7 days
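A sliding seven-day window is one common way to implement this. A minimal sketch follows, assuming usage is available as (end_time, core_hours) records; that record format is an assumption for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)  # Yeti's "recent usage" setting

def recent_usage(usage_records, now=None):
    """Sum core-hours for jobs that finished within the last 7 days."""
    now = now or datetime.now()
    return sum(core_hours
               for end_time, core_hours in usage_records
               if now - end_time <= WINDOW)
```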

Job Characteristics
- Nodes and cores
- Time
- Memory
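For concreteness, the three characteristics can be bundled into one request record. The field names below are illustrative, not a scheduler API; the queue-check sketch after the next table uses the same quantities.

```python
from dataclasses import dataclass

@dataclass
class JobRequest:
    nodes: int      # number of nodes requested
    cores: int      # cores per node
    hours: float    # requested wall time
    mem_gb: float   # requested memory

req = JobRequest(nodes=2, cores=8, hours=10, mem_gb=8)
```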

Job Queues (subject to change)

Queue        Time Limit  Memory Limit  Max. User Run
Batch 1      12 hours    4 GB          512
Batch 2      12 hours    16 GB         128
Batch 3      5 days      16 GB         64
Batch 4      3 days      None          8
Interactive  4 hours     None          4
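As a rough sketch of how these limits gate a job, the function below checks a request against the table above. The limits are copied from the slide; the admission logic itself is illustrative, not the scheduler's actual behavior.

```python
# Queue limits from the slide; None means no memory limit.
QUEUES = {
    "Batch 1": {"hours": 12, "mem_gb": 4},
    "Batch 2": {"hours": 12, "mem_gb": 16},
    "Batch 3": {"hours": 5 * 24, "mem_gb": 16},
    "Batch 4": {"hours": 3 * 24, "mem_gb": None},
    "Interactive": {"hours": 4, "mem_gb": None},
}

def eligible_queues(hours, mem_gb):
    """Queues whose time and memory limits admit the request."""
    return [name for name, lim in QUEUES.items()
            if hours <= lim["hours"]
            and (lim["mem_gb"] is None or mem_gb <= lim["mem_gb"])]

# A 10-hour, 8 GB job fits Batch 2/3/4 but not Batch 1 (4 GB cap)
# or Interactive (4-hour cap):
print(eligible_queues(hours=10, mem_gb=8))
# ['Batch 2', 'Batch 3', 'Batch 4']
```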

Guaranteed Access
- New mechanism
- Subject to review by the Yeti Operations Committee
- We're going to try it out in the meantime

Guaranteed Access
- Groups have each been assigned systems
- Group jobs get priority access to their own systems
- "Guaranteed access" means there will be a known maximum wait time before your job starts running

Guaranteed Access Example
- The group astro owns the node Brussels
- Only two types of jobs will be allowed on Brussels:
  1. astro jobs
  2. Short jobs
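A minimal sketch of that admission rule: the node-to-owner map comes from the example, while the 12-hour cutoff for "short" is an assumption (the 12-hour batch queues in the table above), not stated policy.

```python
NODE_OWNER = {"Brussels": "astro"}  # per the example above
SHORT_HOURS = 12                    # assumed cutoff for "short" jobs

def allowed_on(node, job_group, job_hours):
    """Owned nodes accept only the owner's jobs and short jobs."""
    owner = NODE_OWNER.get(node)
    return owner is None or job_group == owner or job_hours <= SHORT_HOURS

print(allowed_on("Brussels", "astro", 72))  # True: owner's job
print(allowed_on("Brussels", "stats", 6))   # True: short job
print(allowed_on("Brussels", "stats", 72))  # False: long non-owner job
```

Under a rule like this the wait is bounded: once an owning group submits, its node frees up after at most the short-job time limit, which is what makes the maximum wait time "known."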

Job Queues (subject to change)

Queue        Time Limit  Memory Limit  Max. User Run
Batch 1      12 hours    4 GB          512
Batch 2      12 hours    16 GB         128
Batch 3      5 days      16 GB         64
Batch 4      3 days      None          8
Interactive  4 hours     None          4

Guaranteed Access Debate
- Good: researchers have guaranteed access rights to their nodes
- Bad: long jobs lose access to many nodes

Thanks! Comments and Questions?