Published by Shannon Holt. Modified over 9 years ago.
1
Yeti Operations: Introduction and Day 1 Settings
2
Rob Lane
HPC Support
Research Computing Services
CUIT
hpc-support@columbia.edu
3
Topics
1. Yeti Operations Committee
2. Introduction to Yeti
3. Rules of Operation
4
1. Yeti Operations Committee
- Determines cluster policy
- In the process of being set up
- In the meantime, we need a policy for day 1 of operations
5
2. Introduction to Yeti
6
Final Node Count

Node Type              Number of Nodes
Standard (64 GB)       38
Intermediate (128 GB)  8
High Memory (256 GB)   35
Infiniband             16
GPU                    4
Total                  101
8
Meet Your New Neighbors

Group
afsis    ocp
astro    psych
ccls     sscc
eeeng    stats
journ    xenon
9
Group Shares

Group   Share %    Group   Share %
afsis    2.12      ocp     10.60
astro    6.36      psych    2.12
ccls    19.43      sscc    19.08
eeeng    2.12      stats   33.92
journ    2.12      xenon    2.12
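As a quick arithmetic check on the table above, the sketch below adds up the listed shares. The dictionary name `GROUP_SHARES` is introduced here for illustration only; the values are copied from the slide. The slide does not explain why the shares stop just short of 100%, though rounding each share to two decimals is one plausible reason.

```python
from decimal import Decimal

# Share percentages copied from the Group Shares slide.
# Decimal is used so the two-decimal values add up exactly.
GROUP_SHARES = {
    "afsis": Decimal("2.12"),
    "astro": Decimal("6.36"),
    "ccls": Decimal("19.43"),
    "eeeng": Decimal("2.12"),
    "journ": Decimal("2.12"),
    "ocp": Decimal("10.60"),
    "psych": Decimal("2.12"),
    "sscc": Decimal("19.08"),
    "stats": Decimal("33.92"),
    "xenon": Decimal("2.12"),
}

total_share = sum(GROUP_SHARES.values(), Decimal("0"))
largest_group = max(GROUP_SHARES, key=GROUP_SHARES.get)

print(f"Total share: {total_share}%")
print(f"Largest share: {largest_group} ({GROUP_SHARES[largest_group]}%)")
```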
10
Other Groups
- Renters
- Free Tier
- CUIT
11
3. Rules of Operation
1. Job Priority
2. Job Characteristics
3. Queues
4. Guaranteed Access
12
Job Priority
- Every job waiting to run is assigned a priority by the scheduling software
- The priority determines the order of jobs waiting in the queue
13
Job Priority Components
- Group’s share vs. recent usage
- User’s recent usage
- Other factors
14
Recent Usage
- What does “recent” mean?
- It’s configurable
- Yeti’s setting: 7 days
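The slides do not give the scheduler's actual priority formula, so the following is only a minimal sketch of the idea of comparing a group's share with its usage over the 7-day window. The function names `recent_usage` and `fairshare_score`, the record layout, and the "share minus percentage used" scoring rule are all assumptions made for illustration, not Yeti's real configuration.

```python
from datetime import datetime, timedelta

# Yeti's "recent" setting from the slide: 7 days.
RECENT_WINDOW = timedelta(days=7)


def recent_usage(records, now, window=RECENT_WINDOW):
    """Sum core-hours for jobs that finished inside the recent window.

    `records` is a hypothetical list of (finished_at, core_hours) tuples.
    """
    cutoff = now - window
    return sum(hours for finished_at, hours in records if finished_at >= cutoff)


def fairshare_score(share_pct, group_usage, total_usage):
    """Hypothetical score: the group's share minus the percentage of
    recent cluster usage it consumed. Positive means the group has used
    less than its share; negative means it has used more.
    """
    if total_usage == 0:
        return share_pct
    used_pct = 100.0 * group_usage / total_usage
    return share_pct - used_pct


if __name__ == "__main__":
    now = datetime(2014, 1, 15)
    astro_records = [
        (datetime(2014, 1, 14), 10.0),  # inside the 7-day window
        (datetime(2014, 1, 1), 50.0),   # older than 7 days, ignored
    ]
    usage = recent_usage(astro_records, now)
    print(usage, fairshare_score(6.36, usage, 100.0))
```

In this sketch a group that has used less than its share recently gets a higher score, which matches the slide's "share vs. recent usage" wording; how the real scheduler weights this against a user's own usage and the "other factors" is not stated.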
15
Job Characteristics
- Nodes and cores
- Time
- Memory
16
Job Queues (subject to change)

Queue        Time Limit  Memory Limit  Max. User Run
Batch 1      12 hours    4 GB          512
Batch 2      12 hours    16 GB         128
Batch 3      5 days      16 GB         64
Batch 4      3 days      None          8
Interactive  4 hours     None          4
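To make the table concrete, here is a small sketch that lists which queues a job would fit into given its requested walltime and memory. It assumes that "None" means no memory limit and that "Max. User Run" is the maximum number of running jobs per user; both readings, along with the names `QUEUES` and `eligible_queues`, are assumptions. The slides do not say how the scheduler actually routes jobs.

```python
# Queue limits copied from the Job Queues slide.
# Fields: (name, time limit in hours, memory limit in GB or None, max. user run)
QUEUES = [
    ("Batch 1", 12, 4, 512),
    ("Batch 2", 12, 16, 128),
    ("Batch 3", 5 * 24, 16, 64),
    ("Batch 4", 3 * 24, None, 8),
    ("Interactive", 4, None, 4),
]


def eligible_queues(hours, mem_gb):
    """Return the names of queues whose limits the job fits within.

    Assumption: a memory limit of None means "no limit".
    """
    return [
        name
        for name, time_limit, mem_limit, _max_run in QUEUES
        if hours <= time_limit and (mem_limit is None or mem_gb <= mem_limit)
    ]


if __name__ == "__main__":
    for hours, mem_gb in [(10, 2), (100, 8), (48, 64), (2, 32)]:
        print(hours, mem_gb, eligible_queues(hours, mem_gb))
```

Under these assumptions a long, high-memory job has very few options, which is the trade-off the per-queue run limits appear designed to manage.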
17
Guaranteed Access
- New mechanism
- Subject to review by Yeti Operations Committee
- We’re going to try it out in the meantime
18
Guaranteed Access
- Groups have each been assigned systems
- Group jobs get priority access to their own systems
- “Guaranteed Access” means there will be a known maximum wait time before your job starts running
19
Guaranteed Access Example
- The group astro owns the node Brussels
- Only two types of jobs will be allowed on Brussels
1. Astro jobs
2. Short jobs
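The Brussels rule can be sketched as a simple admission check. The slides do not define "short", so the 12-hour threshold below is an assumption (it matches the shortest batch time limit in the queue table), and the names `NODE_OWNERS`, `SHORT_JOB_HOURS`, and `may_run_on` are hypothetical. Treating nodes with no listed owner as open to any job is also an assumption.

```python
# Assumption: "short" means at most 12 hours; the slides do not define it.
SHORT_JOB_HOURS = 12

# Ownership from the example slide: the group astro owns the node Brussels.
NODE_OWNERS = {"Brussels": "astro"}


def may_run_on(node, job_group, job_hours,
               owners=NODE_OWNERS, short_limit=SHORT_JOB_HOURS):
    """Return True if the job is allowed on the node under the example rule:
    only the owning group's jobs and short jobs may run there.
    """
    owner = owners.get(node)
    if owner is None:
        # Assumption: a node with no listed owner accepts any job.
        return True
    return job_group == owner or job_hours <= short_limit


if __name__ == "__main__":
    print(may_run_on("Brussels", "astro", 100))  # owner's long job
    print(may_run_on("Brussels", "stats", 6))    # another group's short job
    print(may_run_on("Brussels", "stats", 100))  # another group's long job
```

If only the owner's jobs and short jobs can occupy the node, then any other group's job running there ends within the short-job limit, which is one way to read the "known maximum wait time" for the owning group.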
20
Job Queues (subject to change)

Queue        Time Limit  Memory Limit  Max. User Run
Batch 1      12 hours    4 GB          512
Batch 2      12 hours    16 GB         128
Batch 3      5 days      16 GB         64
Batch 4      3 days      None          8
Interactive  4 hours     None          4
21
Guaranteed Access Debate
- Good because researchers have guaranteed access rights to nodes
- Bad because long jobs lose access to many nodes
22
Thanks!
Comments and Questions?
hpc-support@columbia.edu