QoS in the Tier1 batch system (LSF)
Alessandro Italiano (INFN-CNAF), Tier1 Farming Group

QoS definition
From Wikipedia: Quality of service is the ability to give different priorities to different applications and users in their use of resources. QoS mechanisms are not needed when there is no resource contention.

Tier1 scenario
- More than 20 different experiments (applications)
- Each experiment has several computing activities with different priorities
- Each year the Tier1 committee defines the maximum amount of resources that each experiment can use

Fairshare definition
From the LSF documentation: Fairshare scheduling divides the processing power of the LSF cluster among users to provide fair access to resources, so that no user or subgroup of users can monopolize the resources of the cluster.
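As a minimal illustration of what "fair access" means in steady state, the sketch below computes the target portion of the cluster each user would receive if slots were split strictly in proportion to assigned shares. The user names and share values are hypothetical, and this is only the ideal split: LSF does not hand out fixed quotas this way, it approximates the split over time through dynamic priorities.

```python
# Sketch of the *target* split implied by flat fairshare: each user's portion
# of the cluster is proportional to their share. User names and share values
# are hypothetical; LSF itself approximates this through dynamic priorities.


def fair_portions(total_slots: int, shares: dict[str, int]) -> dict[str, float]:
    """Return each user's fair portion of total_slots, proportional to shares."""
    total_shares = sum(shares.values())
    if total_shares <= 0:
        raise ValueError("at least one positive share is required")
    return {user: total_slots * s / total_shares for user, s in shares.items()}


if __name__ == "__main__":
    # user1 holds 2 shares, user2 and user3 hold 1 share each.
    print(fair_portions(100, {"user1": 2, "user2": 1, "user3": 1}))
```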

Hierarchical Fairshare: a first level of QoS
- Defines dynamic priorities for every group/subgroup
- Dynamically grants a resource quota to each group/subgroup
- Acts only where there is resource contention
- Optimizes resource usage

Hierarchical Fairshare: parameters
- Share assigned: resource percentage assigned to every group and subgroup
- Resource usage
- Time window: time slot used to compute the total amount of resources used by every group
- Normalization factors
Dynamic priority formula:
$$DP = \frac{\mathit{Share}}{\mathit{ResourceUsage} \times N_f + 1}$$
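The formula above can be sketched in a few lines. This assumes usage is simply summed over a hard time window and that a single normalization factor Nf applies; both are simplifications (LSF's own formula combines several usage terms with configurable weights), and the usage records and parameter values are hypothetical.

```python
# Sketch of the simplified dynamic priority formula from the slide:
#   DP = Share / (ResourceUsage * Nf + 1)
# ResourceUsage is summed only over a hard time window (an assumption made
# for simplicity). Records, window length and Nf are hypothetical values.
from dataclasses import dataclass


@dataclass
class UsageRecord:
    timestamp: float    # seconds when the usage was recorded
    cpu_seconds: float  # resource consumed (e.g. CPU time)


def windowed_usage(records: list[UsageRecord], now: float, window: float) -> float:
    """Sum the usage recorded inside the time window [now - window, now]."""
    return sum(r.cpu_seconds for r in records if now - window <= r.timestamp <= now)


def dynamic_priority(share: float, usage: float, nf: float) -> float:
    """Priority falls as normalized usage grows; the +1 avoids division by zero."""
    return share / (usage * nf + 1)


if __name__ == "__main__":
    records = [UsageRecord(0, 500.0), UsageRecord(3000, 100.0), UsageRecord(3500, 200.0)]
    usage = windowed_usage(records, now=3600, window=1800)  # only the last two records count
    print(usage, dynamic_priority(share=30, usage=usage, nf=0.01))
```

A group that has consumed little in the window keeps a priority close to its raw share, while heavy recent users drop in priority until their usage ages out of the window.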

Hierarchical Fairshare: how it works
Available resources
  CMS share = 30
    cms: relative share 15, absolute share 4.5
    cmsprd: relative share 70, absolute share 21
    cmssgm
  ALICE share = 27
    alice: absolute share 4.05
    alicesgm: relative share 85, absolute share 22.95
[Screenshot: LSF share information for SLC4_GLOBAL/ with columns USER/GROUP, SHARES, PRIORITY, STARTED, RESERVED, CPU_TIME, RUN_TIME, listing group_test, group_admin, group_dteam, group_egee, group_ops, group_magic, group_ams, group_ingv, group_theophys, group_biomed, group_t1bio, group_cdfcaf, group_infngrid, group_pamela, group_lhcb, u_cms, group_babar, u_atlas, group_alice, group_argo, group_virgo]
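The absolute shares in the tree follow from multiplying each subgroup's relative share by its parent's absolute share (for example 30 × 15% = 4.5 for cms). The sketch below walks a share tree and reproduces the figures on the slide, assuming the top-level CMS and ALICE shares are percentages of the total available resources. The nested-dict layout is a hypothetical representation, and cmssgm and alice are left out because the slide does not give their relative shares.

```python
# Hypothetical share tree: name -> (relative share in percent, children or None).
# Values come from the slide; subgroups whose relative share is not given there
# (cmssgm, alice) are omitted.


def absolute_shares(tree: dict, parent_abs: float = 100.0) -> dict[str, float]:
    """Absolute share = parent's absolute share * relative share / 100, recursively."""
    result: dict[str, float] = {}
    for name, (relative, children) in tree.items():
        absolute = parent_abs * relative / 100
        result[name] = absolute
        if children:
            result.update(absolute_shares(children, absolute))
    return result


SHARE_TREE = {
    "CMS": (30, {"cms": (15, None), "cmsprd": (70, None)}),
    "ALICE": (27, {"alicesgm": (85, None)}),
}

if __name__ == "__main__":
    for group, share in absolute_shares(SHARE_TREE).items():
        print(f"{group:10s} {share:6.2f}")
```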

Hierarchical Fairshare: constraint
If there is no resource contention inside a VO, a single user can take all the resources available to their experiment. In that case all the other users, including those belonging to high-priority groups, may wait a long time before their jobs run.
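The constraint comes from the fact that fairshare only reorders jobs that are waiting; it does not take slots back from jobs already running. The toy dispatcher below (hypothetical, not LSF code) fills a VO's 10 slots with jobs from the only subgroup that has work queued, then shows a later high-priority job rising to the top of the queue yet still not starting because no slot is free.

```python
# Toy dispatcher (hypothetical, not LSF code): fairshare priority only decides
# the ORDER of pending jobs; running jobs are never preempted here.


def dispatch(free_slots: int, pending: list[tuple[str, float]]):
    """Start up to free_slots jobs, highest priority first.

    pending holds (owner, priority) pairs; returns (started owners, still pending).
    """
    ordered = sorted(pending, key=lambda job: job[1], reverse=True)
    started = [owner for owner, _ in ordered[:free_slots]]
    return started, ordered[free_slots:]


if __name__ == "__main__":
    VO_SLOTS = 10
    # t0: only the low-priority "cms" subgroup has jobs queued, so it takes every slot.
    started, pending = dispatch(VO_SLOTS, [("cms", 1.0)] * 12)
    print(len(started), "started,", len(pending), "still pending")
    # t1: a high-priority cmssgm job arrives; it is first in line but no slot is free.
    started2, pending2 = dispatch(VO_SLOTS - len(started), pending + [("cmssgm", 5.0)])
    print("started:", started2, "first in queue:", pending2[0])
```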

LSF SLA: a second level of QoS
LSF Service Level Agreements (SLAs) are batch system features that provide goal-oriented service levels. Four goals are available:
- Deadline: complete a specified number of jobs within a time window
- Velocity: keep a specified number of jobs running within a time window; used for short jobs
- Throughput: complete a specified number of jobs per hour; used for medium and long jobs
- A combination of the goals above
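To make the difference between the goals concrete, the sketch below models each one as a check against a snapshot of a service class. In LSF itself the goals are declared in the service class configuration (lsb.serviceclasses); the class and field names here are hypothetical and only capture what each goal asks for, not how LSF evaluates it.

```python
# Hypothetical model of the SLA goal types; not LSF's API or configuration syntax.
from dataclasses import dataclass


@dataclass
class Snapshot:
    finished: int         # jobs finished since the time window opened
    running: int          # jobs running right now
    elapsed_hours: float  # hours elapsed since the time window opened


@dataclass
class Deadline:
    jobs: int  # jobs that must be finished when the window closes

    def met(self, s: Snapshot) -> bool:
        return s.finished >= self.jobs


@dataclass
class Velocity:
    jobs: int  # jobs that should be running at any time in the window

    def met(self, s: Snapshot) -> bool:
        return s.running >= self.jobs


@dataclass
class Throughput:
    jobs_per_hour: float  # required completion rate

    def met(self, s: Snapshot) -> bool:
        if s.elapsed_hours <= 0:
            return False
        return s.finished / s.elapsed_hours >= self.jobs_per_hour


def combined_met(goals, s: Snapshot) -> bool:
    """A combination of goals is met only when every component goal is met."""
    return all(g.met(s) for g in goals)


if __name__ == "__main__":
    now = Snapshot(finished=12, running=5, elapsed_hours=2.0)
    print(combined_met([Deadline(10), Throughput(6)], now))
    print(combined_met([Deadline(10), Velocity(8)], now))
```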

LSF SLA: constraint
You cannot configure a specific queue or user subgroup to use an SLA, because an SLA can only be requested at submission time. To work around this limitation, the batch manager can provide an automatic hook that assigns the appropriate SLA to each user or subgroup.
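One way to picture such a hook: at submission time, look up the submitting user's group and add bsub's `-sla` option with the matching service class, unless the user already chose one. In LSF a site-wide hook of this kind is usually implemented as an esub; the function below only shows the mapping logic, and the user names, group names and service class names are all hypothetical.

```python
# Hypothetical submission hook: prepend "-sla <service class>" to a bsub
# argument list based on the user's group. All names below are made up.
USER_GROUP = {"cmsprd001": "cmsprd", "cmssgm": "cmssgm", "cms042": "cms"}
GROUP_SLA = {"cmsprd": "sla_medium", "cmssgm": "sla_high"}


def add_sla(user: str, bsub_args: list[str]) -> list[str]:
    """Return bsub arguments with the group's SLA added, if one applies."""
    if "-sla" in bsub_args:
        return list(bsub_args)  # respect an SLA chosen explicitly by the user
    group = USER_GROUP.get(user)
    sla = GROUP_SLA.get(group) if group else None
    if sla is None:
        return list(bsub_args)  # no SLA configured for this user's group
    # Options must precede the job command, so the SLA goes at the front.
    return ["-sla", sla, *bsub_args]


if __name__ == "__main__":
    print(add_sla("cmsprd001", ["-q", "cms", "run.sh"]))
```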

A detail that can improve QoS:
One queue for each application, to customize the execution environment and simplify the administration of application requirements:
- Run-time resource limits
- Dedicated computing resources
- Specific computing architectures
- Queue administrator
- Scheduling parameters
- Pre- and post-execution scripts
- …
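A per-application queue bundles the settings listed above into one place. The sketch below renders such a queue as an lsb.queues-style block; the keywords (QUEUE_NAME, PRIORITY, RUNLIMIT, HOSTS, ADMINISTRATORS, PRE_EXEC, POST_EXEC) are standard lsb.queues parameters, but the values, the host group and the script paths are hypothetical, and the output should be checked against the LSF documentation for your version.

```python
# Sketch: render a per-application queue as an lsb.queues-style section.
# Keyword names follow lsb.queues; every value used below is hypothetical.
from dataclasses import dataclass


@dataclass
class AppQueue:
    name: str
    priority: int
    runlimit_minutes: int  # run-time limit (RUNLIMIT is expressed in minutes)
    hosts: str             # dedicated hosts or host group
    administrators: str    # queue administrator(s)
    pre_exec: str = ""     # optional pre-execution script
    post_exec: str = ""    # optional post-execution script

    def render(self) -> str:
        lines = [
            "Begin Queue",
            f"QUEUE_NAME = {self.name}",
            f"PRIORITY = {self.priority}",
            f"RUNLIMIT = {self.runlimit_minutes}",
            f"HOSTS = {self.hosts}",
            f"ADMINISTRATORS = {self.administrators}",
        ]
        if self.pre_exec:
            lines.append(f"PRE_EXEC = {self.pre_exec}")
        if self.post_exec:
            lines.append(f"POST_EXEC = {self.post_exec}")
        lines.append("End Queue")
        return "\n".join(lines)


if __name__ == "__main__":
    print(AppQueue("cms", 40, 4320, "cms_hosts", "cmsadmin", pre_exec="/opt/cms/pre.sh").render())
```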

How can the Grid match the right service class?
Grid role → LSF QoS:
  Role: cms → QoS: Low Priority
  Role: cmsprd → QoS: Medium Priority
  Role: cmssgm → QoS: High Priority

Matching service class: statically
Grid roles are mapped to local accounts by the lcmaps configuration file:
  "/VO=cms/GROUP=/cms/ROLE=lcgadmin" cmssgm
  "/VO=cms/GROUP=/cms/ROLE=production" .cmsprd
  "/VO=cms/GROUP=/cms/HeavyIons/" .cms
Grid role → LSF QoS:
  Role: cms → QoS: Low Priority
  Role: cmsprd → QoS: Medium Priority
  Role: cmssgm → QoS: High Priority
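The static chain (FQAN → local account → QoS level) can be sketched as a small parser over the lcmaps entries shown above. In these files a leading dot conventionally denotes a pool of accounts; the first-match prefix rule used here is a simplifying assumption, not a description of LCMAPS's actual matching logic, and the QoS table simply restates the slide.

```python
# Sketch: parse lcmaps-style mapping lines and resolve an FQAN to a QoS level.
# The first-match prefix rule is an assumption made for illustration only.
import shlex
from typing import Optional

MAPFILE = '''
"/VO=cms/GROUP=/cms/ROLE=lcgadmin" cmssgm
"/VO=cms/GROUP=/cms/ROLE=production" .cmsprd
"/VO=cms/GROUP=/cms/HeavyIons/" .cms
'''

# Role -> QoS level, as listed on the slide.
QOS = {"cmssgm": "High Priority", "cmsprd": "Medium Priority", "cms": "Low Priority"}


def parse_mapfile(text: str) -> list[tuple[str, str, bool]]:
    """Return (fqan, account, is_pool) tuples; a leading dot marks a pool account."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fqan, target = shlex.split(line)  # handles the quoted FQAN
        entries.append((fqan, target.lstrip("."), target.startswith(".")))
    return entries


def qos_for(fqan: str, entries: list[tuple[str, str, bool]]) -> Optional[str]:
    """Return the QoS level of the first entry whose FQAN prefixes the given one."""
    for pattern, account, _is_pool in entries:
        if fqan.startswith(pattern):
            return QOS.get(account)
    return None


if __name__ == "__main__":
    entries = parse_mapfile(MAPFILE)
    print(qos_for("/VO=cms/GROUP=/cms/ROLE=production", entries))
```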

Matching service class: dynamically
Grid → GPBox → LSF QoS:
  Role: cms → QoS: Low Priority
  Role: cmsprd → QoS: Medium Priority
  Role: cmssgm → QoS: High Priority