Statistics of CAF Usage, Interaction with the GRID
Marco MEONI, CERN - Offline Week
Outline
- CAF usage and users' grouping
- Disk monitoring
- Datasets
- CPU fairshare monitoring
- User queries
- Conclusions & outlook
CERN Analysis Facility
- Cluster of 40 machines, in operation for two years
- 80 CPUs, 8 TB of disk pool
- 35 machines in the PRO partition, 5 in DEV
- The head node is the xrootd redirector and the PROOF master
- The other nodes are xrootd data servers and PROOF slaves
CAF Usage
- The resources available on CAF must be used fairly, with the highest attention to how disks and CPUs are used
- Users are grouped; at present the groups are sub-detectors and physics working groups
- Users can belong to several groups (PWG has precedence over sub-detector)
- Each group has a disk space (quota) which is used to stage datasets from AliEn, and a CPU fairshare target (priority) to regulate concurrent queries
CAF Groups
[Table: per-group number of users, disk quota (GB) and CPU quota (%) for the PWGs, the sub-detector groups (EMCAL, HMPID, ITS, T0, MUON, PHOS, TPC, TOF, ZDC) and the service groups (proofteam, testusers, marco, COMMON)]
- Quotas are not absolute
- 18 registered groups, ~60 users
- 165 users have used CAF: please register to groups!
Resource Monitoring
- MonALISA (ML) ApMon runs on each node and sends monitoring information every minute
- Default monitoring: load, CPU, memory, swap, disk I/O, network
- Additional information:
  - PROOF and disk server status (xrootd/olbd)
  - number of PROOF sessions (proofd master)
  - number of queued staging requests and hosted files (DS manager)
A minimal reporting sketch follows.
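For illustration only, a minimal C++ sketch of what one reporting cycle on a node could look like. SendMetric() is a hypothetical stand-in for the actual ApMon send call, and the cluster name, metric names and values are invented for the example, not the ones used in production.

    #include <cstdio>

    // Hypothetical helper: in the real setup this would hand the value to
    // ApMon/MonALISA; here it just prints it.
    void SendMetric(const char *cluster, const char *node,
                    const char *name, double value)
    {
       printf("[%s/%s] %s = %g\n", cluster, node, name, value);
    }

    // One once-per-minute reporting cycle for a node (values are placeholders;
    // in reality they would be read from the system, xrootd/olbd and proofd).
    void ReportOnce(const char *node)
    {
       double proofSessions = 4;   // number of active PROOF sessions
       double stagingQueue  = 12;  // queued staging requests (DS manager)
       double xrootdAlive   = 1;   // 1 if xrootd/olbd respond, 0 otherwise
       SendMetric("PROOF::CAF", node, "proof_sessions", proofSessions);
       SendMetric("PROOF::CAF", node, "staging_queue",  stagingQueue);
       SendMetric("PROOF::CAF", node, "xrootd_status",  xrootdAlive);
    }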
Status Table
Hosted files and Disk Usage
Files hosted per node:
lxb6047: 310, lxb6048: 309, lxb6049: 308, lxb6050: 308, lxb6051: 308, lxb6052: 309, lxb6053: 309, lxb6054: 0, lxb6055: 309, lxb6056: 311, lxb6057: 307, lxb6058: 308, lxb6059: 309, lxb6060: 310, lxb6061: 311, lxb6062: 309, lxb6063: 309, lxb6064: 307, lxb6065: 308, lxb6066: 1089, lxb6067: 309, lxb6068: 311, lxb6069: 309, lxb6070: 313, lxb6071: 311, lxb6072: 309, lxb6073: 312, lxb6074: 312, lxb6075: 310, lxb6076: 311, lxb6077: 309, lxb6078: 307, lxb6079: 312, lxb6080: -
#Raw files: 11k, #Sim files: 54k; Raw on disk: 154 GB, Sim on disk: 4.5 TB
[Plots: number of files and disk pool usage (KB), raw data vs. simulated data]
ESDs from the RAW data production are ready to be staged
Interaction with the GRID
- Datasets (DS) are used to stage files from AliEn
- A DS is a list of files (usually ESDs or archives) registered by users for processing with PROOF
- Different DSs may share the same physical files
- A staging script issues new staging requests and touches the files every 5 minutes
- Files are uniformly distributed across the nodes by the xrootd data manager
A minimal registration/processing sketch follows.
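The sketch below is illustrative only: it registers a dataset and processes one by name with the standard ROOT/PROOF calls. The master URL, the file URLs and the selector name are placeholders; the actual staging is triggered by the CAF staging script once the dataset is registered.

    #include "TProof.h"
    #include "TFileCollection.h"
    #include "TFileInfo.h"

    // Sketch: register a dataset on CAF and run a selector on a registered one.
    // Master URL, file URLs and selector name are placeholders.
    void RegisterAndProcess()
    {
       TProof *p = TProof::Open("alicecaf.cern.ch");   // connect to the PROOF master

       // Build the list of files (usually ESDs or archives taken from AliEn).
       TFileCollection *fc = new TFileCollection("myESDs", "Example ESD dataset");
       fc->Add(new TFileInfo("root://server//data/run82XX/001/AliESDs.root"));
       fc->Add(new TFileInfo("root://server//data/run82XX/002/AliESDs.root"));

       // Register it; the staging machinery then fetches the files from AliEn.
       p->RegisterDataSet("myESDs", fc);
       p->ShowDataSets();                              // list the registered datasets

       // Process an already staged dataset by name with a selector.
       p->Process("/PWG0/COMMON/run33000X_10TeV_0T", "MySelector.C+");
    }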
Dataset Manager
- The DS manager enforces the quotas at file level; the physical location of the files is handled by xrootd
- The DS manager daemon sends:
  - the overall number of files
  - the number of new, touched, disappeared and corrupted files
  - staging requests
  - disk utilization for each user and for each group
  - the number of files on each node and the total size
A small client-side bookkeeping sketch follows.
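As an illustration of the bookkeeping only (not the DS manager daemon itself), a small client-side sketch that tallies the number of files and the total size of the datasets visible under one group, using the standard TProof dataset calls; the group path /PWG2 is just an example.

    #include <cstdio>
    #include "TProof.h"
    #include "TFileCollection.h"
    #include "TMap.h"
    #include "TObjString.h"

    // Client-side sketch: sum files and disk usage of one group's datasets.
    void GroupDiskUsage(TProof *p, const char *group = "/PWG2")
    {
       TMap *dsmap = p->GetDataSets(group);   // datasets registered under the group
       if (!dsmap) return;
       Long64_t nfiles = 0, bytes = 0;
       TIter next(dsmap);
       while (TObjString *name = (TObjString *) next()) {
          TFileCollection *fc = (TFileCollection *) dsmap->GetValue(name);
          if (!fc) continue;
          nfiles += fc->GetNFiles();
          bytes  += fc->GetTotalSize();
       }
       printf("%s: %lld files, %.1f GB\n", group, nfiles,
              bytes / 1024. / 1024. / 1024.);
    }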
Dataset Monitoring
- PWG1 is using 0% of 1 TB
- PWG3 is using 5% of 1 TB
Datasets List
Dataset | # Files | Default tree | # Events | Size | Staged
/COMMON/COMMON/ESD5000_part | 1000 | /esdTree | | 50 GB | 100 %
/COMMON/COMMON/ESD5000_small | 100 | /esdTree | | 4 GB | 100 %
/COMMON/COMMON/run15034_PbPb | 967 | /esdTree | 939 | 500 GB | 97 %
/COMMON/COMMON/run15035_PbPb | 962 | /esdTree | 952 | 505 GB | 98 %
/COMMON/COMMON/run15036_PbPb | 961 | /esdTree | 957 | 505 GB | 99 %
/COMMON/COMMON/run82XX_part1 | | /esdTree | | 289 GB | 99 %
/COMMON/COMMON/run82XX_part2 | | /esdTree | | 289 GB | 92 %
/COMMON/COMMON/run82XX_part3 | | /esdTree | | 288 GB | 94 %
/COMMON/COMMON/sim_160000_esd | 95 | /esdTree | 9400 | 267 MB | 98 %
/PWG0/COMMON/run30000X_10TeV_0.5T | 2167 | /esdTree | | 90 GB | 100 %
/PWG0/COMMON/run31000X_0.9TeV_0.5T | 2162 | /esdTree | | 57 GB | 100 %
/PWG0/COMMON/run32000X_10TeV_0.5T_Phojet | 2191 | /esdTree | | 83 GB | 100 %
/PWG0/COMMON/run33000X_10TeV_0T | 2191 | /esdTree | | 108 GB | 100 %
/PWG0/COMMON/run34000X_0.9TeV_0T | 2175 | /esdTree | | 65 GB | 100 %
/PWG0/COMMON/run35000X_10TeV_0T_Phojet | 2190 | /esdTree | | 98 GB | 100 %
/PWG0/phristov/kPhojet_k5kG_10000 | 100 | /esdTree | 1100 | 4 GB | 11 %
/PWG0/phristov/kPhojet_k5kG_900 | 97 | /esdTree | 2000 | 4 GB | 20 %
/PWG0/phristov/kPythia6_k5kG_10000 | 99 | /esdTree | 1600 | 4 GB | 16 %
/PWG0/phristov/kPythia6_k5kG_900 | 99 | /esdTree | 1100 | 4 GB | 11 %
/PWG2/COMMON/run82XX_test4 | 10 | /esdTree | 1000 | 297 MB | 100 %
/PWG2/COMMON/run82XX_test5 | 10 | /esdTree | 1000 | 297 MB | 100 %
/PWG2/akisiel/LHC500C0005 | 100 | /esdTree | 97 | 663 MB | 100 %
/PWG2/akisiel/LHC500C2030 | 996 | /esdTree | 995 | 4 GB | 99 %
/PWG2/belikov/40825 | 1355 | /HLTesdTree | | 143 GB | 99 %
/PWG2/hricaud/LHC07f_160033DataSet | 915 | /esdTree | | 2 GB | 99 %
/PWG2/hricaud/LHC07f_160038_root_archiveDataSet | 862 | /esdTree | | 449 GB | 100 %
/PWG2/jgrosseo/sim_1600XX_esd | | /esdTree | | 103 GB | 98 %
/PWG2/mvala/PDC07_pp_0_9_82xx_1 | 99 | /rsnMVTree | | 1 GB | 100 %
/PWG2/mvala/RSNMV_PDC06_14TeV | 677 | /rsnMVTree | | 24 GB | 100 %
/PWG2/mvala/RSNMV_PDC07_09_part1 | 326 | /rsnMVTree | | 5 GB | 100 %
/PWG2/mvala/RSNMV_PDC07_09_part1_new | 326 | /rsnMVTree | | 5 GB | 100 %
/PWG2/pganoti/FirstPhys900Field_ | 1088 | /esdTree | | 28 GB | 100 %
/PWG3/arnaldi/PDC07_LHC07g_ | 615 | /HLTesdTree | | 787 MB | 94 %
/PWG3/arnaldi/PDC07_LHC07g_ | 594 | /HLTesdTree | | 744 MB | 95 %
/PWG3/arnaldi/PDC07_LHC07g_ | 366 | /HLTesdTree | | 513 MB | 99 %
/PWG3/arnaldi/PDC07_LHC07g_ | 251 | /HLTesdTree | | 333 MB | 100 %
/PWG3/arnaldi/PDC08_170167_001 | 1 | N/A | 33 MB | 0 %
/PWG3/arnaldi/PDC08_LHC08t_ | 976 | /HLTesdTree | | 4 GB | 99 %
/PWG3/arnaldi/PDC08_LHC08t_ | 990 | /HLTesdTree | | 4 GB | 100 %
/PWG3/arnaldi/PDC08_LHC08t_ | 975 | /HLTesdTree | | 8 GB | 87 %
/PWG3/arnaldi/myDataSet | 975 | /HLTesdTree | | 8 GB | 87 %
/PWG4/anju/myDataSet | 946 | /esdTree | | 27 GB | 99 %
/PWG4/arian/jetjet15-50 | 9817 | /esdTree | | 630 GB | 99 %
/PWG4/arian/jetjetAbove_50 | 94 | /esdTree | 8000 | 7 GB | 85 %
/PWG4/arian/jetjetAbove_50_real | 958 | /esdTree | | 73 GB | 94 %
/PWG4/elopez/jetjet15-50_28000x | 7732 | /esdTree | | 60 GB | 95 %
/PWG4/elopez/jetjet50_r27000x | 8411 | /esdTree | | 92 GB | 94 %
Jury produced pT spectrum plots staging his own DS (run #40825, TPC+ITS, field on)
Start staging common DSs of reconstructed runs?
~4.7 GB used out of 6 GB (34 * 200 MB - 10%)
CPU Fairshare
- Usages are retrieved every 5 minutes and averaged every 6 hours
- New priorities are computed by applying a correction formula in the range [α·quota .. β·quota], with α = 0.5 and β = 2
- Correction function: f(x) = q + q·exp(k·x), with k = (1/q)·ln(1/4), where x is the measured usage and q the group quota
[Plot: correction function vs. usage for q = 20%, priorityMin = 10%, priorityMax = 40%]
A sketch of the correction function follows.
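The sketch below implements the correction function as written on this slide; treating usage and quota as percentages and clamping the result to [α·quota, β·quota] are assumptions about how the formula is applied, not a description of the actual scheduler code.

    #include <cmath>
    #include <algorithm>

    // f(x) = q + q*exp(k*x) with k = (1/q)*ln(1/4); the result is clamped to
    // the stated range [alpha*q, beta*q] (alpha = 0.5, beta = 2 by default).
    double CorrectedPriority(double usage, double quota,
                             double alpha = 0.5, double beta = 2.0)
    {
       const double k = std::log(0.25) / quota;
       const double f = quota + quota * std::exp(k * usage);
       return std::min(beta * quota, std::max(alpha * quota, f));
    }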
Priority Monitoring
- Priorities are used for CPU fairshare and converge to the quotas
- Usages are averaged so that priorities converge gracefully to the quotas
- If there is no competition, users get the maximum number of CPUs
- Only relative priorities are modified!
CPU quotas in practice
- only PWGs + default groups
- default usually has the highest usage
Query Monitoring
- When a user query completes, the PROOF master sends its statistics:
  - bytes read
  - consumed CPU time (the basis for CPU fairshare)
  - number of processed events
  - user waiting time
- Values are aggregated per user and per group
A client-side sketch of these quantities follows.
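The same quantities can be inspected from the client side after a query finishes; the sketch below uses the standard TProof/TQueryResult accessors and assumes an already opened PROOF session. It only illustrates where the numbers come from; the actual reporting is done by the PROOF master.

    #include <cstdio>
    #include "TProof.h"
    #include "TQueryResult.h"
    #include "TList.h"

    // Print per-query statistics of the current session: processed events,
    // bytes read and CPU time consumed (the input to the fairshare accounting).
    void PrintQueryStats(TProof *p)
    {
       TList *queries = p->GetListOfQueries();
       if (!queries) return;
       TIter next(queries);
       while (TQueryResult *qr = (TQueryResult *) next()) {
          printf("%s: %lld events, %lld bytes read, %.1f s CPU\n",
                 qr->GetName(), qr->GetEntries(), qr->GetBytes(), qr->GetUsedCPU());
       }
    }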
Query Monitoring
[Plot: query statistics, accumulated per interval]
Outlook
- User session monitoring:
  - on average 4-7 sessions in parallel (daytime, EU time), with a peak of users during the tutorial sessions; running history is still missing
  - need to monitor the number of workers per user once load-based scheduling is introduced
- Additional monitoring:
  - per single query (disk used and files/sec not implemented yet)
  - network traffic correlation among nodes
  - xrootd activity with the new bulk staging requests
- Debugging:
  - a tool to monitor and kill a hanging session when Reset does not work (currently the cluster has to be restarted)
- Hardware:
  - the new ALICE Mac cluster is "ready" (16 workers)
  - new IT 8-core machines are coming
- Training:
  - PROOF/CAF is the key setup for interactive user analysis (and more)
  - the number of people attending the monthly tutorial is increasing (20 persons last week!)