Operations in 2012 and plans for the LS1 ALICE T1/T2 workshop CCIN2P3 Lyon 4 June 2013
Raw data in 2012/2013 1.65 PB. 8 periods in 2012 (LHC12a to LHC12h), 6 periods in 2013 (LHC13a to LHC13f). 2 copies: one @CERN (T0) and a distributed copy @ 6 T1s. 7.5 PB since the start of LHC.
RAW data Last year – p+p, with p+Pb for 45 days in 2013. Selective triggers (very little minimum bias), calorimeter-enhanced triggers, high multiplicity... Reflects the HI data taking of 2011 and provides reference data. Replication principles for RAW – unchanged since the beginning of time – one copy @T0, one distributed copy @T1s. p-Pb data also fully replicated to KISTI (test of tapes). Thanks to the excellent network, the RAW data replication is quasi-online; this significantly simplifies the process during Pb+Pb data taking.
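For illustration only, a minimal sketch of the two-copy placement rule described above (one RAW copy kept at the T0, one copy spread across the T1 tape SEs). The SE names and the hashing scheme are hypothetical, not the actual AliEn transfer logic.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical illustration of the RAW placement rule: every file gets one
// copy at the T0 and one copy on a T1 tape SE, spread over the T1 list.
struct RawPlacement {
    std::string t0Copy;          // copy kept at CERN (T0)
    std::string distributedCopy; // copy on one of the T1 tape SEs
};

RawPlacement chooseDestinations(const std::string& rawFile,
                                const std::vector<std::string>& t1TapeSEs) {
    RawPlacement p;
    p.t0Copy = "ALICE::CERN::Tape";  // placeholder SE name
    // Spread files across the (non-empty) T1 list, e.g. by hashing the name.
    const std::size_t idx = std::hash<std::string>{}(rawFile) % t1TapeSEs.size();
    p.distributedCopy = t1TapeSEs[idx];
    return p;
}
```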
RAW data transfer 340 TB of RAW: 20% of the 2012+2013 total. 6 sub-periods, divided by triggering conditions. [Plot: transfer rates, with the Pb+Pb periods and the p+Pb period marked]
Processing strategy 2012/2013 The standard RAW processing chain was strengthened with additional offline calibration Two calibration passes (CPass0/CPass1) + manual calibration + validation pass (on 10% of statistics) Elaborated by PWG-P(hysics)P(erformance) together with the detector experts Goal – provide high quality QA and calibration before a production pass The entire 2012/2013 RAW data was processed using this procedure
Processing chain example Does not include MC, all chained and automagic
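To make the chaining explicit, here is a schematic sketch of the stages described above (CPass0, CPass1, manual calibration, validation pass on ~10% of the statistics, production pass). The stage list follows the slides; the driver code itself is purely illustrative, not the real production machinery.

```cpp
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Stages of the 2012/2013 RAW processing chain, in order; each one is
    // triggered automatically ("automagically") once the previous completes.
    const std::vector<std::string> chain = {
        "CPass0: first calibration pass",
        "CPass1: second calibration pass + QA",
        "Manual calibration by the detector experts",
        "Validation pass on ~10% of the statistics",
        "Full production pass"
    };
    for (const auto& stage : chain)
        std::cout << stage << '\n';
    return 0;
}
```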
Application software 2012/2013 Software release policy: weekly revisions – include bug fixes as well as code improvements. >60 AliRoot revisions, used for all data processing steps. During 'special' periods (p-A was one of these), 2x revisions weekly. Analysis software release policy: 2x weekly AN tags – used for individual user and organized analysis.
More on analysis Organized analysis: analysis trains per Physics Working Group; most of these run 2-3 times weekly (with an AliRoot AN tag); tested and quite efficient (more later). Individual user analysis: using AN tags with additional code, compiled on the fly; steered by the 'analysis plugin' contained in AliRoot; local test mode before submission to the Grid; sometimes things do not go as expected (a sketch of the plugin configuration follows).
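As an illustration of the 'analysis plugin' workflow, below is a minimal macro sketch configuring AliAnalysisAlien for a local test before Grid submission. The AliRoot version, dataset path, data pattern, run number and task names are placeholders, not values from the slides.

```cpp
// Sketch of a user analysis steered by the AliRoot analysis plugin
// (AliAnalysisAlien), first run in local "test" mode.
AliAnalysisGrid* CreatePlugin()
{
    AliAnalysisAlien* plugin = new AliAnalysisAlien();
    plugin->SetRunMode("test");                  // local test; "full" submits to the Grid
    plugin->SetNtestFiles(2);                    // number of input files used for the test
    plugin->SetAliROOTVersion("vAN-20130604");   // placeholder AN tag
    plugin->SetGridDataDir("/alice/data/2012/LHC12h");    // placeholder dataset
    plugin->SetDataPattern("*/pass2/AOD/*/AliAOD.root");  // placeholder pattern
    plugin->AddRunNumber(189616);                // placeholder run number
    plugin->SetGridWorkingDir("myAnalysis");
    plugin->SetGridOutputDir("output");
    // User code compiled on the fly on the worker nodes:
    plugin->SetAnalysisSource("AliAnalysisTaskMyTask.cxx");
    plugin->SetAdditionalLibs("AliAnalysisTaskMyTask.h AliAnalysisTaskMyTask.cxx");
    return plugin;
}
```

The returned handler would then be attached to the AliAnalysisManager with SetGridHandler() before starting the analysis; switching the run mode from "test" to "full" submits the same configuration to the Grid.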
Storage load
Storage load (2) The (read) load on storage is induced primarily by analysis tasks of all kinds. The read/write ratio is at 9/1. The same numbers in 2011: read/write 6/1, data read 110 PB (2.4x less than in 2012/13). Scales (almost) with the number of analysis jobs.
Storage availability Last year's SE availability (read test): non-weighted (by SE size) average = 83%. 21 of 65 (32%) are > 95%. 6 of 65 (9%) are < 50%.
File replication policy Since mid-2013 – new ESD/AOD replication policy: 2x ESD replicas, 2x AOD replicas (both RAW and MC). Primary goal – reduce disk space usage (and it worked). Replica reduction of 'older' productions is ongoing; in addition, some 'really old' production cycles unused in daily work are reduced to a single replica. The 2x replica model works under the following conditions: SEs are up and running, otherwise the 'loss' in performance can be significant; the analysis is mostly done on AODs.
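A hypothetical encoding of the replica rule above, just to make it explicit; the struct and flags are illustrative and not part of any actual cleanup tooling.

```cpp
#include <string>

// Illustrative replica policy: current ESD/AOD productions keep 2 replicas,
// 'really old' cycles that are unused in daily work drop to a single replica.
struct ProductionCycle {
    std::string name;      // e.g. "LHC12h pass2" (placeholder)
    bool usedInDailyWork;  // still read by trains or users
    bool superseded;       // an older cycle replaced by a newer pass
};

int targetReplicas(const ProductionCycle& p) {
    if (p.superseded && !p.usedInDailyWork)
        return 1;          // single replica for really old, unused cycles
    return 2;              // default: 2x ESD and 2x AOD replicas
}
```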
Summary of Grid use 2012/13 60% simulation, 10% organized analysis, 10% RAW data reco, 20% individual user analysis. 465 individual users.
Job efficiency Defined as CPU/Wall time of the (batch) process, i.e. if your job runs for 1 hour elapsed time and uses 30 minutes of CPU, the efficiency is 50%. One of the most-watched metrics in the Grid world: easy to calculate, understandable by all. Just one of the metrics used to judge the 'usefulness' of the processing. We have to make sure the numbers are high: >85% for T0/T1s, >80% for T2s.
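The metric in code, reproducing the example above:

```cpp
#include <iostream>

int main() {
    const double cpuSeconds  = 30 * 60;   // 30 minutes of CPU time
    const double wallSeconds = 60 * 60;   // 1 hour of elapsed (wall) time
    // efficiency = CPU time / wall time
    std::cout << "efficiency = " << 100.0 * cpuSeconds / wallSeconds << "%\n"; // 50%
    return 0;
}
```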
One week of site efficiency [Plot: user jobs in the system over one week, Tue–Mon] Clearly visible 'weekday working hours' pattern. Average efficiency = 84%.
One week aliprod efficiency aliprod = MC + AOD filtering + QA. No-brainer – the efficiency is high, contributing to the overall efficiency at the @60% level. Nothing to gain here – 100% efficiency is not possible by definition.
One week alidaq efficiency alidaq = RAW data processing + AOD filtering + QA Not too bad, contributing to the overall efficiency @10%
One week alitrain efficiency alitrain = LEGO trains. Very respectable; efficiency is low (mostly) only when there are few jobs (tails of merging etc.), which is a small contribution to the overall number. Contributes to the overall efficiency at the @10% level.
One week individual user efficiencies The ‘carpet’, 180 users, average = 26%, contribution to total @20% level
User job profile
Efficiency gains The largest efficiency gain can be achieved through improvements in individual user jobs Despite high complexity, the LEGO train efficiency is already high Moving ½ of the individual user jobs to LEGO would result in ~7% increase in overall efficiency This is the primary ‘push’ in the analysis community Then there is moving from ESDs to AODs Slowly gaining momentum
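A back-of-the-envelope check of the ~7% figure, using the shares quoted on the earlier slides (individual users at ~20% of the load with ~26% average efficiency); the ~90% LEGO train efficiency is an assumed value for illustration only.

```cpp
#include <iostream>

int main() {
    const double userShare  = 0.20;   // fraction of Grid load from individual users
    const double userEff    = 0.26;   // average individual user efficiency
    const double trainEff   = 0.90;   // assumed LEGO train efficiency
    const double movedShare = userShare / 2.0;  // half of the user jobs move to trains

    const double gain = movedShare * (trainEff - userEff);
    std::cout << "overall efficiency gain ~ " << 100.0 * gain << "%\n"; // ~6.4%, i.e. ~7%
    return 0;
}
```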
… looking at the user jobs Top 10… and bottom 10… (names removed to protect the innocent).
Top user analysis job The efficiency is high and equal at (almost) all sites This is generally what we observe for all jobs – the exceptions will be covered in Costin’s presentation
Low efficiency user job analysis The efficiency does not show a clear site dependency. There is a weak dependency on the number of input files (job overhead). Again, as for the 'top' efficiency case: if the efficiency is low, it is low everywhere.
Medium efficiency user job analysis Data intensive job… Efficiency depends on the input rate To achieve > 95% efficiency, this type of job requires ~3MB/sec/core throughput For an average 24-core node, this translates to 72MB/sec/node Not reasonable to expect the centres to support such rates This job clearly belongs in a train…
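The same estimate as a one-liner, reproducing the numbers on this slide:

```cpp
#include <iostream>

int main() {
    const double mbPerSecPerCore = 3.0;  // throughput needed per core for >95% efficiency
    const int    coresPerNode    = 24;   // average worker node in the example
    std::cout << "required node throughput = "
              << mbPerSecPerCore * coresPerNode << " MB/sec\n"; // 72 MB/sec
    return 0;
}
```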
Summary on analysis The proportion of analysis on the Grid continues to grow (2x increase in a year), but has to flatten at some point There is a steady consolidation of analysis tasks in the framework of LEGO trains The efficiency of the analysis is still to be improved ‘TTree cache’ already made the organized analysis highly efficient The ‘high throughput demand’ analysis must be put in trains (some optimization of SE infrastructure at the sites can help too) In general – the analysis on the Grid works quite well
Summary of resources 2012/2013 78 active sites. 32K jobs on average – a 62% increase over 2011, in line with pledged resources.
Summary of resources 2012/2013 280 Mio hours => ~11.7 Mio days => ~32 K years => ~320 centuries of computing, delivered in 520 days.
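A quick check of the unit conversion (plain arithmetic, assuming nothing beyond the 280 Mio hours figure):

```cpp
#include <iostream>

int main() {
    const double hours = 280e6;             // CPU hours delivered in 520 days
    const double days  = hours / 24.0;      // ~11.7 Mio days
    const double years = days / 365.25;     // ~32 K years
    std::cout << days / 1e6 << " Mio days, "
              << years / 1e3 << " K years, "
              << years / 100.0 << " centuries\n"; // ~11.7, ~32, ~320
    return 0;
}
```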
Who provided what 49% T0+T1s, 51% T2 The T1/T2 50/50 ratio is remarkably constant!
Jobs 97 Mio jobs => 2.2 completed jobs/sec
Catalogue stats 735 Mio LFNs, 2.7 billion entries. Still going strong! Major cleanups… but constant growth.
What is done 110 RAW data production cycles, 3.6 billion events, including Pass1/2/3 and detector-specific processing. 160 MC production cycles, 1 billion events: different types (p+p, Pb+Pb, p+Pb), various generator types, various triggers; MCs are coupled to RAW data production cycles. ~700 analysis trains: per PWG, new trains and scheduled running of old trains included.
Almost at the end…
Processing strategy 2013/14 Full reprocessing of 2010/2011 data (p+p and Pb+Pb), with the respective MC. Using the standard (CPass) calibration schema elaborated in 2012; QA and validation protocol unchanged. Resources – some modest increase during LS1, but mostly flat. Counting on the inclusion of two new T1s – KISTI (Korea) and UNAM (Mexico).
Operations - general The Grid infrastructure is mature and performing well. Kudos to the sites' expertise and delivery. There are no incidents worth a long discussion. The upgrades are routine: EMI-2 mostly done, new VO-box, AliEn, Torrent… Storage is still a bit 'slow' to upgrade – not surprising, as it is the most critical point in the site infrastructure.
Operations – general (2) 2013/2014 can be used for further consolidation. There is constant pressure to deliver; however, the absence of new data helps to calm things down. Tune-ups of monitoring (see Costin's presentation). Torrent -> CVMFS (see Predrag's presentation). IPv6 (Costin/Ramiro). Storage upgrades (Lukasz/Andreas). Local site upgrades (site presentations). Virtualization/the death of batch…
Summary 2012/13 was a very successful period for ALICE in terms of data taking and new physics. The Grid did not disappoint: stable infrastructure; no production/analysis/RAW data handling/etc. was stopped or delayed due to Grid performance issues; some impressive work was accomplished (see stats). 2013/2014 will be years of steady data re-processing and the usual analysis load. On ~flat resources, data deletion/consolidation is a must, and storage should be upgraded.
Summary (2) All of this success could not have been accomplished without: strong site performance; the steadfast, dedicated work of site and regional experts; stable Grid middleware and AliEn performance; the efforts of the central Grid team, developers and operations (a bit of a self-pat on the back). Thank you all, and thank you for your attention.