ALICE Operations: short summary and directions in 2012 – WLCG workshop, May 19-20, 2012

2  The 2011 Pb-Pb data processing
- Second pass of LHC11h (Pb-Pb period) completed, in time for "Hard Probes"
- Still a high memory issue (3.5-4 GB), helped by the centres setting up temporary 'high memory' queues
- Third pass: possible before QM'2012 in August
- Presently running large MC productions… and user analysis – more on this later

3  Production efficiencies
- MC (aliprod), RAW (alidaq), QA and AOD filtering
- Averages: aliprod 90%, alidaq 75%, global 82%
[Plots: job efficiencies for LHC11h Pass1 and LHC11h Pass2]

4  Analysis efficiencies
- Still need serious improvements: average ~30%, with large variations
- I/O-intensive jobs are not 'inefficient' – they simply use other aspects of the hardware…
- …that said, CPU/Wall is what is being measured and reported, and thus what we work to improve
- Efficiency strongly depends on the type of analysis
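The CPU/Wall figure used throughout these slides is simply the CPU time consumed by a job divided by its elapsed wall-clock time. A minimal illustration of that definition (plain Python, not ALICE tooling):

```python
def cpu_wall_efficiency(cpu_time_s: float, wall_time_s: float) -> float:
    """CPU/Wall efficiency as reported for Grid jobs: CPU seconds
    actually consumed divided by elapsed wall-clock seconds."""
    return 100.0 * cpu_time_s / wall_time_s

# An I/O-bound analysis job: 2.5 h of CPU used over an 8 h wall-clock run
print(f"{cpu_wall_efficiency(2.5 * 3600, 8 * 3600):.0f}%")  # 31%, close to the ~30% average
```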

5  User analysis – increasing…
- 3 weeks ago: 8.5K jobs, 7.6 GB/s read from the SEs, 69% CPU/Wall
- 2 weeks ago: 9.9K jobs (+16%), 8.4 GB/s (+11%), 68% CPU/Wall (-2%)
- Last week: 13.2K jobs (+55%), 12 GB/s (+58%), 66% CPU/Wall (-4%)
- The efficiency remains almost constant (trains)
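The quoted week-to-week increases follow directly from the raw numbers on the slide; a quick check of the arithmetic (illustrative script, values copied from above):

```python
# Week-over-week growth of user analysis, using the slide's numbers
jobs = {"3 weeks ago": 8.5e3, "2 weeks ago": 9.9e3, "last week": 13.2e3}
rate = {"3 weeks ago": 7.6, "2 weeks ago": 8.4, "last week": 12.0}   # GB/s read from the SEs

def growth(series, ref="3 weeks ago"):
    """Percentage change of each entry relative to the reference week."""
    return {k: f"{100 * (v / series[ref] - 1):+.0f}%" for k, v in series.items() if k != ref}

print(growth(jobs))   # {'2 weeks ago': '+16%', 'last week': '+55%'}
print(growth(rate))   # {'2 weeks ago': '+11%', 'last week': '+58%'}
```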

6  The LEGO Analysis Trains
- Pulling together many user analysis tasks (wagons) into a single set of Grid jobs (the train)
- Managed through a web interface by a Physics Working Group conductor (ALICE has 8 PWGs)
- Provides a configuration and test platform (functionality, memory, efficiency) and a submission/monitoring interface
- Speed: a few days to go through a complete period
[Diagram: AliROOT Analysis Framework – MonALISA web interface (LPM) – AliEn Grid jobs]
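The gain of the train model is that N wagons share a single pass over the input data instead of reading it N times in independent jobs. A toy sketch of that idea, with hypothetical Wagon/run_train names standing in for the real AliROOT/LPM machinery:

```python
# Hypothetical sketch: many "wagons" (analysis tasks) processed in one pass over shared input.
class Wagon:
    def __init__(self, name, analyse):
        self.name, self.analyse, self.results = name, analyse, []

def run_train(wagons, events):
    for event in events:           # single read of the (expensive) Grid input...
        for w in wagons:           # ...fanned out to every attached wagon
            w.results.append(w.analyse(event))
    return {w.name: w.results for w in wagons}

events = range(5)                  # stand-in for ESD/AOD events read from an SE
train = [Wagon("multiplicity", lambda e: e), Wagon("pt_spectrum", lambda e: e * e)]
print(run_train(train, events))
```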

7  Status
- Trains created for 6 PWGs
  - 4 PWGs already actively submitting trains
  - 2 in the testing phase
- Gains
  - Up to 30 wagons (individual tasks) for the most advanced trains
  - Even for non-optimized trains the average efficiency is >50%

8  Future of trains
- We see this as the only viable method for running large-scale (many hundreds of TB of input data) analysis on a regular basis
- ~300 TB is what users have to go through today to analyse the 2010 and 2011 Pb-Pb data samples plus the associated MC
- Works equally well with smaller samples – the time gain over chaotic user analysis is significant
- Does not fully replace chaotic analysis, but there is no need to run over everything anymore…

9  Storage
- ALICE has EOS – and we are extremely satisfied with the performance
- It took about half an hour to configure and put into production – many thanks to IT/DSS
- A new xrootd version is being deployed at all SEs
- Following the T1-T2 workshop in January, sites call for a significant improvement of SE monitoring and control
  - xrootd development is ongoing
  - new MonALISA sensors for the servers
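Such MonALISA sensors periodically ship per-server metrics to the monitoring repository. A rough sketch of what a sensor loop could look like; the metric names and the report() helper are placeholders, not the actual ApMon/MonALISA client API:

```python
import time

def read_xrootd_metrics():
    # Hypothetical collection step: in reality these values would come from the
    # xrootd server's monitoring/summary output.
    return {"active_connections": 42, "bytes_out_per_s": 7.6e9, "space_free_tb": 120.3}

def report(cluster, node, params):
    # Stand-in for the real MonALISA client call that ships values to the repository.
    print(cluster, node, params)

while True:
    report("ALICE::CERN::SE_xrootd", "server01.example.org", read_xrootd_metrics())
    time.sleep(60)   # sensor cadence assumed here for illustration
```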

10  VO-boxes
- Re-write of the proxy renewal service
  - v2 is much more robust than v1
  - Running at several sites for many weeks
  - To be deployed ASAP at the remaining sites
- Reduction of the number of ALICE services
  - The SE service was already eliminated (last year)
  - PackMan (legacy) is not needed with torrent-based distribution
- All of this is in AliEn v2.20
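A proxy renewal service of this kind essentially watches the remaining lifetime of the Grid proxy on the VO-box and renews it before expiry. A minimal sketch of that loop, assuming the standard voms-proxy-info/voms-proxy-init command-line tools; this is not the AliEn v2 implementation:

```python
import subprocess, time

MIN_LIFETIME = 6 * 3600   # renew when less than 6 h remain (threshold chosen for illustration)

def proxy_seconds_left():
    # voms-proxy-info -timeleft prints the remaining proxy lifetime in seconds
    out = subprocess.run(["voms-proxy-info", "-timeleft"], capture_output=True, text=True)
    return int(out.stdout.strip() or 0)

while True:
    if proxy_seconds_left() < MIN_LIFETIME:
        # Renew the proxy; the exact options depend on the site setup
        subprocess.run(["voms-proxy-init", "-voms", "alice", "-valid", "48:00"])
    time.sleep(600)
```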

11  AliEn v2.20 upgrade
- New version ready
- New catalogue structure (lighter): M PFNs, 2.5x LFNs
  - Growing at 10 million new entries per week
- Extreme job brokering
  - Jobs are no longer pre-split with a predetermined input data set
  - Potentially one job could process all the input data (of the set) available at a given site
  - The data locality principle remains
- 1 day of downtime to upgrade all sites, end of May or early June
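With the "extreme" brokering, the input set is no longer split into fixed per-job chunks; when a job agent at a site asks for work, it can be handed whatever unprocessed input its local SE holds. A toy model of that locality-driven matching (illustrative only, not the AliEn broker):

```python
# Toy model: no pre-splitting, a job agent at a site is simply given all
# still-unprocessed input files that its local SE holds (illustrative data).
replica_map = {                       # file -> sites holding a replica
    "raw_001.root": {"CERN", "FZK"},
    "raw_002.root": {"CERN"},
    "raw_003.root": {"CNAF", "FZK"},
}
unprocessed = set(replica_map)

def match_agent(site):
    """Return the locally readable, not-yet-processed files for an agent at `site`."""
    assigned = {f for f in unprocessed if site in replica_map[f]}
    unprocessed.difference_update(assigned)   # one job may take the whole local set
    return assigned

print(match_agent("FZK"))    # e.g. {'raw_001.root', 'raw_003.root'}
print(match_agent("CERN"))   # only what is left: {'raw_002.root'}
```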

12  SLC6 notes – from the web…
- Many applications will see a large increase in virtual memory usage (but not resident memory) when running on RHEL 6 vs. RHEL 5; for an example, see the corresponding "not a bug" bug report
- The cause is glibc 2.11, which includes a new "per-thread arena memory allocator"… especially relevant for applications using many threads
- Note that this is all virtual memory, not committed/resident memory; an application should not use appreciably more resident memory than it did under RHEL 5
- May have implications for sites' batch system limits
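The effect can be observed directly on a RHEL 6/SLC6 node: thread stacks and, with glibc 2.11, per-thread malloc arenas are reserved as virtual memory, so VmSize grows while VmRSS barely moves; the MALLOC_ARENA_MAX environment variable can be exported to cap the number of arenas. A small Linux-only probe (illustrative, not part of the ALICE software):

```python
import threading, time

def vm_sizes():
    """Read VmSize/VmRSS (kB) of the current process from /proc (Linux only)."""
    sizes = {}
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value = line.split()[:2]
                sizes[key.rstrip(":")] = int(value)
    return sizes

print("before threads:", vm_sizes())
threads = [threading.Thread(target=time.sleep, args=(5,)) for _ in range(32)]
for t in threads:
    t.start()
# Stack reservations (and, for malloc-heavy threads, per-thread arenas) show up
# as virtual memory: VmSize jumps while VmRSS stays almost unchanged.
print("with 32 threads:", vm_sizes())
for t in threads:
    t.join()
```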

13  Services summary
- NTR (as per the daily operations meetings): minor hiccups, very good support at all centres
- Disk is critical
  - Continuous cleanup of replicas (of older productions)
  - Looking into AOD and ESD sizes
  - Priority given to full storages and to the T1s
  - The ongoing installation of the 2012 pledges is reducing the urgency
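The cleanup policy above (older productions first, starting from the fullest storages and the T1s) amounts to a simple ordering of replica candidates; an illustrative toy, not the actual AliEn cleanup tooling:

```python
# Toy ordering of replicas for cleanup (illustrative fields and data):
# replicas of older productions go first, preferring the fullest SEs and T1 disks.
replicas = [
    {"lfn": "old_production/aod_1", "se": "SE_A", "se_fill": 0.95, "is_t1": True,  "old_prod": True},
    {"lfn": "current_pass/aod_2",   "se": "SE_B", "se_fill": 0.97, "is_t1": True,  "old_prod": False},
    {"lfn": "old_production/aod_3", "se": "SE_C", "se_fill": 0.99, "is_t1": False, "old_prod": True},
]

def cleanup_priority(r):
    # Higher tuple sorts first with reverse=True: old production, then fuller SE, then T1
    return (r["old_prod"], r["se_fill"], r["is_t1"])

for r in sorted(replicas, key=cleanup_priority, reverse=True):
    print(r["se"], r["lfn"])
```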

14  Summary
- The 2011 HI period data is processed and being analysed
- 2012 pp data taking is ongoing
- Many thanks to the T0/T1/T2 centres for the excellent performance and support
- ALICE is happy