ALICE Computing: 2012 operation & future plans


ALICE Computing: 2012 operation & future plans
Rencontre LCG-France, SUBATECH, Nantes, 18-20 September 2012

A quick glimpse of 2012
- Standard data taking year for ALICE
- p-p: emphasis on rare triggers, high pT (calorimeter)
- Pilot p-A run (a few million events); long p-A run in February 2013 (still counts as '2012')
- Bulk of the analysis on the 2011 Pb-Pb data, the largest single-period data sample

2012 – RAW
- Standard treatment: 2 copies of the RAW data
- One copy at T0, one replica spread over the T1s, proportional to each site's fraction of the mass storage capacity (a selection sketch follows below)
- ~1 PB of RAW collected so far
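A minimal sketch of what "proportional to the fraction of mass storage capacity" could mean in practice, assuming a simple weighted random choice over the T1 tape capacities. The site names and capacities are hypothetical, and this is not the actual AliEn placement code:

```cpp
// Sketch: pick a T1 for the second RAW copy with probability proportional
// to its share of the total tape capacity (hypothetical numbers).
#include <map>
#include <random>
#include <string>
#include <iostream>

std::string pickT1(const std::map<std::string, double>& tapeCapacityPB, std::mt19937& rng) {
    double total = 0.0;
    for (const auto& site : tapeCapacityPB) total += site.second;
    std::uniform_real_distribution<double> dist(0.0, total);
    double r = dist(rng);
    for (const auto& site : tapeCapacityPB) {
        if ((r -= site.second) <= 0.0) return site.first;   // weighted choice
    }
    return tapeCapacityPB.rbegin()->first;                  // numerical safety net
}

int main() {
    // Illustrative capacities only, not the real 2012 pledges
    std::map<std::string, double> t1 = {{"CCIN2P3", 1.5}, {"CNAF", 1.2}, {"FZK", 1.8}};
    std::mt19937 rng(42);
    std::cout << "Second copy goes to: " << pickT1(t1, rng) << "\n";
}
```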

2012 – job profile
- Average of 28K jobs running in parallel
- Increases as new capacities become available

2012 – site contribution
- Wall time: roughly a 50/50 split between T0/T1s and the T2s

2012 – French sites
- Wall time: roughly a 25/75 split between the T1 and the T2s

Other important parameters
- Storage (always insufficient…): 2 PB of disk, 45% of it at the T1; the balance is not as even as for CPU
- Network: extremely well provisioned; T2 connectivity will further improve with LHCONE

More details on the workload
- Organized activities, including the analysis trains
- Chaotic (individual user) analysis

Resource usage – tasks
- Last year's goal: increase the fraction of organized analysis; the tool is the analysis trains
- A long-term goal that takes a substantial amount of coordination and user education
- Resource usage distribution:
  - 10% RAW reconstruction (constant)
  - 16% train analysis (5% at the beginning of the year)
  - 23% chaotic analysis (36% at the beginning of 2012)
  - 51% Monte Carlo productions (49% at the beginning of 2012)

The Analysis Trains
- Pooling many user analysis tasks (wagons) into a single set of Grid jobs (the train)
- Managed through a web interface by a Physics Working Group conductor (ALICE has 8 PWGs)
- Provides a configuration and test platform (functionality, memory, efficiency) and a submission/monitoring interface
- Speed: a few days to go through a complete period (PBs of data!)
- Components: MonALISA, web interface, LPM, AliROOT analysis framework, AliEn Grid jobs
(A conceptual sketch of the wagon/train model follows below.)
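A conceptual sketch of the wagon/train idea, in plain C++ rather than the actual AliROOT/LPM interfaces: many wagons share a single pass over the data, so the I/O cost is paid once per event instead of once per user. The wagon names and the Event content are placeholders.

```cpp
// Sketch: a "train" bundles many user tasks ("wagons") so that each event is
// read once and handed to every wagon, instead of each user reading the data.
#include <functional>
#include <string>
#include <vector>
#include <iostream>

struct Event { /* reconstructed ESD/AOD content would live here */ };

struct Wagon {
    std::string name;                       // one user task, e.g. from a PWG
    std::function<void(const Event&)> run;  // per-event user code
};

class Train {
public:
    void addWagon(Wagon w) { wagons_.push_back(std::move(w)); }
    void process(const std::vector<Event>& events) {
        for (const auto& ev : events)       // single pass over the input
            for (const auto& w : wagons_)   // every wagon sees every event
                w.run(ev);
    }
private:
    std::vector<Wagon> wagons_;
};

int main() {
    Train train;
    train.addWagon({"PWG-HF wagon (hypothetical)", [](const Event&) { /* user analysis */ }});
    train.addWagon({"PWG-JE wagon (hypothetical)", [](const Event&) { /* user analysis */ }});
    train.process(std::vector<Event>(3));   // stand-in for one Grid subjob's input
    std::cout << "train done\n";
}
```

The point of the design is that adding a wagon only increases the (parallelizable) processing work, while the sequential reading and de-serialization cost is paid once for the whole train.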

Data access in analysis
- The chaotic and, to some extent, the organized analysis is I/O bound (an efficient use of the disk/network resources)
- Average read rate of 8 GB/s, with peaks of 20 GB/s
- Total data read since 1 April: 120 PB
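As a rough consistency check (assuming the 120 PB were read between 1 April and the date of this presentation, roughly 170 days):

120 PB / (170 days x 86,400 s/day) ≈ 1.2e8 GB / 1.47e7 s ≈ 8 GB/s

which matches the quoted average read rate.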

CPU efficiency
- Stable (but 'low'); some improvement with time as the share of trains grows relative to chaotic analysis

CPU efficiency for organized tasks
- MC: high; RAW reconstruction: roughly OK; trains: still need improvement

Analysis efficiency
Processing phases per event:
- Reading the event data from disk (t_read): sequential
- De-serializing the event object hierarchy (t_ds): sequential
- Processing the event (t_proc): parallelizable
- Cleaning the event structures (t_cl): sequential
- Writing the output (t_write): sequential, but parallelizable
- Merging the outputs (t_merge): sequential, but parallelizable
[Original slide: timeline figure showing t_read, t_ds, t_proc, t_cl and t_write repeated for events #0…#n, followed by t_merge]
(A. Gheata, improving analysis efficiency; a schematic event loop follows below.)
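A schematic per-event loop illustrating the phases listed above, using stand-in functions rather than the real AliROOT analysis framework; the comments map the calls onto the slide's timing labels:

```cpp
// Sketch of the per-event processing phases; only t_proc grows with added wagons.
struct Event {};
Event  readEvent(int)              { return Event{}; }  // t_read  : sequential (disk I/O)
Event  deserialize(const Event& e) { return e; }        // t_ds    : sequential
void   process(const Event&)       {}                   // t_proc  : parallelizable (the wagons)
void   cleanup(Event&)             {}                   // t_cl    : sequential
void   writeOutput()               {}                   // t_write : sequential but parallelizable
void   mergeOutputs()              {}                   // t_merge : sequential but parallelizable

int main() {
    const int nEvents = 100;                 // stand-in for one input chunk
    for (int i = 0; i < nEvents; ++i) {
        Event raw = readEvent(i);
        Event ev  = deserialize(raw);
        process(ev);                         // the only phase that does useful physics work
        cleanup(ev);
    }
    writeOutput();
    mergeOutputs();
}
```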

Analysis efficiency (2)
The efficiency of the analysis job:
- job_eff = (t_ds + t_proc + t_cl) / t_total
- analysis_eff = t_proc / t_total
The time per event for the different phases depends on many factors:
- t_read ~ IOPS * event_size / read_throughput: to be minimized (minimize the event size, keep the read throughput under control)
- t_ds + t_cl ~ event_size * n_branches: to be minimized (minimize the event size and complexity)
- t_proc = sum over wagons of T_i: to be maximized (maximize the number of wagons and the useful processing)
- t_write = output_size / write_throughput: to be minimized
(A. Gheata, improving analysis efficiency; a numerical example follows below.)
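A small numerical example of the two efficiency definitions above, with hypothetical per-job phase times (the numbers are illustrative, not ALICE monitoring data):

```cpp
// Sketch: compute job_eff and analysis_eff from assumed per-phase times.
#include <iostream>

int main() {
    // Hypothetical per-job totals in seconds, for illustration only
    double t_read  = 1200.0;   // reading events from storage
    double t_ds    = 400.0;    // de-serializing the object hierarchy
    double t_proc  = 2500.0;   // useful processing by all wagons
    double t_cl    = 100.0;    // cleaning the event structures
    double t_write = 300.0;    // writing and merging the outputs

    double t_total      = t_read + t_ds + t_proc + t_cl + t_write;   // 4500 s
    double job_eff      = (t_ds + t_proc + t_cl) / t_total;          // ~0.67: all but I/O waits
    double analysis_eff = t_proc / t_total;                          // ~0.56: useful work only

    std::cout << "job_eff      = " << job_eff      << "\n"
              << "analysis_eff = " << analysis_eff << "\n";
}
```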

Grid upgrades
- New AliEn version (v2-20), ready for deployment
- Lighter catalogue structure: presently at ~500M LFNs and 2.5x as many PFNs (replicas), growing by 10 million new entries per week
- Extreme job brokering: the jobs are no longer pre-split with a pre-determined input data set; potentially one job could process all the input data (of the set) available at a given site; the data locality principle remains (for now)
- The site/central services upgrade needs some downtime, scheduled after the end of data taking in February 2013

File brokering (from P. Saiz, AliEn development)
- Current schema: the input data set is split in advance and a fixed number of subjobs is submitted, each with its pre-assigned files; in the slide's example, 4 subjobs are submitted for 5 files hosted at sites A, B and C
- Broker per file: submit a smaller number of empty subjobs (3 in the example); when a subjob starts at a site, it analyzes as many of the still unprocessed files available there as possible; if nothing is left, it just exits
(A conceptual sketch of the per-file brokering follows below.)
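A conceptual sketch of the per-file brokering idea, not the actual AliEn code: subjobs start "empty" and pull unprocessed files hosted at their site from a central queue, analyzing as much as possible; a subjob with nothing left just exits. Site names and file placement are illustrative.

```cpp
// Sketch: empty subjobs pull remaining local files from a central queue.
#include <deque>
#include <map>
#include <string>
#include <iostream>

// Central task queue: remaining files, keyed by the site hosting a replica.
std::map<std::string, std::deque<std::string>> remaining = {
    {"SiteA", {"File1", "File2", "File4"}},   // hypothetical placement
    {"SiteB", {"File3", "File5"}},
};

void runSubjob(const std::string& site) {
    auto& files = remaining[site];
    if (files.empty()) {                       // nothing local to do: just exit
        std::cout << "subjob@" << site << ": nothing left, exiting\n";
        return;
    }
    while (!files.empty()) {                   // analyze as much as possible
        std::cout << "subjob@" << site << ": processing " << files.front() << "\n";
        files.pop_front();
    }
}

int main() {
    // Three "empty" subjobs are submitted; input is assigned only when each starts.
    runSubjob("SiteA");
    runSubjob("SiteB");
    runSubjob("SiteC");                        // lands where no files remain
}
```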

Short development roadmap
- Data management: popularity service; SE layout (EOS-like); GUID-less catalogue
- Job processing: job merging; error classification; multicore and multi-agent; remote access optimization; combining AF/classical Grid CE (interactive Grid)

General remarks on the future
- 2013-2014: Long Shutdown 1; no revolution is (ever) planned, however…
- All LHC experiments have submitted LoIs for the LS3 (HL-LHC) upgrades in 2022
- The computing for these upgrades is massively larger than today (data rates and volumes, CPU needs): 10-30x of today, with the factors not yet finalized
- Massive online DAQ and HLT event-filtering farms, 2x the size of what a T1 is today
- No clear idea yet how this will be achieved, technologically and financially; Moore's and Kryder's laws will not 'cover' the needs

General remarks on the future (2)
- The present Grid profited from ~10 years of planning and development (on par with the detectors), and it delivered from day 1 and continues to do so
- The future planning and development of Grid/Cloud/<insert name here> should start now; years of experience will help, but will not be enough
- Parallel programming cannot be done by physicists… and there are other hurdles too

General remarks on the future (3)
- A big improvement is expected from the frameworks and the code; undoubtedly a common effort and professional help will be necessary
- Parallelism is a no-brainer, given the technological trends; big parts of the code must be re-engineered and re-written
- Every experiment has a panel charged with designing the 'new' software; crystal balls have been ordered

Summary – back to today
- 2012 is so far a standard data taking/processing/analysis year for ALICE; much excitement is expected in February with the p-A data
- The operation is smooth and is helped a lot by the mature Grid around the world
- The French T1/T2s are part of this structure, with remarkably stable performance and well-balanced components (CPU, storage, networks)… and of course solid expert support at all levels – a big thank you for this!

Summary – cont.
- The (near-)future developments are focused on analysis tasks and tools, with emphasis on data containers and process synchronization
- Whole-node scheduling is a promising path and will naturally help the multicore development
- Progressive introduction of new features and improvements
- The Grid must run continuously, also during the LS1 shutdown
- Resources (disk) are scarce: more efficient use means fewer replicas and more WAN access