An Update about ALICE Computing
Federico Carminati, Peter Hristov
NEC’2011, Varna, September 12-19, 2011

NEC’2001: AliRoot for simulation
NEC’2003: Reconstruction with AliRoot
NEC’2005: AliRoot for analysis
NEC’2007: Still no LHC data => status and plans of the ALICE offline software, calibration & alignment
NEC’2009: In preparation for the first LHC data, no presentation
NEC’2011: Almost two years of stable data taking and a lot of published physics results => an update about the ALICE computing

Why HI collisions?
Study QCD at its natural energy scale T = Λ_QCD = 200 MeV by creating a state of matter at high density and temperature using highly energetic heavy-ion collisions.
Indication (STAR, NA49) of a transition from hadron gas to QGP at T_c ≈ 170 MeV, ε_c ≈ 1 GeV/fm³. Phase transition or crossover? An intermediate phase of strongly interacting QGP? Chiral symmetry restoration (constituent mass → current mass)?

History of high-energy A+B beams
BNL-AGS (mid 80’s, early 90’s): O+A, Si+A at 15 A GeV/c, √s_NN ~ 6 GeV; Au+Au at 11 A GeV/c, √s_NN ~ 5 GeV
CERN-SPS (mid 80’s, 90’s): O+A, S+A at 200 A GeV/c, √s_NN ~ 20 GeV; Pb+A at 160 A GeV/c, √s_NN ~ 17 GeV
BNL-RHIC (early 00’s): Au+Au at √s_NN ~ 130 GeV; p+p, d+Au at √s_NN ~ 200 GeV
LHC (2010!): Pb+Pb at √s_NN ~ 5,500 GeV (2,760 in ’10-’12); p+p at √s_NN ~ 14,000 GeV (7,000 in ’10-’12)

Trigger/DAQ chain: level 0 (special hardware) at 8 kHz (160 GB/s); level 1 (embedded processors); level 2 (PCs) at 200 Hz (4 GB/s); 30 Hz (2.5 GB/s) and 30 Hz (1.25 GB/s) to data recording & offline processing.
Detector: total weight 10,000 t, overall diameter 16.00 m, overall length 25 m, magnetic field 0.5 T.
ALICE Collaboration: ~1/2 of ATLAS or CMS, ~2x LHCb; ~1000 people, 30 countries, ~80 institutes. A full pp programme; data rate for pp is

Organization
Core Offline is a CERN responsibility: framework development, coordination activities, documentation, integration, testing & release, resource planning.
Each sub-detector is responsible for its own offline system: it must comply with the general ALICE Computing Policy as defined by the Computing Board, and it must integrate into the AliRoot framework.

PLANNING
“In preparing for battle I always found plans useless, but planning essential” – Gen. D. Eisenhower
(155 open items, 3266 total)

RESOURCES
A sore point for ALICE computing

Computing model – pp
Generation of calibration parameters; RAW data and calibration go into the T0 disk buffer at CERN; first-pass reconstruction at the T0; copy to T0 tape and registration in the AliEn file catalogue (FC); CAF analysis; replication to the T1s for pass 1 & 2 reconstruction and ordered analysis; MC data production and end-user analysis at the T2s.

Computing model – AA
Generation of calibration parameters; RAW data and calibration go into the T0 disk buffer; pilot reconstruction at the CERN T0 during HI data taking; copy to T0 tape and registration in the AliEn FC; CAF analysis; first-pass reconstruction from tape during the LHC shutdown; replication to the T1s for pass 1 & 2 reconstruction and ordered analysis; MC data production and end-user analysis at the T2s.

Prompt reconstruction
Based on PROOF (TSelector); very useful for high-level QA and debugging; integrated in the AliEVE event display; full Offline code, sampling events directly from the DAQ memory.
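As a rough illustration of the mechanism, the sketch below shows the shape of a ROOT TSelector that PROOF runs on the sampled events; the class and histogram names are invented for the example and are not the actual ALICE prompt-reconstruction code.

```cpp
// PromptQA.C -- minimal TSelector sketch (names are illustrative only).
// PROOF instantiates the selector on every worker and merges fOutput at the end.
#include <TSelector.h>
#include <TTree.h>
#include <TH1F.h>

class PromptQA : public TSelector {
public:
   TTree *fChain;   // events sampled from the DAQ memory
   TH1F  *fHMult;   // example QA histogram

   PromptQA() : fChain(0), fHMult(0) {}
   virtual Int_t  Version() const { return 2; }
   virtual void   Init(TTree *tree) { fChain = tree; }
   virtual void   SlaveBegin(TTree *) {
      fHMult = new TH1F("hMult", "track multiplicity", 100, 0., 1000.);
      fOutput->Add(fHMult);            // PROOF merges this list from all workers
   }
   virtual Bool_t Process(Long64_t entry) {
      fChain->GetEntry(entry);         // read one sampled event
      // ... run the QA code and fill histograms here ...
      return kTRUE;
   }
   virtual void   Terminate() { /* draw or save the merged histograms */ }

   ClassDef(PromptQA, 0);
};
```

Such a selector would be run with TChain::Process("PromptQA.C+") locally or handed to a PROOF session, which is what makes the same QA code usable both interactively and on the prompt-reconstruction cluster.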

Visualization
Event-display view of a reconstructed V0 candidate.

ALICE Analysis – basic concepts
Analysis models: prompt data processing (calibration, alignment, reconstruction) with PROOF; batch analysis using the Grid infrastructure; local analysis; interactive analysis with PROOF+GRID.
User interface: access to the Grid via the AliEn or ROOT UIs; PROOF/ROOT as the enabling technology for the (C)AF; Grid API class TAliEn.
Analysis Object Data (AOD) contain only the data needed for a particular analysis and are extensible with ∆-AODs. The same user code runs locally, on the CAF and on the Grid (example physics case: Ξ⁻ → Λ π⁻, Λ → p π⁻). The work on the distributed infrastructure has been done by the ARDA project.

Analysis train
AOD production is organized in a ‘train’ of tasks (ESD/Kine as input; TASK 1, TASK 2, TASK 3, TASK 4 plus efficiency corrections; AOD as output): this maximizes the efficiency of full-dataset processing and optimizes CPU/IO, using the analysis framework. It needs monitoring of the memory consumption of the individual tasks. A schematic example of such a train follows below.
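A minimal sketch of how such a train could be assembled with the AliRoot analysis framework; AliAnalysisManager, AliAnalysisTaskSE, AliESDInputHandler and AliAODHandler are framework classes, while the wagon task (the AliAnalysisTaskPt tutorial task) and the container/file names are placeholders, and exact signatures may differ between AliRoot versions.

```cpp
// runTrain.C -- illustrative ROOT macro for a small analysis train (not production code).
// Assumes the AliRoot analysis libraries are already loaded in the session.
void runTrain()
{
   // the manager steers the train: one input stream, several "wagons", common output
   AliAnalysisManager *mgr = new AliAnalysisManager("AODtrain");
   mgr->SetInputEventHandler(new AliESDInputHandler());   // read ESDs (+ kinematics)
   mgr->SetOutputEventHandler(new AliAODHandler());       // write the resulting AOD

   // first wagon: a task derived from AliAnalysisTaskSE (placeholder tutorial task)
   AliAnalysisTaskSE *task1 = new AliAnalysisTaskPt("task1");
   mgr->AddTask(task1);
   mgr->ConnectInput(task1, 0, mgr->GetCommonInputContainer());
   AliAnalysisDataContainer *out1 = mgr->CreateContainer(
      "hists1", TList::Class(), AliAnalysisManager::kOutputContainer, "train.root");
   mgr->ConnectOutput(task1, 1, out1);
   // ... AddTask/ConnectInput/ConnectOutput for TASK 2, TASK 3, TASK 4 ...

   // a single pass over the data serves all wagons: this is what optimizes CPU/IO
   TChain *chain = new TChain("esdTree");
   chain->Add("AliESDs.root");
   if (mgr->InitAnalysis()) mgr->StartAnalysis("local", chain);
}
```

The same macro is steered to the CAF or to the Grid by changing the StartAnalysis mode ("proof", "grid") and attaching the corresponding plugin, which is what keeps the user code identical in all three environments.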

Analysis on the Grid

Production of RAW
Successful despite rapidly changing conditions in the code and in the detector operation: 74 major cycles; the RAW events passed through the reconstruction; 3.6 PB of data processed; 0.37 TB of ESDs and other data produced.

ALICE central services – sending jobs to data
The user submits a job; the ALICE job catalogue holds the job together with its input files (e.g. Job 1: lfn1, lfn2, lfn3, lfn4; Job 2: lfn1, lfn2, lfn3, lfn4; Job 3: lfn1, lfn2, lfn3). The optimizer splits each job into sub-jobs according to where the input files are stored (Job 1.1: lfn1; Job 1.2: lfn2; Job 1.3: lfn3, lfn4; …). Computing agents at the sites fetch matching sub-jobs, run them close to the data and send back the results, whose outputs are registered in the ALICE file catalogue (lfn → guid → {SEs}).
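The splitting step can be pictured with the toy sketch below; it is not AliEn code, just an illustration of grouping a job's input LFNs by the storage element holding a replica and emitting one sub-job per SE (all names invented).

```cpp
// split_by_se.cxx -- toy illustration of job splitting by data locality
// (not the actual AliEn optimizer; LFNs and SE names are invented).
#include <map>
#include <string>
#include <vector>
#include <iostream>

int main() {
   // input LFNs of one job and the storage element holding a replica of each
   std::map<std::string, std::string> replica = {
      {"lfn1", "CERN::SE"}, {"lfn2", "CNAF::SE"},
      {"lfn3", "CCIN2P3::SE"}, {"lfn4", "CCIN2P3::SE"}};

   // one sub-job per storage element, each taking the LFNs located there
   std::map<std::string, std::vector<std::string>> subjobs;
   for (const auto &r : replica) subjobs[r.second].push_back(r.first);

   int n = 1;
   for (const auto &s : subjobs) {
      std::cout << "Job 1." << n++ << " @ " << s.first << " :";
      for (const auto &lfn : s.second) std::cout << " " << lfn;
      std::cout << "\n";
   }
   return 0;
}
```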

Storage strategy
All storage is reached from the worker nodes through xrootd: an xrootd manager (VOBOX::SA) redirects the client to xrootd servers in front of plain disk, and to xrootd servers or emulations in front of the mass-storage back-ends (CASTOR, DPM, dCache), which remain accessible through SRM.

The access to the data
Direct access to the data via the TAliEn/TGrid interface. The application queries the ALICE file catalogue with a GUID, an LFN or metadata (the tag catalogue maps event# and GUID to tags); the catalogue resolves lfn → guid → (acl, size, md5), builds the PFN, determines which SE has the file and returns SE, PFN and an access envelope; the data are then read through xrootd.
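From ROOT this pattern looks roughly like the sketch below; the calls (TGrid::Connect, TFile::Open with an alien:// URL) are standard ROOT/AliEn-client usage, while the catalogue paths are made-up examples.

```cpp
// access_alien.C -- sketch of catalogue-based data access from a ROOT session.
// Requires the AliEn API client; the LFN paths below are fictitious examples.
{
   TGrid::Connect("alien://");            // authenticate against the AliEn catalogue
   // pattern/metadata query in the catalogue (directory and pattern are examples)
   TGridResult *res = gGrid->Query("/alice/data/2011", "*AliESDs.root");
   if (res) printf("%d files found\n", res->GetEntries());
   // opening a logical file: the catalogue resolves lfn -> guid -> SE/pfn and the
   // payload is streamed via xrootd from the selected storage element
   TFile *f = TFile::Open("alien:///alice/data/2011/run000123456/AliESDs.root");
   if (f && !f->IsZombie()) f->ls();
}
```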

The ALICE way with XROOTD
Pure xrootd + the ALICE strong-authorization plugin; no difference between T1 and T2 (only size and QoS); WAN-wide globalized deployment, very efficient direct data access. Tier-0: CASTOR+xrootd serving data normally. Tier-1: pure xrootd cluster serving conditions data to ALL the Grid jobs via WAN. “Old” DPM+xrootd at some Tier-2s.
A virtual mass-storage system built on data globalization: each xrootd site (e.g. GSI, CERN) runs an xrootd/cmsd cluster subscribed to the ALICE global redirector. Local clients work normally at each site; when a file is missing, the site asks the global redirector, gets redirected to the collaborating cluster that has it, and fetches it immediately. A smart client could also point directly at the global redirector. More details and complete information in “Scalla/Xrootd WAN globalization tools: where we are” (CHEP09).
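For a client the globalization is invisible: any file in the virtual mass-storage system can be opened through the global redirector with a plain xrootd URL, as in the sketch below (the redirector hostname and file path are placeholders, not the actual ALICE endpoints).

```cpp
// global_redirector.C -- sketch of direct xrootd access through a global redirector.
// Hostname and path are placeholders; the real ALICE endpoints differ.
{
   // the redirector (xrootd+cmsd) locates a collaborating cluster holding the file
   // and transparently redirects the client there
   TFile *f = TFile::Open(
      "root://alice-global-redirector.example.org//alice/conditions/example.root");
   if (f && !f->IsZombie()) f->ls();
}
```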

CAF
The whole CAF becomes an xrootd cluster: worker nodes running PROOF and xrootd under a PROOF master, with data staged in from CASTOR via xrootd and the file catalogue. A powerful and fast machinery, very popular with users. It allows for any use pattern, which however quite often leads to contention for resources (measured: expected vs. observed speedup at 70% utilization).
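A user session on such a facility is a plain PROOF session; the sketch below reuses the selector sketched earlier, with an indicative master URL and an invented dataset name.

```cpp
// caf_session.C -- sketch of a user PROOF session on an analysis facility.
// Master URL and dataset name are illustrative, not the actual CAF configuration.
{
   TProof::Open("alicecaf.cern.ch");               // connect to the PROOF master
   // process a staged dataset on all workers with a user selector (see PromptQA.C above)
   gProof->Process("/ALICE/user/exampleDataset", "PromptQA.C+");
}
```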

Analysis facilities – profile
1.8 PB of data went through the CAF and 550 TB through the SKAF. For comparison, on the Grid we have written 15 PB and read 37 PB.

The ALICE Grid
AliEn: a working prototype in 2002; a single interface to distributed computing for all ALICE physicists; file catalogue, job submission and control, software management, user analysis.
~80 participating sites now: 1 T0 (CERN, Switzerland); 6 T1s (France, Germany, Italy, The Netherlands, Nordic DataGrid Facility, UK), with KISTI and UNAM coming (!); ~73 T2s spread over 4 continents; ~30,000 cores (out of ~150,000 in WLCG) and 8.5 PB of disk.
Resources are “pooled” together: no localization of roles/functions; national resources must integrate seamlessly into the global grid to be accounted for; funding agencies contribute proportionally to the number of PhDs (M&O-A share); T3s have the same role as T2s, even if they do not sign the MoU.

All is in MonALISA

Grid operation principle
The central AliEn services talk to a VO-box at each site, which interfaces to the local workload management (gLite/ARC/OSG/local batch) and storage management (dCache/DPM/CASTOR/xrootd) and takes care of monitoring and package management. The VO-box system (very controversial in the beginning) has been extensively tested, allows the site services to scale, and is a simple isolation layer for the VO in case of trouble.

Operation – central/site support
Central services support (2 FTEs equivalent): there are no experts who do exclusively support – the 6 highly-qualified experts do development and support. Site services support is handled by ‘regional experts’ (one per country) in collaboration with the local cluster administrators; it is an extremely important part of the system, amounting to ~0.2 FTE/site in normal operation, with regular weekly discussions and active all-activities mailing lists.

Summary
The ALICE offline framework (AliRoot) is a mature project covering simulation, reconstruction, calibration, alignment, visualization and analysis. It has operated successfully with real data since 2009, and the results for several major physics conferences were obtained in time. The Grid and AF resources are adequate to serve the RAW/MC and user-analysis tasks; more resources would of course be better. The site operation is very stable, and the gLite (now EMI) software is mature, with few changes necessary.

Some Philosophy

The code
The move to C++ was probably inevitable, but it caused a lot of “collateral damage”: the learning process was long and is still going on, and it is very difficult to judge what would have happened “had ROOT not been there”. The most difficult question now is “what next”:
A new language? There is none on the horizon.
Different languages for different scopes (Python, Java, C, CUDA, …)? Just think about debugging.
A better discipline in using C++ (in ALICE: no STL/templates)?
Code management tools, build systems, (c)make, autotools: a lot of “glue” still has to be provided, with no comprehensive system “out of the box”.

The Grid
The half-empty glass: we are still far from the “Vision”; a lot of tinkering and hand-holding is needed to keep it alive; there are 4+1 solutions for each problem; we are only now seeing some light at the end of the tunnel of data management.
The half-full glass: we are using the Grid as a “distributed heterogeneous collection of high-end resources”, which was the idea after all, and LHC physics is being produced by the Grid.

Grid need-to-have
Far more automation and resilience, to make the Grid less manpower-intensive. More integration between workload management and data placement. Better control of upgrades (OS, middleware), or better transparent integration of different OS/middleware versions. Integration of the network as an active, provisionable resource. “Close” storage elements, file replication/caching vs. remote access. Better monitoring – or perhaps simply more coherent monitoring...