ALICE: le strategie per l'analisi (ALICE: strategies for analysis). Massimo Masera, Dipartimento di Fisica Sperimentale e INFN Sezione di Torino. Workshop CCR e INFN-GRID 2009.

Outline

The approach of the ALICE experiment to analysis tasks:
- Analysis framework
- Implications for the Tier-1 and Tier-2 computing centres

Issues:
- Lessons from the previous talk
- Stability of the services
- Human resources: management of the sites, support to the users WITHIN the ALICE national community

Interactive analysis:
- Analysis with PROOF
- PROOF facilities: use of virtualization within a Tier-2 centre (a prototype)

Analysis in ALICE

The proposed sub-title for this talk is: "A senior representative of the experiment describes the plan with which the Italian collaboration is preparing to use the T2 and T3 resources for the analysis of the data produced at the LHC..." (it goes on, but this is enough for now).

There are two points which deserve a preliminary comment:
- "Italian collaboration". The plan to analyze LHC data is unique for the whole experiment: within the ALICE computing rules there is no room for approaches which are not integrated with the ALICE offline framework. Nevertheless, the Italian community has to organize the human resources to manage the computing centres and to support the users. This is critical, but it is not peculiar to the analysis part only.
- Interactive analysis. We are planning to deploy an analysis facility operated with PROOF.

Analysis in ALICE

There are two points which deserve a preliminary comment (cont'd):
- How do we plan to use Tier-3 centres?
  - Within INFN-Grid, Tier-3 has so far been a forbidden word (often a forbidden dream): too easy a way to keep normal users away from GRID solutions, and a way to make everybody happy with a local computing farm to play with.
  - In the ALICE computing model there is no specific role for Tier-3 centres: the model only considers "community oriented" contributions, open to the whole collaboration.
  - Physics results can be published only if obtained on the GRID, with input/output available on the GRID.
  - Local computing resources are hence considered "private" and intended only for the development phase of the code and of the analysis tasks, i.e. for very small communities or single physicists. Sometimes a desktop does the job; small farms proved to be very effective for these purposes.

Analysis in ALICE

There are two points which deserve a preliminary comment (cont'd):
- How do we plan to use Tier-3 centres?
  - PROOF clusters do not fit in the MONARC naming scheme and are not operated as LCG centres.
  - However, the only PROOF cluster mentioned in our computing model, the CERN Analysis Facility, proved to be very effective and is quite popular in the ALICE community.
  - PROOF analysis facilities have been deployed in Germany (GSI) and are in the final stage of deployment in France (Lyon). They are open (at least nominally) to any member of the collaboration.
  - We are planning to build PROOF clusters in Italy as well, intended as virtual facilities largely based on existing hardware in the Tier-2 centres.
  - PROOF clusters are somehow the ALICE reinterpretation of the Tier-3 concept: they need to be integrated in the production framework and open to potentially every ALICE user.

Analysis framework

The analysis framework has been introduced by A. Dainese in the previous talk. The framework is common to both scheduled and chaotic analysis.
- The analysis code is organized in classes which inherit from AliAnalysisTask.
- The analysis is steered by an AliAnalysisManager object.
- The goal is to be able to develop the analysis with a single macro and run it:
  - locally (1st phase)
  - possibly on a PROOF cluster (test)
  - on the GRID (production)
- It works with AODs as well.
A minimal steering macro is sketched below.
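The single-macro workflow above can be illustrated with a short ROOT/AliRoot sketch. This is not code from the talk: the task class MyTask, the file names and the container names are hypothetical placeholders, and running on the GRID would additionally require configuring the AliEn plugin (AliAnalysisAlien), which is omitted here.

// runAnalysis.C -- illustrative sketch of an AliAnalysisManager-driven macro.
// MyTask, file names and container names are hypothetical placeholders.
void runAnalysis(const char *mode = "local")   // "local", "proof" or "grid"
{
  // Create the manager and attach an ESD input handler
  AliAnalysisManager *mgr = new AliAnalysisManager("MyManager");
  mgr->SetInputEventHandler(new AliESDInputHandler());

  // User task, derived from AliAnalysisTaskSE (hypothetical class)
  MyTask *task = new MyTask("myTask");
  mgr->AddTask(task);

  // Wire the input and output containers
  mgr->ConnectInput(task, 0, mgr->GetCommonInputContainer());
  mgr->ConnectOutput(task, 1,
      mgr->CreateContainer("hists", TList::Class(),
                           AliAnalysisManager::kOutputContainer,
                           "myTask.root"));

  // Local chain of ESD files, used for the local and PROOF modes
  TChain *chain = new TChain("esdTree");
  chain->Add("AliESDs.root");

  if (mgr->InitAnalysis()) {
    mgr->PrintStatus();
    mgr->StartAnalysis(mode, chain);   // the same macro serves all three modes
  }
}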

Scheduled analysis

The first analysis step has ESDs (Event Summary Data) as input and is a scheduled analysis procedure.
- Analyses carried out on ESDs are done where the ESDs are stored: mainly at Tier-1 sites for RAW ESDs and at Tier-2 sites for MC ESDs.
- The output can range from simple histograms to AODs (Analysis Object Datasets), which in turn can be used as input for subsequent analysis passes.
- AODs are replicated to at least 3 Tier-2 centres (more depending on popularity) to optimize data access and availability.

Scheduled analysis

Analysis tasks to be included in scheduled analyses must satisfy basically two validation criteria:
- Physics-wise: validation is done within the Physics Working Groups.
- Framework-wise: the code must run seamlessly on the GRID with a released version of AliRoot.

Different tasks are executed together, grouped in so-called analysis trains. Scheduled analysis passes are steered by the core offline team on the sites contributing to ALICE, with priorities defined by the Physics Board.

Chaotic analysis

- Steered by small groups of physicists (in the limit, by single physicists).
- Done on Tier-2 sites, with AODs as input.
- Same framework as for scheduled analysis: essentially shorter trains sent by individual users.
- Tier-2 resources will be devoted (actually already are) with priority to user analysis.
- The management of user priorities is not an issue for the sites (single task queue for all the jobs): priorities are managed at the central services level.
- Results of the analysis can be accessed directly from a ROOT session running on an ALICE user's laptop:
  - only the AliEn client is needed
  - done with the TGrid class
  - files accessed via xrootd
A minimal sketch of such a session is shown after this slide.
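As an illustration of the last point, the snippet below opens an output file registered in the AliEn catalogue from a plain ROOT session. The file path and the object names are hypothetical placeholders, not details given in the talk.

// Illustrative sketch: reading an analysis output registered in AliEn
// from a ROOT session on a laptop. Path and object names are placeholders.
{
  // Authenticate against the AliEn catalogue (requires the AliEn client)
  TGrid::Connect("alien://");

  // The file is located through the catalogue and served via xrootd
  TFile *f = TFile::Open("alien:///alice/cern.ch/user/j/jdoe/analysis/myTask.root");
  if (f && !f->IsZombie()) {
    f->ls();                                   // inspect the stored objects
    TList *hists = (TList *) f->Get("hists");  // retrieve the output list
    if (hists) hists->Print();
  }
}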

Chaotic analysis

The analysis framework provides tools to develop an analysis macro on local resources and then use it on a PROOF cluster or on the GRID.
- Local CPU resources do not mean local data: the ALICE catalogue and the data can be accessed from virtually any PC.
- All the AODs corresponding to a given dataset are replicated on several Tier-2 sites. There are no different roles for different Tier-2s, so data localization is in principle not relevant.
  - In some cases we asked to replicate on the Italian T2s data of particular interest for our community, mostly to have the opportunity to test the performance of our sites.
  - In any case, Physics Working Groups are naturally spread across countries: we need to support ALICE users from any country participating in the experiment.

Issues for INFN Tier-x sites

Lessons from the previous talk:
- The analysis framework is being tested by physicists in view of its use with real data.
- Results are positive: no urgent need (and time is running out) for new features.
  - Problems like file collocation issues will be addressed at the level of the central catalogue; with recent xrootd, the global redirector solves this problem.
- Stability and reliability of the sites and of the local ALICE services are essential to run an analysis in an effective way.

Issues for INFN Tier-x sites

Lessons from the previous talk:
- Human resources, also within the collaboration, are the main asset for successful usage of the resources.
  - As a national community we are aiming to form a group of collaboration members able to monitor and, to some extent, manage Tier-2 activities. This group should also contribute to user support.
- We had a first "hands-on" meeting last January in Torino:
  - we decided to exploit the existing INFN-GRID ticketing system;
  - an ALICE support unit has been created;
  - a couple of people per site have been registered as supporters.

Interactive analysis

There are several tasks which would greatly benefit from an interactive approach. For instance:
- validation of the analysis algorithms on a sizeable but limited data set;
- tuning of multidimensional data selection cuts;
- in general, analysis items for which a prompt response is critical, typically in view of a major analysis pass.

The CERN Analysis Facility is highly appreciated by ALICE physicists, but it can accommodate only a limited number of users, and in Italy we cannot afford to build a static facility for interactive use. Using the Xen hypervisor it is possible to build a fully virtual Analysis Facility on the same Worker Nodes that compose an existing LCG Grid farm; this maximizes hardware usage. A small prototype is being tested in Torino.

[Diagram: a Xen dom0 hosting an LCG Worker Node and a PROOF slave as virtual machines on the same physical node.]

Preliminary results were presented at ACAT08 and at CHEP09.

PROOF schema

[Diagram: a ROOT client on a local PC submits ana.C to the PROOF master of a remote cluster; the master distributes the processing of the data to the slave nodes (node1-node4), and the partial results are merged and returned to the client as stdout/result.]
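The client side of this schema can be sketched in a few lines of ROOT; the master hostname, the data path and the selector name below are hypothetical placeholders.

// Illustrative sketch of a PROOF session from a local ROOT client.
// Hostname, data path and selector name are placeholders.
{
  // Connect to the PROOF master of the remote cluster
  TProof *proof = TProof::Open("proof-master.example.org");
  if (!proof) return;

  // Build a chain of ESD files (hypothetical xrootd path)
  TChain *chain = new TChain("esdTree");
  chain->Add("root://xrootd.example.org//alice/data/AliESDs.root");

  chain->SetProof();          // route the processing through PROOF
  chain->Process("ana.C+");   // the selector runs in parallel on the slaves;
                              // merged results come back to the client
}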

“LCG” Configuration

[Diagram: the LCG CE with its Worker Nodes and the PROOF master; WN and PROOF slave run as separate virtual machines on each physical node.]

- Start with an LCG CE farm.
- Virtualize the Worker Nodes and add a second virtual machine on each of them, running a parallel interactive platform like PROOF.
- Both environments are sandboxed and independent.
- Xen can dynamically allocate resources (both CPU priority and memory) to either machine; no reboot or restart is needed.
- Normal operation: the PROOF slaves are “dormant” (minimal memory allocation, very low CPU priority).

“PROOF” Configuration

[Diagram: the LCG CE with its Worker Nodes, the PROOF master and the now-active PROOF slaves sharing the same physical nodes.]

- When needed, resources can be moved from the virtual LCG WNs to the virtual PROOF slaves with minimal latency.
- A Grid batch job on the WN ideally never completely stops, it only slows down: non-CPU-intensive I/O operations can go on and do not time out.
- As the demand for interactive access increases, resources can be added either by shrinking the WNs further or by “waking up” more PROOF slaves.
- As soon as everybody goes home, resources can be moved back to the WNs, which resume batch processing at full speed.

The prototype

Hardware:
- 4x HP ProLiant DL360, dual quad-core, plus one head node for access, management and monitoring
- Separate physical 146 GB SAS disk for each virtual machine, for performance isolation
- Private network with NAT to the outside world (currently including storage)

Software:
- Linux CentOS 5.1 on dom0
- Xen 3.0
- gLite 3.1 Worker Node suite
- PROOF/Scalla
- Custom monitoring & management tools

Moving resources

[Plot: resources being moved from the WN to the PROOF slave, as seen by the WN.]
- About 1.5 minutes to complete the transition (but the slave can be started almost immediately).

[Plot: CPU efficiency (CPU time / wall clock time) for regular ALICE MC production/reconstruction jobs, in three different resource configurations.]
- Jobs become increasingly I/O-bound as swap activity increases.
- No abnormal job terminations observed.

Performance

[Plot: parallel interactive analysis on the PROOF nodes, with regular load on the WNs (a mix of memory-hungry ALICE production jobs, ALICE user jobs and some jobs from other VOs).]
- PROOF scaling is OK (it saturates only when the number of workers equals the number of CPUs).

[Plot: PROOF processing rate in several resource configurations.]
- Separate physical disks ensure minimal impact from swapping.
- The system behaves as expected: Grid jobs keep running, even though very slowly, and the resources are efficiently exploited by PROOF.

To-do list

- Develop an automatic system for resource allocation.
- Investigate the optimal resource allocation policy.
- Test direct access to the local SE:
  - local network optimization.
- Accounting:
  - interface with the standard accounting system.

Conclusions

- INFN sites will participate in the ALICE analysis effort, according to the computing model.
- The analysis framework is being used by real users: it has the functionality to deal with real LHC data.
- Issues:
  - The storage solution must be stable and efficient. See S. Bagnasco's talk for a discussion of the existing solutions suitable for ALICE (data accessed with xrootd).
  - Human resources: a growing involvement of ALICE members in production and user support activities is needed to reach an adequate service level.
- We are planning to dynamically devote a limited amount of resources (at the T2s) to PROOF.

Siamo attrezzati per l'analisi (we are equipped for analysis).