1 Analysis in LHCb
Angelo Carbone, INFN Bologna

2 Introduction
The analysis in LHCb is handled by GANGA, an ATLAS/LHCb project enabling a user to perform the complete life cycle of a job:
Build – Configure – Prepare – Monitor – Submit – Merge – Plot
It allows jobs to run:
on the local machine, either interactively or in the background
on batch systems (LSF, PBS, …)
on the Grid
Jobs look the same whether they run locally or on the Grid (see the sketch below).
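As a concrete taste, a minimal sketch of a Ganga session, run from the interactive Ganga prompt; the job name and echo command are purely illustrative:

j = Job()                      # default: Executable application, Local backend
j.name = 'hello-ganga'         # illustrative name
j.application = Executable(exe='/bin/echo', args=['Hello from Ganga'])
j.backend = Local()            # run in the background on this machine
j.submit()                     # build, configure, prepare and submit in one go
jobs                           # the job registry: status of all your jobs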

3 LHCb jobs
For LHCb the main use of Ganga is running Gaudi jobs. This means:
Configure the analysis application
Specify the datasets
Split and submit the jobs
Manage the output data
Merge n-tuples and histogram files

4 The Ganga job object

5 The Ganga job object
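These two slides show the Job object schematically; in the Ganga prompt the same structure can be inspected directly. A sketch, listing the standard Job attributes that the following slides walk through:

j = Job()
print j          # prints the job template with all its components
# j.application  - what to run (e.g. DaVinci)
# j.backend      - where to run (Interactive, Local, LSF, Dirac)
# j.inputdata    - which datasets to read
# j.outputdata   - which data files to keep on grid storage
# j.splitter     - how to split the job into subjobs
# j.merger       - how to combine the subjob outputs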

6 Application
There is a specific application handler for each Gaudi application:
['Brunel', 'Moore', 'DaVinci', 'Gauss', 'Boole', 'Root', …]

# Define a DaVinci application object
d = DaVinci()
d.optsfile = d.user_release_area + 'myopts.py'

# Inside myopts.py (Gaudi job options):
ApplicationMgr().EvtMax = 1000
HistogramPersistencySvc().OutputFile = 'DVHistos_1.root'

myopts.py includes the configuration of the user analysis: algorithms, variable cuts, input data sets, etc.
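Plugging the application into a job is then a single assignment, sketched here reusing the DaVinci object configured above:

j = Job()
j.application = d    # the DaVinci() object defined above
j.submit()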

7 The Ganga job object

8 Backend
There are 4 backends of interest for running LHCb jobs:
Interactive – in the foreground on the client
Local – in the background on the client
LSF – on the LSF batch system (SGE/PBS/Condor systems supported as well)
Dirac – on the Grid

# Define a Dirac backend object
d = Dirac()
print d
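Because the backend is just another job attribute, the same job can be retargeted from the local machine to the Grid with one assignment. A sketch:

j = Job(application=DaVinci())
j.backend = Local()      # background process on the client
j.submit()

j2 = j.copy()            # identical application and data
j2.backend = Dirac()     # same job, now sent to the Grid
j2.submit()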

9 Access to the Grid
The user sends the job to DIRAC
The WMS sends a pilot agent as a WLCG job
When the pilot agent is running safely on a worker node, it fetches the job from DIRAC
Small data files are returned in the output sandbox
Large files are registered in the LFC file catalogue
The user queries DIRAC for the status and finally retrieves the output
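From the user's point of view the pilot-agent machinery is invisible; status queries and output retrieval go through the same job object. A sketch, with the usual Ganga job states shown in the comments:

j.status              # 'submitted' -> 'running' -> 'completed' as DIRAC reports back
j.peek('stdout')      # inspect a sandbox file of the finished job
print j.outputdir     # where the output sandbox was downloaded to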

10 The Ganga job object

11 Input dataset
Use the LHCb bookkeeping to get a list of files to run over:

j.inputdata = browseBK()  # opens the BK browser

Only LFNs are accessible.
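A dataset can also be assembled by hand from logical file names. A sketch; the LFN below is an invented placeholder, not a real bookkeeping entry:

# LHCbDataset wraps a list of LFNs, resolved through the LFC at run time
j.inputdata = LHCbDataset(files=[
    'LFN:/lhcb/production/DC06/00001234/DST/myfile.dst',  # placeholder LFN
])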

12 The Ganga job object

13 Output dataset
When a job is finished, the output directory will contain the stdout and stderr of the job and your output sandbox files.
Output data files are stored on a storage element on the Grid:
Large files are uploaded to a storage element – download them with j.backend.getOutputData()
You can build a list of LFNs for these files with j.backend.getOutputDataLFNs()
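Retrieving grid-stored output once the job has completed might look like this, a sketch using the two backend methods named above:

if j.status == 'completed':
    print j.outputdir                     # stdout, stderr and sandbox files
    print j.backend.getOutputDataLFNs()   # LFNs of the grid-stored files
    j.backend.getOutputData()             # download them to the output dir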

14 The Ganga job object

15 Job splitting and data-driven submission
[Diagram: a splitter turns the main job's list of LFNs, resolved through the LFC catalogue, into per-site lists of PFNs; GANGA then submits the resulting sub-jobs to CNAF, RAL, CERN, IN2P3, GRIDKA, PIC and NIKHEF.]
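In Ganga this data-driven splitting is expressed with a splitter object on the job. A sketch; SplitByFiles is the usual LHCb splitter, and the file count is illustrative:

j.splitter = SplitByFiles(filesPerJob=10)  # one subjob per 10 input files
j.submit()                                 # subjobs run where the data resides
print len(j.subjobs)                       # how many subjobs were created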

16 Merging
Jobs produce lots of output files that need to be merged together to obtain the final results.
Different mergers for different file types:
root → RootMerger
text → TextMerger
DST → DSTMerger
Want something really special? CustomMerger
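A merger is attached before submission and runs automatically once all subjobs have finished. A sketch; the file name matches the histogram file from slide 6:

j.merger = RootMerger(files=['DVHistos_1.root'])  # combine subjob histograms
j.submit()
# after completion the merged DVHistos_1.root appears in j.outputdir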

17 Monitoring

18 Ganga End-Users
Over 1000 unique users in the past 6 months:
Generally 50% ATLAS (blue), 25% LHCb (green), 25% other
Monthly ~500 unique users
~2000 unique users since January 2007
(The dip in the plot was caused by a monitoring outage.)

19 Job efficiency

20 Failure
Data access failure (19%). There are two main causes for the 19% of jobs failing to access input data from the WN:
The first is instability in the site SRM layer at the Tier-1 sites: TURLs cannot be constructed for the software application to access the input datasets.
The other cause is zero-size or incorrectly registered dataset replicas, for which it is impossible to obtain a correct TURL.

21 Stalled
Stalled (8%). A job is 'stalled' if the Job Monitoring Service stops receiving its signs of life.
One of the main causes is user proxy expiration on the WN: submitted pilot agents may wait in a site batch queue for several hours, which is a significant portion of the default (12 hour) proxy validity.
Other causes are application failures, loss of open data connections at sites, and user code crashes, all of which can exhaust the available wall-clock time of the resource.

22 Other minor failures
Failed to upload output data (1%)
This is caused by the transfer-and-register operation to the LFC failing. It can happen due to network outages, power cuts, site mis-configurations, and also during LFC downtime.
Application failure (1%)
The Job Wrapper can identify the exit state of the software applications running on the Grid. A common cause of this type of failure is corrupted software shared areas at the sites.

23 Conclusion
The LHCb distributed analysis framework allows users to transparently submit jobs to the Grid.
Real job efficiency measured so far: ~70%
Main sources of failures:
data inconsistencies
service instabilities
Although usable (and used), Grid analysis for LHCb is not yet at production quality – still far from 100%…

