ANALYSIS TRAIN ON THE GRID
Mihaela Gheata

AOD production train
◦ AOD production will be organized in a ‘train’ of tasks
◦ To maximize the efficiency of processing the full dataset
◦ To optimize CPU/IO
◦ Using the analysis framework
◦ Testing was done to see whether this approach, with a train of tasks adding information to the AOD one by one, really works
[Slide diagram: a train of tasks (TASK 1 … TASK 4) reading ESD/Kine input and filling the AOD]
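
To illustrate the wagon concept, here is a minimal sketch of a task that appends its information to the common AOD event. Only the AliAnalysisTaskSE interface (UserCreateOutputObjects, UserExec, InputEvent, AODEvent) comes from the framework; the task itself and what it fills are hypothetical.

// Hypothetical wagon: adds its own information to the common AOD event.
#include "AliAnalysisTaskSE.h"
#include "AliESDEvent.h"
#include "AliAODEvent.h"

class AliAnalysisTaskMyWagon : public AliAnalysisTaskSE {
public:
   AliAnalysisTaskMyWagon(const char *name = "MyWagon") : AliAnalysisTaskSE(name) {}

   virtual void UserCreateOutputObjects() {
      // Book here the AOD branch / output objects this wagon is responsible for
   }

   virtual void UserExec(Option_t *) {
      AliESDEvent *esd = dynamic_cast<AliESDEvent*>(InputEvent()); // ESD delivered by the input handler
      AliAODEvent *aod = AODEvent();                               // common AOD filled by the whole train
      if (!esd || !aod) return;
      // ... compute this wagon's quantities from the ESD and store them in the AOD
   }

   ClassDef(AliAnalysisTaskMyWagon, 1)
};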

What was tested
Small train (2 tasks):
◦ AliAnalysisTaskESDFilter – ESD to AOD filter task
   ▪ Filtering tracks and copying other ESD information into the AOD structure
◦ AliAnalysisTaskJets – task for executing jet analysis using the analysis manager framework
   ▪ Doing jet reconstruction and writing the results to the AOD
Possibility to run in different modes:
◦ Local (with files accessed from AliEn/CAF) – testing whether the algorithm works
◦ PROOF mode (CAF) – testing the parallel run mode and merging
◦ Grid mode – the production usage
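
For orientation, a condensed sketch of how such a two-wagon train is assembled and launched with the analysis manager. The wiring follows the AliAnalysisManager API; the task constructors, container names, input file and run-mode handling are simplified placeholders rather than the exact test macro (the AliRoot analysis libraries or the corresponding .par packages are assumed to be loaded).

// Sketch of the tested two-wagon train: ESD filtering + jet analysis.
void RunSmallTrain(const char *mode = "local")   // "local", "proof" or "grid"
{
   AliAnalysisManager *mgr = new AliAnalysisManager("AODtrain", "ESD filter + jets");

   // ESD input handler and common AOD output handler
   mgr->SetInputEventHandler(new AliESDInputHandler());
   AliAODHandler *aodH = new AliAODHandler();
   aodH->SetOutputFileName("aod.root");
   mgr->SetOutputEventHandler(aodH);

   // Wagon 1: ESD -> AOD filtering; wagon 2: jet finding written to the AOD
   AliAnalysisTaskESDFilter *filter = new AliAnalysisTaskESDFilter("ESD filter");
   AliAnalysisTaskJets      *jets   = new AliAnalysisTaskJets("Jet analysis");
   mgr->AddTask(filter);
   mgr->AddTask(jets);

   // Common containers: the input ESD chain and the output AOD tree
   AliAnalysisDataContainer *cinput  = mgr->CreateContainer("cESD", TChain::Class(),
                                           AliAnalysisManager::kInputContainer);
   AliAnalysisDataContainer *coutput = mgr->CreateContainer("cAOD", TTree::Class(),
                                           AliAnalysisManager::kOutputContainer, "aod.root");
   mgr->ConnectInput(filter, 0, cinput);
   mgr->ConnectOutput(filter, 0, coutput);
   mgr->ConnectInput(jets, 0, cinput);
   mgr->ConnectOutput(jets, 0, coutput);

   if (!mgr->InitAnalysis()) return;
   TChain *chain = new TChain("esdTree");
   chain->Add("AliESDs.root");          // placeholder; on the Grid the chain is built from wn.xml
   mgr->StartAnalysis(mode, chain);
}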

The setup
Many changes happened in a short time in the framework:
◦ Packages re-organized: base/virtual classes moved to STEERBase
◦ Analysis framework: event handlers (AOD handler, MC handler)
◦ Data structures (ESD, AOD) have been re-designed
The test setup was .par based (submitting the code rather than using existing libraries):
◦ Set of base common packages: STEERBase, ESD, AOD, ANALYSIS
◦ Analysis tasks: PWG0base, JETAN
◦ The realistic use case will be using the AliRoot libraries!
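
Since the code is shipped as .par archives, each package has to be unpacked and built on the worker node before the train can run. Below is a sketch of the usual SetupPar helper used in such macros (illustrative; error handling kept minimal):

// Load a .par package: untar it, build via PROOF-INF/BUILD.sh, then run PROOF-INF/SETUP.C.
Bool_t SetupPar(const char *package)
{
   if (gSystem->Exec(Form("tar xzf %s.par", package))) return kFALSE;
   TString ocwd = gSystem->WorkingDirectory();
   gSystem->ChangeDirectory(package);
   if (!gSystem->AccessPathName("PROOF-INF/BUILD.sh")) {
      if (gSystem->Exec("PROOF-INF/BUILD.sh")) {      // compile the package libraries
         gSystem->ChangeDirectory(ocwd.Data());
         return kFALSE;
      }
   }
   if (!gSystem->AccessPathName("PROOF-INF/SETUP.C"))
      gROOT->Macro("PROOF-INF/SETUP.C");              // load the libraries, set include paths
   gSystem->ChangeDirectory(ocwd.Data());
   return kTRUE;
}

// Order matters: base packages first, then the analysis tasks, e.g.
// SetupPar("STEERBase"); SetupPar("ESD"); SetupPar("AOD");
// SetupPar("ANALYSIS"); SetupPar("PWG0base"); SetupPar("JETAN");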

Problems on the way
Several crashes or inconsistent results in the early tests:
◦ Accessing old data (PDC’06) with the new ESD structure was not possible for a while
◦ Some fixes (protections) were added in the filtering code
◦ Event loop not processing all chained files in GRID mode for tag-based analysis (uninitialized variable in TChain)
It took quite a while to become familiar with the usage of AliEn + the event tag system:
◦ Some procedures do not work exactly as described (at least for my rather exotic setup)
   ▪ Hopefully users will give more and more feedback
   ▪ More FAQs and working examples are needed

Splitting in GRID
Some GRID-specific issues, e.g. how to split the input ESD chain (how many files per sub-job):
◦ Too few (e.g. 1/job) is good for minimizing the failure rate but not always efficient (jobs wait long from SPLITTING to being processed)
◦ Tested 1, 5, 10, 20, 50 files/subjob (100 KB to 5 MB aod.root)
◦ Too many – jobs start to fail, pointing to memory problems? (or just being killed…)
   ▪ Cannot be reproduced in local mode
   ▪ Hard to investigate what happened
   ▪ Efficiency also depends on the processing time
   ▪ Cannot really be tested now (filtering and jet analysis are fast)

# this is the startup process for root
Executable="root.sh";
Jobtag={"comment:JET analysis test"};
# we split per storage element
Split="se";
# we want each job to read 10 input files
SplitMaxInputFileNumber="10";
# this job has to run in the ANALYSIS partition
Requirements=( member(other.GridPartitions,"Analysis") );
# we need ROOT and the API service configuration package
TTL = "30000";
# ROOT will read this collection file to know which files to analyze
InputDataList="wn.xml";
# ROOT requires the collection file in the xml-single format
InputDataListFormat="merge:/alice/cern.ch/user/m/mgheata/test/global.xml";
# this is our collection file containing the files to be analyzed
InputDataCollection="LF:/alice/cern.ch/user/m/mgheata/test/global.xml,nodownload";
InputFile= {"LF:/alice/cern.ch/user/m/mgheata/test/runProcess.C",
  "LF:/alice/cern.ch/user/m/mgheata/test/ESD.par",
  "LF:/alice/cern.ch/user/m/mgheata/test/AOD.par",
  "LF:/alice/cern.ch/user/m/mgheata/test/ANALYSIS.par",
  "LF:/alice/cern.ch/user/m/mgheata/test/JETAN.par",
  "LF:/alice/cern.ch/user/m/mgheata/test/STEERBase.par",
  "LF:/alice/cern.ch/user/m/mgheata/test/PWG0base.par",
  "LF:/alice/cern.ch/user/m/mgheata/test/ConfigJetAnalysis.C",
  "LF:/alice/cern.ch/user/m/mgheata/test/CreateChain.C"};
# Output archive
# Output directory
OutputDir="/alice/cern.ch/user/m/mgheata/test/output/#alien_counter#";
# Output files
OutputFile={"aod.root","histos.root","jets_local.root"};
# Merge the output
Merge = {"aod.root:/alice/jdl/mergerootfile.jdl:aod-merged.root"};
MergeOutputDir={"/alice/cern.ch/user/m/mgheata/test/"};
# Validation
Validationcommand ="/alice/cern.ch/user/m/mgheata/bin/validate.sh";

AOD validation
Like ESDs, the produced AODs need to pass a validation test
Otherwise there is no control over what was really produced
◦ Done via AliEn’s Validationcommand={…}
I implemented this in a very simple way: try to open aod.root and write a tiny file in case of success
◦ No need to be too elaborate at this level
◦ Can be refined per run (adding QA histograms…)
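
The check behind the Validationcommand can indeed be tiny. A sketch of the idea as a ROOT macro (the macro and the marker file name are illustrative, not the actual validate.sh):

// Minimal output validation: open aod.root and leave a marker file only on success.
#include <fstream>
#include "TFile.h"

void ValidateOutput()
{
   TFile *f = TFile::Open("aod.root");
   Bool_t ok = (f && !f->IsZombie() && f->GetListOfKeys()->GetEntries() > 0);
   if (f) f->Close();
   if (ok) {
      std::ofstream marker("validated.txt");   // tiny file signalling success
      marker << "aod.root OK" << std::endl;
   }
   // Possible per-run refinement: fill a few QA histograms here before deciding.
}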

Merging AOD’s
We may want to have some strategy for storing the AODs of a run
CAF merging is OK
◦ Using the wrappers from AliAnalysisManager
A problem that scales with the size of the train:
◦ Filtering + jets ~ 100 MB/run (100K pp events)
◦ Merging has to fit in memory
More parameters for the automatic merging in AliEn?
◦ To follow the desired granularity, depending on the number of job input files and on the AOD size
◦ Problems with TFileMerger
◦ If there is a non-mergeable object in the file, the trees are not merged (extra TProcessId object – put by whom, if the files were validated?)
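
For reference, a minimal sketch of merging per-subjob AODs with TFileMerger, the class where the non-mergeable-object problem shows up; the input paths and the output name are placeholders:

// Merge per-subjob AODs into one file. Memory use grows with the number and size
// of the inputs, and a single non-mergeable object (e.g. a stray TProcessID)
// prevents the trees from being merged.
#include <cstdio>
#include "TFileMerger.h"

void MergeAODs()
{
   TFileMerger merger;
   merger.OutputFile("aod-merged.root");
   merger.AddFile("output/001/aod.root");    // placeholder paths
   merger.AddFile("output/002/aod.root");
   merger.AddFile("output/003/aod.root");
   if (!merger.Merge())
      printf("AOD merging failed\n");
}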

What we have learned
The system is complex enough…
◦ Testing all changes at the same time does not help much in debugging it
… but it works:
◦ The analysis framework is stable in all modes
   ▪ The CAF case worked from the beginning
◦ AOD production can be done following a rather simple pattern
◦ Testing in local mode with files from the GRID was very useful for debugging

Overview and conclusions
Tested a small AOD train based on the AliAnalysisManager framework:
◦ Several problems came up, but the global result is OK
◦ The strategy for job splitting, output merging and validation is still to be established
◦ The train size may influence the .jdl parameters
The analysis framework runs stably in all modes:
◦ Running successfully in local mode is a good hint for what will happen in PROOF or on the GRID
◦ Using such trains may lead to memory issues that should be understood before running on the GRID