Analysis experience at GSIAF Marian Ivanov

HEP data analysis ● Typical HEP data analysis (physics analysis, calibration, alignment), like any statistical algorithm, needs continuous refinement cycles ● The refinement cycles should (optimally) be at the level of seconds to minutes ● Exploiting parallelism is the only way to analyze HEP data in a reasonable time

Data analysis (calibration, alignment) ● Where do we run? ● DAQ farm (calibration) ● HLT farm (calibration, alignment) ● Prompt data processing (calibration, alignment, reconstruction, analysis) with PROOF ● Batch analysis on the Grid infrastructure

Tuning of statistical algorithms ● For our analysis it is often not enough to study just histograms – A deeper understanding of the correlations between variables is needed; to study correlations on large statistics, ROOT TTrees can be used – In the case of a non-trivial processing algorithm, intermediate steps might be needed – It should be possible to analyze (debug) the intermediate results of such a non-trivial algorithm ● Process function: 1) Preprocess the data (can be CPU expensive, e.g. track refitting) ● optionally store the preprocessed data in TTrees 2) Histogram and/or fit and/or fill matrices with the preprocessed data (see the sketch below)
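As an illustration of this two-step Process pattern, a minimal self-contained ROOT macro follows. The struct, the Preprocess helper and the file names are invented for the sketch and are not part of the actual calibration code; only the TTree/TH1 usage is standard ROOT.

    // debug_process.C - sketch of the two-step Process() pattern:
    // expensive preprocessing first, then histogramming, with the
    // intermediate results optionally stored in a debug TTree.
    #include "TFile.h"
    #include "TTree.h"
    #include "TH1F.h"
    #include "TRandom3.h"

    struct track_t { Double_t pt, residual; };  // toy stand-in for a track

    Double_t Preprocess(track_t &t)             // stand-in for e.g. track refitting
    {
       return t.residual / t.pt;                // some derived quantity
    }

    void debug_process(Bool_t storeDebug = kTRUE)
    {
       TFile f("debug.root", "recreate");
       TH1F hist("hDelta", "preprocessed quantity", 100, -1, 1);
       track_t trk;
       TTree tree("debug", "intermediate results");
       tree.Branch("trk", &trk, "pt/D:residual/D");

       TRandom3 rnd(0);
       for (Int_t i = 0; i < 10000; ++i) {
          trk.pt       = rnd.Uniform(0.3, 10.);
          trk.residual = rnd.Gaus(0., 0.1);
          Double_t q = Preprocess(trk);  // 1) (CPU-expensive) preprocessing
          if (storeDebug) tree.Fill();   // optional intermediate TTree
          hist.Fill(q);                  // 2) histogram the derived quantity
       }
       tree.Write();
       hist.Write();
       // correlations can then be studied interactively, e.g.
       // debug->Draw("residual:pt");
    }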

Component model ● The algorithmic part of our analysis and calibration software should be independent of the running environment ● Example: the TPC calibration classes (components) are developed and tuned Offline and used in the HLT, DAQ and Offline environments ● Analysis and calibration code should be written following a component-based model ● TSelector (for PROOF) and AliAnalysisTask (see the presentation of Andreas Morsch) are just simple wrappers

Components ● Basic functionality: ● Process(...) ● process your input data, e.g. a track ● Merge() ● merge the components ● Analyze() ● analyze the preprocessed data (e.g. histograms, matrices, TLinearFitters) ● To enable merging of the information from the slaves, the component has to be fully streamable (a minimal skeleton is sketched below)
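A hedged skeleton of such a component. The class name, the histogram member and the method bodies are illustrative; the method names come from this slide, while the TNamed base class and the ClassDef streaming macro are standard ROOT practice.

    #include "TNamed.h"
    #include "TH1F.h"
    #include "TCollection.h"

    class AliExampleCalibComponent : public TNamed {
    public:
       AliExampleCalibComponent() : TNamed(), fHist(0) {}
       // Process(): fill histograms/fitters from one unit of input data
       void Process(Double_t value) { if (fHist) fHist->Fill(value); }
       // Merge(): add up the partial results produced on the PROOF slaves
       void Merge(TCollection *list) { /* sum histograms from 'list' */ }
       // Analyze(): fit/post-process the accumulated data after merging
       void Analyze() { /* fit the accumulated histograms */ }
    private:
       TH1F *fHist;   // accumulated independently on every slave
       // ClassDef generates the Streamer(); this is what makes the object
       // fully streamable, so the slaves can ship it back to the master
       ClassDef(AliExampleCalibComponent, 1)
    };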

Example component

    class AliTPCcalibTracks : public TNamed {
       ...
       virtual void Process(AliTPCseed *seed);
       void         Merge(TCollection *list);
       void         Analyze();
       // histograms, fitters, arrays of histograms, matrices ...
       TObjArray     *fArrayQDY;    // q-binned delta-Y histograms
       TObjArray     *fArrayQRMSZ;  // q-binned Z-RMS histograms
       TLinearFitter *fFitterXXX;
       ...
    };

Example selector wrapper ● A user-defined light selector derives from the base selector AliTPCSelectorESD:

    class AliTPCSelectorTracks : public AliTPCSelectorESD {
       ...
       AliTPCcalibTracks     *fCalibTracks;
       AliTPCcalibTracksGain *fCalibTracksGain;
       ...
    };

    Bool_t AliTPCSelectorTracks::ProcessIn(Long64_t entry)
    {
       ...
       fCalibTracks->Process(seed, esd);
       fCalibTracks->Process(seed);
       ...
    }

AliTPCSelectorESD ● Additional functionality implemented on top of TSelector: – Takes care of the data input – Stores system information about the user process (memory and CPU usage versus time, user time stamps) in syswatch.log files; simple visualization using TTree::Draw (see the sketch below) ● Optionally a memory checker can be enabled – Stores the intermediate results in a common space for further algorithm refinement (if requested) ● The TProofFile/TFileMerger mechanism will handle file-resident trees in the future – Is it possible to use it also for local analysis? ● Another possible solution – can we use the schema of AliEn?
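A sketch of the TTree::Draw-based inspection mentioned above. The column layout in the branch descriptor is assumed for illustration; the real syswatch.log format is defined by the monitoring code that writes it.

    #include "TTree.h"

    void plot_syswatch(const char *fname = "syswatch.log")
    {
       TTree t("syswatch", "resource usage of the user process");
       // TTree::ReadFile parses an ASCII table, one branch per column
       t.ReadFile(fname, "time/D:mem/D:cpu/D");  // assumed columns
       t.Draw("mem:time");  // e.g. memory profile of the job vs. time
    }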

Assumptions – data volume to process and accessibility ● ALICE pp event: ● ESD size ~ 0.03 MB/ev ● ESDfriend size ~ 0.45 MB/ev ● i.e. ~0.5 MB/ev in total, or ~0.5 TB per 10^6 events with no overlaps, up to ~5 TB with 10 overlapped events ● Accessible statistics, ESD (friends) / raw data (zero suppressed): – Local ~ 10^5-10^6 pp / 10^4 pp – Batch ~ 10^6-10^7 pp / 10^5 pp – PROOF ~ 10^6-10^7 pp / – – Grid > 10^7 pp / 10^6 pp

Software development ● Write a component ● Software validation sequence: ● Local environment (first filter) ● Stability – debugger ● Memory consumption – valgrind, memstat (ROOT) ● CPU profiling – callgrind, VTune ● Output – rough, quantitative if possible ● Batch system (second filter) ● Memory consumption – valgrind, memstat ● CPU profiling ● Output – same as local, but on bigger statistics ● PROOF ● For rapid development – fast user feedback ● Iterative improvement of algorithms, selection criteria, ... ● Processing on bigger statistics ● Be ready for GRID/AliEn ● Processing on bigger statistics

PROOF experience (0) ● It is impossible to write code without bugs ● Only privileged users can use a debugger directly on the PROOF slaves and/or master ● Normal users have to debug their code locally ● ==> It would be nice if the code running locally and on PROOF could be the same – It is (almost) the case now (see the sketch below) – Input lists? TProofFile?
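A minimal sketch of what "the same code locally and on PROOF" looks like in practice. The tree name, the input file pattern and the PROOF master URL are placeholders; the selector is the wrapper from the earlier slide.

    #include "TChain.h"
    #include "TProof.h"

    void run_selector(Bool_t useProof = kFALSE)
    {
       TChain chain("esdTree");       // placeholder tree name
       chain.Add("AliESDs_*.root");   // placeholder input files

       if (useProof) {
          TProof::Open("gsiaf");      // placeholder PROOF master URL
          chain.SetProof();           // route Process() through PROOF
       }
       // identical call in both modes -> identical code to debug locally
       chain.Process("AliTPCSelectorTracks.cxx+");
    }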

PROOF experience (1) ● Debugging on PROOF is not trivial ● We tried dumping important system information (memory, CPU, ...) into a special log file; analyzing these TTrees really helps to understand processing problems

PROOF experience (2) ● In our approach the users can generate files with preprocessed information ● In order to allow users to manage those files we had to create a way to interact with XRD in a “file-system manner” – AliXRDProofToolkit: ● Generate lists of files – similar to the find command ● Check the consistency of the data in the list and reject corrupted files (a sketch of such a check follows); corrupted files are one of our biggest problems, as the network at GSI is less stable than at CERN ● Process log files
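A hedged sketch of the kind of consistency check described above. The function and list-file names are invented for illustration; the real AliXRDProofToolkit implementation may differ. Only TFile::Open, IsZombie and the kRecovered status bit are standard ROOT.

    #include <cstdio>
    #include <fstream>
    #include <string>
    #include "TFile.h"

    void check_file_list(const char *listIn  = "esd.list",
                         const char *listOut = "esd.good.list")
    {
       std::ifstream in(listIn);
       std::ofstream out(listOut);
       std::string fname;
       while (std::getline(in, fname)) {
          // TFile::Open also handles root:// (XRD) URLs
          TFile *f = TFile::Open(fname.c_str());
          Bool_t ok = f && !f->IsZombie() && !f->TestBit(TFile::kRecovered);
          if (ok) out << fname << "\n";
          else    printf("rejecting corrupted file: %s\n", fname.c_str());
          delete f;   // also closes the file
       }
    }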

Conclusion ● PROOF is easy to use and well suited to our needs ● We have observed some problems, but they are usually fixed quickly ● We use it successfully for the development of calibration components ● Further development of tools to simplify debugging on PROOF is very welcome