ALICE experience with ROOT I/O


Andrei Gheata, ROOT I/O Workshop and Future Plans, 25 June 2014

Outline
- The I/O issues in ALICE
- Analysis and ROOT I/O
- Tree caching and prefetching
- Flattening the AOD structure

I/O issues in ALICE
- The AliRoot framework uses ROOT as its foundation; the entire I/O is based on ROOT: raw data, simulation, reconstruction, analysis
- Complex detector with 18 subsystems
- Large event size, dominated by the TPC contribution: up to 5-8 MB per uncompressed Pb-Pb event
- I/O "contributors":
  - Raw data reconstruction: raw data as input (local disk), reconstructed ESD events as output (local disk)
  - Simulation: no input data; hits & digits + kinematics as output (local disk)
  - Filtering/skimming: ESD as input (LAN/WAN), (nano)AOD as output (local disk)
  - Analysis: all event formats (ESD, (nano)AOD) as input (LAN/WAN), histograms or other intermediate formats as output (local disk)

ALICE data flow
[Diagram: ONLINE, detector algorithms rootify the RAW data and feed calibration through the SHUTTLE into the OCDB; OFFLINE, calibration, reconstruction, filtering, quality assurance and analysis run over AliEn + file catalog, producing RAW.root, AliESDs.root, AliAOD.root and results.root; the OCDB holds calibration data, the AODB the analysis database.]

Event size by collision type

  Type   Year  RAW [kB/ev]  ESD* [kB/ev]  AOD* [kB/ev]  RAW/ESD  ESD/AOD
  p-p    2010      550            62          10.5         8.8      6.0
  p-p    2011      500            61           6.7         8.2      9.8
  p-p    2012     1820           113           9.6        16.1     11.8
  Pb-Pb  2010    11380          1710         365           6.7      4.7
  Pb-Pb  2011     5490          4070        1800           1.35     2.3
  p-Pb   2013      612           271          60           2.6      4.5
  p-Pb   2013     1660          1640         379           1.0      4.3

  * Compressed data with compression factor ~4-5

Analysis and ROOT I/O
- Most I/O issues become visible in distributed analysis
  - Big input data, not necessarily matching the CPU workload
  - Combined LAN/WAN read access
  - The workload management policy enforces locality as much as possible; the WAN/LAN access ratio evolved from ~20-30% to less than 10% on average
- Several directions to tackle the I/O problem (a sketch of the branch restriction follows this list):
  - Combine analyses in trains
  - Restrict the number of branches to the minimum needed
  - Filter to more compact formats (nanoAODs)
  - Enable ROOT I/O helpers: tree caching & prefetching
  - Flatten the AOD format
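As an illustration of the branch restriction, a minimal sketch (the input file and branch pattern are hypothetical, not from the slides):

    // Read only the branches the task actually needs (sketch).
    TChain *chain = new TChain("aodTree");   // ALICE AOD tree name
    chain->Add("AliAOD.root");               // hypothetical local input
    chain->SetBranchStatus("*", 0);          // disable all branches
    chain->SetBranchStatus("tracks*", 1);    // re-enable only what is used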

Asynchronous prefetching
[Slide material from Elvin Selaru, ROOT Users Workshop, Saas-Fee 2013]

Caching and prefetching in analysis
- ALICE analysis is built on top of TChain::Process(TSelector), so ROOT built-in functionality can be used to enhance read performance (a sketch of enabling both helpers follows this list)
- Caching is enabled by default in the ALICE analysis framework, with adjustable cache size
- Most frequent use case for trains: sequential read, all branches enabled
- Testing prefetching with WAN reads:
  - AOD: ~300 MB read from Torino::SE (RTT ~22 ms)
  - Analysis task reading all events and emulating CPU work per track
  - Plotting "useful" analysis CPU time versus real time for the job
  - Three configurations: no caching/prefetching, TTreeCache enabled, prefetching enabled
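A minimal sketch of switching on both helpers (cache size, input URL and selector name are illustrative assumptions; the prefetching switch is ROOT's TFile.AsyncPrefetching rootrc setting):

    // Enable asynchronous prefetching process-wide.
    gEnv->SetValue("TFile.AsyncPrefetching", 1);

    TChain chain("aodTree");
    chain.Add("root://se.example.org//alice/AliAOD.root");  // hypothetical WAN source
    chain.SetCacheSize(30 * 1024 * 1024);   // TTreeCache; size is adjustable
    chain.AddBranchToCache("*", kTRUE);     // trains read all branches sequentially
    chain.Process("MySelector.C+");         // TSelector-based processing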

Prefetching exercise
[Plot: "useful" CPU time versus real time per job, in the I/O-bound and CPU-bound regimes, comparing no caching, caching and prefetching.]

Prefetching gain: 10-15% in the I/O-bound regime, less than 5% when CPU bound.

Flattening AODs
- Current AOD event structure: more than 400 branches, deep object hierarchy
- The format is highly dependent on AliRoot types
- How costly is the hierarchy, and can we benefit from flattening the structure?
  - Better compression
  - Analysis less dependent on AliRoot
  - Faster I/O
  - Structure better adapted for vectorization

Flat AOD exercise
- Use AODtree->MakeClass() to generate a skeleton, then rework it
- Keep all AOD info, but restructure the format:
  Int_t fTracks.fDetPid.fTRDncls  ->  Int_t *fTracks_fDetPid_fTRDncls;  //[ntracks_]
- More complex cases to support I/O:
  typedef struct { Double32_t x[10]; } vec10dbl32;
  Double32_t fTracks.fPID[10]  ->  vec10dbl32 *fTracks_fPID;  //[ntracks_]
  Double32_t *fV0s.fPx //[fNprongs]  ->  TArrayF *fV0s.fPx;  //[nv0s_]
- Convert AliAODEvent -> FlatEvent, trying to keep the full content AND size (a header sketch of the flattened layout follows)
- Write FlatEvent to file; compare compression, file size and read speed
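Putting these pieces together, a minimal header sketch of the flattened layout (member and counter names follow the slide's convention; everything else is an assumption, not the actual ALICE class):

    // Fixed-size payload wrapped in a struct so that a variable-length
    // array of it can be streamed with ROOT's //[counter] convention.
    typedef struct { Double32_t x[10]; } vec10dbl32;

    class FlatEvent : public TObject {
    public:
       Int_t       ntracks_;                  // track counter
       Int_t      *fTracks_fDetPid_fTRDncls;  //[ntracks_] flattened basic-type member
       vec10dbl32 *fTracks_fPID;              //[ntracks_] flattened fixed-size array
       Int_t       nv0s_;                     // V0 counter
       TArrayF    *fV0s_fPx;                  //[nv0s_] variable-size member, one per V0

       FlatEvent() : ntracks_(0), fTracks_fDetPid_fTRDncls(0),
                     fTracks_fPID(0), nv0s_(0), fV0s_fPx(0) {}
       virtual ~FlatEvent() {}
       ClassDef(FlatEvent, 1)                 // ROOT dictionary hook
    };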

Results
Tested on a Pb-Pb AOD file (AliAOD.root):

  ******************************************************************************
  *Tree    :aodTree   : AliAOD tree                                            *
  *Entries :     2327 : Total = 2761263710 bytes  File Size = 660491257        *
  *        :          : Tree compression factor = 4.18                         *
  ******************************************************************************
  *Tree    :AliAODFlat: Flattened AliAODEvent                                  *
  *Entries :     2327 : Total = 2248164303 bytes  File Size = 385263726        *
  *        :          : Tree compression factor = 5.84                         *
  ******************************************************************************

- Data smaller (no TObject overhead, TRef -> int), 30% better compression
- Reading speed: old CPU time = 103 s, real time = 120 s; new CPU time = 54 s, real time = 64 s
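The reading-speed numbers correspond to a full sequential read; a minimal sketch of such a measurement (tree and file names from the slide, the timing scaffold assumed):

    // Time a full sequential read of the AOD tree (sketch).
    TFile f("AliAOD.root");
    TTree *tree = (TTree*)f.Get("aodTree");
    TStopwatch sw;
    sw.Start();
    for (Long64_t i = 0; i < tree->GetEntries(); ++i)
       tree->GetEntry(i);                     // de-serialize all branches
    sw.Stop();
    printf("CPU time=%.0fs, Real time=%.0fs\n", sw.CpuTime(), sw.RealTime());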

Implications
- User analysis becomes simpler, working mostly with basic types (besides the event)
- Simplified access to data, greatly reducing the number of (virtual) calls
- ROOT-only analysis becomes possible
- Backward incompatible, but migration from the old to the new format is possible
  - Sacrificing performance as a first step (built-in transient event converter; a hypothetical sketch follows)
- Track loops become much better vectorizable
- This approach is now being taken seriously for Run2 and Run3
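One hypothetical shape for such a transient converter (all type and accessor names here are assumptions for illustration, not the AliRoot API):

    // Sketch: fill the flat arrays from the old transient event (names assumed).
    void FillFlat(const OldEvent &src, FlatEvent &dst) {
       dst.ntracks_ = src.NTracks();                        // hypothetical accessor
       dst.fTracks_fDetPid_fTRDncls = new Int_t[dst.ntracks_];
       for (Int_t i = 0; i < dst.ntracks_; ++i)
          dst.fTracks_fDetPid_fTRDncls[i] = src.Track(i).TRDncls();  // hypothetical
    }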

Some concerns...
A few things that I hope will be covered by the discussions:
- Asynchronous de-serialization of different entries in trees/chains
  - In local processing, this is a serial step that overwhelms disk reading
- Thread-safe concurrent containers as replacements for the existing ones, including I/O-friendly ones
  - C++11 comes to the rescue...
- Re-design of thread safety in handling critical data structures related to I/O, like histograms and trees, avoiding TThread::Lock