Atlas IO improvements and Future prospects

Presentation transcript:

Atlas IO improvements and future prospects. Wahid Bhimji, Ilija Vukotic (most slides from Ilija).

Overview: From past to future
- PAST (up until 2010): unordered ROOT files, horrible I/O -> basket ordering / TTreeCache.
- CURRENT: AutoFlush / new ROOT versions.
- FUTURE: near and far.

Formats
Produced at Tier-0: RAW, ESD, AOD, DESD(M), DAOD(M), D2ESD(M), D2AOD(M), D3ESD(M), D3AOD(M), D1PD, D2PD, D3PD (derived via reconstruction and reduction).
All the formats are ROOT files. Athena versions until 15.9.0 used ROOT 5.22 (with a lot of features backported); now ROOT 5.26, soon 5.28. D3PDs have a completely flat ROOT structure.

Formats: sizes as shown at CHEP last year, so not current, but they give an idea of the scale.
[Figure: data and MC egamma/JETMET ESDs, AODs and D3PDs; ball surface ~ event size]

Format   Size [PB]
RAW      2
ESD      1.8
AOD      0.1
DESD     0.3
D3PD

Transient/persistent split
Transient objects are converted to persistent ones. To store the data efficiently, the data from each sub-detector or algorithm passes through a different (sometimes very complex) set of transformations. Converters for complex objects call the converters for their members. This provides the possibility of schema evolution. Example: TracksCollection is composed of 20 different objects which can, and do, evolve individually.
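As a rough illustration of the converter pattern described above (the class names here are hypothetical stand-ins, not the actual ATLAS EDM classes): the persistent shape can change independently of the transient one, which is what makes schema evolution possible, and a collection converter simply delegates to the converters of its members.

```cpp
// Hypothetical illustration of a transient/persistent converter pair.
// Real ATLAS converters (e.g. for TracksCollection) live inside the Athena
// persistency framework; this only sketches the idea.
#include <vector>

struct TrackTransient {            // in-memory (transient) representation
    double px, py, pz;
    double d0, z0;
};

struct TrackPersistent_p1 {        // on-disk (persistent) representation, version 1
    float params[5];               // packed to floats to save space
};

struct TrackCnv_p1 {               // converter: transient <-> persistent
    TrackPersistent_p1 toPersistent(const TrackTransient& t) const {
        return { { float(t.px), float(t.py), float(t.pz), float(t.d0), float(t.z0) } };
    }
    TrackTransient toTransient(const TrackPersistent_p1& p) const {
        return { p.params[0], p.params[1], p.params[2], p.params[3], p.params[4] };
    }
};

// A collection converter delegates to the element converter, mirroring
// "converters for complex objects call the converters for their members".
struct TrackCollectionCnv_p1 {
    TrackCnv_p1 elementCnv;
    std::vector<TrackPersistent_p1> toPersistent(const std::vector<TrackTransient>& c) const {
        std::vector<TrackPersistent_p1> out;
        out.reserve(c.size());
        for (const auto& t : c) out.push_back(elementCnv.toPersistent(t));
        return out;
    }
};
```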

ROOT file organisation: past -> present
Currently fully split (better compression). Baskets are written to the file as soon as they get full, which leaves parts of the same event scattered over the file.
2010: re-writing the files with baskets reordered.
2011: AutoFlush. After the first n events to write are collected, baskets are resized so that every branch has the same number of baskets. In that way all the branches should be "synchronized".
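For reference, a minimal sketch of how AutoFlush is configured when writing with plain ROOT; the file and tree names are made up, and the 30 MB value matches the default quoted later in these slides.

```cpp
// Sketch: configuring AutoFlush on the write side in plain ROOT.
// File, tree and branch names are illustrative only.
#include "TFile.h"
#include "TTree.h"

void write_example() {
    TFile f("example.root", "RECREATE");
    TTree tree("CollectionTree", "example tree");

    double pt = 0.0;
    tree.Branch("pt", &pt, "pt/D");

    // Negative value: flush (and re-optimise basket sizes) every ~30 MB written.
    // A positive value would instead mean "flush every N entries".
    tree.SetAutoFlush(-30 * 1024 * 1024);

    for (int i = 0; i < 100000; ++i) {
        pt = i * 0.1;
        tree.Fill();
    }
    tree.Write();
}
```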

The past: basket reordering improvements
Six ATLAS Athena D3PDMaker jobs running on different 2 GB AOD files, all accessing one disk partition. Unordered files show a large scatter in reads and hit seek limits. Ordered files read significantly more linearly, so seek counts are reduced.

The past: basket ordering, remote reading example
ROOT test on these AODs, output from TTreePerfStats. GPFS and DPM running at the same site. Ordering makes a much bigger impact:
- Unordered, DPM (rfio): disk time 1500 s, wall clock 1700 s
- Unordered, GPFS: disk time 100 s, wall clock 230 s
- Ordered, DPM (rfio): disk time 20 s, wall clock 160 s
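A minimal sketch of how TTreePerfStats is used to obtain numbers like those above; the file and tree names are placeholders.

```cpp
// Sketch: measuring read performance with TTreePerfStats.
// The file and tree names are placeholders.
#include "TFile.h"
#include "TTree.h"
#include "TTreePerfStats.h"

void read_perf_example() {
    TFile* f = TFile::Open("AOD.example.root");
    TTree* tree = static_cast<TTree*>(f->Get("CollectionTree"));

    TTreePerfStats ps("ioperf", tree);   // starts recording reads on this tree

    for (Long64_t i = 0; i < tree->GetEntries(); ++i)
        tree->GetEntry(i);

    ps.Print();                 // disk time, number of reads, bytes read, ...
    ps.SaveAs("ioperf.root");   // can be inspected graphically later
}
```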

The past: TTreeCache also showed an improvement
The gain varies by site and storage type. Still, site settings such as read-ahead buffers can make a big difference.
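A minimal sketch of enabling TTreeCache on the read side; the cache size, URL and branch selection are illustrative only.

```cpp
// Sketch: enabling TTreeCache for reading; names and sizes are illustrative.
#include "TFile.h"
#include "TTree.h"

void ttreecache_example() {
    TFile* f = TFile::Open("root://some.se//atlas/AOD.example.root");
    TTree* tree = static_cast<TTree*>(f->Get("CollectionTree"));

    tree->SetCacheSize(30 * 1024 * 1024);   // 30 MB cache
    tree->AddBranchToCache("*", kTRUE);     // cache all branches (or list only those used)

    for (Long64_t i = 0; i < tree->GetEntries(); ++i)
        tree->GetEntry(i);                  // reads are now grouped into large cache requests
}
```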

The present: current running
Full split, AutoFlush at 30 MB for the largest tree; other trees with a fixed basket size. ROOT 5.26/00e.

Format         Branches   Event size [kB]   Zip level   Zip factor   Mem [MB]
ESD            12799      1272.34           6           3.40         34.90
AOD            8231       177.73                        3.38         33.44
DPD - EGAMMA   2500       23.96             1           2.86         41.09
DPD - PHOTON   1490       13.94                         2.53         28.57
DPD - JETMET   5671       90.12                         2.88         28.39
DPD - SUSY     2132       14.95                         3.65         28.61

The very near future (17.0.1)
- Reduced buffer size: to improve read time for sparse event reads, we plan to change the default (30 MB) to the memory equivalent of 10 events (AOD) and 5 events (RDO, ESD).
- Split 0: drastically reduces the number of baskets, though not down to ~300*.
- Memberwise streaming.
- Reduced zip level: current optimums are 4 (ESD) and 5 (AOD).

ROOT 5.28/00b
Format   Branches   Event size [kB]   Zip level   Zip factor   Mem [MB]
ESD      807*       1338.63           3           3.56         19.88
AOD      658*       191.67                        3.48         5.52
Event sizes: up roughly 5% (ESD) and 8% (AOD) with respect to the current release.
* Two very large and fast containers remain fully split for size benefits.
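A minimal sketch of how split level and compression level are set when writing with plain ROOT; the branch content is illustrative, and in ATLAS these settings are applied inside Athena's persistency layer rather than in user code (memberwise streaming is controlled separately and not shown here).

```cpp
// Sketch: writing a branch with split level 0 and a reduced compression level.
// The tree and branch names are illustrative only.
#include "TFile.h"
#include "TTree.h"
#include <vector>

void write_split0_example() {
    TFile f("example.root", "RECREATE");
    f.SetCompressionLevel(4);             // reduced zip level (slide quotes optimums of 4 for ESD, 5 for AOD)

    TTree tree("CollectionTree", "split-level demo");
    auto* energies = new std::vector<float>();

    // splitlevel = 0: the object is streamed into a single branch,
    // drastically reducing the number of baskets compared with a fully split branch.
    tree.Branch("ClusterEnergies", &energies, 32000, /*splitlevel=*/0);

    for (int i = 0; i < 1000; ++i) {
        energies->assign(10, float(i));
        tree.Fill();
    }
    tree.Write();
}
```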

ESD: baskets per branch
No "synchronization" effect observed: we should see peaks in the number of baskets per branch. Basket sizes are also not optimal; one cannot, for example, usefully allocate a 771-byte basket.
[Figure: distribution of baskets per branch]
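A small sketch, with assumed file and tree names, of how per-branch basket counts and sizes can be dumped to produce a plot like the one on this slide (only top-level branches are listed):

```cpp
// Sketch: dumping per-branch basket counts and sizes; file/tree names are placeholders.
#include "TFile.h"
#include "TTree.h"
#include "TBranch.h"
#include "TObjArray.h"
#include <cstdio>

void baskets_per_branch() {
    TFile* f = TFile::Open("ESD.example.root");
    TTree* tree = static_cast<TTree*>(f->Get("CollectionTree"));

    TObjArray* branches = tree->GetListOfBranches();
    for (Int_t i = 0; i < branches->GetEntriesFast(); ++i) {
        TBranch* b = static_cast<TBranch*>(branches->At(i));
        std::printf("%-60s baskets=%d basketSize=%d zipBytes=%lld\n",
                    b->GetName(),
                    b->GetWriteBasket(),          // index of the last written basket (~ number of baskets)
                    b->GetBasketSize(),
                    (long long)b->GetZipBytes());
    }
}
```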

Athena reading (CPU only)

Format       ROOT read speed [MB/s]   Athena full* [MB/s]   Athena speed [MB/s]   Athena time [ms]
ESD          13.05                    3.41                  5.93                  192.45
ESD 17.0.1   19.71                    3.85                  7.27                  161.00
AOD          4.57                     2.85                  3.15                  43.00
AOD 17.0.1   10.32                    4.91                  6.42                  25.86

Two effects are folded together: improvements in the T/P converters and the new ROOT version.
* Includes the time to recreate certain objects (i.e. CaloTowers).

Present AOD
[Figure; x-axis: collections ordered by size, so the largest are on the left]

Near future: AOD

D3PD wall clock

Sample   Events [%]   Real time [s]   CPU time [s]   HDD reads   Transferred [MB]   HDD time [s]
Egamma   100          93.32           87.67          5060        254                11
         10           22.46           17.06          5428
         1            14.31           8.74           4987        237
JetMET                193.61          163.34         31354       1378               63
                      100.31          62.37          37579       1370               72
                      65.15           34.52          26309       922                57
Susy                  19.86           17.08          3149        157                5
                      11.32           7.74           3532                           6
                      10.05           6.25           3687        156
Photon                17.52           14.55          3613        216
                      11.37           7.81           3705                           7
                      10.34           6.93           3770

D3PD sparse reading
- 1% of events: in this case TTreeCache (TTC) doesn't learn, as the first 100 events are not read.
- 10% of events.
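A minimal sketch of the workaround this suggests for sparse reading: shorten the cache learning phase and register the needed branches explicitly, instead of relying on the default training over the first ~100 entries. The file, tree and branch names are placeholders.

```cpp
// Sketch: avoiding the TTreeCache learning problem for sparse reads.
// File, tree and branch names are placeholders.
#include "TFile.h"
#include "TTree.h"
#include "TTreeCache.h"

void sparse_read_example() {
    TFile* f = TFile::Open("D3PD.example.root");
    TTree* tree = static_cast<TTree*>(f->Get("physics"));

    tree->SetCacheSize(30 * 1024 * 1024);

    // By default the cache "learns" which branches to prefetch from the first
    // ~100 entries; if only 1% of events are read, those entries may be skipped
    // and the cache never trains.  Register the needed branches explicitly.
    TTreeCache::SetLearnEntries(1);
    tree->AddBranchToCache("el_pt", kTRUE);         // placeholder branch names
    tree->AddBranchToCache("EventNumber", kTRUE);

    for (Long64_t i = 0; i < tree->GetEntries(); i += 100)   // read ~1% of events
        tree->GetEntry(i);
}
```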

Near future
- Basket optimization needs to be fixed (or we go back to basket reordering).
- Need to retest all the read/write scenarios for 17.0.1.
- Re-establishing automated I/O tests: local disk / LAN; WAN not really tried at all, unlike CMS.
- HammerCloud: a regular test job sent to all sites.
- Measuring efficiencies for real jobs.

Further future
- What can be done to optimise collections? Further improve the T/P converters? Reduce the complexity of objects, e.g. an "AllCells" collection that is a vector of ints.
- What can be done to improve ROOT speeds?
- Many potential improvements in user analysis / D3PD reading code.