ATLAS ROOT-based data formats
Wahid Bhimji (University of Edinburgh), J. Cranshaw, P. van Gemmeren, D. Malon, R. D. Schaffer, and I. Vukotic
On behalf of the ATLAS collaboration, CHEP 2012

Outline:
- ATLAS analysis workflows and ROOT usage
- Optimisations of ATLAS data formats "AOD/ESD" and "D3PD"
- An ATLAS ROOT I/O testing framework

[Diagram: ATLAS data flow (some simplification): RAW -(Reco)-> ESD/AOD -(Reduce)-> dESD/dAOD and D3PD -(Analysis)-> user ntuples; TAGs are also produced and used for event selection.]
- RAW: bytestream, not ROOT.
- AOD/ESD: ROOT with POOL persistency, produced and read inside the Athena software framework.
- D3PD: ROOT ntuples with only primitive types and vectors of those, used by non-Athena user code (standard tools and examples).

User analysis (here called "analysis" as opposed to "production") is:
- Not centrally organised.
- Heavy on I/O: all ATLAS analysis is ROOT I/O.
- By number of jobs, analysis = 55%; by wallclock time, analysis = 22%.

POOL:  AOD/ESD use ROOT I/O via the POOL persistency framework.  Could use other technologies for object streaming into files but in practice ROOT is the only supported. ROOT versions in Athena s/w releases:  2011 data (Tier 0 processing) : ROOT 5.26  2011 data (Reprocessing): ROOT 5.28  2012 data : ROOT ATLAS ROOT-based data formats - CHEP 2012

Writing files:  Split level: object data members placed in separate branches  Initial 2011 running: AOD /ESD fully-split (99) into primitive data  From 2011 use ROOT “autoflush” and “optimize baskets”  Baskets (buffers) resized so they have similar number of entries  Flushed to disk automatically once a certain amount of data is buffered to create a cluster that can be read back in a single read  Initial 2011 running we used the default 30 MB  Also use member-wise streaming  Data members of object stored together on disk Reading files:  There is a memory buffer TTreeCache (TTC) which learns used branches and then pre-fetches them  Used by default for AOD->D3PD in Athena  For user code its up to them See e.g. P. Canal’s talk at CHEP10 6 ATLAS ROOT-based data formats - CHEP 2012


- ATLAS Athena processes data event by event, with no partial object retrieval.
- The previous layout (fully split, 30 MB AutoFlush) has many branches and many events per basket.
- This is non-optimal, particularly for event picking, e.g. selecting with TAGs:
  - Selection uses event metadata, e.g. on trigger, event or object quantities.
  - No payload data is retrieved for unselected events.
  - Can make the overall analysis much faster (despite a slower data rate).
- Also relevant for the multi-process AthenaMP framework: multiple workers each read a non-sequential part of the input (see the sketch below).
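A minimal sketch of the event-picking pattern (the entry numbers are hypothetical stand-ins for the result of a TAG query; this is not the actual TAG machinery):

```cpp
// Hypothetical selection: in reality the entry numbers come from a TAG/metadata query.
#include "TFile.h"
#include "TTree.h"
#include <vector>

void read_selected_events()
{
   TFile *f = TFile::Open("AOD.root");           // placeholder file name
   TTree *t = nullptr;
   f->GetObject("CollectionTree", t);            // placeholder tree name

   std::vector<Long64_t> selected = {12, 57, 203, 4711};   // hypothetical TAG selection result

   for (Long64_t entry : selected)
      t->GetEntry(entry);   // only the baskets containing these entries are read;
                            // with few events per basket, little unwanted payload comes along
   f->Close();
}
```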

New AOD/ESD layout:
- Switched splitting off but kept member-wise streaming:
  - Each collection is stored in a single basket, except for the largest container (to avoid a file-size increase).
  - Number of baskets reduced from ~10,000 to ~800, increasing their average size by more than a factor of 10 and lowering the number of reads.
- Write fewer events per basket in the optimisation: ESD flushes every 5 events, AOD every 10 events.
- Less data needs to be read when selecting events.
A sketch of the corresponding ROOT settings follows.
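A hedged ROOT/C++ sketch of the two write configurations discussed above (the branch contents are invented for illustration; the real AOD/ESD writing goes through POOL/Athena, not hand-written code like this):

```cpp
// Illustration only: the vector branches are stand-ins for the real POOL/Athena-written payload.
#include "TFile.h"
#include "TTree.h"
#include <vector>

void write_layouts()
{
   std::vector<float> el_pt, el_eta;             // toy "collection" payload

   // Old layout: fully split (split level 99), 30 MB autoflush.
   // (With real object collections, split level 99 puts each data member in its own branch;
   //  the simple vectors here only show where the parameters go.)
   TFile oldFile("aod_old_layout.root", "RECREATE");
   TTree oldTree("CollectionTree", "old layout");
   oldTree.Branch("el_pt",  &el_pt,  32000, /*splitlevel=*/99);
   oldTree.Branch("el_eta", &el_eta, 32000, /*splitlevel=*/99);
   oldTree.SetAutoFlush(-30000000);              // negative value: flush every ~30 MB of buffers
   // ... Fill() loop ...
   oldTree.Write();
   oldFile.Close();

   // New layout: splitting off (split level 0), flush every 10 events (AOD; 5 for ESD).
   TFile newFile("aod_new_layout.root", "RECREATE");
   TTree newTree("CollectionTree", "new layout");
   newTree.Branch("el", &el_pt, 32000, /*splitlevel=*/0);   // one basket per collection
   newTree.SetAutoFlush(10);                     // positive value: flush every 10 entries
   // ... Fill() loop ...
   newTree.Write();
   newFile.Close();
}
```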

- Reading all events is ~30% faster.
- Selective reading (1% of events) using TAGs is 4-5 times faster.

AOD layout                              | Reading all events | Selective 1% read
OLD: fully split, 30 MB auto-flush      | 55 (±3) ms/event   | 270 ms/event
CURRENT: no split, 10-event auto-flush  | 35 (±2) ms/event   | 60 ms/event

(Repeated local disk reads; controlled environment; cache cleaned.)

- File size is very similar in the old and current formats.
- Virtual memory footprint reduced by about MB for both writing and reading: fewer baskets are loaded into memory.
- Write speed increased by about 20%, and by almost 50% once the compression level was relaxed.
- The new scheme was used for the autumn 2011 reprocessing: Athena AOD read speed (including transient/persistent conversion and reconstruction of some objects) rose to ~5 MB/s from ~3 MB/s in the original processing (reflecting the move from ROOT 5.26 to 5.28 as well as the layout change).

- ROOT I/O changes affect different storage systems on the Grid differently, e.g. TTC with Rfio/DPM needed some fixes.
- We have also seen cases where AutoFlush and TTC do not reduce HDD reads/time as expected.
- Regular tests on all systems used (in addition to controlled local tests) are needed to avoid I/O "traps".
- There is now also a ROOT I/O group, well attended by the ROOT developers, ATLAS, CMS and LHCb:
  - It is coming up with a rewritten basket optimisation.
  - We promised to test any developments rapidly.

Testing framework, using HammerCloud:
- Takes our tests from SVN, where the code, release and dataset are defined.
- Regularly submits single tests to all large Tier-2 sites; an identical dataset is pushed to all sites, and the ROOT source is fetched via curl.
- Highly instrumented: ROOT metrics (e.g. reads, bytes), worker-node metrics (traffic, load, CPU type), storage type, etc.
- Statistics are uploaded to an Oracle database and explored with data-mining tools (command line, web interface, ROOT scripts).
- Extremely fast feedback: a.m.: new feature to test; p.m.: answers for all storage systems in the world.

Tests currently running:
1. ROOT-based reading of D3PDs, like a user D3PD analysis; provides metrics from ROOT (number of reads, read speed):
   - Reading all branches and 100% or 10% of events (chosen at random).
   - Reading a limited set of branches (~2%, those used in a real Higgs analysis).
2. Different ROOT versions: latest Athena releases; ROOT 5.32 (not yet in Athena); the ROOT trunk.
3. Athena D3PD making from AOD.
4. Instrumented user-code examples.
5. Wide-area-network tests.
A sketch of the kind of D3PD read test used in (1) follows.
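A hedged sketch of what such a D3PD read test might look like (the branch names, file name and sampling fraction are illustrative; the real tests are maintained in SVN and driven by HammerCloud):

```cpp
// Illustrative only: the real ATLAS tests live in SVN; names and fractions here are invented.
#include "TFile.h"
#include "TTree.h"
#include "TRandom3.h"
#include "TStopwatch.h"
#include <cstdio>

void d3pd_read_test(double eventFraction = 0.10, bool limitedBranches = true)
{
   TFile *f = TFile::Open("NTUP_SMWZ.root");     // placeholder D3PD file
   TTree *t = nullptr;
   f->GetObject("physics", t);                   // placeholder D3PD tree name

   if (limitedBranches) {
      t->SetBranchStatus("*", 0);                // disable everything...
      t->SetBranchStatus("el_*", 1);             // ...then enable the few branches the
      t->SetBranchStatus("mu_*", 1);             //    analysis actually uses (hypothetical list)
   }
   t->SetCacheSize(30 * 1024 * 1024);            // 30 MB TTreeCache

   TRandom3 rnd(1234);
   TStopwatch sw;
   sw.Start();
   for (Long64_t i = 0; i < t->GetEntries(); ++i) {
      if (rnd.Uniform() > eventFraction) continue;   // read only a random subset of events
      t->GetEntry(i);
   }
   sw.Stop();

   // These are the kinds of metrics uploaded by the testing framework.
   std::printf("wall %.1f s, cpu %.1f s, reads %d, MB read %.1f\n",
               sw.RealTime(), sw.CpuTime(), f->GetReadCalls(),
               f->GetBytesRead() / 1024.0 / 1024.0);
   f->Close();
}
```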


[Plot: wall time (s) for a 100% D3PD read with TTC on (30 MB), comparing ROOT 5.28 (Athena), ROOT 5.30 (Athena) and ROOT 5.32 across sites.]
- Tracking the change to ROOT 5.30 in Athena: no significant changes in wall times for D3PD reads on any storage system.
- ROOT 5.32 (red) again shows no big change on average over all sites.

[Chart: analysis job counts by group and input format; the Top and Standard Model groups have the most jobs; AOD input: 46%; total D3PD usage growing.]
- "pathena" jobs run in the Athena framework, e.g. private simulation or AOD analysis.
- "prun" jobs run whatever the user wants, mostly D3PD analysis.
- Optimisation of D3PD analysis is becoming increasingly important.

D3PD optimisation studies, rewriting D3PD files with ROOT 5.30 (the current Athena release):
- Try different zip levels (current default: 1):
  - Local testing suggested level 6 is more optimal in read speed, so files re-zipped at 6 were copied to all sites.
  - Zip-6 files are ~5% smaller, so there are also important gains in disk space and copy times.
- Change the autoflush setting (currently 30 MB): try extreme values of 3 and 300 MB.
- Results shown here are for a few example sites, labelled by the storage system they run; since they use different hardware, this is not a measure of storage-system or site performance.
A sketch of how such a rewrite can be done in ROOT follows.
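A minimal ROOT/C++ sketch of rewriting a D3PD with a different compression level and autoflush setting (file and tree names are placeholders; the actual study used its own tooling):

```cpp
// Illustration only: names are placeholders, not the tools used for the ATLAS study.
#include "TFile.h"
#include "TTree.h"

void rewrite_d3pd(int zipLevel = 6, Long64_t autoFlush = -30000000 /* ~30 MB */)
{
   TFile *in = TFile::Open("d3pd_zip1.root");    // placeholder input
   TTree *t  = nullptr;
   in->GetObject("physics", t);

   TFile *out = TFile::Open("d3pd_rewritten.root", "RECREATE");
   out->SetCompressionLevel(zipLevel);           // e.g. 1 (old default) vs 6

   TTree *copy = t->CloneTree(0);                // clone the structure, no entries yet
   copy->SetAutoFlush(autoFlush);                // negative: bytes; positive: entries
   copy->CopyEntries(t);                         // re-reads and re-writes, so the new
                                                 // compression and flushing actually apply
   copy->Write();
   out->Close();
   in->Close();
}
```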

[Plots: read performance at example sites for the rewritten D3PDs.]
- Zip level 6 is at least as good as level 1, so the default was changed.
- A 3 MB autoflush is not good!

[Plot: read performance with different TTreeCache sizes (in MB) and with no TTC.]
- TTreeCache is essential at some sites, but users still don't set it.
- The optimal value differs per site.
- The ability to set it in the job environment would be useful (see the sketch below).
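One possible pattern, shown purely as an assumption of how this could look (the MY_TTC_SIZE_MB environment variable is invented, not an existing ROOT or ATLAS setting): the user code reads the cache size from the environment so that a site or pilot could tune it per job.

```cpp
// Hypothetical pattern: MY_TTC_SIZE_MB is an invented variable name, not a ROOT feature.
#include "TFile.h"
#include "TTree.h"
#include "TSystem.h"
#include <cstdlib>

void open_with_site_tuned_cache()
{
   TFile *f = TFile::Open("D3PD.root");          // placeholder file name
   TTree *t = nullptr;
   f->GetObject("physics", t);

   Long64_t cacheMB = 30;                        // fall-back default
   if (const char *env = gSystem->Getenv("MY_TTC_SIZE_MB"))
      cacheMB = std::atoll(env);                 // let the site/job environment override it

   if (cacheMB > 0)
      t->SetCacheSize(cacheMB * 1024 * 1024);    // 0 would mean "no TTreeCache"
   // ... event loop as usual ...
}
```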

With a 300 MB TTreeCache:
- Systems using a vector-read protocol (dCap, xrootd) still have high CPU efficiency.
- There is a drop in CPU efficiency on the other storage systems.

[Plot: CPU efficiency (0-100%) versus ping time for jobs running at various US sites: local reads, reads from other US sites, and reads from CERN.]

Wide-area-network reading:
- First measurements show reasonable rates: 94% CPU efficiency for local reads drops to 60%-80% when reading from other US sites and to around 45% when reading from CERN.
- This offers promise for this kind of running if needed; we plan to use such measurements for scheduling decisions.
A sketch of remote reading over xrootd follows.
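A minimal sketch of the remote-read pattern over xrootd (the URL is a placeholder; the CPU-over-wall-time ratio is only a rough proxy for what the instrumented tests measure):

```cpp
// Placeholder URL: substitute a real xrootd endpoint; this is not the ATLAS test code itself.
#include "TFile.h"
#include "TTree.h"
#include "TStopwatch.h"
#include <cstdio>

void wan_read_test()
{
   // Reading directly from a remote site over the xrootd protocol.
   TFile *f = TFile::Open("root://some.remote.site//atlas/D3PD.root");
   TTree *t = nullptr;
   f->GetObject("physics", t);
   t->SetCacheSize(300 * 1024 * 1024);           // a large TTreeCache helps hide WAN latency

   TStopwatch sw;
   sw.Start();
   for (Long64_t i = 0; i < t->GetEntries(); ++i)
      t->GetEntry(i);
   sw.Stop();

   // Rough CPU-efficiency proxy: CPU time over wall time.
   std::printf("CPU efficiency ~ %.0f%%\n", 100.0 * sw.CpuTime() / sw.RealTime());
   f->Close();
}
```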

Conclusions:
- Made performance improvements in ATLAS ROOT I/O.
- Built an I/O testing framework for monitoring and tuning.
Plan to do:
- Lots more mining of our performance data.
- Test and develop core ROOT I/O with the working group: basket optimisation, asynchronous prefetching.
- Provide sensible defaults for user analysis.
- Further develop WAN reading.
- Site tuning.
- New I/O strategies for multicore (see the next talk!).