Optimization and usage of D3PD
Ilija Vukotic, CAF - PAF, 19 April 2011, Lyon
Overview
- Formats
- Optimizations
- Local tests
- Large scale tests
Formats
[Figure: relative event sizes of ESDs, AODs and D3PDs for data and MC (egamma and JETMET streams shown); ball surface ~ event size]
- Sizes are indicative only; in reality they depend on pile-up, stream and repro tag.
- D3PDs are not flat trees any more: with the additional trees, some D3PDs have ~10k branches.
Example (figure)
ROOT file organization
[Diagram: events, split into branches of doubles and floats, packed into baskets and written to the file]
- We choose to fully split (better compression factor).
- Baskets are written to the file as soon as they get full.
- As a result, parts of the same event end up scattered over the file.
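A minimal write-side sketch of this layout, assuming a hypothetical tree named "physics" with a single vector<float> branch "el_pt" (both names are illustrative, not the actual D3PD schema): full split (split level 99) and a small 2 kB basket buffer per branch, so each branch flushes its own baskets to disk independently as they fill.

#include "TFile.h"
#include "TTree.h"
#include <vector>

void write_split_example() {
   TFile f("d3pd_example.root", "RECREATE");
   TTree tree("physics", "D3PD-like tree");

   std::vector<float> *el_pt = new std::vector<float>;
   // split level = 99 (full split), 2 kB buffer per branch basket
   tree.Branch("el_pt", &el_pt, 2048, 99);

   for (Long64_t i = 0; i < 100000; ++i) {
      el_pt->clear();
      el_pt->push_back(25.0f + (i % 10));
      tree.Fill();   // once a 2 kB basket is full it is written out immediately,
                     // so baskets of different branches interleave in the file
   }
   tree.Write();
   f.Close();
}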
Tradeoffs
Optimization options:
- Split level: 99 (full)
- Zip level: 6
- Basket size: 2 kB
- Memberwise streaming
- Basket reordering: by event or by branch
- TTreeCache (a read-side sketch follows below)
- "New ROOT": AutoFlush matching the TTreeCache size
Constraints:
- Memory
- Disk size
- Read/write time
Read scenarios:
- Full sequential
- Some events
- Parts of events
- PROOF
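As a read-side illustration of the TTreeCache option, a minimal sketch (file and tree names are hypothetical): a 30 MB cache is attached to the tree and all branches are registered with it, so many small basket reads are replaced by a few large prefetches.

#include "TFile.h"
#include "TTree.h"

void read_with_ttc(const char *fname = "d3pd_example.root") {
   TFile *f = TFile::Open(fname);
   TTree *tree = (TTree*)f->Get("physics");

   tree->SetCacheSize(30 * 1024 * 1024);   // 30 MB TTreeCache
   tree->AddBranchToCache("*", kTRUE);     // prefetch all branches (and sub-branches)

   for (Long64_t i = 0, n = tree->GetEntries(); i < n; ++i)
      tree->GetEntry(i);                   // entries are served mostly from the cache

   f->Close();
}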
Current settings: AODs and ESDs
- 2010: fixed-size baskets (2 kB); files re-ordered at the end of the production jobs, but basket sizes not optimized.
- 2011 until now: all the trees (9 of them) given the default 30 MB of memory, basket sizes "optimized", autoflushed once the unzipped size grew above 30 MB.
- 17.X.Y: the largest tree ("CollectionTree") optimized with split level 0 and memberwise streaming; ESD/RDO autoflush every 5 events, AOD every 10 events; the other trees go back to the 2010 model.
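A sketch of how such settings map onto the ROOT calls, using a hypothetical branch; the numbers follow the text above, not the actual production job options. TTree::SetAutoFlush takes a positive value to flush every N entries (the 17.X.Y ESD/RDO and AOD style) or a negative value to flush after roughly that many bytes have been buffered (the 30 MB style used until now).

#include "TFile.h"
#include "TTree.h"
#include <vector>

void write_autoflush_example() {
   TFile f("aod_like.root", "RECREATE");
   TTree tree("CollectionTree", "AOD-like tree");

   std::vector<float> *px = new std::vector<float>;
   // split level 0: the vector is streamed as a single object per entry
   // (memberwise streaming is a separate streamer setting, not shown here)
   tree.Branch("px", &px, 32000, 0);

   tree.SetAutoFlush(10);           // flush baskets and start a new cluster every 10 events (AOD)
   // tree.SetAutoFlush(5);         // ESD/RDO style: every 5 events
   // tree.SetAutoFlush(-30000000); // 2011 style: flush after ~30 MB of buffered data

   for (Long64_t i = 0; i < 1000; ++i) { px->assign(5, float(i)); tree.Fill(); }
   tree.Write();
   f.Close();
}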
Current settings: D3PDs
- 2010: fixed-size baskets (2 kB), reordered by event, basket sizes optimized properly, zip level changed to 6; all done in the merge step (a sketch of an equivalent re-optimization pass follows below).
- 2011 until now: ROOT basket-size optimization, autoflush at 30 MB; no information whether re-optimization is done or not (need to check!).
- 17.X.Y: not clear yet.
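A hedged sketch of what such a merge-step rewrite amounts to in plain ROOT (in production it was done with hadd, e.g. hadd -f6, mentioned on the local-disk slide): the tree is cloned empty into a new file with zip level 6 and the entries are copied over, which re-reads, re-packs and re-compresses every basket. File and tree names are illustrative.

#include "TFile.h"
#include "TTree.h"

void reoptimize(const char *in = "d3pd_in.root", const char *out = "d3pd_opt.root") {
   TFile *fin  = TFile::Open(in);
   TTree *told = (TTree*)fin->Get("physics");

   TFile *fout = new TFile(out, "RECREATE");
   fout->SetCompressionLevel(6);       // zip level 6
   fout->cd();

   TTree *tnew = told->CloneTree(0);   // same branch structure, no entries yet
   tnew->CopyEntries(told);            // rewrites every basket with the new settings
   tnew->Write();
   fout->Close();
   fin->Close();
}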
Local disk performance, D3PD
[Plot: local read times; annotated points mark the 2010 layout, the current files ("we are here now") and the ROOT-optimized file]
- When reading all events, the real time is dominated by the CPU time.
- Not so for sparse reading.
- The ROOT-optimized file (rewritten with hadd -f6) improves the CPU time but not the HDD time (!).
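A minimal way to reproduce such a measurement, separating wall-clock from CPU time and recording the I/O pattern; TStopwatch and TTreePerfStats are standard ROOT classes, while the file and tree names are again hypothetical.

#include <cstdio>
#include "TFile.h"
#include "TTree.h"
#include "TStopwatch.h"
#include "TTreePerfStats.h"

void time_full_read(const char *fname = "d3pd_opt.root") {
   TFile *f = TFile::Open(fname);
   TTree *tree = (TTree*)f->Get("physics");
   tree->SetCacheSize(30 * 1024 * 1024);

   TTreePerfStats ps("ioperf", tree);   // records read calls, bytes read, disk time
   TStopwatch sw;
   sw.Start();
   for (Long64_t i = 0, n = tree->GetEntries(); i < n; ++i)
      tree->GetEntry(i);
   sw.Stop();

   std::printf("real %.1f s   cpu %.1f s\n", sw.RealTime(), sw.CpuTime());
   ps.Print();                          // summary of the I/O pattern
   f->Close();
}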
Large scale tests: D3PD reading
- Egamma dataset: 11 files, 90 GB
- Test variations: 100% vs 1% of the events read, TTreeCache ON, ROOT-optimized files
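A sketch of the sparse read case (1% of the events, only parts of each event), assuming hypothetical electron branches matching "el_*"; selecting a few branches and stepping through every 100th entry is what makes these reads seek-bound rather than CPU-bound.

#include "TFile.h"
#include "TTree.h"

void sparse_read(const char *fname = "d3pd_opt.root") {
   TFile *f = TFile::Open(fname);
   TTree *tree = (TTree*)f->Get("physics");

   tree->SetBranchStatus("*", 0);      // "parts of events": disable everything ...
   tree->SetBranchStatus("el_*", 1);   // ... and re-enable only the branches needed

   tree->SetCacheSize(30 * 1024 * 1024);
   tree->AddBranchToCache("el_*", kTRUE);

   Long64_t n = tree->GetEntries();
   for (Long64_t i = 0; i < n; i += 100)   // "some events": roughly 1% of the entries
      tree->GetEntry(i);

   f->Close();
}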
EOS - xroot disk pool
- Experimental* setup for a large-scale analysis farm.
- Xroot server with 24 nodes, each with a 20 x 2 TB raid0 filesystem (for this test only 10 nodes were used, with a maximum theoretical throughput of 1 GB/s).
- To stress it, 23 x 8 cores were used, with ROOT 5.26.0b (slc4, gcc 3.4).
- Only PROOF reading of D3PDs was tested.
* Caveat: real-life performance will be significantly worse.
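For orientation, a minimal PROOF-over-xrootd sketch of the kind of job this setup serves; the master name, dataset paths and selector are hypothetical, only the call pattern (TProof::Open, TChain::SetProof, TChain::Process) is meant to be illustrative.

#include "TChain.h"
#include "TProof.h"

void proof_read_example() {
   TProof::Open("proofmaster.example.cern.ch");   // hypothetical PROOF master

   TChain chain("physics");                       // hypothetical D3PD tree name
   chain.Add("root://eospool.example.cern.ch//eos/atlas/d3pd/egamma_0.root");
   chain.Add("root://eospool.example.cern.ch//eos/atlas/d3pd/egamma_1.root");

   chain.SetProof();                 // route Process() through the PROOF cluster
   chain.Process("MySelector.C+");   // hypothetical TSelector doing the event loop
}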
EOS - xroot disk pool (cont.)
[Plot: maximal sustained event rates, log scale]
- Only maximal sustained event rates are shown here; real use-case averages will be significantly smaller.
- With the original files it would be faster to read all the events even if only 1% of them were needed.
- Reading fully optimized data gave a sustained read speed of 550 MB/s.
dCache vs. Lustre
- Tested in Zeuthen and Hamburg on minimum-bias D3PD data; the numbers are HDD read requests.
- Test 1: single unoptimized file (ROOT 5.22, 1k branches of 2 kB, CF=1). Test 2: single optimized file (ROOT 5.26, hadd -f2).

          TTC    Test 1    Test 2
 dCache   No     173394     40547
 dCache   Yes                  44
 Lustre   No     173394     40504
 Lustre   Yes       193        97
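Read-request counts of this kind can be obtained directly from ROOT; a minimal sketch (hypothetical file and tree names) that reads a file with or without a TTreeCache and prints how many read calls the storage system actually saw.

#include <cstdio>
#include "TFile.h"
#include "TTree.h"

void count_read_requests(const char *fname, bool useTTC) {
   TFile *f = TFile::Open(fname);
   TTree *tree = (TTree*)f->Get("physics");
   if (useTTC)
      tree->SetCacheSize(30 * 1024 * 1024);   // 30 MB TTreeCache

   for (Long64_t i = 0, n = tree->GetEntries(); i < n; ++i)
      tree->GetEntry(i);

   // each read call corresponds to one request hitting dCache / Lustre / local disk
   std::printf("%s  TTC=%d : %d read calls, %lld bytes read\n",
               fname, int(useTTC), f->GetReadCalls(), f->GetBytesRead());
   f->Close();
}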
Conclusions
- There are many possible ways and parameters to optimize data for faster input.
- Different formats and use cases, with sometimes conflicting requirements, make optimization more difficult.
- In 2010 we used file reordering, which significantly decreased job duration and the stress on the disk systems.
- The currently taken data is optimized by ROOT, but that may be suboptimal for some D3PDs.
- New performance measurements and a search for optimal settings are needed for DPM, Lustre and dCache.
- Careful job-specific tuning is needed to reach optimal performance.