PD2P, Caching etc.
Kaushik De, Univ. of Texas at Arlington
ADC Retreat, Naples, Feb 4, 2011
Slide 2: Introduction
- Caching at T2 using PD2P and Victor works well
  - Six months of experience (>3 months with all clouds)
  - Almost zero complaints from users
  - Few operational headaches: some cases of full disks, disappearing datasets...
  - Most issues addressed with incremental improvements such as space checking, rebrokering, and storage cleanup and consolidation
  - What I propose today should solve the remaining issues
- Many positives
  - No exponential growth in storage use
  - Better use of Tier 2 sites for analysis
- Next step: PD2P for Tier 1
  - This is not a choice but a necessity (see Kors' slides)
  - We should treat part of Tier 1 storage as a dynamic cache
Slide 3: Life Without ESD
- New plan: see the document and Ueda's slides
  - Reduction in the 2011 storage requirement for data at 400 Hz from 27 PB to ~10 PB (but could be as much as 13 PB)
  - Reduction of 2010 data from 13 PB to ~6 PB
- But we should go further
  - We are still planning to fill almost all T1 disks with pre-placed data: 2010 + 2011 + MC = 6 + 10 + 8 = 24 PB = available space
  - Based on past experience, reality will be tougher and disk crises will hit us sooner; we should do things differently this time
  - We must trust the caching model
Slide 4: What Can We Do?
- Make some room for dynamic caches (for the discussion below, the T0 copy is not counted)
- Use the DQ2 tags custodial/primary/secondary rigorously (sketched as a lookup table below)
  - Custodial = LHC data = tape only (1 copy)
  - Primary = minimal, on disk at T1, so we have room for PD2P caching
    - LHC data primary == RAW (1 copy), AOD, DESD, NTUP (2 copies)
    - MC primary == Evgen, AOD, NTUP (2 copies only)
  - Secondary = copies made by ProdSys (ESD, HITS, RDO), PD2P (all types except RAW, RDO, HITS) and DaTri only
- Lifetimes: required strictly for all secondary copies (i.e. consider secondary == cached == temporary)
- Locations: custodial ≠ primary; primary ≠ secondary
- Deletions: any secondary copy can be deleted by Victor
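The tag rules on this slide lend themselves to a simple lookup-table reading. Below is a minimal sketch in Python; the table name, field layout and helper function are illustrative assumptions, not actual DQ2 or ProdSys code.

```python
# Illustrative restatement of the slide 4 replica policy as a lookup table.
# Names and structure are hypothetical; this is not the actual DQ2/ProdSys code.
REPLICA_POLICY = {
    # (source, data type): list of (tag, medium, copies)
    ("LHC", "RAW"):   [("custodial", "tape", 1), ("primary", "disk", 1)],
    ("LHC", "AOD"):   [("primary", "disk", 2)],
    ("LHC", "DESD"):  [("primary", "disk", 2)],
    ("LHC", "NTUP"):  [("primary", "disk", 2)],
    ("MC",  "Evgen"): [("primary", "disk", 2)],
    ("MC",  "AOD"):   [("primary", "disk", 2)],
    ("MC",  "NTUP"):  [("primary", "disk", 2)],
}

def placed_copies(source, dtype):
    """Pre-placed copies for a data type; anything else is secondary only:
    cached, temporary, carrying a lifetime, and deletable by Victor."""
    return REPLICA_POLICY.get((source, dtype), [("secondary", "disk", 0)])

print(placed_copies("LHC", "RAW"))   # custodial tape copy + primary disk copy
print(placed_copies("MC", "ESD"))    # secondary only
```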
Slide 5: Reality Check
- Primary copies (according to slide 4)
  - 2010 data ~ 4 PB
  - 2011 data ~ 4.5 PB
  - MC ~ 5 PB
  - Total primary ~ 14 PB (4 + 4.5 + 5 = 13.5 PB)
- Available space for secondaries: > ~10 PB at the Tier 1s
  - Can accommodate additional copies, but only if 'hot'
  - Can accommodate some ESDs (expired gracefully after n months)
  - Can accommodate large buffers during reprocessing (new release)
  - Can accommodate better-than-expected LHC running
  - Can accommodate new physics-driven requests
Slide 6: Who Makes Replicas?
- RAW: managed by Santa Claus (no change)
  - 1 copy to TAPE (custodial), 1 copy to DISK (primary) at a different T1
- First-pass processed data: by Santa Claus (no change)
  - Tagged primary/secondary according to slide 4
  - Secondary copies will have a lifetime (n months)
- Reprocessed data: by PanDA
  - Tagged primary/secondary according to slide 4, with a lifetime set
  - Additional copies made to a different T1 disk, according to MoU share, automatically based on slide 4 (not by AKTR anymore)
- Additional copies at Tier 1s: only by PD2P and DaTri
  - Must always set a lifetime
- Note: only PD2P makes copies to Tier 2s
Slide 7: Additional Copies by PD2P
- Additional copies at Tier 1s: always tagged secondary
  - If the dataset is 'hot' (defined on the next slide)
  - Use MoU share to decide which Tier 1 gets the extra copy
- Copies at Tier 2s: always tagged secondary
  - No changes for the first copy: keep the current algorithm (brokerage), use an age requirement if we run into a space shortage (see Graeme's talk)
  - If the dataset is 'hot' (see next slide), make an extra copy
- Reminder: additional replicas are secondary, i.e. temporary by definition, and may/will be removed by Victor
Slide 8: What is 'Hot'?
- 'Hot' decides when to make a secondary replica
- The algorithm is based on additive weights: if w1 + w2 + w3 + ... + wN > N (a tunable threshold), make an extra copy (a rough scoring sketch follows below)
  - w1: based on the number of waiting jobs, nwait / (2 * nrunning), averaged over all sites
    - Currently disabled due to DB issues; needs to be re-enabled
    - Do not base it on the number of reuses; that did not work well
  - w2: inversely based on age; either Graeme's table, or continuous and normalized to 1 (newest data)
  - w3: inversely based on the number of copies
  - wN: other factors based on experience
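A minimal sketch of the additive-weight test described on this slide. The weight definitions follow the bullets above, but the normalisations, default values and threshold are illustrative assumptions, not the tuned PD2P parameters.

```python
# Sketch of the 'hot' test: sum the additive weights and compare to a
# tunable threshold.  Parameter values here are placeholders.

def hotness_weights(nwait, nrunning, age_days, ncopies, max_age_days=365.0):
    """Return the list of additive weights for one dataset."""
    # w1: waiting-job pressure, nwait / (2 * nrunning), averaged over sites
    w1 = nwait / (2.0 * nrunning) if nrunning > 0 else 0.0
    # w2: inversely based on age, continuous and normalised to 1 for new data
    w2 = max(0.0, 1.0 - age_days / max_age_days)
    # w3: inversely based on the number of existing copies
    w3 = 1.0 / ncopies if ncopies > 0 else 1.0
    return [w1, w2, w3]          # wN: further terms can be appended later

def is_hot(weights, threshold):
    """Make an extra secondary copy when the summed weights exceed the threshold."""
    return sum(weights) > threshold

# Example: a young dataset with a long queue and a single copy
print(is_hot(hotness_weights(nwait=40, nrunning=10, age_days=20, ncopies=1),
             threshold=2.5))
```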
Slide 9: Where to Send 'Hot' Data?
- Tier 1 site selection (see the sketch after this list)
  - Based on MoU share
  - Exclude a site if the dataset size is > 5% (as proposed by Graeme)
  - Exclude a site if it has too many active subscriptions
  - Other tuning based on experience
- Tier 2 site selection
  - Based on brokerage, as currently done
  - Negative weight based on the number of active subscriptions
  - Other tuning based on experience
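A rough sketch of the Tier 1 selection described above: pick a site in proportion to its MoU share after applying the exclusion cuts. The data structures, the reading of the 5% cut as a fraction of free space, and the subscription cap are assumptions for illustration only.

```python
import random

def choose_tier1(sites, dataset_size, max_subscriptions=50, size_fraction=0.05):
    """Pick a Tier 1 for an extra 'hot' copy; returns None if all are excluded."""
    candidates = [
        s for s in sites
        if dataset_size <= size_fraction * s["free_space"]   # dataset-size cut (assumed reading of the 5%)
        and s["active_subscriptions"] < max_subscriptions     # too many active subscriptions
    ]
    if not candidates:
        return None
    shares = [s["mou_share"] for s in candidates]             # weight by MoU share
    return random.choices(candidates, weights=shares, k=1)[0]

sites = [
    {"name": "T1_A", "mou_share": 0.25, "free_space": 500, "active_subscriptions": 10},
    {"name": "T1_B", "mou_share": 0.10, "free_space": 200, "active_subscriptions": 60},
]
print(choose_tier1(sites, dataset_size=5.0))   # sizes in TB, purely illustrative
```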
Slide 10: What About Broken Subscriptions?
- Becoming an issue (see Graeme's talk)
- PD2P already sends the datasets within a container to different sites to reduce the wait time for users
- But what about datasets which take more than a few hours to transfer?
- Simplest solution: ProdSys imposes a maximum limit on dataset size
- Possible alternative: a cron/PanDA task breaks up datasets and rebuilds the container (a rough splitting sketch follows below)
- Difficult but also possible solution: use _dis datasets in PD2P
  - Search DQ2 for _dis datasets in brokerage (there will be a performance penalty if we use this route)
  - But this is perhaps the most robust solution?
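A sketch of the "break up datasets and rebuild the container" option: greedily pack files into sub-datasets whose total size stays under a maximum. All names and the size limit are illustrative; this is not the actual ProdSys or DQ2 machinery.

```python
def split_dataset(files, max_size):
    """files: list of (lfn, size) pairs; returns lists of files per sub-dataset."""
    chunks, current, current_size = [], [], 0
    for lfn, size in files:
        if current and current_size + size > max_size:
            chunks.append(current)          # close the current sub-dataset
            current, current_size = [], 0
        current.append((lfn, size))
        current_size += size
    if current:
        chunks.append(current)
    return chunks

# Example: ten 2 GB files split under a hypothetical 7 GB limit
files = [("file_%03d" % i, 2.0) for i in range(10)]
for n, chunk in enumerate(split_dataset(files, max_size=7.0)):
    print("sub-dataset", n, [lfn for lfn, _ in chunk])
```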
Slide 11: Data Deletions Will Be Very Important
- Since we are caching everywhere (T1 + T2), Victor plays an equally important role as PD2P
- Asynchronously clean up all caches (a rough sketch follows below)
  - Trigger based on a disk-fullness threshold
  - Algorithm based on (age + popularity), applied to secondary copies
- Also automatic deletion of n-2, by AKTR/Victor
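A minimal sketch of the cleanup idea above: once a storage area passes a fullness threshold, delete secondary replicas ranked by age and popularity until usage drops back below a target. The scoring, thresholds and dataset fields are assumptions for illustration, not Victor's actual policy.

```python
def cleanup(datasets, used, capacity, trigger=0.90, target=0.80):
    """datasets: dicts with 'tag', 'size', 'age_days', 'accesses_last_month'."""
    if used / capacity < trigger:
        return []                                  # fullness trigger not reached
    # Only secondary (cached, temporary) copies are eligible for deletion.
    victims = [d for d in datasets if d["tag"] == "secondary"]
    # Least-accessed first, oldest breaking ties (hypothetical ranking).
    victims.sort(key=lambda d: (d["accesses_last_month"], -d["age_days"]))
    deleted = []
    for d in victims:
        if used / capacity <= target:
            break                                  # back below the target level
        used -= d["size"]
        deleted.append(d)
    return deleted
```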
Slide 12: How Soon Can We Implement This?
- Before LHC startup!
- A big initial load on ADC operations to clean up 2010 data and to migrate tokens
- Some testing/tuning of PD2P is needed before the LHC starts
- So we need a decision on this proposal quickly