
1 Roadmap for Data Management and Caching
Kaushik De, Univ. of Texas at Arlington
US ATLAS Facility, Boston, Mar 7, 2011

2 Introduction
- LHC is starting up again
  - First collisions already seen last week
  - Physics data may start flowing in a few weeks
  - Optimistic scenario – could be >2 fb-1 this year!
  - Could be the year of discovery!
- Are we ready with computing?
  - Yes – but we may run out of Tier 1 storage space by summer
  - Jim Shank showed a preliminary resource estimate to the CB on Friday
  - Severely reduced data distribution plan (compared to 2010)
  - Still need 25.8 PB (total for all tokens) assuming 200 Hz and no ESD
  - Free space right now is ~5 PB on DATADISK (need ~17 PB)
  - Tier 2 storage should be OK – after some further PD2P refinements
- What can we do?
  - Need a new data management and data distribution plan for Tier 1

3 DATADISK Status at Tier 1’s

4 DATADISK at US Tier 2’s

5 New DD Proposal from DP Group
- Jamie Boyd presented the DP plan last Tuesday:
  - Jim’s estimate of 25.8 PB needed for Tier 1 disk space is for this plan
  - 1 copy of RAW among all Tier 1 disks (1 more copy on tape)
  - ESDs will be kept on Tier 1 disk only for ~5 weeks
  - Some special-stream ESDs (~10%) will be kept longer
  - 10 copies of AOD and DESD (basically 1 copy per cloud)
- Will this work?
  - If we pre-place all data – we run out of space in a few months
  - In addition, this plan assumes deleting almost all current data (about 11 PB more) – but publications are in the pipeline!
  - The plan assumes 200 Hz – but the trigger group wants 400 Hz
  - Some physics groups are worried about ESD deletion after ~5 weeks
- ADC discussion since Tuesday
  - How to fit all of this within the current budget?
  - How to reconcile this plan with the Naples (ADC retreat) planning?

6 ADC Plan
- Still evolving
- Learn from 2010:
  - Need a flexible plan, since usage of formats changes with time
  - Plan should adjust automatically
  - Do not fill up all space too early
  - Clean up often
  - Be prepared for 400 Hz
- [Table: planned number of replicas of RAW / ESD / AOD / DESD at T0 Tape, T0->T1, T1->T1, T1 Tape and T1 Disk; only a few cell values (1, 1?, 0?, 2, 3) survive in this transcript]
- All additional copies at Tier 1 by PD2P
- All Tier 2 copies made by PD2P

7 Caching – PD2P
- Caching at T2 using PD2P and Victor worked well in 2010
  - Have 6 months of experience (>3 months with all clouds)
  - Almost zero complaints from users
  - Few operational headaches
    - Some cases of disk full, datasets disappearing…
    - Most issues addressed with incremental improvements like space checking, rebrokering, storage cleanup and consolidation
  - Many positives
    - No exponential growth in storage use
    - Better use of Tier 2 sites for analysis
- Next step – PD2P for Tier 1
  - This is not a choice – but a necessity
  - We should treat part of Tier 1 storage as a dynamic cache

8 Advantages of Caching at Tier 1
- If we do not fill all space with pre-placed data:
  - We are not in a disk crisis all the time
  - We can accommodate additional copies of ‘hot’ data
  - We can accommodate some ESDs (expired gracefully after n months, when no longer needed)
  - We can accommodate large buffers during reprocessing and merging (when a new release is available)
  - We can accommodate a higher trigger rate (200 -> 300 -> 400 Hz)
  - We can accommodate better-than-expected LHC running
  - We can accommodate new physics-driven requests

9 Caching Requires Some DDM Changes
- To make room for dynamic caches
  - Use DQ2 tags – custodial/primary/secondary – rigorously
- Custodial == LHC data == tape only
- Primary == minimal disk at T1, so we have room for PD2P caching
  - LHC data primary == RAW (1 copy); AOD, DESD, NTUP (2 copies); ESD (1 copy) with limited lifetime (no lifetime for the special ~10%)
  - MC primary == Evgen, AOD, NTUP (2 copies only)
- Secondary == copies made by ProdSys (i.e. HITS, RDO, unmerged), PD2P and DaTri only
  - Start with 1 additional secondary copy by AKTR
- Locations – custodial ≠ primary; primary ≠ secondary
- Deletions – any secondary copy can be deleted by Victor (see the sketch after this list)
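
To make the tagging rules concrete, here is a minimal sketch of the primary replica counts and the "only secondary is deletable" rule. All names (LHC_PRIMARY_COPIES, MC_PRIMARY_COPIES, is_deletable) are hypothetical; this is not the actual DQ2/DDM configuration schema.

```python
# Minimal sketch of the replica-tagging rules above; illustrative names only,
# not the actual DQ2/DDM configuration.

# Primary (protected) disk copies at Tier 1 for LHC data
LHC_PRIMARY_COPIES = {
    "RAW": 1,
    "AOD": 2,
    "DESD": 2,
    "NTUP": 2,
    "ESD": 1,   # limited lifetime, except for the special ~10% streams
}

# Primary disk copies for MC
MC_PRIMARY_COPIES = {
    "EVGEN": 2,
    "AOD": 2,
    "NTUP": 2,
}

def is_deletable(tag: str) -> bool:
    """Victor may delete any secondary replica; custodial and primary copies are protected."""
    return tag == "secondary"

if __name__ == "__main__":
    print(is_deletable("secondary"))   # True  -> eligible for cleanup
    print(is_deletable("custodial"))   # False -> never deleted automatically
```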

10 Additional Copies by PD2P
- Additional copies at Tier 1’s – always tagged secondary
  - If the dataset is ‘hot’ (defined on the next slide)
  - Use MoU share to decide which Tier 1 gets the extra copy
  - Simultaneously make a copy at a Tier 2
- Copies at Tier 2’s – always tagged secondary
  - No changes for the first copy – keep the current algorithm (brokerage); use an age requirement if we run into a space shortage
  - If the dataset is ‘hot’ (see next slide), make an extra copy
- Reminder – additional replicas are secondary == temporary by definition; they may/will be removed by Victor (see the sketch after this list)
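
A minimal sketch of the extra-copy flow described above, assuming hypothetical helper callables (is_hot, pick_tier1_by_mou, broker_tier2); it is not the actual PD2P implementation, and the site names in the example are stand-ins.

```python
# Sketch of the PD2P extra-copy flow; helper callables are assumptions, not real PD2P APIs.
def pd2p_extra_copies(dataset, is_hot, pick_tier1_by_mou, broker_tier2):
    """Return (site, tag) placements for additional copies; extras are always secondary."""
    placements = []
    if is_hot(dataset):
        t1 = pick_tier1_by_mou(dataset)        # Tier 1 chosen according to MoU share
        placements.append((t1, "secondary"))
        t2 = broker_tier2(dataset)             # simultaneously make a Tier 2 copy
        placements.append((t2, "secondary"))   # secondary == temporary; Victor may remove it
    return placements

# usage example with stand-in helpers and example site names
print(pd2p_extra_copies(
    "some.dataset",
    is_hot=lambda d: True,
    pick_tier1_by_mou=lambda d: "TRIUMF",
    broker_tier2=lambda d: "MWT2",
))
```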

11 What is ‘Hot’?
- ‘Hot’ decides when to make a secondary replica
- The algorithm is based on additive weights (see the sketch after this list)
  - w1 + w2 + w3 + wN… > N (tunable threshold) – make an extra copy
- w1 – based on the number of waiting jobs: nwait/(2*nrunning), averaged over all sites
  - Currently disabled due to DB issues – need to re-enable
  - Don’t base it on the number of reuses – that did not work well
- w2 – inversely based on age
  - Table proposed by Graeme, or a continuous distribution normalized to 1 (newest data)
- w3 – inversely based on the number of copies
- wN – other factors based on experience
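
A sketch of the additive-weight test: the w1 formula follows the slide, while the shapes of w2 and w3, max_age_days and the threshold value are purely illustrative (the slide leaves them tunable).

```python
def hotness(n_wait, n_running, age_days, n_copies, max_age_days=365.0):
    """Additive-weight hotness score; w2/w3 shapes and max_age_days are illustrative assumptions."""
    w1 = n_wait / (2.0 * max(n_running, 1))        # demand: waiting jobs vs running jobs
    w2 = max(0.0, 1.0 - age_days / max_age_days)   # freshness: 1 for the newest data, falling with age
    w3 = 1.0 / max(n_copies, 1)                    # scarcity: fewer existing copies -> hotter
    return w1 + w2 + w3

def is_hot(n_wait, n_running, age_days, n_copies, threshold=2.0):
    """Make an extra copy when the summed weights exceed the tunable threshold N (2.0 is a placeholder)."""
    return hotness(n_wait, n_running, age_days, n_copies) > threshold

# example: many waiting jobs on a young dataset with a single copy
print(is_hot(n_wait=400, n_running=100, age_days=10, n_copies=1))   # True
```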

12 Where to Send ‘Hot’ Data?
- Tier 1 site selection (see the sketch after this list)
  - Based on MoU share
  - Exclude a site if the dataset size > 5% (proposed by Graeme)
  - Exclude a site if it has too many active subscriptions
  - Other tuning based on experience
- Tier 2 site selection
  - Based on brokerage, as currently done
  - Negative weight – based on the number of active subscriptions
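
A sketch of the Tier 1 choice, weighting by MoU share and applying the two exclusions. The slide does not say 5% of what, so the sketch assumes 5% of the site's free DATADISK space; the subscription cap, field names and numbers in the example are placeholders.

```python
import random

def pick_tier1(mou_shares, free_space, active_subs, dataset_size,
               max_fraction=0.05, max_subscriptions=50):
    """Choose a Tier 1 for a 'hot' extra replica; thresholds and field names are illustrative."""
    candidates = {
        site: share
        for site, share in mou_shares.items()
        if dataset_size <= max_fraction * free_space[site]   # exclude if the dataset is too large for the site
        and active_subs[site] < max_subscriptions            # exclude if too many active subscriptions
    }
    if not candidates:
        return None
    sites, shares = zip(*candidates.items())
    return random.choices(sites, weights=shares, k=1)[0]     # MoU-share weighted choice

# example with made-up numbers (sizes in TB)
print(pick_tier1(
    mou_shares={"BNL": 0.23, "TRIUMF": 0.05, "FZK": 0.10},
    free_space={"BNL": 900, "TRIUMF": 200, "FZK": 400},
    active_subs={"BNL": 12, "TRIUMF": 60, "FZK": 3},
    dataset_size=15,
))
```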

13 Data Deletions will be Very Important
- Since we are caching everywhere (T1+T2), Victor plays an equally important role as PD2P
- Asynchronously clean up all caches
  - Trigger deletion based on a disk-fullness threshold
  - Deletion algorithm based on (age + popularity) & secondary (see the sketch after this list)
- Also automatic deletion of n-2 – by AKTR/Victor
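
A sketch of the cleanup trigger and victim ordering, assuming a simple replica record with tag, age_days and popularity fields; Victor's real logic is more involved, and the 90% fullness threshold is a placeholder.

```python
def select_victims(replicas, disk_used_fraction, fullness_threshold=0.9):
    """Only act when the space token is nearly full, and only ever offer secondary replicas,
    least popular and oldest first; threshold and ordering details are illustrative."""
    if disk_used_fraction < fullness_threshold:
        return []
    secondaries = [r for r in replicas if r["tag"] == "secondary"]
    return sorted(secondaries, key=lambda r: (r["popularity"], -r["age_days"]))

# example
replicas = [
    {"name": "dsA", "tag": "primary",   "age_days": 300, "popularity": 0},
    {"name": "dsB", "tag": "secondary", "age_days": 200, "popularity": 1},
    {"name": "dsC", "tag": "secondary", "age_days": 30,  "popularity": 50},
]
print([r["name"] for r in select_victims(replicas, disk_used_fraction=0.95)])   # ['dsB', 'dsC']
```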

14 Other Misc. DDM Items
- Need zipping of RAW – could buy us a factor of 2 in space
- Cleaning up 2010 data
  - Do not put RAW from tape to disk for 2010 runs
  - But keep some data ESDs (see DP document)?
  - Move MC ESDs to tape?
- MC plan
  - HITS to TAPE only
  - Mark ESDs as secondary
- ESD plan (data and MC)
  - First copy, marked as primary, limited lifetime
  - Second copy, marked as secondary

15 PRODDISK at Tier 1
- Set up PRODDISK at Tier 1
  - Allows Tier 1’s to help processing in other clouds
  - Use it as a tape buffer (managed by DDM)
- Algorithm (see the sketch after this list)
  - If the source is *TAPE, request a subscription to local PRODDISK
  - Put jobs in the assigned state, wait for the callback, use _dis blocks etc. (same workflow as used now for Tier 2)
  - Activate jobs when the callback comes
  - Increase the priority when a job is activated, so jobs run quickly
  - Panda will set a lifetime for _dis datasets on PRODDISK
    - The lifetime is increased when files are reused from PRODDISK
  - Cleaning via Victor, but with a different algorithm for PRODDISK (should make this change even for Tier 2 PRODDISK)
    - Clean expired files, or when 80% full (dedicated high-priority agent?)
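
A sketch of the tape-staging workflow above. The job dictionary fields, the _dis block naming and the subscribe callback are illustrative assumptions, not PanDA/DDM APIs, and the site/dataset names in the example are made up.

```python
def handle_tape_input(job, subscribe):
    """If the input source is tape, stage it to PRODDISK via a _dis block subscription
    and hold the job in 'assigned' until the DDM callback arrives."""
    if job["source"].endswith("TAPE"):
        job["dis_block"] = "{0}_dis{1}".format(job["dataset"], job["id"])
        subscribe(job["dis_block"], destination="PRODDISK")
        job["status"] = "assigned"        # wait for the callback
    else:
        job["status"] = "activated"       # disk-resident input, run as usual
    return job

def on_stage_in_callback(job, priority_boost=100):
    """DDM callback: the input files are now on PRODDISK, so activate and bump the priority."""
    job["status"] = "activated"
    job["priority"] += priority_boost     # boost so staged jobs run quickly
    return job

# usage example with an illustrative job record
job = {"id": 1, "dataset": "data11.someStream", "source": "SOMESITE_DATATAPE", "priority": 500}
job = handle_tape_input(job, subscribe=lambda block, destination: print("subscribe", block, "->", destination))
job = on_stage_in_callback(job)
print(job["status"], job["priority"])     # activated 600
```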

16 Special Panda Queue for Tape
- Need to regulate the workflow if we use tapes more
- Set up a separate queue in Panda (see the sketch after this list)
  - Simplest implementation – use a new jobtype if the source is *TAPE
  - Jobs pulled by priority as usual
  - Different base priorities – highest for production, medium for group production, lowest for user event picking (with fair share)
  - We can control this per Tier 1, throttle to balance load, and use a large queue depth to reduce the number of tape mounts
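
A sketch of the queue routing and base-priority ordering; the numeric values and jobtype names are placeholders, not actual PanDA settings.

```python
# Placeholder base priorities for the dedicated tape queue (values are illustrative).
TAPE_BASE_PRIORITY = {
    "production":       1000,   # highest
    "group_production":  500,   # medium
    "user_event_pick":   100,   # lowest, still subject to fair share
}

def queue_for(job):
    """Route jobs whose input source is tape into the dedicated queue."""
    return "tape" if job["source"].endswith("TAPE") else "default"

def base_priority(job):
    return TAPE_BASE_PRIORITY.get(job["type"], 100)

job = {"type": "group_production", "source": "SOMESITE_MCTAPE"}
print(queue_for(job), base_priority(job))   # tape 500
```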

17 Removing Cloud Boundaries
- Multi-cloud Tier 2
  - This will balance load, reduce stuck tasks…
- Allow some large Tier 2’s to take jobs from many clouds
  - Only after setting up FTS channels and testing
  - Put a list of clouds in the schedconfig DB for a site (based on DQ2 topology – manual at first, automated later); see the sketch after this list
  - PanDA will broker as usual – some jobs will come from each Tier 1
  - May need to add a weight (fraction per cloud) if we see an imbalance
- It may even be possible to have Tier 1’s set up like this, to help other clouds by using PRODDISK!
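
A sketch of what a multi-cloud site entry and a per-cloud brokerage weight might look like; the field names, site name, cloud lists and fractions are illustrative, not the real schedconfig schema or PanDA brokerage code.

```python
# Illustrative multi-cloud configuration (not the actual schedconfig schema).
SITE_CLOUDS = {
    "SOME_BIG_T2": ["US", "DE", "FR"],   # a large Tier 2 allowed to take jobs from several clouds
}

CLOUD_FRACTION = {"US": 0.6, "DE": 0.2, "FR": 0.2}   # optional per-cloud weights to correct imbalance

def broker_weight(site, cloud):
    """Zero weight if the site does not serve the cloud; otherwise scale by the cloud fraction."""
    if cloud not in SITE_CLOUDS.get(site, []):
        return 0.0
    return CLOUD_FRACTION.get(cloud, 1.0)

print(broker_weight("SOME_BIG_T2", "DE"))   # 0.2
print(broker_weight("SOME_BIG_T2", "IT"))   # 0.0
```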

18 Conclusion
- As we run short of storage, caching becomes important
- PD2P for Tier 1’s is coming soon
- Many other tricks in a resource-limited environment
  - Already started implementing some
  - Will implement others gradually
- We now have a plan to survive 2011
- Ready for new data!

