Roadmap for Data Management and Caching

Roadmap for Data Management and Caching
Kaushik De, Univ. of Texas at Arlington
US ATLAS Facility, Boston, Mar 7, 2011

Introduction
- LHC is starting up again
  - First collisions already seen last week
  - Physics data may start flowing in a few weeks
  - Optimistic scenario: could be >2 fb-1 this year; could be the year of discovery!
- Are we ready with computing?
  - Yes, but we may run out of Tier 1 storage space by summer
  - Jim Shank showed a preliminary resource estimate to the CB on Friday
  - Severely reduced data distribution plan (compared to 2010)
  - Still need 25.8 PB (total for all tokens), assuming 200 Hz and no ESD
  - Free space right now is ~5 PB on DATADISK (need ~17 PB)
  - Tier 2 storage should be OK, after some further PD2P refinements
- What can we do?
  - Need a new data management and data distribution plan for Tier 1
Kaushik De, Mar 7, 2011

DATADISK Status at Tier 1's
Kaushik De, Mar 7, 2011

DATADISK at US Tier 2's
Kaushik De, Mar 7, 2011

New DD Proposal from DP Group
- Jamie Boyd presented the DP plan last Tuesday:
  - Jim's estimate of 25.8 PB of Tier 1 disk space is for this plan
  - 1 copy of RAW spread across all Tier 1 disks (1 more copy on tape)
  - ESDs will be kept on Tier 1 disk only for ~5 weeks
  - Some special-stream ESDs (~10%) will be kept longer
  - 10 copies of AOD and DESD (basically 1 copy per cloud)
- Will this work?
  - If we pre-place all data, we run out of space in a few months
  - In addition, this plan assumes deleting almost all current data (about 11 PB more), but there are 20-30 publications in the pipeline!
  - The plan assumes 200 Hz, but the trigger group wants 400 Hz
  - Some physics groups are worried about ESD deletion after ~5 weeks
- ADC discussion since Tuesday
  - How to fit all of this within the current budget?
  - How to reconcile this plan with the Naples (ADC retreat) planning?
Kaushik De, Mar 7, 2011

ADC Plan
- Still evolving
- Learn from 2010:
  - Need a flexible plan, since usage of formats changes with time
  - The plan should adjust automatically
  - Do not fill up all space too early
  - Clean up often
  - Be prepared for 400 Hz
- Planned pre-placed copy counts (table on the slide: rows T0 Tape, T0->T1, T1->T1, T1 Tape, T1 Disk; columns RAW, ESD, AOD, DESD; values only partly recoverable: 1, 1?, 0?, 2, 3)
- All additional copies at Tier 1 are made by PD2P
- All Tier 2 copies are made by PD2P
Kaushik De, Mar 7, 2011

Caching – PD2P
- Caching at T2 using PD2P and Victor worked well in 2010
  - Have 6 months of experience (>3 months with all clouds)
  - Almost zero complaints from users
- Few operational headaches
  - Some cases of full disks, datasets disappearing...
  - Most issues addressed with incremental improvements like space checking, rebrokering, storage cleanup and consolidation
- Many positives
  - No exponential growth in storage use
  - Better use of Tier 2 sites for analysis
- Next step: PD2P for Tier 1
  - This is not a choice but a necessity
  - We should treat part of Tier 1 storage as a dynamic cache
Kaushik De, Mar 7, 2011

Advantages of Caching at Tier 1
- If we do not fill all space with pre-placed data:
  - We are not in a disk crisis all the time
  - We can accommodate additional copies of 'hot' data
  - We can accommodate some ESDs (expired gracefully after n months, when no longer needed)
  - We can accommodate large buffers during reprocessing and merging (when a new release is available)
  - We can accommodate a higher trigger rate: 200 -> 300 -> 400 Hz
  - We can accommodate better-than-expected LHC running
  - We can accommodate new physics-driven requests
Kaushik De, Mar 7, 2011

Caching Requires Some DDM Changes
- To make room for dynamic caches
- Use DQ2 tags (custodial/primary/secondary) rigorously
  - Custodial == LHC data == tape only
  - Primary == a minimal set on disk at T1, so we have room for PD2P caching
    - LHC data primary == RAW (1 copy), AOD, DESD, NTUP (2 copies), ESD (1 copy) with limited lifetime (no lifetime for the special ~10%)
    - MC primary == Evgen, AOD, NTUP (2 copies only)
  - Secondary == copies made by ProdSys (i.e. HITS, RDO, unmerged), PD2P and DaTri only
    - Start with 1 additional secondary copy made by AKTR
- Locations: custodial ≠ primary; primary ≠ secondary
- Deletions: any secondary copy can be deleted by Victor
Kaushik De, Mar 7, 2011
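The tag scheme above can be pictured as a small policy table. Below is a minimal sketch in Python, assuming a hypothetical ReplicaPolicy structure (not part of DQ2); the copy counts and the ~5-week ESD lifetime are the values quoted on this slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReplicaPolicy:
    tag: str                      # 'custodial', 'primary', or 'secondary'
    copies: int                   # number of pre-placed disk copies
    lifetime_days: Optional[int]  # None = no automatic expiry

# LHC data: RAW also gets one custodial copy on tape (not listed here);
# the entries below are the minimal primary set on Tier 1 disk.
LHC_DATA_PRIMARY = {
    "RAW":  ReplicaPolicy("primary", copies=1, lifetime_days=None),
    "AOD":  ReplicaPolicy("primary", copies=2, lifetime_days=None),
    "DESD": ReplicaPolicy("primary", copies=2, lifetime_days=None),
    "NTUP": ReplicaPolicy("primary", copies=2, lifetime_days=None),
    # ESD: limited lifetime (roughly 5 weeks); the special ~10% would be
    # flagged separately with no expiry.
    "ESD":  ReplicaPolicy("primary", copies=1, lifetime_days=35),
}

# MC: only Evgen, AOD and NTUP are kept as primary (2 copies each).
MC_PRIMARY = {
    "EVNT": ReplicaPolicy("primary", copies=2, lifetime_days=None),
    "AOD":  ReplicaPolicy("primary", copies=2, lifetime_days=None),
    "NTUP": ReplicaPolicy("primary", copies=2, lifetime_days=None),
}

def victor_may_delete(tag: str) -> bool:
    """Only secondary replicas are eligible for cleanup by Victor."""
    return tag == "secondary"
```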

Additional Copies by PD2P
- Additional copies at Tier 1's: always tagged secondary
  - Made if a dataset is 'hot' (defined on the next slide)
  - Use the MoU share to decide which Tier 1 gets the extra copy
  - Simultaneously make a copy at a Tier 2
- Copies at Tier 2's: always tagged secondary
  - No changes for the first copy: keep the current algorithm (brokerage), and use an age requirement if we run into space shortage
  - If a dataset is 'hot' (see next slide), make an extra copy
- Reminder: additional replicas are secondary, i.e. temporary by definition, and may/will be removed by Victor
Kaushik De, Mar 7, 2011
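A rough sketch of this extra-copy flow follows. The hotness test, Tier 1 selection, Tier 2 brokerage and subscription calls are passed in as placeholders for PanDA/DQ2 internals; none of them are real APIs.

```python
def make_extra_copies(dataset, tier1_sites, tier2_sites,
                      is_hot, select_tier1, broker_tier2, subscribe):
    """Place extra, secondary-tagged copies of a 'hot' dataset.

    The four callables stand in for PanDA/DQ2 internals (hotness test,
    MoU-share Tier 1 selection, Tier 2 brokerage, DDM subscription).
    """
    if not is_hot(dataset):
        return []
    subscriptions = []
    # Extra Tier 1 copy, chosen by MoU share and tagged secondary so that
    # Victor may delete it later.
    t1 = select_tier1(dataset, tier1_sites)
    if t1 is not None:
        subscriptions.append(subscribe(dataset, t1, tag="secondary"))
    # Simultaneously place a secondary copy at a Tier 2 via brokerage.
    t2 = broker_tier2(dataset, tier2_sites)
    if t2 is not None:
        subscriptions.append(subscribe(dataset, t2, tag="secondary"))
    return subscriptions
```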

What is ‘Hot’?
- ‘Hot’ decides when to make a secondary replica
- The algorithm is based on additive weights
  - If w1 + w2 + w3 + ... + wN > N (a tunable threshold), make an extra copy
- w1: based on the number of waiting jobs
  - nwait / (2 * nrunning), averaged over all sites
  - Currently disabled due to DB issues; need to re-enable
  - Don't base it on the number of reuses; that did not work well
- w2: inversely based on age
  - Table proposed by Graeme, or a continuous distribution normalized to 1 (newest data)
- w3: inversely based on the number of copies
- wN: other factors based on experience
Kaushik De, Mar 7, 2011
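A minimal sketch of the additive-weight test, with illustrative normalisations; the production PD2P weights and threshold would be tuned operationally.

```python
def hotness_weights(n_waiting, n_running, age_days, n_copies,
                    max_age_days=365.0):
    # w1: pressure from waiting jobs, nwait / (2 * nrunning),
    # averaged over sites upstream of this call.
    w1 = n_waiting / (2.0 * n_running) if n_running > 0 else 0.0
    # w2: inversely based on age, normalised to 1 for the newest data.
    w2 = max(0.0, 1.0 - min(age_days, max_age_days) / max_age_days)
    # w3: inversely based on the number of existing copies.
    w3 = 1.0 / n_copies if n_copies > 0 else 1.0
    return w1, w2, w3

def is_hot(n_waiting, n_running, age_days, n_copies, threshold=1.5):
    """Make an extra replica when the summed weights exceed the threshold."""
    return sum(hotness_weights(n_waiting, n_running, age_days, n_copies)) > threshold
```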

Where to Send ‘Hot’ Data?
- Tier 1 site selection
  - Based on MoU share
  - Exclude a site if the dataset size is > 5% (proposed by Graeme)
  - Exclude a site if it has too many active subscriptions
  - Other tuning based on experience
- Tier 2 site selection
  - Based on brokerage, as currently done
  - Negative weight based on the number of active subscriptions
Kaushik De, Mar 7, 2011
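The Tier 1 selection could look roughly like the sketch below. The MoU-share weighting, the 5% size cut and the subscription cap come from the slide; interpreting the 5% as a fraction of free space, and the data structures themselves, are assumptions.

```python
import random

def select_tier1(dataset_size, sites, max_active_subscriptions=50):
    """Pick a Tier 1 for the extra copy, weighted by MoU share.

    sites: list of dicts with keys 'name', 'mou_share' (fractions summing
           to ~1), 'free_space', and 'active_subscriptions'.
    """
    candidates = [
        s for s in sites
        # Exclude sites where the dataset would exceed ~5% of free space.
        if dataset_size <= 0.05 * s["free_space"]
        # Exclude sites that already have too many active subscriptions.
        and s["active_subscriptions"] < max_active_subscriptions
    ]
    if not candidates:
        return None
    weights = [s["mou_share"] for s in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]["name"]
```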

Data Deletions will be Very Important
- Since we are caching everywhere (T1 + T2), Victor plays an equally important role as PD2P
- Asynchronously clean up all caches
  - Trigger deletion based on a disk-fullness threshold
  - Deletion algorithm based on (age + popularity) & secondary
- Also automatic deletion of n-2 (by AKTR/Victor)
Kaushik De, Mar 7, 2011
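A sketch of the cleanup trigger and victim ordering; only the fullness trigger and the "(age + popularity) & secondary" idea come from the slide, while the scoring and threshold values are illustrative assumptions.

```python
def select_victims(replicas, used_fraction, fullness_threshold=0.9):
    """Return secondary replicas to delete, oldest and least popular first.

    replicas: list of dicts with keys 'tag', 'age_days', 'accesses_last_month'.
    """
    if used_fraction < fullness_threshold:
        return []  # only clean when the space token is nearly full
    # Only secondary replicas are eligible for deletion.
    deletable = [r for r in replicas if r["tag"] == "secondary"]

    def score(r):
        # Higher score = better deletion candidate: old and rarely accessed.
        return r["age_days"] - 10.0 * r["accesses_last_month"]

    return sorted(deletable, key=score, reverse=True)
```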

Other Misc. DDM Items
- Need zipping of RAW: could buy us a factor of 2 in space
- Cleaning up 2010 data
  - Do not put RAW from tape onto disk for 2010 runs
  - But keep some data ESDs (see the DP document)?
  - Move MC ESDs to tape?
- MC plan
  - HITS to tape only
  - Mark ESDs as secondary
- ESD plan (data and MC)
  - First copy marked as primary, with limited lifetime
  - Second copy marked as secondary
Kaushik De, Feb 4, 2011

PRODDISK at Tier 1
- Set up PRODDISK at Tier 1
  - Allows Tier 1's to help with processing in other clouds
  - Use it as a tape buffer (managed by DDM)
- Algorithm
  - If the source is *TAPE, request a subscription to the local PRODDISK
  - Put jobs in the assigned state, wait for the callback, use _dis blocks etc. (same workflow as used now for Tier 2)
  - Activate jobs when the callback comes
  - Increase the priority when a job is activated, so staged jobs run quickly
  - Panda will set a lifetime for _dis datasets on PRODDISK
  - The lifetime is increased when files are reused from PRODDISK
- Cleaning via Victor, but with a different algorithm for PRODDISK (we should make this change even for Tier 2 PRODDISK)
  - Clean expired files, or when 80% full (dedicated high-priority agent?)
Kaushik De, Feb 4, 2011
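A sketch of this tape-staging workflow, with hypothetical job fields and a hypothetical subscribe() callable standing in for PanDA/DDM internals.

```python
def handle_tape_input(job, subscribe):
    """Route a job whose input may sit on tape."""
    if job["source_endpoint"].endswith("TAPE"):
        # Request a subscription of the _dis block to the local PRODDISK.
        subscribe(job["dis_dataset"], job["tier1"] + "_PRODDISK")
        job["status"] = "assigned"   # wait for the transfer callback
    else:
        job["status"] = "activated"  # input already on disk
    return job

def on_transfer_callback(job, priority_boost=100):
    """Called when the _dis block has arrived on PRODDISK."""
    job["status"] = "activated"
    # Raise the priority so staged jobs run quickly and the buffer turns over.
    job["priority"] += priority_boost
    return job
```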

Special Panda Queue for Tape
- Need to regulate the workflow if we use tapes more
- Set up a separate queue in Panda
  - Simplest implementation: use a new jobtype if the source is *TAPE
  - Jobs are pulled by priority as usual
  - Different base priorities: highest for production, medium for group production, lowest for user event picking (with fair share)
  - We can control this per Tier 1, throttle to balance the load, and use a large queue depth to reduce the number of tape mounts
Kaushik De, Feb 4, 2011
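A sketch of such a per-Tier-1 tape queue; the base-priority ordering follows the slide, while the numeric values and the throttle mechanics are illustrative assumptions.

```python
import heapq

BASE_PRIORITY = {"production": 1000, "group_production": 500, "user": 100}

class TapeQueue:
    """One queue per Tier 1 for jobs whose input source is *TAPE."""

    def __init__(self, max_active=200):
        self._heap = []
        self._counter = 0
        self.max_active = max_active   # throttle per Tier 1

    def push(self, job_id, jobtype):
        prio = BASE_PRIORITY.get(jobtype, 0)
        # heapq is a min-heap, so negate the priority; the counter keeps
        # FIFO order among jobs of equal priority.
        heapq.heappush(self._heap, (-prio, self._counter, job_id))
        self._counter += 1

    def pull(self, n_active):
        """Release the highest-priority job only if we are below the throttle."""
        if n_active >= self.max_active or not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```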

Removing Cloud Boundaries
- Multi-cloud Tier 2
  - This will balance the load, reduce stuck tasks...
  - Allow some large Tier 2's to take jobs from many clouds
  - Only after setting up FTS channels and testing
  - Put a list of clouds in the schedconfig DB for a site (based on the DQ2 topology; manual at first, automated later)
  - PanDA will broker as usual; some jobs will come from each Tier 1
  - May need to add a weight (fraction per cloud) if we see an imbalance
- It may even be possible to set up Tier 1's like this, to help other clouds by using PRODDISK!
Kaushik De, Feb 4, 2011
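A sketch of how multi-cloud brokerage with per-cloud fractions might look; the schedconfig fields shown are assumed for illustration, not the real schema.

```python
import random

def broker_cloud(site_schedconfig):
    """Pick a cloud for the next job sent to this multi-cloud Tier 2."""
    clouds = site_schedconfig["clouds"]            # e.g. ["US", "DE", "FR"]
    fractions = site_schedconfig.get("cloud_fractions")
    if fractions:
        # Weighted choice if an imbalance was observed and fractions were set.
        return random.choices(clouds,
                              weights=[fractions[c] for c in clouds],
                              k=1)[0]
    return random.choice(clouds)

# Example schedconfig entry for a large Tier 2 serving three clouds.
example_site = {"clouds": ["US", "DE", "FR"],
                "cloud_fractions": {"US": 0.6, "DE": 0.2, "FR": 0.2}}
print(broker_cloud(example_site))
```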

Conclusion
- As we run short of storage, caching becomes important
- PD2P for Tier 1's is coming soon
- Many other tricks in a resource-limited environment
  - Already started implementing some
  - Will implement others gradually
- We now have a plan to survive 2011
- Ready for new data!
Kaushik De, Mar 7, 2011