Dynamic Data Placement: the ATLAS model
Simone Campana (IT-SDC)
5 February 2013 – IT-SDC White Area Lectures
From Theory to Reality
The original computing model:
- 10 copies of AODs at T1s, 10 copies of AODs at T2s
- For AODs only, 13 PB of data: 13 PB x 20 = 260 PB
- Plus NTUP + RAW + ESDs + ..., and x2 for the simulated data
So we would be running at Exa-scale. Cool.
Unfortunately ATLAS has O(10) times less space:
- Managing space is a nightmare
- Physicists do not like deleting data (even if it is useless)
ATLAS space occupancy (figure)
The Dynamic Data Placement model
- A minimal replication is made a priori: like in the computing model, but with many fewer replicas
- More replicas are created dynamically, as jobs demand a particular type of data
- Unused replicas are cleaned automatically, while guaranteeing that a minimal number of replicas always exists
(the lifecycle is sketched below)
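A minimal sketch of these three ingredients in Python; all names, the MIN_REPLICAS value and the 30-day idle threshold are illustrative assumptions, not the actual PD2P/DDM implementation.

```python
from datetime import datetime, timedelta

MIN_REPLICAS = 2  # a priori copies that cleaning must never remove (assumed value)

class Dataset:
    """Toy model of a dataset and the sites holding a replica of it."""

    def __init__(self, name, initial_sites):
        self.name = name
        # a priori placement: only a minimal set of replicas
        self.replicas = {site: datetime.now() for site in initial_sites[:MIN_REPLICAS]}

    def on_job_request(self, site):
        """Dynamic replication: a job at `site` asks for this dataset.
        Create a replica there if missing, otherwise refresh its last-used time."""
        self.replicas[site] = datetime.now()

    def cleanup(self, max_idle=timedelta(days=30)):
        """Automatic cleaning: drop idle replicas but keep the guaranteed minimum."""
        now = datetime.now()
        for site, last_used in sorted(self.replicas.items(), key=lambda kv: kv[1]):
            if len(self.replicas) <= MIN_REPLICAS:
                break
            if now - last_used > max_idle:
                del self.replicas[site]

ds = Dataset("data12_8TeV.AOD.example", ["T1_A", "T1_B", "T2_C"])
ds.on_job_request("T2_C")    # a job demands the data at T2_C -> dynamic replica
ds.cleanup()                 # nothing deleted yet: all replicas were used recently
```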
ATLAS dataset replica classes (figure)
Distributed Data Management System (architecture diagram)
- DDM clients, built on a common modular framework, used by the production system, the analysis system, data placement tools, end-users and the Tier-0
- Central Catalogues (Oracle database): repository, content, accounting, location, subscription, data usage
- Site Services: transfer, consistency, deletion, replica reduction
- Underlying infrastructures: WLCG, Open Science Grid, European Grid Initiative, NorduGrid
PanDA Dynamic Data Placement: PD2P
- Two algorithms: one for Tier-1 sites and one for Tier-2 sites; Tier-1 sites are used as data repositories, while Tier-2 sites are used mostly for the execution of analysis jobs
- PD2P considers only official datasets: users can submit jobs with private data, but those data are not replicated by PD2P
- Replication policies for the different data types are defined by the ATLAS computing model
- PD2P is triggered when users submit jobs to analysis sites
The PD2P algorithm for T1s
- Primary copies of ATLAS data are placed at Tier-1 sites based on the Memorandum of Understanding (MoU) share; the MoU share specifies the contribution expected from the corresponding region
- PD2P makes secondary copies at Tier-1 sites when:
  - PD2P did not already replicate the data to a Tier-1 site during the past week, and
  - the number of data replicas at Tier-1 sites is less than int(log10(Nused)), where Nused is how many times the data was used per job set
  - Nused = 10, 100, 1000, ... => Nreplicas = 1, 2, 3, ...
(the trigger condition is sketched below)
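The Tier-1 trigger condition above, written out as a small Python check; the function and argument names are illustrative, not the actual PanDA code.

```python
import math
from datetime import datetime, timedelta

def t1_secondary_copy_needed(n_used, n_t1_replicas, last_pd2p_t1_copy, now=None):
    """Return True if PD2P should make a secondary copy at a Tier-1 site.

    n_used            : how many times the data was used per job set
    n_t1_replicas     : current number of replicas at Tier-1 sites
    last_pd2p_t1_copy : datetime of the last PD2P replication to a Tier-1, or None
    """
    now = now or datetime.now()
    # condition 1: no PD2P replica was made to a Tier-1 during the past week
    if last_pd2p_t1_copy is not None and now - last_pd2p_t1_copy < timedelta(weeks=1):
        return False
    # condition 2: fewer Tier-1 replicas than int(log10(Nused))
    # Nused = 10, 100, 1000, ... => target of 1, 2, 3, ... replicas
    if n_used < 10:
        return False
    return n_t1_replicas < int(math.log10(n_used))

# a dataset used 100 times per job set, with a single Tier-1 replica:
print(t1_secondary_copy_needed(n_used=100, n_t1_replicas=1, last_pd2p_t1_copy=None))  # True
```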
The PD2P algorithm for T1s: 2 copies
- One Tier-1 site is selected based on MoU share and a replication request is sent to the ATLAS Distributed Data Management (DDM) system
- When a copy is made at a Tier-1 site, another copy is made at a Tier-2 site at the same time; the Tier-2 site is also selected based on MoU share
- The goal is to have popular data not only at Tier-1 sites but also at Tier-2 sites
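One plausible reading of "selected based on MoU share" is a share-weighted random choice, sketched below; the shares and site names are invented for illustration and the real selection may differ.

```python
import random

# hypothetical MoU shares (fractions of the total pledge), not real ATLAS numbers
T1_MOU_SHARE = {"T1_DE": 0.25, "T1_FR": 0.20, "T1_UK": 0.15,
                "T1_US": 0.25, "T1_IT": 0.15}
T2_MOU_SHARE = {"T2_A": 0.40, "T2_B": 0.35, "T2_C": 0.25}

def pick_site_by_mou(shares):
    """Select one site with probability proportional to its MoU share."""
    sites = list(shares)
    return random.choices(sites, weights=[shares[s] for s in sites], k=1)[0]

# one Tier-1 and one Tier-2 copy are requested at the same time;
# the actual replication requests would go to the DDM system
print("T1 copy at:", pick_site_by_mou(T1_MOU_SHARE))
print("T2 copy at:", pick_site_by_mou(T2_MOU_SHARE))
```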
The PD2P algorithm for T2s
- Executed independently of the Tier-1 algorithm
- PD2P makes additional copies at Tier-2 sites when:
  - the number of data replicas at Tier-2 sites is less than 5 ...
  - ... and no more than two copies are concurrently being replicated ...
  - ... and one of the following holds:
    - there is no replica at any Tier-2 site
    - not enough replicas are available while many jobs are waiting in the queue
(this condition is written out in the sketch below)
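The same condition as a boolean check; the argument names are illustrative.

```python
def t2_extra_copy_needed(n_t2_replicas, n_copies_in_flight,
                         has_t2_replica, many_jobs_waiting, enough_replicas):
    """Return True if PD2P should make an additional copy at a Tier-2 site."""
    if n_t2_replicas >= 5:          # already enough Tier-2 replicas
        return False
    if n_copies_in_flight > 2:      # too many concurrent replications of this data
        return False
    # at least one of the two demand-driven conditions must hold
    return (not has_t2_replica) or (many_jobs_waiting and not enough_replicas)

# a dataset with no Tier-2 replica and one transfer already in flight:
print(t2_extra_copy_needed(n_t2_replicas=0, n_copies_in_flight=1,
                           has_t2_replica=False, many_jobs_waiting=False,
                           enough_replicas=True))   # True
```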
The PD2P algorithm for T2s: 2 copies
- One copy is made quickly available at a reliable Tier-2 site:
  - List-d: the Tier-1 and Tier-2 sites where the input is available
  - List-c: Tier-2 sites with a fast connection to sites in List-d
  - W: a weight per site, calculated from the number of active worker nodes (WNs) at the site, site reliability, job statistics at the site, and the number of replicas made by PD2P at the site in the last 24 hours
  - W is calculated for each site in List-c and the Tier-2 with the largest W is used (see the sketch below)
- One copy follows MoU shares, to balance the long-term data distribution
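The slides list the ingredients of W but not the formula, so the combination below is purely an assumption; it only illustrates ranking the List-c sites by weight and taking the largest.

```python
def site_weight(n_active_wns, reliability, jobs_last_day, pd2p_replicas_24h):
    """Hypothetical weight W: favour large, reliable, busy sites and penalise
    sites that already received many PD2P replicas in the last 24 hours.
    The real PD2P formula is not given in the slides."""
    return (n_active_wns * reliability * (1 + jobs_last_day)
            / (1 + pd2p_replicas_24h))

# List-c: Tier-2 sites with a fast connection to the sites holding the input (List-d)
list_c = {
    "T2_A": dict(n_active_wns=2000, reliability=0.95, jobs_last_day=5000, pd2p_replicas_24h=3),
    "T2_B": dict(n_active_wns=800,  reliability=0.99, jobs_last_day=1200, pd2p_replicas_24h=0),
}

best_t2 = max(list_c, key=lambda site: site_weight(**list_c[site]))
print("replicate to:", best_t2)   # the Tier-2 with the largest W
```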
PD2P replication (figure)
Transfer Volume per Activity: Data Brokering vs Data Consolidation (figure)
Job Re-Brokerage
- PD2P relies on future reuse of data for its effectiveness: the data copy triggered by the initial job is not used unless subsequent jobs reuse it
- The initial job remains at the original site, even though a new copy has been replicated to free sites by PD2P
- The Re-Brokerage mechanism periodically reassigns jobs to other sites if they have been waiting in the queue for a while, to increase the reuse of PD2P replicas (sketched below)
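A minimal sketch of the re-brokerage idea under assumed data structures and an assumed 6-hour threshold; the real PanDA brokerage weighs many more factors.

```python
from datetime import datetime, timedelta

REBROKER_AFTER = timedelta(hours=6)   # assumed waiting-time threshold

def rebroker(queued_jobs, replica_sites, queue_length, now=None):
    """Reassign long-waiting jobs to a less loaded site that holds their input.

    queued_jobs   : list of dicts {"dataset", "site", "submitted": datetime}
    replica_sites : dataset name -> set of sites with a replica (incl. PD2P copies)
    queue_length  : site -> number of jobs currently waiting there
    """
    now = now or datetime.now()
    for job in queued_jobs:
        if now - job["submitted"] < REBROKER_AFTER:
            continue                                 # not waiting long enough yet
        candidates = replica_sites.get(job["dataset"], set()) - {job["site"]}
        if not candidates:
            continue                                 # nowhere better to go
        best = min(candidates, key=lambda s: queue_length.get(s, 0))
        if queue_length.get(best, 0) < queue_length.get(job["site"], 0):
            job["site"] = best                       # reuse the PD2P replica there
```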
PD2P replica re-use (figure)
DDM Popularity
- Provides information about the usage of files and datasets
- Collects traces at file level from the ATLAS analysis tools: DQ2 clients, PanDA, Ganga (~4M traces a day)
- Aggregates the traces into daily statistics
- Provides the information through a web site, a CLI and an API
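A toy illustration of the aggregation step, rolling file-level traces up into daily per-dataset statistics; the trace fields here are invented, not the real DDM popularity schema.

```python
from collections import Counter
from datetime import date

# hypothetical file-level traces, as emitted by DQ2 clients, PanDA or Ganga
traces = [
    {"dataset": "data12_8TeV.AOD.x", "file": "f1", "when": date(2013, 2, 5)},
    {"dataset": "data12_8TeV.AOD.x", "file": "f2", "when": date(2013, 2, 5)},
    {"dataset": "mc12_8TeV.NTUP.y",  "file": "f9", "when": date(2013, 2, 5)},
]

def daily_statistics(traces):
    """Aggregate file-level traces into (day, dataset) -> number of accesses."""
    return dict(Counter((t["when"], t["dataset"]) for t in traces))

print(daily_statistics(traces))
# {(datetime.date(2013, 2, 5), 'data12_8TeV.AOD.x'): 2,
#  (datetime.date(2013, 2, 5), 'mc12_8TeV.NTUP.y'): 1}
```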
Automatic site cleaning: Victor
Workflow (diagram):
1. Selection of full sites, using space information from DDM Accounting
2. Selection of unpopular replicas, using secondary-replica popularity from DDM Popularity
3. Publication of decisions: the replicas to delete are handed to the DDM Deletion Service, which reports back the deleted replicas
Goals:
- Optimize the utilization of storage resources: keep sites operationally full by deleting secondary, unpopular replicas
- Secondary replicas are guaranteed to be replicated on other sites!
- Reduces manual operations and accidents; once the policy is accepted, no more discussions: no mercy nor regret
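The three-step workflow could be driven by a loop like the one below; the callables stand in for the real DDM components and are purely illustrative (the actual selection thresholds appear on the next slide).

```python
def victor_cycle(space_info, select_unpopular_replicas, deletion_service):
    """One Victor cycle, as a sketch:
    (1) select full sites from DDM Accounting space information,
    (2) select unpopular secondary replicas on those sites,
    (3) publish the decisions to the DDM Deletion Service.

    space_info                : site -> {"total": ..., "free": ...}
    select_unpopular_replicas : callable(site) -> replicas to delete (step 2)
    deletion_service          : callable(site, replicas) publishing the decision (step 3)
    """
    full_sites = [site for site, s in space_info.items()
                  if s["free"] < 0.10 * s["total"]]         # step 1: "full" sites
    for site in full_sites:
        replicas = select_unpopular_replicas(site)          # step 2
        deletion_service(site, replicas)                    # step 3
```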
Victor cleaning (figure)
The Victor cleaning algorithm
- A site (storage + space token) is cleaned if Free Space < 10% of Total Space, and cleaning proceeds until Free Space = 15% of Total Space
- Cleaning algorithm (sketched below):
  - Datasets younger than 15 days are not touched
  - Look for datasets used not more than once during the last month ...
  - ... then not more than 10 times in the last month ...
  - ... then not more than 100 times in the last month ...
  - ... then give up
  - Older datasets are cleaned first
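A sketch of this selection logic (it could play the role of select_unpopular_replicas in the previous sketch); the dataset fields and units are illustrative assumptions.

```python
from datetime import datetime, timedelta

def select_datasets_to_clean(datasets, free, total, now=None):
    """Pick secondary datasets to delete until free space reaches 15% of total.

    datasets    : list of dicts {"name", "size", "created": datetime,
                  "uses_last_month": int}, secondary replicas only
    free, total : current free and total space of the space token (same unit as size)
    """
    now = now or datetime.now()
    if free >= 0.10 * total:
        return []                                    # the site is not full
    target = 0.15 * total
    # never touch datasets younger than 15 days; clean older datasets first
    candidates = sorted((d for d in datasets
                         if now - d["created"] > timedelta(days=15)),
                        key=lambda d: d["created"])
    chosen = []
    for threshold in (1, 10, 100):                   # escalate, then give up
        for d in candidates:
            if free >= target:
                return chosen
            if d in chosen or d["uses_last_month"] > threshold:
                continue
            chosen.append(d)
            free += d["size"]
    return chosen
```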
Unsuccessful and successful cleaning (figure)
Credits
Lots of credits to Vincent Garonne and Tadashi Maeno