Dynamic Data Placement: the ATLAS model
Simone Campana (IT-SDC)
5 February 2013 – IT-SDC White Area Lectures
From Theory to Reality
The original computing model:
- 10 copies of AODs at T1s, 10 copies of AODs at T2s
- For AODs only, 13 PB of data: 13 PB x 20 = 260 PB
- Plus NTUP + RAW + ESDs + ..., and x2 for the simulated data
So we would be running at Exa-scale. Cool.
Unfortunately ATLAS has O(10) times less space:
- Managing space is a nightmare
- Physicists do not like deleting data (even if it is useless)
ATLAS space occupancy (figure)
The Dynamic Data Placement model
- A minimal replication is made a priori: like in the computing model, but with many fewer replicas
- More replicas are created dynamically, as jobs demand a particular type of data
- Unused replicas are cleaned automatically, while guaranteeing that a minimal number of replicas always exists
(the lifecycle is sketched below)
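A minimal sketch of these three ingredients in Python; all names, the MIN_REPLICAS value and the 30-day idle threshold are illustrative assumptions, not the actual PD2P/DDM implementation.

```python
from datetime import datetime, timedelta

MIN_REPLICAS = 2  # a priori copies that cleaning must never remove (assumed value)

class Dataset:
    """Toy model of a dataset and the sites holding a replica of it."""

    def __init__(self, name, initial_sites):
        self.name = name
        # a priori placement: only a minimal set of replicas
        self.replicas = {site: datetime.now() for site in initial_sites[:MIN_REPLICAS]}

    def on_job_request(self, site):
        """Dynamic replication: a job at `site` asks for this dataset.
        Create a replica there if missing, otherwise refresh its last-used time."""
        self.replicas[site] = datetime.now()

    def cleanup(self, max_idle=timedelta(days=30)):
        """Automatic cleaning: drop idle replicas but keep the guaranteed minimum."""
        now = datetime.now()
        for site, last_used in sorted(self.replicas.items(), key=lambda kv: kv[1]):
            if len(self.replicas) <= MIN_REPLICAS:
                break
            if now - last_used > max_idle:
                del self.replicas[site]

ds = Dataset("data12_8TeV.AOD.example", ["T1_A", "T1_B", "T2_C"])
ds.on_job_request("T2_C")    # a job demands the data at T2_C -> dynamic replica
ds.cleanup()                 # nothing deleted yet: all replicas were used recently
```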
ATLAS dataset replica classes (figure)
Distributed Data Management System (architecture diagram)
- DDM clients, built on a common modular framework, used by the production system, the analysis system, data placement tools, end-users and the Tier-0
- Central Catalogues (Oracle database): repository, content, accounting, location, subscription, data usage
- Site Services: transfer, consistency, deletion, replica reduction
- Underlying infrastructures: WLCG, Open Science Grid, European Grid Initiative, NorduGrid
PanDA Dynamic Data Placement: PD2P
- Two algorithms: one for Tier-1 sites and one for Tier-2 sites; Tier-1 sites are used as data repositories, while Tier-2 sites are used mostly for the execution of analysis jobs
- PD2P considers only official datasets: users can submit jobs with private data, but those data are not replicated by PD2P
- Replication policies for the different data types are defined by the ATLAS computing model
- PD2P is triggered when users submit jobs to analysis sites
The PD2P algorithm for T1s
- Primary copies of ATLAS data are placed at Tier-1 sites based on the Memorandum of Understanding (MoU) share; the MoU share specifies the contribution expected from the corresponding region
- PD2P makes secondary copies at Tier-1 sites when:
  - PD2P did not already replicate the data to a Tier-1 site during the past week, and
  - the number of data replicas at Tier-1 sites is less than int(log10(Nused)), where Nused is how many times the data was used per job set
  - Nused = 10, 100, 1000, ... => Nreplicas = 1, 2, 3, ...
(the trigger condition is sketched below)
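The Tier-1 trigger condition above, written out as a small Python check; the function and argument names are illustrative, not the actual PanDA code.

```python
import math
from datetime import datetime, timedelta

def t1_secondary_copy_needed(n_used, n_t1_replicas, last_pd2p_t1_copy, now=None):
    """Return True if PD2P should make a secondary copy at a Tier-1 site.

    n_used            : how many times the data was used per job set
    n_t1_replicas     : current number of replicas at Tier-1 sites
    last_pd2p_t1_copy : datetime of the last PD2P replication to a Tier-1, or None
    """
    now = now or datetime.now()
    # condition 1: no PD2P replica was made to a Tier-1 during the past week
    if last_pd2p_t1_copy is not None and now - last_pd2p_t1_copy < timedelta(weeks=1):
        return False
    # condition 2: fewer Tier-1 replicas than int(log10(Nused))
    # Nused = 10, 100, 1000, ... => target of 1, 2, 3, ... replicas
    if n_used < 10:
        return False
    return n_t1_replicas < int(math.log10(n_used))

# a dataset used 100 times per job set, with a single Tier-1 replica:
print(t1_secondary_copy_needed(n_used=100, n_t1_replicas=1, last_pd2p_t1_copy=None))  # True
```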
The PD2P algorithm for T1s: 2 copies
- One Tier-1 site is selected based on MoU share and a replication request is sent to the ATLAS Distributed Data Management (DDM) system
- When a copy is made at a Tier-1 site, another copy is made at a Tier-2 site at the same time; the Tier-2 site is also selected based on MoU share
- The goal is to have popular data not only at Tier-1 sites but also at Tier-2 sites
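One plausible reading of "selected based on MoU share" is a share-weighted random choice, sketched below; the shares and site names are invented for illustration and the real selection may differ.

```python
import random

# hypothetical MoU shares (fractions of the total pledge), not real ATLAS numbers
T1_MOU_SHARE = {"T1_DE": 0.25, "T1_FR": 0.20, "T1_UK": 0.15,
                "T1_US": 0.25, "T1_IT": 0.15}
T2_MOU_SHARE = {"T2_A": 0.40, "T2_B": 0.35, "T2_C": 0.25}

def pick_site_by_mou(shares):
    """Select one site with probability proportional to its MoU share."""
    sites = list(shares)
    return random.choices(sites, weights=[shares[s] for s in sites], k=1)[0]

# one Tier-1 and one Tier-2 copy are requested at the same time;
# the actual replication requests would go to the DDM system
print("T1 copy at:", pick_site_by_mou(T1_MOU_SHARE))
print("T2 copy at:", pick_site_by_mou(T2_MOU_SHARE))
```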
The PD2P algorithm for T2s
- Executed independently of the Tier-1 algorithm
- PD2P makes additional copies at Tier-2 sites when:
  - the number of data replicas at Tier-2 sites is less than 5 ...
  - ... and no more than two copies are concurrently being replicated ...
  - ... and one of the following holds:
    - there is no replica at any Tier-2 site
    - not enough replicas are available while many jobs are waiting in the queue
(this condition is written out in the sketch below)
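The same condition as a boolean check; the argument names are illustrative.

```python
def t2_extra_copy_needed(n_t2_replicas, n_copies_in_flight,
                         has_t2_replica, many_jobs_waiting, enough_replicas):
    """Return True if PD2P should make an additional copy at a Tier-2 site."""
    if n_t2_replicas >= 5:          # already enough Tier-2 replicas
        return False
    if n_copies_in_flight > 2:      # too many concurrent replications of this data
        return False
    # at least one of the two demand-driven conditions must hold
    return (not has_t2_replica) or (many_jobs_waiting and not enough_replicas)

# a dataset with no Tier-2 replica and one transfer already in flight:
print(t2_extra_copy_needed(n_t2_replicas=0, n_copies_in_flight=1,
                           has_t2_replica=False, many_jobs_waiting=False,
                           enough_replicas=True))   # True
```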
The PD2P algorithm for T2s: 2 copies
- One copy is made quickly available at a reliable Tier-2 site:
  - List-d: the Tier-1 and Tier-2 sites where the input is available
  - List-c: Tier-2 sites with a fast connection to sites in List-d
  - W: a weight per site, calculated from the number of active worker nodes (WNs) at the site, site reliability, job statistics at the site, and the number of replicas made by PD2P at the site in the last 24 hours
  - W is calculated for each site in List-c and the Tier-2 with the largest W is used (see the sketch below)
- One copy follows MoU shares, to balance the long-term data distribution
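The slides list the ingredients of W but not the formula, so the combination below is purely an assumption; it only illustrates ranking the List-c sites by weight and taking the largest.

```python
def site_weight(n_active_wns, reliability, jobs_last_day, pd2p_replicas_24h):
    """Hypothetical weight W: favour large, reliable, busy sites and penalise
    sites that already received many PD2P replicas in the last 24 hours.
    The real PD2P formula is not given in the slides."""
    return (n_active_wns * reliability * (1 + jobs_last_day)
            / (1 + pd2p_replicas_24h))

# List-c: Tier-2 sites with a fast connection to the sites holding the input (List-d)
list_c = {
    "T2_A": dict(n_active_wns=2000, reliability=0.95, jobs_last_day=5000, pd2p_replicas_24h=3),
    "T2_B": dict(n_active_wns=800,  reliability=0.99, jobs_last_day=1200, pd2p_replicas_24h=0),
}

best_t2 = max(list_c, key=lambda site: site_weight(**list_c[site]))
print("replicate to:", best_t2)   # the Tier-2 with the largest W
```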
PD2P replication (figure)
Transfer Volume per Activity: Data Brokering vs Data Consolidation (figure)
Job Re-Brokerage
- PD2P relies on future reuse of data for its effectiveness: the data copy triggered by the initial job is not used unless subsequent jobs reuse it
- The initial job remains at the original site, even though a new copy has been replicated to free sites by PD2P
- The Re-Brokerage mechanism periodically reassigns jobs to other sites if they have been waiting in the queue for a while, to increase the reuse of PD2P replicas (sketched below)
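A minimal sketch of the re-brokerage idea under assumed data structures and an assumed 6-hour threshold; the real PanDA brokerage weighs many more factors.

```python
from datetime import datetime, timedelta

REBROKER_AFTER = timedelta(hours=6)   # assumed waiting-time threshold

def rebroker(queued_jobs, replica_sites, queue_length, now=None):
    """Reassign long-waiting jobs to a less loaded site that holds their input.

    queued_jobs   : list of dicts {"dataset", "site", "submitted": datetime}
    replica_sites : dataset name -> set of sites with a replica (incl. PD2P copies)
    queue_length  : site -> number of jobs currently waiting there
    """
    now = now or datetime.now()
    for job in queued_jobs:
        if now - job["submitted"] < REBROKER_AFTER:
            continue                                 # not waiting long enough yet
        candidates = replica_sites.get(job["dataset"], set()) - {job["site"]}
        if not candidates:
            continue                                 # nowhere better to go
        best = min(candidates, key=lambda s: queue_length.get(s, 0))
        if queue_length.get(best, 0) < queue_length.get(job["site"], 0):
            job["site"] = best                       # reuse the PD2P replica there
```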
PD2P replica re-use (figure)
DDM Popularity
- Provides information about the usage of files and datasets
- Collects traces at file level from the ATLAS analysis tools: DQ2 clients, PanDA, Ganga (~4M traces a day)
- Aggregates the traces into daily statistics
- Provides the information through a web site, a CLI and an API
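A toy illustration of the aggregation step, rolling file-level traces up into daily per-dataset statistics; the trace fields here are invented, not the real DDM popularity schema.

```python
from collections import Counter
from datetime import date

# hypothetical file-level traces, as emitted by DQ2 clients, PanDA or Ganga
traces = [
    {"dataset": "data12_8TeV.AOD.x", "file": "f1", "when": date(2013, 2, 5)},
    {"dataset": "data12_8TeV.AOD.x", "file": "f2", "when": date(2013, 2, 5)},
    {"dataset": "mc12_8TeV.NTUP.y",  "file": "f9", "when": date(2013, 2, 5)},
]

def daily_statistics(traces):
    """Aggregate file-level traces into (day, dataset) -> number of accesses."""
    return dict(Counter((t["when"], t["dataset"]) for t in traces))

print(daily_statistics(traces))
# {(datetime.date(2013, 2, 5), 'data12_8TeV.AOD.x'): 2,
#  (datetime.date(2013, 2, 5), 'mc12_8TeV.NTUP.y'): 1}
```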
Automatic site cleaning: Victor
Workflow (diagram):
1. Selection of full sites, using space information from DDM Accounting
2. Selection of unpopular replicas, using secondary-replica popularity from DDM Popularity
3. Publication of decisions: the replicas to delete are handed to the DDM Deletion Service, which reports back the deleted replicas
Goals:
- Optimize the utilization of storage resources: keep sites operationally full by deleting secondary, unpopular replicas
- Secondary replicas are guaranteed to be replicated on other sites!
- Reduces manual operations and accidents; once the policy is accepted, no more discussions: no mercy nor regret
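The three-step workflow could be driven by a loop like the one below; the callables stand in for the real DDM components and are purely illustrative (the actual selection thresholds appear on the next slide).

```python
def victor_cycle(space_info, select_unpopular_replicas, deletion_service):
    """One Victor cycle, as a sketch:
    (1) select full sites from DDM Accounting space information,
    (2) select unpopular secondary replicas on those sites,
    (3) publish the decisions to the DDM Deletion Service.

    space_info                : site -> {"total": ..., "free": ...}
    select_unpopular_replicas : callable(site) -> replicas to delete (step 2)
    deletion_service          : callable(site, replicas) publishing the decision (step 3)
    """
    full_sites = [site for site, s in space_info.items()
                  if s["free"] < 0.10 * s["total"]]         # step 1: "full" sites
    for site in full_sites:
        replicas = select_unpopular_replicas(site)          # step 2
        deletion_service(site, replicas)                    # step 3
```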
Victor cleaning (figure)
The Victor cleaning algorithm
- A site (storage + space token) is cleaned if Free Space < 10% of Total Space, and cleaning proceeds until Free Space = 15% of Total Space
- Cleaning algorithm (sketched below):
  - Datasets younger than 15 days are not touched
  - Look for datasets used not more than once during the last month ...
  - ... then not more than 10 times in the last month ...
  - ... then not more than 100 times in the last month ...
  - ... then give up
  - Older datasets are cleaned first
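A sketch of this selection logic (it could play the role of select_unpopular_replicas in the previous sketch); the dataset fields and units are illustrative assumptions.

```python
from datetime import datetime, timedelta

def select_datasets_to_clean(datasets, free, total, now=None):
    """Pick secondary datasets to delete until free space reaches 15% of total.

    datasets    : list of dicts {"name", "size", "created": datetime,
                  "uses_last_month": int}, secondary replicas only
    free, total : current free and total space of the space token (same unit as size)
    """
    now = now or datetime.now()
    if free >= 0.10 * total:
        return []                                    # the site is not full
    target = 0.15 * total
    # never touch datasets younger than 15 days; clean older datasets first
    candidates = sorted((d for d in datasets
                         if now - d["created"] > timedelta(days=15)),
                        key=lambda d: d["created"])
    chosen = []
    for threshold in (1, 10, 100):                   # escalate, then give up
        for d in candidates:
            if free >= target:
                return chosen
            if d in chosen or d["uses_last_month"] > threshold:
                continue
            chosen.append(d)
            free += d["size"]
    return chosen
```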
Unsuccessful and successful cleaning (figure)
Credits
Lots of credits to Vincent Garonne and Tadashi Maeno