
1 Experience of Data Transfer to the Tier-1 from a DiRAC Perspective
Lydia Heck, Institute for Computational Cosmology, Manager of the DiRAC-2 Data Centric Facility COSMA
UK-T0 Workshop, 20-21 October 2015

2 Talk layout
● Introduction to DiRAC
● The DiRAC computing systems
● What is DiRAC?
● What type of science is done on the DiRAC facility?
● Why do we need to copy data to RAL?
● Copying data to RAL – network requirements
● Collaboration between DiRAC and RAL to produce the archive
● Setting up the archiving tools
● Archiving
● Open issues
● Conclusions

3 Introduction to DiRAC
● DiRAC (Distributed Research utilising Advanced Computing) was established in 2009 with DiRAC-1.
● It supports research in theoretical astronomy, particle physics and nuclear physics.
● Funded by STFC, with infrastructure money allocated from the Department for Business, Innovation and Skills (BIS).
● The running costs, such as staff costs and electricity, are funded by STFC.

4 Introduction to DiRAC, cont'd
● 2009 – DiRAC-1: 8 installations across the UK, of which COSMA-4 at the ICC in Durham is one. Still a loose federation.
● 2011/2012 – DiRAC-2: major funding of £15M for e-Infrastructure. Sites bid to host; 5 installations were identified, judged by peers. Successful bidders then faced scrutiny and interview by representatives of BIS to establish whether they could deliver by a tight deadline.

5 Introduction to DiRAC, cont'd
● DiRAC has a full management structure.
● Computing time on the DiRAC facility is allocated through a peer-reviewed procedure.
● Current director: Dr Jeremy Yates, UCL
● Current technical director: Prof Peter Boyle, Edinburgh

6 The DiRAC computing systems
● Blue Gene – Edinburgh
● Cosmos – Cambridge
● Complexity – Leicester
● Data Centric – Durham
● Data Analytic – Cambridge

7 The Blue Gene @ DiRAC
● Edinburgh – IBM Blue Gene
– 98304 cores
– 1 PByte of GPFS storage
– designed around (Lattice) QCD applications

8 COSMA @ DiRAC (Data Centric)
● Durham – Data Centric system
– IBM iDataPlex
– 6720 Intel Sandy Bridge cores
– 53.8 TByte of RAM
– FDR10 InfiniBand, 2:1 blocking
– 2.5 PByte of GPFS storage (2.2 PByte used!)

9 Complexity @ DiRAC
● Leicester – Complexity
– HP system
– 4352 Intel Sandy Bridge cores
– 30 TByte of RAM
– FDR InfiniBand, 1:1 non-blocking
– 0.8 PByte of Panasas storage

10 Cosmos @ DiRAC (SMP)
● Cambridge – COSMOS
– SGI shared-memory system
– 1856 Intel Sandy Bridge cores
– 31 Intel Xeon Phi co-processors
– 14.8 TByte of RAM
– 146 TByte of storage

11 HPCS @ DiRAC (Data Analytic)
● Cambridge – Data Analytic
– Dell system
– 4800 Intel Sandy Bridge cores
– 19.2 TByte of RAM
– FDR InfiniBand, 1:1 non-blocking
– 0.75 PByte of Lustre storage

12 What is DiRAC
● A national service run, managed and allocated by the scientists who do the science, funded by BIS and STFC.
● The systems are built around and for the applications with which the science is done.
● We do not rival a facility like ARCHER, as we do not aspire to run a general national service.
● DiRAC is classed by STFC as a major research facility, on a par with the big telescopes.

13 What is DiRAC, cont'd
● Long projects with a significant number of CPU hours, typically allocated for 3 years on a specific system. Examples for 2012-2015:
– Cosmos – dp002: ~20M CPU hours on Cambridge Cosmos
– Virgo – dp004: 63M CPU hours on Durham Data Centric
– UK-MHD – dp010: 40.5M CPU hours on Durham Data Centric
– UK-QCD – dp008: ~700M CPU hours on Edinburgh Blue Gene
– Exeter – dp005: ~15M CPU hours on Leicester Complexity
– HPQCD – dp019: ~20M CPU hours on Cambridge Data Analytic

14 What type of science is done on DiRAC?
● For highlights of the science carried out on the DiRAC facility, see http://www.dirac.ac.uk/science.html
● Specific example: large-scale structure calculations with the Eagle run
– 4096 cores
– ~8 GByte RAM/core
– 47 days = 4,620,288 CPU hours (4096 cores × 47 days × 24 hours)
– 200 TByte of data

15 Why do we need to copy data (to RAL)?
● Original plan: each research project should make provisions for storing its own research data.
– This requires additional storage resources at the researchers' home institutions.
– There is not enough provision; it would require additional funds.
– Data creation is considerably above expectation.
– If disaster struck, many CPU hours of calculations would be lost.

16 Why do we need to copy data (to RAL)?
● Research data must now be shared with, and made available to, interested parties.
● Installing DiRAC's own archive would require funds, and currently there is no budget.
● We needed to get started:
– Jeremy Yates negotiated access to the RAL archive system.
● Acquire expertise.
● Identify bottlenecks and technical challenges:
– submitting 2,000,000 files created an issue at the file servers.
● How can we collaborate and make use of previous experience?
● AND: copy data!

17 Copying data to RAL – network requirements
● Network bandwidth – the situation for Durham now:
– 300-400 MByte/sec is currently possible
– this required investment and collaboration from DU CIS
– upgrade to 6 Gbit/sec to JANET in September 2014
– will be 10 Gbit/sec by the end of 2015; the infrastructure is already installed
● In the past:
– identified Durham-related bottlenecks: the FIREWALL

18 Copying data to RAL – network requirements
● Network bandwidth – the situation for Durham. Investment to by-pass the external campus firewall:
– two new routers (~£80k), configured for throughput with a minimal ACL, enough to safeguard the site
– deploying internal firewalls, part of the new security infrastructure and essential for such a venture
– security now relies on the front-end systems of Durham DiRAC and Durham GridPP

19 Copying data to RAL – network requirements
● Result for COSMA and GridPP in Durham:
– guaranteed 2-3 Gbit/sec, with bursts of up to 3-4 Gbit/sec (3 Gbit/sec outside of term time)
– pushed the network performance for Durham GridPP from the bottom 3 in the country to the top 5 of the UK GridPP sites
– achieves up to 300-400 MByte/sec throughput to RAL when archiving, depending on file sizes
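As a rough sanity check, these figures are self-consistent:

    3 Gbit/sec ÷ 8 bits per byte ≈ 375 MByte/sec

which sits within the reported 300-400 MByte/sec archiving throughput.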

20 Collaboration between DiRAC and GridPP/RAL
● The Durham Institute for Computational Cosmology (ICC) volunteered to be the prototype installation.
● Huge thanks to Jens Jensen and Brian Davies: many emails were exchanged, many questions asked and many answers given.
● The resulting document is "Setting up a system for data archiving using FTS3" by Lydia Heck, Jens Jensen and Brian Davies.

21 Setting up the archiving tools
● Identify appropriate hardware; this could mean extra expense:
– we need the freedom to modify and experiment with it; we cannot have HPC users logged in and working!
– we must be free to apply the very latest security updates
– it requires an optimal connection to the storage: an InfiniBand card

22 Setting up the archiving tools
● Create an interface to access the file/archiving service at RAL using the GridPP tools:
– gridftp
– Globus Toolkit (also provides Globus Connect)
– trust anchors (egi-trustanchors)
– VOMS tools (emi3-xxx)
– FTS3 (CERN)
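As a minimal sketch of what the gridftp layer provides once these tools are installed and a valid proxy exists (the hostnames and paths below are placeholders, not the actual Durham or RAL endpoints):

    # Confirm a valid proxy, then copy one file over GridFTP.
    # -vb reports transfer performance, -p 4 uses four parallel streams,
    # which helps large files on a high-latency wide-area link.
    grid-proxy-info
    globus-url-copy -vb -p 4 \
        file:///cosma5/data/dp004/snapshot_000.hdf5 \
        gsiftp://gridftp.example.rl.ac.uk/dirac/durham/snapshot_000.hdf5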

23 Archiving?
● A long-lived VOMS proxy?
– myproxy-init; myproxy-logon; voms-proxy-init; fts-transfer-delegation
● How do we create a proxy and delegation that lasts weeks, even months?
– still an issue
● grid-proxy-init; fts-transfer-delegation
– grid-proxy-init -valid HH:MM
– fts-transfer-delegation -e time-in-seconds
– creates a proxy that lasts up to the certificate lifetime
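A minimal sketch of the grid-proxy-init / fts-transfer-delegation route above, assuming a week-long lifetime and a placeholder FTS3 endpoint (the real endpoint and sensible lifetimes depend on the RAL service and on the certificate itself):

    # Create a grid proxy valid for one week (168 hours); the proxy can
    # never outlive the certificate it is derived from.
    grid-proxy-init -valid 168:00

    # Delegate the credential to the FTS3 server so it can run transfers on
    # our behalf; -e requests the delegation lifetime in seconds (7 days).
    fts-transfer-delegation -s https://fts3.example.ac.uk:8446 -e 604800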

24 Archiving
● Large files – optimal throughput, limited by the network bandwidth.
● Many small files – limited by latency; we use the '-r' flag to fts-transfer-submit to re-use the connection.
● Transferred so far:
– ~40 TByte since 20 August
– ~2M files
– a challenge to the FTS service at RAL
● User education is needed on creating lots of small files.
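A minimal sketch of a bulk submission with connection re-use; the endpoint and storage URLs are placeholders, and the bulk-file option (-f) is an assumption that should be checked against the installed FTS3 client version:

    # transfers.txt: one transfer per line, "source-URL destination-URL";
    # the hostnames and paths here are placeholders.
    echo "gsiftp://cosma-gridftp.example.ac.uk/cosma5/data/dp004/part_000.dat srm://srm.example.rl.ac.uk/dirac/durham/part_000.dat" > transfers.txt
    echo "gsiftp://cosma-gridftp.example.ac.uk/cosma5/data/dp004/part_001.dat srm://srm.example.rl.ac.uk/dirac/durham/part_001.dat" >> transfers.txt

    # Submit the list as one job; -r re-uses the connection across files,
    # which is what helps with many small, latency-bound transfers.
    fts-transfer-submit -s https://fts3.example.ac.uk:8446 -r -f transfers.txt

    # The submission prints a job ID that can be polled for progress.
    fts-transfer-status -s https://fts3.example.ac.uk:8446 <job-id>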

25 Open issues
● Ownership and permissions are not preserved.
● The process depends on a single admin to carry it out.
● What happens when the content of directories changes?
– complete new archive sessions?
– it tries to archive all the files again, but then 'fails' because the files already exist; it should behave more like rsync

26 Conclusions
● With the right network speed we can archive the DiRAC data to RAL.
● The documentation has to be completed and shared with the system managers at the other DiRAC sites.
● Each DiRAC site will have its own dirac0X account.
● Start archiving, and keep on archiving.
● The collaboration between DiRAC and GridPP/RAL DOES work!
● Can we aspire to more?

