Experience of Data Transfer to the Tier-1 from a DIRAC Perspective
Lydia Heck, Institute for Computational Cosmology, Manager of the DiRAC-2 Data Centric Facility COSMA
UK-T0 Workshop, 20-21 October 2015

Talk layout
● Introduction to DiRAC
● The DiRAC computing systems
● What is DiRAC?
● What type of science is done on the DiRAC facility?
● Why do we need to copy data to RAL?
● Copying data to RAL – network requirements
● Collaboration between DiRAC and RAL to produce the archive
● Setting up the archiving tools
● Archiving
● Open issues
● Conclusions

Introduction to DiRAC
● DiRAC – Distributed Research utilising Advanced Computing – established in 2009 with DiRAC-1
● Supports research in theoretical astronomy, particle physics and nuclear physics
● Funded by STFC with infrastructure money allocated from the Department for Business, Innovation and Skills (BIS)
● The running costs, such as staff costs and electricity, are funded by STFC

Introduction to DiRAC, cont'd
● 2009 – DiRAC-1 – 8 installations across the UK, of which COSMA-4 at the ICC in Durham is one; still a loose federation
● 2011/2012 – DiRAC-2 – major funding of £15M for e-Infrastructure
  – bidding to host: 5 installations identified, judged by peers
  – successful bidders faced scrutiny and interview by representatives of BIS to see if we could deliver by a tight deadline

Introduction to DiRAC, cont'd
● DiRAC has a full management structure.
● Computing time on the DiRAC facility is allocated through a peer-reviewed procedure.
● Current director: Dr Jeremy Yates, UCL
● Current technical director: Prof Peter Boyle, Edinburgh

The DiRAC computing systems
● Blue Gene – Edinburgh
● Cosmos – Cambridge
● Complexity – Leicester
● Data Centric – Durham
● Data Analytic – Cambridge

The DiRAC Blue Gene
● Edinburgh – IBM Blue Gene
  – cores
  – 1 Pbyte of GPFS storage
  – designed around (Lattice) QCD applications

DiRAC (Data Centric)
● Durham – Data Centric system – IBM iDataPlex
  – 6720 Intel Sandy Bridge cores
  – 53.8 TB of RAM
  – FDR10 InfiniBand, 2:1 blocking
  – 2.5 Pbyte of GPFS storage (2.2 Pbyte used!)

DiRAC Leicester Complexity
● HP system
  – 4352 Intel Sandy Bridge cores
  – 30 Tbyte of RAM
  – FDR InfiniBand, 1:1 non-blocking
  – 0.8 Pbyte of Panasas storage

DiRAC (SMP)
● Cambridge COSMOS
● SGI shared-memory system
  – 1856 Intel Sandy Bridge cores
  – 31 Intel Xeon Phi co-processors
  – 14.8 Tbyte of RAM
  – 146 Tbyte of storage

DiRAC (Data Analytic)
● Cambridge Data Analytic – Dell
  – 4800 Intel Sandy Bridge cores
  – 19.2 TByte of RAM
  – FDR InfiniBand, 1:1 non-blocking
  – 0.75 PB of Lustre storage

What is DiRAC
● A national service run/managed/allocated by the scientists who do the science, funded by BIS and STFC
● The systems are built around and for the applications with which the science is done.
● We do not rival a facility like ARCHER, as we do not aspire to run a general national service.
● DiRAC is classed by STFC as a major research facility, on a par with the big telescopes.

What is DiRAC, cont'd
● Long projects, with a significant number of CPU hours allocated for typically 3 years on a specific system – examples for 2012–2015:
  – Cosmos – dp002: ~20M CPU hours on Cambridge Cosmos
  – Virgo – dp004: 63M CPU hours on Durham DC
  – UK-MHD – dp010: 40.5M CPU hours on Durham DC
  – UK-QCD – dp008: ~700M CPU hours on Edinburgh BG
  – Exeter – dp005: ~15M CPU hours on Leicester Complexity
  – HPQCD – dp019: ~20M CPU hours on Cambridge Data Analytic

What type of science is done on DiRAC?
● For the highlights of science carried out on the DiRAC facility please see:
● Specific example: large-scale structure calculations with the Eagle run
  – 4096 cores
  – ~8 GB RAM/core
  – 47 days = 4,620,288 CPU hours
  – 200 TB of data
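
A quick check of the CPU-hour figure quoted above (the arithmetic is added here; it was not spelled out on the original slide):

  4096 cores × 47 days × 24 hours/day = 4,620,288 core-hours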

Why do we need to copy data (to RAL)?
● Original plan: each research project should make provisions for storing its research data
  – requires additional storage resources at researchers' home institutions
  – not enough provision – would require additional funds
  – data creation considerably above expectation
  – if disaster struck, many CPU hours of calculations would be lost

Why do we need to copy data (to RAL)?
● Research data must now be shared with/made available to interested parties
● Installing DiRAC's own archive requires funds, and currently there is no budget
● We needed to get started:
  – Jeremy Yates negotiated access to the RAL archive system
● Acquire expertise
● Identify bottlenecks and technical challenges
  – submitting 2,000,000 files created an issue at the file servers
● How can we collaborate and make use of previous experience?
● AND: copy data!

Copying data to RAL – network requirements
● Network bandwidth – situation for Durham – now:
  – currently possible Mbytes/sec
  – required investment and collaboration from DU CIS
  – upgrade to 6 Gbit/sec to JANET – September 2014
  – will be 10 Gbit/sec by end of 2015 – infrastructure already installed
● Past:
  – identified Durham-related bottlenecks – FIREWALL

Copying data to RAL – network requirements
● Network bandwidth – situation for Durham
● Investment to bypass the external campus firewall:
  – two new routers (~£80k), configured for throughput with a minimal ACL, enough to safeguard the site
  – deploying internal firewalls – part of the new security infrastructure, essential for such a venture
  – security now relies on the front-end systems of Durham DiRAC and Durham GridPP

Copying data to RAL – network requirements
● Result for COSMA and GridPP in Durham: a guaranteed 2-3 Gbit/sec, with bursts of up to 3-4 Gbit/sec (3 Gbit/sec outside of term time)
  – pushed the network performance for Durham GridPP from the bottom 3 in the country to the top 5 of the UK GridPP sites
  – achieves up to 300-400 Mbyte/sec throughput to RAL when archiving, depending on file sizes
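
As a rough sanity check (added here, not part of the original slide), the quoted link speeds and the achieved archive throughput are consistent:

  3 Gbit/sec ÷ 8 bits/byte ≈ 375 Mbyte/sec theoretical peak

so a sustained 300-400 Mbyte/sec to RAL is close to line rate for the guaranteed 2-3 Gbit/sec share, with the upper end relying on the 3-4 Gbit/sec bursts.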

Collaboration between DiRAC and GridPP/RAL
● Durham Institute for Computational Cosmology (ICC) volunteered to be the prototype installation
● Huge thanks to Jens Jensen and Brian Davies – there were many e-mails exchanged, many questions asked and many answers given
● Resulting document: "Setting up a system for data archiving using FTS3" by Lydia Heck, Jens Jensen and Brian Davies

Setting up the archiving tools
● Identify appropriate hardware – could mean extra expense:
  – need the freedom to modify and experiment with it – cannot have HPC users logged in and working!
  – free to apply the very latest security updates
  – requires an optimal connection to the storage – InfiniBand card

Setting up the archiving tools
● Create an interface to access the file/archiving service at RAL using the GridPP tools:
  – gridftp – Globus Toolkit – also provides Globus Connect
  – trust anchors (egi-trustanchors)
  – voms tools (emi3-xxx)
  – fts3 (CERN)
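
As a minimal sketch of what this tool stack provides on the archive node (the checks below only verify that each component is installed; anything not named on the slide, such as the certificate directory path, is an assumption):

  # Check that each component of the client stack is on the PATH.
  which globus-url-copy        # GridFTP client shipped with the Globus Toolkit
  which voms-proxy-init        # VOMS client tools (EMI3)
  which fts-transfer-submit    # FTS3 command-line client (CERN)
  # CA certificates installed from the egi-trustanchors repository:
  ls /etc/grid-security/certificates | head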

Archiving?
● A long-lived VOMS proxy?
  – myproxy-init; myproxy-logon; voms-proxy-init; fts-transfer-delegation
● How do we create a proxy and delegation that lasts weeks, even months?
  – still an issue
● grid-proxy-init; fts-transfer-delegation
  – grid-proxy-init -valid HH:MM
  – fts-transfer-delegation -e time-in-seconds
  – creates a proxy that lasts up to the certificate lifetime
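
Spelled out as commands, the working recipe on this slide looks roughly like the sketch below. The one-week lifetime, the delegation period and the FTS3 endpoint are illustrative values, and the endpoint option (-s) is an assumption rather than something stated in the talk:

  # Create a plain Grid proxy valid for up to one week
  # (the effective lifetime is capped by the underlying certificate).
  grid-proxy-init -valid 168:00

  # Delegate the credential to the FTS3 service for 7 days (604800 seconds);
  # the endpoint below is a placeholder, not the real RAL FTS3 server.
  fts-transfer-delegation -s https://fts3.example.ac.uk:8446 -e 604800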

Archiving
● Large files – optimal throughput, limited by network bandwidth
● Many small files – limited by latency; using the '-r' flag to fts-transfer-submit to re-use the connection
● Transferred so far:
  – ~40 Tbytes since 20 August
  – ~2M files – a challenge to the FTS service at RAL
● User education on creating lots of small files
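
For illustration, a single submission with connection re-use might look like the following; the FTS3 endpoint and the source/destination URLs are placeholders, not the real Durham or RAL addresses:

  # Submit one file to the archive, re-using the connection (-r) so that
  # batches of small files are not dominated by per-transfer latency.
  fts-transfer-submit -s https://fts3.example.ac.uk:8446 -r \
      gsiftp://cosma-gridftp.example.ac.uk/cosma/data/run001/snapshot_000.hdf5 \
      srm://srm-archive.example.ac.uk/archive/dirac/run001/snapshot_000.hdf5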

Open issues
● Ownership and permissions are not preserved
● Depends on a single admin to carry out the archiving
● What happens when the content of directories changes?
  – complete new archive sessions?
  – it tries to archive all the files again, but then 'fails' because the files already exist
  – should behave more like rsync

Conclusions
● With the right network speed we can archive the DiRAC data to RAL.
● The documentation has to be completed and shared with the system managers at the other DiRAC sites.
● Each DiRAC site will have its own dirac0X account.
● Start archiving, and keep on archiving.
● Collaboration between DiRAC and GridPP/RAL DOES work!
● Can we aspire to more?