ESnet and Science DMZs: an update from the US
Tony Hey, Chief Data Scientist, Rutherford Appleton Laboratory, STFC
DOE's ESnet6 Plans
With thanks to Eli Dart, LBNL
ESnet – Connecting Facilities at Scale
Eli Dart, Science Engagement, Energy Sciences Network (ESnet), Lawrence Berkeley National Laboratory
April 13, 2018
ESnet - the basic facts
A high-speed international networking facility, optimized for data-intensive science:
- Connects 50 DOE labs, plants, and facilities with more than 150 networks, universities, and research partners globally; supports every DOE science office and serves as an integral extension of many instruments
- 400 Gbps transatlantic extension in production since December 2014
- More than 1.3 Tbps of external connectivity, including high-speed access to commercial partners such as Amazon
- A growing number of university connections to better serve LHC science (and eventually Belle II)
- Older than the commercial Internet, and growing roughly twice as fast
Areas of strategic focus: software and science engagement
- Engagement effort is now 12% of staff
- Software capability is critical to the next-generation network
ESnet is a dedicated mission network engineered to accelerate a broad range of science outcomes.
One way to think of ESnet is as exactly what it is: a network. We do this by offering unique capabilities and optimizing the network for data acquisition, data placement, data sharing, and data mobility.
Beyond the network core: The Science DMZ
- A friction-free network path (a worked buffer-sizing example follows below)
- Dedicated data transfer nodes (DTNs)
- Performance monitoring (perfSONAR)
Over 120 labs and universities in the US have deployed this ESnet architecture. NSF has invested well over $120M to accelerate adoption, and the UK, Australia, Brazil, and others are following suit.
http://fasterdata.es.net/science-dmz/
I normally give an entire talk on this idea, but will summarize it in a single slide.
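To make the friction-free path concrete, here is a small illustrative calculation, not taken from the talk, of the TCP buffer a single stream needs to keep a long, fast path full; undersized buffers and packet loss are exactly the friction a Science DMZ with tuned DTNs is meant to remove. The 100 Gbps rate and 50 ms RTT are assumed example values.

```python
# Illustrative only: bandwidth-delay product (BDP) for a long, fast path.
# A single TCP stream needs roughly BDP bytes of buffer to keep the pipe full.
# The 100 Gbps rate and 50 ms RTT are assumed example values, not figures from the talk.

def bdp_bytes(rate_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in bytes for a given rate and round-trip time."""
    return (rate_gbps * 1e9 / 8) * (rtt_ms / 1e3)

if __name__ == "__main__":
    rate, rtt = 100.0, 50.0  # assumed: 100 Gbps path, 50 ms coast-to-coast RTT
    print(f"BDP for {rate} Gbps at {rtt} ms RTT: {bdp_bytes(rate, rtt) / 2**20:.0f} MiB")
    # ~596 MiB, far larger than default TCP buffers, which is why DTNs are
    # purpose-built, tuned hosts on a clean path rather than ordinary enterprise machines.
```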
Data and HPC: The Petascale DTN Project
- Built on top of the Science DMZ
- An effort to improve data transfer performance between the DOE ASCR HPC facilities at ANL, LBNL, and ORNL, as well as NCSA
- Multiple current and future science projects need to transfer data between HPC facilities
- Performance goal of 15 gigabits per second, equivalent to 1 PB/week (see the quick check below)
- Realize the performance goal for routine Globus transfers without special tuning
- Reference data set: 4.4 TB of cosmology simulation data
- Use performant, easy-to-use tools with production options on the Globus transfer service (previously Globus Online)
- Use the GUI just as a user would, with default options, e.g. integrity checksums enabled, as they should be
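As a quick sanity check on the stated goal (not part of the original slide), the arithmetic below converts 1 PB/week into a sustained rate; 15 Gbps covers it with a little headroom.

```python
# Illustrative arithmetic: what sustained rate moves 1 PB in a week?
SECONDS_PER_WEEK = 7 * 24 * 3600   # 604,800 s
PETABYTE_BYTES = 1e15              # decimal petabyte; a binary PiB (2**50 bytes) gives ~14.9 Gbps

rate_gbps = PETABYTE_BYTES * 8 / SECONDS_PER_WEEK / 1e9
print(f"1 PB/week ~= {rate_gbps:.1f} Gbps sustained")   # ~13.2 Gbps

# So the 15 Gbps target covers 1 PB/week of routine, checksummed Globus
# transfers, with some headroom for protocol overhead and retransmits.
```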
DTN Cluster Performance – HPC Facilities (2017)
Final state at successful project conclusion
Improvements in Multiple Places
DTN clusters expanded:
- ALCF: 8 DTNs to 12
- NERSC: 4 DTNs to 8
- ORNL: 8 DTNs to 16 (for the Globus pool)
Network upgrades:
- ORNL: Nx10G to Nx40G
- NCSA: Nx10G to Nx100G
- NERSC: Nx10G to Nx40G
An unrelated Globus change gave a big performance boost: a change in the behavior of transfers with large file counts, which benefits many production transfers at the HPC facilities.
Petascale DTN Lifts All Boats
The Petascale DTN project benefits every project that uses the HPC facility DTNs.
- Example: multi-facility users. Many projects use multiple HPC facilities, so at-scale data movement between DOE HPC facilities is a huge benefit.
- Example: the Modern Research Data Portal architecture. Data portals built on the modern architecture benefit from DTN improvements, and DTN scaling and improvements benefit all data portals that share the same DTN cluster.
Legacy Portal Design
Very difficult to improve performance without architectural change:
- The software components are all tangled together
- It is difficult to put the whole portal in a Science DMZ because of security
- Even if you could put it in a DMZ, many components aren't scalable
How do we change the architecture?
The legacy portal model (sketched below):
- A web server with filesystem-level access to the data store
- CGI, Perl/Python, database, etc. for the portal logic, all running on the web server
- Evolved in place over roughly 15 years
We need to work smarter, not harder: re-think the architecture.
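To make the bottleneck concrete, here is a minimal, hypothetical sketch of the legacy pattern described above: every byte is read from the portal host's own filesystem and streamed back through the same process that runs the portal logic. The server, data path, and handler names are invented for illustration, and there is no authentication or path sanitization.

```python
# Hypothetical legacy-portal handler: data and portal logic share one web server.
# Every download is streamed through this process from its local filesystem,
# so throughput is capped by the portal host, not by the site's network.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

DATA_ROOT = "/var/portal/data"   # assumed local data store on the portal host

class LegacyPortalHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = os.path.join(DATA_ROOT, self.path.lstrip("/"))
        if not os.path.isfile(path):
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(os.path.getsize(path)))
        self.end_headers()
        with open(path, "rb") as f:          # the portal process moves every byte
            while chunk := f.read(1 << 20):
                self.wfile.write(chunk)

if __name__ == "__main__":
    HTTPServer(("", 8080), LegacyPortalHandler).serve_forever()
```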
Next-Generation Portal Leverages the Science DMZ
The portal logic still runs in the enterprise network. However, instead of handing out pointers to data objects stored within the portal server, the portal hands out pointers to data objects stored on the big filesystem and accessible via the DTNs (see the sketch below).
https://peerj.com/articles/cs-144/
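By contrast, here is a minimal, hypothetical sketch of the next-generation pattern: the portal answers requests with metadata and a pointer (an endpoint ID plus a path on the DTN-accessible filesystem), and the bytes themselves move DTN-to-DTN across the Science DMZ, for example via a transfer service such as Globus, never passing through the portal host. The endpoint ID, catalog entries, and paths are made up for illustration.

```python
# Hypothetical modern-portal handler: the portal serves metadata and pointers,
# not bytes. Data lives on a DTN-accessible filesystem; transfers run over the
# Science DMZ via a transfer service, bypassing this host entirely.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

DTN_ENDPOINT_ID = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"  # made-up endpoint UUID
CATALOG = {  # made-up catalog: dataset name -> path on the big filesystem
    "cosmology-run-042": "/project/cosmo/run042/",
    "climate-ensemble-7": "/project/climate/ens7/",
}

class ModernPortalHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        dataset = self.path.lstrip("/")
        if dataset not in CATALOG:
            self.send_error(404)
            return
        # Hand back a pointer; the client asks the transfer service to move the
        # data DTN-to-DTN, so the portal never touches the data path.
        pointer = {"endpoint": DTN_ENDPOINT_ID, "path": CATALOG[dataset]}
        body = json.dumps(pointer).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), ModernPortalHandler).serve_forever()
```

Because the data path is now decoupled from the portal host, scaling the DTN cluster (as in the Petascale DTN project) speeds up every portal that shares it, which is the "lifts all boats" point above.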
NSF's Campus Cyberinfrastructure (CC*) Plans
With thanks to Kevin Thompson, NSF
Networking Programs in CISE/OAC
Cognizant PO: Kevin Thompson, kthompso@nsf.gov
Networking is a fundamental layer and underpinning of CI.
CC* (Campus Cyberinfrastructure):
- Campus networking upgrade (re-design to a Science DMZ at the campus border, 10/100 Gbps) and innovation program
IRNC (International R&E Network Connections):
- Scientific discovery as a global collaborative endeavor
- Provides network connections linking U.S. research with peer networks in other parts of the world: 100 Gbps links and software-defined exchanges
- Supports all U.S. R&E data flows (not just NSF-funded)
- Includes performance flow measurement, monitoring, and training
Office of Advanced Cyberinfrastructure, Supercomputing 2017 Birds-of-a-Feather, November 15, 2017
CC* - Campus Cyberinfrastructure
NSF 18-508; proposals due January 30, 2018.
CC* awards will be supported in four program areas:
1. Data-Driven Networking Infrastructure for the Campus and Researcher: awards of up to $500,000 total for up to 2 years
2. Network Design and Implementation for Small Institutions: awards of up to $750,000 total for up to 2 years
3. Network Integration and Applied Innovation: awards of up to $1,000,000 total for up to 2 years
4. Network Performance Engineering and Outreach: awards of up to $3,500,000 total for up to 4 years [NEW for 2018]
CC* Program Awards 2012-2017: ~225 awards
2017 OAC Networking Program Examples – Broadening Participation
- Guam: CC* award to the University of Guam and an IRNC-supported open exchange point. Award #1659182, "CC* Network Design: Upgrading the University of Guam Network to Connect to Internet2 and Create a Science DMZ"
- Tribal Colleges and Universities CI Study: award #1655185, "TCU Cyberinfrastructure Initiative: A Study of Tribal College and University Cyberinfrastructure and Supported STEM Programs", PI: Al Kuslikis, AIHEC (37 TCUs)