High Speed Networking: What has worked, and what hasn’t
Eli Dart, Network Engineer, ESnet Science Engagement, Lawrence Berkeley National Laboratory
National Research Platform Workshop, Bozeman, MT, August 8, 2017
Disclaimer, Context
- These are my opinions, told from my perspective
- Others have their own experience – I continue to learn from others, and I value others’ views of history and of current practice
- I’m not trying to trash anybody’s stuff
- It’s important for us to examine what has worked and what hasn’t
  - A clear-eyed understanding of flaws is as important as a clear-eyed understanding of successes
- My goal here is to contribute to the community’s understanding so we can be smart about next steps
- I’m going to stir a few pots to get the discussion going
Outline
- High Performance Everywhere
- Science DMZ
- Virtual Circuits
- perfSONAR
- Engagement
High Performance Everywhere
- Back in the ’90s and 2000s, this is what we all tried to do
- Basic idea: any host should be able to get good end-to-end performance to any other host
  - An NxN mesh of high performance
- Lots of attempts to educate users about TCP
- Lots of effort spent deploying jumbo frames
- Lots of effort spent tuning systems (see the sketch after this list)
  - This was before kernel autotuning, so the configuration was box-global
  - Really hard to get large-scale infrastructure providers to change settings
  - And for good reason – stuff wasn’t stable
  - Example: AIX panicked when SACK was turned on – this was in 2001 or 2002
- Lots of effort spent on tools
  - High-performance tools (e.g. GridFTP)
  - Non-standard tools (remember Tsunami?)
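To make the flavor of that era concrete, here is a minimal Python sketch of the manual tuning applications had to do before kernel autotuning: sizing socket buffers to the bandwidth-delay product by hand. The 10 Gbps rate, 50 ms RTT, hostname, and port are assumptions for illustration, not values from the talk.

```python
import socket

# Bandwidth-delay product: a 10 Gbps path with a 50 ms RTT needs roughly
# 62.5 MB of buffer to keep the pipe full (both numbers are assumed).
BANDWIDTH_BPS = 10 * 10**9   # 10 Gbps (assumed)
RTT_SECONDS = 0.050          # 50 ms round-trip time (assumed)
bdp_bytes = int(BANDWIDTH_BPS / 8 * RTT_SECONDS)  # ~62,500,000 bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Buffers must be set before connect() so that the window scaling
# negotiated in the TCP handshake can accommodate them. The kernel also
# silently caps these requests at net.core.rmem_max / wmem_max, which is
# why the box-global settings mentioned above had to be raised too.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)

sock.connect(('dtn.example.org', 5000))  # hypothetical data transfer node
```

Kernel autotuning later made this per-application ceremony largely unnecessary, which is part of why it features among the successes on the next slide.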
High Performance Everywhere (2)
- High Performance Everywhere was mostly a failure
  - Networks became commodity, and subject to cost pressure
  - Security requirements increased – big perimeter firewalls, etc.
  - Big networks tended to have architectural problems from a performance perspective in most places
- Not all bad – several good things came from this era and this effort
  - Kernel autotuning – this is very important
  - Experimentation with, and investment in, high-performance tools
    - Grid/Globus had enormous impact, and still does
  - New TCP implementations – also very important (see the sketch after this list)
- However, as a performance philosophy, High Performance Everywhere did not achieve its goals
  - The problem was too big
  - Constantly thwarted by cheap switches, firewalls, etc.
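As one concrete example of those newer TCP implementations, Linux lets a program select a congestion control algorithm per socket. A minimal sketch, assuming a Linux host with Python 3.6+ and the named algorithm available in the kernel; ‘bbr’ here is my assumption for the example, not a recommendation from the talk.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Request a specific congestion control algorithm for this socket.
# Linux-only; the algorithm must be available in the kernel (see
# /proc/sys/net/ipv4/tcp_available_congestion_control).
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b'bbr')

# Read it back to confirm which algorithm is actually in effect.
algo = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print(algo.rstrip(b'\x00').decode())
```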
Science DMZ
- In many ways, the Science DMZ was a response to High Performance Everywhere
  - Instead of trying to boil the ocean, concentrate on a specific location
  - Easy to deploy, easy to scale
- Lots of effort spent on evangelizing
  - Huge effort by NSF – years of work, over $100M
  - Huge effort by many campuses and researchers – the community owns this now
  - Huge effort spent on education and training – e.g. OIN workshops
- All that combined effort really paid off
  - Lots of success in this space
  - So far, much of that success appears durable
- One key takeaway – we can do high-performance TCP
  - Over routed Layer 3 networks, no less!
  - In production, at scale
Science DMZ (2)
- The Science DMZ also has problems
- The security story is complex, and out of step with checkbox-bandit best practice
  - Many shops don’t have the expertise to run a sophisticated IDS
  - Even if they have the expertise, often they don’t have the time
  - Some shops have tens of FTEs on enterprise applications and six people on networking – and the security people won’t touch anything that isn’t a firewall
    - While this seems bizarre, it’s something we have to deal with
- Many shops simply won’t deploy anything outside their firewall
  - Nobody is willing to sign off on the risk (often because they don’t have people who can quantify or mitigate it)
  - The monetary and reputational risk is high enough that they just won’t go there
    - E.g. HIPAA shops
Science DMZ (3)
- More Science DMZ problems
- User interaction can be challenging
  - Often people aren’t asking for a Science DMZ
    - They don’t know that things could be better
    - Science DMZs are not always easily discoverable
  - Prominent researchers at universities often don’t trust IT, for various reasons
    - They have their own stuff, run by their own people, whom they do trust
    - A DMZ run by campus IT doesn’t matter to these people, even though they would benefit and could be important advocates (see Engagement)
- Users need higher-level tools
  - File transfer often isn’t enough (necessary but not sufficient)
  - Orchestration can use the tools in the Science DMZ – but someone has to help the scientist with the integration (see Engagement)
- Still more work to do…
Virtual Circuits
- Lots of stuff in this space – I’ll talk about three aspects: Ethernet VLANs, OSCARS, and SDN
- Ethernet VLANs make the devices link-local to each other
  - My DTN has an ARP entry for your DTN, and vice versa (see the sketch after this list)
- Ethernet VLANs – the good:
  - Easy to understand and reason about conceptually (basic LAN stuff)
  - Inexpensive to deploy on waves (you just need switches)
- Ethernet VLANs – the bad:
  - Very difficult to troubleshoot
  - Very human-intensive to deploy
  - Scientists can’t do this themselves (they have neither the skills nor the permission)
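To make the link-local idea concrete, here is a minimal sketch of one end of such a circuit on a Linux DTN, using the pyroute2 library. The parent interface eth0, VLAN ID 3001, and the 10.10.10.0/30 point-to-point subnet are all assumptions for illustration; the far-end DTN would configure 10.10.10.2/30 on the same VLAN.

```python
# Minimal sketch: attach a DTN to a VLAN-based circuit so that the
# far-end DTN is one ARP hop away. Requires root and pyroute2.
from pyroute2 import IPRoute

ipr = IPRoute()

# Parent interface and VLAN ID are assumptions for this example.
parent = ipr.link_lookup(ifname='eth0')[0]
ipr.link('add', ifname='eth0.3001', kind='vlan', link=parent, vlan_id=3001)

vlan = ipr.link_lookup(ifname='eth0.3001')[0]
ipr.link('set', index=vlan, state='up')

# A /30 has exactly two usable addresses: this DTN and the remote one.
# Once both ends are up, each DTN holds a direct ARP entry for the other.
ipr.addr('add', index=vlan, address='10.10.10.1', prefixlen=30)
```

Note that nothing here is scientist-friendly: the tag and addresses have to be agreed out of band, and every device along the wave has to carry the same VLAN, which is exactly the human-intensive part.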
Virtual Circuits (2)
- OSCARS: scheduled virtual circuits with isolation, bandwidth, and service guarantees
  - The interdomain demarc is still a VLAN tag (see the sketch after this list)
- OSCARS – the good:
  - Isolation and QoS are hugely powerful
  - Enables services that routed Layer 3 networks cannot provide
  - Used routinely in production (ESnet, Internet2, others)
- OSCARS – the bad:
  - Attempts at making OSCARS a user-facing service have largely failed
    - Most scientists don’t want to reason about the network
    - Most campuses aren’t interested in letting scientists reconfigure networks
  - Interdomain automation is still a significant challenge due to lack of universal deployment
    - This impacts the ability to provision end to end without manual intervention
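To show what “scheduled circuits with guarantees” means at the level of data, here is a hypothetical sketch of the information such a reservation carries. This is not the actual OSCARS API or schema; the endpoint names, VLAN tags, and field names are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CircuitReservation:
    """Hypothetical reservation record; not the real OSCARS schema."""
    src_endpoint: str      # e.g. an edge port identifier (name invented)
    dst_endpoint: str
    src_vlan: int          # the interdomain demarc is still a VLAN tag
    dst_vlan: int
    bandwidth_mbps: int    # guaranteed bandwidth for the circuit
    start: datetime        # circuits are scheduled, not permanent
    end: datetime

# Invented example values throughout.
resv = CircuitReservation(
    src_endpoint='esnet:lbl-edge:1/1',
    dst_endpoint='esnet:anl-edge:2/1',
    src_vlan=3001,
    dst_vlan=3120,         # tags can differ at each end of the path
    bandwidth_mbps=5000,
    start=datetime(2017, 8, 9, 2, 0),
    end=datetime(2017, 8, 9, 6, 0),
)
```

The point of the sketch is the shape of the request: endpoints, tags, bandwidth, and a time window; everything on that list is something a routed Layer 3 network cannot promise.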
Virtual Circuits (3)
- Virtual circuits in general – the good:
  - Provide services that are not available in routed Layer 3 networks
    - Bandwidth, service, and path guarantees
    - Isolation
  - Provide a way to virtualize networks
  - Huge promise in the SDN space
- Virtual circuits in general – the bad:
  - Inappropriate for exposure to domain scientists (this is important)
  - We don’t have a good way for production hosts to consume virtual circuits
    - Most network applications only reason about the network via the sockets API (see the sketch after this list)
  - Still a large effort to set up interdomain virtual circuits
- SDN
  - Lots of promise, lots of hype
  - Used in production in some networks
  - Still an active area of research
    - People are working on solving a lot of the problems
    - The SENSE project, and many others
    - Harvey’s keynote described science needs and the research trajectory
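The sockets point is worth making concrete: everything a typical application can say about the network fits in a few lines, and none of it mentions paths, VLANs, bandwidth, or schedules. A minimal sketch; the hostname, port, and request bytes are assumptions for illustration.

```python
import socket

# The entire network "vocabulary" of a typical application: an address,
# a port, and a byte stream. There is nowhere to ask for a path, a VLAN,
# a bandwidth guarantee, or a time window, which is why virtual circuits
# are hard for production hosts to consume.
sock = socket.create_connection(('dtn.example.org', 5000))  # hypothetical host
sock.sendall(b'GET /dataset HTTP/1.0\r\n\r\n')              # hypothetical request
reply = sock.recv(4096)
sock.close()
```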
perfSONAR
- perfSONAR – the good:
  - Widely deployed – wide enough that network effects are real
  - A stable software package, with development support from key organizations
  - Provides the capabilities necessary for keeping science networks running well (see the sketch after this list)
- perfSONAR – the bad:
  - Multiple platforms never interoperated well
    - No longer a problem, but it was for a time
    - The lesson: interoperability must be seamless in order for it to be real
  - Interaction with security is challenging – there is no fix for this; we just have to deal with it
  - Difficulty with sensor calibration and test repeatability – it is hard to measure networks the same way a scientist measures nature
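For readers who have not used it, this is roughly what driving a perfSONAR node from a script looks like. The sketch assumes both hosts run the perfSONAR toolkit (which provides the pscheduler command); the destination hostname is hypothetical.

```python
import subprocess

# Run a one-off throughput test from this perfSONAR node to a remote one.
# 'pscheduler task throughput --dest <host>' is the perfSONAR toolkit's
# CLI; the destination hostname is an assumption for this example.
result = subprocess.run(
    ['pscheduler', 'task', 'throughput', '--dest', 'ps.example.edu'],
    capture_output=True,
    text=True,
)
print(result.stdout)
```

In practice, regular scheduled testing is configured on the node itself rather than run ad hoc; the one-off form above just shows the capability.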
Engagement
- Critical for helping scientists transition to modern cyberinfrastructure
- Critical for helping establish and promote CI best practice
- Engagement – the good:
  - Demonstrated success in many environments
    - Performance improved
    - Workflows transformed
  - Building a community and a profession for engagement engineers
  - Models for requirements collection
- Engagement – the bad:
  - People scale terribly
    - We need to collaborate on best practices for scaling engagement engineers
  - Engagement needs a sustainable career path within institutions
    - People need to feel safe investing in this career
Discussion Framing
- There are many things we could talk about
- I discussed some topics: High Performance Everywhere, Science DMZ, Virtual Circuits, perfSONAR, Engagement
  - These are not the only topics we could talk about
- We have a lot of accumulated experience in this community, from different organizations with different capabilities
- What do we need to get a National Research Platform off the ground?
OK – Throw Fruit!
Eli Dart, Energy Sciences Network (ESnet), Lawrence Berkeley National Laboratory