Review of Terabit/sec SDN demonstrations at Supercomputing 2016 and plans for SC17
Azher Mughal, Caltech
UCSD PRP Conference (2/21/2017)
SC16 – The Demonstration Goals
SDN Traffic Flows
- The network should be controlled solely by the SDN application, relying mostly on the northbound interface for network visibility and troubleshooting (i.e. an operator panel)
- Install flows among a pair of DTN nodes (a flow-install sketch follows below)
- Re-engineer flows onto alternate routes across the ring (shortest path, or the path with more available bandwidth)
High Speed DTN Transfers
- 100G to 100G (network to network)
- 100G to 100G (disk to disk)
- NVMe over Fabrics at 100G
- 1 Tbps to 1 Tbps (network to network using RoCE, direct NIC access, or TCP)
ODL (PhEDEx + ALTO)
- Substantially extended OpenDaylight controller using a unified multilevel control-plane programming framework to drive the new network paradigm
- Advanced integration functions with the data management applications of the CMS experiment
Plan for the Supercomputing 2017 Conference
- 200G network interfaces
- NVMe over Fabrics
- LHC CMS use-case demonstrations on intelligent networks
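As an illustration of the flow-install goal, the sketch below pushes one IPv4 forwarding rule between a pair of DTN nodes through OpenDaylight's RESTCONF northbound interface. This is a minimal sketch rather than the demonstration's actual controller code: the controller address, credentials, node id, port number, and DTN addresses are placeholder assumptions, and the flow body follows the OpenFlow plugin's inventory model as commonly documented for the Lithium/Boron releases.

```python
# Minimal sketch: install one OpenFlow rule between a pair of DTN nodes via
# OpenDaylight's RESTCONF northbound (config datastore). Controller URL,
# credentials, node/port ids and DTN IPs are placeholders.
import requests

ODL = "http://odl-controller:8181/restconf/config"
AUTH = ("admin", "admin")                        # assumed default ODL credentials
NODE, TABLE, FLOW_ID = "openflow:1", 0, "dtn-pair-1"

flow = {
    "flow": [{
        "id": FLOW_ID,
        "table_id": TABLE,
        "priority": 200,
        "match": {
            "ethernet-match": {"ethernet-type": {"type": 2048}},   # IPv4
            "ipv4-source": "10.1.1.10/32",        # DTN A (placeholder)
            "ipv4-destination": "10.1.2.20/32",   # DTN B (placeholder)
        },
        "instructions": {"instruction": [{
            "order": 0,
            "apply-actions": {"action": [{
                "order": 0,
                "output-action": {"output-node-connector": "3"}    # port toward DTN B
            }]},
        }]},
    }]
}

url = f"{ODL}/opendaylight-inventory:nodes/node/{NODE}/table/{TABLE}/flow/{FLOW_ID}"
resp = requests.put(url, json=flow, auth=AUTH,
                    headers={"Content-Type": "application/json"})
resp.raise_for_status()
print(f"Installed flow {FLOW_ID} on {NODE}: HTTP {resp.status_code}")
```

Re-engineering a flow onto an alternate route then amounts to overwriting the same flow ids along the new path with different output ports.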
The Power of Collaboration: connections spread across 4 time zones
Enhanced Infrastructure using SCinet OTN-monitored Connections
All connections to the booths run through the OTN.
- Metro DCI connections
- 1 Tbps booths: Caltech, StarLight, SCinet
- 100GE booths: UMich, Vanderbilt, UCSD, Dell, 2CRSi, Ethernet Alliance
- Connections total: 5 x WAN, 7 x dark fiber, 2 x 1 Tbps
Bandwidth explosions by Caltech at SC
- SC05 (Seattle): ~155 Gbps
- SC11 (Seattle): ~100 Gbps
- SC12 (Salt Lake City): ~350 Gbps
- SC13 (Denver): ~800 Gbps
- SC14 (Louisiana): ~1.5 Tbps
- SC15 (Austin): ~500 Gbps
- SC16 (Salt Lake City): ~2 Tbps
From 10G connections in the early years to multiple 100G connections; the SC16 setup was fully SDN enabled.
SC16 across CENIC
Pacific Research Platform (built on top of the CENIC network backbone). Energize the science teams to take advantage of the high-speed networks in place.
Caltech Booth Network Layout
100GE switches:
- 3 x Dell Z9100 (OpenFlow fabric)
- 1 x Arista 7280
- 1 x Arista 7060
- 1 x Inventec (OpenFlow fabric)
- 3 x Mellanox SN (OpenFlow fabric)
- 1 x Spirent tester
NICs, cables & optics:
- 50 x 100GE Mellanox NICs/cables
- 2 x 25GE NICs
- 50 x LR4/CLR4 optics
Caltech Booths 2437, 2537: Multiple Projects
- SDN-NGenIA
- Super SDN Paradigm
- ExaO LHC Orchestrator
- Complex Flows
- Machine Learning
- LHC Data Traversal
- Immersive VR
Rack layout (Caltech Booth 2437):
- Fiber patch panel
- Spirent network tester
- Infinera Cloud Xpress (DCI to Booth 2611)
- NVMe over Fabrics servers
- VM server for various demonstrations
- Coriant (WAN DCI)
- 1 Tbps RDMA server
- Dell R930 for PRP demonstrations
- SM SC5 for KISTI
- Switches: 1 x Arista 7280, 2 x Dell Z9100, 1 x Arista 7060, 1 x Mellanox SN2700, 1 x Cisco NCS 1002, plus an OpenFlow Dell Z9100 switch
- PhEDEx / ExaO servers
- HGST storage node
Rack layout (Caltech Booth 2537):
- 4 x SM blades for MAPLE/FAST IDE
- Rack management server
- Cisco NCS 1002 (5 x 100G links)
- Switches: 1 x Dell Z9100, 1 x Inventec, 2 x Pica8 3920, 2 x Dell S4810
- SM GPU server (GTX 1080)
OpenDaylight & Caltech SDN Initiatives
OpenDaylight support:
- Northbound and southbound interfaces
- Starting with Lithium, intelligent services such as ALTO and SPCE
- OVSDB for Open vSwitch configuration, including the northbound interface
- NetIDE: a rapid application development platform for OpenDaylight that also lets modules written for other controllers be re-used in OpenDaylight
Caltech initiatives:
- OLiMPs – FloodLight: link-layer multipath switching
- OLiMPs – ODL: migrated to Hydrogen
- OFNG – ODL: northbound libraries for Helium/Lithium
- OFNG – ALTO: high-level application integration
OpenDaylight release timeline: Start (2013), Hydrogen (2014/2), Helium (2014/9), Lithium (2015/6), Beryllium (2016/2), Boron (2016/9)
Actual SDN Topology used for larger demonstrations
- Intelligent traffic engineering
- Host / node discovery
- Live OpenFlow statistics
- Can provision end-to-end paths (a path-provisioning sketch follows below):
  - Layer 2 / Layer 2 edge
  - Provider port to LAN
  - Edge port to edge port (tunnel)
  - Local flows in a node
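To make the path-provisioning step concrete, the sketch below picks a route around a small ring, either the shortest path or the one with the most available bandwidth, and turns it into per-switch (in_port, out_port) entries that a controller would install as OpenFlow rules. The ring topology, port numbers, and capacities are invented for illustration and do not reproduce the actual SC16 topology.

```python
# Sketch: pick a route around a ring (shortest, or with most available
# bandwidth) and emit per-switch (in_port, out_port) flow entries.
# Topology, port numbers and capacities are illustrative only.
import networkx as nx

ring = nx.Graph()
# (switch_a, switch_b, available Gbps, port on a, port on b) -- placeholders
links = [
    ("caltech", "starlight", 1000, 1, 1),
    ("starlight", "scinet",   100, 2, 1),
    ("scinet",   "umich",     100, 2, 1),
    ("umich",    "caltech",   400, 2, 2),
]
for a, b, cap, pa, pb in links:
    ring.add_edge(a, b, capacity=cap, ports={a: pa, b: pb})

def provision(src, dst, edge_port_in, edge_port_out, prefer_bandwidth=False):
    """Return a list of (switch, in_port, out_port) hops for src -> dst."""
    weight = (lambda u, v, d: 1.0 / d["capacity"]) if prefer_bandwidth else None
    path = nx.shortest_path(ring, src, dst, weight=weight)
    hops, in_port = [], edge_port_in
    for here, nxt in zip(path, path[1:]):
        out_port = ring[here][nxt]["ports"][here]
        hops.append((here, in_port, out_port))
        in_port = ring[here][nxt]["ports"][nxt]
    hops.append((dst, in_port, edge_port_out))
    return hops

# Shortest path vs. the route with more headroom:
for hop in provision("caltech", "scinet", 10, 20, prefer_bandwidth=True):
    print(hop)  # each tuple becomes one OpenFlow entry on that switch
```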
1 Tbps Booth-to-Booth Network Transfer
System/Network Architecture
SuperMicro Server Design (a GPU chassis): SYS-4028GR-TR2
Expansion board: X9DRG-O-PCI-E (full x16 version)
SuperMicro SYS-4028GR-TR(T2) PCIe Lane Routing
A single CPU drives two PCIe x16 buses, each split among 5 slots, hosting the 10 ConnectX-4 NICs in the server. The effective data rate across CPU-0 is therefore about 256 Gbps full duplex (see the calculation below).
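To see where the ~256 Gbps figure comes from, the short calculation below works out the usable PCIe Gen 3.0 bandwidth of the two x16 buses feeding CPU-0 and compares it with the aggregate line rate of the ten 100GE NICs. It is a back-of-the-envelope sketch that only accounts for 128b/130b encoding, ignoring other protocol overheads.

```python
# Back-of-the-envelope PCIe Gen 3.0 bandwidth for the SYS-4028GR lane layout.
# Assumes only 128b/130b encoding overhead; ignores TLP/flow-control overheads.
GT_PER_LANE_GEN3 = 8.0            # GT/s per lane
ENCODING = 128.0 / 130.0          # 128b/130b line code
LANES = 16                        # one x16 bus
BUSES_PER_CPU = 2                 # two x16 buses, each split among 5 slots

per_lane_gbps = GT_PER_LANE_GEN3 * ENCODING          # ~7.88 Gbps per direction
per_bus_gbps = per_lane_gbps * LANES                  # ~126 Gbps per direction
per_cpu_gbps = per_bus_gbps * BUSES_PER_CPU           # ~252 Gbps per direction

nic_line_rate_gbps = 10 * 100                         # ten 100GE ConnectX-4 NICs

print(f"x16 Gen3 bus : {per_bus_gbps:6.1f} Gbps per direction")
print(f"CPU-0 uplink : {per_cpu_gbps:6.1f} Gbps per direction (~256 Gbps quoted)")
print(f"NIC line rate: {nic_line_rate_gbps:6.1f} Gbps -> "
      f"{nic_line_rate_gbps / per_cpu_gbps:.1f}x oversubscribed")
```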
Results: consistent network throughput of 800-900 Gbps
System Design Considerations for 200GE / 400GE and beyond, towards 1 Tbps
100GE Switches (compact form factor)
- 32 x 100GE ports; all ports are configurable as 10 / 25 / 40 / 50 / 100GE
- Arista, Dell, and Inventec are based on the Broadcom Tomahawk (TH) chip, while Mellanox uses its own Spectrum chipset
- TH: a common 16 MB packet buffer memory shared among 4 quadrants
- 3.2 Tbps full-duplex switching capacity
- Support for the ONIE boot loader
- Dell / Arista support two additional 10GE ports
- OpenFlow 1.3+, with multi-tenant support
NVMe Drive Options: what to choose?
- PCIe add-in card format: DC P3608 (x8) at 2.8 GB/s write; DC P3700 (x4); MX 6300 (x8) at 2.4 GB/s write; LIQID/Kingston DCP1000 (quad-M.2 add-on card with a PCIe bridge chip) at 5.7 GB/s write
- M.2 format: Samsung M.2 at 1.75 GB/s write
- U.2 format: DC P3700; HGST SN100
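A quick way to compare the options is to ask how many drives of each type are needed to keep a 100GE NIC busy with sequential writes. The sketch below uses the write rates quoted above, treated as rough vendor figures, and ignores filesystem and RAID overheads.

```python
# How many NVMe drives does it take to absorb ~100 Gbps (12.5 GB/s) of
# sequential writes? Write rates are the rough figures quoted above.
import math

write_gbytes_per_sec = {
    "DC P3608 (x8)": 2.8,
    "Samsung M.2": 1.75,
    "MX 6300 (x8)": 2.4,
    "LIQID/Kingston DCP1000": 5.7,
}

target_gbytes_per_sec = 100 / 8.0   # 100 Gbps of network traffic

for drive, rate in write_gbytes_per_sec.items():
    n = math.ceil(target_gbytes_per_sec / rate)
    print(f"{drive:28s}: {n} drive(s) to sustain ~100 Gbps")
```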
Let's build a low-cost NVMe storage server (~100 Gbps)
Ingredients:
- 2U SuperMicro server (with 3 x PCIe x16 slots)
- 2 x Dell quad-M.2 adapter cards
- 8 x Samsung 960 PRO M.2 drives (1 TB each)
FIO results with 4 x 1 TB M.2 drives: CPU 90% idle (a sample fio invocation is sketched below)
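The sketch below drives an M.2 array with fio sequential writes, one job per namespace, roughly in the spirit of the 4 x 1 TB test above. The device paths are placeholders for whatever namespaces the adapter cards expose, and writing directly to block devices is destructive, so this is only meant for scratch drives.

```python
# Sketch: fio sequential writes across the M.2 array, one job per namespace.
# Device paths are placeholders; destructive -- only run against scratch drives.
import subprocess

DEVICES = ["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1", "/dev/nvme3n1"]

# Options before the first --name act as fio globals for all jobs.
cmd = ["fio", "--rw=write", "--bs=1M", "--direct=1", "--ioengine=libaio",
       "--iodepth=32", "--runtime=60", "--time_based", "--group_reporting"]
for i, dev in enumerate(DEVICES):
    cmd += [f"--name=drive{i}", f"--filename={dev}"]   # one job per drive

print(" ".join(cmd))
subprocess.run(cmd, check=True)   # aggregate write bandwidth is in the summary
```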
Design options for a High-Throughput DTN Server
Option 1: 1U SuperMicro server (single CPU)
- Single 40/100GE NIC
- Dual NVMe storage (LIQID, 3.2 TB each)
- ~90 Gbps disk I/O using NVMe over Fabrics
Option 2: 2U SuperMicro server (dual CPU)
- Single 40/100GE NIC
- Three NVMe storage cards (LIQID, 3.2 TB each)
- ~100 Gbps disk I/O using FDT / NVMe over Fabrics
Option 3: 2U SuperMicro server (dual CPU)
- Single/dual 40/100GE NICs
- 24 front-loaded 2.5" NVMe drives (U.2)
- ~200 Gbps disk I/O using FDT / NVMe over Fabrics
(An NVMe over Fabrics connection sketch follows below.)
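For the NVMe over Fabrics options, the initiator side can attach remote namespaces over RDMA with the standard nvme-cli tool. The sketch below shows the discover/connect/list sequence; the target address and subsystem NQN are placeholders, and it assumes the target server already exports the subsystem.

```python
# Sketch: attach a remote NVMe namespace over RDMA with nvme-cli, as used
# for the NVMe-over-Fabrics DTN options above. Target address and NQN are
# placeholders; the target side must already export the subsystem.
import subprocess

TARGET_IP = "192.168.10.1"      # placeholder RDMA-capable address
SVC_PORT = "4420"               # conventional NVMe-oF service port

def run(args):
    print("+", " ".join(args))
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

# 1. Discover subsystems exported by the target.
print(run(["nvme", "discover", "-t", "rdma", "-a", TARGET_IP, "-s", SVC_PORT]))

# 2. Connect to one subsystem (NQN copied from the discovery output).
subsystem_nqn = "nqn.2016-06.example:dtn-nvme-pool"   # placeholder NQN
run(["nvme", "connect", "-t", "rdma", "-a", TARGET_IP, "-s", SVC_PORT,
     "-n", subsystem_nqn])

# 3. The remote namespace now shows up as a local /dev/nvmeXnY block device.
print(run(["nvme", "list"]))
```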
2CRSI Server with 24 NVMe drives
Maximum throughput is reached at 14 drives (7 drives per processor), a limitation due to the combination of a single PCIe x16 bus (128 Gbps), processor utilization, and application overheads (see the rough calculation below).
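A rough sanity check of that ceiling: if the drives behind each processor share a single PCIe Gen3 x16 uplink (~126 Gbps per direction), per-server throughput stops scaling once both uplinks are saturated. The per-drive rate below is an assumption for illustration, not a measured 2CRSI figure.

```python
# Rough sanity check of the 14-drive plateau: with the drives behind each CPU
# sharing one PCIe Gen3 x16 uplink, the uplink saturates and extra drives
# stop adding throughput. Per-drive rate is an illustrative assumption.
X16_GEN3_GBPS = 8.0 * 16 * (128 / 130)    # ~126 Gbps per direction per CPU
PER_DRIVE_GBPS = 18.0                     # assumed sequential rate per drive

for drives_per_cpu in (4, 7, 12):
    offered = drives_per_cpu * PER_DRIVE_GBPS
    delivered = min(offered, X16_GEN3_GBPS)
    print(f"{drives_per_cpu:2d} drives/CPU: offered {offered:5.1f} Gbps, "
          f"delivered {delivered:5.1f} Gbps (x2 CPUs = {2*delivered:5.1f} Gbps)")
```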
Beyond 100GE -> 200/400GE: component readiness?
Server readiness:
1) Current PCIe bus limitations
- PCIe Gen 3.0 x16 can reach 128 Gbps full duplex
- PCIe Gen 4.0 x16 can reach double that capacity, i.e. 256 Gbps, targeting 200G NICs
- PCIe Gen 4.0 x32 can reach 512 Gbps
2) Increased number of PCIe lanes within the processor
- Latest Broadwell (2016): 40 PCIe lanes per processor; supports PCIe Gen 3.0 (8 GT/s); up to DDR4-2400 memory
- Skylake (2017): supports PCIe Gen 4.0 (16 GT/s); 48 Gen4 lanes per processor
- AMD (2017): 128 PCIe lanes per processor (could be used for single-socket solutions); can provide 8 x Gen3 x16 slots, i.e. a maximum of 8 NICs per socket
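The lane counts above translate directly into how many x16 NICs a single socket can feed. The sketch below does that arithmetic with the figures quoted on the slide (including the slide's projection of PCIe Gen 4.0 for Skylake), assuming only 128b/130b encoding overhead.

```python
# How many x16 NIC slots can one socket feed, given the PCIe lane counts
# and generations quoted above? (Per-lane rates assume 128b/130b encoding.)
CPUS = {
    # name: (lanes per socket, GT/s per lane)
    "Broadwell (2016)": (40, 8),    # PCIe Gen 3.0
    "Skylake (2017)":   (48, 16),   # PCIe Gen 4.0, as projected on the slide
    "AMD (2017)":       (128, 8),   # PCIe Gen 3.0
}

for name, (lanes, gt) in CPUS.items():
    x16_slots = lanes // 16
    per_slot_gbps = 16 * gt * 128 / 130
    print(f"{name:18s}: {x16_slots} x16 slots, ~{per_slot_gbps:5.0f} Gbps each, "
          f"~{x16_slots * per_slot_gbps:6.0f} Gbps per socket")
```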
RoCE - 400GE Network Throughput
Transmission across 4 Mellanox VPI NICs reached 389 Gbps, using only 4 of the 24 CPU cores (a measurement sketch follows below).
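A hedged sketch of how such a multi-NIC RoCE measurement can be driven with the standard perftest tools: it launches one ib_write_bw stream per NIC, each pinned to its own core, against a peer that is already running the matching ib_write_bw servers. Device names, peer addresses, and core numbers are placeholders.

```python
# Sketch: one RoCE ib_write_bw stream per Mellanox NIC, each pinned to its own
# core, against a peer already running 'ib_write_bw -d <dev> --report_gbits'
# for each device. Device names, peer IPs and cores are placeholders.
import subprocess

STREAMS = [
    # (local RDMA device, peer address, CPU core)
    ("mlx5_0", "10.0.0.1", 2),
    ("mlx5_1", "10.0.0.2", 4),
    ("mlx5_2", "10.0.0.3", 6),
    ("mlx5_3", "10.0.0.4", 8),
]

procs = []
for dev, peer, core in STREAMS:
    cmd = ["taskset", "-c", str(core),
           "ib_write_bw", "-d", dev, "--report_gbits", "-D", "30", peer]
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()   # sum the per-stream Gbps figures for the aggregate rate
```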
Collaboration Partners
Special thanks to …
Research partners: Univ of Michigan, UCSD, iCAIR / StarLight, Stanford, Vanderbilt, UNESP / ANSP, RNP, Internet2, ESnet, CENIC, FLR / FIU, PacWave
Industry partners: Brocade (OpenFlow-capable switches), Dell (OpenFlow-capable switches and server systems), Echostreams (server systems), Intel (NVMe SSD drives), Mellanox (NICs and cables), Spirent (100GE tester), 2CRSI (NVMe storage), HGST (NVMe storage), LIQID
Plans for SC17 (Denver, Nov 2017)
- East-west integration with other controllers, along with state, recovery, provisioning, and monitoring
- Demonstrating the SENSE project for DTN auto-tuning
- NVMe over Fabrics across the WAN
- DTN design using 200G NICs (Mellanox/Chelsio)
Thank you! Questions?
For more details, please visit