1 Review of Terabit/sec SDN Demonstrations at Supercomputing 2016 and Plans for SC17
Azher Mughal, Caltech (12/8/2016)

2 SC16 Demonstration Goals
SDN Traffic Flows
- Network should be controlled solely by the SDN application, relying mostly on the northbound interface for network visibility and troubleshooting (the operator panel)
- Install flows between a pair of DTN nodes (see the flow-install sketch below)
- Re-engineer flows onto alternate routes across the ring (shortest path, or paths with more available bandwidth)
High Speed DTN Transfers
- 100G to 100G (network to network)
- 100G to 100G (disk to disk)
- NVMe over Fabrics at 100G
- 1 Tbps to 1 Tbps (network to network, using RoCE, direct NIC, or TCP)
ODL (PhEDEx + ALTO (RSA + SPCE))
- Substantially extended OpenDaylight controller using a unified multilevel control plane programming framework to drive the new network paradigm
- Advanced integration functions with the data management applications of the CMS experiment
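As an illustration of the first goal, the sketch below pushes a single unidirectional flow between two DTN-facing switch ports through OpenDaylight's RESTCONF northbound (config datastore), the standard flow-programming path in Lithium/Beryllium-era controllers. The controller URL, credentials, node and port identifiers, and addresses are placeholders, not the SC16 configuration; the actual demonstrations used the extended controller described on this slide.

```python
# Minimal sketch (placeholders throughout): install one flow that steers IPv4
# traffic arriving from DTN A toward the port facing DTN B, via ODL RESTCONF.
import requests

ODL = "http://odl-controller.example.org:8181"   # hypothetical controller address
AUTH = ("admin", "admin")                        # ODL default credentials
NODE = "openflow:1"                              # hypothetical switch datapath id

flow = {
    "flow": [{
        "id": "dtn-a-to-dtn-b",
        "table_id": 0,
        "priority": 500,
        "match": {
            "in-port": f"{NODE}:3",                                # port facing DTN A
            "ethernet-match": {"ethernet-type": {"type": 2048}},   # IPv4
            "ipv4-destination": "10.0.2.10/32"                     # DTN B (placeholder)
        },
        "instructions": {"instruction": [{
            "order": 0,
            "apply-actions": {"action": [{
                "order": 0,
                "output-action": {"output-node-connector": "5"}    # port toward DTN B
            }]}
        }]}
    }]
}

url = (f"{ODL}/restconf/config/opendaylight-inventory:nodes/node/{NODE}"
       f"/flow-node-inventory:table/0/flow/dtn-a-to-dtn-b")
resp = requests.put(url, json=flow, auth=AUTH)
resp.raise_for_status()
```

Re-engineering the flow onto an alternate route amounts to re-issuing the same PUT with a different output port (and programming the switches along the alternate path); because the flow id stays the same, the controller simply overwrites the entry.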

3

4 Bandwidth explosions by Caltech at SC
SC05 (Seattle): Gbps
SC11 (Seattle): Gbps
SC12 (Salt Lake): 350 Gbps
SC13 (Denver): Gbps
SC14 (Louisiana): 1.5 Tbps
SC15 (Austin): ~500 Gbps
SC16 (Salt Lake): 2 Tbps
(Chart annotations: 100G, 350G, 800G, 1.5T, 2 Tbps; "Multiple 100G connections"; "Using 10GE connections in 2008"; "First ever 100G OTU-4 trials using Ciena laid over multiple ..."; "Fully SDN enabled")

5 SCinet OTN Connection All the connections to booths are through the OTN Metro DCI Connections 1Tbps Booths Caltech StarLight SCinet 100GE Booths UMich Vanderbilt UCSD Dell 2CRSi Ethernet Alliance Connections 5 x WAN 7 x Dark Fiber 2 x 1Tbps

6 Pacific Research Platform (based on top of CENIC network backbone)
Pacific Research Platform (built on top of the CENIC network backbone). Energize the teams to take advantage of the high-speed networks now in place.

7 Booth Internal Network Layout
- 3 x Dell Z9100
- 3 x Dell S6100
- 1 x Arista 7280
- 1 x Arista 7060
- 1 x Inventec
- 3 x Mellanox SN2700
- 1 x Spirent tester
- 2 x 25GE NICs
- 50 x 100GE Mellanox NICs/cables
- 50 x LR4/CLR4 optics

8 Multiple Tbps Complex Flows
Multiple projects:
- SDN-NGenIA (Super SDN Paradigm)
- ExaO (LHC Orchestrator)
- Multiple Tbps complex flows
- Machine learning
- LHC data traversal
- Immersive VR
Caltech booths 2437, 2537

9 Spirent Network Tester
Booth equipment (Caltech 2437):
- Fiber patch panel
- Spirent network tester
- Infinera Cloud Xpress (DCI to booth 2611)
- NVMe over Fabrics servers
- VM server for various demonstrations
- Coriant (WAN DCI)
- 1 Tbps RDMA server
- Dell R930 for PRP demonstrations
- SM SC5 for KISTI
- 1 x Arista 7280
- 2 x Dell Z9100
- 1 x Arista 7060
- 1 x Mellanox SN2700
- 1 x Cisco NCS 1002
- OpenFlow Dell Z9100 switch
- PhEDEx / ExaO servers
- HGST storage nodes

10 4 x SM blades for MAPLE/FAST IDE
Booth equipment (Caltech 2537):
- Rack management server
- Cisco NCS 1002 (5 x 100G links)
- 1 x Dell Z9100
- 1 x Inventec
- 2 x Pica8 3920
- 2 x Dell S4810
- SM GPU server (GTX1080)

11 Actual SDN Topology at the SC16 Show Floor
- Intelligent traffic engineering
- Host / node discovery
- Live OpenFlow statistics (see the statistics sketch below)
- Can provision end-to-end paths:
  - Layer 2 / Layer 2
  - Edge provider port to LAN
  - Edge port to edge port (tunnel)
  - Local flows in a node
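As a rough illustration of the "live OpenFlow statistics" capability, the sketch below reads per-port byte counters from OpenDaylight's operational datastore over RESTCONF. The controller address and credentials are placeholders; the operator panel used at SC16 is a separate application, so this only shows what the raw northbound data looks like.

```python
# Minimal sketch (placeholder controller address): dump per-port OpenFlow
# byte counters from the ODL operational datastore.
import requests

ODL = "http://odl-controller.example.org:8181"
AUTH = ("admin", "admin")

nodes = requests.get(f"{ODL}/restconf/operational/opendaylight-inventory:nodes",
                     auth=AUTH).json()

for node in nodes.get("nodes", {}).get("node", []):
    for nc in node.get("node-connector", []):
        stats = nc.get(
            "opendaylight-port-statistics:flow-capable-node-connector-statistics", {})
        rx = stats.get("bytes", {}).get("received", 0)
        tx = stats.get("bytes", {}).get("transmitted", 0)
        print(f"{nc['id']}: rx={rx} B, tx={tx} B")
```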

12 1Tbps Booth to Booth Network Transfer

13 Network Architecture

14 SuperMicro Server Design: SYS-4028GR-TR2
Expansion board: X9DRG-O-PCI-E (full x16 version)

15 SuperMicro - SYS-4028GR-TR(T2)
From each CPU, one PCIe x16 bus is split among five x16 slots; one x16 slot connects directly to the processor. 10 ConnectX-4 (CX-4) NICs per server.

16 Results: consistent 800-900 Gbps

17 History of Caltech SDN (OSCARS/NSI OpenFlow)

18 SDN Demonstration at FTW Workshop
SDN demonstration at the FTW workshop. Partners: Caltech, UMich, AmLight/FIU, Internet2, ESnet, ANSP+RNP
- Dynamic path creation: Caltech – UMich, Caltech – RNP, UMich – AmLight
- Path initiation by the FDT agent using the OSCARS API
- Calls OESS for OpenFlow data plane provisioning over Internet2/AL2S
- MonALISA agents at the end sites provide detailed monitoring information
(Diagram: Caltech, UMich, AmLight; 5 dynamic SDN paths; SPRACE, RNP & ANSP (São Paulo))

19 FTW Demo: SDN-Driven Multipath Circuits OpenDaylight/OpenFlow Controller Int’l Demo
A. Mughal, J. Bezerra
- Hardened OESS and OSCARS installations at Caltech, UMich, AmLight
- Updated Dell switch firmware to operate stably with OpenFlow
- Dynamic circuit paths under SDN control
- Prelude to the ANSE architecture: SDN load-balanced, moderated flows
- Caltech, Michigan, FIU, Rio and São Paulo, with network partners Internet2, CENIC, Merit, FLR, AmLight, RNP and ANSP in Brazil

20 OpenDaylight & Caltech SDN Initiatives
Supporting:
- Northbound and southbound interfaces (see the topology query sketch below)
- Starting with Lithium, intelligent services like ALTO, SPCE
- OVSDB for OpenVSwitch configuration, including the northbound interface
- NetIDE: rapid application development platform for OpenDaylight, and re-use of modules written for other projects in OpenDaylight
- OFNG – ODL: NB libraries for Helium/Lithium
- OFNG – ALTO: high-level application integration
- OLiMPs – ODL: migrated to Hydrogen
- OLiMPs – FloodLight: link-layer multipath switching
ODL release timeline: Start (2013), Hydrogen (2014/2), Helium (2014/9), Lithium (2015/6), Beryllium (2016/2), Boron
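To make the northbound side concrete, here is a minimal sketch of the kind of query a path-computation service such as SPCE could start from: reading the discovered OpenFlow topology (nodes and links) over RESTCONF. "flow:1" is OpenDaylight's default OpenFlow topology name; the controller address and credentials are placeholders, and the real ALTO/SPCE services consume this information through their own ODL modules rather than an external script.

```python
# Minimal sketch (placeholder controller address): list the links in ODL's
# default OpenFlow topology, the raw material for any path computation.
import requests

ODL = "http://odl-controller.example.org:8181"
AUTH = ("admin", "admin")

topo = requests.get(
    f"{ODL}/restconf/operational/network-topology:network-topology/topology/flow:1",
    auth=AUTH).json()

for link in topo["topology"][0].get("link", []):
    src = link["source"]["source-tp"]        # source termination point (switch port)
    dst = link["destination"]["dest-tp"]     # destination termination point
    print(f"{src} -> {dst}")
```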

21 System Design Considerations for 200GE / 400GE and beyond … 1Tbps

22 100GE Switches (compact form factor)
- 32 x 100GE ports; all ports are configurable as 10 / 25 / 40 / 50 / 100GE
- Arista, Dell, and Inventec are based on the Broadcom Tomahawk (TH) chip, while Mellanox uses its own Spectrum chipset
- TH: common 16 MB packet buffer memory shared among 4 quadrants
- 3.2 Tbps full-duplex switching capacity
- ONIE boot loader support
- Dell / Arista support two additional 10GE ports
- OpenFlow 1.3+, with multi-tenant support

23 100G Switches, Optics, Cables (confusions …)
100GE optics (QSFP28 iLR4/LR4):
- Reliable vendors: InnoLight, Source Photonics
- Not supported by Mellanox NICs; supported by QLogic (using special knobs)
Link auto-negotiation:
- QLogic NIC (B0); Mellanox does support it
Forward Error Correction (FEC) (RS, Firecode):
- Mellanox does support FEC

24 2CRSI Server with 24 NVMe drives
Max throughput is reached at 14 drives (7 drives per processor), a limitation due to the combination of the single PCIe x16 bus (128 Gbps), processor utilization, and application overheads (a back-of-envelope calculation follows below).
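A rough back-of-envelope consistent with that plateau: assuming roughly 2.3 GB/s of sequential throughput per NVMe drive (a typical figure for Gen3 x4 SSDs of that generation; the slide does not state the exact drive model), about seven drives are enough to fill one PCIe Gen3 x16 uplink.

```python
# Back-of-envelope (per-drive throughput is an assumed typical value, not from the slide):
# how many NVMe drives it takes to saturate one PCIe Gen3 x16 uplink per processor.
x16_gen3_gbps = 128                      # usable x16 bandwidth figure quoted in these slides
per_drive_gbyte_s = 2.3                  # assumed sequential throughput per NVMe drive (GB/s)
per_drive_gbps = per_drive_gbyte_s * 8   # ~18.4 Gbps per drive

drives_per_uplink = x16_gen3_gbps / per_drive_gbps
print(f"~{drives_per_uplink:.1f} drives fill one x16 uplink")   # ~7 drives

# Two processors -> two x16 uplinks -> ~14 drives before the bus, CPU utilization,
# and application overheads cap further scaling.
```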

25 Beyond 100GE -> 200/400GE, Component readiness ?
Server readiness:
1) Current PCIe bus limitations (see the bandwidth calculation below)
   - PCIe Gen 3.0: x16 can reach 128 Gbps full duplex
   - PCIe Gen 4.0: x16 can reach double the capacity, i.e. 256 Gbps
   - PCIe Gen 4.0: x32 can reach 512 Gbps
2) Increased number of PCIe lanes within the processor
   - Haswell/Broadwell (2015/2016): 40 PCIe lanes per processor; supports PCIe Gen 3.0 (8 GT/s); up to DDR4 2400 MHz memory
   - Skylake (2017): 48 PCIe lanes per processor; supports PCIe Gen 4.0 (16 GT/s)
3) Faster core rates, or overclocking (what's best for production systems?)
4) Increased memory controllers at higher clock rates, reaching 3000 MHz
5) TCP / UDP / RDMA over Ethernet
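As a quick check on the per-slot figures above, the sketch below derives usable PCIe bandwidth per direction from the standard lane rates (8 GT/s for Gen3, 16 GT/s for Gen4) and 128b/130b encoding.

```python
# Worked example: usable PCIe bandwidth per direction from lane rate and encoding.
def pcie_gbps(gt_per_s, lanes, encoding=128 / 130):
    """Raw lane rate (GT/s) x encoding efficiency x number of lanes."""
    return gt_per_s * encoding * lanes

for gen, rate, lanes in (("Gen3", 8, 16), ("Gen4", 16, 16), ("Gen4", 16, 32)):
    print(f"PCIe {gen} x{lanes}: ~{pcie_gbps(rate, lanes):.0f} Gbps per direction")

# Gen3 x16 ~126 Gbps (commonly quoted as 128 Gbps), Gen4 x16 ~252 Gbps,
# Gen4 x32 ~504 Gbps, so a single Gen4 x16 slot is still short of 400GE line rate.
```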

26 400GE Network Testing
Transmission across 4 Mellanox VPI NICs. Only 4 CPU cores are used out of 24. Result: 389 Gbps.

27 Collaboration Partners
Special thanks to …
Research partners: Univ of Michigan, UCSD, iCAIR / StarLight, Stanford, Vanderbilt, UNESP / ANSP, RNP, Internet2, ESnet, CENIC, FLR / FIU, PacWave
Industry partners: Brocade (OpenFlow-capable switches), Dell (OpenFlow-capable switches and server systems), Echostreams (server systems), Intel (NVMe SSD drives), Mellanox (NICs and cables), Spirent (100GE tester), 2CRSI (NVMe storage), HGST (NVMe storage), LIQID

28 Plans for SC17 (Denver, Nov 2017)
East-west integration with other controllers, along with state, recovery, provisioning, and monitoring (think OSCARS/NSI …). More … left for the group discussion.

29 Thank you! Questions?
For more details, please visit

