XFabric: a Reconfigurable In-Rack Network for Rack-Scale Computers Sergey Legtchenko, Nicholas Chen, Daniel Cletheroe, Antony Rowstron, Hugh Williams, Xiaohan Zhao

XFabric: a Reconfigurable In-Rack Network for Rack-Scale Computers Sergey Legtchenko, Nicholas Chen, Daniel Cletheroe, Antony Rowstron, Hugh Williams, Xiaohan Zhao

Increasing Performance per $ in Data Centers
- Hardware designed for data centers: Google Jupiter (data center fabric), Pelican Cold Storage
- Racks as units of deployment & operation
- In-rack consolidation, from $$$/server to $/server:
  - Standard rack: 40 servers, 80 CPUs
  - Open CloudServer (OCS) rack: 80 servers, 160 CPUs
  - Rack-scale computer, e.g. Boston Viridis (server = Calxeda SoC): 900 (wimpy) CPUs
- Systems on Chip (SoCs): CPU, NIC/packet switch with d ports, controllers (IO, memory...)

In-rack Networks for Rack-Scale Computers
Challenge: reducing in-rack network cost
- Rack-scale computer, e.g. Boston Viridis (server = Calxeda SoC): 900 (wimpy) CPUs
- Full bisection bandwidth: 9 Tbps
- ToR switch? 900 ports, cost: $$$$$
- Multi-tiered? >900 ports, high power draw/cost
- Direct-connect topology (e.g. mesh): SoCs with packet switches, d ports/SoC
  - Low cost
  - Oversubscribed

Oversubscription in Direct-Connect Topologies
[Diagram: SoCs A, B, C, D, ..., each with a CPU and a packet switch; example path A->D]
- Multi-hop routing
- Path length impacts performance
  - Higher, less predictable latency
  - Lower goodput
- Example: 3D Torus with 512 SoCs
  - Average hop count = 6
  - 6x oversubscription
- Path length is low if the topology is adapted to traffic
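The hop-count figure on this slide can be checked with a short calculation. The sketch below assumes the 512 SoCs are arranged as an 8 x 8 x 8 torus (the slide gives only the total) and averages the shortest-path distance from one SoC to every SoC, itself included. The function names are illustrative and not taken from the talk.

```python
from itertools import product


def ring_distance(a, b, k):
    """Shortest distance between positions a and b on a ring of k nodes."""
    d = abs(a - b) % k
    return min(d, k - d)


def average_hop_count(k, dims=3):
    """Mean hop count from one node to all nodes of a k-ary torus.

    A torus looks the same from every node, so measuring from the
    origin is enough. The origin itself (distance 0) is included.
    """
    origin = (0,) * dims
    total = 0
    for node in product(range(k), repeat=dims):
        total += sum(ring_distance(o, c, k) for o, c in zip(origin, node))
    return total / k ** dims


# Assumed layout: 8 x 8 x 8 = 512 SoCs
print(average_hop_count(8))  # 6.0
```

One way to read the "6x oversubscription" bullet, under the same assumption: a flow that crosses six links on average consumes six links' worth of capacity for one link's worth of goodput.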

XFabric: a Reconfigurable Topology
[Diagram: SoCs A, B, C, D, ... in the rack, each with a CPU and a packet switch, joined by physical circuits; example path A->D]
- Adapting topology to traffic
  - Lower path length
  - Reduced oversubscription
- Circuit switched fabric
  - Electrical signal forwarding
  - No queuing, no packet inspection

XFabric Architecture
[Diagram: SoCs A, B, C, D, ... in the rack, each with a CPU and a packet switch; uplinks to the data center aggregation switch]
- Controller (process on one SoC in the rack)
- Control plane: periodic topology reconfiguration
  - Estimate demand
  - Generate topology: minimize path length
  - Configure data plane: assign circuits, update SoC routing
- Dynamic uplink placement
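The control-plane cycle on this slide (estimate demand, generate a topology that minimizes path length, configure the data plane) can be illustrated with a toy model. The sketch below is not the controller from the talk: it swaps in a simple greedy heuristic, assumed here for illustration, that first builds a ring so every SoC stays reachable and then spends the remaining ports on direct links between the heaviest-traffic pairs. All function names and the demand matrix are hypothetical.

```python
from collections import deque


def generate_topology(demand, ports_per_soc):
    """Greedy heuristic (an assumption, not the XFabric algorithm)."""
    n = len(demand)
    links = set()
    free = [ports_per_soc] * n

    def add(a, b):
        link = (min(a, b), max(a, b))
        if a != b and link not in links and free[a] > 0 and free[b] > 0:
            links.add(link)
            free[a] -= 1
            free[b] -= 1

    # Ring first, so the topology is connected (uses 2 ports per SoC).
    for i in range(n):
        add(i, (i + 1) % n)
    # Then direct links for the pairs exchanging the most traffic.
    pairs = sorted(
        ((demand[a][b] + demand[b][a], a, b)
         for a in range(n) for b in range(a + 1, n)),
        reverse=True,
    )
    for volume, a, b in pairs:
        if volume > 0:
            add(a, b)
    return links


def hop_counts(n, links, src):
    """Breadth-first search: hops from src to every reachable SoC."""
    adj = {i: [] for i in range(n)}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def weighted_path_length(demand, links):
    """Average hop count, weighted by traffic volume."""
    n = len(demand)
    total_volume = total_hops = 0
    for src in range(n):
        dist = hop_counts(n, links, src)
        for dst in range(n):
            if dst != src and demand[src][dst] > 0:
                total_volume += demand[src][dst]
                total_hops += demand[src][dst] * dist[dst]
    return total_hops / total_volume


# Hypothetical estimated demand for 6 SoCs with 3 ports each.
demand = [[0] * 6 for _ in range(6)]
demand[0][3] = 10
demand[1][4] = 5

ring_only = generate_topology([[0] * 6 for _ in range(6)], 3)
adapted = generate_topology(demand, 3)
print(weighted_path_length(demand, ring_only))  # 3.0
print(weighted_path_length(demand, adapted))    # 1.0
```

In this toy case the static ring carries both flows over three hops, while the adapted topology gives each flow a direct circuit. A real controller would also have to install the circuits and update routing tables on each SoC, which the sketch leaves out.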

Circuit-Switching Fabric Cost
[Diagram: SoCs A, B, C, D, ... in the rack, each with a CPU and a packet switch]
- Challenge: high port count, e.g. 300 SoCs, 6 ports/SoC: 1,800 ports
  - Too high a port count for one ASIC
  - Folded Clos (x port ASICs) total cost: $27K
- Commodity ASICs
  - Gbps
  - Max size: ~350 ports
  - Cost: $3/port
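The numbers on this slide can be tied together with a little arithmetic. The sketch below uses only the slide's figures and assumes the $27K folded Clos cost is made up entirely of ASIC ports at $3 each, which the slide does not state.

```python
socs = 300
ports_per_soc = 6
asic_max_ports = 350       # "~350 ports" on the slide
cost_per_port = 3          # dollars per ASIC port
folded_clos_cost = 27_000  # dollars, total from the slide

fabric_ports = socs * ports_per_soc
print(fabric_ports)                   # 1800
print(fabric_ports > asic_max_ports)  # True: too many ports for one ASIC

# Assumption: the whole $27K is ASIC ports at $3/port.
implied_asic_ports = folded_clos_cost // cost_per_port
print(implied_asic_ports)                 # 9000
print(implied_asic_ports / fabric_ports)  # 5.0
```

Under that assumption, the multi-stage fabric spends five ASIC ports for every SoC-facing port, which is where the cost of full reconfigurability comes from.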

Reducing Circuit-Switching Fabric Cost
[Diagram: SoCs A, B, C, D, ... in the rack, each with a CPU and a packet switch; one circuit switch connected to port 0 on all SoCs, ..., one connected to port N on all SoCs]
Trading off reconfigurability for cost
- Full reconfigurability
  - Any 2 ports can be connected
- Partial reconfigurability
  - A port can be connected to a subset of ports
  - 6 x 300-port ASICs: $5.4K
  - 5x lower cost compared to full reconfigurability
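A small model makes the trade-off concrete. The sketch below assumes, following the diagram labels, that circuit switch i is wired to port i of every SoC, so under partial reconfigurability two ports can be joined only if they share a port index. The `can_connect` helper and the `(soc_id, port_index)` tuple layout are illustrative, not from the talk. The cost figures reuse the slide's numbers and the $3/port price from the previous slide.

```python
def can_connect(port_a, port_b, full_reconfigurability=False):
    """Each port is a (soc_id, port_index) tuple (hypothetical layout)."""
    if port_a == port_b:
        return False
    if full_reconfigurability:
        return True  # any 2 ports can be connected
    # Partial: both ports must sit on the same circuit switch,
    # i.e. have the same port index on their respective SoCs.
    return port_a[1] == port_b[1]


print(can_connect(("A", 0), ("D", 0)))        # True
print(can_connect(("A", 0), ("D", 1)))        # False
print(can_connect(("A", 0), ("D", 1), True))  # True

# Cost comparison using the slides' figures.
cost_per_port = 3                   # dollars per ASIC port
partial_cost = 6 * 300 * cost_per_port
full_cost = 27_000                  # folded Clos total
print(partial_cost)                 # 5400
print(full_cost / partial_cost)     # 5.0
```

The restriction still lets any SoC reach any other SoC directly, as long as both have the matching port index free.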

XFabric Performance at Rack Scale
- Flow-based simulation, 343 SoCs, 6 ports/SoC
- Varying traffic skew
- Production cluster workload
  - Traffic matrix from TCP flow trace
[Plot: path length (#hops), lower is better; labels: Skewed, Uniform (7x7x7)]

XFabric Prototype Performance
- XFabric prototype
  - SoC emulated by server (application, filter driver, software packet switch, 6 NICs)
  - 27 servers
  - Unmodified TCP/IP applications
  - 6 circuit switches, custom PCB design (Gen1: 32 1Gbps; Gen2: Gbps)
- 23% improvement
[Plot: completion time (normalized to 3DTorus) vs. reconfiguration period (sec)]

Conclusion
- Rack-Scale Computers
  - Higher performance per $
  - Up to hundreds of SoCs/rack
- XFabric: in-rack network with reconfigurable topology
  - Dynamic adaptation to traffic demand
  - Low cost
- Deploying new circuit switch hardware
  - Electrical circuit switching, Gbps